Top 5 OCR APIs

OCR (Optical Character Recognition) is a useful machine vision capability. OCR lets you recognize and extract text from images so that it can be processed further or stored. This is very useful for processing scans and pictures of text, for instance when working with invoices, scanned forms and signage.

We’ve looked at several APIs for OCR, evaluating them based on:

  • Accuracy – we tried them all with the picture below to make sure they clearly recognize the text.
  • Price – we outline the price per call of the different APIs.
  • Special capabilities – some of the APIs we’ve covered have special capabilities that make them better suited for specific tasks, like scanning invoices or recognizing logos.

We used the following image to try out the APIs, as it contains a lot of text in different styles and sizes, as well as some graphics that could confuse an API.

    Advantages of Using Public APIs

    Microsoft Computer Vision

    https://rapidapi.com/microsoft-azure/api/Microsoft%20Computer%20Vision/functions

    The Microsoft Computer Vision API is a comprehensive set of computer vision tools, spanning capabilities like generating smart image thumbnails, recognizing celebrities in images and describing the content of images using AI.

    Accuracy

    The Microsoft API offers two OCR endpoints: OCR from image file and OCR from image URL. Both endpoints work the same way and differ only in where the image comes from.

    The text recognition works well, and returns the text divided into regions of text. Each region has lines, and each line has words, which contain the actual text. The division is convenient for understanding the structure of the content in the image, though if you just need the text as one large string and don’t care about positioning, it’ll require more code.
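The extra code mentioned above is small. Here is a minimal sketch of collapsing the regions, lines and words into one string. The key names (regions, lines, words, text) follow the structure described above, but the sample response is made up for illustration and the helper name flatten_ocr_text is our own, so check both against the actual response you get back.

```python
def flatten_ocr_text(ocr_result):
    """Join every word in a regions -> lines -> words response into one string."""
    lines_out = []
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            # Each word object is assumed to carry its text under the "text" key.
            words = [word["text"] for word in line.get("words", [])]
            lines_out.append(" ".join(words))
    # One output line per recognized line of text.
    return "\n".join(lines_out)


# Hypothetical response, shaped like the structure described in the article.
sample = {
    "regions": [
        {
            "lines": [
                {"words": [{"text": "Advantages"}, {"text": "of"}]},
                {"words": [{"text": "Public"}, {"text": "APIs"}]},
            ]
        }
    ]
}

print(flatten_ocr_text(sample))
```

Positioning data is simply ignored here, which is fine when you only care about the text itself.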

    Microsoft Computer Vision API

    Price

    The free tier for Microsoft’s API will give you 5,000 requests per month. The API has 3 paid plans:

    • $19.90 -> 15,000 requests / month
    • $74.90 -> 70,000 requests / month
    • $199.90 -> 200,000 requests / month

    SemaMediaData

    https://rapidapi.com/SemaMediaData/api/Image%20OCR

    This API is a dedicated OCR platform, with a single function – Image OCR. It also has a “sister” API – Video OCR – which is optimized for extracting text from videos (more on that later).

    The SemaMedia API requires manually setting the language with each request (using the lang parameter). In scenarios where the language is known, this should actually improve accuracy, as it lets the API compare the recognized words against a dictionary (when using the df=True option).

    Accuracy

    The API handled the supplied image very well. It returns an array of results, each a region of text with a position in the image, as well as the text result.

    SemaMediaData API

    Special Features

    The SemaMedia platform also supports video OCR with the Video OCR API. According to the docs, video OCR is an analysis cascade which includes video segmentation (hard-cut), video text detection/recognition, and named entity recognition from video text (NER is a free add-on feature). The analysis result of this method enables automatic video retrieval and indexing as well as content-based video search in video archives. The docs also point to a detailed example on the SemaMedia demo website.

    Price

    The free tier for SemaMedia’s API will give you 100 requests per month. The API has 3 paid plans:

    • $50.00 -> 2,200 requests / month
    • $200.00 -> 13,500 requests / month
    • $500.00 -> 40,000 requests / month

    Taggun

    https://rapidapi.com/Taggun/api/Taggun

    The Taggun API is a unique OCR API, targeted directly at scanning invoices and receipts. This can be useful as the API not only recognizes the text in the image, it also recognizes the structure of the invoice and returns parsed data like totalAmount, taxAmount, merchantName, etc.

    Taggun API

    Accuracy

    When calling the simple receipt processing endpoint, the API returns an accuracy score with each piece of information. Sometimes that score is 0 and the information is missing. However, when the information is there, it is usually accurate.

    The label by label accuracy can be used to ask users for fields that are not properly recognized in the scanned invoice.
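As a rough sketch of that idea, the snippet below picks out the fields a user should be asked to fill in. The response shape is an assumption: we pretend each field is an object with data and confidenceLevel keys, and the 0.5 threshold is arbitrary. Adjust the key names and the cutoff to match what the API actually returns for you.

```python
def fields_needing_review(receipt, threshold=0.5):
    """Return the names of fields whose score is too low or whose value is missing."""
    flagged = []
    for name, field in receipt.items():
        # Skip anything that is not a scored field (assumed shape, see lead-in).
        if not isinstance(field, dict) or "confidenceLevel" not in field:
            continue
        if field["confidenceLevel"] < threshold or field.get("data") is None:
            flagged.append(name)
    return flagged


# Hypothetical parsed receipt, for illustration only.
sample = {
    "totalAmount": {"data": 42.5, "confidenceLevel": 0.9},
    "taxAmount": {"confidenceLevel": 0},
    "merchantName": {"data": "Corner Cafe", "confidenceLevel": 0.3},
}

print(fields_needing_review(sample))
```

With the sample above, the total is trusted while the tax amount and merchant name would be sent back to the user for confirmation.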

    Price

    The Taggun API has a free plan that includes 50 requests per month, and a paid plan costing $90 that includes 1,000 monthly requests.

    Cloudmersive

    https://rapidapi.com/cloudmersive/api/Cloudmersive%20OCR

    The Cloudmersive OCR API is a nifty tool for simple text extraction from images. It has only one endpoint, Image to Text, and returns all the text in the image as one string rather than by regions. This can be useful when transcribing a large block of text (from a book or a paper) and only the text itself is needed.

    Cloudmersive API

    Accuracy

    The API was pretty accurate, and successfully transcribed most words in the document.

    Price

    The free tier for the Cloudmersive API will give you 50,000 requests per month. The API has 3 paid plans:

    • $19.99 -> 100,000 requests / month
    • $49.99 -> 250,000 requests / month
    • $99.90 -> 500,000 requests / month

    Google Cloud Vision

    https://rapidapi.com/stefan.skliarov/api/GoogleCloudVision/functions/detectText

    The Google Cloud Vision API is a comprehensive machine vision platform, with capabilities beyond OCR such as face recognition, image labeling and landmark detection (detecting natural and man-made landmarks in images).

    Accuracy

    Using the /detectText endpoint with the supplied image, the API identified the text well. The response contains a textAnnotation field which has the different word segments in the image, with their text and location. This can be very handy for highlighting specific words in the image (for instance highlighting brand names / words from a list).

    Google Cloud Vision API

    The API also returns a fullTextAnnotation field which contains the entire text in the image as a single string, as well as the detected language of the document.
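To illustrate the word-highlighting use case mentioned above, here is a small sketch that finds the word segments matching a list and returns their positions. It assumes each annotation has a description (the text) and a boundingPoly with vertices, as in Google's public REST responses. The response you get through a wrapper may use different names, and the sample data and the find_listed_words helper are made up for this example.

```python
def find_listed_words(annotations, watch_list):
    """Return (text, vertices) for each annotation whose text is on the watch list."""
    wanted = {word.lower() for word in watch_list}
    matches = []
    for annotation in annotations:
        text = annotation.get("description", "")
        if text.lower() in wanted:
            # The vertices outline the box you would draw to highlight the word.
            vertices = annotation.get("boundingPoly", {}).get("vertices", [])
            matches.append((text, vertices))
    return matches


# Hypothetical word-level annotations, for illustration only.
sample_annotations = [
    {
        "description": "Public",
        "boundingPoly": {"vertices": [{"x": 10, "y": 5}, {"x": 60, "y": 5},
                                      {"x": 60, "y": 20}, {"x": 10, "y": 20}]},
    },
    {
        "description": "APIs",
        "boundingPoly": {"vertices": [{"x": 65, "y": 5}, {"x": 95, "y": 5},
                                      {"x": 95, "y": 20}, {"x": 65, "y": 20}]},
    },
]

for text, vertices in find_listed_words(sample_annotations, ["apis"]):
    print(text, vertices[0])
```

The matching is case-insensitive and exact, so multi-word brand names would need an extra step that joins neighbouring segments.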

    Price

    The API includes 1,000 free API calls per month, and charges $1.50 for each subsequent 1,000 requests (as of April 2018).

    Special Features

    The Google Cloud Vision API also has an OCR-related endpoint called /detectLogos. Given an image that contains brand logos, this endpoint identifies the brands they belong to. During our testing, this endpoint easily identified logos for top brands.

    Summary

    API                        Auto-detect language  Text by regions  Text annotation (all text as one string)  Requests in free tier  Est. price per call
    Google Cloud Vision        Yes                   Yes              Yes                                       1,000                  $0.0015
    SemaMediaData              No                    Yes              No                                        100                    $0.013
    Taggun                     Yes                   No               Yes (invoices)                            50                     $0.09
    Cloudmersive               Yes                   No               Yes                                       50,000                 $0.0002
    Microsoft Computer Vision  Yes                   Yes              No                                        5,000                  $0.001
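The per-call estimates can be sanity-checked with a few lines of arithmetic. The sketch below assumes each estimate is the monthly price of the largest paid plan listed above divided by its request count (for Google, the price of 1,000 additional requests). That assumption is ours, but it reproduces the table's figures to within rounding.

```python
# (monthly price in USD, requests included), taken from the plans listed above.
plans = {
    "Google Cloud Vision": (1.50, 1_000),
    "SemaMediaData": (500.00, 40_000),
    "Taggun": (90.00, 1_000),
    "Cloudmersive": (99.90, 500_000),
    "Microsoft Computer Vision": (199.90, 200_000),
}


def price_per_call(monthly_price, requests):
    """Cost of a single request if the whole plan quota is used."""
    return monthly_price / requests


for name, (price, requests) in plans.items():
    print(f"{name}: ${price_per_call(price, requests):.4f}")
```

Keep in mind that this is the best case: if you use only part of a plan's quota, the effective price per call is higher.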
