Evernote Tech Blog

The Care and Feeding of Elephants

How Evernote’s Image Recognition Works

Evernote’s ability to search for text within images is a popular feature. In this post, I’ll describe how the process works and answer some frequently asked questions.

How images are processed

When a note is sent to Evernote (via synchronization), any Resources in the note whose MIME types match PNG, JPG or GIF are forwarded to a separate set of servers whose sole job is to perform Optical Character Recognition (OCR) on the supplied image and report back whatever it finds. These results are added to the note as a hidden metadata attribute—that is, one not visible when viewing the note—called recoIndex. The full recoIndex node is visible when a note is exported as an ENEX file.

For example, I dug around and found an old note in my account containing only a single photo of a bottle of beer:

When I export this note as an ENEX file—a portable XML export format for Evernote notes—and jump to the bottom of the file, I’ll find the recoIndex element. Contained within recoIndex are a number of item nodes. Each item represents a rectangle Evernote’s OCR system believes to contain text.

Each item contains four attributes: x and y, indicating the coordinates of the top-left corner of the rectangle, and w and h, representing its width and height.

As an image is evaluated for textual content, a set of possible matches is created as child elements of the corresponding item. Each match is assigned a weight (represented by the w attribute of the t element): a numeric value indicating the likelihood that the match text is the same as the text in the image.

The OCR results are embedded in the note, which is subsequently synchronized back to the user’s client applications. At this point, the text found in the image is available for search.

Here’s a portion of the recoIndex element found in the note shown earlier, which contains item and t (match) elements. You’ll notice that most of the item elements have multiple t elements, each assigned the weight value we described earlier. When a user issues a search within an Evernote client, the content of the t elements is searched:

[Screenshot: the recoIndex element from longhammer.enex, showing item and t nodes]
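As a concrete illustration, here is a small, hand-written fragment modeled on the recoIndex structure described above—all coordinates, weights, and words are invented—along with a sketch of how a client-side search over the t elements might work:

```python
import xml.etree.ElementTree as ET

# A hand-written fragment modeled on the recoIndex structure described
# above; every coordinate, weight, and word here is invented.
RECO_INDEX = """
<recoIndex docType="unknown" objType="image">
  <item x="155" y="64" w="298" h="92">
    <t w="78">LONGHAMMER</t>
    <t w="31">LONGHAMMED</t>
  </item>
  <item x="160" y="170" w="210" h="60">
    <t w="85">IPA</t>
    <t w="40">1PA</t>
  </item>
</recoIndex>
"""

def search(reco_xml, query):
    """Return True if any <t> match in any item contains the query text."""
    root = ET.fromstring(reco_xml)
    query = query.lower()
    return any(query in t.text.lower()
               for item in root.findall("item")
               for t in item.findall("t"))

print(search(RECO_INDEX, "ipa"))    # True
print(search(RECO_INDEX, "stout"))  # False
```

Note that a search matches if the query appears in any candidate, even a low-weight one—which is how a slightly wrong guess can still make an image findable.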

How PDFs are processed

Evernote’s OCR system can also process PDF files, but they’re handled differently from images. When a PDF is processed, a second PDF document that contains the recognized text is created and embedded in the note containing the original PDF. This second PDF is not visible to the user and exists only to facilitate search. It also doesn’t count against the user’s monthly upload allowance.

For a PDF to be eligible for OCR, it must meet certain requirements:

  1. It must contain a bitmap image
  2. It must contain no selectable text (or, at most, a minimal amount)

In practical terms, this excludes most PDFs generated from text-based sources, such as word processors and other authoring applications. PDFs generated by hardware scanners generally meet the above requirements. If the scanner software performs its own OCR on the PDF, however, it won’t be processed by Evernote’s OCR service.
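The two rules above can be sketched as a simple eligibility check. Everything here is an assumption for illustration: the ocr_eligible helper and its max_selectable threshold are invented, and Evernote hasn’t published its actual cutoff for “a minimal amount” of selectable text.

```python
def ocr_eligible(contains_bitmap, selectable_chars, max_selectable=100):
    """Hypothetical eligibility check mirroring the two rules above:
    the PDF must contain a bitmap image and little or no selectable
    text. The max_selectable threshold is an invented placeholder."""
    return contains_bitmap and selectable_chars <= max_selectable

# A scanner-produced PDF: one big page image, no text layer.
print(ocr_eligible(True, 0))      # True
# A word-processor export: a text layer throughout.
print(ocr_eligible(True, 5000))   # False
# A PDF the scanner software already OCR'd: has a text layer, so skipped.
print(ocr_eligible(True, 3000))   # False
```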

If you export a note containing a PDF that has been processed by the OCR system, there will be two nodes in the document: data and alternate-data. The data node contains a Base64-encoded version of the original PDF, and the alternate-data node contains the searchable version of the same PDF.
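To see where the two PDFs live, here is a minimal sketch using a stripped-down stand-in for an ENEX export (a real export includes a DTD, note metadata, and encoding attributes that are omitted here); the extract_pdfs helper is hypothetical:

```python
import base64
import xml.etree.ElementTree as ET

def extract_pdfs(enex_xml):
    """Decode the original (data) and searchable (alternate-data) PDFs
    from the first resource of an exported note, when both are present."""
    root = ET.fromstring(enex_xml)
    resource = root.find(".//resource")
    pdfs = {}
    for tag in ("data", "alternate-data"):
        node = resource.find(tag)
        if node is not None and node.text:
            pdfs[tag] = base64.b64decode(node.text)
    return pdfs

# Build a toy export with two fake PDF payloads.
orig = base64.b64encode(b"%PDF-1.4 original").decode()
alt = base64.b64encode(b"%PDF-1.4 searchable").decode()
ENEX = f"""<en-export><note><resource>
<data encoding="base64">{orig}</data>
<alternate-data encoding="base64">{alt}</alternate-data>
</resource></note></en-export>"""

pdfs = extract_pdfs(ENEX)
print(sorted(pdfs))  # ['alternate-data', 'data']
```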

Common questions

What kind of text can be recognized?

Anything that the OCR system believes to be text. Printed text (e.g., on street signs or posters) and handwritten notes (even if your handwriting isn’t the neatest in the world) are both evaluated by the OCR service, provided the service can detect them.

The orientation of the text is a factor, as well. Text found within an image will be evaluated if it matches one of the following orientations within a few degrees:

  1. 0° — normal horizontal orientation
  2. 90° — vertical orientation
  3. 270° — vertical orientation

Text that does not match one of these orientations will be ignored (including diagonal and inverted text).
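The orientation rule can be sketched as a small check. Both the orientation_ok helper and its 5-degree tolerance are invented for illustration; the article only says “within a few degrees”:

```python
ALLOWED = (0, 90, 270)  # the orientations the article says are evaluated

def orientation_ok(angle_degrees, tolerance=5):
    """Hypothetical check: is the text's angle within a few degrees of
    one of the allowed orientations? The tolerance is a placeholder."""
    angle = angle_degrees % 360
    # Accept angles near an allowed value, wrapping around 360 degrees.
    return any(abs(angle - a) <= tolerance or abs(angle - a) >= 360 - tolerance
               for a in ALLOWED)

print(orientation_ok(2))     # True  (roughly horizontal)
print(orientation_ok(92))    # True  (roughly vertical)
print(orientation_ok(45))    # False (diagonal text is ignored)
print(orientation_ok(180))   # False (inverted text is ignored)
```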

It’s important to remember that no OCR system is perfect, and text you expect to be recognized may not be. That said, the OCR engine is constantly being refined and tuned for better accuracy.

Can Evernote’s OCR be used to create a text version of an image that contains text?

No. As described before, the matching done by the OCR system doesn’t produce one-to-one matches. Rather, there will usually be several potential matches for a given rectangle containing text and many of them will be inexact.
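This is easier to see with the weighted candidates described earlier. The best_guesses helper and its weight threshold below are invented to illustrate why the results are suggestions for search, not a transcript:

```python
# Hypothetical recognition candidates for three text rectangles,
# mirroring the weighted <t> matches described above.
candidates = [
    [("LONGHAMMER", 78), ("LONGHAMMED", 31)],
    [("IPA", 85), ("1PA", 40), ("IRA", 22)],
    [("mono", 25), ("mond", 19)],   # a low-confidence rectangle
]

def best_guesses(items, min_weight=30):
    """Keep only the top candidate per rectangle, dropping weak ones."""
    out = []
    for matches in items:
        text, weight = max(matches, key=lambda m: m[1])
        if weight >= min_weight:
            out.append(text)
    return out

print(best_guesses(candidates))  # ['LONGHAMMER', 'IPA']
```

Even after keeping only the strongest guess per rectangle, some rectangles vanish entirely and the survivors may still be wrong—so stitching the winners together does not yield a usable text version of the image.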

How long does it take for an image to be processed by OCR?

When a user syncs a note containing an image, the image is sent to the aforementioned group of servers for OCR processing. The system is queue-based, meaning the submitted image takes its place in line and will be processed after all other images ahead of it in the queue. Images synced by Premium users, however, are moved to the front of the queue ahead of all images synced by free users.

As to how long it will take, this depends on the size of the queue when the image is sent for processing. For Premium users, image processing generally completes within a few minutes (though, it can take longer in some instances). For free users, the wait can be substantially longer if there is a large number of images in the processing queue.

How many languages does the Evernote OCR system support?

Currently, Evernote’s OCR system can index 28 typewritten languages and 11 handwritten languages. New languages are added regularly and existing languages are optimized and improved. Users can control which language is used when indexing their data by changing the Recognition Language setting in their account’s Personal Settings.

Can I use the Evernote API only to OCR images?

No. Using Evernote’s API only for the OCR capabilities is a violation of the API License Agreement.

Where can I learn more about the infrastructure that powers Evernote’s OCR system?

We have published two articles to the Evernote Tech Blog that outline the recognition architecture and processes in greater detail:

Comments

  1. When I see the way the suggestions are stored in the RecoIndex, it keeps me wondering if there is a way to “help” the system to use the right word. In your example, it would be interesting if the bottle of beer comes up in a search for “mono” and the user gives a “thumbs down” or presses an x on the image to let the system know the result doesn’t match the question.
    And it can be the other way around. Why not make an app that shows an image with the rectangle the OCR found and the user can type in (or choose) the correct word that comes with it. It could be a game to play when bored, implemented in captcha systems etc.

    Just some thoughts.

  2. @Frank: not a game.
    But if you could help the recognition system by accepting/refusing suggestions, a database of “known” keywords could be created on EN server and attached to your EN account:
    * this could speed up future recognition of similar words (because there is a good statistical chance that you often OCR the same kind of notes)
    * this could “specialize” the recognition pattern for non-English accounts...
    * ...and more...

    All the OCR apps have this “teaching period” and, if well performed, it can drastically speed up subsequent OCR passes

  3. So, can Evernote’s OCR capabilities be taken advantage of by blind or low vision users to scan and thus, read the contents of a document ported into a note?
    Traditional blind-specific OCR software costs over 1K, so having Evernote as a potential alternative would be huge. Not to mention, another incentive for people who rely upon screen-access software to purchase an Evernote Premium subscription.

  4. I have noticed that the OCR is missing a number of acronyms I have in both photos and handwritten notes (Livescribe generated). I work in a technical field with a mountain of acronyms, so this is a significant concern for me. Is the failure to find acronyms due to them not being in the OCR dictionary? If so, then could the dictionary be supplemented by looking at a user’s tags or a custom teachable dictionary?

  5. Is it possible that images which exceed a file size threshold are not getting OCR’d?
    I have had an 11 MB JPG in my account for a while now and it is still not indexed.

  6. Is Evernote’s OCR home-grown or just Adobe’s OCR? Who has the most accurate OCR?

  7. I want to extract text from the image using android device without sending it to the server.
    Can anyone help me?

  8. Thank you Brett, great explanation.

    What really baffles me is why Evernote doesn’t allow users to replace the ‘original’ PDF with the ‘alternate’ PDF (with the embedded OCR result).

    A typical use case would be ‘scanning’ a document by taking a snapshot with Evernote on a smartphone, then accessing the OCR’ed PDF afterwards to copy text out of it. Even if the PDF is searchable, not being able to access the embedded OCR’ed text is really frustrating.

  9. Hi. Great post explaining the frustration I (and perhaps others) have experienced in trying to work out why PDFs imported into Evernote aren’t always searchable. I’m now exporting, OCRing, and then importing the results back into Evernote to see what changes. It’s been a long job getting to the bottom of this... and it works perfectly. Thank you.

