Stay Classified

Posted by Mark Ayzenshtat on 22 Jan 2013


By Mark Ayzenshtat, Head of Data Products at Evernote

Many Evernote users are (rightly) obsessed with food: where to find it, how to prepare it, and how to best enjoy it. We count ourselves among the obsessed, and to that end, we recently released the shiny new 2.0 version of the Evernote Food app.

Among its features, the new Evernote Food includes a “My Cookbook” section that collects any recipes you’ve clipped or typed into Evernote in one place. Some users put all of their recipes into a specific notebook or assign them a specific tag, while others don’t follow any particular convention. We wanted “My Cookbook” to just work for all users. In other words, no matter how you choose to group or categorize your notes, Evernote should be able to automatically identify which ones are recipes, pluck them out, and show them to you at the right time.

In this post, I’d like to give you a behind-the-scenes look at recipe classification, the new note classification pipeline that makes it possible, and what it means for users, partners, and third-party developers.

Recipe Classification

Separating recipes from non-recipes is a special case of what’s known as a supervised learning problem. It’s “supervised” because the classifier has to be fed examples of recipes and non-recipes, usually prepared manually, before it can learn how to automatically tell them apart. Most of us already encounter supervised learning daily whenever we check our email: a spam classifier, trained on numberless examples of canny solicitations from Nigerian princes, keeps the junk out of our inboxes.

For finding recipes in your Evernote account, we encountered a number of specific challenges. Good food has universal appeal, and we wanted to launch “My Cookbook” in many languages from the start: English, Chinese, Japanese, Korean, Spanish, French, Italian, German, Portuguese, and Russian. Not only does the cuisine in the places where these languages are spoken vary a great deal, but so does what counts as a recipe. For example, most Chinese recipes omit specific measurements and quantities; instead of a teaspoon of sugar or half a cup of flour, a Chinese recipe may simply instruct you to use “the right amount”. (I always imagine the recipe author delivering this phrase with a knowing wink.)

Then there’s the matter of actually gathering representative training data and preparing it for use. Here, we faced an additional challenge: how to get data that was representative of real-world notes. One way would be to peek at users’ private notes until we had the data we needed, but respect for user privacy is paramount at Evernote, and so we never considered this approach. In the end, we went about gathering the recipe and non-recipe data in two different ways. For recipes, we enlisted speakers of supported languages to clip recipes in their native language, targeting a variety of different online recipe sources. For non-recipes, we started with a random sample of Evernote notes published to public notebooks, then enlisted native speakers to pluck out any recipes that happened to make it in.

The Note Classification Flow

Now, a bit on what’s happening under the hood. Building a classifier is typically an iterative process of exploring the data, selecting the features (the attributes of the data believed to be predictive in some way), training the models, and finally evaluating them. For many of these tasks, we relied on the excellent scikit-learn package for Python. Currently, we use a variant of Naive Bayes classification to do our prediction. We picked Naive Bayes because it’s simple, fast, and performs quite well on this kind of classification problem, despite the limitations of its simplifying assumption that features are independent of one another given the class. A trained Naive Bayes model is basically a giant table in which each row consists of a single feature (e.g., the phrase “teaspoon of sugar”), followed by a list of float values representing how likely we are to encounter that feature in a note belonging to each output class (e.g., “recipe” or “non-recipe”). To classify a note, we extract its features, look them up in the table, and keep running sums of the corresponding log probabilities (adding logarithms is equivalent to multiplying the probabilities, but avoids numerical underflow). What we’re left with is an overall measure of how likely the note is to belong to each output class.
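
To make the “giant table” idea concrete, here is a minimal sketch in Java of a textbook multinomial Naive Bayes classifier with add-one (Laplace) smoothing, using lowercased word unigrams and bigrams as features. It is illustrative only: it is not Evernote’s Naive Bayes variant or its scikit-learn training code, the class and method names (NaiveBayesSketch, features, train, classify) are hypothetical, and the four training notes are toy examples made up for the demo.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class NaiveBayesSketch {
    // class label -> (feature -> count): the "table", stored one column per class
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    private final Map<String, Integer> featureTotals = new HashMap<>(); // class -> total feature count
    private final Map<String, Integer> docTotals = new HashMap<>();     // class -> number of training notes
    private final Set<String> vocabulary = new HashSet<>();
    private int docs = 0;

    // Lowercased word unigrams plus bigrams ("teaspoon of", "of sugar").
    // Splitting on \W+ only suits space-delimited languages; Chinese or
    // Japanese text would need a word segmenter instead.
    public static List<String> features(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) words.add(w);
        }
        List<String> feats = new ArrayList<>(words);
        for (int i = 0; i + 1 < words.size(); i++) {
            feats.add(words.get(i) + " " + words.get(i + 1));
        }
        return feats;
    }

    // Supervised training: every example arrives with its known label.
    public void train(String text, String label) {
        docs++;
        docTotals.merge(label, 1, Integer::sum);
        Map<String, Integer> table = counts.computeIfAbsent(label, k -> new HashMap<>());
        for (String f : features(text)) {
            table.merge(f, 1, Integer::sum);
            featureTotals.merge(label, 1, Integer::sum);
            vocabulary.add(f);
        }
    }

    // Scores each class as log P(class) + sum of log P(feature | class)
    // and returns the highest-scoring class.
    public String classify(String text) {
        List<String> feats = features(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : counts.keySet()) {
            Map<String, Integer> table = counts.get(label);
            double denom = featureTotals.getOrDefault(label, 0) + vocabulary.size();
            double score = Math.log((double) docTotals.get(label) / docs); // class prior
            for (String f : feats) {
                // Add-one smoothing: an unseen feature lowers a class's score
                // instead of zeroing it out.
                score += Math.log((table.getOrDefault(f, 0) + 1) / denom);
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("Add a teaspoon of sugar and half a cup of flour, then bake", "recipe");
        nb.train("Whisk two eggs with a teaspoon of salt and simmer", "recipe");
        nb.train("Meeting notes: quarterly budget review with the team", "non-recipe");
        nb.train("Flight itinerary and hotel booking confirmation", "non-recipe");
        System.out.println(nb.classify("Stir in a teaspoon of sugar")); // prints recipe
    }
}
```

Working in log space keeps long notes from driving the product of many small probabilities down to zero, and the smoothing means a single unfamiliar phrase cannot rule a class out on its own.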

Once we train the models, we load them via home-grown Java code that runs in the Evernote service backend. During any createNote() or updateNote() API call, the note passes through a gauntlet of different classifiers before being persisted. For recipe classification, we use a two-step approach. First, we attempt to automatically guess the note’s language from its content. An Evernote user can have notes in many different languages (indeed, any one note can contain text in many different languages). Once we determine a note’s predominant language, we pass it to a language-specific recipe model, trained to capture the quirks and idiosyncrasies of recipes in that particular language/locale. If that model believes the note to be a recipe, we save this classification as part of the note metadata. The whole process takes only a tiny fraction of a second, so there should be no delay perceptible to the end user.
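
The two-step flow (detect the language, then route to that language’s recipe model) can be outlined in Java roughly as follows. This is a hypothetical sketch, not Evernote’s backend code: the detector and per-language models are passed in as plain functions, the Han-character detector and keyword “models” in main are toy stand-ins, and the value "002" for CLASSIFICATION_RECIPE_SERVICE_RECIPE is an assumption to check against the constants in your Evernote SDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

public class RecipePipelineSketch {
    static final String RECIPE_KEY = "recipe";
    static final String SERVICE_RECIPE = "002"; // ASSUMED value of CLASSIFICATION_RECIPE_SERVICE_RECIPE

    private final Function<String, String> detectLanguage;         // hypothetical: text -> language code
    private final Map<String, Predicate<String>> modelsByLanguage; // hypothetical: language -> recipe model

    public RecipePipelineSketch(Function<String, String> detectLanguage,
                                Map<String, Predicate<String>> modelsByLanguage) {
        this.detectLanguage = detectLanguage;
        this.modelsByLanguage = modelsByLanguage;
    }

    // Called on note create/update; returns true if a recipe classification was recorded.
    public boolean classify(String noteText, Map<String, String> classifications) {
        // Step 1: guess the note's predominant language.
        String lang = detectLanguage.apply(noteText);
        // Step 2: hand the note to the model trained for that language, if there is one.
        Predicate<String> model = modelsByLanguage.get(lang);
        if (model == null || !model.test(noteText)) {
            return false; // unsupported language or not a recipe: metadata untouched
        }
        classifications.put(RECIPE_KEY, SERVICE_RECIPE);
        return true;
    }

    public static void main(String[] args) {
        // Toy detector: any Han character means Chinese, otherwise English.
        Function<String, String> detect = text -> text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) ? "zh" : "en";
        Map<String, Predicate<String>> models = new HashMap<>();
        models.put("en", text -> text.toLowerCase().contains("teaspoon")); // stand-in model
        models.put("zh", text -> text.contains("\u9002\u91cf"));           // "the right amount"
        RecipePipelineSketch pipeline = new RecipePipelineSketch(detect, models);

        Map<String, String> classifications = new HashMap<>();
        System.out.println(pipeline.classify("Add a teaspoon of sugar", classifications)); // prints true
        System.out.println(classifications);                                              // prints {recipe=002}
        System.out.println(pipeline.classify("\u9002\u91cf", new HashMap<>()));           // prints true
    }
}
```

Keeping one model per language means a feature like “the right amount” can be a strong recipe signal in the Chinese model without affecting how, say, English notes are scored.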

Where to Find Classifications

Like everything else in an Evernote account, all classifications assigned to a note are accessible via the Evernote API. Note classifications are stored in a note’s NoteAttributes.classifications field, a map of string keys to string values. A key in this map represents the type of classification. For now, the only classification used by Evernote apps is the recipe classification (identified by the key “recipe”), but the data model is extensible to allow us to classify all sorts of notes.

The classification value for recipes is one of the constants that begin with CLASSIFICATION_RECIPE_. For example, CLASSIFICATION_RECIPE_SERVICE_RECIPE means “the Evernote service believes this note to be a recipe”. The constants that begin with CLASSIFICATION_RECIPE_USER_ signify that the end user, as opposed to the Evernote service, designated a note as having a particular classification. If the classifier sees a note with this type of user-assigned classification, it will never attempt to overwrite it. This allows a user to take control when we misclassify a note. For now, Evernote Food allows you to tell us that a note shown in “My Cookbook” isn’t a recipe. In the future, we’ll give you ways to mark a note as a recipe if our classifier misses it.
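
The rules described in the two paragraphs above (a map keyed by classification type, and user-assigned values that the classifier never overwrites) can be sketched with a plain Map<String, String> standing in for NoteAttributes.classifications. The string values given to the constants are an assumption consistent with the search query below, where "001" and "002" are the two recipe values; check them against the Constants class in your SDK. The post does not say what happens when the classifier decides a note is not a recipe, so this sketch simply leaves the map alone in that case.

```java
import java.util.HashMap;
import java.util.Map;

public class RecipeClassificationSketch {
    static final String RECIPE_KEY = "recipe";     // classification type, per the post
    // ASSUMED values for the CLASSIFICATION_RECIPE_* constants:
    static final String USER_NON_RECIPE = "000";   // user says: not a recipe
    static final String USER_RECIPE = "001";       // user says: recipe
    static final String SERVICE_RECIPE = "002";    // Evernote service says: recipe

    // True if the note carries either recipe classification (user- or service-assigned).
    public static boolean isRecipe(Map<String, String> classifications) {
        String v = classifications.get(RECIPE_KEY);
        return USER_RECIPE.equals(v) || SERVICE_RECIPE.equals(v);
    }

    // Applies a classifier verdict without ever overwriting a user-assigned value.
    public static void applyServiceVerdict(Map<String, String> classifications, boolean modelSaysRecipe) {
        String current = classifications.get(RECIPE_KEY);
        if (USER_RECIPE.equals(current) || USER_NON_RECIPE.equals(current)) {
            return; // the user has the final word
        }
        if (modelSaysRecipe) {
            classifications.put(RECIPE_KEY, SERVICE_RECIPE);
        }
        // Negative verdict: behavior unspecified in the post; leave the map unchanged.
    }

    public static void main(String[] args) {
        Map<String, String> note = new HashMap<>();
        note.put(RECIPE_KEY, USER_NON_RECIPE); // user marked the note "not a recipe"
        applyServiceVerdict(note, true);       // the classifier disagrees...
        System.out.println(isRecipe(note));    // ...but the user's choice stands: prints false
    }
}
```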

Because all classifications are public, any third-party app can examine whether we believe a note to be a recipe. To find all of the notes with a recipe classification in a given user’s account, simply use the search API to search for “any: classifications_recipe:001 classifications_recipe:002”.

Here is how this could work in Java:

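import com.evernote.edam.notestore.*; // NoteStore, NoteFilter, NotesMetadataList, ...

// authToken is assumed to be a field holding a valid authentication token.
private void findRecipes(NoteStore.Client noteStore) throws Exception {
    NoteFilter filter = new NoteFilter();
    // Match either recipe classification value (see the search query above)
    filter.setWords("any: classifications_recipe:001 classifications_recipe:002");

    NotesMetadataResultSpec spec = new NotesMetadataResultSpec();
    spec.setIncludeTitle(true); // so getTitle() is populated below

    long start = System.currentTimeMillis();
    NotesMetadataList result = noteStore.findNotesMetadata(authToken, filter, 0, 100, spec);
    long elapsed = System.currentTimeMillis() - start;

    System.out.println("Found " + result.getTotalNotes() + " recipes in " + elapsed + "ms");
    for (NoteMetadata note : result.getNotes()) {
        System.out.println("  " + note.getTitle());
    }
}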

We are looking forward to seeing how developers take advantage of this new functionality.

2 Comments

  • Surely there are better classifiers than the Naive Bayes approach. Do you also intend to share a limited amount of recipe information to test and develop other classifiers?
    The idea is really cool!

    • Mark Ayzenshtat

      Vikram — glad you like it!

      Many classification techniques are more sophisticated than Naive Bayes, but that doesn’t necessarily make them better. As always, it’s about finding the right tool for the job. For this application (mostly a text classification problem), Naive Bayes performs great and has fewer moving parts than the alternatives. Also, gathering great training data and selecting the right features are challenges in their own right and probably affected the final user experience much more than the choice of classification algorithm.

      We currently don’t plan to share the recipe training data, but we’d definitely like to have developers build other kinds of classifiers over time, and the “classifications” field in the API was added with this in mind.