Even Grittier Details on Evernote’s Indexing System

Posted by Dave Engberg on 01 Nov 2011

Alex’s earlier article on Evernote’s image recognition component touched on a lot of its service-level functionality: what it is, how it works, and what it provides in relation to the Evernote platform as a whole. In this post, I’ll take you through some of the more systems-level concepts underneath this technology. [Nerd alert: I’ll be quickly running through a lot of specs and technical details without much concern for your well-being. If this kind of stuff isn’t your cup of tea, please accept my apologies and look at this picture of a cute little duckling instead.]

HARDWARE

The Evernote image recognition service is essentially a single-purpose compute cluster, so performance and efficiency were both driving factors when we evaluated hardware. After trials with a few different hardware platforms, we’ve settled on the iX1204-563UB from iX Systems. This is essentially a VAR-packaged Supermicro X8DTU motherboard in a Supermicro 815TQ-563UB chassis. Each of the 37 image recognition systems in the cluster is equipped as follows:

  • CPU: [2x] Intel(R) Xeon(R) CPU L5630 @ 2.13GHz (40W max TDP)
  • Motherboard: Supermicro X8DTU
  • Chassis: Supermicro 815TQ-563UB
  • PSU: [1x] 560W (80Plus Gold certified efficiency rating)
  • Storage: [1x] Low-power 5.25″ HDD
  • RAM: 12GB PC3-8500 (1066MHz)
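To put the per-node list in cluster terms, here is a quick back-of-the-envelope sketch in Python. It assumes, from Intel’s published specs rather than from this post, that the L5630 is a quad-core part with Hyper-Threading (two threads per core). `NodeSpec` and `cluster_totals` are hypothetical names used only for illustration, and the TDP total covers the CPUs alone, not wall power.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    sockets: int
    cores_per_socket: int   # assumption: L5630 has 4 cores (Intel spec, not the post)
    threads_per_core: int   # assumption: Hyper-Threading gives 2 threads per core
    cpu_tdp_watts: int      # max TDP per CPU package
    ram_gb: int


def cluster_totals(node: NodeSpec, nodes: int) -> dict:
    """Multiply per-node figures out across the whole cluster."""
    cores = nodes * node.sockets * node.cores_per_socket
    return {
        "physical_cores": cores,
        "hardware_threads": cores * node.threads_per_core,
        "ram_gb": nodes * node.ram_gb,
        "cpu_tdp_watts": nodes * node.sockets * node.cpu_tdp_watts,
    }


RECO_NODE = NodeSpec(sockets=2, cores_per_socket=4, threads_per_core=2,
                     cpu_tdp_watts=40, ram_gb=12)

totals = cluster_totals(RECO_NODE, nodes=37)
print(totals)
```

For the 37-node cluster this works out to 296 physical cores, 592 hardware threads, 444GB of RAM, and 2,960W of combined CPU TDP.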

CPU, RAM, and their ilk were chosen as a compromise between throughput and efficiency. We’d previously evaluated some denser 2U Twin² systems, but found them less than reliable under the consistently heavy workload we gave them. Traditional blades were also considered, but they ultimately proved a bit too difficult to squeeze into our existing infrastructure, especially at the 100% saturation point they’d frequently reach.

OPERATING SYSTEM

Under the hood, the operating system is a very bare-bones bootstrap installation of Debian “Squeeze” (pure AMD64). Debian was chosen for its stability and the ease of upgrading it in place. The OS stack itself is fairly vanilla, with a few notable exceptions:

  • Custom 3.0.4 kernel, tuned for throughput, with cflags targeted at our specific flavor of CPU
  • XFS filesystem with relatively large buffer space and things such as ‘barriers’ and ‘atime’ disabled
  • Network stack tuned to smoothly handle many parallel file transactions
  • Kernel ‘swappiness’ set to zero (from the default of 60)
  • OS-level 802.1Q trunking of network port (more on this later…)

The idea is to minimize bottlenecks wherever possible in order to free up the image recognition stack to do its thing. Kernel tuning has a surprisingly large impact in this particular case, improving performance by 7–30% over a stock kernel, depending on conditions. As for XFS, it lets us minimize IO contention on a single-disk volume at the cost of a little extra RAM, and it also lets us reorganize (defragment) the filesystem on the fly.
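A few of these settings are easy to verify on a running box. The sketch below is a hypothetical Python check, not Evernote’s tooling: it parses text in `/proc/mounts` format to confirm that an XFS mount carries `noatime` and `nobarrier`, and compares `/proc/sys/vm/swappiness` against the expected value of zero. The option names assume a kernel of that era; `nobarrier` was later deprecated and removed from XFS, so that half of the check would not apply on a modern kernel.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical expectations mirroring the tuning list above. "nobarrier" is the
# option name on 3.x-era kernels; it was later deprecated and removed from XFS.
REQUIRED_XFS_OPTIONS = {"noatime", "nobarrier"}
EXPECTED_SWAPPINESS = 0


def parse_mounts(text: str) -> dict[str, tuple[str, set[str]]]:
    """Map mount point -> (fs type, options) from /proc/mounts-format text."""
    mounts: dict[str, tuple[str, set[str]]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        mounts[mount_point] = (fs_type, set(options.split(",")))
    return mounts


def missing_xfs_options(mounts: dict[str, tuple[str, set[str]]],
                        mount_point: str) -> set[str]:
    """Return the required XFS options that the given mount is missing."""
    fs_type, options = mounts[mount_point]
    if fs_type != "xfs":
        raise ValueError(f"{mount_point} is {fs_type}, not xfs")
    return REQUIRED_XFS_OPTIONS - options


def swappiness_ok(path: str = "/proc/sys/vm/swappiness") -> bool:
    """True if the kernel's current swappiness matches the expected value."""
    return int(Path(path).read_text().strip()) == EXPECTED_SWAPPINESS
```

On a node tuned as described, `missing_xfs_options(parse_mounts(Path("/proc/mounts").read_text()), "/")` should return an empty set and `swappiness_ok()` should return `True`.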

SOFTWARE

Evernote’s image recognition stack is made up of in-house software for queue handling and image processing, along with a set of image recognition engines that handle various types of text. These include both in-house engines and best-of-breed third-party technology from I.R.I.S. The in-house portion of the code consists of AMP (the Asynchronous Media Processor) and ENRS (the Evernote Recognition Service). Since the software stack is already covered in some detail in Alex’s Evernote Indexing System article, I’ll merely present a brief outline:

  • ENRS, with the help of its “AIR” child processes, is the engine by which the actual image recognition occurs
  • AMP acts as the arbiter between the Evernote service cluster and ENRS, grabbing unprocessed images as they become available and feeding them to ENRS
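The AMP/ENRS split is a classic producer/consumer arrangement. The following Python sketch models it under loose assumptions: `amp_feed` plays AMP, pushing image IDs onto a bounded queue, and `enrs_worker` threads stand in for ENRS and its AIR child processes. All names, the `recognize` callback, and the use of threads (the real system uses child processes) are hypothetical.

```python
from __future__ import annotations

import queue
import threading
from typing import Callable, Iterable, Optional

STOP = None  # sentinel telling a worker to exit


def amp_feed(unprocessed: Iterable[str], work: queue.Queue[Optional[str]],
             n_workers: int) -> None:
    """Hypothetical AMP role: hand each unprocessed image ID to ENRS."""
    for image_id in unprocessed:
        work.put(image_id)  # blocks while the queue is full (backpressure)
    for _ in range(n_workers):
        work.put(STOP)      # one stop signal per worker


def enrs_worker(work: queue.Queue[Optional[str]], results: dict[str, str],
                recognize: Callable[[str], str], lock: threading.Lock) -> None:
    """Hypothetical ENRS/AIR role: recognize images until told to stop."""
    while True:
        image_id = work.get()
        if image_id is STOP:
            break
        text = recognize(image_id)
        with lock:
            results[image_id] = text


def run_pipeline(image_ids: Iterable[str], recognize: Callable[[str], str],
                 n_workers: int = 4, max_queued: int = 16) -> dict[str, str]:
    work: queue.Queue[Optional[str]] = queue.Queue(maxsize=max_queued)
    results: dict[str, str] = {}
    lock = threading.Lock()
    workers = [threading.Thread(target=enrs_worker,
                                args=(work, results, recognize, lock))
               for _ in range(n_workers)]
    for worker in workers:
        worker.start()
    amp_feed(image_ids, work, n_workers)
    for worker in workers:
        worker.join()
    return results


results = run_pipeline([f"img-{i}" for i in range(10)],
                       recognize=lambda image_id: f"text from {image_id}")
print(len(results))  # 10
```

The bounded queue provides natural backpressure: if recognition falls behind, `work.put` blocks and the feeder stops pulling new images until a worker frees a slot.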

Inter-server AMP chatter is confined to its own broadcast domain, with isolation enforced via the 802.1Q tagged VLAN I mentioned earlier. This allows reco servers to tell each other which shards they’ve already found work on, so that no two servers duplicate the same work. By preventing this overlap in polling, we also largely stop the reco servers from incessantly hammering the primary Evernote service.
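The shard-claiming idea can be illustrated with a small in-memory model. In this hypothetical Python sketch, `BroadcastDomain` stands in for the isolated VLAN, and each `RecoServer` announces the shard it claims so that its peers skip that shard on their next poll. It deliberately ignores real-world concerns such as races between simultaneous claims, lost broadcasts, and releasing a shard once its work is done.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BroadcastDomain:
    """Stand-in for the isolated VLAN: a message reaches every other member."""
    members: list[RecoServer] = field(default_factory=list)

    def broadcast(self, sender: str, shard: int) -> None:
        for member in self.members:
            if member.name != sender:
                member.claimed_by_peers.add(shard)


@dataclass
class RecoServer:
    name: str
    domain: BroadcastDomain = field(repr=False)
    claimed_by_peers: set[int] = field(default_factory=set)
    mine: list[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.domain.members.append(self)  # join the broadcast domain

    def poll(self, shards_with_work: list[int]) -> Optional[int]:
        """Claim the first shard nobody has announced yet, then tell the peers."""
        for shard in shards_with_work:
            if shard in self.claimed_by_peers or shard in self.mine:
                continue  # already being handled; don't hit the service for it
            self.mine.append(shard)
            self.domain.broadcast(self.name, shard)
            return shard
        return None  # nothing unclaimed this round


domain = BroadcastDomain()
a, b = RecoServer("reco-a", domain), RecoServer("reco-b", domain)
for server in (a, b, a):
    server.poll([1, 2, 3])
print(a.mine, b.mine)  # [1, 3] [2]
```

With the two servers polling shards `[1, 2, 3]` in turn, each shard is claimed exactly once, which is the property that keeps redundant polling off the primary service.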

I hope this has provided some insight into one of the more unusual aspects of the Evernote service. It’s been tricky to provide a decent level of detail on this topic without writing a novella in the process. If you find that you have more questions now than when you started reading, please feel free to ask them in the comments section below.

4 Comments

  • Rich Beales

    I love the technical details and openness of these posts; I don’t think any other company does it. Keep it coming!

  • Brock Palen

    I host an HPC podcast that is getting into a lot of big data issues. I think Evernote is a great example of some of these issues in a real-world product.
    If the Evernote team wants to be featured on the show, please contact me via the form on my website or on Twitter @brockpalen

  • Awesome-O

    I noticed you don’t have any redundancy in the PSU or the hard disks. Why not?

    • Chris Wadge

      @Awesome-O: Good question. The entire reco service is redundant by design; if one machine drops off, the others quickly pick up the slack. Thus we can get away with less robust hardware, which saves a bit in terms of both money and energy consumption.