Evernote Tech Blog

The care and feeding of Elephants

Shard Boiled

In our architectural overview post last May, we gave a high-level description of the “shard” servers we use for both data storage and application logic. Since Evernote is a personal memory service rather than a social network, we can easily partition individual user data onto separate shards to allow for fairly straightforward linear scalability. Each pair of these “shard” servers runs two virtual machines:

Old Shard Architecture

Each of these shard virtual machines stores transactional “metadata” within a MySQL database operating on a RAID-1 pair of 300GB Cheetah 15krpm drives. A separate RAID-10 of 7200rpm Constellation 3TB drives is split into partitions for storing large files and per-user Lucene text search indices. The paired virtual machines replicate each of these partitions from the current “Primary” half to the current “Secondary” VM using synchronous DRBD.

These shards have enough storage and IO capacity to comfortably handle 100,000 registered Evernote users for at least 4 years, with empty drive bays in their 4U chassis to expand later as needed. Adding dual L5630 processors and 48GB of RAM brings the cost of each box up to around $10,000 with a power draw of around 374 watts each. I.e. around $0.10 of hardware and 3.7mW per registered user.

Room for Improvement

This generation of shard hardware has given us pretty good price/performance, with the extremely high level of data redundancy that we require for our users. But we did find several areas where this design wasn’t ideal for our purposes. For example:

  1. The 15krpm drives for the MySQL database are usually 95% idle since InnoDB does a good job with caching and IO serialization, but we hit occasional bottlenecks when users with huge accounts first access their data. If their metadata isn’t already in the RAM buffers, the random IO workload may grind the disks heavily.
  2. The Lucene search indices for our users generate a lot more IO than we expected. We see twice as many read/write operations on the Lucene partition as we do on the MySQL partition. This is largely caused by our usage patterns: every time a note is created or edited, we need to update the owner’s index and flush the changes to disk so that they will take effect immediately.
  3. DRBD is great for replicating one or two small partitions, but it’s a hassle for large numbers of big partitions per server. Each partition needs to be independently configured, managed, and monitored. Various problems may occasionally force a full resync of all volumes, which may take many hours even over a dedicated 1Gbps crossover cable.

These constraints were the main factors that limited the number of users we were willing to assign to each shard. Improving the manageability and metadata IO performance would allow us to increase our user density safely. We’re addressing these issues in our next generation of shards by moving metadata storage onto SSDs and moving the logic for redundant bulk file storage out of the OS and into our application.

Shard Target

Our new design replaces the racks of ten 4U servers with racks that mix fourteen 1U metadata+app shards with four 4U bulk file storage servers.

The 1U shard heads have a pair of simpler virtual machines that each use a single partition on a single RAID-5 of 300GB Intel 320 SSDs. Those two partitions are replicated with DRBD, and the VM image is only run on one server at a time. The SSD drives are overprovisioned to 80% capacity to significantly increase write endurance and IO throughput. Rather than using RAID-6, we include a hot SSD spare with each box. This avoids RAID-6’s additional 15% performance loss; rebuild times will be short, and DRBD replication gives us coverage for hypothetical multi-drive failures.

Bulk resource file storage has been moved from local disks in the primary servers into pools of dedicated WebDAV servers running large RAID-6 file systems. Whenever a new resource file is added to Evernote, our application synchronously writes a copy of that file to two different file servers in the same rack before the metadata transaction completes. Offsite redundancy is also handled in the app, which replicates each new file to an offsite WebDAV server through an asynchronous background thread.
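The write path described above can be sketched roughly as follows. This is only an illustration under stated assumptions: FileServer is an in-memory stand-in for a WebDAV server, and store_resource, offsite_worker and the example path are hypothetical names, not our actual application code.

```python
import queue
import threading


class FileServer:
    """In-memory stand-in for one WebDAV server (hypothetical).

    A real client would issue HTTP PUT requests instead.
    """

    def __init__(self, name):
        self.name = name
        self.files = {}

    def put(self, path, data):
        self.files[path] = data


def store_resource(path, data, local_servers, offsite_queue, commit_metadata):
    """Write two local copies synchronously, then commit the metadata."""
    if len(local_servers) < 2:
        raise ValueError("need two file servers in the rack")
    for server in local_servers[:2]:
        server.put(path, data)       # an exception here aborts before the commit
    commit_metadata(path)            # the metadata transaction completes last
    offsite_queue.put((path, data))  # the offsite copy is made in the background


def offsite_worker(offsite_queue, offsite_server):
    """Background thread that drains the queue into the offsite server."""
    while True:
        item = offsite_queue.get()
        if item is None:             # sentinel value: stop the worker
            offsite_queue.task_done()
            break
        path, data = item
        offsite_server.put(path, data)
        offsite_queue.task_done()


rack = [FileServer("fs1"), FileServer("fs2")]
offsite = FileServer("offsite")
pending = queue.Queue()
committed = []

worker = threading.Thread(target=offsite_worker, args=(pending, offsite), daemon=True)
worker.start()

store_resource("ab/cd/recipe.png", b"image bytes", rack, pending, committed.append)
pending.put(None)
pending.join()                       # this demo waits for the offsite copy to land
```

The ordering is the point of the sketch: the metadata commit only happens once both local copies exist, while the offsite copy is allowed to lag behind.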

Results

This new design has enough IO capacity and storage to handle 200,000 users per shard head for at least four years of use. The rack of 14 shards and 4 file servers costs around $135k and draws around 3900 watts, which works out to about $0.05 and 1.4mW per user.

This reduces the number of future servers and per-user power consumption by 60% on the primary servers. Factoring in power consumption from other service components (switches, routers, load balancers, image processing servers, etc.), we estimate an overall 50% reduction in per-user power draw compared to our previous architecture. All of this translates into long-term reductions in our hosting costs.
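For anyone who wants to check the per-user arithmetic, here is a small sketch using only the figures quoted in this post. The function name per_user is ours for illustration, and the new-design figure assumes all 14 shard heads in a rack are filled to 200,000 users each.

```python
def per_user(cost_usd, watts, users):
    """Return (dollars per user, milliwatts per user)."""
    return cost_usd / users, watts / users * 1000.0


# Old design: one box at about $10,000 and 374 W for 100,000 users.
old_cost, old_mw = per_user(10_000, 374, 100_000)

# New design: one rack at about $135k and 3900 W for 14 shard heads
# of 200,000 users each.
new_cost, new_mw = per_user(135_000, 3_900, 14 * 200_000)

power_reduction = 1 - new_mw / old_mw

print(f"old: ${old_cost:.2f} and {old_mw:.2f} mW per user")
print(f"new: ${new_cost:.3f} and {new_mw:.2f} mW per user")
print(f"per-user power reduction on the primary servers: {power_reduction:.0%}")
```

The computed reduction comes out slightly above 60%, which is consistent with the rounded figure quoted above.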

I’m hesitant to make claims that smell like corporate greenwashing. [Insert picture of Phil hugging a baby harp seal.] But this 50% reduction in per-user power consumption will also reduce Evernote’s carbon footprint by a proportionate amount.

In addition to these specific savings, the process of evaluating and testing solutions has given us a better understanding of the various components and technologies that we’re using. We plan to do a few more posts with some of the details of our testing and optimization for SSD RAIDs, Xen vs. KVM IO throughput, DRBD management, etc. Since some of our results are a bit counter-intuitive, we hope to provide some useful information for other folks building storage-intensive services.

24 Comments

Security enhancements for third party authentication

The security and privacy of our users’ data is our top priority. This is reflected in our three laws of data protection, and it’s reflected in the way that we design our service and the products that access it. As a result, our users trust us, which is one of the reasons that we’ve been so successful.

Since we launched the Evernote API in October of 2008, we’ve allowed third party applications to authenticate to Evernote the same way that our applications do – by collecting a user’s Evernote username and password and sending them to our web service. Username and password authentication is easy for developers to implement, but it’s not great from a security perspective. Today, we’re making some big changes to improve the security of apps built on our API, starting with a transition from username and password authentication to OAuth.

We are now requiring all new applications to authenticate to the Evernote service using OAuth, a standard authorization protocol used by Google, Twitter, Dropbox and most other major web service providers. We will no longer activate applications on the production Evernote service if they use username and password for authentication. The Evernote service has long supported OAuth, and now we’re making it mandatory.

Developers have until November 1, 2012 to modify existing applications that authenticate using username and password. At that time, we will cut off third party access to the UserStore.authenticate function. We will email developers who hold “client” API keys (those that authenticate via username and password) this week to let them know about this change, and again in September if they have not converted their application to OAuth.

Most developers are familiar with OAuth from working with other APIs, but we recognize that properly implementing an OAuth client is more work than simply prompting for the user’s Evernote username and password. To make the transition easier, we’ve taken two steps.

First, developers who are simply experimenting with the API or scripting access to their own personal account can obtain a developer token. These tokens allow a developer to access their account through the API without any additional authentication. Developer tokens make it easy to get started learning the Evernote API or automating actions for your own account. To learn more about developer tokens, visit dev.evernote.com.

Second, we’ve added OAuth functionality to our iOS and Android SDKs, which we’ve published on GitHub. The new SDK functionality implements the entire OAuth flow and can be plugged into an application by simply copying and pasting a few blocks of code. The SDKs also include sample applications that demonstrate how to use the OAuth functionality. Our SDKs for PHP, Python and Ruby contain sample code showing how to use popular OAuth libraries to authenticate to Evernote. We’ll be releasing SDKs and sample code for other platforms and languages over the next few weeks.

Full documentation of Evernote’s OAuth provider is available on dev.evernote.com. As usual, our developer relations team is available to answer any questions. If you have trouble implementing OAuth, please let us know. We’re here to help.

2 Comments

Building apps, Metro style

A bit over a year ago we started working on Evernote for Windows Phone 7. The first task was to get our C# SDK working in Silverlight so that we could access the Evernote API. Our API is built on Apache Thrift, and the Thrift code generator and runtime for C# used .NET’s synchronous HTTP stack. Silverlight, however, only supports asynchronous networking, so our lead Windows Phone 7 engineer Damian spent some time monkeying around and getting everything to work. The result, which he documented in a detailed blog post, is a C# code generator and runtime for Thrift that support both networking models. You can find this code in our C# SDK.

Fast-forward a year and we’re working on Windows 8 Metro style JavaScript apps that will access the Evernote API from managed C# code. Once again, we’re facing incompatibilities, this time between the “.NET APIs for Metro style apps” and the Thrift runtime and generated code. I thought that I’d share what we did to get it all working, if only so that I can remember when it breaks someday.

Continue reading

1 Comment

WhySQL?

When we describe our overall service architecture to smart people who have been involved in other big services, the two most common questions are:

  1. Why is your structured data stored in SQL databases instead of something like [big-data, web-scale, No-SQL platform X]?
  2. Why are you running your own hardware instead of hosting Evernote in [cloud service provider Y]?

These are both valid and interesting questions. I’ll start with #1 and save #2 for a future post.

For the right application, a modern key-value storage engine may offer significant performance or scalability advantages in comparison to a single SQL instance. There are a few reasons that we’ve decided to store all of your account metadata within a single (replicated) MySQL instance instead.

Electric Kool-Aid

First, the ACID properties of a transactional database like MySQL’s InnoDB are important for our application and synchronization model.

Here’s a little snippet of the database tables for storing “notebooks” and “notes” within a shard’s SQL database:

CREATE TABLE notebooks (
  id int UNSIGNED NOT NULL PRIMARY KEY,
  guid binary(16) NOT NULL,
  user_id int UNSIGNED NOT NULL,
  name varchar(100) COLLATE utf8_bin NOT NULL,
  ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8; 

CREATE TABLE notes (
  id int UNSIGNED NOT NULL PRIMARY KEY,
  guid binary(16) NOT NULL,
  user_id int UNSIGNED NOT NULL,
  notebook_id int UNSIGNED NOT NULL,
  title varchar(255) NOT NULL,
  ...
  FOREIGN KEY (notebook_id) REFERENCES notebooks(id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

If you create a notebook named “Cooking” on your Windows client and then immediately clip a recipe for “Quick Tomato Sauce” into that notebook, your client will make two API calls on the next sync: one to create the notebook on the server, followed by a createNote call that stores the recipe in it.

Each of these coarse-grained API calls is implemented through a single SQL transaction, which ensures that a client can completely trust any reply given by the server. The ACID-compliant database ensures, for example:

Atomicity: If an API call succeeds, then 100% of the changes are completed, and if an API call fails, then none of them are committed. This means that if we fail trying to store the fourth image in your Note, there isn’t a half-formed Note in your account, and your monthly upload allowance isn’t incorrectly charged for the broken upload.

Consistency: At the end of any API call, the account is in a fully usable and internally consistent state. Every Note has a Notebook and none of them are “dangling.” The database won’t let us delete a Notebook that still has Notes within it, thanks to the FOREIGN KEY constraint.

Durability:  When the server says that a Notebook was created, the client can assume that it actually exists for future operations (like the createNote call). The change is durable so that the client knows that it has a consistent reflection of the state of the service at all times.

The Durability property is the most important for our synchronization protocol … if the client can’t assume that changes made on the server will be Durable, then the protocol would become much more complex and inefficient. Each synchronizing client would need to constantly double-check whether the state of each server object matched the local state. Maintaining absolute consistency for an account with 20k Notes, 40k Resources, and 10k Tags would be very expensive if changes couldn’t assume Durability.
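To make the Atomicity and Consistency guarantees concrete, here is a minimal sketch. It is not our server code: it uses Python’s built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL/InnoDB, and the two tables are cut down to the columns needed for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE notebooks (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE notes ("
    " id INTEGER PRIMARY KEY,"
    " notebook_id INTEGER NOT NULL REFERENCES notebooks(id),"
    " title TEXT NOT NULL)"
)
conn.commit()

# Atomicity: the second statement fails (notebook 99 does not exist),
# so the whole transaction, including the first INSERT, is rolled back.
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO notebooks (id, name) VALUES (1, 'Cooking')")
        conn.execute(
            "INSERT INTO notes (id, notebook_id, title) "
            "VALUES (1, 99, 'Quick Tomato Sauce')"
        )
except sqlite3.IntegrityError:
    pass

notebooks_after_failure = conn.execute("SELECT COUNT(*) FROM notebooks").fetchone()[0]

# The same work done correctly commits both rows together.
with conn:
    conn.execute("INSERT INTO notebooks (id, name) VALUES (1, 'Cooking')")
    conn.execute(
        "INSERT INTO notes (id, notebook_id, title) "
        "VALUES (1, 1, 'Quick Tomato Sauce')"
    )

# Consistency: the foreign key refuses to delete a notebook that still has notes.
try:
    with conn:
        conn.execute("DELETE FROM notebooks WHERE id = 1")
    delete_refused = False
except sqlite3.IntegrityError:
    delete_refused = True

notes_remaining = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
```

After the failed transaction the notebooks table is still empty, and the later DELETE is rejected, so no Note is ever left “dangling.”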

Big Data?

The ACID benefits of a transactional database make it very hard to scale out a data set beyond the confines of a single server. Database clustering and multi-master replication are scary dark arts, and key-value data stores provide a much simpler approach to scale a single storage pool out across commodity boxes.

Fortunately, this is a problem that Evernote doesn’t currently need to solve. Even though we have nearly a billion Notes and almost 2 billion Resource files within our servers, these aren’t actually a single big data set.  They’re cleanly partitioned into 20 million separate data sets, one per user.

This extreme locality means that we don’t have one “big data” storage problem, but rather we have a lot of “medium data” storage problems that partition neatly into a sharded architecture.

But maybe later…

We’re very interested in all of the cool new data storage systems for future projects that don’t require strong ACID transactionality and do require horizontal scalability. For example, our reporting and analytics system has gradually outgrown its current MySQL platform and needs to be replaced with something bigger/faster/cooler.

But we’re relatively satisfied with sharded MySQL storage for Evernote user account metadata, even though that’s not going to win any style points from the cool kids.

31 Comments

Even Grittier Details on Evernote’s Indexing System

Alex’s earlier article on Evernote’s image recognition component touched on a lot of its service-level functionality — what it is, how it works, and what it provides in relation to the Evernote platform as a whole. In this post, I’ll take you through some of the more systems-level concepts underneath this technology. [Nerd alert: I'll be quickly running through a lot of specs and technical details without much concern for your well being. If this kind of stuff isn't your cup of tea, please look at this picture of a cute little duckling instead with my apologies.]

HARDWARE

The Evernote image recognition service is essentially a singularly tasked compute cluster, so performance and efficiency were both driving factors when evaluating hardware. After trials with a few different hardware platforms, we’ve settled on the iX1204-563UB by iX Systems. This is essentially a VAR-packaged SuperMicro X8DTU coupled with the 815TQ-563UB chassis. Each of the 37 image recognition systems in the cluster is equipped as follows:

  • CPU: [2x] Intel(R) Xeon(R) CPU L5630 @ 2.13GHz (40W max TDP)
  • Motherboard: Supermicro X8DTU
  • Chassis: Supermicro 815TQ-563UB
  • PSU: [1x] 560W (80Plus Gold certified efficiency rating)
  • Storage: [1x] Low-power 5.25″ HDD
  • RAM: 12GB PC3-8500 (1066MHz)

CPU, RAM, and their ilk were chosen as a compromise of throughput and efficiency. We’d previously evaluated some denser 2U Twin² systems, but found them less than reliable under the consistently heavy workload they were tasked with. Traditional blades were also considered, but ultimately ended up being a bit too difficult to squeeze into our existing infrastructure — especially at the 100% saturation point they’d frequently reach.

OPERATING SYSTEM

Under the hood, the operating system is a very bare-bones bootstrap installation of Debian “Squeeze” (pure AMD64). Debian was chosen for its stability and ease of in-place upgradeability. The OS stack itself is fairly vanilla with a few notable exceptions:

  • Custom 3.0.4 kernel, tuned for throughput, with cflags targeted at our specific flavor of CPU
  • XFS filesystem with relatively large buffer space and things such as ‘barriers’ and ‘atime’ disabled
  • Network stack tuned to smoothly handle many parallel file transactions
  • Kernel ‘swappiness’ set to zero (from the default of 60)
  • OS-level 802.1Q trunking of network port (more on this later…)

The idea is to minimize bottlenecks wherever possible in order to free up the image recognition stack to do its thing. Kernel tuning has a surprisingly high impact in this particular case, with a 7-30% performance improvement over stock, depending on various conditions. As for XFS, it lets us minimize IO contention on a single-disk volume at the cost of a little extra RAM, and it can also do filesystem reordering on the fly.

SOFTWARE

Evernote’s image recognition stack is made up of in-house software for queue handling and image processing, along with a set of image recognition engines to handle various types of text. This includes both in-house engines and best-of-breed third-party technology from I.R.I.S. The in-house portion of the code is composed of AMP, or Asynchronous Media Processor, and ENRS, which is the Evernote Recognition Service. Since the software stack is already covered in some detail in Alex’s Evernote Indexing System article, I’ll merely present a brief outline:

  • ENRS, with the help of its “AIR” child processes, is the engine by which the actual image recognition occurs
  • AMP acts as the arbiter between the Evernote service cluster and ENRS, grabbing unprocessed images as they become available and feeding them to ENRS

Inter-server AMP chatter is confined to its own broadcast domain, with isolation enforced via the 802.1Q tagged VLAN I mentioned earlier. This allows reco servers to tell each other on which shards they’ve already found work, avoiding unnecessary redundancy. By preventing such overlap in the polling mechanism, we largely avoid incessant hammering of the primary Evernote service.

I hope this has provided some level of insight to one of the more unusual aspects of the Evernote service. It’s been tricky to provide a decent level of detail on this topic without writing a novella in the process. If you’ve found that you have more questions now than when you first began reading, please feel free to detail them in the comments section below.

4 Comments

Evernote Indexing System

The Evernote Indexing System is designed to extend Evernote’s search capabilities beyond text documents into media files. Its task is to go through those files and bring any textual information into the searchable domain. Currently it processes images, PDFs and digital ink documents, with provisions to extend the service to other media types. The resulting index is delivered as an XML or PDF document containing recognized words, alternative spellings, associated confidence levels and their location rectangles.

The Indexing System is implemented as a farm of dedicated Debian64 servers, each running an AMP processor and multiple ENRS processes, usually matching the number of CPU cores on the server. ENRS (EN Recognition Server) is implemented as a set of native libraries wrapped into a Java6 web server application. It currently houses two components, AIR and ANR: the first handles various image types and PDFs, and the second is dedicated to digital ink documents. AMP communicates with the servers through a simple HTTP REST API, which allows flexibility in the system configuration while maintaining the high throughput essential for passing large media files.

AMP retrieves resources from the user store shards and returns the created indexes. These are included in the search index for the EN Web Service and passed to Evernote phone/desktop clients to facilitate in-media searches locally. To minimize the extra traffic imposed on the shards already busy with user requests, AMPs broadcast queue information to each other, forming a single distributed media processor optimized for the current EN Service load and processing priorities. The Evernote Indexing System is resilient enough to stay operational even if only one component of each type remains functional (currently there are 37 AMP processors and over 500 ENRS server processes in operation, processing around 2 million media files a day).

EN Indexing System diagram

Let’s have a closer look at the AIR part of the ENRS server. AIR’s reco philosophy differs from that of the ubiquitous OCR systems: its goal is to produce a comprehensive search index rather than printable text. Its focus is on finding as many words as possible in an image of any kind and quality. It also has the flexibility to produce alternative readings for incomplete, unclear or blurred words.

To deal with real-world images, the AIR server does its processing in multiple ‘passes’, focusing on different assumptions in each of them. The image may be huge, but contain just a few words. It may contain scattered words at different orientations. Fonts may be very small and quite large in the same area. Text may alternate between black-on-white and white-on-black. It could be a mix of different languages and alphabets. For Asian languages, horizontal and vertical lines may be present in the same area. Similar-intensity font colors may blend into the same gray levels under standard OCR processing. Printed text may include handwritten comments. Ad material art may be warped, slanted, or changing size on the go. And that’s just to name a few of the problems that AIR servers currently face about two million times a day.

Inside AIR server

Below is a diagram of the main building block of the AIR server: a single ‘pass’. Depending on the call parameters, it will specialize in a different kind of processing (scale, orientation, etc.), but the general scheme stays the same. It starts with the preparation of the set of images specific to the pass (scaled, converted to gray, binarized). Then image graphics, tables, markup and other non-text artifacts need to be removed as much as possible to let the system focus on actual words. After candidate words are detected, they are assembled into proposed text lines and blocks.

Each line of each block then passes through analysis by a number of recognition engines, including ones developed internally and ones licensed from other vendors. Employing multiple reco engines is important not only because they specialize in different types of text and languages, but also because it allows us to implement a ‘voting’ mechanism: analyzing the word alternatives that diverse engines create for the same word allows better suppression of false recognitions and gives more confidence to consensus variants. Those confident answers become the pillars on which the final block of the ‘pass’ processing bases its text line reconstruction: re-deciding the structure of text lines and word segmentation, and purging most of the less-confident variants to reduce false positives in search.
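One simple way to picture the voting step is sketched below. This is an illustrative assumption rather than AIR’s actual algorithm: the vote function, the averaging rule and the min_score threshold are all made up for the example. Each engine proposes alternatives with confidences for the same word, the confidences are averaged over all engines so that consensus readings score higher, and weak variants are purged.

```python
from collections import defaultdict


def vote(engine_results, min_score=0.15):
    """Combine per-engine (word, confidence) alternatives for one word.

    engine_results holds one list of alternatives per engine. A word's
    score is its summed confidence divided by the number of engines, so
    a reading proposed by a single engine is penalized.
    """
    totals = defaultdict(float)
    votes = defaultdict(int)
    for alternatives in engine_results:
        seen = set()
        for word, confidence in alternatives:
            key = word.lower()
            if key in seen:          # count each engine once per word
                continue
            seen.add(key)
            totals[key] += confidence
            votes[key] += 1
    engines = len(engine_results)
    ranked = sorted(
        ((totals[w] / engines, votes[w], w) for w in totals),
        reverse=True,
    )
    # Purge low-scoring variants to limit false positives in search.
    return [(w, score, count) for score, count, w in ranked if score >= min_score]


result = vote([
    [("spatula", 0.9), ("spatulo", 0.4)],   # engine A
    [("spatula", 0.7)],                     # engine B
    [("spotula", 0.6), ("spatula", 0.5)],   # engine C
])
```

In this example “spatula” is backed by all three engines and ranks first, “spotula” survives as an alternative spelling, and “spatulo” falls below the threshold and is dropped.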

Diagram of a single AIR 'pass'

The number of passes is determined initially by the image rendering and analysis module, but as recognition progresses this number may be increased or reduced. For a clean scan of a document, it may be enough to run only the standard OCR processing and be done with the whole process. A snapshot of a complex scene taken by a phone camera under poor lighting conditions may require deep analysis, with a full set of passes to retrieve most of the textual data. Lots of colored words on a complex background may require additional passes specifically tailored to color separation. The presence of small blurred text will require expensive reverse-digital-filtering techniques to restore the text image before attempting any reco processing. Once all passes are complete, it is time for another critical part of the AIR processing to take the stage: final results assembly. On complex images, different passes may have produced a wild variety of interpretations of the same areas. All these conflicts need to be reconciled: the best interpretations are selected, most of the incorrect alternatives are rejected, and the final blocks and lines of text are built.

Once the internal document structure is finalized, the only step left is to create the requested output format. For PDF documents the output is still a PDF, where images are replaced with text boxes of recognized words. For all other input documents it is an XML index containing the list of recognized words and their bounding boxes or stroke lists (for digital ink documents). This location info makes it possible to highlight the searched word over the source image or the text of an ink document when a user looks for the document containing it.

9 Comments

Security Hang-Ups

Scenario:  In the last week or two, lots of people noticed sporadic errors when they tried to synchronize with Evernote or access our web site. The errors would disappear if they manually forced another sync or reload. The web site worked fine after that initial hiccup.

Debugging:  The symptoms pointed to a problem with establishing new HTTPS connections, since subsequent requests (over keep-alive connections) worked fine. We were able to reproduce the problem by just hitting our site with ‘curl’ a few times:

curl -v -i -s https://www.evernote.com/robots.txt

The majority of requests worked fine, but some percentage would fail with an “SSL protocol error”:

* About to connect() to www.evernote.com port 443 (#0)
*   Trying 204.154.94.81... connected
* Connected to www.evernote.com (204.154.94.81) port 443 (#0)
* SSLv3, TLS handshake, Client hello (1):
* Unknown SSL protocol error in connection to www.evernote.com:443
* Closing connection #0

The openssl ‘s_client’ command-line tool is also useful for low-level SSL debugging, and it showed a similar error during SSL negotiation:

$ openssl s_client -ssl3 -state -debug -msg -connect www.evernote.com:443
CONNECTED(00000003)
SSL_connect:before/connect initialization
...
>>> SSL 3.0 Handshake [length 005e], ClientHello
...
<<< SSL 3.0 Handshake [length 004a], ServerHello
...
<<< SSL 3.0 Handshake [length 0004], ServerHelloDone
...
>>> SSL 3.0 Handshake [length 0084], ClientKeyExchange
...
>>> SSL 3.0 ChangeCipherSpec [length 0001]
...
>>> SSL 3.0 Handshake [length 0028], Finished
...
SSL_connect:SSLv3 write finished A
SSL_connect:SSLv3 flush data
read from 0x100119170 [0x100811400] (5 bytes => 0 (0x0))
SSL_connect:failed in SSLv3 read finished A
5023:error:1409E0E5:SSL routines:SSL3_WRITE_BYTES:ssl handshake
failure:/SourceCache/OpenSSL098/OpenSSL098-35/src/ssl/s3_pkt.c:530:

This failure deep within the SSL handshake was particularly confusing. It seemed like our HTTPS server was dying in the middle of the SSL negotiation.

As I mentioned in our Architectural Overview, we offload our SSL processing onto a pair of A10 AX 2500 load balancers that have performed well since we installed them in January. But this error made us worried that there may be some sort of deep cryptographic error within that hardware.

So we wasted a couple of days trying to fix the problem by rebooting (and cold booting) the boxes on the theory that this would “shake loose” the cryptographic errors. This caused the problem to go away for a while, but then it would be back again the next morning during our peak traffic hour (6am-7am Pacific). Our theories about the possible causes got more and more complicated, and our proposals for fixes got more and more baroque (and expensive).

Finally, we realized that the failure rate seemed to be the highest during our peak traffic times, and that the problem might actually just be a simpler capacity issue. I.e. maybe we’d grown enough to exceed the ability of this hardware. Unfortunately, we didn’t see anything in the UI for our balancer that indicated we were at any sort of capacity limit:

The CPU usage was low, and the “SSL Conns/sec” was around 14% of the rated SSL CPS for this hardware. We contacted our support representatives from A10 and they scheduled a call with several of their experts to help track down the problem or get us new hardware if needed.

They told us immediately that we were hitting a limit, but it wasn’t the “new connections per second” limit, but rather the “total open SSL connections” limit of 250,000.

We were only receiving 1105 new SSL connections per second, and we were only processing 2500 HTTP requests per second over those connections, but we were holding a very large pool of idle connections. This was due to an “idle connection timeout” parameter of “10 minutes,” which meant we’d keep the HTTPS socket open for 600 seconds after the last response to the client before we’d close it.

In retrospect, this timeout setting was a bit excessive. Empirical testing with openssl shows that this is several times longer than the idle SSL connection timeouts used by other big Internet web services.

Now, we’ve lowered the idle connection timeout to 2 minutes. This means that we’re closing idle connections 5x faster, and the number of open connections has dropped dramatically as a result. As I type this:

[Yeah, I can kind of go overboard with Skitch. But, to be fair, I liked it long before the acquisition...]
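A back-of-the-envelope estimate shows why the timeout mattered so much. In steady state, the number of open connections is roughly the rate of new connections multiplied by how long each one stays open. The sketch below is a worst-case approximation that assumes every connection is held until the idle timeout expires and that clients never close first; the function name is ours for illustration, and the inputs are the figures from this post.

```python
def open_connections(new_per_second, lifetime_seconds):
    """Steady-state estimate: arrivals per second times seconds held open."""
    return new_per_second * lifetime_seconds


LIMIT = 250_000   # "total open SSL connections" limit on the balancer
RATE = 1105       # new SSL connections per second at peak

worst_case_10min = open_connections(RATE, 600)  # old idle timeout
worst_case_2min = open_connections(RATE, 120)   # new idle timeout
break_even_seconds = LIMIT / RATE               # lifetime at which the limit is hit

print(f"10-minute timeout, worst case: {worst_case_10min:,} open connections")
print(f"2-minute timeout, worst case:  {worst_case_2min:,} open connections")
print(f"limit reached at an average lifetime of about {break_even_seconds:.0f} s")
```

Under these assumptions the old 600-second timeout could hold up to 663,000 connections, well over the limit, while the 120-second timeout tops out at 132,600.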

The moral of the story? Sometimes, it’s easy to overlook the simple explanation for a problem when the symptoms are unusual. Our apologies to folks who were inconvenienced by the sync errors while we sorted out these problems.

3 Comments

Lucene: We Got Some Explaining to Do

[I was originally going to go with "I Love Lucene", but did a quick Google search and found that TheServerSide beat me to it...]

As we mentioned in our architectural overview post, the data from each note is spread across three different storage systems on each shard.

All of the metadata about each note goes into structured tables in MySQL. And by “metadata”, I mean all of the fields in the data model structures for a Note and its Resources, except for the Resource’s raw data body and any recognition/alternate data files.

Those Resource files are de-duplicated in software on each shard (using MD5+length) and then stored on a relatively simple hierarchical file system using a folder tree derived from the MD5 checksum.
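A minimal sketch of this kind of content-addressed storage might look like the following. The directory layout (two levels of two hex characters each) and the file naming are assumptions made for the example, not our actual on-disk format, and the function names are hypothetical.

```python
import hashlib
import os
import tempfile


def resource_key(data):
    """De-duplication key: MD5 checksum plus length, as described above."""
    return hashlib.md5(data).hexdigest(), len(data)


def resource_path(root, data):
    """Derive a folder tree from the checksum (hypothetical layout)."""
    digest, length = resource_key(data)
    return os.path.join(root, digest[:2], digest[2:4], f"{digest}-{length}")


def store(root, data):
    """Write the file once; return (path, True if newly written)."""
    path = resource_path(root, data)
    if os.path.exists(path):
        return path, False           # duplicate content: nothing to write
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as handle:
        handle.write(data)
    return path, True


root = tempfile.mkdtemp()
first_path, first_written = store(root, b"quick tomato sauce")
second_path, second_written = store(root, b"quick tomato sauce")
```

Storing the same bytes twice resolves to the same path, and the second call writes nothing.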

The combination of MySQL and the file system allows us to store the full contents of the data model and support the vast majority of our API calls. Text-based searches on our servers require some sort of Full-Text Search (FTS) engine to provide any sort of usable performance across large data sets.

This first required us to define the semantics of our search specification (see Appendix C of our API Overview). For example, if I type the word “spatula” into Evernote’s search box, should that match notes that have that word in the title? In the name of a tag on that note? In a PDF embedded in the note?

We decided that we should match words that you see if you open the note in its own window … so all of those examples should match. That means that we index a combination of: note text content, note title, tag names, PDF text (up to 1MB of text), and OCR results.

For the first few months after launch, we tried to implement text search within MySQL itself using MyISAM’s fulltext indexing. We concatenated the relevant texts together in a special table with a FULLTEXT index on the big text blob column.

Unfortunately, we found that the performance of MyISAM’s FTS engine didn’t work for our application. When users create or update notes, they expect those notes to immediately match any text searches. This means that text indexing is basically synchronous for our application … we can’t batch up index updates to run every hour. Updates to MyISAM’s FTS perform poorly in concurrent environments with lots of small changes coming in from different threads. If one user added a note and flushed the index, this would block another user from adding a note until the indexing completed.

We tried a few things to batch updates in MyISAM, but finally gave up and switched everything to Lucene.

Each Evernote user has their own Lucene search index in a separate directory on the file system. When someone modifies their notes, or performs a search against their account, we open their index and perform the updates and searches against that index.

The basic text search functionality was pretty straightforward in Lucene, but there were a number of things that we had to do to improve overall performance.

First, we denormalize all searchable and sortable data for each note into the Lucene index rather than just storing the raw search text. This allows us to just ask Lucene for the list of notes matching a set of criteria without doing an in-memory merge with separate data from MySQL.

For example, the note’s “last update time” and “notebook” are stored in each Lucene document. If you have an account with 60,000 notes, and you want to find 50 notes with “spatula” in your “Cooking” notebook sorted by last update time starting at offset 100, we can perform that full query directly against Lucene and directly receive the correct 50 note identifiers. If the notebook and update time were only in MySQL, we would potentially need to compare tens of thousands of results from MySQL and Lucene to find the desired 50.
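The shape of that query can be sketched with a toy in-memory stand-in for the index. This is plain Python rather than Lucene, and the `IndexedNote` fields, the `search` function, and the newest-first sort order are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedNote:
    note_id: str
    terms: frozenset
    notebook: str   # denormalized from the metadata store
    updated: int    # last update time, e.g. epoch seconds


def search(index, term, notebook, offset, limit):
    """Filter, sort, and page entirely inside the (toy) index."""
    hits = [n for n in index if term in n.terms and n.notebook == notebook]
    hits.sort(key=lambda n: n.updated, reverse=True)  # newest first (assumed)
    return [n.note_id for n in hits[offset:offset + limit]]
```

Because the notebook and update time live next to the search terms, a call such as `search(index, "spatula", "Cooking", 100, 50)` can return the final 50 identifiers without consulting a second data store.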

We also keep user indices open in our server until the index has been idle for too long (currently 65 seconds) rather than opening and closing the index with each request. We have a dedicated maintenance thread that closes idle indices sequentially, waiting for each to complete any merges before continuing. We found this was necessary to keep from spiking disk activity while closing multiple indices.
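A minimal Python sketch of that idea follows. The `IndexCache` class, the `open_index` factory, and the injectable clock are hypothetical names for this example, and a real server would also need locking around the shared dictionary.

```python
import time

IDLE_TIMEOUT_SECONDS = 65  # the value quoted above


class IndexCache:
    """Keeps per-user index handles open and closes idle ones one at a time."""

    def __init__(self, open_index, clock=time.monotonic):
        self._open_index = open_index  # hypothetical: user_id -> handle with .close()
        self._clock = clock
        self._entries = {}             # user_id -> (handle, last_used)

    def get(self, user_id):
        handle, _ = self._entries.get(user_id, (None, None))
        if handle is None:
            handle = self._open_index(user_id)
        self._entries[user_id] = (handle, self._clock())
        return handle

    def close_idle(self):
        """Intended to be called periodically from a single maintenance thread."""
        now = self._clock()
        idle = [user for user, (_, used) in self._entries.items()
                if now - used > IDLE_TIMEOUT_SECONDS]
        closed = []
        for user_id in idle:
            handle, _ = self._entries.pop(user_id)
            handle.close()  # returns only when finished, so closes never overlap
            closed.append(user_id)
        return closed
```

Closing the handles in a plain loop, rather than handing each one to its own thread, is what keeps the disk from seeing several final merges at once.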

Finally, we perform a fairly elaborate pipeline of transformations on both note text and search expressions in order to normalize the representation of the text for correct comparisons. We have Lucene analysis filters that operate on the sequence of tokens to:

  • Remove apostrophes and other intra-word punctuation
  • Convert upper-case letters to lower case
  • Remove English “stop words” like “the” and “and”
  • Normalize letters with diacritics so that “ñ” becomes “n”
  • Convert “narrow width” Japanese characters to “full width”
  • Reorganize Chinese/Japanese/Korean text into pseudo “words”
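To give a feel for these steps, here is a small Python sketch built on the standard `unicodedata` and `re` modules. It is not our Lucene filter chain: the stop word subset, the character ranges treated as CJK, the bigram scheme, and the order of the steps are all assumptions for the example.

```python
import re
import unicodedata

STOP_WORDS = {"the", "and", "a", "an", "of"}  # illustrative subset only
CJK = "\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af"  # rough kana, Han, Hangul ranges
TOKEN = re.compile(rf"([{CJK}]+)|([^\W{CJK}]+)")


def strip_diacritics(text):
    out = []
    for ch in text:
        decomposed = unicodedata.normalize("NFD", ch)
        if decomposed[0].isascii():
            # Latin base letter: drop the combining marks, so "ñ" becomes "n".
            out.append("".join(c for c in decomposed
                               if not unicodedata.combining(c)))
        else:
            out.append(ch)  # leave kana voicing marks and Hangul untouched
    return "".join(out)


def normalize(text):
    # NFKC folds half-width katakana into the standard full-width forms.
    text = unicodedata.normalize("NFKC", text).lower()
    # Remove apostrophes inside words: "don't" becomes "dont".
    text = re.sub(r"(?<=\w)['’](?=\w)", "", text)
    text = strip_diacritics(text)
    tokens = []
    for match in TOKEN.finditer(text):
        cjk_run, word = match.groups()
        if cjk_run:
            # Overlapping bigrams act as pseudo "words" for unsegmented text.
            tokens.extend(cjk_run[i:i + 2]
                          for i in range(max(1, len(cjk_run) - 1)))
        elif word not in STOP_WORDS:
            tokens.append(word)
    return tokens
```

The important property is that the same function is applied to both the note text and the search expression, so that both sides of the comparison end up in the same normalized form.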

Overall, we’ve been happy with the power and flexibility of Lucene. We have found, however, that it has become the most expensive software component in our shard infrastructure. On a busy shard, Lucene makes twice as many IO operations as MySQL, and those operations are less sequential. This means that it’s the top priority for future software optimizations, and also for future hardware tuning to ensure that our shards continue to scale well as we grow.


Android moves to the tablet

Tablets and Fragments on Android

We just released Evernote for Android optimized for tablets. The same APK runs on the phone and tablet; we’ve just customized the look and feel based on the device. This post will explain the process we went through to convert our app to a tablet form factor. I hope some of our experiences will be helpful to you if you find yourself moving your app from the phone to the tablet.

Requirements for Evernote App

  • Same APK has to run on both phones and tablets
  • Same APK has to support Android devices with version 1.6 and above
  • Don’t lose any of the current phone features
  • Use screen real estate wisely
  • Visually appealing
  • Feel Responsive

Design

Like the Android phone design, the new Evernote for Android tablets is a design created by our Chief Designer Gabe Campodonico.  He spent a lot of time designing the app to be both visually appealing and usable on the tablet’s larger screen.  You’ll notice that the look and feel is quite a departure from the phone version and much of the theme should eventually make its way back to the phone version.

Fragments Static Library

The first step was to download the jar file and source code for the Fragments static library from Google.  We have to use the static library to be able to support pre-Honeycomb devices.  The source code was very valuable because I was able to fix some bugs in the first version of the library that otherwise would have caused problems and would be difficult to work around.   Activity results were broken so the results never made it back to the calling Fragment.  (Thank you Google for releasing the source)

Converting current Activities to Fragments and FragmentActivities

The next step was to take a couple of activities that I knew would end up being displayed in the dual pane format and convert them to fragments. This didn’t involve the tablet; I just changed the current phone activities to use a shell activity and moved all the functionality into fragments. Doing so allowed me to become familiar with the basics of fragments and start seeing some issues that needed to be addressed.

  • If I were to use full screen dialogs in the same way our app currently used them, I would have to create unique IDs for all of the dialogs and have a way to find out which fragment a dialog belonged to.  This is because the Fragment would ask to create the dialog, which would bubble up to the parent Activity. When the parent Activity’s onCreateDialog was called and there was more than one Fragment in the Activity it had to know which Fragment’s onCreateDialog to call to actually construct the dialog.  There were similar routing issues to options menu items where item selection needed to be routed to the correct Fragment for handling.
  • I needed to separate our UI initialization from our Intent handling in these new fragments.  So much of our intent handling was in the onCreate methods of our Activities, mixed in with the UI initialization, and this just wouldn’t work with fragments where the intent isn’t even initially available.  I changed the Fragments so that they could handle the cases where the intent is available during initialization, and also where it wasn’t available until later.

Now that I had our notebook list Activity and our note list Activity converted to Fragments, I combined them into the beginning of the main tablet Activity.  This went pretty smoothly and it was nice to see something that looked more like a tablet app when I launched it on the tablet.  The tablet activity itself didn’t do much more than house the fragments and in the end would handle swapping them in and out as different note browsing tabs were chosen.

List fragments such as Notebooks, Shared Notebooks and Tags need to handle static selection, allowing an item selected on the left to remain highlighted when a user switches between tabs. To do this, I also needed to persist the highlight information so that the highlight can be restored when the app gets cleaned up or when the user switches between main tabs.

In the end because of the Fragments static library at least 95% of the code is shared between the phone and tablet versions, and the main differences are handled in the resource folder.

Unique UI Widgets

With larger screens we wanted to make better use of the screen, so we designed horizontally scrolling snippets. For example, the note list that is included in the Note View and the Maps View both contain horizontally scrolling snippet views. To accomplish this, I created a GridView that can be configured either through xml or dynamically to scroll different directions based on screen orientation.

Transparent Pop-ups

We also incorporated transparent overlays and bubbles to give a pop-over visual effect such as the search pop-up.  This was accomplished by theming activities instead of having the pop-up layouts stacked on top of the current activity.  This kept the logic almost identical between the phone and the tablet, despite having a  full screen activity on the phone.

Action Bar

You’ll probably notice on the tablet app that our left bar and top bar are different from the now-standard tablet ActionBar.  This is because, in our design, we really wanted to optimize it for the landscape orientation. We didn’t want to have two title bars at the top.  Our design allows us to have buttons on the right and left that are clearly related to those panes and their context.  Also, on any screen that had more than one UI pane, we moved away from using the options menu at the bottom because we felt it was confusing.  What we didn’t realize (because of using API 10 as our target) was that the options menu was removed completely in API 11 unless you are using Android’s ActionBar.  Be aware of this, so even if you think you have a better design than the standard ActionBar, you should find a way to make the ActionBar work, unless you are okay getting rid of the options menu, or building your own handling of it.  Now we’ll be forced to change those navigation elements to a design that incorporates Android’s ActionBar.

Performance

After getting the main portions of the app working and even a lot of the UI polish done, it was obvious that performance had become a huge issue on the 10 inch tablets. Having a screen full of thumbnails in the new snippet note list required loading a lot of different image files off the sd card, and this caused everything to be very slow. I dug into the issue and came to the conclusion that the only way to load the images quickly was to store the images together in large memory mapped files and to copy the pixels directly from a ByteBuffer to the bitmap object. To do this, I started with the Krati datastore project shared by LinkedIn engineers and then made a lot of changes to make it fit what we needed. The result was very fast loading of the thumbnails. This is also used on the phone version. Try flinging that note list of snippets on the phone and watch the images fly by. This is done with no preloading whatsoever, but does use a very small cache once they are loaded.
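The general technique, many small pixel buffers packed into one memory mapped file, can be sketched in a few lines of Python. This is only an analogy for the Android code, not a port of it or of Krati: the `ThumbnailPack` class and its in-memory offset index are assumptions for the example.

```python
import mmap


class ThumbnailPack:
    """Many small pixel buffers stored back to back in one memory-mapped file."""

    def __init__(self, path, blobs):
        self._index = {}  # key -> (offset, length)
        with open(path, "wb") as f:
            for key, data in blobs.items():
                self._index[key] = (f.tell(), len(data))
                f.write(data)
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def pixels(self, key):
        offset, length = self._index[key]
        # Slicing the map copies just this thumbnail's bytes out of the mapping.
        return self._map[offset:offset + length]

    def close(self):
        self._map.close()
        self._file.close()
```

Reading a thumbnail then costs one dictionary lookup and one copy from already-mapped memory, instead of opening, reading, and decoding a separate file per image.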

We hope you enjoy the new Evernote for Android Tablets because we certainly enjoyed making it.  Let us know what you think in the comments, if you are a lucky Android tablet owner.

 

 


Elephants love mangoes – a look behind our Windows Phone 7 client

Introduction

A year ago I was working in the IT department of a large commodities trading company, managing a team of 11 people. I’d been developing software since my early teens, but I’d wanted to try my hand at managing. I had sold my two previous start-ups rather than growing them into companies, and management experience was something I felt I needed to change that.

Although I enjoyed managing, and didn’t totally suck, I still felt I needed to code to maintain my sanity.

So, in my spare time, I coded up an Evernote client for Windows Mobile 6.5.  My client did a few things that were missing from the official client, such as storing offline notes.  With absolutely perfect timing, I released it just as Microsoft unveiled their mobile strategy reboot, in the form of Windows Phone 7 (WP7).

Just as I was starting to realize that full-time management was not for me, and I began to explore doing another start-up, I got a call from Phil Libin, Evernote’s CEO.  Would I like to join Evernote to create their WP7 Evernote client?  Perfect timing – of course I would.  Last week we released the first version of that client.

This is the story of what worked, what didn’t work, what makes WP7 a great platform to develop for, and what some of the challenges were.  I’ll also explore the opportunities being opened by the next release of WP7, codenamed Mango.

A great development platform

Mature roots

In choosing to base WP7 on .NET and Silverlight, Microsoft immediately made the platform attractive to the massive existing community of .NET developers.

Although I had little experience with Silverlight, I did have many years of C# development under my belt, and it felt great to be able to re-use those skills.  Once you’ve experienced developing using technologies such as Language Integrated Query (LINQ), lambdas, and garbage collection it’s painful to have to give them up.

Silverlight has been out for several years, and although it has not seen massive uptake on the internet, it remains popular for intranet development within corporate environments.

Silverlight and the Model View ViewModel (MVVM) design pattern go hand-in-hand.  You declare what your UI looks like using an XML language called XAML, which is bound declaratively to your ViewModel code.  This means that you can test your ViewModel code stand-alone, without needing to involve the UI.  Silverlight also has a fantastic animation and visual state management system.

Superb tooling

As WP7 developers, we are spoiled.  Not only do we have a powerful, mature and rich IDE in Visual Studio, with great plugins such as Resharper, but we also get a fast emulator, and also a complete, standalone design tool in Expression Blend.

The emulator actually runs as a proper Virtual Machine, using the hardware virtualization features of your computer.  The upside to this is that you get snappy performance, but the downside is that you cannot develop inside a virtual machine, since the emulator will not run inside a virtual machine.

Blend lets you design, animate and tweak your app’s UI.  Although it was primarily created as a “high-end” tool for designers, there was such an outcry from developers, that Microsoft have made it available for free for WP7 developers.  It is a joy to work with, letting you focus on the look and feel of your UI, building storyboards and animations, without worrying about code.

Community

Great community

A strong developer community is far from unique to WP7. However, perhaps because Windows Phone is something of an underdog right now, the community feels particularly tight.

Beyond the usual suspects such as Stack Overflow, and Microsoft’s own forum, there are sites such as Windows Phone Geek that provide a constant stream of high quality tips and tricks and articles.

Great components

Because it is such a new platform, it seems as though everyone and their dog is putting out new component frameworks, although it looks as though people are consolidating around WP7Contrib.

In the Evernote WP7 client I make use of:

Some challenges

Lack of Database

You know how everyone complains about the Google interview process, because they ask you about the implementation details of various hashing and sorting algorithms that you’ll never have to use in the real world?

Welcome to the world where knowing about those things counts.  The initial release of WP7 has no database technology exposed to developers.  What it does have, is isolated storage, and the ability to seek through files.

The ability to store notebooks offline, coupled with the possibility of having many thousands of notes, made the lack of a database a challenge. A note belongs to a single notebook, and can have zero or more tags associated with it. You can view your notes by notebook and by tag, and the note list can be sorted in varying ways, including by title, timestamps, and geolocation. A database would have been nice.

A number of people addressed this gap, and provided database technologies of varying levels of completeness and complexity.  I spent many weeks in late 2010/early 2011 trying them out, never finding one that really matched my requirements.  In the end I decided I’d had enough of spinning my wheels, and went with my own, simple mechanism.

Platform bugs

Normally it’s simple arrogance to think you’ve found a platform bug – chances are normally very high that it’s actually your code that is wrong.

But in the first release of any new platform such as Windows Phone 7, sometimes it is the platform!

Can’t set the focus on any of the controls on your page after setting the focus on a web browser control?   How about getting a crash when trying to restore your position to the third pivot item in a pivot?  Or unexpected error messages in your logs?  These are all things that have tripped me up, and turned out to be platform issues – totally understandable given the youth of the platform – just be a little more open to the idea that it isn’t your code that is causing the problem.

There are also features that you’d expect to be there, but simply aren’t – for example, although you can stream data down from a web site, you can’t stream up, meaning that if you want to send data to a web site it must all be in memory when you send it. This poses a challenge if you are uploading extremely large files.

Finally, the platform itself seems to require things to run on the UI thread in surprising situations, such as some IsolatedStorage access, and networking (even if initiated off of the UI thread, and of course not using the WebClient class), which can have significant performance implications.

Smooth Scrolling

Given the emphasis on providing a snappy, responsive interface, it is surprising how hard it is to implement a really smooth scrolling experience when you have many items in a list, with images, and with some content potentially being dynamically downloaded.  Oh yes, did I mention there may be many thousands of items in the list ….

I started off using the built-in ListBox. This provides UI virtualization through its use of a VirtualizingStackPanel, meaning that only the UI for the visible elements is instantiated. It also provides data virtualization: if the list to which the ListBox is bound implements IList<T> then it only asks the list for the elements it needs right now, meaning that you can dynamically fetch elements from your data source as required.
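The data virtualization half of that is easy to picture with a list-like object that only loads a page of items when one of them is first requested. The sketch below is Python rather than C#, and the `PagedNoteList` class and its `fetch_page` callback are hypothetical names, but a class implementing IList<T> could take the same approach.

```python
from collections.abc import Sequence


class PagedNoteList(Sequence):
    """A list-like view that loads items one page at a time, on first access."""

    def __init__(self, total, fetch_page, page_size=20):
        self._total = total
        self._fetch_page = fetch_page  # hypothetical: (start, count) -> list of items
        self._page_size = page_size
        self._pages = {}               # page number -> loaded items

    def __len__(self):
        return self._total

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(self._total))]
        if index < 0:
            index += self._total
        if not 0 <= index < self._total:
            raise IndexError(index)
        page = index // self._page_size
        if page not in self._pages:
            start = page * self._page_size
            count = min(self._page_size, self._total - start)
            self._pages[page] = self._fetch_page(start, count)
        return self._pages[page][index % self._page_size]
```

A consumer that only asks for the items currently on screen triggers one fetch per visible page, no matter how many thousands of items the list reports in total.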

Despite these benefits, I was drawn to using the LongListSelector from the WP7 Toolkit.  It has some cool features such as grouping, and hooks to tell you when your list data items are in use, and when they are no longer in use.   So it provides UI virtualization.  What it lacks however is data virtualization – the first thing it does is walk through every single one of the items in the list to which it is bound …

There are plenty of articles and techniques describing what to do to get smooth scrolling, such as downloading data in background threads and pausing downloads when scrolling … it is still very tricky to get right if you have many items, and everything is not sitting in memory.

In the next release of the WP7 client I’ve decided to go back to using the ListBox: the advantage of that special handling of IList<T> items, where it only requests the list items that it really needs, is hard to beat.

Some challenges, overcome

Delay loading pivot items

The WP7 Evernote UI that you are initially presented with consists of a Pivot with several pivot items.  The initial pivot item that you see to begin with contains a list of all your notes.  Next are lists of your notebooks, tags and recent notes.

In order to improve the initial load experience, I delay load all Pivot Items other than the list of all notes.  The XAML content in the other pivot items is empty, but when the load event fires for those pivot items I then dynamically load a user control containing the UI elements for that pivot item, with appropriate loading animations so that it doesn’t look as though the UI has frozen.

TextBlock run binding

In the list of notes, I wanted to display the timestamp in bold, followed by a snippet of text in another colour – all part of the same flow of text.

In order to create formatted text you can define a TextBlock, with child Run elements. Each Run defines formatting for the text within that Run.

Everything about XAML is about binding.  You define the UI using this declarative markup language, and bind properties of UI elements to properties of .NET objects, for example the Text of a TextBlock item to the Title of a Note .NET object.

To cut a very long story short, you can’t data-bind TextBlock Runs, meaning that I couldn’t simply bind the text of the first Run to the timestamp property of my note object, and the second to the snippet property of that object.

I ended up writing my very own binding mechanism, which was a lot of fun, seems to be pretty popular on Stack Overflow, but isn’t really where I want to be focusing my development time.

Wave file header

Recording from the Microphone in WP7 is pretty straight-forward, but if you want to actually do something with the recorded data stream, such as save it as a file, you are out of luck.  If you simply save the raw data stream, then that’s all you’ll have in the file.

In order for playback software to make sense of that data, there needs to be an appropriate header, which probably explains why the post on my blog explaining how to add the WAV header is one of the most popular.
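For reference, the canonical header for uncompressed PCM audio is 44 bytes long. The sketch below builds one in Python with the `struct` module; the default parameters (16 kHz, mono, 16-bit) are assumptions for the example, so substitute whatever your recording actually uses.

```python
import struct


def wav_header(data_size, sample_rate=16000, channels=1, bits_per_sample=16):
    """Build the 44-byte RIFF/WAVE header for uncompressed PCM audio."""
    block_align = channels * bits_per_sample // 8   # bytes per sample frame
    byte_rate = sample_rate * block_align           # bytes per second
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size, b"WAVE",           # RIFF chunk: size excludes first 8 bytes
        b"fmt ", 16, 1, channels,                   # fmt chunk: 16 bytes long, format 1 = PCM
        sample_rate, byte_rate, block_align, bits_per_sample,
        b"data", data_size,                         # data chunk: raw samples follow
    )
```

Writing `wav_header(len(pcm)) + pcm` to a file gives playback software everything it needs to interpret the raw samples.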

Isolated Storage performance

Although  I store the metadata (title, timestamps, etc) for all the notes, notebooks, tags, etc. in a single “database” file, the actual content is stored as individual files (the note body and associated “resource” data such as attached files and embedded images and audio clips).

I also cache the note content that I transform into HTML for viewing locally. So for a note with three embedded images there would be six files in isolated storage:

  • The note’s thumbnail for viewing in the list of notes
  • The note’s content
  • The note’s content transformed into HTML
  • Each of the three image files

The performance of opening a file from isolated storage degrades the more files there are in a single folder. So instead of storing the files for all notes in a single folder, with the note’s unique ID as the file name, I store the content in a subfolder, the name of which is derived from a two-digit hash of the note’s unique ID.
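One way to derive such a bucket is sketched below in Python. The choice of MD5 as the hash, the modulo-100 step, and the `bucket/id-kind` file naming are assumptions for the example, not a description of the client’s actual scheme.

```python
import hashlib


def note_folder(note_id: str) -> str:
    """Derive a two-digit bucket name ("00" to "99") from the note's unique ID."""
    digest = hashlib.md5(note_id.encode("utf-8")).digest()
    return f"{int.from_bytes(digest[:4], 'big') % 100:02d}"


def note_file_path(note_id: str, kind: str) -> str:
    """Hypothetical naming: <bucket>/<note id>-<kind>, e.g. 'content' or 'html'."""
    return f"{note_folder(note_id)}/{note_id}-{kind}"
```

With at most 100 buckets, an account with several thousand files ends up with a few dozen files per folder rather than thousands in one.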

One other issue I faced is that when a user signs out, I’m faced with the task of deleting several thousand files from isolated storage.  I don’t want to force the user to wait while this is happening – it could literally take hours (there is no ‘delete folder’ API call that does a recursive delete – you need to walk the directory tree and delete each file individually).

My solution was to store all the data under a root folder with a unique ID (Guid), and refer to that folder in the settings dictionary.  When you sign out, I clear that reference, and when you sign in again, you get a new unique ID (and corresponding folder).  I then have a background thread that deletes all files that are not under the current user ID – it can plug away in the background removing all traces of previous logins.
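The scheme can be sketched as follows, in Python rather than C#. The `current_root` settings key and the function names are hypothetical, and `shutil.rmtree` stands in for the manual directory walk that isolated storage required.

```python
import shutil
import uuid
from pathlib import Path


def sign_in(storage: Path, settings: dict) -> Path:
    """Create a fresh root folder for this login and record it in settings."""
    root_id = uuid.uuid4().hex
    settings["current_root"] = root_id  # hypothetical settings key
    root = storage / root_id
    root.mkdir(parents=True)
    return root


def sign_out(settings: dict) -> None:
    """Returns immediately: only the reference is cleared, no files are touched."""
    settings.pop("current_root", None)


def purge_stale_roots(storage: Path, settings: dict) -> list:
    """Meant for a background thread: delete every root except the current one."""
    current = settings.get("current_root")
    removed = []
    for child in storage.iterdir():
        if child.is_dir() and child.name != current:
            shutil.rmtree(child)  # stand-in for walking the tree file by file
            removed.append(child.name)
    return removed
```

Sign-out becomes a constant-time operation, and the expensive deletion happens later without the user waiting on it.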

Looking forward to “Mango”

Wikipedia reliably informs me that mangoes are the most consumed fruit in the world, and while I’m not sure that the next release of Windows Phone, codenamed “Mango”, will propel Microsoft to quite that level of success, it is clear that this is a major release, not only from a user perspective, but also from a developer perspective.

As well as addressing most, if not all of the issues I listed above, there are a number of features that will not only make our lives easier as developers, but also help us provide features and usability far beyond what we can provide right now.

Here are some of the things I’m very much looking forward to taking advantage of:

Database

Although this should be largely transparent to the user, as a developer the presence of a natively supported database will make a massive difference to me – I will take great pleasure in replacing several large chunks of code with a couple of lines of code: less code to support, fewer potential bugs.

Silverlight 4

Many of the issues I had to work around in the current version of Windows Phone are addressed in the move to Silverlight 4, such as not being able to data bind TextBlock Runs.  As a developer the experience will be a lot less frustrating, with things that you think should be easy actually being easy.

Integrated Search

One of the most puzzling things for users of the Evernote application on Windows Phone, is that when they press the Search hardware button when running the Evernote client, they get taken to Bing search rather than Evernote’s search.  Instead they must press the dedicated search button within the Evernote UI.

This situation may get a little less confusing with Mango, in that apps can indicate that they can handle specific kinds of search (products, movies, places and events).  The most obvious integration for Evernote will be places, since notes can be geotagged.  When you are looking for information to book a return trip to that fantastic holiday destination, perhaps you will also see the list of notes you created last time you were there.

Deep linking

In order to create a new note right now using Evernote, you first need to fire up the app, then press the “New Note” button. This can be reduced to a single click using deep linking, whereby we’ll be able to create a tile that links to specific functionality (take a snapshot note), or specific content (a notebook or a note).

Tiles

We’ll also be looking to take advantage of the new live tile capabilities of Mango, perhaps to show that there is unsynced content, or the number of outstanding todo items.

Background Sync

The current release of Windows Phone has no capabilities to run apps in the background, and although the Mango release will open up some capabilities in this space, Microsoft are rightfully being frugal about how long apps can run for when running in the background. Nevertheless, when your phone is plugged in, and within Wi-Fi range, there is no real reason why you shouldn’t take full advantage of the abundant power and bandwidth to sync up your Evernote account in the background.

Conclusion

Windows Phone 7 has in many ways been a delight to program for, given the use of the .NET framework and C#, combined with the power of Silverlight.  Nevertheless, it has been challenging to create a data-centric app, with many thousands of interrelated entities, without fundamental OS features such as a database.

I am really excited about the potential of the next release of Windows Phone, codenamed Mango – it will open many more opportunities for us as developers and designers to make what we imagine, real, making it easier to bring those ideas to fruition.

Thanks

I’d like to say a special thanks to Reza Alizadeh from Microsoft, who helped us steer the product to release, and to Robby Ingebretsen who did wonders sprucing up our UI and adding that extra bit of magic.

