Evernote’s July 1st Server Problem

Posted by Phil Libin on 09 Aug 2010

Evernote experienced a series of hardware failures on one of our servers between July 1st and July 4th, which potentially affected 6,323 users worldwide. As a result of the failure, some of the notes created and edited by these users during that period were not properly recorded on the Evernote servers. We immediately contacted all affected users via email, and our support team walked them through the recovery process. We automatically upgraded all potentially affected users to Evernote Premium (or added a year of Premium for anyone who had already upgraded), both as a partial apology for the inconvenience and to make sure they had access to priority tech support if they needed help recovering their notes.

Less than one fifth of one percent of our users were in the potentially affected group, and we were able to identify 100% of them from the server logs. Because of this, we decided not to post a wider announcement. We wanted our support staff to spend their time working with the people who were actually affected, instead of fielding a flood of requests from the more than 99% of users who were not affected but had no way of determining that for themselves.

If you did not receive an email from us in early July, then you were not affected.

Because of the redundancy inherent to Evernote (copies of notes saved locally, email and browser history), the majority of the affected users were able to recover all their notes.

We want to assure you that this was a one-time issue. We have significantly improved our reporting and redundancy infrastructure to ensure that it does not happen again. We sincerely apologize for the inconvenience to our affected users. Even though most of them didn’t wind up losing any data, they still had to read through a lengthy and potentially worrying email. For those who did lose data, we hope that knowing exactly which notes were affected over the four-day period is enough information to recover or recreate the most important ones.

We received replies from several hundred of the affected users, and we are extremely grateful for their understanding and continued support. We are posting this now because of erroneous information that we’ve seen popping up on the web.

For the technically minded, here’s what happened

Every user’s data is stored on a “shard”. A shard is made up of a server together with a redundant fail-over server. If there is any problem with a server, the system automatically fails over to the second server in the shard. We currently have 37 shards; shard 22 was the one that had problems last month.

The data on each server is stored on a RAID 1 (fully mirrored) array. All data is also backed up on-site and off-site. A full copy of your notes is also stored on the Windows and Mac clients (and on the iPhone and iPad clients for Premium users who enable that option). This means that every note in Evernote is stored in at least six redundant locations: the disk on the primary server, its RAID mirror, the fail-over server on the shard and its RAID mirror, the on-site backup, and the off-site backup. Most users also have another one or two full copies on their local clients. This makes data loss in Evernote extremely rare.

The problem with shard 22 was an unusual, intermittent combination of hardware problems affecting both the primary server and the fail-over mechanism. In effect, the shard kept failing over back and forth between its two servers during that period, which caused some of the data created during that time to be overwritten. Everything created before the failure was easy to recover from backup. The chance of this particular sequence of failures happening again is extremely low, but we’ve modified the fail-over mechanism, just in case, to make sure that it is impossible to overwrite data even in the worst-case scenario.
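The post does not say how the modified fail-over mechanism rules out overwrites. One common technique for exactly this "flapping" failure mode is fencing: every promotion to active is assigned a strictly larger epoch number, and storage refuses writes stamped with an older epoch, so a server that flips back without being re-promoted cannot clobber newer data. The minimal Python sketch below assumes that technique; `FencedStore`, `FailoverCoordinator`, and the epoch scheme are hypothetical names for illustration, not Evernote's code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class StaleEpochError(Exception):
    """Raised when a server from an older fail-over epoch tries to write."""


@dataclass
class FencedStore:
    """Storage that only accepts writes from the newest fail-over epoch.

    Hypothetical sketch: the epoch acts as a "fencing token" handed out each
    time a server is promoted to active. It is not Evernote's actual design.
    """

    highest_epoch: int = 0
    notes: dict[str, str] = field(default_factory=dict)

    def write(self, epoch: int, note_id: str, body: str) -> None:
        # A server promoted before the current one is stale: refuse its write
        # instead of letting it overwrite newer data.
        if epoch < self.highest_epoch:
            raise StaleEpochError(
                f"write from epoch {epoch} rejected; current epoch is {self.highest_epoch}"
            )
        self.highest_epoch = epoch
        self.notes[note_id] = body


class FailoverCoordinator:
    """Hands out a strictly increasing epoch on every promotion."""

    def __init__(self) -> None:
        self._epoch = 0
        self.active: str | None = None

    def promote(self, server: str) -> int:
        self._epoch += 1
        self.active = server
        return self._epoch


if __name__ == "__main__":
    store = FencedStore()
    coordinator = FailoverCoordinator()

    epoch_a = coordinator.promote("server-a")  # server A starts as active
    store.write(epoch_a, "note-1", "first draft")

    epoch_b = coordinator.promote("server-b")  # fail-over to server B
    store.write(epoch_b, "note-1", "edited on B")

    # Server A flaps back without being re-promoted and replays an old write.
    try:
        store.write(epoch_a, "note-1", "first draft")
    except StaleEpochError as err:
        print("blocked:", err)

    print(store.notes["note-1"])
```

The design choice here is that the check lives in storage, not in the servers: a confused server cannot be trusted to know it has been demoted, but the store can always compare numbers.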

24 Comments

  • Kin Lane

    I wasn’t one of the users affected by the issue. Reading your open response to the issue as well as the deeper technical dive makes me feel very confident in your approach.

    Thanks for such great service.

  • Alain

    Great transparency. Thanks for sharing.

  • Mikko

    Thanks for explaining so well what happened. The problem was still that your website was down and I was not able to access it when I needed it. And there was no information at all from your side: no backup website, no tweet, nothing. This is just unacceptable, especially for Premium customers. Technical problems happen, that’s understandable, but not notifying clients immediately when problems occur is something I don’t want to see again. Otherwise, keep up the good work :)

    • Andrew Sinkov

      Mikko, we now have a status site, which sends an automatic tweet: http://status.evernote.com

      • Mikko

        Ahh, how I missed that twitter link… Was that from the beginning…?
        But thanks for that, now it looks exactly like what it needs to be :) Great work, and thanks for the great service. I still find more and more uses every day.

  • Vance

    I think it was the right decision to keep it under wraps to avoid a panic, even though you knew you would incur the wrath of those affected until you could identify and contact them. When bad things happen, it is about limiting the damage, not eliminating it. The free upgrade (or extension) should help mollify them.

  • Alexandru

    I subscribe 100% to what Kin Lane said in the first comment! Great job guys and keep up the good work!

  • Marc

    I agree Evernote data is stored in multiple locations. But this is no insurance against mistakes, like one of the servers missing one or more notes, or a user throwing away some of his/her notes. If a note gets lost somewhere, the ‘deletion’ of this note will propagate through all locations and you will still lose that note. It’s better to make an offline copy once in a while.

    Is this correct?

    • Dave Engberg

      Marc –

      We keep multiple levels of backups to allow us to recover from a server-related data loss. We also maintain historic versions of existing notes in case you ever make an accidental change to a note:
      http://blog.evernote.com/2010/04/14/new-premium-features-note-history-and-50mb-notes/

      However, if you choose to put a note in the Trash, then empty the Trash (and say “yes” to the scary confirmation dialog), then we permanently remove the contents of that note from our servers. This is an intentional decision for user privacy. If someone accidentally put something sensitive into their account and then went through the multiple steps to completely delete it, we felt that it would be inappropriate to keep a copy of it on our servers. That policy is the reason that we make it relatively difficult to go through the whole process of completely deleting notes (including a dialog box that explicitly warns you what is happening).
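The distinction drawn here, between note history (which protects against accidental edits) and an emptied Trash (which is deliberately permanent), can be modeled in a few lines. This is a hypothetical in-memory Python sketch; the `NoteStore` class and its methods are invented for illustration and are not Evernote's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Note:
    versions: list[str] = field(default_factory=list)
    in_trash: bool = False


class NoteStore:
    """Hypothetical model of note history and the Trash; not Evernote's API."""

    def __init__(self) -> None:
        self._notes: dict[str, Note] = {}

    def save(self, note_id: str, body: str) -> None:
        # Every edit appends a new version; earlier versions stay recoverable.
        self._notes.setdefault(note_id, Note()).versions.append(body)

    def history(self, note_id: str) -> list[str]:
        return list(self._notes[note_id].versions)

    def restore(self, note_id: str, index: int) -> None:
        # Restoring an old version is itself a new edit, so history only grows.
        note = self._notes[note_id]
        note.versions.append(note.versions[index])

    def trash(self, note_id: str) -> None:
        # Moving a note to the Trash is reversible: nothing is deleted yet.
        self._notes[note_id].in_trash = True

    def empty_trash(self) -> None:
        # Emptying the Trash permanently removes trashed notes, every version included.
        doomed = [nid for nid, note in self._notes.items() if note.in_trash]
        for nid in doomed:
            del self._notes[nid]

    def exists(self, note_id: str) -> bool:
        return note_id in self._notes
```

For example, saving "v1", then an accidental "oops", then calling `restore("n1", 0)` leaves the history as `["v1", "oops", "v1"]`; only `trash` followed by `empty_trash` makes the note unrecoverable.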

      Your own desktop client for Mac or Windows has a full copy of your notes, so you could always use your own backup solution to maintain archival copies of your database as well.
      On the Mac, your database is stored within your home directory under:
      Library / Application Support / Evernote
      On Windows, you can find your database location via: Tools > Options > General

      Thanks
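A do-it-yourself archival copy of the local database can be as simple as copying the whole folder into a new, timestamped directory each time. A minimal Python sketch, assuming the Mac location given in the reply above (on Windows, pass the folder shown under Tools > Options > General instead); `snapshot`, `MAC_EVERNOTE_DB`, and the `evernote-<timestamp>` naming are hypothetical choices:

```python
import shutil
from datetime import datetime
from pathlib import Path

# Mac location from the reply above. On Windows, substitute the folder
# shown under Tools > Options > General.
MAC_EVERNOTE_DB = Path.home() / "Library" / "Application Support" / "Evernote"


def snapshot(source: Path, backup_root: Path) -> Path:
    """Copy the whole `source` folder into a new, timestamped folder.

    Each run creates a fresh copy, so an earlier snapshot is never
    overwritten by a later (possibly bad) one.
    """
    # Microsecond timestamps keep folder names unique and sort chronologically.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    target = backup_root / f"evernote-{stamp}"
    backup_root.mkdir(parents=True, exist_ok=True)
    # copytree raises FileExistsError rather than overwrite an existing target.
    shutil.copytree(source, target)
    return target
```

For example, `snapshot(MAC_EVERNOTE_DB, Path.home() / "EvernoteBackups")`. It is safest to run this with the Evernote client closed, so the database is not being written while it is copied.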

      • Marc

        Dave, thank you for the explanation. Forgot to mention that I am a happy Evernote Premium user, and I recommend the product!
        http://mchangsp.blogspot.com/2010/01/my-second-brain.html

        Thanks!

  • maht

    Sorry but having a system where data is overwritten is bad design. RAID is *not* backup. You need schooling in WRITE-ONCE media.

    Allow me to introduce you to Venti

    http://swtch.com/plan9port/man/man7/venti.html

    Nothing is ever deleted, your RAID is a cache. $20k will get you a 20Tb tape juke box.

    Still, never mind, they are only users.
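The write-once idea behind Venti is that each block is addressed by a hash of its contents (Venti calls this the block's "score") and is never modified in place. The toy Python class below illustrates only that principle; it is an in-memory stand-in, not Venti's on-disk format or API, and `WriteOnceStore` is a hypothetical name.

```python
from __future__ import annotations

import hashlib


class WriteOnceStore:
    """Toy in-memory block store in the spirit of Venti (not Venti itself).

    Each block is addressed by the SHA-1 hash of its contents. There is no
    update or delete operation, so saving a new version can never
    overwrite an old one.
    """

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        score = hashlib.sha1(data).hexdigest()
        # Identical contents hash to the same score, so duplicates are stored once.
        self._blocks.setdefault(score, data)
        return score

    def get(self, score: str) -> bytes:
        data = self._blocks[score]
        # Re-hashing on read detects a block that was corrupted in storage.
        if hashlib.sha1(data).hexdigest() != score:
            raise ValueError(f"block {score} is corrupted")
        return data

    def __len__(self) -> int:
        return len(self._blocks)
```

Because a new version of a note produces a new score, the old version remains retrievable for as long as its score is kept; in this model the storage layer acts as the cache the comment describes, and durability comes from never reusing an address.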

    • Dave Engberg

      Yes, RAID on a single box is just our first level of redundancy. Block level replication to a second RAID on another box is our next level of redundancy. Our third level of redundancy is a nightly snapshot (on the secondary box) with an incremental file system backup onto a separate local drive. The fourth level is a nightly network backup to separate backup media. The fifth level is a weekly offsite rotation of backup media. The sixth level is dedicated “cold storage” for file system volumes that are “full” and no longer receiving new data. (We actually use high-capacity hard drives instead of tapes, for faster recovery times.)

      In this incident, the first four levels of redundancy failed due to a combination of hardware, software, and operational errors. One of the many changes we’ve made as a result of this incident is to switch from keeping a single nightly backup to keeping nightly backups for the last 7 nights, so a single night’s corrupted backup can no longer overwrite the previous night’s backup.

      Thanks
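The retention change described above (seven nightly backups instead of one) can be sketched as a bounded queue: a new night's backup evicts only the oldest one, and a restore walks back to the newest snapshot that passes a validity check. This is a hypothetical Python illustration; `NightlyBackups` and the caller-supplied `is_valid` check are assumptions, not Evernote's tooling.

```python
from __future__ import annotations

from collections import deque
from datetime import date, timedelta
from typing import Callable


class NightlyBackups:
    """Keep the most recent `nights` backups instead of just one (hypothetical sketch)."""

    def __init__(self, nights: int = 7) -> None:
        # deque(maxlen=n) drops the oldest entry once n entries are stored.
        self._backups: deque[tuple[date, bytes]] = deque(maxlen=nights)

    def run(self, day: date, snapshot: bytes) -> None:
        self._backups.append((day, snapshot))

    def days(self) -> list[date]:
        return [day for day, _ in self._backups]

    def latest_good(self, is_valid: Callable[[bytes], bool]) -> tuple[date, bytes] | None:
        # Walk from newest to oldest, skipping snapshots that fail validation.
        for day, snapshot in reversed(self._backups):
            if is_valid(snapshot):
                return day, snapshot
        return None


if __name__ == "__main__":
    backups = NightlyBackups(nights=7)
    start = date(2010, 7, 1)
    for i in range(9):
        backups.run(start + timedelta(days=i), f"good snapshot {i}".encode())
    backups.run(start + timedelta(days=9), b"CORRUPT")  # one bad night
    print(backups.days())
    print(backups.latest_good(lambda snap: snap != b"CORRUPT"))
```

With a single retained backup, the corrupted night would have replaced the only good copy; with seven, the previous six nights are still available to restore from.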

      • EverydayIpad

        Seems reasonable enough to me.

      • Messyta

        LOL, Andrew…. Thanks for the heads up on the link memeganant…I’m testing out your latest creation as I type….

  • Healy

    Seems like you took care of the issue quickly and in a way that shows you really cared about those people impacted.

    I will continue to wear my Evernote t-shirt today with pride… :)

  • Shannon Wagner

    Evernote continues to rock – if I could buy y’all an ice-cream cone, I would!

    :-)

    By the way, the only other companies who are on my “ice-cream list” are Dropbox and Google.

  • Ragnar

    reading this:

    “[…] We automatically upgraded all potentially affected users to Evernote Premium (or added a year of Premium to anyone who had already upgraded) because we wanted to make sure that they had access to priority tech support if they needed help recovering their notes and as a partial apology for the inconvenience.[…]”

    made me fall in love with Evernote all over again.
    I wasn’t affected by the outage and am glad I didn’t get a worrying email, so good job, too, on keeping it covered ;)

    I also agree with my fellow commenters that your level of transparency is great and it makes me feel even more confident in the product.

  • Anne H

    Nice write up even for those of us who are technically challenged. Also glad to see the addition of a status page.

    And Shannon if you do send them ice cream let me know and I’ll send some Smucker’s Cover Up! ;-)

  • evermullah

    Good work. Everything is said in the first response by Kin Lane.
    Keep it that way ;-)

    shit happens…and *then* you step in ;)

  • modifiyeli arabalar

    thank you dude

  • erickleslie

    I can see all my pictures, but I can’t see them individually any longer. I named and tagged them. I synced them etc. Can anybody help?

  • Servers

    doesn’t sound like evernote is having fun. maybe they should be looking into some more reliable servers. downtime is not good.

  • Charity Cambodia

    I wonder if you’re still having issues nowadays, or whether all these issues have been patched and sorted now.

  • Jason Alexander

    I hate idiotic SEO spam companies. Can a mod please clean the crud posts above, delete this post and lock the thread?

    PS. Ironically I found this when searching for your deletion policy. Thanks for a fabulous product and the tech transparency – You are a credit to good business.