Evernote Tech Blog

The Care and Feeding of Elephants

Recap: Evernote at TechCrunch Disrupt EU Hackathon 2014

Each year across the globe, the Evernote Developer Relations team supports millions of designers and engineers as they build projects on the Evernote Platform: our combination of APIs, tools, and partner apps that help you remember everything.

This past weekend, we had the opportunity to meet more than 800 developers taking part in TechCrunch Disrupt Hackathon Europe at Old Billingsgate Vault in London.

What’s a Hackathon?
It’s the best place to meet future team members, learn a new programming language, test out multiple APIs, and set aside 24 hours to complete that long-neglected side project!

Hackathons are weekend events that challenge engineers and designers to build a project over the course of a single weekend, with the top projects recognized for their achievements.

Hackathons test a team’s coding ability, UX prototyping, and decision-making about features on a tight schedule. Not to mention an overconsumption of pizza, Red Bull, and NERF dart launchers.

Evernote @ Hackathons

We look for teams that are reading, creating, and editing notes using our API in interesting ways: apps that solve problems and deliver a working prototype that is well designed, well developed, useful, and, most of all, original.

Watch the TechCrunch Disrupt Hackathon Pitches:
Here are some of the teams that accepted the challenge of building tools and apps that help you do more with Evernote and our API:

Startup Tracker – Evernote Prize **WINNER**
Harness the power of CrunchBase and Evernote to keep tabs on cool, up-and-coming and competing startups.

MILKZUP – Evernote Prize **CO-WINNER**
The Intelligent Milkbot for YO! and Evernote

Vocablurry
Learn without learning! We increase the frequency with which you see certain words whilst browsing.

ExtenderNote
A cooperative way to use your Evernote notes with your friends or others

Evergreen
Revisions for Evernote

Quotesy
Collaborative study tool for sharing and commenting on book quotes

Shrug Project
Shrug: enter your ingredients; get recipes!

Locado
Your smart location-based ToDo notifier.

Travelnote
The travel notebook for Evernote

Thanks to everyone who participates in our hackathons; we love seeing new projects built with our API that solve problems. To learn how to code against our API yourself, check out dev.evernote.com.

Evernote and POODLE

Yesterday, Google researchers announced a vulnerability in version 3.0 of the SSL protocol. Google’s advanced acronym-generation algorithm dubbed this issue POODLE (for “Padding Oracle On Downgraded Legacy Encryption”).

Even though the SSL 3.0 protocol has been superseded by secure alternatives for at least a decade, most existing operating systems and Internet applications are willing to speak this old dialect for backward compatibility. Unfortunately, this willingness could be exploited by attackers to force modern web browsers and servers to communicate insecurely.

The researchers found that an attacker with control over your network connections (for example, on a public wifi network) could trick your web browser into leaking your personal “cookies.” These cookies could be used to assume your identity on secure web services like Evernote.

Web browser vendors are working to push updates that would mitigate this risk by removing SSL 3.0 support from their software, but it may take months for these changes to trickle out to the majority of Internet users. Until that time, users of any service that still offers SSL 3.0 communications will be vulnerable to attack.

Evernote has determined that the only way to ensure that our users are protected from this vulnerability is to disable SSL 3.0 support on all of our servers so that they will only communicate with secure TLS. This will prevent attackers from tricking your browser into using the insecure protocol and stealing your identity.
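The post does not show Evernote's server configuration, but the idea of "only communicate with TLS" can be sketched with Python's standard `ssl` module: a server-side context with a minimum protocol version rejects any client that can only offer SSL 3.0. The TLS 1.2 floor below is our assumption for a modern setup; the 2014 change described here only removed SSL 3.0.

```python
import ssl


def make_server_context() -> ssl.SSLContext:
    """Return a server-side context that refuses SSL 3.0 handshakes.

    Assumption: TLS 1.2 is used as the floor; the post itself only
    requires that SSL 3.0 be refused in favor of TLS.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Clients that can only offer an older protocol (e.g. SSL 3.0) fail the handshake.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # A real server would call ctx.load_cert_chain(certfile, keyfile) here
    # before wrapping its listening socket.
    return ctx
```

Because the floor is enforced during the handshake, a downgrade attempt fails outright instead of silently falling back to the weaker protocol, which is exactly the fallback POODLE relies on.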

Tomorrow morning (October 16th), we will disable SSL 3.0. The majority of Evernote users should not see anything different after the change. Unfortunately, there are two types of users who may have problems connecting to Evernote after SSL 3.0 is disabled.

First, people who access Evernote through extremely old web browsers like Internet Explorer version 7 or earlier may see security errors on www.evernote.com, as well as other sites like Twitter that have made this change. To fix this problem, install a more recent web browser.

Second, people who have installed Evernote on Windows XP may see networking errors during synchronization if they never installed Service Pack 3 and Internet Explorer 8 on their computers. These people should be able to fix the problem by installing Service Pack 3 and Internet Explorer 8 via Windows Update (or from Microsoft’s web sites).

We apologize in advance for the disruption this will cause to users of those old browsers and operating systems, but we feel that this is the best way to protect all Evernote users from attack.

Inside Evernote: Garrett Plasky

How is Operations handled here at Evernote?

Operations at Evernote is fast-paced and rewarding. We have a project-oriented workflow and superb coverage of infrastructure supporting Ops. We routinely collaborate on problems via white-boarding (on the walls, no less!) and drop-in meetings. Our team has grown significantly, yet we have managed to remain agile while maintaining excellent change-management and documentation coverage. We take a belt-and-suspenders approach to managing our infrastructure, and we are encouraged to use whichever tool is best suited to the task at hand. This leads to a very fluid team dynamic and a “get things done” mentality that makes Ops @ Evernote a great place to be.

What role do you play on the Operations team?

Garrett Plasky – Evernote Operations

Our Operations team is unsiloed by design so I have the opportunity to wear many hats on any given day. One of my commonly-donned hats is working with our core service development team to sequence and perform our weekly service releases. Outside of the normal release cycle, I also spend part of my day supporting our development team with operational guidance, debugging, and troubleshooting (a la DevOps). When I’m not working on either of those, I can usually be found wrangling Puppet code for our ever-growing service catalog.

What are the big challenges?

One of our biggest challenges will be to keep our agility as we grow rapidly without sacrificing reliability and responsiveness. This may include rethinking assumptions about our infrastructure and further automating processes where our scale makes automation a necessity. I am particularly excited about a project we recently kicked off to rework our infrastructure development lifecycle. The project, by design, is likely to touch nearly all of the major components of our service infrastructure and will be a significant win for our team and the company as a whole.

What’s the most satisfying part of your job?

Coming into work every day! In all seriousness, working at Evernote is a rewarding experience where we combine a professional Ops environment with each individual’s ability to effect change. For me, with the culture of openness and collaboration at Evernote, taking ownership of a project and seeing it from inception to completion is a highly rewarding process. The free lunches aren’t too shabby either!

What’s your background? Who’s your biggest mentor?

I’ve been fortunate to work with some very smart people in my career, and Evernote is no exception. My future as an Ops guy started when, through a CCNA course I was taking at the time, I managed to land a part-time job as a junior sysadmin in my sophomore year of high school. Under the tutelage of a few smart fellows at NDC Host, I continued to build toward a career in Ops while I finished my bachelor’s degree. Now at Evernote I have the pleasure of working with a talented team of individuals, each of whom I highly respect and from whom I learn something new every day.

What’s your favorite Evernote feature?

As an Android user, I am enamored with the Evernote widget. When it comes to getting my thoughts, a picture, or a reminder into Evernote, there simply is no better or faster way than a single tap from my lock screen. It also works very well with my own organization structure, mentioned below. Our new Android web clipper is also quickly becoming my single favorite feature in any of our clients.

How do you use Evernote?

I am constantly working to bring more of my life into my Evernote account in order to more effectively manage the daily flood of information. As a result, I am a heavy user of our web clipper, email gateway, and 3rd party Evernote integrations. My Evernote workflow uses the concept of a ‘timeline’ default notebook to hold a chronological entry of my notes; each note being a discrete thought, tagged free-form with terms I might use to search for the note in the future. Using Evernote’s powerful search grammar for things like time and place, I can find notes in my timeline even if I don’t quite recall the exact tag or contents. My wife and I also share an integrated workflow used for everything from keeping track of our heads-up board game scores (Carcassonne, Lost Shores, et al!) to trip planning and long-term goals.

Inside Evernote: Dean Rzonca

What is your role on the Evernote Business team?

As a developer, I do everything – working with the product manager and designers to refine requirements for new features, helping QA form test plans, and of course lots of coding. I do full-stack development, everything from CSS animations to database schema changes.

Evernote developer Dean Rzonca

 

Why is Evernote Business an exciting product to work on?

Evernote has been around for a while, but Business is a relatively new thing for us, so it’s exciting to be a part of something this early on. Even now, a lot of really awesome companies are using Evernote Business, so there’s a lot of impact.

 

Can you describe a few of the Evernote features you are working on now and/or you have helped build in the past?

When I started at Evernote there were only a handful of web developers, so I’ve done work in almost every part of the web service.

One of my first projects was to help get us ready to launch our service in China, which was a huge effort for almost everybody in the company. Later, I worked on Reminders in the web client, which was a lot of fun. We have really great designers here, and I enjoyed working with them to get the drag & drop interface just right.

I also did a lot of the initial work for Business, before we had a dedicated team for it. I worked with a couple of other developers on pages for administration and signup, and added support for viewing business notebooks side-by-side with personal notebooks in our web client.

What I’m working on right now, however, is top-secret.

 

What is your biggest challenge at present?

I think we have two major challenges right now.

First, our scaling model has done a great job handling millions of users who each have their own collection of notes, but there’s a lot of work to do for business users who may have access to thousands of notebooks and hundreds of thousands of notes. We need to make sure everything is fast and helps you find what you need.

Second, companies are concerned about knowing where and how information in their business can be accessed, but we also respect users’ privacy. We don’t make users keep track of two separate logins if they use Evernote both at work and on their own, because a single login is the best experience, but it has definitely been more challenging technically. At the beginning, most Business users were already using Evernote on their own, so we tailored Business to that. But as we grow, we need to think more and more about people who are getting introduced to Evernote because they work at a company that uses it.

 

What is the most satisfying part of your job?

I get to work on a product that helps millions of people. All the time, when people hear that I work at Evernote they tell me how much they love our product and what a difference it’s made for them. We’re really focused on getting features done quickly, so almost every week there’s something new in production that I worked on.

 

What is your background?

I went to RIT and I have a degree in Software Engineering, which is basically Applied CS. We focused a lot on software architecture and process, and had some really great professors that were actively working in the industry while they were teaching. While I was there, I focused on real-time and embedded systems, but decided it wasn’t really what I wanted to be doing.

I’ve been doing full-stack web development, along with some iOS on the side.

What’s your favorite Evernote feature?

I love Skitch. It’s so much easier than trying to explain things with words. Also OCR and Document Camera. Being able to haphazardly add pictures and always find them later is sort of amazing.

 

How do you use Evernote?

We do all of our work in Evernote Business, so we have notebooks for planning and design handoff that are shared across teams. I also use it to keep a reading list of clipped articles, and plan trips with shared notebooks. I’ve been getting really into home brewing, so I keep a notebook with recipes for everything I’ve brewed, along with notes about how everything turns out. If the homebrew supply store writes down a recipe for me, all I need to do is take a photo of it.

 

If you could only use 3 adjectives to describe Evernote’s culture what would they be?

Fast, challenging, fun.

 

What is the best part about working for Evernote?

I’m surrounded by really smart, motivated people.

 

Why did you choose to work at Evernote?

I’ve been an Evernote user since it first launched, so I jumped at the chance to come work here.

Using DMARC to Fight Email Spam

Since January of this year, we’ve observed spammers launching campaigns using our name. Early versions included links to pharmaceutical sites, but later versions included malicious attachments.

The spammers started by sending these emails from legitimate, non-Evernote email addresses, but with the display name “Evernote Service”, like the email below:

[Screenshot: example spam email with the display name “Evernote Service”]

It didn’t take long for the spammers to change their methods and start impersonating legitimate @evernote.com addresses in the “From” field. If you were on the spammer’s list of email targets, you started receiving emails that looked like they came from one of our email addresses. We didn’t send them, but there was no obvious way for you to know.

We want you to be confident that emails from Evernote really come from us. We have made positive steps toward ensuring this by publishing an enforcing DMARC policy. Any email sent using a @evernote.com sender address must be cryptographically signed using DKIM and originate from an IP address we publish in our SPF record.

Not all email providers support DMARC, but many large mail service providers do. When they receive an email that tries to impersonate us, they will block it before it hits your inbox.

What is DMARC (and DKIM and SPF)?

DMARC is a policy that domain owners publish in DNS to tell receiving mail servers how to handle messages that fail authentication checks for their domain. The action can be “none”, “quarantine”, or “reject”, and you can set a sampling percentage so that you can ramp up your policy gradually.
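To make the sampling percentage concrete, here is a minimal sketch of how a receiver might pick the action for a message that has already failed DMARC. It follows the RFC 7489 guidance that failing mail left out of the `pct` sample is handled one step more leniently (`reject` becomes `quarantine`, `quarantine` becomes normal handling). The function name `disposition` is hypothetical, and real receivers layer their own local policy on top.

```python
import random
from typing import Optional


def disposition(policy: str, pct: int, rng: Optional[random.Random] = None) -> str:
    """Return the action applied to a message that FAILED the DMARC check.

    policy: the published p= value ("none", "quarantine", or "reject")
    pct:    the published pct= value (0-100), share of failing mail sampled
    """
    if policy not in ("none", "quarantine", "reject"):
        raise ValueError(f"unknown DMARC policy: {policy!r}")
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    rng = rng or random.Random()
    # With pct=100 every failing message is sampled; with pct=0 none is.
    sampled = rng.randrange(100) < pct
    if sampled or policy == "none":
        return policy
    # Messages outside the sample are treated one step more leniently.
    return "quarantine" if policy == "reject" else "none"
```

With `p=quarantine; pct=10`, roughly one failing message in ten is quarantined and the rest are delivered as usual, which is what makes a gradual ramp-up safe.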

A DMARC evaluation relies on two underlying checks. DKIM uses public key cryptography to sign the email message, which allows the receiving server to verify it: the sending mail server signs the message with a private key and adds that signature as a header, and the receiving mail server retrieves the public key from a DNS record and verifies the signature. Sender Policy Framework (SPF) lets the receiving mail server verify that the email originates from an IP address listed in the SPF DNS record of the envelope sender’s domain. The message passes DMARC if at least one of these checks passes for a domain that aligns with the domain in the visible “From” address; if neither does, the DMARC check fails and the receiving server takes the action you specified in your DMARC policy.
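The pass/fail rule can be sketched in a few lines. Per RFC 7489, a message passes DMARC when DKIM or SPF passes using a domain *aligned* with the visible From domain. This sketch assumes relaxed alignment (the DMARC default) and approximates the "organizational domain" by the last two DNS labels, which is wrong for suffixes like `co.uk`; real implementations consult the Public Suffix List. All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AuthResults:
    header_from: str   # domain in the visible From: header
    dkim_pass: bool    # a DKIM signature verified
    dkim_domain: str   # d= domain of that signature
    spf_pass: bool     # sending IP authorized by SPF
    spf_domain: str    # envelope sender (MAIL FROM / return-path) domain


def org_domain(domain: str) -> str:
    """Crude organizational-domain guess: the last two labels (illustration only)."""
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def aligned(auth_domain: str, from_domain: str) -> bool:
    # Relaxed alignment: organizational domains must match.
    return org_domain(auth_domain) == org_domain(from_domain)


def dmarc_pass(r: AuthResults) -> bool:
    # At least one mechanism must pass AND be aligned with the From domain.
    dkim_ok = r.dkim_pass and aligned(r.dkim_domain, r.header_from)
    spf_ok = r.spf_pass and aligned(r.spf_domain, r.header_from)
    return dkim_ok or spf_ok
```

A spoofed message that passes SPF for the spammer's own domain still fails here, because that domain is not aligned with the forged @evernote.com From address.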

The road to a reject policy

We rely on a lot of service providers for business functions like customer service tickets, recruiting, marketing, discussion forums, and corporate email. Tracking all of these down and getting them compliant with DMARC took a significant amount of effort. In some cases, we were unable to get them compliant and had to change our approach and turn off impersonation or route email through a secondary service provider that would DKIM sign on our behalf.

This isn’t meant to be a detailed HOWTO, but the main steps you should follow are:

  1.  Set up mailboxes to receive DMARC aggregate (rua) and failure (ruf) reports
  2.  Publish a DMARC record with a policy of “none”
  3.  Test each of your service providers for DKIM signing
  4.  Verify that each of your service providers is listed in your SPF record
  5.  Review the rua reports to identify service providers you may have missed
  6.  Update your DMARC policy to “quarantine” with a low percentage
  7.  Slowly increase the percentage to 100%
  8.  Change your DMARC policy to “reject”
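The staged rollout in steps 2 and 6-8 amounts to publishing a sequence of TXT record values. A minimal sketch, assuming hypothetical report addresses at example.com and a hypothetical ramp of 5, 25, 50, and 100 percent (the post does not say which percentages Evernote used):

```python
def dmarc_record(policy: str, pct: int, rua: str, ruf: str) -> str:
    """Format a DMARC TXT record value for one rollout stage."""
    if policy not in ("none", "quarantine", "reject"):
        raise ValueError(f"unknown policy {policy!r}")
    return f"v=DMARC1; p={policy}; pct={pct}; rua=mailto:{rua}; ruf=mailto:{ruf}"


def rollout_plan(rua, ruf, quarantine_steps=(5, 25, 50, 100)):
    """Records to publish in order: monitor, ramp up quarantine, then reject."""
    plan = [dmarc_record("none", 100, rua, ruf)]                                 # step 2
    plan += [dmarc_record("quarantine", p, rua, ruf) for p in quarantine_steps]  # steps 6-7
    plan.append(dmarc_record("reject", 100, rua, ruf))                           # step 8
    return plan


for record in rollout_plan("dmarc@example.com", "dmarc-ruf@example.com"):
    print(record)
```

Between stages you would keep reviewing the aggregate reports (step 5) and only advance once no legitimate sender shows up as failing.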

The result for us is a DMARC record that looks like the following:

$ dig txt +short _dmarc.evernote.com
"v=DMARC1\; p=reject\; pct=100\; rua=mailto:dmarc@evernote.com\; ruf=mailto:dmarc-ruf@evernote.com\; fo=0:s"
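As a quick way to read that output, the sketch below splits the record into its tags. It assumes the `dig +short` formatting shown above (value wrapped in double quotes, semicolons escaped as `\;`); the function name `parse_dmarc` is hypothetical and it does not validate every tag the way a full DMARC parser would.

```python
def parse_dmarc(txt: str) -> dict:
    """Parse a DMARC TXT record, as printed by `dig +short`, into a tag dict."""
    # dig wraps the value in double quotes and escapes ';' as '\;'.
    value = txt.strip().strip('"').replace("\\;", ";")
    tags = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, val = part.partition("=")
        tags[key.strip().lower()] = val.strip()
    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC record")
    return tags


record = r'"v=DMARC1\; p=reject\; pct=100\; rua=mailto:dmarc@evernote.com\; ruf=mailto:dmarc-ruf@evernote.com\; fo=0:s"'
print(parse_dmarc(record)["p"])  # prints: reject
```

Here `p=reject` with `pct=100` means every failing message should be rejected, `rua`/`ruf` name the report mailboxes, and `fo=0:s` selects the failure-reporting options.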

Forwarding breaks deliverability

As part of this process, we learned that forwarding can break DKIM and SPF, and not all mail service providers forward messages in a way that keeps them intact for DMARC. Let’s start with DKIM and canonicalization.

We were originally signing our service emails with a DKIM canonicalization of “simple/simple”. It turns out that “simple” canonicalization tolerates almost no modification in transit, and some email services would add whitespace or line breaks that caused the signature check to fail. A mail service provider clued us into this, and we switched the canonicalization to “relaxed/relaxed”. That resolved many of the failures we were seeing due to body hash mismatches.
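To see why the switch helped, here is a sketch of the two DKIM body canonicalization algorithms from RFC 6376 (sections 3.4.3 and 3.4.4) together with the SHA-256 body hash that ends up in the `bh=` tag. Header canonicalization and the RSA signature itself are omitted, and the sample message and its in-transit change are hypothetical.

```python
import base64
import hashlib
import re

_WSP_RUN = re.compile(rb"[ \t]+")  # WSP in RFC 6376 = space or horizontal tab


def canon_body_simple(body: bytes) -> bytes:
    """'simple' body canonicalization (RFC 6376, section 3.4.3)."""
    # Ignore empty lines at the end of the body...
    while body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    # ...and make sure the body ends with exactly one CRLF.
    if not body.endswith(b"\r\n"):
        body += b"\r\n"
    return body


def canon_body_relaxed(body: bytes) -> bytes:
    """'relaxed' body canonicalization (RFC 6376, section 3.4.4)."""
    lines = body.split(b"\r\n")
    # Collapse runs of whitespace to one space and drop whitespace at line ends.
    lines = [_WSP_RUN.sub(b" ", line).rstrip(b" ") for line in lines]
    # Ignore empty lines at the end of the body.
    while lines and lines[-1] == b"":
        lines.pop()
    return b"\r\n".join(lines) + b"\r\n" if lines else b""


def body_hash(canonical: bytes) -> str:
    """Base64 SHA-256 digest, i.e. the bh= value of an rsa-sha256 signature."""
    return base64.b64encode(hashlib.sha256(canonical).digest()).decode("ascii")


sent = b"Your note is ready.\r\n"
# Hypothetical in-transit change: extra spaces plus a trailing blank line.
received = b"Your note  is ready. \r\n\r\n"
print(body_hash(canon_body_simple(sent)) == body_hash(canon_body_simple(received)))    # False
print(body_hash(canon_body_relaxed(sent)) == body_hash(canon_body_relaxed(received)))  # True
```

Relaxed canonicalization forgives whitespace changes within lines and trailing blank lines, but a line break inserted in the middle of a line or any change to the actual text still breaks the body hash.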

Forwarding also breaks SPF. Take the example of registering your Evernote account with a university (.edu) email address. You want to keep receiving email there, but forward it to a different account. If your email provider adheres strictly to RFC 5321, it won’t rewrite the “MAIL FROM” address; instead, it preserves the original return path as it forwards the message. The destination mail server sees that the return path is an @evernote.com address, but that the message arrived from the .edu server’s IP address, which isn’t in our SPF record, so the SPF check fails and the destination mail server rejects the message.
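A simplified sketch of the SPF check that forwarding trips over: the receiver asks whether the connecting IP is listed in the SPF record of the return-path domain. The record and addresses below are hypothetical values from the RFC 5737 documentation ranges, not Evernote's actual SPF record, and a real SPF evaluator (RFC 7208) also resolves `include:`, `a`, `mx`, and `redirect=` through DNS.

```python
import ipaddress


def spf_allows(sender_ip: str, spf_record: str) -> bool:
    """Return True if sender_ip matches an ip4:/ip6: mechanism in spf_record.

    Simplification: include:, a, mx, redirect= and the final "all" are
    ignored, so anything not listed literally is treated as unauthorized.
    """
    ip = ipaddress.ip_address(sender_ip)
    for term in spf_record.split()[1:]:   # skip the "v=spf1" version tag
        term = term.lstrip("+")           # "+" (pass) is the default qualifier
        if term.startswith(("ip4:", "ip6:")):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip.version == network.version and ip in network:
                return True
    return False


# Hypothetical record and addresses (RFC 5737 documentation ranges).
SPF = "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.10 -all"
print(spf_allows("192.0.2.25", SPF))   # True: direct delivery from a listed server
print(spf_allows("203.0.113.7", SPF))  # False: same message relayed by the .edu forwarder
```

Nothing about the message changed between the two calls; only the connecting IP did, which is why forwarding without rewriting the return path fails SPF.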

To address this issue, a number of email service providers have adopted the Sender Rewriting Scheme (SRS). They rewrite the return path to an address at their own domain, so the SPF check is evaluated against the forwarder’s domain and passes, resulting in better email deliverability. A significant number of services don’t support this, and forwarded emails from our service get rejected. If you are an email service provider and do not support SRS, you should strongly consider implementing it.
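For providers weighing SRS, here is a sketch of the core rewrite. It is loosely modeled on the SRS0 address shape (`SRS0=hash=timestamp=original-domain=original-local@forwarder`); the 4-character HMAC-SHA1 hash, lowercase normalization, secret, and addresses are simplifying assumptions, so this will not interoperate with any particular SRS library.

```python
import base64
import hashlib
import hmac
import time

_B32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def _timestamp(now: float) -> str:
    """Days since the Unix epoch, mod 1024, as two base32 characters."""
    days = int(now // 86400) % 1024
    return _B32[days >> 5] + _B32[days & 31]


def _hash(secret: bytes, *parts: str) -> str:
    """Short keyed hash so third parties cannot forge rewritten addresses."""
    data = "".join(parts).lower().encode("utf-8")
    digest = hmac.new(secret, data, hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")[:4]


def srs_forward(sender: str, forwarder_domain: str, secret: bytes, now=None) -> str:
    """Rewrite the envelope sender so SPF is checked against forwarder_domain."""
    local, _, domain = sender.rpartition("@")
    ts = _timestamp(time.time() if now is None else now)
    return f"SRS0={_hash(secret, ts, domain, local)}={ts}={domain}={local}@{forwarder_domain}"


def srs_reverse(address: str, secret: bytes) -> str:
    """Map a bounce sent to an SRS0 address back to the original sender.

    A production implementation would also reject stale timestamps.
    """
    srs_local = address.rpartition("@")[0]
    tag, digest, ts, domain, local = srs_local.split("=", 4)
    expected = _hash(secret, ts, domain, local)
    if tag != "SRS0" or not hmac.compare_digest(digest.encode(), expected.encode()):
        raise ValueError("invalid or forged SRS address")
    return f"{local}@{domain}"


rewritten = srs_forward("bounce@evernote.com", "university.example", b"hypothetical-secret")
print(rewritten)
print(srs_reverse(rewritten, b"hypothetical-secret"))  # bounce@evernote.com
```

The keyed hash is what stops the forwarder from becoming an open relay for bounces: only addresses it generated itself will be reversed.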

Import Your Fotopedia Notes to Evernote

by Tom Charles, App Reviews @ Evernote

 

On Sunday, August 10, all user-uploaded photos will be erased from the servers of photo encyclopedia Fotopedia as the company ceases operations. In order to save your cherished memories, we’ve built a tool to easily transfer all your files from Fotopedia to Evernote.

To import your photos, head to fotopediatoevernote.com and follow the three steps listed. In doing so, a new “Fotopedia” notebook will be created, complete with an individual note for each photo in your Fotopedia account.

It’s always sad to see a valued company shut its doors, but we hope this tool can help mitigate the effects.

Evernote Strengthens Privacy Position with New Security Capabilities

We believe your data is yours and should be protected. As part of that commitment, we’ve added two new encryption capabilities that improve the security of your data when it travels across our network and the Internet. We’ve launched inter-data center encryption, which encrypts the network links that connect our US data centers, and we now support STARTTLS for secure mail delivery to your Evernote account.

Inter-Data Center Encryption
We operate two data centers in the US and transmit data between them using a dedicated network link that isn’t connected to the Internet. Because we don’t own or operate that link, we decided to take extra steps to prevent unauthorized access to data – including note content – transmitted between data centers on this network connection. As a result, in April 2014 we enabled AES encryption for all traffic flowing between our US data centers.

Email encryption in transit (STARTTLS)
We give all Evernote users a way to create notes in their account by sending emails to a unique Evernote email address. Prior to enabling STARTTLS, emails you sent to our service were transmitted unencrypted across the Internet. With STARTTLS enabled, they are encrypted in transit if the sending service supports TLS. For example, all mail sent from gmail.com and yahoo.com accounts will now be encrypted. We also support TLS for outbound emails, which means that emails you receive from our service, such as password resets, are also encrypted in transit if your mail service provider supports TLS.
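Seen from the sending side, STARTTLS is an in-session upgrade: the client says EHLO, checks whether the server advertises STARTTLS, and if so switches the connection to TLS before sending the message. A minimal sketch with Python's standard `smtplib`, assuming a hypothetical helper name and an opportunistic fallback to plaintext when the server does not offer TLS (matching "encrypted in transit if the sending service supports TLS"); this is not Evernote's mail code.

```python
import smtplib
import ssl


def send_with_starttls(host, msg, port=25, smtp_factory=smtplib.SMTP):
    """Send an email.message.EmailMessage, upgrading to TLS when offered.

    Returns True if the message was sent over an encrypted session.
    """
    with smtp_factory(host, port) as smtp:
        smtp.ehlo()
        encrypted = smtp.has_extn("starttls")
        if encrypted:
            # Note: the default context also verifies the server certificate,
            # which is stricter than much server-to-server opportunistic TLS.
            smtp.starttls(context=ssl.create_default_context())
            smtp.ehlo()  # re-read capabilities over the encrypted channel
        smtp.send_message(msg)
    return encrypted
```

The `smtp_factory` parameter is only there so the session logic can be exercised without a real mail server.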

These new security capabilities complement our existing HTTPS and HTTP Strict Transport Security (HSTS) support to protect your data in transit from unwanted interception. We plan to continue improving our transport security posture to support our commitment to protecting your data.

Inside Evernote: Kevin Fahy

What is your role on the Web team?

I am a developer. At Evernote, this means that you’re either writing code, bouncing ideas off of fellow developers, writing ideas on the walls, discussing requirements with PMs and designers, or figuring out what you broke with QA’s help. But most of the time you are writing code with minimal distractions. The few meetings we have are always useful to us. In our weekly sprint planning meeting, for example, our product manager makes sure that you’re taking on interesting projects that you genuinely want to work on – you never feel like just another resource here.

What product(s) do you work on?

Evernote developer Kevin Fahy

On the web team you often write code that users can find in multiple products. For example, if a user wants to upgrade to premium, then some of our apps display a webview that the web team implements and styles. And of course, all of our apps communicate with our cloud API, ultimately calling backend code that we develop and maintain. You can be as full stack as you like in the web team, and you get the chance to write code that’s used in many products. Personally, I spend most of my time developing our web application, where we implement new features, optimize the caching layer, and style visual components, among other things.

Can you describe a few of the Evernote features you are working on now and/or you have helped build in the past?

The cool thing about Evernote is that you get to code a lot of new features. On the web team, we push a new release every week, which usually contains new features. I’ve had the good fortune of being part of some interesting ones. I implemented reminders in our web app in a team of three – we implemented everything in two fun weeks. While not a feature per se, I’ve been refactoring our web app’s caching layer recently – it’s always satisfying to push a big commit and then observe that the system runs a bit more efficiently, or at least no worse than before! Even my first project at Evernote turned out well: the web app’s image gallery (though a lot of credit goes to our designers – they make extremely well-designed mockups and are very easy to work with).

What is your biggest challenge at present?

Developing at Evernote is very fast-paced – you try to write good code for new features as fast as you can, while also fixing the most important bugs in your backlog and refactoring older code. We have a lot of freedom in what we choose to code every day, so prioritizing properly is very difficult.

What is the most satisfying part of your job?

Knowing that, every week, you are changing something about how a hundred million people use Evernote is exciting, even a bit scary sometimes. But the most satisfying part is that the code you push is ultimately yours – seeing your decisions and design choices live on production gives you a weekly shot of pride.

What is your background?

Evernote is my first full-time job out of school. I’ve been here for almost two years now, but even after my first few months I felt like I had learned a year’s worth of skills. Prior to Evernote, I had spent too much time in school pursuing an undergraduate computing science degree, but I had the opportunity to try out product management and development as an intern at a couple of great companies.

Who has been your biggest mentor?

There are many people at Evernote whom I look up to and have learned from, but it’s hard to label one person as a mentor. This is because of the flat structure at Evernote – as a developer, nobody stands over you and tells you what to do; instead, it is easy for you to simply turn around and ask somebody for advice. In the same way you take ownership over your projects at Evernote, you take ownership over your own learning. From interns all the way up to our CTO, I’ve learned something from everyone.

What’s your favorite Evernote feature?

OCR (Optical Character Recognition) is a killer feature for me. If I take a picture or scan a document, I can trust that it’ll always be searchable by its textual contents. This, combined with the web clipper, makes it easy for me to save everything that is important to me, no matter whether it’s physical or digital.

How do you use Evernote?

One of the coolest things about our product is that we support many different use cases. Even internally, we don’t all use Evernote the same way. Personally, I cram as much stuff as I can into my account, such as pictures of receipts, scanned documents, random incoherent notes, more coherent but private journal entries, and custom memes (created with Skitch!). I have many notebooks and tags that I apply to “important” notes, but a fair number of my notes get added to my default, unsorted “Heap” notebook. Even though I have a grand plan to someday organize all the notes in this notebook, Evernote’s search and “related notes” features make it easy to find these unorganized notes.

If you could only use 3 adjectives to describe Evernote’s culture what would they be?

Inspirational – you can listen to our CEO’s vision in weekly all-hands meetings, see beautiful mockups that our designers create, review elegant code that solves hard problems – there are many sources of inspiration at Evernote.

Fast – without good documentation, you start forgetting about how your own code works pretty quickly because you write so much of it! New features, a backlog of bugs, TODOs and FIXMEs you want to get around to… there’s always something that you want to code.

Empowering – you are always just a few keystrokes away from changing the way a hundred million people use Evernote.

What is the best part about working for Evernote?

As a developer, it feels like a startup because your team is small, you make many decisions behind the code that gets pushed to production, and you naturally take a lot of ownership over your work. But at the same time, we have a CEO with an awesome vision, and the resources to quickly make it a reality. We have fantastic PMs and designers who put new features in a really good state before we start talking about them as developers. We have a heroic ops team that makes it so that developers hardly ever worry about anything besides code. And we have a thorough QA team that will always find your bugs. The best part for me is having all of these resources at my disposal, while still feeling that I can hack away at interesting problems with a bunch of friends.

Why did you choose to work at Evernote?

I was first attracted by Evernote’s mission and business model – there’s a level of honesty about the way Evernote treats users’ data, and they are transparent about how they collect money from users. The company was also a good size for me – after experiences with larger companies in internships, I had really wanted to try working at a startup.

Stages of Denial

At 2:33pm on Tuesday afternoon (Pacific Time), an attacker began a Distributed Denial of Service (DDoS) attack against Evernote’s servers. Under normal conditions, Evernote receives around 0.4 Gbps of incoming traffic and transmits around 1.2 Gbps. We have a diverse set of pipes to the Internet that can handle several times that volume. During this attack, we experienced over 35 Gbps of incoming traffic from a network of thousands of hosts/bots. This quantity of bogus traffic exceeded the aggregate capacity of our network links, which crowded out most legitimate users.

By 3:25pm, our Operations team was able to diagnose the problem and enable a DDoS mitigation service that we had previously established with CenturyLink, one of our network providers. This required moving traffic away from our other feeds to CenturyLink via BGP and then enabling mitigation for our main IP address. Their filters were able to remove virtually all of the bogus packets and permit normal user traffic to flow again.

A few minutes later, the attackers shifted the nature of the attack to send different types of network payloads and to target other addresses that happened to share common infrastructure. This resulted in a couple hours of back-and-forth between our network engineers and CenturyLink to adapt to each attack while minimizing the impact on legitimate users (and our own incident response).

As our network links recovered, we received an unusually high volume of pent-up sync activity. This traffic was about 80% higher than we would have seen during a comparable time on another day. The extended outage had caused many of our servers to expire various cached records, so this initial stampede of client traffic led to a temporary spike in query volume on our central accounts database that was unsustainable.

This required us to perform a rolling service restart to allow individual shards time to handle their users’ pent-up synchronizations and repopulate their caches without overloading our accounts database. (This problem was not directly related to the DDoS; it was just an unanticipated side effect of having an hour-long outage followed by an immediate resumption of full service without tuning our database and Couchbase cluster for that scenario.)

The ongoing network attacks and service after-effects persisted for nearly three more hours until the last components were restored to full functionality at 6:14pm.

CenturyLink’s DDoS mitigation service was able to scrub out invalid traffic to restore access, but it took a while for us to enable and configure the solution. This process was a bit haphazard because we had not yet completed our deployment configuration and testing before the incident began. Our networking team had only recently contracted for this service, and they were carefully working through a deployment plan to ensure smooth operation during a future incident. All of the procedures and runbooks were still being drafted, so we hadn’t yet determined exactly which rules would need to be applied to block an attack while permitting all legitimate traffic.

Our final days of testing and configuration were compressed down to a few hours, so the initial DDoS mitigation heuristics were not tuned for our particular application characteristics. They succeeded in scrubbing out virtually all of the bogus traffic, but produced a moderate level of “false positives,” which blocked some legitimate users (and partner services like Livescribe) from connecting to Evernote.

Over the following day, we saw another wave of network attacks, which were fully mitigated. Our network engineers worked with CenturyLink to incrementally refine our filtering heuristics to reduce the number of legitimate users that were blocked. As of 4pm Wednesday, we felt that we had addressed virtually all of the incorrect blockages to restore service to the remainder of our customers.

Post-Mortem

Overall, our Operations crew handled their DDoS trial-by-fire extremely well, but we have work ahead to minimize the disruption to our users in future incidents.

The network engineers get to complete the DDoS procedures, configurations, runbooks, automation, etc. so that they can trigger the full set of mitigations in minutes rather than hours. The systems group has a set of improvements planned to make the service handle “recovery stampedes” after extended outages more gracefully. And our client teams have a couple of tickets to reduce those stampedes in the first place.
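The client-side changes aren’t detailed here. Purely as an illustration of the general technique, one common way to keep clients from reconnecting in lockstep after an outage is exponential backoff with “full jitter.” This sketch is hypothetical: the function name, the one-second base and the five-minute cap are placeholder choices, not Evernote client behavior.

```python
import random


def reconnect_delays(attempts, base=1.0, cap=300.0, rng=None):
    """Return 'full jitter' exponential backoff delays, in seconds.

    Attempt n waits a random time in [0, min(cap, base * 2**n)], so clients
    that all lost their connection at the same moment spread their retries
    out instead of hitting the service together when it comes back.
    """
    if rng is None:
        rng = random.Random()
    return [rng.uniform(0.0, min(cap, base * 2 ** n)) for n in range(attempts)]


if __name__ == "__main__":
    # Seeded RNG so the same delays come out on every run.
    for n, delay in enumerate(reconnect_delays(6, rng=random.Random(7))):
        print(f"attempt {n}: wait {delay:.2f}s")
```

Because each client draws its own random delay, a crowd of clients that went offline together is spread over a widening window on each attempt rather than arriving at once, and the cap keeps the worst-case wait bounded.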

Ultimately, we know that every minute of outage for the Evernote service may prevent important tasks for thousands of our users, so we will make every effort to reduce or eliminate the impact of such attacks in the future.


Securing Impala for analysts

We’ve previously described the Hadoop/Hive data warehouse we built in 2012 to store and process the HTTP access logs (450M records/day) and structured application event logs (170M events/day) that are generated by our service.

This setup is still working well for us, but we added Impala into our cluster last year to speed up ad hoc analytic queries. This led to the promised 4x reduction in query times, but access to data in the cluster was basically “all or nothing” … anyone who could make queries against the cluster would have visibility into every table, row, and column within the environment.

Our engineering team works hard to make sure that our logs don’t contain sensitive data or personally identifying information, but we always want to operate under the principle of least privilege for all access into our production systems and data. (E.g. Phil Libin has no logins to our admin/support tools and no permission to crawl our HTTP access logs.) This principle means that we had to restrict Impala query privileges to a very small handful of staff who absolutely needed to go back to the primary data sources.

Recently, we spent some time trying to figure out how we could give a slightly wider group of analysts the ability to access a subset of the data stored within Impala. A few constraints and goals:

  1. The analysts access our reporting environment through a VPN that performs strong, two-factor authentication and then restricts access to a minimal whitelist of IP:port endpoints in the environment. We’d like to enable Impala queries via the smallest possible expansion of that ACL (ideally, one new TCP port on one host).
  2. The analysts do not currently have or need any shell accounts on the Debian servers that run our Hadoop cluster, and we’d really prefer not to create Linux logins for them just to permit Impala queries.
  3. They should be able to perform Impala queries using their existing desktop SQL clients. (We use Razor due to its JDBC support.)

We flailed around for a couple of weeks trying to figure out some way to do this before stumbling across a solution using a mix of Hive/Impala views, SASL authentication to a local DB file, and user/group/role definitions via a Sentry policy file.

Hive/Impala views

Many databases rely on views to provide variable levels of access to data stored within tables. Access to a full table may be restricted, but you can create views giving access to a subset of the rows and/or columns in that table, and permit a different set of consumers to access those views.

Here’s an example table in Hive that contains a hypothetical sequence of events. Each event has an IP address, country code, client identifier, and an “action” that was performed by that client. The full ‘events’ table is in the database named ‘sensitive’, and we create two views in the ‘filtered’ database to give restricted access onto that table:

$ cat /tmp/events.csv
10.1.2.3,US,android,createNote
10.200.88.99,FR,windows,updateNote
10.1.2.3,US,android,updateNote
10.200.88.77,FR,ios,createNote
10.1.4.5,US,windows,updateTag

$ hive -S
hive> create database sensitive;
hive> create table sensitive.events (
    ip STRING, country STRING, client STRING, action STRING
  ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
hive> load data local inpath '/tmp/events.csv' overwrite into table sensitive.events;
hive> create database filtered;
hive> create view filtered.events as select country, client, action from sensitive.events;
hive> create view filtered.events_usonly as
  select * from filtered.events where country = 'US';

The ‘filtered.events’ view gives access to all rows, but removes access to the IP address column. The ‘filtered.events_usonly’ view further restricts access to only rows that have a country of ‘US’.
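To experiment with the view approach without a Hadoop cluster, the same table and views can be reproduced in SQLite from Python’s standard library. This is a stand-in, not Hive: SQLite has no per-database namespaces like Hive’s, so underscored names (sensitive_events, filtered_events, filtered_events_usonly) replace the Hive names, and it has no grant system, so it only shows what each view exposes, not who may query it.

```python
import csv
import io
import sqlite3

# The same five rows as /tmp/events.csv above.
EVENTS_CSV = """\
10.1.2.3,US,android,createNote
10.200.88.99,FR,windows,updateNote
10.1.2.3,US,android,updateNote
10.200.88.77,FR,ios,createNote
10.1.4.5,US,windows,updateTag
"""


def build_db():
    """Create an in-memory SQLite copy of the table and the two views."""
    conn = sqlite3.connect(":memory:")
    # Underscored names stand in for Hive's sensitive.* and filtered.* databases.
    conn.execute(
        "CREATE TABLE sensitive_events (ip TEXT, country TEXT, client TEXT, action TEXT)"
    )
    rows = list(csv.reader(io.StringIO(EVENTS_CSV)))
    conn.executemany("INSERT INTO sensitive_events VALUES (?, ?, ?, ?)", rows)
    # Column-level restriction: every row, but no ip column.
    conn.execute(
        "CREATE VIEW filtered_events AS "
        "SELECT country, client, action FROM sensitive_events"
    )
    # Row-level restriction stacked on top of the first view.
    conn.execute(
        "CREATE VIEW filtered_events_usonly AS "
        "SELECT * FROM filtered_events WHERE country = 'US'"
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_db()
    for view in ("filtered_events", "filtered_events_usonly"):
        cur = conn.execute(f"SELECT * FROM {view}")
        print(view, [col[0] for col in cur.description])
        for row in cur:
            print("  ", row)
```

Querying filtered_events returns all five rows without the ip column, and filtered_events_usonly returns only the three US rows, mirroring the Impala results shown later.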

These views work great, but now we need to tell Impala to restrict groups of people to only access the correct views.

SASL username/password database

Impala’s daemon officially supports two mechanisms for authentication: Kerberos and LDAP. We don’t particularly want to set up a Kerberos infrastructure, and it’s not clear how that would work for people who are just connecting to the Impala daemon over TCP from a SQL tool on their laptops.

We do have LDAP, but the current support in Impala is extremely preliminary (“only tested against Active Directory”), and we couldn’t get it to work against our TLS-only OpenLDAP infrastructure with the limited set of configuration options available today.

While flailing around with LDAP, we decided to try the undocumented --ldap_manual_config option. It turns out that if you tell Impala to perform authentication with LDAP (--enable_ldap_auth) using this “manual configuration” option, that means “don’t use LDAP at all, just try to match the client’s username+password against a BerkeleyDB file sitting at /etc/sasldb2.”

We created that file on our desired impala-server host, using the ‘saslpasswd2’ command to enter each username and password. As an example, the following shows three different accounts being created in the sasldb2 file:

# saslpasswd2 sysadmin1
Password:
Again (for verification):
# saslpasswd2 analyst1
...
# saslpasswd2 analyst2
...

# ls -al /etc/sasldb2
-rw-r----- 1 root sasl 12288 Jun  4 17:26 /etc/sasldb2
# usermod -a -G sasl impala

(These usernames do not correspond with any shell accounts in /etc/passwd … they are a standalone authentication database.)
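Because this setup leans on the file’s ownership and mode bits, a small audit check can catch drift from the -rw-r----- root:sasl layout shown above. This is a hypothetical helper (check_sasldb_perms is not part of any SASL tooling); it assumes a local ‘sasl’ group exists and only inspects ownership and permissions, never the file’s contents.

```python
import grp
import os
import stat


def check_sasldb_perms(path="/etc/sasldb2", expected_uid=0, expected_gid=None):
    """Compare a SASL DB file against the root:sasl, mode 0640 layout above.

    Returns a list of problems; an empty list means the file matches.
    """
    if expected_gid is None:
        # Assumes a local 'sasl' group exists, as on the Debian host above.
        expected_gid = grp.getgrnam("sasl").gr_gid
    st = os.stat(path)
    problems = []
    if st.st_uid != expected_uid:
        problems.append(f"owner uid is {st.st_uid}, expected {expected_uid}")
    if st.st_gid != expected_gid:
        problems.append(f"group gid is {st.st_gid}, expected {expected_gid}")
    if st.st_mode & stat.S_IRWXO:
        problems.append("file is accessible to other users")
    if st.st_mode & (stat.S_IWGRP | stat.S_IXGRP):
        problems.append("group has more than read access")
    return problems
```

Run as root on the impala-server host, check_sasldb_perms() with no arguments returns an empty list when /etc/sasldb2 matches the listing above; the impala user still gets read access through its membership in the sasl group.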

Sentry policy file

To specify the set of permissions for various groups of users, we need to tell Impala to use a Sentry policy file in HDFS. This file contains sections for mapping users into groups and groups onto roles. Roles specify which operations can be performed against which objects in Impala. Here we show our three example SASL users mapped into groups that can either perform any Impala query, perform SELECT operations against any of our ‘filtered’ views, or only SELECT from the ‘events_usonly’ view:

$ cat /tmp/impala-policy.ini
[groups]
sysadmins = any_operation
global_analysts = select_filtered
us_analysts = select_us
[roles]
any_operation = server=testimpala->db=*->table=*->action=*
select_filtered = server=testimpala->db=filtered->table=*->action=SELECT
select_us = server=testimpala->db=filtered->table=events_usonly->action=SELECT
[users]
sysadmin1 = sysadmins
analyst1 = global_analysts
analyst2 = us_analysts

$ hdfs dfs -put /tmp/impala-policy.ini /user/hive/warehouse/
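To make the users → groups → roles → privileges chain concrete, here is a deliberately simplified Python model of how the policy file above resolves a request. It is not Sentry’s implementation: it treats a component the privilege omits as a wildcard, compares names case-insensitively, and ignores Sentry features such as URI privileges and implied privileges on parent objects. The function names are hypothetical.

```python
import configparser

# The policy file from above, inlined so the sketch runs on its own.
POLICY = """\
[groups]
sysadmins = any_operation
global_analysts = select_filtered
us_analysts = select_us
[roles]
any_operation = server=testimpala->db=*->table=*->action=*
select_filtered = server=testimpala->db=filtered->table=*->action=SELECT
select_us = server=testimpala->db=filtered->table=events_usonly->action=SELECT
[users]
sysadmin1 = sysadmins
analyst1 = global_analysts
analyst2 = us_analysts
"""


def load_policy(text):
    # Split keys from values on the first '=' only; no '%' interpolation.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep user/group/role names exactly as written
    parser.read_string(text)
    return parser


def _names(value):
    # Entries may list several comma-separated groups, roles or privileges.
    return [part.strip() for part in value.split(",") if part.strip()]


def _parse_privilege(privilege):
    # "server=a->db=b->table=c->action=d" -> {"server": "a", "db": "b", ...}
    pairs = (part.split("=", 1) for part in privilege.split("->"))
    return {key.strip().lower(): val.strip() for key, val in pairs}


def _matches(pattern, value):
    return pattern == "*" or pattern.lower() == value.lower()


def is_allowed(policy, user, server, db, table, action):
    """Simplified check: does any role reachable from `user` grant `action` on db.table?"""
    request = {"server": server, "db": db, "table": table, "action": action}
    for group in _names(policy.get("users", user, fallback="")):
        for role in _names(policy.get("groups", group, fallback="")):
            for privilege in _names(policy.get("roles", role, fallback="")):
                granted = _parse_privilege(privilege)
                # A component the privilege omits is treated as a wildcard here.
                if all(_matches(granted.get(k, "*"), v) for k, v in request.items()):
                    return True
    return False


if __name__ == "__main__":
    policy = load_policy(POLICY)
    for user, db, table in [
        ("sysadmin1", "sensitive", "events"),
        ("analyst1", "sensitive", "events"),
        ("analyst1", "filtered", "events"),
        ("analyst2", "filtered", "events"),
        ("analyst2", "filtered", "events_usonly"),
    ]:
        ok = is_allowed(policy, user, "testimpala", db, table, "SELECT")
        print(f"{user:10} SELECT {db}.{table}: {'allowed' if ok else 'denied'}")
```

The allow/deny pattern it prints matches the impala-shell sessions in the testing section: sysadmin1 can read sensitive.events, analyst1 can read filtered.events but not sensitive.events, and analyst2 is limited to filtered.events_usonly.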

Impala server arguments

Finally, we need to tell Impala’s daemon to use the SASL database for authentication and the Sentry policy file for authorization by adding the following arguments to IMPALA_SERVER_ARGS in /etc/default/impala:

-server_name=testimpala \
-authorization_policy_provider_class=org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider \
-authorization_policy_file=/user/hive/warehouse/impala-policy.ini \
--enable_ldap_auth=true \
--ldap_manual_config=true \

Then restart the Impala daemons on that host and confirm that there are no errors in /var/log/impala/*.
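That log check can be scripted. Impala’s daemons log through Google’s glog library, whose lines begin with a severity letter (I, W, E or F) followed by the date, so a sketch like this one can flag ERROR and FATAL entries. The regex also accepts an eight-digit date in case your glog build includes the year; the exact file names under /var/log/impala depend on packaging, and error_lines/scan_logs are hypothetical helpers.

```python
import glob
import re

# glog lines start with a severity letter (I, W, E, F) and a date: MMDD,
# or YYYYMMDD in glog builds that include the year.
GLOG_ERROR = re.compile(r"^[EF]\d{4}(?:\d{4})? \d{2}:\d{2}:\d{2}")


def error_lines(lines):
    """Yield the ERROR (E) and FATAL (F) lines from an iterable of glog lines."""
    for line in lines:
        if GLOG_ERROR.match(line):
            yield line.rstrip("\n")


def scan_logs(pattern="/var/log/impala/*"):
    """Map each readable log file matching `pattern` to its ERROR/FATAL lines."""
    hits = {}
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path, errors="replace") as handle:
                found = list(error_lines(handle))
        except (IsADirectoryError, PermissionError):
            continue  # skip subdirectories and files we cannot read
        if found:
            hits[path] = found
    return hits
```

An empty dict from scan_logs() after the restart means no file matched; since glog also copies errors into its INFO file, the same line may appear under more than one path.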

Testing …

Using the impala-shell command-line query tool, we can now confirm that the ‘sysadmin1’ user can query the sensitive source table:

$ impala-shell --quiet -l -u sysadmin1
LDAP password for sysadmin1:
[debian-virtualbox.rwc.etonreve.com:21000] > select * from sensitive.events;
+--------------+---------+---------+------------+
| ip           | country | client  | action     |
+--------------+---------+---------+------------+
| 10.1.2.3     | US      | android | createNote |
| 10.200.88.99 | FR      | windows | updateNote |
| 10.1.2.3     | US      | android | updateNote |
| 10.200.88.77 | FR      | ios     | createNote |
| 10.1.4.5     | US      | windows | updateTag  |
+--------------+---------+---------+------------+

The first analyst can’t query that table, but can use the ‘filtered.events’ view to see everything but the IP addresses:

$ impala-shell --quiet -l -u analyst1
LDAP password for analyst1:
[testimpala:21000] > select * from sensitive.events;
ERROR: AuthorizationException: User 'analyst1' does not have privileges to
execute 'SELECT' on: sensitive.events
[testimpala:21000] > select * from filtered.events;
+---------+---------+------------+
| country | client  | action     |
+---------+---------+------------+
| US      | android | createNote |
| FR      | windows | updateNote |
| US      | android | updateNote |
| FR      | ios     | createNote |
| US      | windows | updateTag  |
+---------+---------+------------+

And the second analyst can only see the US events:

$ impala-shell --quiet -l -u analyst2
LDAP password for analyst2:
[testimpala:21000] > select * from filtered.events;
ERROR: AuthorizationException: User 'analyst2' does not have privileges to
execute 'SELECT' on: filtered.events
[testimpala:21000] > select * from filtered.events_usonly;
+---------+---------+------------+
| country | client  | action     |
+---------+---------+------------+
| US      | android | createNote |
| US      | android | updateNote |
| US      | windows | updateTag  |
+---------+---------+------------+
[testimpala:21000] > select client, count(*) as c from filtered.events_usonly group by 1;
+---------+---+
| client  | c |
+---------+---+
| android | 2 |
| windows | 1 |
+---------+---+

We also use this setup from Razor and JasperServer, through the Impala/Hive JDBC driver with the JDBC URL jdbc:hive2://testimpala:21050/.

Futures…

The ‘sasldb2’ file is not a perfect long-term solution. There’s no UI for analysts to manage their own passwords, so a sysadmin will need to help them every time they want to change their password. The flat-file representation relies on root security on the box, so it obviously wouldn’t be appropriate in environments that are otherwise poorly secured.

We’re sure that the LDAP capabilities will improve over time, although I’d be a bit nervous about using LDAP passwords for database connectivity, since desktop tools with SQL integrations would tend to manage and store passwords insecurely.

The same applies to the Sentry policy file. Manually loading this into HDFS whenever we add a user is manageable for now, but not a long-term solution. We could reduce the churn by creating OS-level accounts in OS-level groups and leveraging those, but that’s replacing one clunky group management solution with another.

We couldn’t figure out how to get Hue to talk to Impala properly with SASL+Sentry enabled, so we currently have Hue/Impala configured to talk to the Impala daemon on one of our data nodes, which does not have this enabled. (We’re using network-level ACLs for isolation for the time being.)

But, overall, this solution will meet our needs for a year or two while we’re still dealing with access from only a small number of analysts.
