As a network engineer responsible for Evernote’s data center infrastructure, I help make sure the network is fast, secure, and always running. I also help debug intermittent and obscure network problems that affect a population of users, like synchronization failures and timeouts. For these types of issues, we usually need to take a closer look at the interaction between the client and server. Unfortunately, in distributed web service environments like Evernote, there are a lot of places in the request chain where an issue can occur. After gathering the basic information about the client and how the problem is triggered, I usually start by taking a packet capture from the client point of view to see what is happening at the TCP/IP level.
Three Laws of Data Protection
Before I go any further, I want to assure readers that Evernote takes great care protecting user data. The packet captures we work with are used only to help solve problems with the service. Most of the time, we perform this analysis in our sandbox or staging environment, which runs on a completely separate network. This allows us to run tests from our developer clients without involving traffic containing real user data. For the times we need captures in our Production environment, we work primarily with the encrypted traffic between the client and our load balancer SSL virtual IP.
Client-Side Packet Captures
Most of the time it is easy to just use the mobile client SDK and simulate traffic from a developer workstation. Sending the client through a debugging proxy, like Charles or mitmproxy, is also effective. This is usually enough if you only need to see application-level activity from the client. But there are times when abnormal behavior only shows up on a real mobile device, especially if the problem is at the network layer.
Getting a client-side packet capture from a desktop or laptop computer is fairly straightforward. You can just install a packet analyzer or run tcpdump if it’s available on the operating system. But when I first started with mobile clients, getting a reliable capture took a little more setup than usual. There are ways to install packet capturing applications by jailbreaking the device, but I went with some practical options that worked out well.
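For the desktop or laptop case, a capture restricted to one server keeps the file small and easier to read. The following is only a minimal sketch of how such a tcpdump invocation could be assembled; the helper name, interface name, file name, and IP address are placeholders chosen for illustration and do not come from our environment. The helper only builds the argument list and never runs tcpdump itself.

```python
import shlex


def build_tcpdump_command(interface, output_path, host=None, port=None):
    """Return a tcpdump argument list. Nothing is executed here."""
    command = [
        "tcpdump",
        "-i", interface,     # interface to listen on
        "-n",                # do not resolve addresses to names
        "-s", "0",           # use the full snapshot length, not truncated packets
        "-w", output_path,   # write raw packets to a file Wireshark can open
    ]
    # Build an optional capture filter such as "host 192.0.2.10 and port 443".
    filters = []
    if host is not None:
        filters.append(f"host {host}")
    if port is not None:
        filters.append(f"port {port}")
    if filters:
        command.append(" and ".join(filters))
    return command


if __name__ == "__main__":
    # Placeholder values: en0, client.pcap and 192.0.2.10 are illustrative only.
    cmd = build_tcpdump_command("en0", "client.pcap", host="192.0.2.10", port=443)
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run (usually with root privileges), and the capture file can then be opened in Wireshark.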
Enabling a Wi-Fi hotspot on a laptop running Wireshark is the simplest approach I found. There are a number of articles describing how to set this up step-by-step on both Mac OS X and Windows, so I won’t cover it in detail here. Here is my setup:
- MacBook Pro 2011, 2.2 GHz i7, 8GB RAM, SSD
- Mac OS X Lion
- Xcode, latest version, from Mac OS X App Store or from https://developer.apple.com/downloads/
- Hardware IO Tools for Xcode, which has the Network Link Conditioner, also from https://developer.apple.com/downloads/
- X11, required by Wireshark on Mac OS X
This allows the Wi-Fi clients I am debugging to connect to my laptop’s SSID. Wireshark monitors traffic on the MacBook’s wireless adapter, normally identified as en1, and client traffic is forwarded out the wired adapter. It is also convenient to use the Network Link Conditioner to simulate high-latency, limited-bandwidth, and lossy networks.
There are a few drawbacks with this approach. First, Internet Sharing on Mac OS X only supports Open and WEP security, which is not ideal if you are on a secure corporate network. Second, traffic from the client can compete with the laptop’s own network activity across the wired port. So it’s a good idea to stop or limit background applications that might try to use the network. Third, Internet Sharing introduces an additional layer of NAT, which can make tracing the end-to-end connection back to the server a bit more challenging. I often have to follow the packet flow all the way to the server to make sure nothing in between is affecting the communication.
Another approach is to set up a dedicated Wi-Fi access point hooked up to either a network tap or a mirrored switch port. A copy of all traffic through the access point’s wired port is forwarded by the network tap to a packet analyzer. Apart from the router or firewall connection to the Internet, this is as close as you can get to a mobile client on Wi-Fi without an intermediate host or proxy involved in the traffic flow. I am in the process of getting this set up in our office to use whenever we need to debug a problem on any Wi-Fi capable device. Here is a simple diagram of the planned setup.
Mobile Network Debugging
Unfortunately, capturing traffic from a device on Wi-Fi does not always catch problems that occur on mobile networks. Packet loss, spotty latency, and out-of-order TCP segments are common in many of the packet captures I observed over 3G and 4G networks. In addition, carriers are still assigning private IPv4 addresses on their networks behind carrier-grade NAT. As a result, the publicly routable IP that arrives at the server tends to change, making it even more difficult to fully trace an end-to-end TCP conversation.
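To make the terms concrete, here is a simplified sketch of how segments from one direction of a TCP flow could be labeled from their sequence numbers alone. It is an illustration under stated assumptions, not how Wireshark or our tooling does it: the function name and labels are hypothetical, the input is a list of (sequence number, payload length) pairs already extracted from a capture, and sequence number wrap-around and timing are ignored.

```python
def classify_segments(segments):
    """Label each (seq, length) pair in capture order.

    Labels (hypothetical names for this sketch):
      "in-order"       starts exactly where the highest byte seen so far ended
      "gap"            starts beyond that point, so earlier data is missing
      "retransmission" repeats a segment that was already seen
      "out-of-order"   new data that arrives late and fills an earlier gap
    """
    seen = set()
    highest_end = None  # one past the highest sequence number seen so far
    labels = []
    for seq, length in segments:
        if highest_end is None or seq == highest_end:
            label = "in-order"
        elif seq > highest_end:
            label = "gap"
        elif (seq, length) in seen:
            label = "retransmission"
        else:
            label = "out-of-order"
        seen.add((seq, length))
        end = seq + length
        if highest_end is None or end > highest_end:
            highest_end = end
        labels.append(label)
    return labels


if __name__ == "__main__":
    # Made-up sequence numbers for illustration only.
    capture = [(1000, 100), (1100, 100), (1300, 100),
               (1200, 100), (1100, 100), (1400, 100)]
    for segment, label in zip(capture, classify_segments(capture)):
        print(segment, label)
```

A real analyzer also uses timestamps and acknowledgments to tell a late segment from a retransmission, so treat these labels as a first approximation.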
iOS avoids the need to sniff cellular airwaves: it has a feature that creates a remote virtual interface and mirrors all of the device’s network traffic over a USB connection to a workstation. Here’s a useful article that shows this step-by-step. With the same MacBook Pro setup I described earlier, getting packet captures is just as easy, with Wireshark monitoring the rvi0 interface. I have not found a similar solution for Android yet, other than installing a packet capture app that writes to the device’s local storage.
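On OS X the remote virtual interface is managed with the rvictl command-line tool that ships with Xcode: rvictl -s starts the interface for a device and rvictl -x removes it. The sketch below only assembles the sequence of commands for a capture session; the helper name, the device identifier, and the output file name are placeholders for illustration, and tcpdump stands in for Wireshark as the capture tool.

```python
def rvi_capture_commands(udid, output_path, interface="rvi0"):
    """Return the commands for one capture session, in order.

    Nothing is executed here; each entry is an argument list that could be
    handed to subprocess.run on a Mac with Xcode installed.
    """
    return [
        ["rvictl", "-s", udid],                                 # create the remote virtual interface
        ["tcpdump", "-i", interface, "-n", "-w", output_path],  # capture until interrupted
        ["rvictl", "-x", udid],                                 # remove the interface afterwards
    ]


if __name__ == "__main__":
    # "DEVICE-UDID" is a placeholder; use the identifier of the attached device.
    for command in rvi_capture_commands("DEVICE-UDID", "mobile.pcap"):
        print(" ".join(command))
```

The first interface created is normally named rvi0, which matches the interface Wireshark monitors in the setup described above.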
These packet capture techniques help immensely when debugging network problems between Evernote clients and servers. The data allows us to isolate whether a bug is triggered by Evernote software, the mobile operating system, network factors over the Internet, or components in our data center. I plan on writing a follow-up that describes some of the more recent issues I have worked on where packet captures helped us resolve troublesome problems.