
The Herding and Migrating of Sample Code

Posted by Brett Kelly on 09 Apr 2013

Here at Evernote, we’re big fans of Github (and Git in general, really). Github hosts all of our SDKs and makes it easy for our developer community to help us produce the best software we can. Plus, who doesn’t love Octocat? Nobody, that’s who.

We also make rather heavy use of Github’s Gist service for hosting code samples to embed in our documentation. If you’ve read through any of our Quick-start Guides or Core Concepts articles, you’ve probably seen that they’re peppered with Gists. This makes us happy.

There was a slight problem, though: as I started producing more documentation and sample code, I was using my personal Github account to create the Gists (upwards of one hundred of them). Wanting to keep our Bus Factor as high as possible in this situation, we decided it would be best to move all of the relevant Gists from my personal Github account to one owned and operated by Evernote. After several lively meetings, we decided on evernotegists (since Gists can’t be owned by an organization).

Having just recently performed this migration, I thought it might be fun to share how I did it.

The Requirements

As I mentioned, we have oodles of these gists spread throughout the Evernote Developer website. So, in a nutshell, my migration solution had to satisfy the following requirements:

  1. Locate all Gists throughout the developer website.
  2. Recreate each Gist belonging to my personal Github account under the evernotegists Github account.
  3. Replace each old Gist URL on our site with the new Gist URL generated in the previous step.

Here’s how I did it.

Gist-gathering with Grep

Using the old Unix stalwart grep, I collected every URL on the site that begins with https://gist (everything up to the closing double quote):

grep -r -h -o "https://gist[^\"]*" . > gisturls.txt

After a quick spot-check to make sure the data looked good (and removing a couple of matches I hadn’t intended to be included in the migration), I had a complete list of all of the Gists that appear on our site.
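
For anyone who would rather stay in one language, the same collection step could be sketched in Python. This is not what was run for the migration; the function names find_gist_urls and unique are made up for illustration, and the sketch assumes the site's .php files are readable as UTF-8.

```python
import re
from pathlib import Path

# Same idea as the grep pattern: "https://gist" followed by
# everything up to (but not including) the next double quote.
GIST_URL = re.compile(r'https://gist[^"]*')


def find_gist_urls(root):
    """Return every Gist URL found in .php files under root, duplicates included."""
    urls = []
    for path in sorted(Path(root).rglob("*.php")):
        # errors="replace" keeps one oddly encoded file from stopping the scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        urls.extend(GIST_URL.findall(text))
    return urls


def unique(urls):
    """Drop repeated URLs while keeping first-seen order."""
    return list(dict.fromkeys(urls))
```

Calling unique(find_gist_urls(".")) from the site root and writing the result to gisturls.txt, one URL per line, would produce a list that still deserves the same spot-check.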

Gist duplication with Python

This is where the bulk of the work—and magic—happens. Python is where I cut my programmer teeth a little over a decade ago and is the language I know best, so it was the obvious choice for this particular task. Here’s a quick breakdown of the semi-quick and not-overly-dirty solution (which you can view, rather poetically, as a gist):

  • Authenticate with Github’s API using a stored username and password; this will result in an authentication token that will be used for subsequent operations.
  • Read in the list of Gist URLs collected via grep earlier.
  • For each URL in the list:
    • Extract the Gist ID (either a number or an alphanumeric string) from the URL (e.g., https://gist.github.com/12345.js would produce an ID of 12345).
    • Grab the Gist from Github’s API and get the owning user, filename and file contents. We’ll use this info to generate the new Gist.
    • Create a new Gist with exactly the same file (name and contents).
    • Save the new Gist URL in a Python dictionary where it’s mapped to the original Gist URL.
  • Write the mapping of old to new Gist URLs to a simple text file, one pair of URLs per line, separated by a single space.
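
The steps above can be sketched roughly as follows. This is a minimal illustration rather than the script that was actually used: the helper names (gist_id_from_url, api_request, new_gist_payload, migrate, write_mapping) are hypothetical, and the sketch assumes you already hold an API token for the destination account instead of exchanging a username and password for one. It relies on Github's REST endpoints for reading a Gist (GET /gists/:id) and creating one (POST /gists).

```python
import json
import re
import urllib.request

API = "https://api.github.com/gists"


def gist_id_from_url(url):
    """Pull the ID out of an embed URL such as https://gist.github.com/user/12345.js."""
    match = re.search(r"/([0-9A-Za-z]+)(?:\.js)?/?$", url)
    if match is None:
        raise ValueError(f"no Gist ID in {url!r}")
    return match.group(1)


def api_request(url, token, payload=None):
    """Call the API and decode the JSON reply (a POST when payload is given)."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(url, data=data, headers={
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github+json",
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def new_gist_payload(old_gist):
    """Build the body for a new Gist with the same file names and contents."""
    files = {name: {"content": info["content"]}
             for name, info in old_gist["files"].items()}
    return {
        "description": old_gist.get("description") or "",
        "public": old_gist.get("public", True),
        "files": files,
    }


def migrate(urls, token, request=api_request):
    """Copy each Gist to the token's account and return {old_url: new_url}."""
    mapping = {}
    for old_url in urls:
        old_gist = request(f"{API}/{gist_id_from_url(old_url)}", token)
        new_gist = request(API, token, new_gist_payload(old_gist))
        # Assumption: new embed URLs follow the ID-only form shown in this post.
        mapping[old_url] = f"https://gist.github.com/{new_gist['id']}.js"
    return mapping


def write_mapping(mapping, path):
    """Write one 'old new' pair per line, separated by a single space."""
    with open(path, "w", encoding="utf-8") as handle:
        for old_url, new_url in mapping.items():
            handle.write(f"{old_url} {new_url}\n")
```

Passing the request function in as a parameter is only there so the loop can be exercised with a stand-in before it touches the real API.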

This whole operation results in a text file whose contents look like this:

https://gist.github.com/12345.js https://gist.github.com/67890.js

The single remaining step was to do a site-wide search and replace for each URL in our site’s source code…

Vim regular expressions and Perl one-liners

With a large number of substitutions ahead of me, potentially spread across ~170 PHP files, I had a couple of options:

  1. Write one script to handle all of the substitutions in an efficient, robust manner.
  2. Employ a slightly more brutish tactic that would get the job done more quickly, despite making non-trivial concessions in terms of computational efficiency.

I went with option 2.

Using a couple of regular expression-powered search/replaces in my trusty Vim, I turned this line:

https://gist.github.com/12345.js https://gist.github.com/67890.js

Into this:

perl -pi -e 's{https://gist.github.com/12345.js}{https://gist.github.com/67890.js}g' `find . -name '*.php'`

The Vim regex operations used:

  • :%s/^/perl -pi -e 's{/g : prefix each line with the perl invocation, its flags, and the opening regular expression delimiter (curly braces, in this case).
  • :%s/js /js}{/g : replace the space between the two URLs with closing/opening curly braces.
  • :%s/$/}g' `find . -name '*.php'`/g : append the closing regex delimiter to each line, plus a backtick-wrapped call to the find command.
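
The same transformation could also be scripted instead of done by hand in Vim. The sketch below is one possible way to do it in Python, not part of the original workflow; perl_one_liner and build_script are hypothetical names, and the input is assumed to be the "old new" mapping format described earlier.

```python
# The backtick-wrapped find call appended to every generated command.
FIND = "`find . -name '*.php'`"


def perl_one_liner(mapping_line):
    """Turn one 'old_url new_url' line into a perl search-and-replace command."""
    old_url, new_url = mapping_line.split()
    # Doubled braces in an f-string produce the literal { and } delimiters.
    return f"perl -pi -e 's{{{old_url}}}{{{new_url}}}g' {FIND}"


def build_script(mapping_text):
    """Convert the whole mapping file into the body of a shell script."""
    lines = [perl_one_liner(line)
             for line in mapping_text.splitlines() if line.strip()]
    return "\n".join(lines) + "\n"
```

Feeding the contents of the mapping file to build_script and saving the result gives a file of one-liners in the shape shown above.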

After doing this globally to each line of my Gists file, I had a huge list of Perl one-liners that would search all of the PHP files within the site for the old Gist URL and replace it with the new one:

perl -pi -e 's{https://gist.github.com/inkedmn/4725142.js}{https://gist.github.com/5313509.js}g' `find . -name '*.php'`
perl -pi -e 's{https://gist.github.com/inkedmn/4732642.js}{https://gist.github.com/5313510.js}g' `find . -name '*.php'`
perl -pi -e 's{https://gist.github.com/inkedmn/4741551.js}{https://gist.github.com/5313511.js}g' `find . -name '*.php'`

This file became replaceGistUrls.sh.

Quasi-inelegant? Maybe. Slower than it could be? Absolutely. But the whole thing took a couple of minutes to run and did the job it was made to do. Success!
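
For comparison, option 1 (a single script that reads each file once) might look something like the sketch below. It was not used for this migration; load_mapping and replace_urls are hypothetical names, and the sketch assumes the PHP files are UTF-8 and that the mapping file uses the "old new" format shown earlier. All URLs are matched literally in one pass, so the dots in the URLs are not treated as regex wildcards and a freshly inserted URL is never replaced a second time.

```python
import re
from pathlib import Path


def load_mapping(path):
    """Read 'old_url new_url' pairs, one per line, into a dictionary."""
    mapping = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            old_url, new_url = line.split()
            mapping[old_url] = new_url
    return mapping


def replace_urls(root, mapping):
    """Rewrite each .php file under root at most once; return the changed paths."""
    if not mapping:
        return []
    # Longest URLs first, so a URL that is a prefix of another cannot win early.
    alternatives = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(url) for url in alternatives))
    changed = []
    for path in sorted(Path(root).rglob("*.php")):
        original = path.read_text(encoding="utf-8")
        updated = pattern.sub(lambda match: mapping[match.group(0)], original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Each file is read and written once regardless of how many URLs there are, which is where the time saved over one perl invocation per URL would come from.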
