An Introduction to Applied Bioinformatics at #mozsprint

This is a guest blog post by Greg Caporaso, Assistant Professor at Northern Arizona University, author of An Introduction to Applied Bioinformatics, and the project lead of scikit-bio. For more information on these projects, follow Greg on Twitter.

My group recently participated in the Mozilla Science Lab Global Sprint, where our goal was to improve support for Jupyter (IPython) Notebook-based books, with a focus on our Sloan-Foundation-funded text, An Introduction to Applied Bioinformatics (or IAB). We primarily wanted to simplify the workflow for getting and reviewing content modifications, which has been challenging for a couple of reasons.

First, many IAB readers are novice programmers. While it's a great learning experience, the process of forking a GitHub repository, adding and committing changes, and issuing pull requests is a barrier for some readers (who, after all, are trying to learn bioinformatics, not necessarily git). As a result, I wasn't getting as many content modifications submitted as I had hoped.

Second, because all of IAB's content was developed as Jupyter Notebooks, it was very hard to review those changes. Notebooks are stored as JSON, which is difficult to diff in a way that lets you quickly understand what has changed. So when I did get submissions, I often didn't review them promptly, which in turn let merge conflicts sneak in. And because the diffs were hard to read, resolving those merge conflicts was also hard. You can imagine the problem…

We made a lot of progress toward simplifying the submission and review workflow during the sprint. Our approach was to rewrite the content in markdown, and then use ipymd (which we helped make pip-installable during the sprint) to convert that content into Jupyter Notebooks at build time. So instead of being written in Jupyter Notebooks, IAB's content is now written in markdown, and diffing a change is just diffing markdown. There are other obvious benefits as well: I can now edit content directly in my favorite text editor, spell checking is easier, and we could ultimately publish in formats other than ipynb if we ever wanted to. The markdown files can also be written at different levels of granularity: a long chapter can have its markdown split across several files, while a short unit (e.g., a single chapter) can be condensed into a single markdown file. This also reduces the likelihood of merge conflicts.
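IAB's build scripts aren't shown here, and the real conversion is done by ipymd. As a rough illustration of what a markdown-to-notebook step involves, here is a minimal sketch that uses only Python's standard library and is not ipymd's API: fenced python code blocks become code cells, everything else becomes markdown cells, and the result is a dict in the nbformat v4 JSON layout. The function names are hypothetical, and the sketch ignores things a real converter handles, such as other languages and stored outputs.

```python
import json
import re

# Build the three-backtick Markdown fence programmatically so this example
# doesn't contain a literal fence of its own.
TICKS = "`" * 3

# Matches a fenced Python block: opening fence line, the code, closing fence line.
FENCE_RE = re.compile(
    "^" + TICKS + r"python[ \t]*\n(.*?)^" + TICKS + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def make_cell(cell_type, text):
    """Return a cell dict in the nbformat v4 layout."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": text.strip("\n")}
    if cell_type == "code":
        # Code cells start unexecuted, with no stored outputs.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


def markdown_to_notebook(markdown_text):
    """Hypothetical converter: fenced python blocks -> code cells, the rest -> markdown cells."""
    cells = []
    pos = 0
    for match in FENCE_RE.finditer(markdown_text):
        prose = markdown_text[pos:match.start()]
        if prose.strip():
            cells.append(make_cell("markdown", prose))
        cells.append(make_cell("code", match.group(1)))
        pos = match.end()
    tail = markdown_text[pos:]
    if tail.strip():
        cells.append(make_cell("markdown", tail))
    return {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": cells}


if __name__ == "__main__":
    source = "Some prose.\n\n" + TICKS + "python\nx = 1\nprint(x)\n" + TICKS + "\n\nMore prose.\n"
    notebook = markdown_to_notebook(source)
    print([cell["cell_type"] for cell in notebook["cells"]])  # ['markdown', 'code', 'markdown']
    # Text that could be written out to a .ipynb file at build time.
    ipynb_text = json.dumps(notebook, indent=1)
```

The point of the design is that the markdown file is what contributors edit and reviewers diff; the notebook JSON is only a build artifact.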

Now for the really cool part: when we build the notebooks, we insert Wikipedia-like edit links that take readers directly to GitHub's online editor, where they can make their change and submit a pull request, all without using the command line, knowing git, etc. If readers come across content they'd like to change while reading (for example, a typo), they click a link, edit the text, and submit their changes. You can watch a 5-minute screencast in which I illustrate the process of submitting a change.
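Here is a minimal sketch of the edit-link idea, assuming a hypothetical repository owner, name, branch, and file layout. GitHub's in-browser editor for a file lives at https://github.com/OWNER/REPO/edit/BRANCH/PATH, so a build step that knows which markdown file each group of notebook cells came from can place a link to that URL in a markdown cell ahead of them. This is not IAB's actual build code, and the helper names are made up for illustration.

```python
from urllib.parse import quote


def github_edit_url(owner, repo, branch, path):
    """URL of GitHub's in-browser editor for one file in a repository."""
    # quote() percent-encodes unsafe characters but leaves "/" alone by default.
    return "https://github.com/{}/{}/edit/{}/{}".format(owner, repo, branch, quote(path))


def edit_link_cell(url):
    """A markdown cell (nbformat v4 layout) holding a Wikipedia-style [edit] link."""
    return {"cell_type": "markdown", "metadata": {}, "source": "[edit](" + url + ")"}


def add_edit_links(sections, owner, repo, branch):
    """Put an edit link ahead of the cells generated from each markdown source file.

    `sections` is a list of (source_path, cells) pairs, in reading order.
    """
    cells = []
    for path, section_cells in sections:
        cells.append(edit_link_cell(github_edit_url(owner, repo, branch, path)))
        cells.extend(section_cells)
    return cells


if __name__ == "__main__":
    # Hypothetical repository and file layout, for illustration only.
    intro = {"cell_type": "markdown", "metadata": {}, "source": "# Introduction"}
    cells = add_edit_links(
        [("book/chapter-1/intro.md", [intro])], "example-org", "example-book", "master"
    )
    print(cells[0]["source"])
    # [edit](https://github.com/example-org/example-book/edit/master/book/chapter-1/intro.md)
```

When a reader without write access follows such a link, GitHub offers to fork the repository and open a pull request with their change, which is what lets them contribute without touching git locally.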

Taken together, this means it’s now easy for users to submit changes, and for me to review and merge changes, so it should open the door to a lot more content submissions and fixes from users. These changes are all integrated in the 0.1.1 release of IAB.

Our IAB build scripts (where the edit link insertion and markdown to Jupyter Notebook conversions happen) are currently very clunky, but we plan to clean these up and make them more general if there is interest from others developing books with Jupyter Notebooks.

These developments were a team effort by our Mozilla Science Lab Global Sprint team: Evan Bolyen, Jai Rideout, John Chase, Anders Pitman, Arron Shiffer, William Mercurio, and me. We'd like to thank Kaitlin Thaney and Bill Mills for getting us involved in the sprint, Cyrille Rossant for his work on the ipymd package, which is a large part of what makes this possible, Fernando Perez and Min RK for pointing us to ipymd (and of course for Project Jupyter itself), and the Alfred P. Sloan Foundation for its support of An Introduction to Applied Bioinformatics.