phageParser: analyzing open genetic data | #mozsprint 2016

Madeleine Bonsma is a PhD student in the Department of Physics at the University of Toronto. Madeleine is a fellow Torontonian, and we met at the Toronto Open Science Code Sprint last year, when she brought phageParser. phageParser helps us understand how bacteria and archaea use CRISPR systems in natural environments.

Since then, phageParser has grown, and Madeleine launched UofT Coders, a Mozilla Science Study Group. I interviewed Madeleine to learn more about phageParser, CRISPR, and how you can help during our Global Sprint 2016, June 2-3.

First, what is CRISPR?

CRISPR has gained a lot of attention in the last three years because it has inspired a powerful DNA-editing technology called CRISPR-Cas9. CRISPR systems are also interesting from a more fundamental science perspective: bacteria or archaea that have CRISPR systems can “remember” viruses that have attacked them in the past by storing bits of virus DNA in their own genome.

This means that the CRISPR regions of bacteria and archaea can teach us a lot about the kind of environment those organisms came from and what kinds of foreign DNA attacks they were seeing. Comparing CRISPR systems across many organisms can also help us see if any are using CRISPR in ways we didn’t expect, or if any lifestyle features may be shared between very diverse and unrelated organisms.
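To make the idea of "remembering" viruses concrete, here is a minimal Python sketch. It is not phageParser's actual code: the repeat, spacer, and phage sequences are invented toy data, the function names are hypothetical, and exact string matching stands in for the sequence-similarity searches a real analysis would need. It pulls the spacers out of a CRISPR array by splitting on the repeat, then checks which toy phage genomes contain each spacer on either strand.

```python
# Toy illustration only: all sequences and names below are made up.

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(seq))


def extract_spacers(crispr_array, repeat):
    """Split a CRISPR array on its repeat; the pieces in between are spacers."""
    # Empty strings appear where the array starts or ends with a repeat.
    return [piece for piece in crispr_array.split(repeat) if piece]


def match_spacers(spacers, phage_genomes):
    """Map each spacer to the phage genomes containing it on either strand."""
    matches = {}
    for spacer in spacers:
        rc = reverse_complement(spacer)
        matches[spacer] = [
            name
            for name, genome in phage_genomes.items()
            if spacer in genome or rc in genome
        ]
    return matches


# Hypothetical repeat and spacers, assembled into a small array.
REPEAT = "GTTTTAGAGC"
SPACER_1 = "ACGTACGGTA"
SPACER_2 = "TTGACCATGC"
CRISPR_ARRAY = REPEAT + SPACER_1 + REPEAT + SPACER_2 + REPEAT

# Hypothetical phage genomes: A carries spacer 1 directly, B carries
# spacer 2 on the opposite strand, C carries neither.
PHAGE_GENOMES = {
    "phage_A": "CCCC" + SPACER_1 + "GGGG",
    "phage_B": "AAAA" + reverse_complement(SPACER_2) + "TTTT",
    "phage_C": "ATATATATATATATAT",
}

spacers = extract_spacers(CRISPR_ARRAY, REPEAT)
for spacer, phages in match_spacers(spacers, PHAGE_GENOMES).items():
    print(spacer, "matches", phages)
```

Real CRISPR arrays have imperfect repeats, and spacers rarely match a virus exactly, which is part of why dedicated tools are used for these steps in practice.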

[Figure: CRISPR locus diagram]

What is phageParser and how did you start working on it?

We noticed that there are many published bacterial and archaeal genomes that have CRISPR systems, and they could be a huge resource for finding patterns across the many examples of CRISPR. phageParser is a tool to gather information that’s already out there related to CRISPR systems and present it in a way that other researchers and interested people can work with and use.

My supervisor, Sid Goyal, launched the project in 2014 with the help of Bill Mills. I got involved in February 2015, and things really took off at the Toronto Open Science Code Sprint, where several contributors helped get a first version of our processing workflow running.

How will phageParser help research be more collaborative and accessible?

Almost all the data we hope to incorporate into this database is already publicly available, but this tool will enable researchers without advanced bioinformatics knowledge to effectively use that data. Collaboration is already occurring in the development of the project, and we expect that being able to re-use published data more effectively will create more research and collaboration opportunities as well.

What kind of open data are you working with?

Most of the data comes from the National Center for Biotechnology Information (NCBI). We are also making use of open software such as CRISPRfinder and BLAST.

What problems have you run into while working on phageParser?

An ongoing challenge is to make sure the project goals align with what people would actually want to use. Talking to CRISPR researchers has helped us reframe the project to be more user-friendly, with most of the analysis happening behind the scenes. My own lack of expertise in almost all the areas covered has been a difficulty but also a great opportunity to learn from collaborators. On a more technical note, it can be hard to deal with changing standards of data formatting, and it is often hard to decide how best to approach a particular step of the analysis.

What kind of skills do I need to help you build phageParser?

We’re looking for all kinds of skills, from database design and management to user interface design, writing bioinformatics scripts in Python, interacting with APIs, consulting on scientific aims, writing documentation, and more.

How can others help you during the Mozilla Science Global Sprint, June 2-3?

During the Global Sprint we’ll be working on things from all the categories mentioned above, and specifically we’ll probably be looking at assembling a database and linking it to a GUI.

Bonus question: Where is your Research Fox sticker?

I have one on my laptop and one on my desk drawer!

Come join us wherever you are June 2-3 at the Mozilla Science Global Sprint to work on phageParser and pick up your own Research Fox! Have your own project or want to host a site? Submissions are open for projects and site hosts.