Pathogens & Disease Immunity | Toronto Open Science Code Sprint

Join us in Toronto this March for our first-ever Mozilla Toronto Open Science Code Sprint on Saturday and Sunday, March 7-8, 2015. At this two-day event, we will bring together researchers working on open science projects with developers, designers and other scientists from the community to collaborate on tools that help further science on the web. Every day this week, we’ll feature a guest post on one of the projects we’ll be sprinting on: Contributorship Badges, Cytoscape.js, Matplotdash, Pathogens & Disease Immunity, and WormBase.

Today’s guest post is from Madeleine Bonsma, an MSc student in Dr. Sid Goyal’s group at the University of Toronto. Don’t forget to register to collaborate with us on open science!

Microorganisms are all around us, and they have profound impacts on their environments in ways we are only beginning to understand. An individual human is host to more microorganisms than human cells, and research has linked our microbiome to such things as immune disorders, cancer, and obesity. Thanks to improved sequencing technology, much progress has been made in characterizing microbiomes in many environments, but of the available data, only a small fraction has been utilized to its full potential.

Bacteria and bacteria-targeting viruses (called bacteriophages, or phages) are co-evolving under intense pressure, each trying to outmanoeuvre the other in a constant evolutionary arms race. Just under ten years ago, an adaptive immune system called CRISPR (clustered regularly interspaced short palindromic repeats) was discovered in bacteria. Bacteria possessing the CRISPR system incorporate small pieces of phage DNA into their own genome, between copies of a repeated sequence, allowing them to recognize and target phages that invade again.

The Pathogens and Disease Immunity project is seeking to track down and organize publicly available CRISPR data that provides an “immunization record” of interactions between bacteria and phages. The goal of this project is to make a database of CRISPR genome regions across many bacteria and archaea. Such a database will be a valuable starting point for a host of researchers studying microbiomes, enabling them to find patterns across large amounts of data and a wide variety of organisms and environments.
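
To make the "immunization record" idea concrete, here is a minimal Python sketch of how the stored pieces of phage DNA (the spacers) could be read out of a sequence once the repeat is known. It is a toy illustration, not the project's method: the function name `extract_spacers`, the example sequences, and the `max_spacer_len` cutoff are all made up for this post, and the sketch assumes exact repeat copies, whereas real CRISPR arrays can contain imperfect repeats that need dedicated tools.

```python
def extract_spacers(sequence, repeat, max_spacer_len=100):
    """Return the stretches of `sequence` lying between consecutive exact
    copies of `repeat`.

    Hypothetical helper for illustration only: it assumes the repeat is
    already known and appears without mismatches. `max_spacer_len` is an
    arbitrary illustrative cutoff used to skip long gaps between repeats.
    """
    spacers = []
    start = sequence.find(repeat)
    while start != -1:
        spacer_start = start + len(repeat)
        next_start = sequence.find(repeat, spacer_start)
        if next_start == -1:
            break  # no further repeat, so no further spacer
        spacer = sequence[spacer_start:next_start]
        if 0 < len(spacer) <= max_spacer_len:
            spacers.append(spacer)
        start = next_start
    return spacers


# Made-up toy data: three repeat copies separated by two spacers.
toy_repeat = "GTTTTAGAGC"
toy_sequence = (
    "AAAA" + toy_repeat + "ACGTACGTAC" + toy_repeat + "TTGCATGCAA" + toy_repeat + "CCCC"
)

toy_spacers = extract_spacers(toy_sequence, toy_repeat)
print(toy_spacers)  # ['ACGTACGTAC', 'TTGCATGCAA']
```

Each spacer in the returned list stands for one past encounter with a phage, so a database entry for an organism could, under these assumptions, be as simple as its repeat sequence plus the ordered list of spacers.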

The main challenge of this project is to consolidate data from multiple sources and to automate data gathering. There may also be genetic data assembly challenges, depending on specific data sets. The input of developers, testers, and bioinformaticians would all be welcome. For more information on this project and on previous work in the same theme, visit and

Madeleine Bonsma is an MSc student in Dr. Sid Goyal’s group at the University of Toronto. She loves biology, physics, data analysis, and writing short scripts in Python, but she knows hardly anything about software development or bioinformatics. She is excited to make connections in the wider community of people who love science.