Matplotdash | Toronto Open Science Code Sprint

Join us in Toronto this March at our first-ever Mozilla Toronto Open Science Code Sprint on Sat/Sun March 07-08 2015. At this 2-day event we will be bringing together researchers working on open science projects with developers, designers and other scientists from the community to collaborate on tools that help further science on the web. Every day this week, we’ll feature a guest post on one of the projects we’ll be sprinting on: Contributorship Badges, Cytoscape.js, Matplotdash, Pathogens & Disease Immunity, WormBase.

Today’s guest post is from Chris Ing, a PhD candidate at University of Toronto working at the Hospital for Sick Children. Don’t forget to register to collaborate with us on open science!


Who are you?

I’m Chris Ing, a PhD candidate at University of Toronto working at the Hospital for Sick Children. I study the dynamic motions of proteins using simulations that run on supercomputers. A simulation gives you a “virtual microscope” to get a glimpse of the 3D structure and function of proteins at an atomistic level. By running a simulation for months or years, computational biophysicists can study how disease and drugs affect proteins in a way that is not accessible by any lab bench experiment. Over the course of my studies, I’ve made it a personal mission to use as much of the scientific Python stack as possible to accelerate my research.

What is your project?

My project is an analytics dashboard designed for real-time visualization of scientific data. Analytics dashboards are ubiquitous in business and marketing analytics, made famous by the boom of software start-ups. However, these dashboard solutions are either closed-source “software as a service” solutions, or not designed with scientific visualization in mind. The aim of this project is to develop a science-agnostic web dashboard to assist computational scientists in performing routine analysis and rare event detection with arbitrary time series data sources.
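To make “rare event detection” on a time series a little more concrete, here is a minimal sketch of one possible approach: flag any point that strays far from the mean of the points just before it. The function name, the window size, and the threshold are illustrative assumptions, not part of Matplotdash, and a real detector would likely be tuned to the metric being monitored.

```python
from statistics import mean, stdev

def flag_rare_events(values, window=20, threshold=4.0):
    """Return indices of points far from the trailing-window mean.

    A point is flagged when it deviates from the mean of the previous
    `window` points by more than `threshold` sample standard deviations.
    Both parameters are illustrative defaults (assumptions).
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # Skip perfectly flat history, where the deviation is undefined.
        if sigma > 0 and abs(values[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged
```

A dashboard widget could call something like this each time new data arrives and highlight the flagged indices on the plot.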

Why do you enjoy working on this project?

Biomolecular simulations are inherently high-dimensional datasets, but they can be reduced to extremely simple metrics for meaningful monitoring. It’s enjoyable to work on this project because our lab can immediately reap the benefits. With the availability of large-scale computing resources, a significant portion of my day is spent manually doing routine analysis and plotting for a number of datasets. This is a task completely devoid of scientific value, and one that begs for automation.

How does this project benefit the research community?

Even with basic functionality, Matplotdash could greatly improve research efficiency over a range of scientific disciplines. A dashboard saves time by regenerating plots and summarizing a collection of data in one view, but it also encourages a more open and transparent form of research. There’s no question that lead scientists and principal investigators would be interested in tracking simulation progress, but public dashboard views may also be possible. Future functionality of this software could allow web visitors to share in the task of monitoring, potentially flagging unusual or anomalous regions of plots, assisting research. More generally, unexpected insight may be extracted from a dataset by observing its time evolution rather than analyzing it once as a static entity.

What is your technical stack (if you have one)?

The project is codenamed “Matplotdash” as an homage to matplotlib and the general Python-driven nature of the project. Python is my language of choice, and I’ll be attending PyCon 2015 later this year if you want to find me. In this project, however, Python resides on the back end, in the RESTful APIs that provide our dashboard with data. On the front end, we will use either a Bootstrap-based dashboard framework from KeenIO (https://github.com/keen/dashboards) combined with Django, or perhaps just Django-Dashing (https://github.com/talpor/django-dashing). Because we must exceed the limitations of a traditional dashboard, we will need widgets backed by one or more libraries for visualizing large datasets. Candidates include Bokeh with Bokeh Server (http://bokeh.pydata.org/en/latest/), VisPy (http://vispy.org/), and Lightning-Viz (http://lightning-viz.org/).
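As a rough illustration of what “a Python back end providing the dashboard with data over a RESTful API” could look like, here is a minimal sketch using only the standard library. It is not the project’s actual API: the `/api/series/<name>` route, the `SERIES` store, and the JSON layout are all hypothetical, and a real implementation would more likely sit behind Django.

```python
import json

# Hypothetical in-memory store of (time step, value) pairs.
# A real back end would read these from simulation output.
SERIES = {
    "rmsd": [(0, 1.20), (1, 1.25), (2, 1.31)],
}

def series_payload(name):
    """Build a JSON-ready dict for one named time series."""
    points = SERIES[name]
    return {
        "name": name,
        "x": [t for t, _ in points],
        "y": [v for _, v in points],
    }

def app(environ, start_response):
    """Minimal WSGI app answering /api/series/<name> with JSON."""
    path = environ.get("PATH_INFO", "")
    prefix = "/api/series/"
    name = path[len(prefix):] if path.startswith(prefix) else None
    if name in SERIES:
        status, body = "200 OK", series_payload(name)
    else:
        status, body = "404 Not Found", {"error": "unknown series"}
    data = json.dumps(body).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(data))),
    ])
    return [data]
```

To try it locally, the app can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`, after which a front-end widget could poll `/api/series/rmsd` and redraw its plot from the returned `x` and `y` arrays.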

What do you hope to accomplish during the code sprint?

Getting a working prototype is certainly achievable within the timeframe of a sprint. It’s unlikely that this project could be deployed on a real compute cluster during the sprint, but remote plot/data servers with test data could be used to illustrate the functionality of the code.

What kind of volunteers are you looking for (developers, designers, biologists, etc)?

Developers, designers, and scientists would all be appreciated.

How can people get involved? (link to repo, issue tracker, etc)

People can get involved by watching our code repository (https://github.com/pomeslab/matplotdash) and keeping an eye on our issues to help us make better-informed design decisions at this early stage of development. Note that we haven’t entirely transitioned away from the previous project name: HPCDash.