Planning for Data Reuse

If you are a regular reader of this blog, you are well aware of our Mozilla Fellows for Science (see recent blog posts about their work) and the Working Open Workshop (WOW) we recently hosted in Berlin. I am also leading work on building out training and resources to support data sharing and reuse. Over the past few months, we’ve begun testing some of our initial work with the Fellows and workshop participants. Below is a recap of some of the work (with links to the resources) and what we’ve learned.

Mozilla Science Fellow Christie Bahlai has been developing a course on “Open Science and Reproducible Research” (read more about this project and her goals here). Members of the Mozilla Science Lab team have been invited to lead various sessions of this course, and in January I led the session on “Understanding Other People’s Data”.

This class provided an opportunity to test our first pass at a Data Reuse Plan exercise, a twist on the “data management plan”. (For those outside the world of research grants, a data management plan is a brief document explaining how data will be shared; many funding agencies require one when researchers apply for money for a research project.) The exercise is designed to walk researchers through the process of thinking about their data through the eyes of someone seeing it for the first time. It focuses on what is needed for reuse by asking the researcher to describe the data in terms of who collected it, what was collected, and where, when, why and how. This “metadata” is written up in a text file that is kept with the data files to help other researchers use the data.
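For readers who like to see this concretely, here is a minimal sketch of how the answers to those six prompts could be saved as a plain text file next to the data. It is not the template used in the exercise: the function names, the `README.txt` filename, the "DATA REUSE NOTES" heading and the layout of the file are all assumptions made for illustration.

```python
from pathlib import Path

# The six prompts named in the exercise. Using them as dictionary keys
# is an illustrative choice, not part of the original template.
FIELDS = ("who", "what", "where", "when", "why", "how")


def render_reuse_notes(answers: dict[str, str]) -> str:
    """Turn the six answers into plain text, one labelled line per prompt."""
    # Refuse to produce a file with gaps: an unanswered prompt is exactly
    # what a first-time reader of the data would trip over.
    missing = [f for f in FIELDS if not answers.get(f, "").strip()]
    if missing:
        raise ValueError("Unanswered prompts: " + ", ".join(missing))
    lines = ["DATA REUSE NOTES", ""]  # hypothetical heading
    for field in FIELDS:
        lines.append(f"{field.upper()}: {answers[field].strip()}")
    return "\n".join(lines) + "\n"


def write_reuse_notes(data_dir, answers, filename="README.txt") -> Path:
    """Write the notes into the same folder as the data files."""
    path = Path(data_dir) / filename
    path.write_text(render_reuse_notes(answers), encoding="utf-8")
    return path
```

Calling `write_reuse_notes("my_data_folder", answers)` with a dictionary that answers all six prompts would leave a `README.txt` beside the data files. The check for unanswered prompts is a design choice that mirrors the checklist spirit of the exercise; a real template would likely ask for more detail under each heading.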

Remote view of graduate students in Christie's class

This class was also the impetus for the development of a quick info sheet on Challenges to Open Data (and how to respond).  The class had a lively discussion with students voicing their own experiences with resistance to open data, several of which corresponded to examples in the handout.

Christie presenting on open data challenges at WOW.

Incorporating the feedback from the test run with the class, Christie and I co-taught this exercise at WOW in February. Participants came from four continents and included developers, researchers and community organizers. Their daily work with data varied, with some focusing more on software development, but each had experience collecting, curating or analysing data. Feedback was positive, with one researcher asking if his research group could use the template as a checklist for the data they store on their servers. The experience has me thinking about how to broaden the exercise to cover reuse of project materials in general rather than just reuse of data.

Data planning is more efficient than data forensics.

Finally, Stephanie Wykstra at Innovations for Poverty Action and I put out a call for data reuse case studies. This research will result in a report on which open data gets reused and how, where the challenges and barriers lie, and how they can be resolved. You can read this blog post to learn more and to submit your own story of data reuse.

We’re developing plans for future events and for open data curriculum materials and resources. What support would you like to see for open data? What are your thoughts on the data reuse plan exercise? Tweet us or email us with your thoughts and ideas. In the meantime, feel free to look at the materials we’ve already developed in our Open Data Training repository and use them for your own purposes.