Sharing Our Story of Research Data Reuse

As you may recall, back in February, Stephanie Wykstra and I put out a call asking you to share your stories of research data reuse. That call closed on March 10th and we promised to report back what we learned. Well, we did learn, though not what we expected or in the areas we hoped. This blog post is a condensed version of a more thorough blog post written by Stephanie and posted on the Berkeley Initiative for Transparency in the Social Sciences blog. Head over there for more details on the results of our call.

We received 14 responses to our call, including 10 responses to our survey and 4 emailed responses. Too few of the responses included the information we needed to learn a great deal about cases of real-world reuse and what made the data particularly useful. However, we wanted to share what we were able to learn, for two reasons:

This call and response could be informative to those who are considering putting out a similar survey or doing related research, and
We think our findings provide some evidence confirming our initial feeling that this area warrants further work and research.

The Responses

Our 10 survey respondents came from political science, psychology, education and biochemistry. Of those, only three examples were explicitly of the kind that we had requested, i.e., data that had been collected by other researchers for their own study and then reused for further research. (You can see our spreadsheet of de-identified responses with more details here.)

Though they didn’t fit our purposes for this particular project, the helpful folks in this community gave us some great resources that may be of use to others doing research in this area:

Global Biodiversity Information Facility (GBIF), a database on global biodiversity
International Polar Year (IPY), a coordination of research on the Polar regions
Uppsala Conflict Data Program and the Correlates of War Program. Both sites offer data which are widely used by scholars within international relations and include variables which are constructed by scholars for their own research and then submitted to the databases for others to reuse
Dissemination Information Packages for Information Reuse (DIPIR), a study of data reuse in three communities (quantitative social scientists, archaeologists, and zoologists)
ICPSR’s bibliography of data-related literature, which is a searchable database of “over 70,000 citations of published and unpublished works resulting from analyses of data held in the ICPSR archive.”
UK Data Archive’s list of case studies of data reuse

We were also given examples of other cases that could be found in databases such as ArrayExpress and Protein Data Bank, as well as government data from Open Data Toronto and Statistics Canada.

Room for Further Investigation

Our collection of data reuse cases from our call was quite small and there could be several reasons for that.

Maybe this was due to how we marketed the call? We blogged, tweeted and emailed listservs, but maybe our reach wasn’t wide enough, or these were the wrong methods for reaching the audience we needed for stories.
Did we not leave the call open long enough? We thought February to March would be a less busy time of year but perhaps we just needed to leave the call open longer and market it more.
Was our ask too onerous? Or perhaps too specific? Is this an area where folks just aren’t doing much data reuse?

We know the data-sharing movement is gaining steam. From funders requiring data sharing, to new journal guidelines and requirements, to the rise of many data repositories, there is plenty of effort going into requiring and supporting data sharing. Yet there are huge issues to confront as we move forward. One of the biggest is how to change from a culture in which data sharing is viewed with fear and skepticism to one where data sharing is the norm among researchers.

Cartoon illustrated by John R. McKiernan. Found on whyopenresearch.org

The question of how to promote and encourage data reuse is of clear importance. Yet, as practitioners in the open science movement, we have many questions.

When it comes to reusing data from colleagues’ studies, what factors make datasets particularly helpful to researchers?
What challenges arise in reusing data?
As data curators and open data advocates, what could we do better to facilitate reuse?
Is there something we can do to encourage others to look at and reuse existing data when they are considering new research projects?
How can we increase opportunities and decrease barriers?

Stephanie and I would love to see someone take our experience with this foray into researching data reuse and run with it and make it better (bigger?). Perhaps there are lessons that can be gleaned from the reuse case studies collected by the UK Data Archive or ICPSR mentioned above. Although those are collections of larger studies from highly curated data sets, there may be some answers to our questions hidden in there. What do you think? What should be the next steps toward moving this research forward? Tell us your thoughts in the comments below or through Twitter (Steph Wright, Stephanie Wykstra). Even better, start discussions around these questions in your own community and let us know what you learn.

Mozilla Foundation

Mozilla Foundation Blog Archive
