Code as a research object: (new phase) standardizing software metadata

At the Science Lab, we want to help research thrive on the open web. Part of this is working with other community members to build technical prototypes that move science on the web forward. Earlier this year we saw several prototypes come out of the ‘Code as a Research Object’ collaboration. Since then, there’s been more conversation and effort in this space and we wanted to share our progress and invite the community to give input.

First, a quick look at ‘Code as a Research Object’

Late last year, “Code as a Research Object” was first announced as a new collaboration between the Science Lab, GitHub, figshare and Zenodo to help explore how to better integrate code and scientific software into the scholarly workflow. Since then, we’ve seen community members come together to build prototypes allowing users to easily get a DOI for their code, making it citable and easier to incorporate into the existing credit system.

Next Steps: Standardizing Metadata

NCEAS Open Science CodeFest, September 2014

There’s still room to develop best practices for code reuse and citation. In particular, some form of standardized metadata would help other repositories understand how they can integrate with current systems.

Matt Jones at OSCodeFest, September 2014

When I was at NCEAS Open Science CodeFest (OSCodeFest) last month, I led a discussion around the work being done here. I was joined by Matt Jones, Carly Strasser and Corinna Gries, and we agreed that some standards need to be set to help more groups store software in a citable and interoperable manner.

Building on the existing discussions and proposals in the community, we compared the existing schemas for code storage to help create a metadata standard that allows for discoverability, reuse and citation. You can see the notes from our discussion here.

This led to the creation of the codemeta GitHub repo, which stores a minimal metadata schema for science software and code, in JSON-LD and XML. Since then, we’ve worked on refining the proposed metadata schema and creating mappings between some existing popular data stores. Coming soon: Matt Jones will be blogging on some of the more technical aspects of this project.
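
To make the idea a little more concrete, here is a rough sketch of what a minimal JSON-LD record for a piece of software, and a mapping to another store's field names, might look like. This is an illustration only and not the codemeta schema itself: the field selection borrows a few schema.org terms as an assumption, and the function names, the "store A" field names, the example URL and the placeholder DOI are all hypothetical.

```python
import json


def build_software_record(name, version, authors, repository, doi=None):
    """Build a small JSON-LD style record for a piece of software.

    The fields are an assumed minimal set using schema.org terms;
    they are not taken from the codemeta schema.
    """
    record = {
        "@context": "https://schema.org",
        "@type": "SoftwareSourceCode",
        "name": name,
        "version": version,
        "author": [{"@type": "Person", "name": a} for a in authors],
        "codeRepository": repository,
    }
    if doi:
        # Only include an identifier when one has actually been issued.
        record["identifier"] = doi
    return record


# Hypothetical crosswalk: our field names -> field names used by an
# imaginary repository ("store A"). Real mappings would differ.
CROSSWALK_STORE_A = {
    "name": "title",
    "version": "release",
    "author": "creators",
    "codeRepository": "source_url",
    "identifier": "doi",
}


def apply_crosswalk(record, crosswalk):
    """Rename the fields of a record according to a crosswalk mapping.

    Fields with no entry in the crosswalk (such as "@context") are dropped.
    """
    return {crosswalk[key]: value for key, value in record.items() if key in crosswalk}


record = build_software_record(
    name="example-tool",
    version="0.1.0",
    authors=["A. Researcher"],
    repository="https://example.org/example-tool",  # placeholder URL
    doi="10.0000/example",  # placeholder, not a real DOI
)
print(json.dumps(record, indent=2))
print(json.dumps(apply_crosswalk(record, CROSSWALK_STORE_A), indent=2))
```

Keeping the crosswalk as a plain dictionary means each repository's mapping can be reviewed and extended without touching the record-building code, which is roughly the role a shared mapping document plays.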

How to get involved

We’re looking for feedback on our current proposed metadata schema for code discovery, reuse and citation.

1 response

  1. Matt Jones wrote:

    Work surrounding this idea of minimal metadata for software is growing. Today I saw another report from NIH about software identification and citation, and they have a useful Appendix 1 (http://softwarediscoveryindex.org/report/) that lists a minimal set of metadata for software. I will try to include this in our crosswalk mapping document.