What else is needed for code reuse?

When we first started discussions around our latest “Code as a Research Object” project, one of the main topics that arose was reuse. It’s one thing for code and software to have an identifier that the community trusts, so that it can be integrated into scholarly publishing systems. But what about the researchers looking to use that information to build on or reuse that code in their own work? What information is needed for the code to be picked up, forked and run by someone outside the lab that wrote it?

We started to look at other attempts across software engineering and the sciences to provide reuse-oriented metadata for research materials. There’s the MIAME standard for microarray experiments and the MIBBI set of minimal information standards for everything from flow cytometry to mouse phenotypes. Both are widely used and pointed to by major publishers. There’s also DOAP, an RDF schema for describing software projects, and more data standards than there’s room to list.

Metadata has long been used to help others catalog, discover, and link information on the web. It has also been used to capture the essential information needed for someone to reuse content on the web, be it an article, webpage, image, software or data file. But metadata forms can be long, onerous and time-consuming to fill in. How can we best surface the information that is immediately useful to researchers, so that they can understand, use and build on the code made available?

We’ve started a thread over in the repository listing a few ideas and would love to hear your thoughts. The goal is to move towards a best practice that we can, as a community, use across platforms, extensions (like our own) and services. Making information – be it data or code – available is step one. Providing the context needed for others to meaningfully understand it and reuse it is the next challenge.

(Please add your thoughts over in the repository on this issue.)