ContentMine: extracting facts from literature | #mozsprint 2016

Have you ever struggled to keep up to date with scientific literature? Christopher Kittel is working on ways to help!

Chris develops training materials and documentation for ContentMine, a set of tools for extracting information from papers. A researcher himself, studying Environmental System Sciences at the University of Graz, Chris is looking for ways to cope with the increasing amount of scientific knowledge today.

I met Chris at MozFest 2015 and again at our first Working Open Workshop, where he continued to work on ContentMine and help new contributors get involved. I interviewed Chris to learn more about ContentMine and how you can help during our Global Sprint 2016, June 2-3.


How did you find out about ContentMine? Why did you start contributing?

Stefan Kasberger, a fellow Open Science advocate and founder of openscienceASAP, and I were looking for partners to develop Open Science Trainings aimed at young researchers. We contacted ContentMine, found they had similar needs, and started contributing. The idea of ContentMine also aligns with my personal research interests.

What is ContentMine?

ContentMine aims to liberate the factual knowledge condensed into research papers and make it accessible and re-usable for other researchers and the general public. For this we mainly develop tools, but we also help interested people who are new to the topic navigate the landscape of text and data mining.

How has ContentMine helped you do research?

There are three areas where ContentMine has helped. The first is that ContentMine develops tools that solve some of my problems, such as creating collections of papers. The second is that during the development process many ideas surface and many are discarded again, but they help me look at my problems and possible solutions from different angles. The third is that the community involved is great to ask for feedback: they know what's state-of-the-art, what could possibly work, what would not, and why.

Can you walk me through the different pieces of ContentMine I can contribute to?

There are two main areas our tools work in:

  1. The first area is the large-scale creation of an initial collection of papers. For this we have getpapers and quickscrape, which either access search APIs from repositories or scrape content.
  2. The second area is processing and analyzing the collection. For this we have norma, which normalizes different input formats into a standardized one, and ami, which does fact extraction, matches text against controlled vocabularies, and performs basic text mining.
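
(Editor's illustration: the second area can be pictured with a small toy sketch. It is not ContentMine code and does not use norma or ami. The function names `normalize` and `count_terms`, the two sample inputs, and the two-term vocabulary are all made up for this example. It only shows the general idea of bringing different input formats into one plain form and then counting matches against a controlled vocabulary.)

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def normalize(raw, fmt):
    """Hypothetical helper: turn 'html' or 'text' input into one
    whitespace-normalized plain string."""
    if fmt == "html":
        parser = _TextExtractor()
        parser.feed(raw)
        parser.close()
        # Joining with spaces is a simplification; it can split words
        # that are interrupted by inline tags.
        raw = " ".join(parser.chunks)
    elif fmt != "text":
        raise ValueError(f"unsupported format: {fmt}")
    return " ".join(raw.split())


def count_terms(text, vocabulary):
    """Hypothetical helper: count case-insensitive whole-word matches
    for each term in a controlled vocabulary."""
    counts = Counter()
    for term in vocabulary:
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = re.findall(pattern, text, flags=re.IGNORECASE)
        if hits:
            counts[term] = len(hits)
    return counts


if __name__ == "__main__":
    # Made-up sample "papers" in two different input formats.
    papers = [
        ("<p>Aedes aegypti transmits <i>Zika</i> virus.</p>", "html"),
        ("Zika   virus was first isolated\nin 1947.", "text"),
    ]
    vocabulary = ["Zika virus", "Aedes aegypti"]

    total = Counter()
    for raw, fmt in papers:
        total += count_terms(normalize(raw, fmt), vocabulary)
    print(dict(total))
```

The real tools handle far more than this (publisher-specific formats, structured output, plugins), so treat the sketch as a mental model only.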

I see you run a lot of workshops teaching ContentMine. How can I get involved here?

You can set one up as a host from your organisation – get in contact with us for details – or you can contribute to the development of training materials on GitHub. We have materials which are useful for workshops, but also a section with self-guided tutorials that interested people can work with to get ContentMine running locally.

What problems have you run into while working on ContentMine?

We are a very distributed team, so communication and coordination are very important. Good documentation of milestones, issues, and decisions is always a challenge, and we need to constantly remind ourselves to keep this up.

What kind of skills do I need to help you work on ContentMine?

You don’t need to be a full-time developer to be able to contribute. Basic knowledge of working on the command line is a good start if you want to use or test the system. If you can read and interpret code (in Java, JavaScript or Python), you can help with identifying bugs or improving the documentation, and if you can write code, you can contribute by fixing bugs or working through the issues.

How can others help you during the Mozilla Science Global Sprint, June 2-3?

We’ll be working on usability and also creating some nice visualizations, so people can help by using the system for their research, giving feedback on their experience, and sending us nice visualizations of their results. If they already know their way around the system, we can coordinate our efforts to solve some bigger development issues.

Bonus question: Where is your Research Fox sticker?

The Research Fox is on my laptop, right next to Octocat.




Come join us wherever you are June 2-3 at the Mozilla Science Global Sprint to work on ContentMine and pick up your own Research Fox! Have your own project? Submissions are open for projects.