Text analysis with the Python Natural Language Toolkit

The following is a guest post from Fiona Tweedie of the University of Melbourne, who is hosting one of our upcoming MozFest sessions, on Text as Data in the Humanities and Social Sciences. You can follow her on Twitter at @FCTweedie.

The increased availability of digital texts means that text mining techniques are now relevant to more researchers than ever. The ability to analyze huge bodies of text quickly and efficiently has expanded the possibilities of text-based research. But humanities and social science researchers often lack the tools to make the most of these new data sources. A little Software Carpentry, however, has the potential to expand the research they are able to do and to make that research easier to share and reproduce.

To address this need, we’re developing course materials that introduce the Python Natural Language Toolkit (NLTK) and help historians, anthropologists, sociologists and political scientists (to name a few) analyze text. In our MozFest workshop, we’ll be testing some of these training materials for the first time, and we’re especially looking for help from:

  • Humanities and social science people – would you find this training useful?
  • Python users – are the concepts introduced in a way that makes sense?
  • Linguists – are we explaining enough linguistic theory to ensure that the results are sound?
  • Anyone else with an interest in text mining and a willingness to play-test new training materials

The existing course materials can be accessed at: https://github.com/resbaz/lessons/tree/master/nltk
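
To give a flavor of the kind of analysis the toolkit makes easy, here is a minimal sketch of a word-frequency count. It is not taken from the course materials: the function name `word_frequencies` and the sample sentence are made up for illustration, and the fallback to the standard library is only there so the snippet still runs if NLTK isn't installed yet.

```python
import re
from collections import Counter

try:
    # NLTK's frequency counter and a simple regex-based tokenizer
    # (neither needs any extra NLTK data downloads).
    from nltk import FreqDist
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    # Assumption for illustration: approximate both with the standard library.
    FreqDist = Counter

    def wordpunct_tokenize(text):
        # Split into runs of word characters or runs of punctuation.
        return re.findall(r"\w+|[^\w\s]+", text)


def word_frequencies(text):
    """Count how often each alphabetic word occurs, ignoring case."""
    tokens = wordpunct_tokenize(text.lower())
    words = [token for token in tokens if token.isalpha()]
    return FreqDist(words)


# Hypothetical sample text; in practice this would be a document or corpus.
sample = "The cat sat on the mat. The dog sat too!"
freq = word_frequencies(sample)
print(freq.most_common(3))
```

Counting words is only a starting point, but the same pattern (tokenize, filter, count) underlies many of the questions researchers ask of a text collection.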

Please bring a laptop, ideally with IPython Notebook installed. See http://ipython.org/install.html for installation instructions.