Worries & Critical Mass

Welcome to the Mozilla Science Lab Week in Review, a new feature on who’s been making notable contributions to the Science Lab in the past week, a roundup of what we got up to, and some thoughts on the same. Publishing most Sundays – comments always welcome!

Shoutouts This Week

Abby Cabunoc and Bill Mills spent the week dispensing high fives and pull requests at NCEAS’ first Open Science Codefest, and we’re very glad we did; it was tons of fun hacking on projects together and talking about how we can make open science work in the wild. Special shout-outs to Kevin Wu for being a hero developer, up for anything, who contributed to both the projects I floated from the MSL community (more on this initiative soon), and Joe Mudge for guiding us through building a collective model of bee behavior – we always learn tons from our team-ups with contributors, and I hope you had as good a time as I did!

Beating the Fear

When we weren’t heads down and hacking, I got to hear people’s stories about their adventures and ideas in open science, and some really interesting patterns began to appear.

The first thing a group of us managed to collaboratively hack together was a dinner plan for the night before. On the walk back, Naupaka Zimmerman brought up a key problem that new researchers face; to paraphrase: “There are so many technology choices out there that sound the same to beginners – Zshell, Cshell, OMGshell – not to mention all the options they might not have noticed – how can they feel like they are making informed decisions?” The initial bewilderment this describes is a thread we as a community should tug on. My instincts say that worrying about technology choices and the nebulous concept of ‘best’, especially for beginners, is a trap of infinite anxiety; the focus needs to be on goals, and whatever tool lands your particular plane is *good enough for now*. Making mistakes and iterating next time is better than okay – it’s how ‘best’ happens. But it’s easy for me to say that, and Naupaka’s point is crucial; how can we make a really warm and practical welcome to people jumping in for the first time?

Much to my delight, Stephanie Hampton convened a project at Codefest to write a paper, intended for traditional journal publication, on how people can spool up from zero on open science. Tons of great ideas got exchanged in the session, but one thing that stood out to me after reviewing the notes was that the hurdles discussed were largely not technical – they were psychological. Across the community, scientists coming to open practices are often fearful of getting scooped, or, as the draft points out, afraid of being wrong in public. I have seen this over and over when I introduce researchers to open source version control, for example; there is always a lot of anxiety over putting ‘bad’ code up on GitHub, and a lot of time gets sunk into cleanup beforehand. Meanwhile, the majority of the code I put up on GitHub is terrible – truly hilariously bad in some cases – but I maintain this doesn’t matter; it’s all part of the process. GitHub isn’t my publication journal; it’s my lab book. But again – it’s easy for an open source stalwart to have that internalized; it’s a whole different story for people new to the process, and their worries are very real. Helping resolve the anxieties that Naupaka, Stephanie et al. so astutely described should be a key point of study for our community, beyond the merely technical.
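To make that concrete – and this is just a minimal sketch, with hypothetical file names and commit messages – treating git and GitHub as a lab book means committing messy, in-progress work early and often, rather than polishing everything before it ever touches version control:

```shell
# A minimal, hypothetical sketch of 'GitHub as lab book':
# commit rough work-in-progress early, then iterate in the open.

mkdir -p lab-book && cd lab-book
git init --quiet
git config user.email "you@example.com"
git config user.name "Your Name"

# First rough cut of an analysis script -- imperfect is fine.
echo 'print("quick and dirty first pass")' > analysis.py
git add analysis.py
git commit --quiet -m "WIP: first rough pass at the analysis"

# Each later commit is another dated lab-book entry.
echo 'print("slightly less dirty second pass")' > analysis.py
git commit --quiet -am "WIP: second pass, fix the binning"

git log --oneline   # two entries, newest first
```

The point of the sketch is that the history itself is the record; cleanup can happen later, in the open, where the iteration is part of the story.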

What Discoverability Problem?

I think one of the biggest home runs we get out of an event like Codefest or the MSL Sprint is a critical mass of interaction. By putting all that knowledge together with low barriers to exchange, the discoverability problem that is one of open science’s biggest bugbears is completely blown away. I saw this over and over at Codefest this week: from advising Lauren Hallett and Andy Lyons on contributing to a parser for extracting seed data from Kew Gardens, to hovering around the edges of Ben Best’s conversation on GIS in R and Python, to sitting with Ralph and Matthew at the Sprint to imagine Trillian, the sheer volume of ‘hey, have you heard of X’ creates a shared pool of knowledge that leaves bad discoverability nowhere to hide. And when a collection of appealing tools is on the table, the next thing to get crushed is bad interoperability; if an opportunity appears to solve a major problem with a tool that already exists and is available for free, the incentive to make everything play nice together is huge.

So how can we copy/paste this critical mass of interaction?  Stack Overflow has done it for mainstream software development.  Can we build a similar distributed community for the sciences? I think so – that’s the community I’ve been talking about building lately, and that’s the prize on the horizon; the end of the discoverability problem, natural pressure towards interoperability, and a clearer onramp for new participants.  As always, we hope you’ll join us to make it so.

When the event was wrapping up, an excellent point was made about what NCEAS was trying to do with their first ever Codefest: NCEAS hosted an unconference led by participants because no institute or working group can or should hand down a formulation of open science to the world at large; like all cultures, research has to be led by you, the community; and if we want to build something for researchers, then it has to be built by researchers. The Mozilla Science Lab couldn’t agree more; we’re looking forward to many more events that do just that.

3 responses

  1. Ted Hart wrote:

    “GitHub isn’t my publication journal; it’s my lab book.” I wish I could believe in this sentiment. However, I’ve applied for jobs where they look at your GitHub account (one I got, one I didn’t). I agree that in theory we should have no worries about what we post on GitHub, but as much as I wish I could say we live in a judgement-free world, I’m too cynical about human nature. Colleagues and potential employers will judge your code and think more or less of you for it, and that will have impacts on your professional life. I just think that’s an unfortunate reality of posting things online. (Full disclaimer: I haven’t looked at any of your code, Bill, and am passing no judgement on it :) )

    1. Bill Mills wrote:

      I would hope that an employer savvy enough to look at your GitHub content also knows they should be looking at tagged releases (or at least master) if they want to see what a prospective employee is actually going to ship, and not some wacky development branch. But, I admit that me hoping they are that savvy does not make them that savvy :)

      I’d like to understand just what employers are looking for in such reviews; my instinct is that it must be relatively superficial (what languages does this person write in, how often are they pushing code, are their projects forked by anyone, are there any tests *at all*), since in-depth code review ‘from orbit’ is notoriously difficult, particularly for something as context-dependent as scientific code. Concerns over ‘what will prospective employers think’ raise a really important point – I think we could relieve a lot of anxiety by getting a clearer picture of what meter sticks scientific and academic employers are using in their hiring practices.

      1. Ted Hart wrote:

        I totally agree. I have no idea what employers look for, but I suspect you’re right. The flip side is that I’ve pointed people to specific repos I’m proud of, and I think that can help you get a job. And perhaps the anxiety would inspire better habits, too (I definitely have way more repos w/o tests than w/ tests). Thanks for a great post, Bill, and a thoughtful reply.