Q&A: The Moore/Sloan Data Science Environment Project

During last week’s Mozilla Science Lab community meeting, Ed Lazowska from the University of Washington and the eScience Institute talked about the new Moore/Sloan Data Science Environments project. It was a great opportunity for the community to ask questions about the rationale for the project and how it will be implemented. (For more on the project, check out the recent New York Times blog post about it.)

You can read the meeting notes in their entirety on our meeting’s Etherpad (different colors represent different community members). We’ve teased out some of the discussion here as well. Note: due to the overwhelming (amazing!) number of participants, attribution can be tricky; we attributed comments below as best we could. Feel free to get in touch if you’d like your name added.

What does ‘success’ look like for this program? In other words, how will you know in one, two, or five years whether it’s working?

“Part of the Moore-Sloan project is about assessment, which includes trying to clarify that question as well as trying to answer it.” —Kyle Cranmer (Note: Kyle is involved with the center at NYU.)

Do the data science centers have an official stance on open access, or is that up to individual participants?

“Open access is a fundamental goal, yes.” —Ed Lazowska

“But there are often reasonable restrictions (e.g., data related to human subjects cannot be made open access).” —Greg Wilson

Do you see the Data Science Centers as playing a role in shifting the academic reward structure to reward software development on par with journal publications?

“Yes, that is part of the scope/goal of the project. There is strong buy-in at the Provost level of each institution.” —Kyle Cranmer

How will the Data Science Centers interact with the “classical” departments/institutes which are organized by application domains?

“The Moore-Sloan effort at NYU will definitely be strongly integrated with the ‘domains’. NYU also has a Center for Data Science, which will also focus on methodology. We anticipate tenure-track hires will often be tied to a department like physics, biology, etc.” —Kyle Cranmer [Ed Lazowska described half-tenure positions that allow researchers to be affiliated with both their home department and the data science center.]

How will this project include non-university data-intensive science? For example, agency data archives? Government labs? Private labs at for-profit institutions/firms?

“It will certainly include things like census data.” —Ed Lazowska

On data science career paths: How are you going to lure graduating data scientists away from industry? I studied bioinformatics, and most of my classmates went to industry for high paychecks and perks.

We need to rely on a reward system that offers a set of “strokes” that a certain type of person finds fulfilling, like being recognized within the institution.

“At NYU we have a ‘Careers’ working group focusing on this question. I think each of the three universities has a similar working group.” —Kyle Cranmer

“UW has half-faculty lines to promote multi-disciplinary/interdisciplinary appointments: half in the discipline science area and half in data science methods areas.” —Ed Lazowska

These positions also might represent a nice bridging position between academia and industry.

Can you comment on the “Novelty Squared” problem: the challenge of doing research which is novel both to the science researcher and the computer scientist?

“In the fullness of time, people with both abilities will be doing the interesting new science. For example, the postdocs from the Sloan Digital Sky Survey with data-intensive experience were in a position to land excellent faculty positions because of that experience.” —Ed Lazowska

“I will push back slightly on this. I personally agree that data science skills are as essential to future success in science (at least my science, astro) as calculus… but I will say that academia still lags way behind in how it appreciates those people. Many of the people working with Andy (Connolly) have non-secure staff positions. Problem’s not solved, that’s all.” —Lucianne Walkowicz

“Thanks. I agree with your sentiment. We remember those who succeed in this adventure; we bury the careers of those who don’t.” —John Cobb

An interesting aspect of these roles could be the potential for slack in the investigator’s schedule, giving them time to discover the value that may lie in the data.

For More Information

The slides used at the White House rollout of the project are available.

Watch the 20-minute video of the presentation by Ed Lazowska, Saul Perlmutter, Yann LeCun, Josh Greenberg (Sloan), and Chris Mentzel (Moore).

And NYU is hiring, as are the other two centers!

Unanswered Questions

(We just didn’t have time to get to these ones in the meeting.)

How much will the center be a clearing house/meeting place for existing work, and how much brand new work will be done?

The Moore website implies that new work will be done:

Build on current academic and industrial efforts to work towards an ecosystem of analytical tools and research practices that is sustainable, reusable, extensible, learnable, easy to translate across research areas and enables researchers to spend more time focusing on their science.

What sort of interactions with commercial companies/startups do you anticipate?

Good question! We couldn’t find an answer to this one — feel free to chime in in the comments.