Teon Brooks on Open-Sourcing Data Science at Mozilla

At many companies, data science challenges stem from the sheer volume, velocity, and variability of information collected from users. In contrast, Mozilla collects only the data that’s truly needed and gives users complete control over what they share. Below, Teon Brooks explains the creative challenges and field-defining opportunities data scientists face when privacy, transparency, and openness inform every decision.

What do you do at Mozilla?

I work on the Data Science team. We’re part of Operations, which allows us to work with a lot of other teams, as well—Emerging Technologies, Marketing, Firefox, anywhere data lives. In part, that means figuring out which questions we can ask to better understand how people are using Mozilla’s products, and then analyzing different metrics to find the answers.

Tell us about your background. What led you here?

I’ve always been interested in technology. I got my first computer in elementary school and immediately took it apart to try to understand how it worked. I didn’t grow up knowing how to program, though—other than a little HTML to modify my Neopets website! Then in college, I started out in math and physics and changed my major probably five times, including to information and library science, pre-med, and speech and hearing science. I’m curious by nature. Eventually, I landed on psychology. I’m fascinated by how people think.

I did my master’s and Ph.D. in cognitive science, and that’s when I learned to program in Python and contributed to my first open source project. It was a mind-blowing experience for me to not only create something that made my own research easier but also to be able to share it and help other people. Then as I was finishing my doctorate, I heard about the Mozilla Foundation’s fellowship for open science. I applied and ended up spending a year and a half working on an open neuroscience project at Stanford, building tools to help scientists share their data with each other. Eventually that led me to the data science role here.

What kinds of projects do you work on?

My focus recently has been on the address bar. One of the first steps for the engineering team was changing from XUL to HTML, because the legacy code made most innovations impossible. The data science team started by running experiments on several metrics to make sure the transition wouldn’t do any harm. Then we rolled out HTML, and now our designers can have a field day—they’re reshaping the Firefox address bar to provide personalization and context for users as they search.

There’s another cool project called Iodide, which essentially lets you do data science in your browser. I’m not part of the team that’s actually building it, but I’m a beta tester and one of their biggest fans. Iodide is a great way to tell a compelling story with data without losing all the richness of the analysis you did to get there. Rather than copying and pasting figures into a Google Doc, you can blend Python, CSS, and JavaScript together and use the browser’s rendering engine to design a report in any way you like. It’s really helping me share my work with the rest of the data science community, because people can see all the nitty-gritty details (the method, the analysis, my conclusions) and then follow up with questions.

How do Mozilla’s values around privacy and open data inform your work?

Being a data scientist here is very different from being one at most companies, in part because Mozilla has very lean data policies. Rather than just collecting everything we can, we’re thoughtful about protecting privacy. We collect only what we actually need to improve the product. With the address bar, for example, we had very little user data; based on a user’s bookmarks or a search engine, we knew how many autocomplete results someone would get for a search, but not what they were (we still don’t know). We also didn’t know how often people were abandoning their searches in the address bar. So we decided on specific metrics to track, such as the time it takes to complete a search or whether the user completed it, so we’d know whether the user experience was improving. Then our proposal went through what’s called data stewardship, a Mozilla program that ensures users know about and can control any changes.

It definitely takes a lot of creativity to design experiments with as little data as possible, but it makes my job much more interesting. And I think this is how all companies should operate. User data shouldn’t be like a faucet that’s always on.
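
As a rough illustration of the kind of metrics described above, and not Mozilla’s actual telemetry schema or pipeline, the Python sketch below summarizes whether searches were completed and how long they took, using records that contain no query text and no results. The `SearchEvent` fields, the branch names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class SearchEvent:
    # Hypothetical record: there is deliberately no field for the
    # query text or for the results that were shown.
    branch: str        # experiment branch, e.g. "control" or "treatment"
    completed: bool    # did the user finish the search?
    duration_ms: int   # time until the search was completed or abandoned


def summarize(events):
    """Return {branch: (completion_rate, median_ms_of_completed_searches)}."""
    by_branch = {}
    for event in events:
        by_branch.setdefault(event.branch, []).append(event)

    summary = {}
    for branch, group in by_branch.items():
        completed_times = [e.duration_ms for e in group if e.completed]
        rate = len(completed_times) / len(group)
        # Median is undefined when no search in the branch was completed.
        typical_ms = median(completed_times) if completed_times else None
        summary[branch] = (rate, typical_ms)
    return summary


# Made-up sample data, for illustration only.
events = [
    SearchEvent("control", True, 2400),
    SearchEvent("control", False, 5100),
    SearchEvent("treatment", True, 1800),
    SearchEvent("treatment", True, 2200),
]
result = summarize(events)
```

Comparing the two tuples per branch is enough to ask whether a change helped or hurt, without ever recording what anyone typed.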

What are you most excited about right now?

The Data Science team is very interested in building up our research wing, and I’ve personally been doing a lot of research lately on knowledge discovery—how people find information and then communicate or share it. I think that’s the common thread in everything I’ve studied, from linguistics to cognitive science and now data science. It’s all about understanding the user and how they think. If we can develop a better understanding of that, we can create better products. I’m pulling from the existing literature and starting new collaborations, both internally with our UX researchers and externally with researchers at universities.

Our team is also really excited about finding new ways to share and provide context for the work we do. In part, that’s for our users. They own their data, and we want to help them understand how we’re using it. But I think we have an opportunity to lead the broader data science community, as well. When you work in the open and let more people contribute, you build better products and the entire ecosystem of the field gets richer. More companies are realizing that now—even Facebook and Google are getting on the open-source bandwagon—but Mozilla has always been open by design. So as the field of data science evolves, I think our team will really have a voice in the discussion.
