It’s your data, we’re just living in it

Let me ask you a question: How often do you think about your Firefox data? I think about your Firefox data every day, like it’s my job. Because it is.  As the head of data science for Firefox, I manage a team of data scientists who contribute to the development of Firefox by shaping the direction of product strategy through the interpretation of the data we collect.  Being a data scientist at Mozilla means that I aim to ensure that Firefox users have meaningful choices when it comes to participating in our data collection efforts, without sacrificing our ability to collect useful, high-quality data that is essential to making smarter product decisions.

To achieve this balance, I’ve been working with colleagues across the organization to simplify and clarify our data collection practices and policies. Our goal is that this will make it easier for you to decide if and when you share data with us.  Recently, you may have seen some updates about planned changes to the data we collect, how we collect it, and how we share the data we collect. These pieces are part of a larger strategy to align our data collection practices with a set of guiding principles that inform how we work with and communicate about data we collect.

The direct impact is that we have made changes to the systems that we use to collect data from Firefox, and we have updated the data collection preferences as a result.  Firefox clients no longer employ two different data collection systems (Firefox Health Report and opt-in Telemetry). Although one was on by default, and the other was opt-in, as a practical matter there was no real difference in the type of data that was being collected by the two different channels in release.  Because of that, we now rely upon a single system called Unified Telemetry that has aspects of both systems combined into a single data collection platform and as a result no longer have separate preferences, as we did for the old systems.

If you are a long-time Firefox user and you previously allowed us to collect FHR data but you refrained from opting into extended telemetry, we will continue to collect the same type of technical and interaction information using Unified Telemetry. We have scaled back all other data collection to either pre-release or in situ opt-in, so you will continue to have choices and control over how Firefox collects your data.

Four Pillars of Our Data Collection Strategy

There are four key areas that we focused on when we decided to adjust our data preferences settings.  For Firefox, it means that any time we collect data, we wanted to ensure that the proposal for data collection met our criteria for:

  • Necessity
  • Transparency
  • Accountability
  • Privacy

Necessity

We don’t collect data “just because we can” or “just because it would be interesting to measure”.  Anyone on the Firefox team who requests data has to be able to answer questions like:

  • Is the data collection necessary for Firefox to function properly? For example, the automatic update check must be sent in order to keep Firefox up to date.
  • Is data collection needed to make a feature of Firefox work well? For example, we need to collect data to make our search suggestion feature work.
  • Is it necessary to take a measurement from Firefox users?  Could we learn what we need from measuring users on a pre-release version of Firefox?
  • Is it necessary to get data from all users, or is it sufficient to collect data from a smaller sample?

Transparency

Transparency at Mozilla means that we publicly share details about what data we collect and ensure that we can answer questions openly about our related decision-making.

Requests for data collection start with a publicly available bug on bugzilla. The general process around requests for new data collection follows this process: people indicate that they would like to collect some data according to some specification, they flag a data steward (an employee who is trained to check that requests have publicly documented their intentions and needs) for review, and only those requests that pass review are implemented.

Most simple requests, like new Telemetry probes or experimental tests, are approved within the context of a single bug.  We check that every simple request includes enough detail that a standard set of questions to determine the necessity and accountability of the proposed measurements.  Here’s an example of a simple request for new telemetry-based data collection.

More complex requests, like those that call for a new data collection mechanism or require changes to the privacy notice, will require more extensive review than a simple request.  Typically, data stewards or requesters themselves will escalate requests to this level of review when it is clear that a simple review is insufficient.  This review can involve some or all of the following:

  • Privacy analysis: Feedback from the mozilla.dev.privacy mailing list and/or privacy experts within and outside of Mozilla to discuss the feature and its privacy impact.
  • Policy compliance review: An assessment from the Mozilla data compliance team to determine if the request matches the Mozilla data compliance policies and documents.
  • Legal review: An assessment from Mozilla’s legal team, which is necessary for any changes to the privacy policies/notices.

Accountability

Our process includes a set of controls that hold us accountable for our data collection. We take the precaution of ensuring that there is a person listed who is responsible for following the approved specification resulting from data review, such as designing and implementing the code as well as analyzing and reporting the data received.  Data stewards check to make sure that basic questions about the intent behind and implementation of the data we collect can be answered, and that the proposed collection is within the boundaries of a given data category type in terms of defaults available.  These controls allow for us to feel more confident about our ability to explain and justify to our users why we have decided to start collecting specific data.

Privacy

We can collect many types of data from your use of Firefox, but we don’t consider them equal. We consider some types of data to be more benign (like what version of Firefox you are using) than others (like the websites you visit). We’ve devised a four-tier system to group data in clear categories from less sensitive to highly sensitive, which you can review here in more detail.   Since we developed this 4-tier approach, we’ve worked to align this language with our Privacy Policy and at the user settings for Privacy in Firefox.   (You can read more about the legal team’s efforts in a post by my colleagues at Legal and Compliance.)

What does this mean for you?

We hope it means a lot and not much at the same time.  At Firefox, we have long worked to respect your privacy, and we hope this new strategy gives you a clearer understanding of what data we collect and why it’s important to us.  We also want to reassure you that we haven’t dramatically changed what we collect by default.  So while you may not often think about the data you share with Mozilla, we hope that when you do, you feel better informed and more in control.