Data Publishing @ Mozilla

Introduction

Mozilla’s history is steeped in openness and transparency; it is core to what we do and how we see ourselves in the world. We are always looking for ways to bring our mission to life that help create a healthy internet and support the Mozilla Manifesto. One of our commitments says, “We are committed to an internet that elevates critical thinking, reasoned argument, shared knowledge, and verifiable facts.”

To this end, we have spent a good amount of time considering how we can publicly share our Mozilla telemetry datasets. Sharing data is one of the simplest and most effective ways we can enable collaboration and share knowledge, but only if it can be done safely and in a privacy-protecting, principled way. We believe we’ve designed a way to do this, and we are excited to outline our approach here.

Making data public not only allows us to be transparent about our data practices, but directly demonstrates how our work contributes to our mission. Having a publicly available methodology for vetting and sharing our data demonstrates our values as a company. It will also enable other research opportunities with trusted scientists, analysts, journalists, and policymakers in a way that furthers our efforts to shape an internet that benefits everyone.

Dataset Publishing Process

We want our data publishing review process, as well as our review decisions, to be public and understandable, similar to our Mozilla Data Collection program. To that end, our full dataset publishing policy, including the considerations we weigh before determining what is safe to publish, can be found on our wiki. Below is a summary of the critical pieces of that process.

The goals of our data publishing process are to:

Reduce friction for data publishing requests with low privacy risk to users;
Have a review system of checks and balances that considers both data aggregations and data-level sensitivities to determine privacy risk prior to publishing; and
Create a public record of these reviews, including making the data and the queries that generate it publicly available and putting a link to the dataset and its metadata on a public-facing Mozilla property.

Having a dataset published requires filling out a publicly available request on Bugzilla. Requesters will answer a series of questions, including information about aggregation levels, data collection categories, and dimensions or metrics that include sensitive data.

A data steward will review the bug request. They will help ensure the questions are answered correctly and determine whether the data can be published or requires review by our Trust & Security or Legal teams.

When a request is approved, our telemetry data engineering team will:

Write (or review) the query
Schedule it to update on the desired frequency
Include it in the public-facing dataset infrastructure, including metadata that links the public data back to the review bug.

Finally, once the dataset is published, we’ll announce it on the Data @ Mozilla blog. It will also be added to https://public-data.telemetry.mozilla.org/
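As an illustration of the metadata step, here is a minimal sketch of what a record linking a public dataset back to its review bug might look like. This is not Mozilla's actual schema: the class name, the field names, the dataset name, the query path, and the bug number are all hypothetical, chosen only to show the idea. The one real element is the standard Bugzilla bug URL format.

```python
import json
from dataclasses import asdict, dataclass

# Standard Bugzilla URL pattern for viewing a single bug.
BUGZILLA_BUG_URL = "https://bugzilla.mozilla.org/show_bug.cgi?id={bug_id}"


@dataclass(frozen=True)
class PublicDatasetMetadata:
    """Hypothetical metadata record for a published dataset."""

    name: str              # hypothetical dataset identifier
    review_bug_id: int     # Bugzilla bug holding the publishing review
    update_frequency: str  # e.g. "daily" or "weekly"
    query_path: str        # hypothetical location of the generating query

    @property
    def review_link(self) -> str:
        """URL that points readers back to the public review record."""
        return BUGZILLA_BUG_URL.format(bug_id=self.review_bug_id)

    def to_json(self) -> str:
        """Serialize the record, including the derived review link."""
        record = asdict(self)
        record["review_link"] = self.review_link
        return json.dumps(record, indent=2, sort_keys=True)


# Placeholder values only; no real dataset or bug is referenced here.
example = PublicDatasetMetadata(
    name="example_dataset",
    review_bug_id=1234567,
    update_frequency="daily",
    query_path="queries/example_dataset.sql",
)
print(example.to_json())
```

Keeping the review link derived from the bug number, rather than stored separately, means the public data and its review record cannot drift apart.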

Want to know more?

Questions? Contact us at publicdata@mozilla.com
