Mozilla Opens Access to Dataset on Network Outages
The internet doesn’t just have a simple on/off switch — rather, there are endless ways connectivity can be ruptured or impaired, both intentionally (cyber attacks) and unintentionally (weather events). While a difficult task, knowing more about how connectivity is affected and where can help us better understand the outages of today, as well as who (or what) is behind them to prevent them in the future.
Today, Mozilla is opening access to an anonymous telemetry dataset that will enable researchers to explore signals of network outages around the world. The aim of the release is to create more transparency around outages, a key step towards achieving accountability for a more open and resilient internet for all. We believe this data, which is anonymized and aggregated to ensure user privacy, will be valuable to a range of actors, from technical communities working on network resilience to digital rights advocates documenting internet outages.
While a number of outage measurements rely on hardware installations or require people experiencing outages to initiate their own measurements, Mozilla’s data originates from everyday use of Firefox browsers around the world, essentially creating a timeline of both regular and irregular connectivity patterns across large populations of internet users. In practice, this means that when significant numbers of Firefox clients experience connection failures for any reason, this registers in Mozilla’s telemetry once a connection is restored. At a country or city level, this can provide indications of whether an outage occurred.
In addition to being able to see city-specific outages, Mozilla’s dataset also offers a comparatively high degree of technical granularity which allows researchers to isolate different types of connectivity issues in a given time frame. Because outages are often shrouded in secrecy, researchers can sometimes only estimate the exact nature of a local outage. Combined with other data sources, for instance from companies like Google and Cloudflare, Mozilla’s dataset will be a valuable source to corroborate reports of outages.
Whenever internet connections are cut, the safety, security and health of millions of people may be at stake. Documenting outages is an important step in seeking transparency and accountability, particularly in contexts of uncertainty or insecurity around recent events.
“Mozilla is excited to make our relevant telemetry data available to researchers around the world to aid efforts toward transparency and accountability. Internet outages can be hard to measure and it is very fortunate that there is a dedicated international community that is focused on this crucial task. We look forward to interesting ways in which the community will use this anonymous dataset to help keep the internet an open, global public resource,” says Daniel McKinley, VP, Data Science and Analytics at Mozilla.
Over the course of 2020 and 2021, researchers from Internet Outage Detection and Analysis (IODA) of the Center for Applied Internet Data Analysis (CAIDA), Open Observatory of Network Interference (OONI), RIPE Network Coordination Center (RIPE NCC), Measurement Lab (M-Lab), Internews and Access Now joined a collaborative effort to compare existing data on outages with Mozilla’s dataset. Their feedback has uniformly stated that this data would be helpful to the internet outage measurement community in critical work across the world.
“We are thrilled that Mozilla’s dataset on outages is being published. Our own analysis of the data demonstrated that it is a valuable resource for investigating Internet outages worldwide, complimenting other public datasets. Unlike other datasets, it provides geographical granularity with novel insights and new research opportunities. We are confident that it will serve as an extremely valuable resource for researchers, human rights advocates, and the broader Internet freedom community,” says Maria Xynou, the Research and Partnerships Director of OONI.
In order to gain access to the dataset, which is licensed under the Creative Common Public Domain license (CC0) and contains data from January 2020 onward, researchers can apply via this Google Form, after which Mozilla representatives will reach out with next steps. More information and background on the project and the dataset can be found on Mozilla Wiki.
We look forward to seeing the exciting work that internet outage researchers will produce with this dataset and hope to inspire more use of aggregated datasets for public good.
This post was co-authored by Solana Larsen, Alessio Placitelli, Udbhav Tiwari.