I am a data engineer at Mozilla, where my colleagues and I study how internet connectivity changes over time and across regions. Like inclement weather, network outages are simply a fact of life: equipment that powers the internet can fail for numerous reasons in any country. As reports of internet shutdowns and throttling by governments in different parts of the world show, outages can sometimes also be intentional. Mozilla measures outages and connection issues through a series of different metrics, including telemetry upload failures.
Today, we are releasing an aggregate open dataset (italy_covid19_outage) to show what one example of an outage looks like.
In Italy, for several hours on March 11, 2020 – just days after a nationwide COVID-19 lockdown was declared – life on the internet came to a partial halt for a portion of the country. From news reports (1, 2) we know that one of the biggest Italian internet service providers, TIM, experienced an internet outage. Many customers of the ISP were unable to use a number of applications requiring network connectivity, and many websites were unreachable.
As someone living in a small town in Italy, I experienced firsthand the anxiety of losing my connection to the internet in the middle of the pandemic. When out stocking up at the pharmacy a few days after the lockdown was declared, I tried to reach out to my wife over the phone. She wasn’t picking up, so I messaged her over WhatsApp. That didn’t work either. I tried to video call, and that failed. Because my wife’s phone was connected to our WiFi router at home, and due to the technical details of our setup, the outage brought down most of the connectivity on her phone without her noticing.
So what happened in Italy, exactly? TIM released a statement to the press mentioning a failure on a “foreign network” that left some websites and apps hosted abroad unavailable. Because of that failure, people like me experienced congestion on the internet, as though suddenly hitting traffic on a highway that is normally clear. In my case, it wasn’t that WhatsApp was down. It was that the VPN at my house stopped working properly due to connectivity issues, leaving me feeling isolated at the pharmacy during a tense moment.
In a crisis like this one, any internet outage can be truly unsettling. It adds an edge to the anxiety we may already be experiencing. As the COVID-19 pandemic is forcing entire countries to lock down and practice social distancing, the internet offers people around the world the comfort to stay connected. At Mozilla, we have witnessed this firsthand: our telemetry from the Firefox browser shows how more people are spending far more time online.
Mozilla can measure outages and connection issues through telemetry upload failures: we send a small piece of telemetry called the “health ping” that reports errors in uploading our normal telemetry data but does not identify the users who contributed this information. You can think of this small amount of data as a motorcycle on a congested highway, weaving between cars stuck in traffic. If internet congestion keeps WhatsApp or Facebook from working properly, for instance, the health pings sent to our servers often still get through, giving us information about whether the browser has connectivity issues.
In Italy on March 11, we can see clear signals that Firefox users experienced an increase in network timeouts. A timeout means that a device sent a request to a server and waited longer than the browser allows without getting a response.
This is how it looked on our systems:
The series in the graph represent two reasons why network connections to Mozilla’s telemetry server failed (see detailed description below). For example, on March 11, we saw a sharp increase in reported “timeouts” (up to ~6%), and terminated connections (up to ~11%). As it happens, this wasn’t the only date with disruptions in Italy. We detected two other spikes in terminated connections, one in January and one in February, which is consistent with reports from the crowdsourced site Downdetector.it.
The outage data
The data we are releasing today includes aggregated Firefox Desktop data for Italy from the “health” ping and some fields of the “main” ping that were created between January 1, 2020 and March 31, 2020. To make all of this possible while respecting the privacy of our users, all of our metrics go through an extensive public data-review process before collection. Moreover, the different types of failures are aggregated by day. These counts are then normalized by the total number of active daily users, which indicates how large a share of Firefox desktop clients a network problem is affecting. To make the aggregates safer still, days with fewer than 5,000 samples are discarded and not reported.
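To make the aggregation steps above more concrete, here is a minimal Python sketch of the idea: sum failure counts per day and failure type, divide by that day's active users, and drop days below the 5,000 threshold. This is not the query we ran. The function name, the input shapes, and the failure labels are hypothetical, and the sketch assumes that the daily active-user count is what the threshold is applied to.

```python
from collections import defaultdict

# Threshold mentioned in the article; what counts as a "sample" is assumed
# here to be the number of active users on that day.
MIN_SAMPLES_PER_DAY = 5_000


def daily_failure_proportions(records, active_users_by_day,
                              min_samples=MIN_SAMPLES_PER_DAY):
    """Aggregate failure counts by day and normalize by daily active users.

    records: iterable of (day, failure_type, count) tuples (hypothetical shape).
    active_users_by_day: dict mapping day -> number of active users that day.
    Returns a dict mapping day -> {failure_type: proportion of active users}.
    """
    # Step 1: aggregate the different failure types by day.
    counts = defaultdict(lambda: defaultdict(int))
    for day, failure_type, count in records:
        counts[day][failure_type] += count

    proportions = {}
    for day, by_type in counts.items():
        active = active_users_by_day.get(day, 0)
        # Step 2: discard days that do not reach the minimum sample size.
        if active < min_samples:
            continue
        # Step 3: normalize each count by the number of active users.
        proportions[day] = {
            failure_type: count / active
            for failure_type, count in by_type.items()
        }
    return proportions


# Made-up illustrative input, not real telemetry.
example_records = [
    ("2020-03-10", "timeout", 100),
    ("2020-03-10", "timeout", 150),
    ("2020-03-10", "abort", 400),
    ("2020-03-11", "timeout", 30),
]
example_active_users = {"2020-03-10": 10_000, "2020-03-11": 1_200}

example_result = daily_failure_proportions(example_records, example_active_users)
```

In this made-up input, the second day has fewer than 5,000 active users, so it is left out of the result entirely rather than reported with a noisy proportion.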
This article was written in collaboration with Saptarshi Guha, Solana Larsen, Jochai Ben-Avie and Hamilton Ulmer. Thanks to Chris Hutten-Czapski, Jesse McCrosky, Jeff Klukas, Mark Reid and others for reviewing the initial investigation queries, to Anna Scholtz for the help with the dataset generation, and to Andrea Marchesini for help navigating the Firefox network code and explaining the behaviour of each failure type.