Something happened on January 5, 2023. All of a sudden we abruptly started receiving a number of pings from Firefox Desktop clients in Korea equal to two times the size of the entire Korean Firefox Desktop population.
What happened? How did we notice it? What did we do about it?
Let’s back up.
I can’t remember where I learned it, but I’d already started reciting as dogma in my first year of University: “The most important part about any feature is the ability to turn it off”. It’s served me well through my studies and my career. I’ve also found it to be especially true for data collection systems where, for whatever reason, as a user you might decide you no longer want the software you’re using to continue to send data. In some places this is even enshrined in laws where you can request the deletion of data that has already been collected.
Law or not, Mozilla has before, does now, and will always make it easy for you to decide whether to send data to Mozilla. We may not understand why you make that choice, and it definitely will make it harder for us to ensure our products meet your needs, but we’ll respect the heck out of your choice in our processes and in our products.
This is why, when Mozilla’s data collection system Glean is told the user went from allowing data upload to forbidding it, we send one final “deletion-request” ping before shutting down. The “deletion-request” ping contains all the internal identifiers we’ve used to longitudinally group data (if we receive ten crash reports it’s important to know whether it’s the same Firefox crashing ten times or if it’s ten Firefoxes crashing once), and we use those identifiers to (well) identify what data we’ve collected that we’re now going to delete.
For the purposes of this story you’ll need to know that there’s two times when Glean notices the product’s gone from “data upload: on” to “data upload: off”: while Glean is running, and during Glean startup. If Glean’s running, then we just handle things – we were told the setting changed from “data upload: on” to “data upload: off” and away we go. But Glean knows that it isn’t always listening to the data upload setting, so if it it starts up with “data upload: off” and the last time it shut down we were “data upload: on” we’ll send a specific “at_init”-reason “deletion-request” ping.
We in the Data Org monitor how Glean is behaving. One thing we’ve learned about how Glean behaves is that the number of “deletion-request” pings is roughly constant over time. And the proportion of “deletion-request” pings that have the “at_init” reason should remain a fairly fixed one.
What shouldn’t happen is for Firefox Desktop-sent “at_init”-reason “deletion-request” pings to spike like this on January 5:
What we do when we notice things like this is file a bug. As the one responsible for Glean’s integration in Firefox Desktop, and as someone with a long history of looking into anomalies, I took a look. At this initial point I was pretty sure it’d be a single actor (a single user, a single company, a single internet cafe) doing something odd… but alas, the evidence was inconclusive:
Evidence consistent with a single actor being responsible for it all:
- All the pings were coming from the same internet provider. Korea Telecom is responsible for a bare majority of Firefox Desktop data delivery from Korea, but the spikes were entirely from that ISP.
- The Mozilla Community in Korea could offer no explanation of any wide-spread computer or software event that matched the timeline.
- “at_init”-reason “deletion-request” pings could be a result of automation changing the files on disk to read “data upload: off” between runs of Firefox Desktop.
Evidence inconsistent with a single actor being responsible for it all:
- The data came from a mix of Firefox Desktop versions: versions 101.0.1, 104.0, and 108.0.2.
- The data came from a range of different regions, more or less following the population density of Korea itself.
- “at_init”-reason “deletion-request” pings could instead be the result of users changing the setting to “data upload: off” early enough during Firefox Desktop startup that Glean hasn’t yet been initialized.
Regardless of why it was happening, it quickly became more important that we learn what we needed to do about it. We spun up an Incident, which is how we organize ourselves when there’s something happening that requires cross-functional collaboration and isn’t getting better on its own. Once there we ascertained that we could respond very quickly and decisively and do
Nothing at all.
The volume of these pings vastly eclipsed any other “deletion-request” pings we would otherwise have received, so you’d be forgiven for thinking that it was terribly expensive to receive, store, and process them all. In reality, we batch these requests. And even before this spike, every batch of requests required editing every partition of every table. Adding another list of identifiers to delete equal in size to two times the peak Firefox Desktop population in Korea just doesn’t matter all that much.
The pressure was off. Even if it got worse… which it did:
On March 26, when it reached and maintained a peak of five times the volume of the Firefox Desktop population in Korea, it still wasn’t harming our data platform’s ability to serve business needs or costing us all that much in operational spend. We didn’t need to invest effort into running down the source, so we didn’t.
And so I just kept an occasional eye on it until, just as suddenly but not quite as abruptly as it began, on April 12 the ping volumes began to decrease. By April 18, we were back to normal levels.
We had successfully ignored it until it went away.
So what happened to Korean Firefox Desktop users from Jan 5 to April 12, 2023? We never figured it out. If you know about something happening across those dates in Korea: please get in touch. As little as it needed solving for the sake of business needs, it still needs solving for the sake of my curiosity.
(( This is a syndicated copy of the original post. ))