Two Days, or How Long Until the Data is In

Two days.

It doesn’t seem like long, but that is how long you need to wait before looking at a day’s Firefox data and being sure than 95% of it has been received.

There are some caveats, of course. This only applies to current versions of Firefox (55 and later). This will very occasionally be wrong (like, say, immediately after Labour Day when people finally get around to waking up their computers that have been sleeping for quite some time). And if you have a special case (like trying to count nearly everything instead of just 95% of it) you might want to wait a bit longer.

But for most cases: Two Days.

As part of my 2017 Q3 Deliverables I looked into how long it takes clients to send their anonymous usage statistics to us using Telemetry. This was a culmination of earlier ponderings on client delay, previous work in establishing Telemetry client health, and an eighteen-month (or more!) push to actually look at our data from a data perspective (meta-data).

This led to a meeting in San Francisco where :mreid, :kparlante, :frank, :gfritzsche, and I settled upon a list of metrics that we ought to measure to determine how healthy our Telemetry system is.

Number one on that list: latency.

It turns out there’s a delay between a user doing something (opening a tab, for instance) and them sending that information to us. This is client delay and is broken into two smaller pieces: recording delay (how long from when the user does something until when we’ve put it in a ping for transport), and submission delay (how long it takes that ready-for-transport ping to get to Mozilla).

If you want to know how many tabs were opened on Tuesday, September the 5th, 2017, you couldn’t tell on the day itself. All the tabs users open late at night won’t even be in pings, and anyone who puts their computer to sleep won’t send their pings until they wake their computer in the morning of the 6th.

This is where “Two Days” comes in: On Thursday the 7th you can be reasonably sure that we have received 95% of all pings containing data from the 5th. In fact, by the 7th, you should even have that data in some scheduled datasets like main_summary.

How do we know this? We measured it:

Screenshot-2017-9-12 Client "main" Ping Delay for Latest Version(1).png(Remember what I said about Labour Day? That’s the exceptional case on beta 56)

Most data, most days, comes in within a single day. Add a day to get it into your favourite dataset, and there you have it: Two Days.

Why is this such a big deal? Currently the only information circulating in Mozilla about how long you need to wait for data is received wisdom from a pre-Firefox-55 (pre-pingsender) world. Some teams wait up to ten full days (!!) before trusting that the data they see is complete enough to make decisions about.

This slows Mozilla down. If we are making decisions on data, our data needs to be fast and reliably so.

It just so happens that, since Firefox 55, it has been.

Now comes the hard part: communicating that it has changed and changing those long-held rules of thumb and idées fixes to adhere to our new, speedy reality.

Which brings us to this blog post. Consider this your notice that we have looked into the latency of Telemetry Data and is looks pretty darn quick these days. If you want to know about what happened on a particular day, you don’t need to wait for ten days any more.

Just Two Days. Then you can have your answers.

:chutten

(Much thanks to :gsvelto and :Dexter’s work on pingsender and using it for shutdown pings, :Dexter’s analyses on ping delay that first showed these amazing improvements, and everyone in the data teams for keeping the data flowing while I poked at SQL and rearranged words in documents.)

(This is a cross-post from chuttenblog. I have quite a few posts on there that you might like, including a series on Windows XP in Firefox, a bunch of Satisfying Graphs, and Reasons Why Data Science Is Hard)

Recording new Telemetry from add-ons

One of the successes for Firefox Telemetry has been the introduction of standardized data types; histograms and scalars.

They are well defined and allow teams to autonomously add new instrumentation. As they are listed in machine-readable files, our data pipeline can support them automatically and new probes just start showing up in different tools. A definition like this enables views like this:

The distribution view for the max_concurrent_tabs scalar on the TMO dashboard.

The distribution view for the max_concurrent_tabs scalar on the TMO dashboard.

This works great when shipping probes in the Firefox core code, going through our normal release and testing channels, which takes a few weeks.

Going faster

However, often we want to ship code faster using add-ons: this may mean running experiments through Test Pilot and SHIELD or deploying Firefox features through system add-ons.

When adding new instrumentation in add-ons, there are two options:

  • Instrumenting the code in Firefox core code, then waiting a few weeks until it is in release.
  • Implementing a custom ping and submitting it through Telemetry, requiring additional client and pipeline work.

Neither are satisfactory; there is significant manual effort for running simple experiments and adding features.

Filling the gap

This is one of the main pain-points coming up for adding new data collection, so over the last months we were planning how to solve this.

As the scope of an end-to-end solution is rather large, we are currently focused on getting the support built into Firefox first. This can enable some use-cases right away. We can then later add better and automated integration in our data pipeline and tooling.

The basic idea is to use the existing Telemetry APIs and seamlessly allow them to record data from new probes as well. To enable this, we will extend the API with registration of new probes from add-ons at runtime.

The recorded data will be submitted with the main ping, but in a separate bucket to tell them apart.

What we have now

We now support add-on registration of events from Firefox 56 on. We expect event recording to mostly be used with experiments, so it made sense to start here.

With this new addition, events can be registered at runtime by Mozilla add-ons instead of using a registry file like Events.yaml.

When starting, add-ons call nsITelemetry.registerEvents() with information on the events they want to record:

Services.telemetry.registerEvents(“myAddon.ui”, {
  “click”: {
    methods: [“click”],
    objects: [“redButton”, “blueButton”],
  }
});

Now, events can be recorded using the normal Telemetry API:

Services.telemetry.recordEvent(“myAddon.ui”, “click”,
                               “redButton”);

This event will be submitted with the next main ping in the “dynamic” process section. We can inspect them through about:telemetry:

The event view in about:telemetry, showing that an event ["myAddon.ui", "click", "redButton"] was successfully recorded with a timestamp.

The event view in about:telemetry.

On the pipeline side, the events are available in the events table in Redash. Custom analysis can access them in the main pings under payload/processes/dynamic/events.

The larger plan

As mentioned, this is the first step of a larger project that consists of multiple high-level pieces. Not all of them are feasible in the short-term, so we intend to work towards them iteratively.

The main driving goals here are:

  1. Make it easy to submit new Telemetry probes from Mozilla add-ons.
  2. New Telemetry probes from add-ons are easily accessible, with minimal manual work.
  3. Uphold our standards for data quality and data review.
  4. Add-on probes should be discoverable from one central place.

This larger project then breaks down into roughly these main pieces:

Phase 1: Client work.

This is currently happening in Q3 & Q4 2017. We are focusing on adding & extending Firefox Telemetry APIs to register & record new probes.

Events are supported in Firefox 56, scalars will follow in 57 or 58, then histograms on a later train. The add-on probe data is sent out with the main ping.

Phase 2: Add-on tooling work.

To enable pipeline automation and data documentation, we want to define a variant of the standard registry formats (like Scalars.yaml). By providing utilities we can make it easier for add-on authors to integrate them.

Phase 3: Pipeline work.

We want to pull the probe registry information from add-ons together in one place, then make it available publically. This will enable automation of data jobs, data discovery and other use-cases. From there we can work on integrating this data into our main datasets and tools.

The later phases are not set in stone yet, so please reach out if you see gaps or overlap with other projects.

Questions?

As always, if you want to reach out or have questions:

Firefox data platform & tools update, Q2 2017

The data platform and tools teams are working on our core Telemetry system, the data pipeline, providing core datasets and maintaining some central data viewing tools.

To make new work more visible, we provide quarterly updates.

What’s new in the last few months?

Beta “main” ping submission delay analysis by :chutten, showing a clear and significant downwards trend..

Beta “main” ping submission delay analysis by :chutten.

A lot of work in the last months was on reducing latency, supporting experimentation and providing a more reliable experience of the data platform.

On the data collection side, we have significantly improved reporting latency from Firefox 55, with preliminary results from Beta showing we receive 95% of the “main” ping within 8 hours (compared to previously over 90 hours). Curious for more detail? #1 and #2 should have you covered.

We also added a “new-profile” ping, which gives a clear and timely signal for new clients.

There is a new API to record active experiments in Firefox. This allows annotating experiments or interesting populations in a standard way.

The record_in_processes field is now required for all histograms. This removes ambiguity about which process they are recorded in.

The data documentation moved to a new home: docs.telemetry.mozilla.org. Are there gaps in the documentation you want to see filled? Let us know by filing a bug.

For datasets, we added telemetry_new_profile_parquet, which makes the data from the “new-profile” ping available.

Additionally, the main_summary dataset now includes all scalars and uses a whitelist for histograms, making it easy to add them. Important fields like active_ticks and Quantum release criteria were also added and backfilled.

For custom analysis on ATMO, cluster lifetimes can now be extended self-serve in the UI. The stability of scheduled job stability also saw major improvements.

There were first steps towards supporting Zeppelin notebooks better; they can now be rendered as Markdown in Python.

The data tools work is focused on making our data available in a more accessible way. Here, our main tool Redash saw multiple improvements.

Large queries should no longer show the slow script dialog and scheduled queries can now have an expiration date. Finally, a new Athena data source was introduced, which contains a subset of our Telemetry-based derived datasets. This brings huge performance and stability improvements over Presto.

What is up next?

For the next few months, interesting projects in the pipeline include:

  • The experiments viewer & pipeline, which will make it much easier to run pref-flipping experiments in Firefox.
  • Recording new probes from add-ons into the main ping (events, scalars, histograms).
  • We are working on defining and monitoring basic guarantees for the Telemetry client data (like reporting latency ranges).
  • A re-design of about:telemetry is currently on-going, with more improvements on the way.
  • A first version of Mission Control will be available, a tool for more real-time release monitoring.
  • Analyzing the results of the Telemetry survey, (thanks everyone!) to inform our planning.
  • Extending the main_summary dataset to include all histograms.
  • Adding a pre-release longitudinal dataset, which will include all measures on those channels.
  • Looking into additional options to decrease the Firefox data reporting latency.

How to contact us.

Please reach out to us with any questions or concerns.

Measuring Search in Firefox

Today we are launching a new search data collection initiative in Firefox. This data will allow us to greatly improve the Firefox search experience while still respecting user privacy.

Search is both a fundamental method for navigating the Web and how Mozilla makes much of its revenue. Our research shows users have complicated search workflows. We know from internal user research studies that users often start a search from places like the Awesome Bar or search bar and then continue to refine their search on the search engine results page. We call these additional searches follow-on searches.

Firefox telemetry already includes a count of the searches users perform in all Firefox search bars. Firefox does not yet count follow-on searches. This is a real challenge for Mozilla, because we don’t understand how well the Firefox search experience works for our users.

A new experiment launching today will measure follow-on searches. When you search with one of the search engines that we include in Firefox, we will increment a counter for each follow-on search. Our telemetry system will count follow-on searches the same way we already count direct searches from our search bars. We won’t collect search queries (the words you type into the search box) nor any other Web browsing activity.

We will roll out the new experiment to a random sample of 10% of Firefox release users. If successful, we will extend these follow-on search measurements to our entire release population as a part of our normal telemetry system.

We seek these new measurements to gain missing insight into a crucial browser interaction. These new measurements are consistent with our data collection principles. Data helps us decide where to apply our limited resources to improve Firefox, while also safeguarding user privacy.  Mozilla will continue to provide public documentation and user controls for all telemetry collected within Firefox. With better insight into search behavior, we can improve Firefox and continue to sustain Mozilla’s mission.

Javaun Moradi,  Sr. Product Manager, Firefox Search