Telemetry and What It Is Good for: Part 1: Nuts and Bolts

Telemetry has been in production for about year. However, it turns out that many Mozillians do not know what it is good for. I presented about Telemetry at FOSDEM 2012, but have not had a chance to reach out to the core Mozilla developers because we haven’t had a Mozilla All-Hands since Telemetry got useful.

Why Would One Use Telemetry?

Telemetry exists for a single purpose: matching developer expectations with real Firefox behavior. My experience working on startup lead me to believe that is unreasonably complex to try to model real-world behavior in a lab setting and that it was actually easier to just measure real world behavior.

Anything that varies with IO, system configuration, user input, user workloads is easier to measure with Telemetry than to develop a useful finite benchmark for.

Nuts & Bolts

Telemetry consists of two parts: client-side collection code + serverside frontend.

Client-side Telemetry currently records:

  • simple measures: discrete numbers such as amount of ram, various startup times, flash version, etc
  • histograms: efficient one-dimensional means of gathering a range of values such as memory usage, cycle collection times, types of events occurring, etc. These are all specified in TelemetryHistograms.h. You can view your local histograms by enabling telemetry and installing about:telemetry.
  • slow sql statements: We record SQL statements that take over 100ms and whether they occur on main thread to prioritize Snappy SQL work.
  • chromehangs: Nightly builds ship with frame-pointers so we can detect when Firefox pauses for over 5 seconds. Every time Firefox pauses, we record the backtrace. We started sending those a month ago, processing them on the serverside is a work-in-progress. These should be very handy for prioritizing work on making Firefox more responsive

One current limitation is that histograms are on-dimensional, there is no way to relate cycle collection times to uptime, memory usage, etc. We also go to great lengths to avoid collecting any personal identifiers. As a result we have no user UIDs and no ability to track how a user’s performance changes over time.

Telemetry Frontend is a public dashboard that can be seen at arewesnappyyet.com. Anyone can get a BrowserID login and look at our browser stats. Telemetry dashboard consists of two views:

  • Telemetry Histograms: this is basically the same data as displayed in about:telemetry, but aggregated from our userbase. This was our original view and is likely to get folded into evolution in the future.
  • Telemetry Evolution: This view tracks how medians/percentiles gathered by histograms change over time. This is the view that most developers use.

Telemetry is not a technology unique to Firefox. I borrowed a lot of code from the Chromium implementation to get caught up. Microsoft also collects similar metrics.

There are two differences between us and other browser vendors:

  1. We do not assign a unique id to every user. This sucks from a developer perspective as it makes it a lot harder to track performance over time, but we believe the privacy benefits are worth it.
  2. We made our dashboards public because we would like to have our community actively involved in helping us track Firefox performance.

In part 2 I’ll discuss how various people at Mozilla use Telemetry.

Comments are closed.