Telemetry and What It Is Good for: Part 2: Telemetry Achievements

An inquisitive mind sent me an email with a pointed question:

“Is there an example of someone who’s not you that had a burning question that would drive some sort of research or development activity and got it answered by telemetry?”

I forwarded his email and got a pretty fun survey. See below for a slightly edited version of emails I got. In addition to positive experiences below, people had a lot of complains about the telemetry experience.

Justin Lebar
Justin is probably the most vocal telemetry user who isn’t me. He blogged about one of his more successful telemetry experiences.
Andrew McCreight
One telemetry stat I added is CYCLE_COLLECTOR_NEED_GC.  Sometimes a read barrier fails and we need to do a GC synchronously at the start of a CC, which is terrible for pause times.  Using telemetry, I confirmed my suspicion that this is very rare, and thus not worth trying to improve.
Another state Olli added is FORGET_SKIPPABLE_MAX, which tracks the length of the CC cleanup phases we run.  As we made the cleanup more and more thorough, the times of these got longer and longer.  I think eventually this led Olli to try to fix the worst case cleanup phases, in bug 747675.  He had this comment in there: “Based on the initial telemetry data, the patch doesn’t affect too much to the already low median times, but helps significantly with the worst 5%, so mean time decreases quite nicely.”
Also, back around Firefox 13, Olli was using telemetry to observe the results of various CC optimizations, to assess their effectiveness, which he then used to decide whether or not to nominate various patches for landing on Aurora 12.  Telemetry let us see the reward part of the risk vs. reward tradeoff, and get some pretty big improvements into 12.
Locally, I use about:telemetry to get a sense of what the behavior on my local machine has been, but I suppose that doesn’t really fall under “telemetry” per se.  But it was quite useful during the Cycle Collector Crisis to see what CC behavior people had been seeing on their machines.

Olli Pettay
Thanks Andrew, very accurate summary of what I’ve been doing
I tend to look at CC telemetry data daily. I very rarely use the histogram, since evolution is more interesting to me. Especially median time and also how P75 and P95 evolve. (The focus in this Q is to get lower bad times, so we should manage to drop P75 and P95)
I use also about:telemetry locally since I tend to run builds with some patch, and I want to see if they affect badly to CC or GC times.

Taras
My blog is basically a collection of telemetry trivia :)
I do not have testimonials from other people. However I heard that the silent-update team proved something about silent updates with telemetry, Necko team discovered that some optimizations were not, etc.
I encourage other people who solved a problem with telemetry to either wrote a blog post or leave a comment.

In part 3 I will cover flaws in the current Telemetry experience.

4 comments

  1. Just tried logging in to the dashboard for the first time and got stuck in some kind of redirect loop for a while. But that is probably some kind of browserid problem. It was also the first time I used browserid.

    But when I got in, the long flat list of measures seemed a bit overwhelming. Therefor, can telemetry currently tell which fraction of users have hardware acceleration enabled? (in the context of can you expect animation of css-transformations to be fast). It seemed that all users have canvas_webgl_used=1, but that can’t mean that it works for everybody.

  2. Yes, that broke as I was writing up my blog post ;(

    If you click on the telemetry histogram links, those should take you to the dashboard successfully.

  3. The security engineering team is also hoping to use telemetry around SSL to better understand what errors our users see and how often – as well as how often they click through the SSL warning.

  4. Do you have any thoughts on the hardware acceleration question in my second paragraph? (i.e. can telemetry currently answer what part of the users are running with hardware accelerated css and webgl?)