An inquisitive mind sent me an email with a pointed question:
“Is there an example of someone who’s not you that had a burning question that would drive some sort of research or development activity and got it answered by telemetry?”
I forwarded his email and got a pretty fun survey. See below for a slightly edited version of emails I got. In addition to positive experiences below, people had a lot of complains about the telemetry experience.
Justin is probably the most vocal telemetry user who isn’t me. He blogged about one of his more successful telemetry experiences.
One telemetry stat I added is CYCLE_COLLECTOR_NEED_GC. Sometimes a read barrier fails and we need to do a GC synchronously at the start of a CC, which is terrible for pause times. Using telemetry, I confirmed my suspicion that this is very rare, and thus not worth trying to improve.
Another state Olli added is FORGET_SKIPPABLE_MAX, which tracks the length of the CC cleanup phases we run. As we made the cleanup more and more thorough, the times of these got longer and longer. I think eventually this led Olli to try to fix the worst case cleanup phases, in bug 747675. He had this comment in there: “Based on the initial telemetry data, the patch doesn’t affect too much to the already low median times, but helps significantly with the worst 5%, so mean time decreases quite nicely.”
Also, back around Firefox 13, Olli was using telemetry to observe the results of various CC optimizations, to assess their effectiveness, which he then used to decide whether or not to nominate various patches for landing on Aurora 12. Telemetry let us see the reward part of the risk vs. reward tradeoff, and get some pretty big improvements into 12.
Locally, I use about:telemetry to get a sense of what the behavior on my local machine has been, but I suppose that doesn’t really fall under “telemetry” per se. But it was quite useful during the Cycle Collector Crisis to see what CC behavior people had been seeing on their machines.
Thanks Andrew, very accurate summary of what I’ve been doing
I tend to look at CC telemetry data daily. I very rarely use the histogram, since evolution is more interesting to me. Especially median time and also how P75 and P95 evolve. (The focus in this Q is to get lower bad times, so we should manage to drop P75 and P95)
I use also about:telemetry locally since I tend to run builds with some patch, and I want to see if they affect badly to CC or GC times.
My blog is basically a collection of telemetry trivia
I do not have testimonials from other people. However I heard that the silent-update team proved something about silent updates with telemetry, Necko team discovered that some optimizations were not, etc.
I encourage other people who solved a problem with telemetry to either wrote a blog post or leave a comment.
In part 3 I will cover flaws in the current Telemetry experience.