Summary: (Total Crashes)/(Active Daily Users) is a low-resolution metric for crash trending. We can improve Firefox’s stability for more users if we understand how crashes are distributed across users.
Over the last few months, improving Firefox’s stability has become a top priority. And our early results are encouraging! Both our survey data and crash reporter data show a downward trend in crashes. We should be careful, however, not to read too much into this data.
Due to privacy concerns, we do not store user IDs when collecting crash data. As a result, our crash data suffers from a number of limitations. For example, we can only count crashes per daily session; we cannot normalize by session length. Crashes per user will therefore appear to rise simply because users browse more each day. Changes to the crash reporter UI will similarly bias our data: a 5% increase in the reporter response rate will lead to a 5% increase in reported crashes, with no change in actual stability.
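To make the reporter-bias point concrete, here is a minimal sketch. The crash counts and response rates are invented for illustration; the point is that reported crashes scale directly with the response rate even when true stability is unchanged:

```python
# Hypothetical numbers: true crashes are identical across two releases,
# but a UI change raises the fraction of crashes users actually report.
true_crashes = 100_000            # actual crashes in the field (unchanged)

old_response_rate = 0.20          # 20% of crashes get reported
new_response_rate = 0.21          # a 5% (relative) rise in response rate

old_reported = true_crashes * old_response_rate   # 20,000 reports
new_reported = true_crashes * new_response_rate   # 21,000 reports

# Reported crashes rose 5% even though real stability did not change.
relative_increase = new_reported / old_reported - 1
print(f"Apparent crash increase: {relative_increase:.0%}")  # → 5%
```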
Perhaps most importantly, the lack of a user ID limits our ability to draw inferences about the distribution of our data. Why does this matter? If Firefox crashes are heavily skewed across users, we could reduce overall crashes by 10% while 80% of users experience more crashes than before.
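The numbers below are invented, but a toy example shows how both statements can be true at once, assuming a skewed baseline where a small group of heavy crashers dominates the total:

```python
# Toy population of 100 users (all numbers invented for illustration):
# before a fix, 80 users never crash and 20 heavy crashers see 7 crashes each.
before = [0] * 80 + [7] * 20          # total = 140 crashes, mean = 1.4

# After the fix, the heavy crashers improve dramatically (7 → 2.3 crashes),
# but a regression gives everyone else one crash.
after = [1] * 80 + [2.3] * 20         # total = 126 crashes, mean = 1.26

total_drop = 1 - sum(after) / sum(before)
worse_off = sum(a > b for a, b in zip(after, before))

print(f"Overall crashes fell {total_drop:.0%}")   # → 10%
print(f"but {worse_off} of 100 users crash more")  # → 80
```

The headline metric improves, yet four out of five users have a worse experience.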
A quick look at the “Week in the Life of a Browser” study suggests that crashes are indeed highly skewed. By examining start-up events without corresponding shut-down events, Jono calculated the number of unexplained session interruptions per user. While session interruptions have other causes, such as a computer losing power, we can reasonably treat them as a (perhaps highly overstated) proxy for crashes.
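I don’t have Jono’s actual code, but the counting logic presumably looks something like this sketch: walk each user’s event stream in order and flag any start-up that follows another start-up without an intervening shut-down (the event names here are made up):

```python
# Sketch of the interruption count, assuming an ordered per-user event
# stream with hypothetical event names "startup" and "shutdown".
def unexplained_interruptions(events):
    """Count startups not preceded by a clean shutdown of the prior session."""
    interruptions = 0
    session_open = False
    for event in events:
        if event == "startup":
            if session_open:
                # The previous session never shut down cleanly:
                # crash, power loss, force-kill, etc.
                interruptions += 1
            session_open = True
        elif event == "shutdown":
            session_open = False
    return interruptions

# Two sessions end cleanly; one startup follows an unclosed session.
print(unexplained_interruptions(
    ["startup", "shutdown", "startup", "startup", "shutdown"]))  # → 1
```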
Session interruptions per user do not take on the bell-curve shape of normally distributed data. Rather, they follow a power-law distribution: 49% of users did not experience a single session interruption, and 70% experienced one or fewer, yet the mean number of session interruptions was 1.4. If our crash data follows a similar distribution, the average-crashes-per-user metric tells us little about the experience of a typical Firefox user.
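A small constructed dataset makes the gap between the mean and the typical user concrete. Only the aggregates below match the study’s quoted figures; the individual tail values are invented:

```python
from statistics import mean, median

# A constructed 100-user sample matching the quoted summary statistics:
# 49 users with 0 interruptions, 21 with 1, and an invented heavy tail
# of 30 users chosen so the total comes out to 140 (mean = 1.4).
counts = [0] * 49 + [1] * 21 + \
         [2] * 14 + [3] * 6 + [4] * 4 + [6] * 3 + [10] * 2 + [19]

print(mean(counts))                               # → 1.4  (headline average)
print(median(counts))                             # → 1.0  (the typical user)
print(sum(c <= 1 for c in counts) / len(counts))  # → 0.7  (70% see ≤ 1)
```

A handful of heavy crashers drags the mean well above what most users experience, which is exactly why the average alone is misleading.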
Anecdotal evidence supports this hypothesis. While we all know people who swear by Firefox’s stability, we also know people who complain of frequent failures. I, for one, haven’t experienced any crashes since upgrading to the 3.6 beta a few weeks ago.
With this in mind, I suggest we use Test Pilot to run a longitudinal study of true Firefox crashes. Because Test Pilot is opt-in and allows users to review their data before submitting it, we’re able to consider data at a more granular level. As with previous TP experiments, we will go to great lengths to respect the privacy of participants.
In addition to crash events and session length, I would like to collect data we can correlate with crashes: Firefox version, operating system, and installed add-ons immediately come to mind.
Have suggestions for additional data that we should, or shouldn’t, collect? Please leave them in the comments.