The Socorro crash report accumulation pipeline does not process all the crash reports. Though every report is stored on disk, only 10% are processed and saved in HBase as JSON objects. Each crash report has a crash signature (Crash Report Signature, or CRS for short). The relationship between crash reports and CRSs is many-to-one: many crash reports can share the same CRS.
Consumers of the crash reports (engineers working on bug fixes and product managers, to name a few) had concerns about the use of samples. For example, some asked whether a 10% sampling rate is enough to accurately estimate the frequency of the CRSs and, if not for all of them, how accurate the estimates are for the top N most frequently observed crash report signatures. With Firefox's usage running into the hundreds of millions, we can expect new CRSs to come in every day. Some are very rare (occurring for a small user base) and others are more frequent. How many days can we expect to wait until we see 50% of all the CRSs that come in (for a given version)?
To answer these questions, the #breakpad team processed every crash report for the week of 03/22-29/2011, after the Firefox 4 release. This served as a full enumeration of the crash report data. The full enumeration contained 2.2MM crash reports belonging to 84,760 CRSs.
Primarily, the crash-stats dashboard lists the top 100 most frequent crashes by OS. Some questions:
- How accurate are the sample estimates? Does the top 100 from a sample equal the top 100 from the full enumeration (population) and are the proportion estimates accurate?
- Given the estimates, can we say something about how accurate they are?
- How many distinct crash types are there? Throttling takes a random sample of incoming crash reports. If, in a 10% sample, we observe ‘N’ CRSs, can we estimate how many there are in the population, i.e. how many haven’t we seen? Estimating the number of unique CRSs is entirely different from estimating the proportions of the CRSs.
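
The questions above can be explored on synthetic data. The following is a minimal sketch, not the analysis the team ran: it assumes a Zipf-like signature distribution (an assumption, not taken from the real data), throttles it at 10%, compares the top 100 of the sample with the top 100 of the population, and estimates the number of distinct signatures with the bias-corrected Chao1 estimator, a standard species-richness estimator swapped in here for illustration. All function names and parameters (`make_population`, `throttle`, `top_n`, `chao1`) are hypothetical.

```python
import random
from collections import Counter


def make_population(n_signatures=2000, n_reports=200_000, seed=0):
    """Synthetic crash reports: signature i is drawn with weight 1/(i+1).

    The Zipf-like shape is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    weights = [1.0 / (i + 1) for i in range(n_signatures)]
    return rng.choices(range(n_signatures), weights=weights, k=n_reports)


def throttle(reports, rate=0.10, seed=1):
    """Keep each report independently with probability `rate`."""
    rng = random.Random(seed)
    return [r for r in reports if rng.random() < rate]


def top_n(counts, n):
    """Return the n most frequent signatures in a Counter."""
    return [sig for sig, _ in counts.most_common(n)]


def chao1(counts):
    """Bias-corrected Chao1 estimate of the number of distinct signatures.

    f1 and f2 are the numbers of signatures seen exactly once and twice.
    """
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    return len(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


population = make_population()
sample = throttle(population)
pop_counts = Counter(population)
sample_counts = Counter(sample)

n = 100
overlap = len(set(top_n(pop_counts, n)) & set(top_n(sample_counts, n)))

print("signatures in top", n, "of both sample and population:", overlap)
print("distinct signatures in population:", len(pop_counts))
print("distinct signatures observed in sample:", len(sample_counts))
print("Chao1 estimate from sample:", round(chao1(sample_counts)))
```

Changing the seeds, the sampling rate, or the shape of the weights gives a rough feel for how stable the top-N list is under throttling and how far the observed number of distinct signatures falls short of the true count.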
For more, read: http://people.mozilla.org/~sguha/species.crash.report.html