{"id":5178,"date":"2012-02-21T16:21:31","date_gmt":"2012-02-21T23:21:31","guid":{"rendered":"http:\/\/blog.mozilla.org\/metrics\/?p=5178"},"modified":"2019-09-18T12:05:34","modified_gmt":"2019-09-18T19:05:34","slug":"sampling-crash-volumes-rates-and-rarity-for-socorro-samples","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/metrics\/2012\/02\/21\/sampling-crash-volumes-rates-and-rarity-for-socorro-samples\/","title":{"rendered":"Sampling Crash Volumes, Rates and Rarity for Socorro Samples"},"content":{"rendered":"<h3 id=\"sec-1\">1\u00a0Introduction<\/h3>\n<div id=\"text-1\">\n<p>The Socorro crash report accumulation pipeline does not process all the crash reports. Though every report is stored on disk, only 10% are processed and saved in HBase as JSON objects. Each crash report has a crash signature (Crash Report Signature, or CRS for short). The relationship between crash reports and CRSs is many to one.<\/p>\n<p>Consumers of the crash reports (engineers working on bug fixes and product managers, to name a few) had concerns regarding the use of samples. For example, some asked whether a 10% sampling rate is enough to accurately estimate the frequency of the CRSs and, if not for all of them, how accurate the estimates are for the top N most frequently observed crash report signatures. With Firefox&#8217;s usage running into the hundreds of millions, we can expect new CRSs to be coming in every day. Some are very rare (occurring for a small user base) and others more frequent. How many days can we expect to wait until we see 50% of all the CRSs that come in (for a given version)?<\/p>\n<p>To answer these questions, the #breakpad team processed every crash report for the week 03\/22-29\/2011, after the Firefox 4 release. This served as a full enumeration of the crash report data. The full enumeration contained 2.2 million crash reports belonging to 84,760 CRSs.<\/p>\n<p>Primarily, the crash-stats dashboard lists the top 100 most frequent crashes by OS. 
Some questions:<\/p>\n<ul>\n<li>How accurate are the sample estimates? Does the top 100 from a sample equal the top 100 from the full enumeration (population), and are the proportion estimates accurate?<\/li>\n<li>Given the estimates, can we say something about their accuracy?<\/li>\n<li>How many distinct crash types are there? Throttling takes a random sample of incoming crash reports. If, in a 10% sample, we observe &#8216;N&#8217; CRSs, can we estimate how many there are in the population, i.e. how many we haven&#8217;t seen? Estimating the number of unique CRSs is entirely different from estimating the proportions of the CRSs.<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p>For more, read: <a title=\"http:\/\/people.mozilla.org\/~sguha\/species.crash.report.html\" href=\"http:\/\/people.mozilla.org\/~sguha\/species.crash.report.html\">http:\/\/people.mozilla.org\/~sguha\/species.crash.report.html<\/a><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>1\u00a0Introduction The Socorro crash report accumulation pipeline does not process all the crash reports. Though every report is stored on disk, only 10% are processed and saved in HBase as JSON objects. 
Each crash report has a crash signature (Crash &hellip; <a class=\"go\" href=\"https:\/\/blog.mozilla.org\/metrics\/2012\/02\/21\/sampling-crash-volumes-rates-and-rarity-for-socorro-samples\/\">Continue reading<\/a><\/p>\n","protected":false},"author":263,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/metrics\/wp-json\/wp\/v2\/posts\/5178"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/metrics\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/metrics\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/metrics\/wp-json\/wp\/v2\/users\/263"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/metrics\/wp-json\/wp\/v2\/comments?post=5178"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/metrics\/wp-json\/wp\/v2\/posts\/5178\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/metrics\/wp-json\/wp\/v2\/media?parent=5178"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/metrics\/wp-json\/wp\/v2\/categories?post=5178"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/metrics\/wp-json\/wp\/v2\/tags?post=5178"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}