Crash Analysis: now in Open Source flavor

Mike Morgan


History shows that companies rarely disclose crashes in their software. They keep a close eye on which crashes and bugs are made public.

Mozilla doesn’t.

Rather than being the exception, openness is the rule, and that is one of the coolest things about being a part of this. My job, my everyday tasks, they aren’t secret, and they are not to drive profits. They are to drive the web.

[Screenshot: Socorro]

In that spirit, our crash reporting system (Socorro) is available to whoever wants to view it. Aside from user-bound statistics, crash information is available in full and anybody in the community can learn about where in the code their client crashed. They can also help provide hints or comments about what they were doing at the time they crashed.

This opens the door for the community to learn valuable things about their software and how they use it:

  • What crashes the most? What crashes the most over time? What is the breakdown across branches, versions and products?
  • Where did we crash? Crash signatures provide a head start for locating the cause of a crash. From there, full stack traces are available to walk back through the calls and find the source of the actual crash.
  • What was installed? What modules were installed for a given crash? Soon we will also be able to understand what extensions were installed so we can understand the correlation between core client crashes and crashes caused by faulty extensions. The end result is a closer relationship with the extension developer community and better quality in our add-ons space.
  • How are we doing? Overall, the jackpot question is: are we crashing more or less? How are we doing with this beta, alpha or rc1? Are we regressing in real-life situations despite positive automated testing results?
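
To make the first question concrete, here is a minimal sketch of how a set of crash reports could be ranked by signature and broken down by product and version. Everything in it is an assumption for illustration: the field names (`signature`, `product`, `version`), the sample signatures, and the helper functions are hypothetical and are not Socorro's actual schema or code.

```python
from collections import Counter, defaultdict

# Hypothetical crash reports. The field names and signatures are made up
# for this example and do not reflect Socorro's real data model.
reports = [
    {"signature": "ExampleA::Render()", "product": "Firefox", "version": "3.0b4"},
    {"signature": "ExampleA::Render()", "product": "Firefox", "version": "3.0b5"},
    {"signature": "ExampleA::Render()", "product": "Firefox", "version": "3.0b5"},
    {"signature": "ExampleB::Parse()", "product": "Firefox", "version": "3.0b5"},
    {"signature": "ExampleB::Parse()", "product": "Thunderbird", "version": "2.0"},
    {"signature": "ExampleC::Init()", "product": "Firefox", "version": "3.0b4"},
]


def top_signatures(reports, n=3):
    """Return the n most frequent crash signatures with their counts."""
    counts = Counter(r["signature"] for r in reports)
    return counts.most_common(n)


def breakdown_by_version(reports):
    """Count signatures separately for each (product, version) pair."""
    table = defaultdict(Counter)
    for r in reports:
        table[(r["product"], r["version"])][r["signature"]] += 1
    return table


if __name__ == "__main__":
    for signature, count in top_signatures(reports):
        print(f"{count:3d}  {signature}")
    for (product, version), counts in sorted(breakdown_by_version(reports).items()):
        print(product, version, dict(counts))
```

Grouping by signature first and only then by product and version mirrors the order of the questions above: find what crashes the most, then see where those crashes are concentrated.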

All of this was possible because of a collaborative effort between quite a few parties:

  • Mark Mentovai and the Breakpad team, for writing a great client and processor under a flexible open source license that is easy to integrate
  • Ted Mielczarek, for his work on the client and processor and for integrating the project into Firefox 3
  • Benjamin Smedberg and Robert Sayre, for their work in getting the initial versions of the Breakpad server off the ground

Where do we go from here?

Of the many projects we have in 2008, this is one of the most exciting. It’s an opportunity to open up information that hasn’t historically been available to the masses, and to hack on a great tool for improving the quality of all Mozilla projects.

7 responses

  1. Andy Burns wrote:

    Crash reporter only seems to fire on about 60% of my crashes with 3.0b5 on Windows. If that’s typical, it will skew your numbers.

  2. morgamic wrote:

    Andy – this is related to a bug about how Flash handles crashes when it’s active:
    https://bugzilla.mozilla.org/show_bug.cgi?id=422308

    Were some of your crashes happening when Flash was active? We have seen a similar skew and it sucks, but is known and being worked on.

  3. AndersH wrote:

    I’ve installed Flashblock just to get to see the crash reporter (instead of the application error dialog). But since I have the debugging tools installed, is there a form where I could upload a minidump (or whatever is needed) into Socorro?

  4. Dave Miller wrote:

    Having our crash stats available to the public in itself isn’t something new; we’ve had crash reports available to the public for several years via http://talkback-public.mozilla.org/ and you can still see current crashes from Firefox and Thunderbird 2.x there.

    The difference is that Talkback is proprietary software that we haven’t been able to hand out the source for, or make any really useful changes to, so it’s been mostly the same stale code for several years, and could only be included in official builds. Breakpad is open and free, and anyone can make a build that includes it.

    Socorro (the server-side component) is also open and free, and runs on modern hardware/software (thus making IT very happy). The Talkback collector server runs on Solaris 8 (and very likely won’t run on newer versions of Solaris), and the digester that processes the reports requires Windows (eek). IT will be *very* *very* happy when all of the Talkback-enabled products are finally end-of-lifed so that we can get rid of those ancient servers. :-)

  5. morgamic wrote:

    @Dave – good points, I am not very familiar with the Talkback system, but I think it’s great that in both cases we’ve made an effort to make that information public.

  6. Andy Burns wrote:

    @morgamic

    Yes, Flash seems to be behind quite a few of the crashes I get. There seems to be quite a spike in frequency from b4 to b5.

    Thanks.

  7. digi-tv wrote:

    The Talkback collector server runs on Solaris 8 (and very likely won’t run on newer versions of Solaris), and the digester that processes the reports requires windows