How to fix a bug, episode 434494, part 2

At the end of part 1, Cameron McCormack had just tracked bug 434494 down to two .focus() calls in browser.js. It was a race condition: whichever .focus() call happened to run last got to keep the focus.

You might have wondered why these calls were happening in a random order to begin with. As we’ll see, Cameron was able to fix the bug without investigating this! But I’ll talk a little bit about it because it’s interesting, and it can bite ordinary web pages, too.

Opening files is a very common cause of non-determinism. You call a function like window.open() or XMLHttpRequest().open to make it happen, and the call returns right away, even though the document hasn’t finished loading yet, so the application can remain responsive. The system will send a load event later, when it’s done. Because loading a file takes a small but somewhat random amount of time, the order of that load event compared to other events can be slightly random.

The root cause of bug 434494 is indeed the timing of load events. But Cameron found a way to fix the bug without having to track down the root cause, as we’ll see—after a little detour…

       <heycam> ok, so we know we have these two focus calls running in either
                order
   <jorendorff> hang on
   <jorendorff> I've got a Firefox 4 nightly on my Mac. So I suppose I have a
                browser.js in here somewhere.
   <jorendorff> One hesitates to encourage people to edit their browser app
       <heycam> ah yes, if you want to try it
       <heycam> heh
   <jorendorff> but just for my entertainment...
   <jorendorff> I can just throw an alert() in here?
       <heycam> I don't know how well it works to just edit the browser.js in
                whatever jar it lives in
       <heycam> it might work
       <heycam> I tended to just do a rebuild
   <jorendorff> you edited the source file?
       <heycam> yeah
       <heycam> then make, as usual
       <heycam> but probably editing browser.js in situ is fine
   <jorendorff> ...What about a chrome debugger? Ever use one of those?
       <heycam> no, well only just yesterday
   <jorendorff> heh
   <jorendorff> let's try editing it in place ;)
       <heycam> ok :)
   <jorendorff>  /Applications/Nightly.app/Contents/MacOS/
   <jorendorff> omni.jar
       <heycam> I'm guessing so
   <jorendorff> chrome/browser/content/browser/browser.js
   <jorendorff> now to see if Emacs will let me edit this file in the middle
                of a jar
   <jorendorff> gosh, I think it will
       <heycam> would surprise me if it didn't ;)

If you’re on Mac or Linux, and you have Emacs, go ahead and find omni.jar in your Firefox installation directory, make yourself a nice backup copy, and poke around inside. If you like, you can make the exact change I made. Search the archive for browser.js, open it, then search for BrowserSearch_webSearch. Add an alert, like so:

in situ, as they say in auckland

Save it… and…

   <jorendorff> restarting... *fingers crossed*
   <jorendorff> hmm, no good.

It didn’t work. But are we discouraged? We are not. Later I asked about this on irc.mozilla.org, and I found out that you can force Firefox to read omni.jar again by passing the right command-line flag. First make sure Firefox is closed. Then:

  $ cd /Applications/Nightly.app/Contents/MacOS
  $ ./firefox --purgecaches

If you’re on Mac OS 10.5, you need to use a slightly different command to start Firefox:

  $ arch -i386 ./firefox-bin --purgecaches

And of course if you are using the official Firefox 4 instead of the unstable Nightly, you’ll find the file under /Applications/Firefox.app instead.

Then, when you hit Cmd+K:

achievement unlocked: hack

You might want to put your original omni.jar back when you’re done. Software updates run a bit more smoothly when you haven’t been manually editing the programs. :)

Back to the bug:

   <jorendorff> so did you manage to convince yourself that this is what was
                happening by using alerts?
       <heycam> yes
       <heycam> alerting what the currently focused element was, at the time
                of the gURLBar.focus call
       <heycam> my thought was, not knowing anything about the random
                ordering, to ensure that the gURLBar isn't focused if we'd
                already done the search bar focusing
       <heycam> so I made an assumption, which I verified
       <heycam> that I could tell whether something had been focused already,
                by seeing if commandDispatcher.focusedElement was null
   <jorendorff> ah
       <heycam> alerting just before gURLBar.focus is called, in say a
                Cmd+N new window, showed me that focusedElement was null

This led to Cameron’s first patch for the problem. He wrapped the code that calls gURLBar.focus() in an if block, so that it only happens if we haven’t already focused something else. You can see those changes here. (The left side shows the code before the fix; the right side is how it looks after the fix. Incidentally you can find that page yourself by visiting bug 434494, then clicking “Show obsolete patches”, then clicking on the word “Diff” next to the first patch.)

   <jorendorff> the first part, i see the if statement you added
   <jorendorff> but what is that second change?
       <heycam> see the comment above the setTimeout 0 call
       <heycam> clearly it's trying to do something to avoid the url bar focus
   <jorendorff> oh
   <jorendorff> heh - yeah, it looks like someone thought they fixed this
                before. :)
       <heycam> initially I only wrote the first hunk, and that worked.
   <jorendorff> That being done, the old setTimeout hack wasn't needed
                anymore, so you ripped it out.
       <heycam> exactly
       <heycam> there's no point in delaying the webSearch call in the
                setTimeout

The second half of the patch was just a little code cleanup.

Now Cameron had a working fix. He posted it in Bugzilla and requested a code review from Gavin Sharp, a long-time Firefox hacker who’s an expert in this area of the code. Gavin lives in Toronto.

Code reviews are interesting. You almost always learn something.

   <jorendorff> It looks like you spoke with gavin about this first patch
   <jorendorff> and you decided to make a few changes based on that.
       <heycam> yes
   <jorendorff> I don't suppose you have that chatlog...?
       <heycam> I do, hang on

Here is what Gavin said…

            <gavin> hm, that patch seems to be relying on the ordering of the load events
           <heycam> how's that?
            <gavin> and I'm not sure that it's safe to assume that focusedElement will be null on initial load
            <gavin> BrowserStartup runs off a load event
            <gavin> so does BrowserSearch.webSearch, with your change
           <heycam> I see
           <heycam> is there something I can listen for once the browser startup is done?
            <gavin> not at the moment
           <heycam> or maybe a way I could pass an argument into the browser
           <heycam> rather than listening for an event
            <gavin> oh, I lied
            <gavin> you could use browser-delayed-startup-finished
            <gavin> slightly messy because it's an observer notification, but doable
           <heycam> ok
           <heycam> I'll try that and post another patch. thanks!

Depending on .focusedElement was too iffy. Gavin suggested a different approach, which led Cameron to make a second version of the patch.

   <jorendorff> gavin left some new comments in the bug :)
       <heycam> yes, I plan to get to them today :)

Since my chat with Cameron last week, he posted a third version, which now bears Gavin’s seal of approval. It might be checked in by the time you read this. Sweet!

the plus means quality
       <heycam> the one hiccup was the time between my posting the original
                patch, and getting comments from gavin
   <jorendorff> Was that due to the Firefox 4 crunch?
       <heycam> probably. gavin also seems to have a long request queue.
       <heycam> so a week ago I pinged him on email
   <jorendorff> ah

So I glossed over something: the three months between Cameron’s first patch and his chat with Gavin.

I hate to admit it, but I still have review requests from that era too. Firefox 4 was in beta, monopolizing developers’ time, for months. A lot of valuable work got put on hold. We’re never doing that again, thank goodness.

But sometimes a code review gets lost in the shuffle anyway. If yours takes more than a few days, it’s important to speak up:

       <heycam> although I'm usually hesitant to prod people
       <heycam> but that mail got swallowed :(
       <heycam> so I pinged him on irc yesterday to ask him about it
       <heycam> in general I have no idea how long I should be letting my
                patches sit in someones queue before bugging them about it :)
   <jorendorff> At least 3 days. But never 2 weeks.
       <heycam> I think that aspect of the process should be emphasised to new
                contributors
       <heycam> getting review, expectations of turnaround time
   <jorendorff> As long as you're super nice about it, a reminder is usually
                welcome, in my experience
       <heycam> ok

So this is what open source development is like. There is some stuff that looks like work, stuff that takes multiple tries, delays and waiting. There are also cool things to tinker with, puzzles waiting to be solved, useful skills to learn, and smart people around the world who would love to help you.

Don’t be shy.

3 Responses to How to fix a bug, episode 434494, part 2

  1. Small nit: it’s –purgecaches (no dash in the middle), http://mxr.mozilla.org/mozilla-central/source/toolkit/xre/nsAppRunner.cpp?rev=a2e48e0c44bb&mark=3453#3453

    Reading this series has been interesting (it’s been a while for me), though sadly I probably would have forgotten about the patch somewhere past month one.

  2. Yeah, having a patch sit for a month is something we, like, need to never do to anybody ever.

    Thanks for the correction; I fixed the post.

  3. For my about:memory revamp (bug 633653) I’ve found it extremely useful to be able to modify aboutMemory.js on the fly while the browser is running and just hit ‘reload’ for the new code to take effect. Not having to rebuild and restart the browser saves heaps of time. When I started doing that patch I had a lot more of the work being done on the C++ side of things, but eventually I ended up doing almost everything on the JS side.

    And the consequences of bad edits are probably lower than aboutMemory.js than they are with with browser.js, too.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>