Category Archives: Work habits

Working on the browser sucks

I’ve joked before that I hoped to never work on anything in Firefox that requires me to build the browser regularly.  That probably sounds weird unless you are a member of the JavaScript team, because it’s possible to do close to 100% of SpiderMonkey development just building the JS shell.

Working on the JS shell is great.  On my 2.5 year old Linux box it takes maybe 1 minute to build from scratch, rebuilds are almost instantaneous, the regression tests take a few minutes, shell start-up is instantaneous, you rarely have to do try server runs, tools like GDB and Valgrind are easy to run, and landing patches on the TraceMonkey repo is low-stress because breakage doesn’t affect too many people.

In comparison, working on the browser sucks.  Builds from scratch take 25 minutes, zero-change rebuilds take 1.5 minutes, single-change rebuilds take 3 or 4 minutes, the linking stage grinds my machine to a halt, cold starts take up to 20 seconds or more (warm starts are much better), the test suites are gargantuan, every change requires a try server run, tools like GDB and Valgrind require jumping though hoops (--disable-jemalloc, anyone?), and landing patches on mozilla-central is stressful.

Thanks to my recent about:memory work, I’ve had to experience this pain first-hand.  It’s awful.  Debugging experiments that would take 20 seconds in the shell take 5 minutes in the browser. I avoid ‘hg up’ as much as possible due to the slow rebuilds it usually entails.  How do all you non-JS people deal with it?  Maybe you just get used to it… but I figure there have to be some tips and tricks I’m not aware of.

(Nb: Why are rebuilds so slow?  Because configure is invoked every time?  Imprecise dependencies in makefiles?  Too much use of recursive make?  Bug 629668? Why do Fennec rebuilds seem to be slower than Firefox rebuilds?)

Kyle Huey told me how you can get away with rebuilding only parts of the browser.  Eg. if I only modify code under xpcom/, I can just do make -C <build>/xpcom && make -C <build>/toolkit/library and this reduces the rebuild time a bit.  The down-side is that when I screw it up, eg. by forgetting to rebuild a directoy that I changed, it creates Frankenbuilds, and realizing what I’ve done can end up taking a lot more time than I saved.

Another trick I worked out:  implement things in JavaScript.  Seriously!  While doing my about:memory revamp I originally had a big chunk of the work done on the C++ side.  I quickly realized that if I did most of it on the JavaScript side I could see the effect of most changes by simply copying aboutMemory.js from my source dir to my build dir and then reloading the page.  Much better than re-building and re-starting.

What else can I do?  Get a faster machine is the obvious option, I guess.  More cores would help, though linking would still be a bottleneck.  Do SSDs make a big difference?

Also, there’s talk of using more project branches and introducing mozilla-staging.  That would avoid the stress of landing on mozilla-central, but that’s really the smallest part of what I’m complaining about.

Any and all suggestions are welcome!  Please, I’m begging you.

Safer refactoring in Nanojit

Recently I’ve been making a lot of refactoring changes to Nanojit that I think of as “code hygiene” — changes that clean up the code, but don’t change its behaviour at all.  (Example 1, example 2.)  When making changes like these I like to do the changes carefully and gradually to make sure that I don’t affect behaviour unintentionally.  In Nanojit’s case, this means that the generated code shouldn’t change.  But it’s always possible to make mistakes, and sometimes a particular change is disruptive enough that you can’t evolve the code from one state to another without breaking it temporarily.

In such cases, if the regression tests pass it tells me that the new code works, but it doesn’t tell me if I’ve managed to avoid changing the behaviour as I intended.  So I’ve started doing comparisons of the native code generated by Nanojit for parts of SunSpider using TraceMonkey’s verbose output, which is controlled with the TMFLAGS environment variable.  For such changes, the generated code should be identical to an unchanged version.  The way I organise my workspaces is such that I always have an unchanged repository available, which makes such comparisons easy.

Except, the generated code is never quite identical because it contains addresses and the slightest change in the executable can affect these.  So I run the verbose output through this script:

perl -p -e 's/[0-9a-fA-F]{4,}/....../g'

It just replaces numbers with four or more digits with “……”, which is enough to filter out these address differences.

I tried this technique on all of SunSpider, but it turns out it doesn’t work reliably, because crypto-aes.js is non-deterministic — it contains a call to Date.getTime() which introduces enough non-determinism that TraceMonkey sometimes traces different code.  A couple of the V8 benchmarks have similar non-deterministic behaviours.  Non-determinism is a rather undesirable feature of a benchmark suite so I filed a SunSpider bug which is yet to be acted on.

Fortunately, experience has shown me that I don’t need to do code diffs on all of SunSpider to be confident that I haven’t changed Nanojit’s behaviour — doing code diffs on two or three of the bigger benchmarks suffices, as enough code is generated that even small behavioural differences are quickly found.

In fact, this code diff technique is useful enough that I’ve started using it sometimes even when I do want to change behaviour — I split such changes into two parts, one which doesn’t change the behaviour and one which does.  For example, when I added an opcode to distinguish between 64-bit integer loads and 64-bit integer floats I landed a non-behaviour-changing patch first, then did a follow-up change that improved x86-64 code generation.  This turned out to be a good idea because I introduced a regression with the follow-up change, and the fact that it was separated from the non-behavioural changes made it easy to determine what had gone wrong, because the regressing patch was much smaller than a combined patch would have been.

I also recently wrote a script to automate the diff process, which makes it that much easier to perform, which in turn makes it that much more likely that I’ll actually do it.

In summary, it’s always worth thinking about new ways to test your code, and when you find a good one, it’s worth making it as easy as possible to run those tests.

How I Work on Tracemonkey

After six months of working on Tracemonkey, I’ve built up a particular workflow — how I use Mercurial, arrange my workspaces, run tests, and commit code.  I thought it would be worth describing this in case it helps other developers improve their workflow, or perhaps so they can give me ideas on how to improve my own workflow.

Workspace Structure

I have two machines, an Ubuntu Linux desktop and a Mac laptop.  For both machines I use the same workspace structure.  All my Mozilla work is in a directory ~/moz/.  At any one time I have up to 10 workspaces.  ~/moz/ws0/ always contains an unmodified clone of the tracemonkey repository, created like so:

hg clone http://hg.mozilla.org/tracemonkey/ ~/moz/ws0

Workspaces ~/moz/ws1 through to ~/moz/ws9 are local clones of ~/moz/ws0/ in which I make modifications.  I create these workspaces like this:

hg clone ~/moz/ws0 ~/moz/wsN

Local hg clones are much cheaper than ones done over the network.  On my Linux box it takes about 45 seconds, on my Mac somewhere over 2 minutes;  it seems that laptops have slower hard disks than desktops.  In comparison, cloning hg.mozilla.org/tracemonkey/ can take anywhere from 5 to 30 minutes or more (I don’t know why there’s so much variation there).

I mostly work with the Javascript shell, called ‘js’, so I do most of my work in ~/moz/wsN/js/src/.  There are three ways I commonly build ‘js’.

  • Debug builds go in ~/moz/wsN/js/src/debug/.  I use these for most of my development and testing.
  • Optimised builds go in ~/moz/wsN/js/src/opt/.  I use these for measuring performance.
  • Optimised builds with symbols go in ~/moz/wsN/js/src/optg/.  I use these with Cachegrind, which needs optimised code with symbols to be useful.

I have a number of bash aliases I use to move around these directories:

alias m="cd ~/moz/"
alias m0="cd ~/moz/ws0/"
alias j0="cd ~/moz/ws0/js/src/"
alias j0d="cd ~/moz/ws0/js/src/debug/"
alias j0o="cd ~/moz/ws0/js/src/opt/"

and so on for the remaining workspaces ws1 through ws9.  I have a common bash config file that I use on both my machines;  whenever I change it I copy it to the other machine.  This is a manual process, which is not ideal, but in practice it works well enough.

I find nine workspaces for making changes is enough to cover everything I’m doing;  if I find myself needing more it’s because some of the existing ones have stagnated and I need to do some cleaning up.

Building ‘js’

I have three scripts, js_conf_debug, js_conf_opt, js_conf_optg, which configure and build from scratch.  Here is js_conf_debug, the others are similar:

#! /bin/sh

if [ -z $1 ] ; then
    echo "usage: $0 <dirname>"
elif [ -d $1 ] ; then
    echo "directory $1 already exists"
else
    autoconf2.13
    mkdir $1
    cd $1
    CC='gcc -m32' CXX='g++ -m32' AR=ar ../configure \
        --enable-debug --disable-optimize --target=i686-pc-linux-gnu
    make --quiet -j 2
fi

These are scripts rather than bash aliases or functions because they are quite different on the Linux machine and the Mac.

I also have this alias for incremental builds:

alias mq="make --quiet -j 2"

Testing ‘js’

The program I run most is trace-test.js.  So much so that I have more aliases for it:

alias jsott="time opt/js -j trace-test.js"
alias jsdtt="time debug/js -j trace-test.js"

I don’t need an alias for the optg build because that’s only used with Cachegrind, which I run in a different way (see below).

I run the JS unit test with the following script:

function js_regtest
{
    x=$1
    y=$2
    if [ -z $x ] || [ -z $y ] ; then
        echo "usage: js_regtest <ws-number-1> <ws-number-2>"
    else
        xdir=$HOME/moz/ws$x/js/src/debug
        ydir=$HOME/moz/ws$y/js/src/debug
        echo "############################"
        echo "## COMPILING $xdir"
        echo "############################"
        cd $xdir && mq
        echo "############################"
        echo "## COMPILING $ydir"
        echo "############################"
        cd $ydir && mq
        cd $ydir/../../tests
        echo "############################"
        echo "## TESTING $xdir"
        echo "############################"
        time jsDriver.pl \
            -k \
            -e smdebug \
            --opt '-j' \
            -L spidermonkey-n.tests slow-n.tests \
            -f base.html \
            -s $xdir/js && \
        echo "############################"
        echo "## TESTING $ydir"
        echo "############################"
        time jsDriver.pl \
             -k \
             -e smdebug \
             --opt '-j' \
             -L spidermonkey-n.tests slow-n.tests \
             -L base-failures.txt \
             -s $ydir/js
    fi
}

An example invocation would be:

js_regtest 0 3

The above invocation first ensures a debug ‘js’ is built in workspaces 0 and 3.  Then it runs ~/moz/ws0/js/src/debug/js in order to get the baseline failures, which are put in base-failures.txt.  Then it runs ~/moz/ws3/js/src/debug/js and compares the results against the baseline.  The -L lines skip the tests that are really slow;  without them it takes hours to run.  I time each invocation just so I always know roughly how long it takes;  it’s a bit over 10 minutes to do both runs.  It assumes that workspace 0 and 3 correspond to the same hg revision;  perhaps I could automate that to guarantee it but I haven’t (knowingly) got that wrong yet so haven’t bothered to do so.

Timing ‘js’

I time ‘js’ by running SunSpider.  I obtained it like so:

svn http://svn.webkit.org/repository/webkit/trunk/SunSpider ~/moz/SunSpider

I haven’t updated it in a while, I hope it hasn’t changed recently!

I run it with this bash function:

function my_sunspider
{
    x=$1
    y=$2
    n=$3
    if [ -z $x ] || [ -z $y ] || [ -z $n ] ; then
        echo "usage: my_sunspider <ws-number-1> <ws-number-2> <number-of-runs>"
    else
        for i in $x $y ; do
            dir = $HOME/moz/ws$i/js/src/opt
            cd $dir || exit 1
            make --quiet || exit 1
            cd ~/moz/SunSpider
            echo "############################"
            echo "####### TESTING ws$i #######"
            echo "############################"
            time sunspider --runs=$n --args='-j' --shell $dir/js > opt$i
         done

         my_sunspider_compare_results $x $y
    fi
}

function my_sunspider_compare_results
{
    x=$1
    y=$2
    if [ -z $x ] || [ -z $y ] ; then
        echo "usage: my_sunspider_compare_results <ws-number-1> <ws-number-2>"
    else
        sunspider-compare-results \
            --shell $HOME/moz/ws$x/js/src/opt/js opt$x opt$y
    fi
}

An invocation like this:

my_sunspider 0 3 100

will ensure that optimised builds in both workspaces are present, and then compare them by doing SunSpider 100 runs.  That usually gives what SunSpider claims as +/-0.1% variation (I don’t believe it, though).  On my Mac this takes about 3.5 minutes, and 100 runs is enough that the results are fairly reliable, certainly more so than the default of 10 runs.  But when testing a performance-affecting change I like to do some timings, wait until a few more patches have landed in the tree, then update and rerun the timings — on my Mac I see variations of 5-10ms regularly due to minor code differences.  Timing multiple versions like this gives me a better idea of whether a timing difference is real or not.  Even then, it’s still not easy to know for sure, and this can be frustrating when trying to work out if an optimisation I applied is really giving a 5ms speed-up or not.

On my Linux box, I have to use 1000 runs to get +/-0.1% variation.  This takes about 25 minutes, so I rarely do performance-related work on this machine.  I don’t know why Linux causes greater timing variation.

Profiling ‘js’ with Cachegrind

I run Cachegrind on ‘js’ running SunSpider with this bash function:

function cg_sunspider
{
    x=$1
    y=$2
    if [ -z $x ] || [ -z $y ] ; then
        echo "usage: cg_sunspider <ws-number-1> <ws-number-2>"
    else
        for i in $x $y ; do
            dir = $HOME/moz/ws$i/js/src/optg
            cd $dir || exit 1
            make --quiet || exit 1
            cd ~/moz/SunSpider
            time valgrind --tool=cachegrind --branch-sim=yes --smc-check=all \
                --cachegrind-out-file=cachegrind.out.optg$i \
                --auto-run-dsymutil=yes \
                $dir/js `cat ss0-args.txt`
            cg_annotate --auto=yes cachegrind.out.optg$i > ann-optg$i
        done
    fi
}

ss0-args.txt contains this text:

-j -f tmp/sunspider-test-prefix.js -f resources/sunspider-standalone-driver.js

What this does is run just the main SunSpider program, once, avoiding all the start-up processes and all that.  This is important for Cachegrind — it means that I can safely use –cachegrind-out-file to name a specific file, which is not safe if running Cachegrind on a program involving multiple processes.   (I think this is slightly dangerous… if you run ‘sunspider –ubench’ it seems to change one of the above .js files and you have to rerun SunSpider normally to get them back to normal.)  I use –branch-sim=yes because I often find it to be useful; at least twice recently it has helped me identify performance problems.

If I want to focus on a particular Cachegrind statistic, e.g. D2mr (level 2 data read misses) or Bim (indirect branch mispredictions) then I rerun cg_annotate like this:

cg_annotate --auto=yes --show=I2mr --sort=I2mr cachegrind.out.optgN > ann-optgN-I2mr

Profiling ‘js’ with Shark

To profile ‘js’ with Shark, I use SunSpider’s –shark20 and –test options.  I don’t have this automated yet, I probably should.

Managing Changes with Mercurial

Most of my changes are not that large, so I leave them uncommitted in a workspace.  This is primitive, but has one really nice feature:  when pulling and updating, hg merges the changes and marks conflicts in the nice “<<<” “>>>” way.

In comparison, with Mercurial queues (which I tried for a while) you have to pop your patches, update, then push them, and it uses ‘patch’ to do the merging.  And I hate ‘patch’ because conflicted areas tend to be larger, and because they go in a separate reject file rather than being inserted inline.

I also avoid doing local commits unless I’m working on something really large just because the subsequent merging is difficult (at least, I think it’s difficult;  my Mercurial knowledge still isn’t great).  In that case I do local commits until the change is finished, then apply the patch (using ‘hg diff’ and ‘patch’) in a single hit to a newly cloned tree — given Mozilla’s use of Bugzilla, the change will have to be a single patch anyway so this aggregation step has to happen at some point.

Pre-push Checklist

Before landing any patch, I do my best to work through the following check-list.  I created this list recently after having to back out several commits due to missing one of the above steps;  I give examples of breakage I’ve caused in square brackets.

  • Ensure there are no new compiler warnings for ‘js’ for optimised and debug builds.  [I managed to introduce some warnings on an optimised build recently for what was supposedly a whitespace-only change!]
  • Ensure ‘js’ runs trace-test.js without failures, for optimised builds, debug builds, debug builds with TMFLAGS=full (to test the verbose output) under Valgrind (to test for memory errors).  [I've had to back out several patches due to breaking TMFLAGS=full]
  • Ensure lirasm builds and passes its tests for both optimised and debug builds.  [I've forgotten this numerous times, leaving lirasm in a broken state, which is why I created bug 503449].
  • Ensure unit tests pass with a debug build.  [Amusingly enough, I don't think I've ever caused breakage by forgetting this step!]
  • (For any commit that might affect performance) Check SunSpider timings with an optimised build.
  • (For complex changes) Check the patch on the try servers. (Nb: they run optimised builds, so will miss assertion failures among other things)
  • (For changes affecting the ARM backend) Check the patch builds and runs trace-test.js (using a debug build) on my virtual qemu+ARM/Linux machine.
  • Check tinderbox to make sure the tree is open for commits.  [When the tree is closed, there's no mechanism that actually prevents you from committing.  I had to back-out a patch during a tinderbox upgrade because of this.]

It’s quite a list, and I don’t usually do anything with a browser build, when I probably should, so that would make it even longer.  And there are other things to get wrong… for example, I never test the –disable-jit configuration and I broke it once.

Pushing

When I’m ready to push a change, I make sure my workspaces are up-to-date with respect to the Mozilla repo.  I then commit the change to my modified repo, then push it from there into ~/moz/ws0/, then check ‘hg outgoing -p’ on that repo to make sure it looks ok, and then push to the Mozilla repo from there.  I try to do this quickly so that no-one else lands something in the meantime;  this has only happened to me once and I tried to use ‘hg rollback’ to undo my local changes which I think should have worked but seemingly didn’t.

Post-push Checklist

After committing, I do these steps:

  • Mark the bug’s “whiteboard” field as “fixed-in-tracemonkey”.
  • Put a link to the commit in a comment for the bug, of the form http://hg.mozilla.org/tracemonkey/rev/<revhash>/.  I always test the link before submitting the comment.

Conclusions

That’s a lot of stuff.  Two of my more notable conclusions are:

  • Automation is a wonderful thing.  In particular, having scripts for the complicated tasks (e.g. running the unit tests, running sunspider, running sunspider under Cachegrind) has saved me lots of time and typing (and lots of head-scratching and re-running when I realised I forgot some command line option somewhere).  And this automation was made much easier once I settled on a standard workspace+build layout.
  • The pre-push checklist is both disconcertingly long and disconcertingly incomplete.  And I had to work it out almost entirely by myself — I’m not aware of any such check-list documented anywhere else.  Having lots of possible configurations really hurts testability.  I’m not sure how to improve this.

If you made it this far, congratulations!  That was pretty dry, especially if you’re not a Tracemonkey developer.  I’d love to hear suggestions for improving what I’m doing.