Categories: Personal

Zachary Blake Nethercote

Nick, Phoebe and Keira are pleased to announce the birth of Zachary Blake Nethercote, born on May 24, 2010, weighing 3.45kg (7lb 9oz).

Everyone is healthy and doing well!

Categories: Correctness, Tinderbox

Green

I landed a TraceMonkey patch yesterday that was completely green on the tinderbox pushlog:

All green TBPL!

I don’t think that’s ever happened to me before.  Even the full tinderbox was completely green except for a single burning N810 Talos column.

I pushed another patch afterwards; that one didn’t get all green 🙁

Categories: Correctness, Programming, Tracemonkey

A win for code hygiene

I’ve written recently about clean-ups I’ve been making in Nanojit — simplifying code, refactoring, adding more internal sanity checking.  I’m a firm believer that these kinds of changes are worth doing, that “if it ain’t broke, don’t fix it” is not always true.  But there’s always a voice in the back of my head wondering “am I just polishing this code for little gain?  Should I be working on something else instead?”  Yesterday I received confirmation that this work has been worthwhile.  But before I can explain why, I need to give some background.

Nanojit serves as the back-end of TraceMonkey.  It converts LIR (a low-level intermediate representation) generated by TraceMonkey’s front-end into native code.  So Nanojit is a critical part of TraceMonkey.  And TraceMonkey is a critical part of Firefox.

However, code generators (and register allocators) are tricky beasts — easy to get wrong and hard to debug.  (A long time ago I heard someone, I can’t remember who, say “code generators and garbage collectors should crash as early and noisily as possible”.)  So with a piece of code this important and inherently difficult, it’s a good idea to arm yourself with some weapons that give you confidence that it is correct.

Weapon 1: clean IR semantics

One way to do this is to give your IR clean semantics.  LIR’s semantics are mostly clean, but there are a few nasty cases.  One of the worst is the ‘ov’ instruction.  It performs an overflow check on an arithmetic (add, mul, sub, neg) operation.

JavaScript numbers are doubles by default, but TraceMonkey sometimes demotes them to integers for speed.  TraceMonkey has to check for integer overflow in these cases;  if an integer does overflow TraceMonkey has to step back and use doubles for the computation.

Here’s some example LIR that uses ‘ov’:

 ld6 = ld sp[-8]
 add1 = add ld6, 1
 ov1 = ov add1
 xt4: xt ov1 -> pc=0x883ed8 imacpc=0x0 sp+0 rp+0 (GuardID=218)

The first instruction loads a value from the stack.  The second instruction adds 1 to that value.  The third instruction performs an overflow check and puts the result in ‘ov1’.  The fourth instruction is a guard;  if ‘ov1’ is true it exits the code block.

So why is ‘ov’ semantically nasty?  First consider ‘add’, which is a semantically cleaner instruction.  Its result (in this case ‘add1’) is a function of its inputs (‘ld6’ and 1).  In comparison, ‘ov’ does not have this property.  The ‘ov’ doesn’t really take ‘add1’ as an input — you can’t tell by looking at ‘add1’ whether the addition overflowed.  In fact you can’t really understand what is happening here at the LIR level;  you have to know what happens in the native code that is generated from this LIR.  It turns out that the native code for the ‘add’ also sets some condition codes, and the native code generated for the ‘ov’/‘xt’ pair inspects those condition codes.  For this to work, the ‘add’ has to immediately precede the ‘ov’;  if it does not, any intermediate native code that is generated could overwrite the condition codes, in which case the behaviour of the guard becomes completely unpredictable.  This is obviously bad.

The real problem is that the addition has two outputs:  the result, and the overflow status indicator (which maps to condition codes in the hardware).  But LIR has no explicit way to represent that second output.  So the second output is implicit, and an extra constraint must be imposed on the LIR instead, i.e. ‘ov’ must immediately follow an arithmetic operation.  Because this constraint is so arbitrary it’s easy to forget and easy to break.
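
To make the “two outputs” point concrete, here is a small standalone C++ sketch (my own illustration, not TraceMonkey code; the function name and types are invented) of what a checked 32-bit addition conceptually computes: a sum plus a separate overflow flag.  LIR’s ‘ov’ left that second output implicit, which is exactly where the ordering constraint came from.

#include <cstdint>
#include <utility>

// Illustration only: a checked add conceptually has *two* outputs, the sum
// and an overflow flag.  LIR's 'ov' left the second output implicit.
std::pair<int32_t, bool> addWithOverflow(int32_t a, int32_t b)
{
    int64_t wide = static_cast<int64_t>(a) + static_cast<int64_t>(b);
    bool overflowed = (wide < INT32_MIN) || (wide > INT32_MAX);
    return std::make_pair(static_cast<int32_t>(wide), overflowed);
}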

In May 2009 Julian Seward opened a bug to improve LIR’s semantics, giving five suggestions.  It generated a lot of discussion and Julian drafted a patch to replace ‘ov’ with some cleaner opcodes, but there was enough disagreement that it never went anywhere.  But I didn’t forget about it, and ‘ov’ has annoyed me enough times recently that I decided to resurrect Julian’s idea, this time in a new bug to escape the baggage of the old one.  The idea is simple: if ‘ov’ always has to follow an arithmetic operation, then why not create new opcodes that fuse the two parts together?  These new opcodes are ‘addxov’, ‘subxov’ and ‘mulxov’.  With my patch the code from above now looks like this:

 ld6 = ld sp[-8]
 add1 = addxov ld6, 1 -> pc=0x883ed8 imacpc=0x0 sp+0 rp+0 (GuardID=218)

The ‘addxov’ adds ‘ld6’ and 1, puts the result in ‘add1’, and exits if there was an overflow.  The generated native code is unchanged, but the implicit output of the addition is now hidden within a single LIR instruction, and it’s impossible for overflow checks in the native code to become separated from the potentially-overflowing arithmetic operation.

Weapon 2: strong IR sanity checking

Compilers are usually built in a pipeline fashion, where you have multiple passes over the code representation, and each pass does a task such as optimisation, register allocation or code generation.  It’s also a really good idea to have a pass that does a thorough sanity check of the code representation.

In November 2008 Jim Blandy filed a bug suggesting such a pass for Nanojit, one that performs type-checking of the LIR.  The bug sat untouched for over six months until some extra discussion occurred and then (once again) Julian wrote a patch.  It found at least one case where TraceMonkey was generating bad LIR code, but again the bug stalled, this time because we spent three months merging Mozilla’s and Adobe’s copies of Nanojit.  I resurrected the bug again recently and added some “structure checking” for things like the ‘ov’ constraint, and it landed in the TraceMonkey repository on January 21st.  Happily, my version of the checker was simpler than Julian’s;  this was possible because LIR had had some other semantic clean-ups (e.g. properly distinguishing 64-bit integers from 64-bit floats).  My patch replaced a very basic type-checker (called “SanityFilter”) that had been in TraceMonkey, and immediately found two bugs, one of which involved ‘ov’.
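
To give a feel for what such “structure checking” might look like, here is a minimal sketch in the same spirit.  It is not the real Nanojit checker: the types and names (Ins, checkOvConstraint, and so on) are invented for illustration, and the real pass runs within Nanojit’s writer pipeline rather than over a simple vector.

#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-ins for LIR instructions -- not Nanojit's real classes.
enum Opcode { OP_LD, OP_ADD, OP_SUB, OP_MUL, OP_OV, OP_XT };

struct Ins {
    Opcode op;
    const Ins* arg0;   // first operand, if any
};

static bool isArith(Opcode op) {
    return op == OP_ADD || op == OP_SUB || op == OP_MUL;
}

// Check the structural constraint discussed above: an 'ov' must take an
// arithmetic instruction as its operand, and that instruction must be the
// one immediately preceding it in the instruction stream.
void checkOvConstraint(const std::vector<const Ins*>& code)
{
    for (size_t i = 0; i < code.size(); i++) {
        const Ins* ins = code[i];
        if (ins->op != OP_OV)
            continue;
        assert(ins->arg0 && isArith(ins->arg0->op) &&
               "ov's operand must be an arithmetic instruction");
        assert(i > 0 && code[i - 1] == ins->arg0 &&
               "ov must immediately follow its arithmetic operand");
    }
}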

Synergy

It’s not often that a bug report involving your code makes you smile.  Yesterday Gary Kwong hit an assertion failure in Nanojit when fuzz-testing:

    LIR structure error (end of writer pipeline):
    in instruction with opcode: ov
    argument 1 has opcode: add
    it should be: located immediately prior, but isn't
  One way to debug this:  change the failing NanoAssertMsgf(0, ...) call to a
  printf(...) call and rerun with verbose output.  If you're lucky, this error
  message will appear before the block containing the erroneous instruction.

In other words, there’s an ‘ov’ following an ‘add’, but there are one or more other LIR instructions between them.  This means that the code generated will be bogus, as I explained above.  This bug may have been in TraceMonkey for a long time, but it wasn’t detected until strong internal sanity checking was added.  The good news is that the ‘ov’-removal patch (which hasn’t landed as it’s still awaiting review) will fix this bug.

Lessons learned

This story tied together a number of related ideas for me.  In particular:

  • Clean IR semantics are worth the effort.  Hacks may save time initially but they will come back to bite you.
  • Strong IR sanity checking is worth the effort.  (And cleaner semantics allow for stronger checking.)
  • Listen to Julian, especially when he talks about correctness.  (And read his code!  VEX/pub/libvex_ir.h in the Valgrind source code describes Valgrind’s IR, which is a wonderfully clean IR, particularly in the way it separates statements (which have side-effects) from expressions (which don’t).)
  • Don’t put too many ideas into one bug report.  Too much discussion can kill a bug, or at least put it in a coma.
  • Follow-through is important.  A patch can’t do much good unless it lands.
  • Fuzz testing is awesome (but we already knew that).
Categories: Programming, Tracemonkey

My Contribution to Firefox 3.6

Firefox 3.6 is the first Firefox release that has my code in it.  Much of it was written over six months ago, so it’s nice that it’s finally seeing the light of day.  It’s also fun to know that my code is running on the machines of my friends and family;  that’s new for me.

One friend asked what my specific contribution to the release was.  I said “in some cases it might be a little faster and a little less likely to crash”.  At the December all-hands, when people asked me what I’ve been working on, I said “filing the sharp edges off Nanojit“.  Nanojit is the JIT compiler back-end used by TraceMonkey;  it translates a low-level intermediate representation called LIR into native code.  It has some clever features and does a pretty good job in many respects but a year ago a lot of its code was, well, awful.  It’s now much better:  simpler, faster, with fewer ways to go wrong, more internal sanity checking, and better testing (including some fuzzing).  And many of these improvements didn’t make it into 3.6.

I have a few more fixes to make, but Nanojit improvements are winding down, and my attention will soon be focused elsewhere within TraceMonkey.  I thought that I would be spending some time this year improving the quality of Nanojit’s code generation, but I’ve found that the quality is already pretty good.  This perhaps isn’t so surprising because LIR is so low-level that it’s pretty close to machine code.  However, while determining that, I found that the LIR generated by the TraceMonkey front-end is often awful.  It looks like there is some fat to trim there!

Categories: Correctness, Programming, Tracemonkey, Work habits

Safer refactoring in Nanojit

Recently I’ve been making a lot of refactoring changes to Nanojit that I think of as “code hygiene” — changes that clean up the code, but don’t change its behaviour at all.  (Example 1, example 2.)  When making changes like these I like to do the changes carefully and gradually to make sure that I don’t affect behaviour unintentionally.  In Nanojit’s case, this means that the generated code shouldn’t change.  But it’s always possible to make mistakes, and sometimes a particular change is disruptive enough that you can’t evolve the code from one state to another without breaking it temporarily.

In such cases, if the regression tests pass, that tells me that the new code works, but it doesn’t tell me whether I’ve managed to avoid changing the behaviour as I intended.  So I’ve started doing comparisons of the native code generated by Nanojit for parts of SunSpider using TraceMonkey’s verbose output, which is controlled with the TMFLAGS environment variable.  For such changes, the generated code should be identical to that of an unchanged version.  The way I organise my workspaces is such that I always have an unchanged repository available, which makes such comparisons easy.

Except, the generated code is never quite identical because it contains addresses and the slightest change in the executable can affect these.  So I run the verbose output through this script:

perl -p -e 's/[0-9a-fA-F]{4,}/....../g'

It just replaces hexadecimal numbers with four or more digits with “......”, which is enough to filter out these address differences.
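
Put together, a typical comparison looks something like the following sketch.  The TMFLAGS value, test file and workspace numbers here are illustrative only; the point is just that both builds’ verbose output goes through the same filter before being diffed.

# Sketch only: filter the verbose output of an unchanged build (ws0) and a
# modified build (ws3), then diff them.  Exact TMFLAGS value and test file
# are illustrative.
TMFLAGS=full ~/moz/ws0/js/src/debug/js -j some-benchmark.js 2>&1 \
    | perl -p -e 's/[0-9a-fA-F]{4,}/....../g' > old-code.txt
TMFLAGS=full ~/moz/ws3/js/src/debug/js -j some-benchmark.js 2>&1 \
    | perl -p -e 's/[0-9a-fA-F]{4,}/....../g' > new-code.txt
diff old-code.txt new-code.txt   # should produce no output for a no-op change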

I tried this technique on all of SunSpider, but it turns out it doesn’t work reliably, because crypto-aes.js is non-deterministic — it contains a call to Date.getTime() which introduces enough non-determinism that TraceMonkey sometimes traces different code.  A couple of the V8 benchmarks have similar non-deterministic behaviours.  Non-determinism is a rather undesirable feature of a benchmark suite so I filed a SunSpider bug which is yet to be acted on.

Fortunately, experience has shown me that I don’t need to do code diffs on all of SunSpider to be confident that I haven’t changed Nanojit’s behaviour — doing code diffs on two or three of the bigger benchmarks suffices, as enough code is generated that even small behavioural differences are quickly found.

In fact, this code diff technique is useful enough that I’ve started using it sometimes even when I do want to change behaviour — I split such changes into two parts, one which doesn’t change the behaviour and one which does.  For example, when I added an opcode to distinguish between 64-bit integer loads and 64-bit float loads, I landed a non-behaviour-changing patch first, then did a follow-up change that improved x86-64 code generation.  This turned out to be a good idea because I introduced a regression with the follow-up change, and the fact that it was separated from the non-behavioural changes made it easy to determine what had gone wrong, because the regressing patch was much smaller than a combined patch would have been.

I also recently wrote a script to automate the diff process, which makes it that much easier to perform, which in turn makes it that much more likely that I’ll actually do it.

In summary, it’s always worth thinking about new ways to test your code, and when you find a good one, it’s worth making it as easy as possible to run those tests.

Categories: Uncategorized

Nanojit test coverage

On i386, Nanojit has two basic modes of operation:  SSE, and non-SSE.  Non-SSE is for old machines that don’t support SSE instructions.  (It might actually be SSE2 instructions, I’m not sure.)  My two machines both support SSE and so the non-SSE code is never exercised unless I specify the environment variable X86_FORCE_SSE=no.  Since this invocation doesn’t exactly roll off the fingertips, I don’t do it often.  It’s also easy to mistype, in which case the normal SSE code will be run but I probably won’t notice, and so I’m testing something different to what I think I am testing.

I recently committed a patch (bug 516348) that broke the non-SSE mode.  (It may have also broken the SSE mode, but in a less obvious way.)  Whenever I land a patch that breaks something, I try to work out if I could have avoided the breakage through a better process.  In this case I could have, through automation.  I now have the following set of aliases and functions in my .bashrc:

alias jstt_prefix="python trace-test/trace-test.py"
JSTTARGS32="--no-slow -f -x sunspider/check-date-format-tofte.js"
JSTTARGS64="$JSTTARGS32"

alias jsdtt32="                   jstt_prefix debug32/js $JSTTARGS32"
alias jsott32="                   jstt_prefix opt32/js   $JSTTARGS32"
alias jsdtt32b="X86_FORCE_SSE2=no jstt_prefix debug32/js $JSTTARGS32"
alias jsott32b="X86_FORCE_SSE2=no jstt_prefix opt32/js   $JSTTARGS32"
alias jsott64="                   jstt_prefix opt64/js   $JSTTARGS64"
alias jsdtt64="                   jstt_prefix debug64/js $JSTTARGS64"

function jsatt
{
  if [ -d debug32 ] && [ -d debug64 ] && [ -d opt32 ] && [ -d opt64 ] ; then
    cd debug32 && mq && cd .. && \
    cd debug64 && mq && cd .. && \
    cd opt32 && mq && cd .. && \
    cd opt64 && mq && cd ..

    if [ $? -eq 0 ] ; then
      echo
      echo "debug32"          && jsdtt32   && echo
      echo "debug32 (no SSE)" && jsdtt32b  && echo
      echo "debug64"          && jsdtt64   && echo
      echo "opt32"            && jsott32   && echo
      echo "opt32 (no SSE)"   && jsott32b  && echo
      echo "opt64"            && jsott64   && echo
    fi
  else
    echo "missing one of debug32/debug64/opt32/opt64"
  fi
}

The code is boring.  For those reading closely, it relies on the fact that I always put different builds in the directories debug32/, opt32/, debug64/, opt64/.  And I skip check-date-format-tofte.js because it fails if you’re in a non-US timezone;  see bug 515214.

I already had ‘jsdtt32’ et al, each of which tests a single configuration.  But now with a single command ‘jsatt’ (which is short for “JavaScript All trace-tests”) I can run the JS trace-tests on 6 different configurations on a single x86-64 machine: 32-bit debug (SSE), 32-bit debug (non-SSE), 64-bit debug, 32-bit optimised (SSE), 32-bit optimised (non-SSE), 64-bit optimised.  And it builds them to make sure they’re all up-to-date.

It’s only a small process change for me, but it means that it’s unlikely I will break any of these configurations in the future, or at least, not in a way that shows up in the trace-tests.

Categories: Mac OS X, Valgrind

Valgrind and Mac OS 10.6

A new entry in the annals of unfortunate software release dates:

  • On August 19, Valgrind 3.5.0 was released. It added support for Mac OS 10.5.
  • On August 28, Mac OS 10.6 was released.
  • Valgrind 3.5.0 does not support Mac OS 10.6.

If you try to install Valgrind on a machine running Mac OS 10.6, it will fail at configure-time.  If you hack the configure file appropriately so that the install completes, Valgrind will run but crash quickly on any program.  Bug 205241 has the details.  Greg Parker says he has a series of patches to make Valgrind work and he’s just waiting for the open source release of xnu (the core of Mac OS X) before making them public.  With some luck, these fixes will make it into Valgrind 3.5.1 relatively soon.

However, once that’s fixed, there’s another problem.  Mac OS 10.6 uses 64-bit executables by default.  In comparison, 10.5 uses 32-bit executables by default, even though it’s capable of creating and running 64-bit executables.  Unfortunately Valgrind’s support for 64-bit executables on Mac OS X isn’t very good.  The main problem is that start-up is sloooooow, which means that even Hello World takes over four seconds to run on my MacBook Pro.  Fixing this one will be harder, as it will require reworking the Mac OS X start-up sequence.  Bug 205938 is tracking this problem.

Related to this: does anyone know if there is an easy way to have both 10.5 and 10.6 installed on a single machine?  That would be a big help when it comes to developing and testing Valgrind’s Mac OS X support.

Categories: C, Correctness, Cplusplus, Programming

“No-else-after-return” considered harmful

(Update below)

At the time of writing this, the Mozilla Coding Style guidelines have this recommendation under “General C/C++ Practices”:

Don’t put an else right after a return. Delete the else, it’s unnecessary and increases indentation level.

I can appreciate this in some circumstances, such as when checking for an error case early in a long function:

void f()
{
    if (errorcase())
        return;

    a();
    b();
    c();
    d();
}

But as a blanket guideline I think it’s misguided because it promotes an impure style of programming.  And by “impure” here I mean the opposite of “pure” in this sense, i.e. “free from side-effects and other such error-prone junk”.  Let’s consider a simple example:  the max() function.  Here’s a way to write it that is quite sensible, in my opinion:

int max(int a, int b)
{
    if (a > b)
        return a;
    else
        return b;
}

But this violates the no-else-after-return guideline.  The Mozilla way to write this is like so:

int max(int a, int b)
{
    if (a > b)
        return a;
    return b;
}

(I’m not exaggerating, see this comment and this comment from Mozilla’s VP of Engineering and CTO respectively, about an example very similar to this.)

Urk.  That’s horrible.  Any time you’re using an ‘if’ without a corresponding ‘else’, you’re relying on an impure feature — you must be doing something with a side-effect in the ‘then’ branch, or some kind of impure control flow.  Mozilla’s coding guidelines insist that we write max(), a pure function, in an impure way.  Also, it’s just harder to read because the two return statements have different indentation levels despite the fact that they are closely related.

In this case there’s a way around the problem:

int max(int a, int b)
{
    return (a > b ? a : b);
}

Even though C’s if-then-else expression syntax is awful, it’s a better way to do it.

In general, statements suck, expressions rule.  Even in a language like C/C++ you can do something approaching functional programming by avoiding statements where possible.  In a tiny example like max() the difference may seem trivial, but in bigger examples the difference becomes more important.  Consider the following code from Nanojit’s register allocator which picks a register by looking at the free registers, the registers that are allowed to be used, and possibly the preferred registers.  You don’t have to understand it closely, but it helps to know that RegisterMask is just a bitfield with one bit per register.

Register Assembler::registerAlloc(RegisterMask allow)
{
    RegAlloc &regs = _allocator;
    RegisterMask prefer = SavedRegs & allow;
    RegisterMask free = regs.free & allow;

    RegisterMask set = prefer;
    if (set == 0) set = allow;

    if (free)
    {
        // at least one is free
        set &= free;

        // ok we have at least 1 free register so let's try to pick
        // the best one given the profile of the instruction
        if (!set)                                                             
        {
            // desired register class is not free so pick first of any class
            set = free;
        }
        NanoAssert((set & allow) != 0);
        Register r = nRegisterAllocFromSet(set);
        regs.used |= rmask(r);
        return r;
    }

    // ... more code here ...
}

I find this code incredibly hard to understand.  There are three ‘ifs’ that lack ‘elses’, so it relies heavily on state updates to do its work.  In fact, in order to understand it, I ended up rewriting it in a more pure style, by using a sequence of small refactorings:

Register Assembler::registerAlloc(RegisterMask allow)
{
    RegAlloc &regs = _allocator;
    RegisterMask allowedAndFree = allow & regs.free;

    if (allowedAndFree)
    {
        // At least one usable register is free -- no need to steal.  
        // Pick a preferred one if possible.
        RegisterMask preferredAndFree = allowedAndFree & SavedRegs;
        RegisterMask set = ( preferredAndFree ? preferredAndFree : allowedAndFree );
        Register r = nRegisterAllocFromSet(set);
        regs.used |= rmask(r);
        return r;
    }
    // ... more code here ...
}

There’s now only a single ‘if’ lacking an ‘else’, and it turns out even that can be removed by putting the “… more code here …” part into an else branch, and then the ‘return r;’ can be moved to the end of the function.
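
For completeness, that final form might look something like this (a sketch only, not compiled against Nanojit; the elided part is left as a placeholder and is assumed to assign ‘r’):

Register Assembler::registerAlloc(RegisterMask allow)
{
    RegAlloc &regs = _allocator;
    RegisterMask allowedAndFree = allow & regs.free;
    Register r;

    if (allowedAndFree)
    {
        // At least one usable register is free -- no need to steal.
        // Pick a preferred one if possible.
        RegisterMask preferredAndFree = allowedAndFree & SavedRegs;
        RegisterMask set = ( preferredAndFree ? preferredAndFree : allowedAndFree );
        r = nRegisterAllocFromSet(set);
        regs.used |= rmask(r);
    }
    else
    {
        // ... more code here (assumed to assign 'r') ...
    }
    return r;
}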

The code is now comprehensible — it picks a register that is free and allowed, and preferred as well if possible.

Part of the art of programming involves knowing when to avoid using a language feature.  Everybody knows that ‘goto’ is occasionally highly useful, but should be avoided whenever something less dangerous (a for-loop, a while-loop, a ‘break’ statement) can be used.  And everybody knows it because everybody is taught it.  But programmers aren’t taught to avoid unnecessary side-effects… at least, I wasn’t, certainly not via examples such as those I’ve given above.  I’ve only come to realise these things as I’ve switched several times between using C/C++ and some pure functional languages over the years, and come to appreciate how purity helps make programs more readable and less error-prone.

To come full circle, I’d like to see the no-else-after-return guideline removed from the Mozilla style guide, or at the very least, be qualified in a way that allows the cases I’ve argued for.  I’d also love to see a guideline (with examples) added such as “prefer expressions to statements”.

Update:

(The arguments had the wrong names in the initial post; I’ve fixed them now.)

All three versions of max() are pure, because it’s possible to combine impure pieces of code (such as an else-less if) into a pure piece of code.  So I’ll augment my recommendation: pure code is preferable, and furthermore, given that C/C++ makes it so easy to write impure code, it’s preferable that pure code be obviously pure. I argue that the 3rd version of max() with the ?: operator is the most obviously pure, because a ?: is pure if its operands are pure.  In comparison, it’s harder to tell if an if-then-else is pure (although this case is still pretty easy thanks to the matching return statements).  And an if-without-else almost never results in pure code;  the max() example above is a rare exception.

I also omitted a fourth formulation of max:

int max(int a, int b)
{
    int m;
    if (a > b)
        m = a;
    else
        m = b;
    return m;
}

For this very simple case, such a formulation is overkill.  But if the ‘then’ and ‘else’ cases are more complicated, it makes more sense.  (Furthermore, much of this discussion really applies to more complex cases, where these kinds of small decisions can make a bigger difference.)  This version also doesn’t violate the no-else-after-return rule.  So perhaps that rule isn’t as bad as I thought, if only because it’s easier to work around than I thought… but I still think rewriting the 1st version of max() into the 2nd version is a bad idea.

Categories: Mac OS X, Valgrind

Valgrind 3.5.0 has been released

Valgrind 3.5.0 has been released!  It’s the first release that supports Mac OS X.  It also adds a number of other new features and a whole lot of bug-fixes.  See the release notes for details.  Many thanks to everyone who contributed to this release.

Categories: Mercurial, Tracemonkey, Work habits

How I Work on Tracemonkey

After six months of working on Tracemonkey, I’ve built up a particular workflow — how I use Mercurial, arrange my workspaces, run tests, and commit code.  I thought it would be worth describing this in case it helps other developers improve their workflow, or perhaps so they can give me ideas on how to improve my own workflow.

Workspace Structure

I have two machines, an Ubuntu Linux desktop and a Mac laptop.  For both machines I use the same workspace structure.  All my Mozilla work is in a directory ~/moz/.  At any one time I have up to 10 workspaces.  ~/moz/ws0/ always contains an unmodified clone of the tracemonkey repository, created like so:

hg clone http://hg.mozilla.org/tracemonkey/ ~/moz/ws0

Workspaces ~/moz/ws1 through to ~/moz/ws9 are local clones of ~/moz/ws0/ in which I make modifications.  I create these workspaces like this:

hg clone ~/moz/ws0 ~/moz/wsN

Local hg clones are much cheaper than ones done over the network.  On my Linux box it takes about 45 seconds, and on my Mac somewhere over 2 minutes;  it seems that laptops have slower hard disks than desktops.  In comparison, cloning hg.mozilla.org/tracemonkey/ can take anywhere from 5 to 30 minutes or more (I don’t know why there’s so much variation there).

I mostly work with the JavaScript shell, called ‘js’, so I do most of my work in ~/moz/wsN/js/src/.  There are three ways I commonly build ‘js’.

  • Debug builds go in ~/moz/wsN/js/src/debug/.  I use these for most of my development and testing.
  • Optimised builds go in ~/moz/wsN/js/src/opt/.  I use these for measuring performance.
  • Optimised builds with symbols go in ~/moz/wsN/js/src/optg/.  I use these with Cachegrind, which needs optimised code with symbols to be useful.

I have a number of bash aliases I use to move around these directories:

alias m="cd ~/moz/"
alias m0="cd ~/moz/ws0/"
alias j0="cd ~/moz/ws0/js/src/"
alias j0d="cd ~/moz/ws0/js/src/debug/"
alias j0o="cd ~/moz/ws0/js/src/opt/"

and so on for the remaining workspaces ws1 through ws9.  I have a common bash config file that I use on both my machines;  whenever I change it I copy it to the other machine.  This is a manual process, which is not ideal, but in practice it works well enough.

I find nine workspaces for making changes is enough to cover everything I’m doing;  if I find myself needing more it’s because some of the existing ones have stagnated and I need to do some cleaning up.

Building ‘js’

I have three scripts, js_conf_debug, js_conf_opt, js_conf_optg, which configure and build from scratch.  Here is js_conf_debug, the others are similar:

#! /bin/sh

if [ -z $1 ] ; then
    echo "usage: $0 <dirname>"
elif [ -d $1 ] ; then
    echo "directory $1 already exists"
else
    autoconf2.13
    mkdir $1
    cd $1
    CC='gcc -m32' CXX='g++ -m32' AR=ar ../configure \
        --enable-debug --disable-optimize --target=i686-pc-linux-gnu
    make --quiet -j 2
fi

These are scripts rather than bash aliases or functions because they are quite different on the Linux machine and the Mac.

I also have this alias for incremental builds:

alias mq="make --quiet -j 2"

Testing ‘js’

The program I run most is trace-test.js.  So much so that I have more aliases for it:

alias jsott="time opt/js -j trace-test.js"
alias jsdtt="time debug/js -j trace-test.js"

I don’t need an alias for the optg build because that’s only used with Cachegrind, which I run in a different way (see below).

I run the JS unit tests with the following bash function:

function js_regtest
{
    x=$1
    y=$2
    if [ -z $x ] || [ -z $y ] ; then
        echo "usage: js_regtest <ws-number-1> <ws-number-2>"
    else
        xdir=$HOME/moz/ws$x/js/src/debug
        ydir=$HOME/moz/ws$y/js/src/debug
        echo "############################"
        echo "## COMPILING $xdir"
        echo "############################"
        cd $xdir && mq
        echo "############################"
        echo "## COMPILING $ydir"
        echo "############################"
        cd $ydir && mq
        cd $ydir/../../tests
        echo "############################"
        echo "## TESTING $xdir"
        echo "############################"
        time jsDriver.pl \
            -k \
            -e smdebug \
            --opt '-j' \
            -L spidermonkey-n.tests slow-n.tests \
            -f base.html \
            -s $xdir/js && \
        echo "############################"
        echo "## TESTING $ydir"
        echo "############################"
        time jsDriver.pl \
             -k \
             -e smdebug \
             --opt '-j' \
             -L spidermonkey-n.tests slow-n.tests \
             -L base-failures.txt \
             -s $ydir/js
    fi
}

An example invocation would be:

js_regtest 0 3

The above invocation first ensures a debug ‘js’ is built in workspaces 0 and 3.  Then it runs ~/moz/ws0/js/src/debug/js in order to get the baseline failures, which are put in base-failures.txt.  Then it runs ~/moz/ws3/js/src/debug/js and compares the results against the baseline.  The -L lines skip the tests that are really slow;  without them it takes hours to run.  I time each invocation just so I always know roughly how long it takes;  it’s a bit over 10 minutes to do both runs.  It assumes that workspaces 0 and 3 correspond to the same hg revision;  perhaps I could automate that to guarantee it, but I haven’t (knowingly) got that wrong yet so haven’t bothered to do so.

Timing ‘js’

I time ‘js’ by running SunSpider.  I obtained it like so:

svn co http://svn.webkit.org/repository/webkit/trunk/SunSpider ~/moz/SunSpider

I haven’t updated it in a while;  I hope it hasn’t changed recently!

I run it with this bash function:

function my_sunspider
{
    x=$1
    y=$2
    n=$3
    if [ -z $x ] || [ -z $y ] || [ -z $n ] ; then
        echo "usage: my_sunspider <ws-number-1> <ws-number-2> <number-of-runs>"
    else
        for i in $x $y ; do
            dir=$HOME/moz/ws$i/js/src/opt
            cd $dir || exit 1
            make --quiet || exit 1
            cd ~/moz/SunSpider
            echo "############################"
            echo "####### TESTING ws$i #######"
            echo "############################"
            time sunspider --runs=$n --args='-j' --shell $dir/js > opt$i
         done

         my_sunspider_compare_results $x $y
    fi
}

function my_sunspider_compare_results
{
    x=$1
    y=$2
    if [ -z $x ] || [ -z $y ] ; then
        echo "usage: my_sunspider_compare_results <ws-number-1> <ws-number-2>"
    else
        sunspider-compare-results \
            --shell $HOME/moz/ws$x/js/src/opt/js opt$x opt$y
    fi
}

An invocation like this:

my_sunspider 0 3 100

will ensure that optimised builds in both workspaces are present, and then compare them by doing 100 SunSpider runs.  That usually gives what SunSpider claims as +/-0.1% variation (I don’t believe it, though).  On my Mac this takes about 3.5 minutes, and 100 runs is enough that the results are fairly reliable, certainly more so than the default of 10 runs.  But when testing a performance-affecting change I like to do some timings, wait until a few more patches have landed in the tree, then update and rerun the timings — on my Mac I see variations of 5-10ms regularly due to minor code differences.  Timing multiple versions like this gives me a better idea of whether a timing difference is real or not.  Even then, it’s still not easy to know for sure, and this can be frustrating when trying to work out if an optimisation I applied is really giving a 5ms speed-up or not.

On my Linux box, I have to use 1000 runs to get +/-0.1% variation.  This takes about 25 minutes, so I rarely do performance-related work on this machine.  I don’t know why Linux causes greater timing variation.

Profiling ‘js’ with Cachegrind

I run Cachegrind on ‘js’ running SunSpider with this bash function:

function cg_sunspider
{
    x=$1
    y=$2
    if [ -z $x ] || [ -z $y ] ; then
        echo "usage: cg_sunspider <ws-number-1> <ws-number-2>"
    else
        for i in $x $y ; do
            dir=$HOME/moz/ws$i/js/src/optg
            cd $dir || exit 1
            make --quiet || exit 1
            cd ~/moz/SunSpider
            time valgrind --tool=cachegrind --branch-sim=yes --smc-check=all \
                --cachegrind-out-file=cachegrind.out.optg$i \
                --auto-run-dsymutil=yes \
                $dir/js `cat ss0-args.txt`
            cg_annotate --auto=yes cachegrind.out.optg$i > ann-optg$i
        done
    fi
}

ss0-args.txt contains this text:

-j -f tmp/sunspider-test-prefix.js -f resources/sunspider-standalone-driver.js

What this does is run just the main SunSpider program, once, avoiding all the start-up processes and all that.  This is important for Cachegrind — it means that I can safely use --cachegrind-out-file to name a specific file, which is not safe if running Cachegrind on a program involving multiple processes.   (I think this is slightly dangerous… if you run ‘sunspider --ubench’ it seems to change one of the above .js files and you have to rerun SunSpider normally to get them back to normal.)  I use --branch-sim=yes because I often find it to be useful; at least twice recently it has helped me identify performance problems.

If I want to focus on a particular Cachegrind statistic, e.g. I2mr (level 2 instruction read misses), D2mr (level 2 data read misses) or Bim (indirect branch mispredictions), then I rerun cg_annotate like this:

cg_annotate --auto=yes --show=I2mr --sort=I2mr cachegrind.out.optgN > ann-optgN-I2mr

Profiling ‘js’ with Shark

To profile ‘js’ with Shark, I use SunSpider’s --shark20 and --test options.  I don’t have this automated yet;  I probably should.

Managing Changes with Mercurial

Most of my changes are not that large, so I leave them uncommitted in a workspace.  This is primitive, but has one really nice feature:  when pulling and updating, hg merges the changes and marks conflicts in the nice “<<<” “>>>” way.

In comparison, with Mercurial queues (which I tried for a while) you have to pop your patches, update, then push them, and it uses ‘patch’ to do the merging.  And I hate ‘patch’ because conflicted areas tend to be larger, and because they go in a separate reject file rather than being inserted inline.

I also avoid doing local commits unless I’m working on something really large just because the subsequent merging is difficult (at least, I think it’s difficult;  my Mercurial knowledge still isn’t great).  In that case I do local commits until the change is finished, then apply the patch (using ‘hg diff’ and ‘patch’) in a single hit to a newly cloned tree — given Mozilla’s use of Bugzilla, the change will have to be a single patch anyway so this aggregation step has to happen at some point.
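
As a rough sketch, that aggregation step looks something like the following.  The workspace numbers, file name and base revision are made up for illustration; only the ‘hg diff’ plus ‘patch’ combination comes from the description above.

# Sketch only: collapse a series of local commits in ws5 into a single patch
# and apply it to a fresh clone.  Revision and workspace numbers are made up.
cd ~/moz/ws5
hg diff -r <base-rev> -r tip > ~/moz/big-change.patch

hg clone ~/moz/ws0 ~/moz/ws6
cd ~/moz/ws6
patch -p1 < ~/moz/big-change.patch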

Pre-push Checklist

Before landing any patch, I do my best to work through the following check-list.  I created this list recently after having to back out several commits due to missing one of the steps below;  I give examples of breakage I’ve caused in square brackets.

  • Ensure there are no new compiler warnings for ‘js’ for optimised and debug builds.  [I managed to introduce some warnings on an optimised build recently for what was supposedly a whitespace-only change!]
  • Ensure ‘js’ runs trace-test.js without failures, for optimised builds, debug builds, debug builds with TMFLAGS=full (to test the verbose output), and debug builds under Valgrind (to test for memory errors).  [I’ve had to back out several patches due to breaking TMFLAGS=full]
  • Ensure lirasm builds and passes its tests for both optimised and debug builds.  [I’ve forgotten this numerous times, leaving lirasm in a broken state, which is why I created bug 503449].
  • Ensure unit tests pass with a debug build.  [Amusingly enough, I don’t think I’ve ever caused breakage by forgetting this step!]
  • (For any commit that might affect performance) Check SunSpider timings with an optimised build.
  • (For complex changes) Check the patch on the try servers. (Nb: they run optimised builds, so will miss assertion failures among other things)
  • (For changes affecting the ARM backend) Check the patch builds and runs trace-test.js (using a debug build) on my virtual qemu+ARM/Linux machine.
  • Check tinderbox to make sure the tree is open for commits.  [When the tree is closed, there’s no mechanism that actually prevents you from committing.  I had to back out a patch during a tinderbox upgrade because of this.]

It’s quite a list, and I don’t usually do anything with a browser build, though I probably should, which would make it even longer.  And there are other things to get wrong… for example, I never test the --disable-jit configuration, and I broke it once.

Pushing

When I’m ready to push a change, I make sure my workspaces are up-to-date with respect to the Mozilla repo.  I then commit the change to my modified repo, then push it from there into ~/moz/ws0/, then check ‘hg outgoing -p’ on that repo to make sure it looks ok, and then push to the Mozilla repo from there.  I try to do this quickly so that no-one else lands something in the meantime;  this has only happened to me once, and I tried to use ‘hg rollback’ to undo my local changes, which I think should have worked but seemingly didn’t.
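
In command form, that sequence is roughly the following.  The workspace number and commit message are illustrative; the tracemonkey URL is the one given earlier in the post.

# Sketch of the push sequence described above (details illustrative).
cd ~/moz/ws3
hg commit -m "Bug NNNNNN - fix something"   # commit in the modified repo
hg push ~/moz/ws0                           # push it into the clean clone

cd ~/moz/ws0
hg outgoing -p http://hg.mozilla.org/tracemonkey/   # check the final patch
hg push http://hg.mozilla.org/tracemonkey/          # then push for real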

Post-push Checklist

After committing, I do these steps:

  • Mark the bug’s “whiteboard” field as “fixed-in-tracemonkey”.
  • Put a link to the commit in a comment for the bug, of the form http://hg.mozilla.org/tracemonkey/rev/<revhash>/.  I always test the link before submitting the comment.

Conclusions

That’s a lot of stuff.  Two of my more notable conclusions are:

  • Automation is a wonderful thing.  In particular, having scripts for the complicated tasks (e.g. running the unit tests, running sunspider, running sunspider under Cachegrind) has saved me lots of time and typing (and lots of head-scratching and re-running when I realised I forgot some command line option somewhere).  And this automation was made much easier once I settled on a standard workspace+build layout.
  • The pre-push checklist is both disconcertingly long and disconcertingly incomplete.  And I had to work it out almost entirely by myself — I’m not aware of any such check-list documented anywhere else.  Having lots of possible configurations really hurts testability.  I’m not sure how to improve this.

If you made it this far, congratulations!  That was pretty dry, especially if you’re not a Tracemonkey developer.  I’d love to hear suggestions for improving what I’m doing.