Nicholas Nethercote – Page 27 – Notes on Rust, Firefox, MemShrink, JavaScript, and more

How I Work on Tracemonkey

After six months of working on Tracemonkey, I’ve built up a particular workflow — how I use Mercurial, arrange my workspaces, run tests, and commit code. I thought it would be worth describing this in case it helps other developers improve their workflow, or perhaps so they can give me ideas on how to improve my own workflow.

Workspace Structure

I have two machines, an Ubuntu Linux desktop and a Mac laptop. For both machines I use the same workspace structure. All my Mozilla work is in a directory ~/moz/. At any one time I have up to 10 workspaces. ~/moz/ws0/ always contains an unmodified clone of the tracemonkey repository, created like so:

hg clone http://hg.mozilla.org/tracemonkey/ ~/moz/ws0

Workspaces ~/moz/ws1 through to ~/moz/ws9 are local clones of ~/moz/ws0/ in which I make modifications. I create these workspaces like this:

hg clone ~/moz/ws0 ~/moz/wsN

Local hg clones are much cheaper than ones done over the network. On my Linux box it takes about 45 seconds, on my Mac somewhere over 2 minutes; it seems that laptops have slower hard disks than desktops. In comparison, cloning hg.mozilla.org/tracemonkey/ can take anywhere from 5 to 30 minutes or more (I don’t know why there’s so much variation there).

I mostly work with the JavaScript shell, called ‘js’, so I do most of my work in ~/moz/wsN/js/src/. There are three ways I commonly build ‘js’.

Debug builds go in ~/moz/wsN/js/src/debug/. I use these for most of my development and testing.
Optimised builds go in ~/moz/wsN/js/src/opt/. I use these for measuring performance.
Optimised builds with symbols go in ~/moz/wsN/js/src/optg/. I use these with Cachegrind, which needs optimised code with symbols to be useful.

I have a number of bash aliases I use to move around these directories:

alias m="cd ~/moz/"
alias m0="cd ~/moz/ws0/"
alias j0="cd ~/moz/ws0/js/src/"
alias j0d="cd ~/moz/ws0/js/src/debug/"
alias j0o="cd ~/moz/ws0/js/src/opt/"

and so on for the remaining workspaces ws1 through ws9. I have a common bash config file that I use on both my machines; whenever I change it I copy it to the other machine. This is a manual process, which is not ideal, but in practice it works well enough.
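The repeated aliases could also be generated in a loop rather than written out by hand. The following is only a sketch: gen_ws_aliases is a hypothetical helper name, and it covers just the four per-workspace patterns shown above plus the mN alias.

```shell
# Hypothetical helper: print the alias definitions for workspaces ws0..ws9.
# The alias names follow the m0/j0/j0d/j0o pattern listed above.
gen_ws_aliases() {
    for n in 0 1 2 3 4 5 6 7 8 9 ; do
        echo "alias m$n=\"cd ~/moz/ws$n/\""
        echo "alias j$n=\"cd ~/moz/ws$n/js/src/\""
        echo "alias j${n}d=\"cd ~/moz/ws$n/js/src/debug/\""
        echo "alias j${n}o=\"cd ~/moz/ws$n/js/src/opt/\""
    done
}

# In the bash config file, the definitions would be loaded with:
#   eval "$(gen_ws_aliases)"
```

Printing the definitions and loading them with eval keeps the generated text easy to inspect before it is used.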

I find nine workspaces for making changes is enough to cover everything I’m doing; if I find myself needing more it’s because some of the existing ones have stagnated and I need to do some cleaning up.

Building ‘js’

I have three scripts, js_conf_debug, js_conf_opt, js_conf_optg, which configure and build from scratch. Here is js_conf_debug; the others are similar:

#! /bin/sh

if [ -z "$1" ] ; then
    echo "usage: $0 <dirname>"
elif [ -d "$1" ] ; then
    echo "directory $1 already exists"
else
    autoconf2.13
    mkdir "$1"
    cd "$1" || exit 1
    CC='gcc -m32' CXX='g++ -m32' AR=ar ../configure \
        --enable-debug --disable-optimize --target=i686-pc-linux-gnu
    make --quiet -j 2
fi

These are scripts rather than bash aliases or functions because they are quite different on the Linux machine and the Mac.

I also have this alias for incremental builds:

alias mq="make --quiet -j 2"

Testing ‘js’

The program I run most is trace-test.js. So much so that I have more aliases for it:

alias jsott="time opt/js -j trace-test.js"
alias jsdtt="time debug/js -j trace-test.js"

I don’t need an alias for the optg build because that’s only used with Cachegrind, which I run in a different way (see below).

I run the JS unit tests with the following bash function:

function js_regtest
{
    x=$1
    y=$2
    if [ -z "$x" ] || [ -z "$y" ] ; then
        echo "usage: js_regtest <ws-number-1> <ws-number-2>"
    else
        xdir=$HOME/moz/ws$x/js/src/debug
        ydir=$HOME/moz/ws$y/js/src/debug
        echo "############################"
        echo "## COMPILING $xdir"
        echo "############################"
        cd $xdir && mq
        echo "############################"
        echo "## COMPILING $ydir"
        echo "############################"
        cd $ydir && mq
        cd $ydir/../../tests
        echo "############################"
        echo "## TESTING $xdir"
        echo "############################"
        time jsDriver.pl \
            -k \
            -e smdebug \
            --opt '-j' \
            -L spidermonkey-n.tests slow-n.tests \
            -f base.html \
            -s $xdir/js && \
        echo "############################"
        echo "## TESTING $ydir"
        echo "############################"
        time jsDriver.pl \
             -k \
             -e smdebug \
             --opt '-j' \
             -L spidermonkey-n.tests slow-n.tests \
             -L base-failures.txt \
             -s $ydir/js
    fi
}

An example invocation would be:

js_regtest 0 3

The above invocation first ensures a debug ‘js’ is built in workspaces 0 and 3. Then it runs ~/moz/ws0/js/src/debug/js in order to get the baseline failures, which are put in base-failures.txt. Then it runs ~/moz/ws3/js/src/debug/js and compares the results against the baseline. The -L lines skip the tests that are really slow; without them it takes hours to run. I time each invocation just so I always know roughly how long it takes; it’s a bit over 10 minutes to do both runs. It assumes that workspace 0 and 3 correspond to the same hg revision; perhaps I could automate that to guarantee it but I haven’t (knowingly) got that wrong yet so haven’t bothered to do so.
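The same-revision assumption could be checked automatically before the tests run. This is a minimal sketch, not something I actually use: the function names are hypothetical, and it relies on ‘hg id -i’ printing the short hash of the working directory’s parent, with a trailing “+” when there are uncommitted changes.

```shell
# Strip the trailing "+" that "hg id -i" appends when a workspace has
# uncommitted changes.
strip_dirty_marker() {
    printf '%s\n' "${1%+}"
}

# Succeed if two "hg id -i" results name the same underlying revision.
same_base_rev() {
    [ "$(strip_dirty_marker "$1")" = "$(strip_dirty_marker "$2")" ]
}

# Hypothetical guard: compare the revisions of ~/moz/ws<x> and ~/moz/ws<y>.
check_ws_revs() {
    rev_x=$(hg -R "$HOME/moz/ws$1" id -i) || return 1
    rev_y=$(hg -R "$HOME/moz/ws$2" id -i) || return 1
    if same_base_rev "$rev_x" "$rev_y" ; then
        return 0
    fi
    echo "ws$1 is at $rev_x but ws$2 is at $rev_y" >&2
    return 1
}
```

A call such as ‘check_ws_revs 0 3 || return 1’ at the top of js_regtest would stop the run before any time is spent on a mismatched comparison.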

Timing ‘js’

I time ‘js’ by running SunSpider. I obtained it like so:

svn co http://svn.webkit.org/repository/webkit/trunk/SunSpider ~/moz/SunSpider

I haven’t updated it in a while; I hope it hasn’t changed recently!

I run it with this bash function:

function my_sunspider
{
    x=$1
    y=$2
    n=$3
    if [ -z "$x" ] || [ -z "$y" ] || [ -z "$n" ] ; then
        echo "usage: my_sunspider <ws-number-1> <ws-number-2> <number-of-runs>"
    else
        for i in $x $y ; do
            dir=$HOME/moz/ws$i/js/src/opt
            cd $dir || return 1
            make --quiet || return 1
            cd ~/moz/SunSpider
            echo "############################"
            echo "####### TESTING ws$i #######"
            echo "############################"
            time sunspider --runs=$n --args='-j' --shell $dir/js > opt$i
        done

        my_sunspider_compare_results $x $y
    fi
}

function my_sunspider_compare_results
{
    x=$1
    y=$2
    if [ -z "$x" ] || [ -z "$y" ] ; then
        echo "usage: my_sunspider_compare_results <ws-number-1> <ws-number-2>"
    else
        sunspider-compare-results \
            --shell $HOME/moz/ws$x/js/src/opt/js opt$x opt$y
    fi
}

An invocation like this:

my_sunspider 0 3 100

will ensure that optimised builds in both workspaces are present, and then compare them by doing 100 SunSpider runs. That usually gives what SunSpider claims as +/-0.1% variation (I don’t believe it, though). On my Mac this takes about 3.5 minutes, and 100 runs is enough that the results are fairly reliable, certainly more so than the default of 10 runs. But when testing a performance-affecting change I like to do some timings, wait until a few more patches have landed in the tree, then update and rerun the timings; on my Mac I regularly see variations of 5-10ms due to minor code differences. Timing multiple versions like this gives me a better idea of whether a timing difference is real or not. Even then, it’s still not easy to know for sure, and this can be frustrating when trying to work out if an optimisation I applied is really giving a 5ms speed-up or not.

On my Linux box, I have to use 1000 runs to get +/-0.1% variation. This takes about 25 minutes, so I rarely do performance-related work on this machine. I don’t know why Linux causes greater timing variation.
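When judging whether a few milliseconds of difference is real, it can help to look at the spread across several timings rather than at a single pair of numbers. The sketch below is not how SunSpider computes its +/- figure; it is a plain mean and sample standard deviation over a list of totals, and the helper name and the one-number-per-line input format are assumptions for illustration.

```shell
# Hypothetical helper: read one timing (in ms) per line on stdin and
# print the mean and the sample standard deviation.
timing_stats() {
    awk '{ n++; sum += $1; sumsq += $1 * $1 }
         END {
             if (n < 2) { print "need at least 2 samples"; exit 1 }
             mean = sum / n
             var = (sumsq - n * mean * mean) / (n - 1)
             if (var < 0) var = 0    # guard against rounding error
             printf "mean=%.2f sd=%.2f\n", mean, sqrt(var)
         }'
}
```

If the difference between two builds is smaller than the spread seen across repeated measurements of a single build, it is hard to claim the change made things faster.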

Profiling ‘js’ with Cachegrind

I run Cachegrind on ‘js’ running SunSpider with this bash function:

function cg_sunspider
{
    x=$1
    y=$2
    if [ -z "$x" ] || [ -z "$y" ] ; then
        echo "usage: cg_sunspider <ws-number-1> <ws-number-2>"
    else
        for i in $x $y ; do
            dir=$HOME/moz/ws$i/js/src/optg
            cd $dir || return 1
            make --quiet || return 1
            cd ~/moz/SunSpider
            time valgrind --tool=cachegrind --branch-sim=yes --smc-check=all \
                --cachegrind-out-file=cachegrind.out.optg$i \
                --auto-run-dsymutil=yes \
                $dir/js `cat ss0-args.txt`
            cg_annotate --auto=yes cachegrind.out.optg$i > ann-optg$i
        done
    fi
}

ss0-args.txt contains this text:

-j -f tmp/sunspider-test-prefix.js -f resources/sunspider-standalone-driver.js

What this does is run just the main SunSpider program, once, avoiding all the start-up processes. This is important for Cachegrind: it means that I can safely use --cachegrind-out-file to name a specific file, which is not safe if running Cachegrind on a program involving multiple processes. (I think this is slightly dangerous… if you run ‘sunspider --ubench’ it seems to change one of the above .js files and you have to rerun SunSpider normally to get them back to normal.) I use --branch-sim=yes because I often find it to be useful; at least twice recently it has helped me identify performance problems.

If I want to focus on a particular Cachegrind statistic, e.g. D2mr (level 2 data read misses) or Bim (indirect branch mispredictions) then I rerun cg_annotate like this:

cg_annotate --auto=yes --show=D2mr --sort=D2mr cachegrind.out.optgN > ann-optgN-D2mr
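That rerun could be wrapped in a small function taking the workspace number and the event name. This is a sketch only: cg_focus is a hypothetical name, the CG_ANNOTATE variable exists purely so the command can be swapped out when trying the function, and it assumes it is run from the directory that holds the cachegrind.out.optgN files.

```shell
# Hypothetical wrapper: re-annotate one Cachegrind output file, showing
# and sorting by a single event (e.g. D2mr or Bim).
cg_focus() {
    n=$1
    event=$2
    if [ -z "$n" ] || [ -z "$event" ] ; then
        echo "usage: cg_focus <ws-number> <event>"
        return 1
    fi
    # CG_ANNOTATE defaults to the real cg_annotate program.
    ${CG_ANNOTATE:-cg_annotate} --auto=yes --show="$event" --sort="$event" \
        "cachegrind.out.optg$n" > "ann-optg$n-$event"
}
```

For example, ‘cg_focus 3 Bim’ would write the annotation for workspace 3 to ann-optg3-Bim.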

Profiling ‘js’ with Shark

To profile ‘js’ with Shark, I use SunSpider’s --shark20 and --test options. I don’t have this automated yet; I probably should.

Managing Changes with Mercurial

Most of my changes are not that large, so I leave them uncommitted in a workspace. This is primitive, but has one really nice feature: when pulling and updating, hg merges the changes and marks conflicts in the nice “<<<” “>>>” way.

In comparison, with Mercurial queues (which I tried for a while) you have to pop your patches, update, then push them, and it uses ‘patch’ to do the merging. And I hate ‘patch’ because conflicted areas tend to be larger, and because they go in a separate reject file rather than being inserted inline.

I also avoid doing local commits unless I’m working on something really large just because the subsequent merging is difficult (at least, I think it’s difficult; my Mercurial knowledge still isn’t great). In that case I do local commits until the change is finished, then apply the patch (using ‘hg diff’ and ‘patch’) in a single hit to a newly cloned tree — given Mozilla’s use of Bugzilla, the change will have to be a single patch anyway so this aggregation step has to happen at some point.

Pre-push Checklist

Before landing any patch, I do my best to work through the following check-list. I created this list recently after having to back out several commits due to missing one of the steps below; I give examples of breakage I’ve caused in square brackets.

Ensure there are no new compiler warnings for ‘js’ for optimised and debug builds. [I managed to introduce some warnings on an optimised build recently for what was supposedly a whitespace-only change!]
Ensure ‘js’ runs trace-test.js without failures, for optimised builds, debug builds, debug builds with TMFLAGS=full (to test the verbose output), and under Valgrind (to test for memory errors). [I’ve had to back out several patches due to breaking TMFLAGS=full]
Ensure lirasm builds and passes its tests for both optimised and debug builds. [I’ve forgotten this numerous times, leaving lirasm in a broken state, which is why I created bug 503449].
Ensure unit tests pass with a debug build. [Amusingly enough, I don’t think I’ve ever caused breakage by forgetting this step!]
(For any commit that might affect performance) Check SunSpider timings with an optimised build.
(For complex changes) Check the patch on the try servers. (Nb: they run optimised builds, so will miss assertion failures among other things)
(For changes affecting the ARM backend) Check the patch builds and runs trace-test.js (using a debug build) on my virtual qemu+ARM/Linux machine.
Check tinderbox to make sure the tree is open for commits. [When the tree is closed, there’s no mechanism that actually prevents you from committing. I had to back-out a patch during a tinderbox upgrade because of this.]
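Some of the mechanical items in this list could be driven by a small runner that executes named checks and reports which ones failed. The following is a sketch under assumptions: run_checks and the check_* functions are hypothetical names, JS_SRC is assumed to point at a workspace’s js/src directory, and it assumes ‘js’ exits with a non-zero status when trace-test.js fails, which I have not verified.

```shell
# Hypothetical runner: execute each named check, print PASS or FAIL for it,
# and return non-zero if any check failed.
run_checks() {
    status=0
    for check in "$@" ; do
        if "$check" > /dev/null 2>&1 ; then
            echo "PASS $check"
        else
            echo "FAIL $check"
            status=1
        fi
    done
    return $status
}

# Hypothetical checks; JS_SRC is assumed to be e.g. ~/moz/ws3/js/src.
check_trace_test_opt()   { ( cd "$JS_SRC" && opt/js -j trace-test.js ) ; }
check_trace_test_debug() { ( cd "$JS_SRC" && debug/js -j trace-test.js ) ; }
check_trace_test_full()  { ( cd "$JS_SRC" && TMFLAGS=full debug/js -j trace-test.js ) ; }

# Example:
#   JS_SRC=~/moz/ws3/js/src
#   run_checks check_trace_test_opt check_trace_test_debug check_trace_test_full
```

Items that need judgment, such as reading compiler warnings or checking tinderbox, would still have to be done by hand.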

It’s quite a list, and I don’t usually do anything with a browser build, when I probably should, so that would make it even longer. And there are other things to get wrong… for example, I never test the --disable-jit configuration and I broke it once.

Pushing

When I’m ready to push a change, I make sure my workspaces are up-to-date with respect to the Mozilla repo. I then commit the change to my modified repo, then push it from there into ~/moz/ws0/, then check ‘hg outgoing -p’ on that repo to make sure it looks ok, and then push to the Mozilla repo from there. I try to do this quickly so that no-one else lands something in the meantime; this has only happened to me once and I tried to use ‘hg rollback’ to undo my local changes which I think should have worked but seemingly didn’t.
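The first part of that sequence could be scripted, leaving the final push as a deliberate manual step. This is a sketch, not my actual setup: stage_in_ws0 is a hypothetical name, the HG and MOZ variables are only there so the command and the directory can be overridden, and it assumes the workspaces have already been brought up to date.

```shell
# Hypothetical helper: commit in ~/moz/ws<n>, push that commit into ws0,
# then show what ws0 would send to its default (upstream) repository.
stage_in_ws0() {
    n=$1
    msg=$2
    if [ -z "$n" ] || [ -z "$msg" ] ; then
        echo "usage: stage_in_ws0 <ws-number> <commit-message>"
        return 1
    fi
    hg_cmd=${HG:-hg}
    root=${MOZ:-$HOME/moz}
    "$hg_cmd" -R "$root/ws$n" commit -m "$msg" || return 1
    "$hg_cmd" -R "$root/ws$n" push "$root/ws0" || return 1
    "$hg_cmd" -R "$root/ws0" outgoing -p
}
```

After reading the ‘hg outgoing -p’ output, the last step is still to run ‘hg push’ by hand in ~/moz/ws0.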

Post-push Checklist

After committing, I do these steps:

Mark the bug’s “whiteboard” field as “fixed-in-tracemonkey”.
Put a link to the commit in a comment for the bug, of the form http://hg.mozilla.org/tracemonkey/rev/<revhash>/. I always test the link before submitting the comment.

Conclusions

That’s a lot of stuff. Two of my more notable conclusions are:

Automation is a wonderful thing. In particular, having scripts for the complicated tasks (e.g. running the unit tests, running sunspider, running sunspider under Cachegrind) has saved me lots of time and typing (and lots of head-scratching and re-running when I realised I forgot some command line option somewhere). And this automation was made much easier once I settled on a standard workspace+build layout.
The pre-push checklist is both disconcertingly long and disconcertingly incomplete. And I had to work it out almost entirely by myself — I’m not aware of any such check-list documented anywhere else. Having lots of possible configurations really hurts testability. I’m not sure how to improve this.

If you made it this far, congratulations! That was pretty dry, especially if you’re not a Tracemonkey developer. I’d love to hear suggestions for improving what I’m doing.

Mac OS X Valgrind

Valgrind + Mac OS X update (July 17, 2009)

Post author By Nicholas Nethercote
Post date July 17, 2009

We’re now in the preparation phase for the 3.5.0 release of Valgrind, which will be the first release with Mac OS X support. We’ve absorbed some Mozilla culture in the Valgrind development process: we’re now using Bugzilla much more effectively. We have 17 open blockers (and 18 closed blockers), and 41 open “wanted” bugs (and 7 closed ones). Any contributions towards fixing these bugs are most welcome! We’re hoping to release in early August.

C Correctness Cplusplus Programming

What I currently hate most about C++

Post author By Nicholas Nethercote
Post date June 19, 2009

Everyone knows that global variables are bad and should be avoided wherever possible. Why? Because each global variable is, in effect, an implicit argument to every function that can see the global variable. The same thing is true of any non-local state.

And the presence of non-local state means that you can’t reason locally about your code. That makes your code more complex, and complex code is likely to have more defects.

And the thing I hate about C++ (and other object-oriented languages) is that it vigorously encourages non-local state.

Non-local state within classes

First of all, C++ encourages (nay, forces) non-local state within classes, because all class methods have access to all fields within a class, even the ones they don’t need to. In other words, every class field is an implicit argument to every class method. This can work well for, let’s say, a “Date” class, because the number of fields is small, and most class methods will access most fields.

But problems appear when classes grow larger, when they start to look like what would be a whole module in a non-OO language like C. For example, Nanojit, the compiler core in TraceMonkey, contains a class called Assembler, which encapsulates the translation of Nanojit’s low-level intermediate representation (called “LIR”) to assembly code. If you exclude members that are only included when debugging is enabled, there are 18 data fields and 102 methods. And some of those 18 data fields are pointers to objects that are themselves complex.

Let’s consider a single field, _thisfrag, which holds a fragment of LIR code. It gets set via an argument passed into the method beginAssembly(). It then gets overwritten (but with the same value!) via an argument passed into the method assemble(). It is accessed directly in only 7 of the class’s methods:

assemble(): which increments _thisfrag->compileNbr
gen(), printActivationState(), asmspilli(): which use _thisfrag->lirbuf->names, but only when verbose output is asked-for
assignSavedRegs(), reserveSavedRegs(), assignParamRegs(): where parts of _thisfrag->lirbuf are read

And that’s just one example, which I chose because I’d been thinking about this problem and then just this morning I had to hunt down all those uses of _thisfrag in order to understand its purpose and whether I could change some related code safely. I’m sure a similar story will hold for a lot of the fields in this class.

Just imagine, if you were writing Assembler as a C module, would you make _thisfrag a (module-level) global variable? Almost certainly not, you’d pass it only to the functions that need it; actually you’d probably only pass parts of _thisfrag around. But C++ encourages you to make everything a class, and stick everything a class ever needs in as a data field, creating lots of non-local state that complicates everything.

(An aside: Assembler probably also isn’t a very good basis for a class because it’s a *process*. I figure that if you’d write something as a struct in C, then it makes for a good class in C++. But I need to think about that some more.)

Non-local state beyond classes

But it gets even worse. Good C++ practice encourages everyone to create private fields and use public get/set methods to access class data fields from outside the class. But get/set methods are just lipstick on a pig; all too often you end up with something like this example, again from the Assembler class:

    private:
        AssmError   _err;

    public:
        void        setError(AssmError e) { _err = e; }
        AssmError   error() { return _err; }

Oh great, I feel much safer now.

It would be better to just make _err public and avoid the get/set obfuscation; at least then it would be obvious how exposed _err is. It also saves you from having to check the definitions of error() and setError().

Even better, in this case _err gets set from various places within class Assembler, but also from various places outside class Assembler. I’ve tried twice to simplify this, by passing error codes around explicitly instead of implicitly through this quasi-global variable, but both times I was defeated by the complexity of the control flow governing how _err is accessed, in particular the fact that it’s set on some control paths but not others. This is a big part of the reason why out-of-memory handling in Nanojit is a total nightmare.

The end result

Currently Nanojit has a number of large, complex classes, and many of them link to other large complex classes. At many points in the code there is a bewildering amount of accessible non-local state. (And I haven’t even mentioned how this can complicate memory management, if you end up with multiple pointers to objects.) The complexity caused by this is a tax on development that we are all paying daily.

A better way

Before joining Mozilla, I spent three years programming in a functional language called Mercury. Mercury entirely lacks global variables (except for some very restricted cases which are rarely used). This means that you have to pass more data around as arguments than you do in C++. But it also means that when you look at a function, you know exactly what its inputs and outputs are, and so you can use purely local reasoning to understand what it does. This is an *enormous* help, and one that’s easy to underestimate if you haven’t experienced it.

Obviously we’re not going to rewrite Firefox in a functional language any time soon. And of course non-local state is necessary sometimes. But even C is better than C++ in this respect, because at least in C global variables are obvious and everyone knows that you should minimise their use — the language doesn’t actively encourage you to put non-local state everywhere and let you feel good about it. Information hiding is one of the fundamental principles of programming, and object-oriented programming is meant to promote it, but unless you are very disciplined it tends to do the opposite.

So next time you are thinking about adding a field to a class, ask yourself: is it really necessary? Could it be passed in as an argument instead, or something else? Can you make your life easier by avoiding some non-local state?

Mac OS X Valgrind

Valgrind + Mac OS X update (June 17, 2009)

Post author By Nicholas Nethercote
Post date June 17, 2009

It’s time for the June update on the progress of the Mac OS X port of Valgrind.

Progress has been good: the DARWIN branch has been merged to the trunk. With that having happened, we’re now in sight of an actual release (3.5.0) containing Mac OS X support. There’s some polishing and bug-fixing — both for Mac OS X and in general — to be done before that happens, but hopefully we’ll release 3.5.0 in early August. That will be before Snow Leopard comes out; another release may be necessary afterwards, but we want to get this code released sooner rather than later.

One interesting problem we encountered was that some users were having Valgrind abort with a SIGTRAP extremely early. It was very mysterious, and none of the developers were able to reproduce it. It turns out that a program called Instant Hijack by a company called Rogue Amoeba was the cause of the problem. Both Valgrind and Instant Hijack do some stuff with dyld, and apparently Instant Hijack’s stuff is a bit dodgy. There’s an easy workaround, which involves temporarily disabling Instant Hijack. This was reported by a Rogue Amoeba developer; fortunately he tried Valgrind himself, had the same SIGTRAP abort, found the bug report, and realised what the problem was. If it wasn’t for him, we’d still be scratching our heads!

In the meantime, keep reporting any problems you have, in particular any unimplemented syscall wrappers; a number have been added lately but there are still more to be done. Please report problems via Bugzilla rather than in comments on this blog, as Bugzilla reports are more likely to be acted upon. Thanks!

Uncategorized

Valgrind on Windows?

With the Valgrind-on-Mac support coming along nicely, it’s worth addressing another widely-used platform: Windows. Will Valgrind work on Windows any time soon? There are actually two answers: (a) hell no, and (b) it already does (sort of).

The patch I merged from the Darwin branch onto the trunk yesterday was 28,300 lines. And that was almost entirely new code, because I’d done a lot of work to synchronize the branch and trunk so that all non-addition changes had been dealt with. Greg Parker spent over four years, off and on, working on the original port, and I spent close to three months full time cleaning it up, and Julian Seward also pitched in a bit. I roughly estimate the Darwin port represents at least 1,000 person-hours of work, possibly much more.

And Mac OS X is a lot closer to Linux than Windows is. Also, the Mac OS X kernel is open source, which makes a port much easier. A Valgrind-on-Windows port would therefore be an enormous undertaking, one that is unlikely to happen soon, if ever. That is how we get answer (a) above.

However, although Valgrind doesn’t run on Windows, it is possible to run Windows programs under Valgrind, thanks to Wine — you run the Windows program under Wine, and Wine under Valgrind. The development (trunk) versions of both Valgrind and Wine now have enough awareness of each other that they can apparently be used together. I say “apparently” because I haven’t tried it myself, but I know that others have had some success. But please note that this is fairly new and experimental, and should only be tried by those not afraid to get their hands dirty (this page has more details). And that’s how we get the answer (b) above.

Mac OS X Valgrind

Mac OS X now supported on the Valgrind trunk

Post author By Nicholas Nethercote
Post date May 28, 2009

This morning I merged the DARWIN branch, which had been holding Valgrind’s support for Mac OS X, onto the trunk. The branch is now defunct, and Valgrind-on-Mac users should check out the trunk like so:

svn co svn://svn.valgrind.org/valgrind/trunk <dirname>
cd <dirname>

and then build it according to the instructions in the README file.

This is a good thing, if only because it means I can spend less time maintaining a branch and more time actually fixing things.

Update: fixed the svn URL.

Mac OS X Valgrind

Valgrind + Mac OS X update (May 18, 2009)

Post author By Nicholas Nethercote
Post date May 18, 2009

It’s time for the May update on the progress of the Mac OS X port of Valgrind. In the last month, 133 commits have been made to the DARWIN branch by Julian Seward and myself.

Here are the current (as of r9898) values of the metrics I have been using as a means of tracking progress.

The number of regression test failures on Mac was 418/128/43/0. It’s now 421/102/15/0. I.e. the number of failures went from 171 to 117. If we ignore the tools Helgrind, DRD and exp-Ptrcheck (which are not widely used and still mostly broken on the branch) the number of failures dropped from 50 to 13. That’s a similar number to what we get on some Linux systems, and we’re in real diminishing-returns territory — the failing tests are all testing very obscure things. So we can basically declare victory on that front.
The number of “FIXME”-style marker comments that indicate something in the code that needs to be fixed was 274. It’s now 260. Furthermore, the method I used last month to count “FIXME”-style comments was flawed, so the number has actually gone down by more than 14; the comparison next month will be reliable. But a lot of these comments are for very obscure things that won’t need to be fixed even before a release, so you shouldn’t be worried by the high number!
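The failure totals quoted above can be reproduced by adding up every field after the first in each four-part figure. The sketch below does only that arithmetic; count_failures is a hypothetical name, and treating the first field as the passing tests is an assumption that happens to match the totals of 171 and 117 given in the text.

```shell
# Hypothetical helper: sum every "/"-separated field after the first.
# Assumes the first field counts passing tests and the rest count failures.
count_failures() {
    echo "$1" | awk -F/ '{ total = 0
                           for (i = 2; i <= NF; i++) total += $i
                           print total }'
}

# count_failures 418/128/43/0   (last month)
# count_failures 421/102/15/0   (this month)
```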

Functionality improvements from the last month are as follows.

Some extra system calls are handled.
Some more signal-handling improvements.
Some debug info reading improvements.
File descriptor tracking (--track-fds) now works.
The --auto-run-dsymutil option was added. When used, it makes Valgrind run dsymutil to generate debug info for any files that need it.
Helgrind sort of works; some of its tests pass. But it’s still probably not usable.

Things are going well enough that we should be ready to merge the branch to the trunk soon! That will be a significant milestone, and will make life easier as I won’t have to maintain the branch in parallel with the trunk. I’m currently going through the branch/trunk differences carefully in order to get ready for the merge; with luck it will happen by the end of this week.

Update, May 19: fixed some HTML tags.

Valgrind

RFC: Making Valgrind easier to use with multi-process programs

Post author By Nicholas Nethercote
Post date April 30, 2009

Now that I’m working for Mozilla, one of my goals is to make Valgrind easier to use on big programs like Firefox. One feature of such programs is that often they create multiple processes. For example, when I invoke ‘firefox’ on my Linux box, I’m really running three programs. /usr/bin/firefox is a start-up shell script. It uses /usr/bin/basename as part of its preprocessing, and then invokes /usr/lib/firefox-3.0.9/firefox, which is the real firefox, via ‘exec’. (And a program like Google Chrome would be much worse, having one process per tab.)

In this post I’m going to make several suggestions for improvements to Valgrind to make it easier to use with multi-process programs. I’d love to hear feedback from Valgrind users about these suggestions.

Proposal 1: trace child processes by default

If you run “valgrind firefox”, you get this output on the terminal:

==9045== Memcheck, a memory error detector.
==9045== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==9045== Using LibVEX rev 1888, a library for dynamic binary translation.
==9045== Copyright (C) 2004-2009, and GNU GPL'd, by OpenWorks LLP.
==9045== Using valgrind-3.5.0.SVN, a dynamic binary instrumentation framework.
==9045== Copyright (C) 2000-2009, and GNU GPL'd, by Julian Seward et al.
==9045== For more details, rerun with: -v
==9045==

and then Firefox starts up suspiciously quickly. Where’s that slow-down due to Valgrind? And where are the error messages from Valgrind? As it happens, by default Valgrind doesn’t trace into any child processes spawned by the program it’s tracing. So Valgrind is tracing /usr/bin/firefox, but /usr/bin/basename and /usr/lib/firefox-3.0.9/firefox are run natively.

In order to trace into child processes, you have to use the --trace-children=yes option; then it’ll do what you want.

But I think that not tracing by default is a bad idea. First of all, it’s quite unclear what’s happening, especially if you don’t understand Valgrind’s behaviour. We even have an entry in the FAQ about this. (In contrast, if we traced by default and you didn’t want that behaviour, the fact that you’d get one Valgrind start-up message per process makes it clearer what’s happening.)

Furthermore, in my experience, --trace-children=no is almost never what you want. And it’s easy to forget --trace-children=yes; I do it all the time.

So I think tracing into children should be the default. Others may disagree, so it would be useful to know if Valgrind users have an opinion on this.

Proposal 2: show what command is being run

If I invoke “valgrind --trace-children=yes firefox”, let it load the default page, and then quit, I get this output (eliding some of the startup/shutdown messages for brevity):

==9658== Memcheck, a memory error detector.
==9658== ...
==9658==
==9659== Memcheck, a memory error detector.
==9659== ...
==9659==
==9659==
==9659== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 8 from 1)
==9659== ...
==9658== Memcheck, a memory error detector.
==9658== ...
==9658== Syscall param write(buf) points to uninitialised byte(s)
==9658==    at 0x4E38E90: __write_nocancel (in /lib/libpthread-2.8.90.so)
==9658==    by 0xE55DEFE: ??? (in /usr/lib/libICE.so.6.3.0)
==9658==    ...
==9658==  Address 0x5e91964 is 12 bytes inside a block of size 1,024 alloc'd
==9658==    at 0x4C24724: calloc (vg_replace_malloc.c:368)
==9658==    by 0xE55A373: IceOpenConnection (in /usr/lib/libICE.so.6.3.0)
==9658==    ...
==9658==
==9658== Syscall param write(buf) points to uninitialised byte(s)
==9658==    at 0x4E38ECB: ??? (in /lib/libpthread-2.8.90.so)
==9658==    by 0x7E00876: ??? (in /usr/lib/libsqlite3.so.0.8.6)
==9658==    ...
==9658==  Address 0x15dcfefc is 36 bytes inside a block of size 4,104 alloc'd
==9658==    at 0x4C2694E: malloc (vg_replace_malloc.c:178)
==9658==    by 0x7DE9CF7: sqlite3_malloc (in /usr/lib/libsqlite3.so.0.8.6)
==9658==    ...
==9658==
==9658== Syscall param write(buf) points to uninitialised byte(s)
==9658==    at 0x4E38ECB: ??? (in /lib/libpthread-2.8.90.so)
==9658==    by 0x7E00876: ??? (in /usr/lib/libsqlite3.so.0.8.6)
==9658==    by ...
==9658==  Address 0x15dcfefc is 36 bytes inside a block of size 4,104 alloc'd
==9658==    at 0x4C2694E: malloc (vg_replace_malloc.c:178)
==9658==    by 0x7DE9CF7: sqlite3_malloc (in /usr/lib/libsqlite3.so.0.8.6)
==9658==    ...
==9658==
==9658== ERROR SUMMARY: 19 errors from 3 contexts (suppressed: 343 from 3)
==9658== ...

We have three Memcheck start-up messages, two Memcheck shut-down messages, and two PIDs. What’s going on? The first start-up message (PID 9658) is for /usr/bin/firefox. The second (PID 9659) is for /usr/bin/basename. The third start-up message is for /usr/lib/firefox-3.0.9/firefox; the PID 9658 is reused because /usr/lib/firefox-3.0.9/firefox is invoked with ‘exec’, which reuses the same process — this also explains why there are only two shut-down messages.
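
The counting in the paragraph above can be reproduced mechanically. The following is a minimal Python sketch, not part of Valgrind: it assumes only the "==PID==" line prefix and the start-up and "ERROR SUMMARY" lines visible in the output above, and the function name summarise is made up for illustration.

```python
import re
from collections import defaultdict

# Valgrind prefixes each of its message lines with "==<pid>==".
LINE_RE = re.compile(r"^==(\d+)==\s?(.*)$")


def summarise(log_text):
    """Count Memcheck start-up and shut-down messages per PID."""
    counts = defaultdict(lambda: {"startups": 0, "shutdowns": 0})
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # not a Valgrind message line
        pid, msg = int(m.group(1)), m.group(2)
        if msg.startswith("Memcheck, a memory error detector"):
            counts[pid]["startups"] += 1
        elif msg.startswith("ERROR SUMMARY:"):
            counts[pid]["shutdowns"] += 1
    return dict(counts)


# A shortened copy of the output shown above.
SAMPLE = """\
==9658== Memcheck, a memory error detector.
==9659== Memcheck, a memory error detector.
==9659== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 8 from 1)
==9658== Memcheck, a memory error detector.
==9658== ERROR SUMMARY: 19 errors from 3 contexts (suppressed: 343 from 3)
"""

if __name__ == "__main__":
    for pid, c in sorted(summarise(SAMPLE).items()):
        print(pid, c)
```

A PID with more start-up messages than shut-down messages, like 9658 here, is a hint that the process replaced itself with ‘exec’. The script still cannot say which command each message belongs to, which is the gap the next proposal addresses.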

But working this out isn’t easy. In fact, I cheated, by also using the -v option. This makes Valgrind produce verbose output, and one of the things this includes is the command being executed. Without that I would have had a much harder time understanding what happened. But -v produces lots of extra stuff that is rarely interesting, so it’s not a good solution.

So my second proposal is to always print the invoked command as part of the Valgrind start-up message, like this:

==9045== Memcheck, a memory error detector.
==9045== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==9045== Using LibVEX rev 1888, a library for dynamic binary translation.
==9045== Copyright (C) 2004-2009, and GNU GPL'd, by OpenWorks LLP.
==9045== Using valgrind-3.5.0.SVN, a dynamic binary instrumentation framework.
==9045== Copyright (C) 2000-2009, and GNU GPL'd, by Julian Seward et al.
==9045== Running: /usr/bin/firefox
==9045== For more details, rerun with: -v
==9045==

(We could possibly move the “For more details” message to shut-down, where other, similar messages are shown.)

In this case the command has no arguments, but they would be shown if present.

This change would make the output of running Valgrind on multi-process programs much easier to understand.

Another possibility is to also show the parent process’s command, but that is probably overkill.

Proposal 3: control child tracing via black-listing

Currently you either trace all child processes, or none of them. This is crude. It would often be useful to be able to trace some of them.

An obvious way to do this is with a black-list. You would specify a list of processes; anything not on that list would be traced, and anything on the list would not be. And allowing patterns would be useful. Valgrind already has support for patterns containing shell-style ‘*’ and ‘?’ wildcards, so that would be an obvious choice to use.

Some examples:

# Matches nothing, ie. traces all children.  (Single quotes are necessary to
# protect most patterns from shell interference.)
--trace-blacklist=''

# Matches everything, ie. traces no children.
--trace-blacklist='*'

# Skips all /usr/bin/python subprocesses.
--trace-blacklist='/usr/bin/python *'

# Skips all /usr/bin/python subprocesses invoked with -v.
--trace-blacklist='/usr/bin/python *-v*'

# Matches nothing, ie. traces all children.  It looks like it might match
# all commands containing the substring "python", but it does not, because
# patterns must match the entire command, not just part of it.
--trace-blacklist='python'

# Skips all /usr/bin/python and /usr/bin/perl subprocesses;  multiple
# blacklist options are combined, and any process matching any of the
# blacklist entries is blacklisted.
--trace-blacklist='/usr/bin/python *' --trace-blacklist='/usr/bin/perl *'
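
To make the matching rules in these examples concrete, here is a minimal Python sketch. It is not Valgrind's own pattern matcher, and the helper names pattern_to_regex and is_blacklisted are made up for illustration. It assumes exactly the semantics described above: only ‘*’ and ‘?’ are special, a pattern must match the entire command, and multiple patterns are combined with "any".

```python
import re


def pattern_to_regex(pattern):
    """Translate a pattern using only '*' and '?' wildcards into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # any run of characters, possibly empty
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def is_blacklisted(command, patterns):
    """True if the whole command line matches any of the patterns."""
    return any(pattern_to_regex(p).fullmatch(command) for p in patterns)


if __name__ == "__main__":
    cmd = "/usr/bin/python foo.py"
    for pats in ([""], ["*"], ["/usr/bin/python *"], ["python"]):
        print(pats, is_blacklisted(cmd, pats))
```

One consequence of whole-command matching is that '/usr/bin/python *' does not match a bare "/usr/bin/python" with no arguments, because the pattern requires the space.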

One interesting question is this: what exactly does it mean to not trace a process? More specifically, if an untraced process spawns its own children, should we trace them? If we run the process natively (as --trace-children=no currently does for child processes) then any spawned children will not be traced — once Valgrind loses control, it cannot get it back. An alternative is to run the black-listed processes under Nulgrind, the Valgrind tool that adds no instrumentation. This incurs a slow-down of about 5x compared to native execution, but allows Valgrind to keep control.

So there are two possible kinds of black-list: the “skip you” black-list, and the “skip you and all your descendents” black-list. If I had to choose one, I’d probably pick the latter; the former seems less likely to be useful. If we added both kinds, I don’t know what name I’d give the options.
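
The difference between the two kinds of black-list is easiest to see on a small process tree. The Python sketch below is only a model of the proposal: the tree, the commands in it, and the names traced_commands and is_shell are all hypothetical, and it assumes the top-level process is never checked against the black-list.

```python
def traced_commands(tree, root, is_blacklisted, skip_descendants):
    """Return the commands that would be traced, in spawn order.

    tree maps a command to the list of commands it spawns.
    is_blacklisted is a predicate on a command string.
    skip_descendants selects "skip you and all your descendents".
    """
    result = []

    def visit(command, is_root):
        skipped = not is_root and is_blacklisted(command)
        if skipped and skip_descendants:
            return  # run natively: Valgrind loses control of the subtree
        if not skipped:
            result.append(command)
        # A skipped process that stays under Valgrind's control (for
        # example via Nulgrind) still lets its children be considered.
        for child in tree.get(command, []):
            visit(child, False)

    visit(root, True)
    return result


# Hypothetical process tree: make -> shell script -> test binary.
TREE = {
    "make": ["/bin/sh build.sh"],
    "/bin/sh build.sh": ["./app --test"],
}


def is_shell(command):
    return command.startswith("/bin/sh ")


if __name__ == "__main__":
    print(traced_commands(TREE, "make", is_shell, skip_descendants=False))
    print(traced_commands(TREE, "make", is_shell, skip_descendants=True))
```

With the “skip you” kind, the test binary spawned by the shell is still traced; with the “skip you and all your descendents” kind, only the top-level process is.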

Specifying both a --trace-blacklist option and a --trace-children option would be disallowed, as it’s not clear how they would interact.

If proposal 2 is implemented, it would probably make sense to output a message like “Skipping due to black-list: <cmd>” for black-listed processes.

Proposal 4: control child tracing via white-listing

Another way to control which processes are traced is with a white-list. In this case, any process not on the whitelist would have to be run with Nulgrind, so that its children can be traced. (You could also have a “skip you and your descendents” whitelist in which non-matching processes don’t have their children traced, but that seems less useful.)

Some examples:

# Matches nothing, ie. traces no processes (even the one named on the
# command line).  (Well, it traces them with Nulgrind.)
--trace-whitelist=''

# Matches everything, ie. traces all processes.
--trace-whitelist='*'

# Traces only /usr/lib/firefox-3.0.9/firefox processes.
--trace-whitelist='/usr/lib/firefox-3.0.9/firefox*'
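
As a rough illustration of the white-list semantics, the Python sketch below picks a tool for each process. It is an assumption-laden model, not Valgrind code: tool_for is a made-up name, and it uses Python's fnmatchcase for whole-string matching, which also accepts ‘[...]’ character classes beyond the ‘*’ and ‘?’ wildcards mentioned earlier. The tool names "memcheck" and "none" (Nulgrind) are the values Valgrind's --tool option accepts.

```python
from fnmatch import fnmatchcase


def tool_for(command, whitelist, requested_tool="memcheck"):
    """Pick the tool for a process under the proposed white-list rules.

    White-listed commands get the requested tool. Everything else runs
    under Nulgrind ("none") so Valgrind keeps control of its children.
    """
    if any(fnmatchcase(command, pattern) for pattern in whitelist):
        return requested_tool
    return "none"


if __name__ == "__main__":
    whitelist = ["/usr/lib/firefox-3.0.9/firefox*"]
    for cmd in ["/usr/bin/firefox",
                "/usr/bin/basename",
                "/usr/lib/firefox-3.0.9/firefox"]:
        print(cmd, "->", tool_for(cmd, whitelist))
```

Applied to the three commands from the Firefox run discussed under proposal 2, only the real Firefox binary would get Memcheck; the wrapper script and basename would run under Nulgrind.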

For whitelists, it’s clear that you want the top-level process (ie. the one named on the command line) to be considered as part of the whitelist matching, not just the children of the initial process. This is different to black-lists. At least, it’s different to “skip you and all your descendents” black-lists, where black-listing the top-level process is not useful, as it is equivalent to not running Valgrind at all. If “skip you” black-lists were also implemented, then considering the top-level process for black-listing makes more sense. (Alternatively, maybe making black-list and white-list behaviour equivalent is better, I’m not sure.)

You couldn’t use both --trace-blacklist and --trace-whitelist in the same invocation of Valgrind, as there is no clear meaning (what if a command matches both lists? What if it matches neither?). Likewise with --trace-children and --trace-whitelist.

And again, if proposal 2 is implemented, it would make sense to output a message “Skipping due to white-list: <cmd>” for non-white-listed processes.

Proposal 5: remove --trace-children

With whitelists and blacklists present, --trace-children could be removed, because it is subsumed by them:

--trace-children=yes is equivalent to --trace-whitelist='*' and --trace-blacklist=''
--trace-children=no is equivalent to --trace-blacklist='*'

I think this is a good idea, because I don’t think it’s smart to have multiple options with overlapping functionality.
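
If the option were removed, existing command lines could be rewritten mechanically. The Python sketch below shows one way, assuming the equivalences listed above; migrate_args is a hypothetical helper, and the list options it emits are only proposals, not options Valgrind currently accepts.

```python
def migrate_args(argv):
    """Rewrite --trace-children options into the proposed list options."""
    replacements = {
        # --trace-blacklist= (an empty pattern) would be an equally
        # valid replacement for "yes".
        "--trace-children=yes": "--trace-whitelist=*",
        "--trace-children=no": "--trace-blacklist=*",
    }
    return [replacements.get(arg, arg) for arg in argv]


if __name__ == "__main__":
    print(migrate_args(["valgrind", "--trace-children=yes", "firefox"]))
```

The arguments are handled as a list, so no shell quoting of ‘*’ is needed here; on a real command line the pattern would still need single quotes.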

Conclusion

I think these changes would make Valgrind easier to use with multi-process programs, but there are some design decisions still to be made. Any feedback about them from Valgrind users would be very helpful. Thanks.

Mac OS X Valgrind

Valgrind + Mac OS X update (April 17, 2009)

Post author By Nicholas Nethercote
Post date April 17, 2009

It’s time for the April update on the progress of the Mac OS X port of Valgrind. It’s been a quieter month because I was on vacation for over 3 weeks, and Julian Seward hasn’t had a great deal of time to work on the port either. Even still, in that time 77 commits have been made to the DARWIN branch.

Here are the current (as of r9567) values of the metrics I have been using as a means of tracking progress.

The number of regression test failures on Mac was 422/172/41/0. It’s now 418/128/43/0. I.e. the number of failures went from 213 to 171. If we ignore the tools Helgrind, DRD and exp-Ptrcheck (which are not widely used and still completely broken on the branch) the number of failures dropped from 92 to 50. So the functionality of the branch is progressing well.
The size of the diff between the trunk and the branch was 38,248 lines (1.3MB). It’s now 39,027 lines (1.3MB). However, 2,223 of these lines are code that was cut, but was put in a text file for reference. So the more realistic number would be 36,804 lines (1.2MB). This metric was intended to indicate how close the branch is to being ready to merge with the trunk, but it doesn’t do that very well, so I will stop using it in the future.
Instead, I’m going to use a new metric: the number of “FIXME”-style marker comments that indicate something in the code that needs to be fixed. A lot of these mark Darwin-specific code that works correctly, but hasn’t been abstracted cleanly. When this approaches zero, it will mean that the branch should be very close to merge-ready. (Actually, the branch may be merge-ready before it reaches zero.) The current number of these is 274. (The task-tracking used within Valgrind is mostly pretty informal; you can get away with it when there’s only a handful of frequent contributors!) That number is quite high, but a lot of those will be easy to fix.
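
The four-part test summaries above use the abbreviation spelled out in the March update: number of tests, then stderr failures, stdout failures, and post failures. As a quick check of the arithmetic, here is a small Python sketch; the function name total_failures is made up for illustration.

```python
def total_failures(summary):
    """Sum the failure counts in a 'tests/stderr/stdout/post' summary."""
    tests, stderr, stdout, post = (int(n) for n in summary.split("/"))
    return stderr + stdout + post


if __name__ == "__main__":
    for summary in ("422/172/41/0", "418/128/43/0"):
        print(summary, "->", total_failures(summary), "failures")
```

This gives 213 failures for 422/172/41/0 and 171 for 418/128/43/0, matching the totals quoted above.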

Functionality improvements are as follows.

The build system now works with older versions of automake (pre 1.10). automake’s handling of assembly code files (specifically, whether AM_CPPFLAGS is used for them) changed in 1.10, and the build system wasn’t working with older versions.
Some extra system calls are handled, enough that iTunes apparently now runs (although I haven’t tried it myself).
-mdynamic-no-pic is now used for compilation of Valgrind. This turns off position-independent code, which (strangely enough) is the default for GCC on Darwin. This speeds up most programs at least a little, and in some cases up to 30%.
Some more signal-handling improvements.

So things are still moving along well.

Mac OS X Valgrind

Valgrind + Mac OS X update (March 17, 2009)

Post author By Nicholas Nethercote
Post date March 17, 2009

Another month has passed since I last wrote about my work on the Mac OS X port of Valgrind. In that time 126 commits have been made to the DARWIN branch (and a similar number to the trunk). I’ve done a lot of them, but Julian Seward has found some time to work on the DARWIN branch and so has been doing some as well.

Here are the current (as of r9455) values of the metrics I have been using as a means of tracking progress.

The number of regression test failures on Linux was: 484 tests, 4 stderr failures, 1 stdout failures, 0 post failures (which I’ll abbreviate as 484/4/1/0). It’s now 484/0/1/0. I.e. the number of failures went from 5 to 1, and that one failure occurs on my machine even on the trunk (it’s a bad test). In other words, the branch works on Linux as well as the trunk. Now that this metric is the same on the branch as the trunk, I won’t bother tracking it in the future.
The number of regression test failures on Mac was 402/213/52/0. It’s now 422/172/41/0. I.e. the number of failures went from 265 to 213. Also, 20 extra tests are being run — a broken CPU feature-detection program meant that a number of tests that should have been running were not, and this has been fixed. Once again, this is the most important metric, and it’s improving steadily, but there’s still a long way to go. One encouraging thing here is that 121 of these failures (more than half) involve the tools Helgrind, DRD and exp-Ptrcheck, which are three of the less-used tools in the Valgrind distribution, and which are all completely broken on the branch, and which I haven’t really looked at yet precisely because they are less-used. The other 92 failures involve Memcheck and Nulgrind (the “no-instrumentation” tool, failures for which indicate problems with the testing of Valgrind’s core). A lot of these are problems with non-portable tests, rather than the Darwin port’s functionality. Furthermore, the tools Cachegrind, Callgrind, and Massif pass all of their tests.
The size of the diff between the trunk and the branch was 41,895 lines (1.5MB). It’s now 38,248 (1.3MB). But note, once again, that this is not a very useful metric. I just scanned through the diff and there aren’t many differences in it that can be merged before we reach the point of the big branch-to-trunk merge.

Functionality improvements are as follows.

Basic signals are now supported, thanks to Julian. This accounted for a lot of the new test passes. This also means that debug builds of Firefox run successfully!
Some extra system calls are handled.
64-bit builds are working. To configure Valgrind for them, pass to ./configure the option --build=amd64-darwin. 64-bit Valgrind is quite slow: it does some very large mmaps at startup, which take several seconds. This will need to be fixed. This also hasn’t been tested as much as the 32-bit version, and passes fewer tests.

I’m taking three weeks of vacation starting on Thursday, so progress on Valgrind+Darwin will be minimal over the next month. But I will be visiting Mountain View early next week (Monday, March 23 and Tuesday, March 24) so I’ll be able to actually meet some of the people I work with! I may also give a talk about Valgrind, depending on whether it can be scheduled. Any suggestions for things to talk about are welcome.