Valgrind + Mac OS X update (Feb 17, 2009)

It’s been a month since I first wrote about my work on the Mac OS X port of Valgrind.  In that time I’ve made 85 commits to the DARWIN branch (and a similar number to the trunk).

Here are the current (as of r9192) values of the metrics I defined in the first post as a means of tracking progress.

  • The number of regression test failures on Linux was: 477 tests, 220 stderr failures, 53 stdout failures, 25 post failures (which I’ll abbreviate as 477/220/53/25). It’s now 484/4/1/0.  I.e. the number of failures went from 298 to 5.  A few new tests have been added.  Four of the failures are in Helgrind, the data race detector tool, and I haven’t tracked them down yet.  The other failure is one that also occurs on the trunk.  So almost all the Linux functionality broken by the changes has been restored.
  • The number of regression test failures on Mac was 419/293/58/29.  It’s now 402/213/52/0.  I.e. the number of failures went from 380 to 265.  The total number of tests has gone down because some Linux-specific tests are no longer being (inappropriately) run on Mac.  This is the most important metric, and it’s improving steadily, but there’s still a long way to go.
  • The number of compiler warnings on Linux was 186.  It’s now 10, and all of these are from #warning directives that mark places where improvements need to be made to the Darwin port, but aren’t actually a problem for Linux.  The number of compiler warnings on Mac was 461.  It’s now 44.  Of these, 33 are from #warning directives, and 10 are from code generated by the Darwin ‘mig’ utility which I have no control over.  So compiler warnings aren’t an issue any more, and I won’t bother tracking them as a metric in the future.
  • The size of the diff between the trunk and the branch was 55,852 lines (1.9MB).  It’s now 41,895 lines (1.5MB).  But note that this is not a very useful metric;  progress will usually cause it to drop, but it will also increase as missing Darwin functionality is added.
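
The failure totals quoted above are simply the sum of the stderr, stdout and post failures in each tests/stderr/stdout/post figure. As a minimal sketch of that arithmetic (the helper name `parse_metric` is made up for illustration and is not part of Valgrind's test machinery):

```python
def parse_metric(metric):
    """Split 'tests/stderr/stdout/post' into (test count, total failures)."""
    tests, stderr, stdout, post = (int(part) for part in metric.split("/"))
    return tests, stderr + stdout + post


# Figures taken from the list above.
results = {
    "Linux, first post": "477/220/53/25",
    "Linux, r9192": "484/4/1/0",
    "Mac, first post": "419/293/58/29",
    "Mac, r9192": "402/213/52/0",
}

for label, metric in results.items():
    tests, failures = parse_metric(metric)
    print(f"{label}: {tests} tests, {failures} failures")
```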

Interestingly enough, although the number of Mac test failures has gone down significantly, if the branch didn’t handle your program a month ago it probably still won’t handle it now (although getsockopt() no longer causes an abort).  But Valgrind’s output may well be better (e.g. debugging information will be better utilized).  Much of my effort has gone into making the tests pass, i.e. improving cases where the Darwin port was doing basically the right thing, but its output didn’t exactly match that expected.

One example is that stack traces were a little unclean, in various minor ways.  Another example is that I added a --ignore-fn option to Massif (the heap profiler) which allows it to ignore certain heap allocations.  This was required because Darwin’s libc always does a few heap allocations at start-up, but Linux’s libc doesn’t.  The new option allows the Darwin allocations to be ignored and therefore Massif’s output to be consistent on both platforms.

Few if any of these changes have made the branch closer to handling new programs, at least directly.  But there’s no point apologising about this, because the branch won’t reach a highly functional state without a working test suite to serve as a safety net against regressions.  And as I progress, getting more tests to pass will require genuine new program functionality to be supported, so improvements should start to occur on that front soon.  For example, signals currently aren’t supported at all, and this is why Firefox does not run under Valgrind on Mac yet — all calls to sigaction() currently return -1, which causes an assertion failure somewhere in NSPR.

Something else worth mentioning:  I bought a new MacBook Pro, as my old 32-bit-only one was slow and noisy and getting annoying.  The new machine is 64-bit capable, but compiles to 32-bit by default and Valgrind’s configure script identifies it as a 32-bit only machine.  If anybody knows how to make configure recognise that it’s a 64-bit machine I’d love to hear about it.

Update, March 17: fixed a broken link to an earlier post.

9 Responses to Valgrind + Mac OS X update (Feb 17, 2009)

  1. Dumb suggestion – are you sure the OS is running as 64-bit?

  2. There is a mozconfig file for Mozilla’s build system that will get Mozilla building 64-bit on Mac OS X 10.5 on Mozilla bug 468509 (the build won’t finish but the compiler setup is fine afaict). Building 64-bit on Mac OS X 10.5 is considered cross compiling in our system though, I’m not sure how Xcode’s UI does it. I realize that file isn’t directly applicable to valgrind but you might be able to deduce the information you need from it.

  3. If it’s any help, Mac OS X 10.5 is still a 32-bit only OS. (Even in Cocoa.) 10.6 is currently scheduled to support 64-bit.

  4. Nicholas Nethercote

    Thanks for the suggestions. I should have given some more info about the 64-bit issue.

    Mac OS X 10.5 is clearly claiming to be a 32-bit OS: ‘uname -m’ gives “i386”, and normal compilation produces 32-bit executables. However, using ‘gcc -m64’ or ‘gcc -arch x86_64’ produces 64-bit executables, with 8-byte pointers, and the OS happily runs them. So it’s 64-bit capable.

    In general, on 64-bit platforms, Valgrind builds itself twice, once as 64-bit and once as 32-bit; the 64-bit version handles 64-bit executables, and the 32-bit version handles 32-bit executables. The build machinery uses -m32 and -m64 as required to make this happen. This assumes that configure detects that the machine is 64-bit in the first place.

    So if configure detected that the machine is 64-bit capable, both versions would build without problem. However, the $host_cpu variable which is set by AC_CANONICAL_HOST gets the value “i386”. I just tried hacking the configure script to force $host_cpu to “x86_64” and both versions were successfully built, and they both work. And using --host=x86_64-darwin also works and is a cleaner way to do it.

    So, problem solved! At least, with a workaround. Presumably in 10.6 the OS will claim to be 64-bit, as per Jason’s comment, and so configure will set $host_cpu to x86_64.

  5. Firefox actually runs just fine if you comment out the assertion in NSPR.

    Patch to comment out the assertion: http://pastebin.mozilla.org/627682

    Output from running Firefox, loading Tinderbox, and quitting: http://pastebin.mozilla.org/627688

    I imagine using an opt-with-symbols build rather than a debug build would make the experience even more pleasant and not require commenting out the assertion. But all I had handy was a debug build.

  6. Nicholas Nethercote

    Jesse: that’s good to know. It seems that __sigaction(), __disable_threadsignal() and wait4_nocancel() (syscall 400) are the only unhandled things, and that the current strategy of ignoring them works! Getting signals working under Valgrind on Mac still would seem like a good idea :)

    And it seems a bit dodgy that NSPR asserts that sigaction() doesn’t return -1.

  7. Do you have a place where we can report issues or ask questions?

    I’m getting this:

    --51712-- WARNING: unhandled syscall: 33554700
    --51712-- a.k.a.: 268
    ==51712== at 0xB6DDE2: sem_open (in /usr/lib/libSystem.B.dylib)

    Looking at the syswrap and vki stuff, it is listed, just don’t know what else needs to be done to make it work…
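
For readers wondering how the two numbers in that warning relate: the sketch below assumes the Darwin convention of packing a syscall class into the top 8 bits of the raw number, with class 2 denoting Unix/BSD syscalls. The constant and function names are illustrative and are not Valgrind's own identifiers.

```python
# Assumed layout: the top 8 bits hold the syscall class, the low 24 bits
# hold the per-class syscall number.
SYSCALL_CLASS_SHIFT = 24
SYSCALL_NUMBER_MASK = (1 << SYSCALL_CLASS_SHIFT) - 1


def decode_darwin_syscall(raw):
    """Return (class, number) for a raw Darwin syscall value."""
    return raw >> SYSCALL_CLASS_SHIFT, raw & SYSCALL_NUMBER_MASK


syscall_class, syscall_number = decode_darwin_syscall(33554700)
print(syscall_class, syscall_number)  # prints "2 268"
```

Under that assumption, 33554700 is 0x200010C, i.e. class 2 with syscall number 268, which matches the "a.k.a." line in the warning.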

    BTW: Big thanks for your work on this! I’ve really missed valgrind since I moved away from developing on Linux.

  8. Nicholas Nethercote

    Andy: best place to report problems is Bugzilla: http://www.valgrind.org/support/bug_reports.html. If you could report this there so it doesn’t get lost that would be great. Thanks.

  9. For the record, Jason Oster’s comment above is wrong. OS X 10.5 definitely supports 64-bit apps, even at the GUI level. (Xcode itself runs 64-bit.) I can vouch for this, as I helped fix a number of 64-bit compatibility bugs in 10.5 back when I still worked at Apple =)

    Unlike Windows, there is no separate 32 or 64 bit mode for the OS. If the CPU is 64-bit-capable, the kernel supports both 32 and 64 bit apps running at the same time. If the executable contains a 64-bit binary, it will use that preferentially to a 32-bit one.