C Correctness Valgrind

atol() considered harmful

Here’s what sounds like a simple question: in C, what’s the best way to convert a string representing an integer to an integer?

The good way

One way is to use the standard library function called strtol(). There are a number of similar functions, such as strtoll() (convert to a long long) and strtod (convert to a double), but they all work much the same so I’ll focus on strtol().

Here’s the prototype for strtol() from the man page on my Linux box:

#include <stdlib.h>
long int strtol(const char *nptr, char **endptr, int base);

The first argument is the string to convert.  The third argument is the base;  if you give it zero the base used will be 10 unless the string begins with “0x” (hexadecimal) or ‘0’ (octal).  Any whitespace at the front is skipped over, and a ‘+’ or ‘-‘ just before the digits is allowed.  The return value is the long integer value that the string was converted into.

The second argument is a call-by-reference return value.  If it’s non-NULL, it gets filled in with a pointer to the first non-converted char in the string.  If the string was entirely numbers, such as “123”, endptr will point to the terminating NUL char.  If the string had no valid integral prefix, e.g. “one two three”, endptr will point to the start of the string.  If the string was “123xyz”, i.e. contains numbers followed by non-numbers, endptr will point to the ‘x’ char in the string.

There are two good things about endptr.  First, it lets you do error-detection.  For example, if you are expecting a number without any extra chars at the end, endptr must point to a NUL char:

char* endptr;
long int x = strtol(s, &endptr, 0);
if (*endptr) { /* error case */ }

Second, it allows more flexible parsing.  For example, if you need to parse a string consisting of three comma-separated integers (e.g. “1, 2, 3”) you can do this:

int i1 = strtol(s,        &endptr, 0);  if (*endptr != ',')  goto bad;
int i2 = strtol(endptr+1, &endptr, 0);  if (*endptr != ',')  goto bad;
int i3 = strtol(endptr+1, &endptr, 0);  if (*endptr != '\0') goto bad;
bad: /* error case */

(Nb: This example allows whitespace before each number but not before each comma.)

Finally, strtol() returns 0 and sets errno to EINVAL if the conversion could not be performed because the given base was invalid.  Also, it clamps the value and sets errno to ERANGE if an overflow or underflow occurrs (as could be the case in the above code examples).

The bad way

Another way is to use the standard library function called atol().  (Again, it’s one of a family of similar functions, such as atoi() and atof().)  atol() is equivalent to this call to strtol():

strtol(str, (char **)NULL, 10);

Seems like a nice simplification, especially if you know you want base 10 numbers, right?  The problem is that there is no scope for error-detection.  atol(“123xyz”) will return 123.  atol(“John Smith”) will return 0.  Even better, atol() doesn’t have to set errno in any case!  (And this means it’s actually not quite equivalent to the above strtol() call).

It’s quite amazing, really;  I’m not aware of any other C standard library functions that actually make error-detection impossible.  Furthermore, the documentation involving atol() doesn’t make this clear.  Compare this to the dangerous function gets():  on my Linux box, the man page has a BUGS section that says “Never use gets().”  The man page on my Mac is a little less forthright;  in the SECURITY CONSIDERATIONS section it says “The gets() function cannot be used securely.”

The POSIX specification is a little more forthright about atol():

The atol() function is subsumed by strtol() but is retained because it is used extensively in existing code. If the number is not known to be in range, strtol() should be used because atol() is not required to perform any error checking.

The only use I can think of for atol() is in the case where you’ve already scanned the string for some reason and know it contains only digits.

A few of the options in the current release of Valgrind (3.4.0) that take numbers currently accept non-numbers because they are implemented using atoll().  (Most of them use strtoll(), however, and the discrepancy has been removed in the trunk.)  For example, if you pass the option –log-fd=foo it will interpret the “foo” as 0.  Lovely.

Mac OS X Valgrind

Valgrind on iPhone

With Valgrind now working reasonably well on Darwin, it’s possible to run Valgrind on an iPhone. Well, not directly on an iPhone, because the Darwin port doesn’t work on ARM, but you can run Valgrind on x86 binaries built against the iPhone Simulator SDK.

However, it’s a little tricky, because you can’t run Valgrind from the command line in this environment.  Landon Fuller explains here the small hack that is required to get around this limitation.

Correctness Valgrind

Eliminating undefined values with Valgrind, the easy way

(For those who don’t want to read a long post, here’s the short version:  if you use Valgrind, make sure you learn about the –track-origins=yes option that was added to version 3.4.0, it will make your life easier.)

Earlier this week, Jesse Ruderman reported a problem found with the Valgrind tool Memcheck:

==56620== Conditional jump or move depends on uninitialised value(s)
==56620==    at 0x1870962: nsHTMLDocument::MaybeEditingStateChanged()
==56620==    <more stack trace lines omitted>

Mats Palmgren investigated, but wasn’t able to determine the cause. His conclusion was:

The only possible explanations I can think of is either a bug in valgrind itself, or we make bogus VALGRIND_* calls to misinform it about the status of this memory. AFAICT, jemalloc is only place we use VALGRIND_* macros… and I traced all of those and I don’t see any suspicious calls…

It turns out that the Memcheck was right, and the problem was real.  So why did Mats have trouble understanding what Valgrind was trying to tell him?

Tracking undefined values

Memcheck tracks every single value bit present in a running program, and tracks whether it is defined or undefined.  Undefined values arise from certain kinds of allocations;  for example, stack- and heap-allocated memory is undefined if it is not initialised.  Furthermore, undefinedness gets propagated through all operations.  For example, if you add an undefined value to a defined value, the result is undefined.

Actually, because the definedness tracking is bit-level, it’s much more complicated than that:  you can have partially defined values due to bit-field operations, and for every arithmetic operation Memcheck has to determine how undefined bits in the inputs can lead to undefined bits in the output.  This paper describes how it works in detail. It’s quite amazing that Memcheck is able to do this so accurately and efficiently, and Julian Seward deserves a great deal of credit for this minor miracle of code engineering.  (For those of you who have used Purify, this bit-precise definedness tracking is one of the main things that distinguishes Memcheck from Purify.)  Furthermore, the technique can be to track any binary property, and has been reused in tools tracking which bits are tainted and which bits are secret.

Anyway, the gory details don’t really matter.  What does matter is that when Memcheck complains that a “Conditional jump or move depends on uninitialised value(s)” — “undefined” would arguably be better in that message than “uninitialised”, but that message hasn’t changed for years — it’s determined that an undefined value is being used in a dangerous way.  And it’s almost always right.

But I still haven’t answered the original question of why Mats found the message hard to understand.

An infection chain

In my previous post I defined the terms defect, infection and failure, which are the three stages of a “bug”, and together consitute an infection chain.  If you haven’t read that post, or know of these terms from elsewhere, I suggest you read that post before continuing.

What’s the relevant defect here?  It’s almost always that the code fails to initialise a value (e.g. a variable or field).

What’s the infection?  When the defective code is executed, the value ends up undefined, which means it’s probably incorrect.  But it’s not guaranteed to be incorrect — you might be lucky (or arguably unlucky) and the garbage value ends up being the same as the one that would have been assigned.  (This is most common with heap allocations, which in practice often result in all the allocated memory being zeroed.)

How does the infection propagate?  Well, once you have a possibly incorrect value, there are many ways for things to go wrong:

  • If an incorrect value is used as an input to an operation that writes a result, such as an assignment, arithmetic operation or system call, the likely result another incorrect value (a further infection).  But this isn’t guaranteed, e.g. if you bitwise-AND an incorrect value with 0, the result is always 0.
  • If an incorrect value is used as an input to an operation with user-visible side-effects (e.g. a system call or an assertion), a possible result is incorrect output or an abort (both failures).
  • If an incorrect value is used as an address in an operation that accesses memory, it might lead to an incorrect value in the load/store destination (more infections).  Or it could cause inappropriate memory to be accessed, leading to a crash (a failure).
  • If an incorrect value is used in a conditional branch, or as a jump target, that can lead to an incorrect program counter (another infection), which can lead to failures such as incorrect behaviour, or crashes (if the jump destination is sufficiently bogus).

In other words, incorrect values can lead to all sorts of problems.  And that’s why failing to initialise values can lead to such insidious bugs — the range of possible outcomes is extremely wide.

Back to Memcheck

So where in this chain does Memcheck complain?  Ideally it would be as soon as the defect is executed.  But that could lead to false positives — it’s possible that the value will be initialised later, before being used in dangerous ways.

You might think that Memcheck should complain if an undefined value is used in any way.  But that too would lead to false positives — it turns that normal programs copy around lots of undefined values, because of padding within structures.  (Indeed, we’ve tried doing exactly this with Memcheck, and it results in hundreds of warnings for trivial programs like Hello World.)

You might think that Memcheck should complain if an undefined value is used in an arithmetic operation.  Again, it doesn’t, because such an operation is not necessarily dangerous;  it depends if the resulting value is undefined and how it is then used.

Really, the problem is that it’s hard to tell if an undefined value constitutes an infection — doing that requires knowing the programmer’s intention.

What Memcheck does is complain when the use of an undefined value is very likely to lead to defects:

  • In the condition of a conditional branch or test, or the target address of a branch, because it is likely to lead to incorrect control flow, which will lead to incorrect behaviour.  (The complaint above was an example of this.)
  • As an argument to a system call, because it is likely to cause an incorrect externally-visible side-effect.
  • As the address in a load or store, because it could cause a crash.

The end result of this is that the place where Memcheck complains can be quite some distance from the defect, and also quite some distance from any failures that manifest as a result.  This is why Mats had trouble with this warning.

People often ask “can’t you do more eager checking?”  It should be clear by now that there are trade-offs between how early definedness problems are reported and how many false positives you get.  It’s not clear that there’s a single best answer, but the Valgrind developers have given this matter lots of thought over many years and it’s probably not going to change any time soon!  But do not despair…

So how do I get better information?

When Memcheck complains about a dangerous use of an undefined value, you need to work backwards to find the defect — the missing initialisation.  In easy cases, you can do this in your head, just by looking at the code.

If you’re still stumped, you can insert the Memcheck client requests VALGRIND_CHECK_MEM_IS_DEFINED or VALGRIND_CHECK_VALUE_IS_DEFINED into your code to determine which variables are undefined at different points in your code.  These can be very helpful, but they’re a bit laborious because you have to recompile your code every time you want to insert a new one.  But for over seven years that was the best way to do it.

However, Valgrind 3.4.0 introduced a new option, –track-origins=yes.  What it does is, for every undefined value, it tracks where the undefinedness originated.  The origin is usually a stack or heap allocation.  With –track-origins=yes, each complaint about an undefined value will be augmented with origin information like this:

==29553==  Uninitialised value was created by a stack allocation
==29553==    at 0x400968: main (origin1-yes.c:23)

or this:

==29553==  Uninitialised value was created by a heap allocation
==29553==    at 0x4C2758C: malloc (vg_replace_malloc.c:192)
==29553==    by 0x400A6F: main (origin1-yes.c:61)

This is useful because the missing initialisation — the defect — is very often supposed to occur immediately after the allocation.  So origin-tracking usually takes you immediately to the actual defect.  This is one of those features that probably doesn’t seem that impressive unless you’ve had to deal with this problem before, in which case you’ll realise it can be a godsend.

By default, –track-origins is not turned on.  This is because it slows Memcheck down a lot, because of all the extra origin information that must be propagated.  But if you have undefined value warnings, you should turn it on immediately!

With this information, Mats was able to track down the defect:  a constructor failed to initialise two fields.  Hooray!  And just think, if C++ warned about fields uninitialised by constructors, all these heroics wouldn’t be necessary 🙂


Bugs, defects, infections and failures

People often use the word “bug”. Unfortunately it’s a very imprecise word. “Error” suffers from the same problems. Both of them are used at different times to mean incorrect program code, incorrect program states, and visibly incorrect program behaviour. The conflation of these distinct things inhibits clear thinking and can mask subtleties regarding program correctness.

Better terminology

Because of this, I like the following terminology used by Andreas Zeller in his book Why Programs Fail:

  • A defect is an erroneous piece of code, one that can cause an infection when executed (but it may not always, i.e. it may be masked).  Defects are created by programmers.
  • An infection is an erroneous piece of program state, i.e. one for which there is a discrepancy between the intended and actual program state. Infections are caused by defects and/or prior infections. Infections can be masked via overwriting or correction.
  • A failure is an erroneous user-visible behaviour, i.e. one for which there is discrepancy between the intended and actual user-visible behaviour. Failures are caused by infections.
  • An infection chain is a cause-effect chain from a defect to one or more infections to a failure.

(Nb: “Intended state” and “intended behaviour” can be fuzzy concepts. Few programs have complete specifications, and these concepts typically reside partly in the mind of the programmer, partly in the mind of the user, partly in the documentation, and partly nowhere!)

Zeller’s book has received high praise from many quarters.  I personally found these definitions, which appear in chapter 1, the single best part of the whole book.

Some examples

Common terminology for memory-related “bugs” haphazardly covers all of these concepts.  Consider the following examples.

  • A double free is a defect.
  • A memory leak is an infection;  the underlying defect is (in a C program) a missing call to free() of a heap block, and the resulting failure may be degraded performance or an out-of-memory abort.  (If the leak is minor enough that the user doesn’t notice any difference, then it’s arguably not a failure. There’s a whole separate philosophical discussion to be had on whether poor performance could be considered a failure, depending on what the user’s implicit mental specification of “fast enough” is.)
  • A segmentation fault (just one kind of crash) is a failure which may be caused by a number of different infections, each of which may be caused by a number of different defects.
  • A buffer overflow attack involves an entire infection chain;  for example, a particular defect (a missing bounds check), causes an infection (an incorrect pointer value), which causes more infections (incorrect values on the stack), which causes yet more infections (an incorrect value for the program counter), which causes failures (incorrect and malicious behaviour from injected code).

Users care about failures, programmers care about defects

Failures affect users.  But defects are the root cause of failures, and they are what programmers must fix.  Furthermore, defects are still defects (and infections and still infections) even if they cannot cause failures;  such defects may not cause problems now, but if the program is changed later, the defect may cause a failure.

Therefore, the aim of “bug detection” tools such as those built with Valgrind is to help the programmer identify defects, whether they cause failures or not. Sometimes a tool can identify defects directly;  more often a tool will identify infections or failures, and it is the programmer’s task to work back through the infection chain to identify the defect.  The usability of such tools is greatly affected by how easy this task is, and my next post will discuss a particular example in more detail, and may teach even veteran Valgrind users a new trick or two that will make their lives easier.

Mac OS X Valgrind

Valgrind + Mac OS X update (Feb 17, 2009)

It’s been a month since I first wrote about my work on the Mac OS X port of Valgrind.  In that time I’ve made 85 commits to the DARWIN branch (and a similar number to the trunk).

Here are the current (as of r9192) values of the metrics I defined in the first post as a means of tracking progress.

  • The number of regression test failures on Linux was: 477 tests, 220 stderr failures, 53 stdout failures, 25 post failures (which I’ll abbreviate as 477/220/53/25). It’s now 484/4/1/0.  I.e. the number of failures went from 298 to 5.  A few new tests have been added.  Four of the failures are in Helgrind, the data race detector tool, which I haven’t tracked down yet.  The other failure is one that also occurs on the trunk.  So almost all the Linux functionality broken by the changes has been restored.
  • The number of regression test failures on Mac was 419/293/58/29.  It’s now 402/213/52/0.  I.e. the number of failures went from 380 to 265.  The total number of tests has gone down because some Linux-specific tests are no longer being (inappropriately) run on Mac.  This is the most important metric, and it’s improving steadily, but there’s still a long way to go.
  • The number of compiler warnings on Linux was 186.  It’s now 10, and all of these are from #warning declarations that mark places where improvement need to be made to the Darwin port, but aren’t actually a problem for Linux.  The number of compiler warnings on Mac was 461.  It’s now 44.  Of these, 33 are from #warning declarations, and 10 are from code generated by the Darwin ‘mig’ utility which I have no control over.  So compiler warnings aren’t an issue any more, and I won’t bother tracking them as a metric in the future.
  • The size of the diff between the trunk and the branch was 55,852 lines (1.9MB).  It’s now 41,895 lines (1.5MB).  But note that this is not a very useful metric;  progress will usually cause it to drop, but it will also increase as missing Darwin functionality is added.

Interestingly enough, although this number of Mac test failures has gone down significantly, if the branch didn’t handle your program a month ago it probably still won’t handle it now (although getsockopt() no longer causes an abort).  But Valgrind’s output may well be better (e.g. debugging information will be better utilized).  Much of my effort has been in making the tests pass — improving cases where the Darwin port was doing basically the right thing, but its output didn’t exactly match that expected.

One example is that stack traces were a little unclean, in various minor ways.  Another example is that I added a –ignore-fn option to Massif (the heap profiler) which allows it to ignore certain heap allocations.  This was required because Darwin’s libc always does a few heap allocations at start-up, but Linux’s libc doesn’t.  The new option allows the Darwin allocations to be ignored and therefore Massif’s output to be consistent on both platforms.

Few if any of these changes have made the branch closer to handling new programs, at least directly.  But there’s no point apologising about this, because the branch won’t reach a highly functional state without a working test suite to serve as a safety net against regressions.  And as I progress, getting more tests to pass will require genuine new program functionality to be supported, so improvements should start to occur on that front soon.  For example, signals currently aren’t supported at all, and this is why Firefox does not run under Valgrind on Mac yet — all calls to sigaction() currently return -1, which causes an assertion failure somewhere in NSPR.

Something else worth mentioning:  I bought a new MacBook Pro, as my old 32-bit only was was slow and noisy and getting annoying.  The new machine is 64-bit capable, but compiles to 32-bit by default and Valgrind’s configure script identifies it as a 32-bit only machine.  If anybody knows how to make configure recognise that it’s a 64-bit machine I’d love to hear about it.

Update, March 17: fixed a broken link to an earlier post.

Mac OS X Personal Valgrind

Me, Valgrind, and Mac OS X

Welcome to my blog, where I’ll be discussing some of the work I’m doing for Mozilla.

A little about me

I’m Australian. I live in Melbourne. I’ve also lived in Cambridge, England and Austin, Texas, and so I am fluent in at least three dialects of English. I like spending time with my wife Phoebe and baby daughter Keira, eating food, riding my bike, and following US presidential elections obsessively. Two weeks ago I left the academic/research world and started working for Mozilla.


My first big task for Mozilla is to improve support for Mac OS X in Valgrind. I’ve been involved with Valgrind since before the 1.0 release in 2002, and have done lots of work on it, including writing two tools that are in the Valgrind distribution: Cachegrind, a cache profiler, and Massif, a memory profiler. I even wrote a PhD dissertation about it.

And it seems that lots of Mozilla people find Valgrind useful, which is nice. However, it currently only runs on Linux. (Well, it also runs on AIX, but not many people care about that.)

Valgrind on Mac OS X

More than four years ago, on December 16, 2004, an Apple employee named Greg Parker wrote to the Valgrind developers mailing list to tell us that he was working on a port of Valgrind for Mac OS X.  He’s been working on it ever since then. (This must be why Mac OS 10.5 shipped late.)

After such a long time, I’m happy to report that there is now a branch holding Greg’s port in the Valgrind SVN repository.  If you want to check it out, do this:

  svn co svn:// <workspace-name>
  cd <workspace-name>

and then build it according to the instructions in the README file.  The branch is called DARWIN because Darwin is the name of the Mac OS “core”, which consists of a Mach-based microkernel and a few other bits and pieces.

However, please note that the port currently is, in Greg’s words: “UNSUPPORTED and INCOMPLETE and BUGGY… It may not find bugs in your program, or run your program correctly, or run your program at all.” What Greg has done is very impressive, and goes an awfully long way towards having a complete port of Valgrind on Mac OS X.  But it’s not the cleanest patch ever.  To give you an idea…

  • The patch I imported was 31,144 lines, just over 1MB of text.
  • The patch initially didn’t work on 32-bit Macs.
  • The patch broke Valgrind on Linux.  This took me a couple of days to fix, mostly involving the addition of appropriate #if statements.
  • The patch broke the regression test system;  they wouldn’t even build, let alone run. After fixing them to run again, more than half of the tests failed on Linux, and almost three-quarters failed on Mac.
  • There are lots of compiler warnings.  (The Valgrind trunk has none).
  • Much of the code in the patch has 4 space indenting;  the rest of Valgrind code has 3 space indenting.

So there’s plenty of work to be done to get the branch into a state where it will be suitable for merging with the trunk.  It’s hard to estimate how long this will take, it will just be a matter of fixing things one piece at a time.  My guess is that three months might suffice, but it’s really just a guess.  But here are some metrics I can use to judge progress, and their values just after I got the the system and regression tests building and running again on Mac and Linux:

  • The number of regression test failures on Linux: 477 tests, 220 stderr failures, 53 stdout failures, 25 post failures.  (“stderr” failures generally indicate that Valgrind’s output had a problem, “stdout” failures generally indicate that the test program’s output had a problem, and “post” failures indicate that the output of a Valgrind post-processing step had a problem.)  These numbers roughly indicate how much existing functionality has been broken on Linux by the Darwin changes, and should be fairly easy to get down.
  • The number of regression test failures on Mac:  419 tests, 293 stderr failures, 58 stdout failures, 29 post failures.  These numbers are the most important, as they roughly indicate how complete the Mac functionality is, and will be much more work to get down.
  • The number of compiler warnings: 186.  This number should be easy to reduce.   (Update, Jan 20: That’s on Linux. On Darwin it was 461.)
  • The size of the diff between the branch and the trunk: 55,852 lines, 1.9MB.  This is larger than the original patch because some files have been moved on the branch but not yet moved on the trunk, including some tests that are large and have large expected outputs.  This number will go down in fits and starts;  it will never get to zero, as the final merge will happen when there are many differences between the branch and trunk.

I’ll occasionally post updates to these numbers so people can track progress.

If Valgrind-on-Mac is of interest to you, please try out the new branch and let me know how it goes. Note that I’m working on an old MacBook Pro which is only 32-bit, so it’s possible that I’ve broken the 64-bit Mac support, but have no way to determine this.


First post!

You gotta start somewhere.

Update (Jan 19): Hmm, I wasn’t expecting this post to make it onto Planet Mozilla, as I wrote it two days before I requested that my blog be syndicated.  Well, at least it didn’t take you long to read 🙂