C Correctness Valgrind

atol() considered harmful

Here’s what sounds like a simple question: in C, what’s the best way to convert a string representing an integer to an integer?

The good way

One way is to use the standard library function called strtol(). There are a number of similar functions, such as strtoll() (convert to a long long) and strtod (convert to a double), but they all work much the same so I’ll focus on strtol().

Here’s the prototype for strtol() from the man page on my Linux box:

#include <stdlib.h>
long int strtol(const char *nptr, char **endptr, int base);

The first argument is the string to convert.  The third argument is the base;  if you give it zero the base used will be 10 unless the string begins with “0x” (hexadecimal) or ‘0’ (octal).  Any whitespace at the front is skipped over, and a ‘+’ or ‘-‘ just before the digits is allowed.  The return value is the long integer value that the string was converted into.

The second argument is a call-by-reference return value.  If it’s non-NULL, it gets filled in with a pointer to the first non-converted char in the string.  If the string was entirely numbers, such as “123”, endptr will point to the terminating NUL char.  If the string had no valid integral prefix, e.g. “one two three”, endptr will point to the start of the string.  If the string was “123xyz”, i.e. contains numbers followed by non-numbers, endptr will point to the ‘x’ char in the string.

There are two good things about endptr.  First, it lets you do error-detection.  For example, if you are expecting a number without any extra chars at the end, endptr must point to a NUL char:

char* endptr;
long int x = strtol(s, &endptr, 0);
if (*endptr) { /* error case */ }

Second, it allows more flexible parsing.  For example, if you need to parse a string consisting of three comma-separated integers (e.g. “1, 2, 3”) you can do this:

int i1 = strtol(s,        &endptr, 0);  if (*endptr != ',')  goto bad;
int i2 = strtol(endptr+1, &endptr, 0);  if (*endptr != ',')  goto bad;
int i3 = strtol(endptr+1, &endptr, 0);  if (*endptr != '\0') goto bad;
bad: /* error case */

(Nb: This example allows whitespace before each number but not before each comma.)

Finally, strtol() returns 0 and sets errno to EINVAL if the conversion could not be performed because the given base was invalid.  Also, it clamps the value and sets errno to ERANGE if an overflow or underflow occurrs (as could be the case in the above code examples).

The bad way

Another way is to use the standard library function called atol().  (Again, it’s one of a family of similar functions, such as atoi() and atof().)  atol() is equivalent to this call to strtol():

strtol(str, (char **)NULL, 10);

Seems like a nice simplification, especially if you know you want base 10 numbers, right?  The problem is that there is no scope for error-detection.  atol(“123xyz”) will return 123.  atol(“John Smith”) will return 0.  Even better, atol() doesn’t have to set errno in any case!  (And this means it’s actually not quite equivalent to the above strtol() call).

It’s quite amazing, really;  I’m not aware of any other C standard library functions that actually make error-detection impossible.  Furthermore, the documentation involving atol() doesn’t make this clear.  Compare this to the dangerous function gets():  on my Linux box, the man page has a BUGS section that says “Never use gets().”  The man page on my Mac is a little less forthright;  in the SECURITY CONSIDERATIONS section it says “The gets() function cannot be used securely.”

The POSIX specification is a little more forthright about atol():

The atol() function is subsumed by strtol() but is retained because it is used extensively in existing code. If the number is not known to be in range, strtol() should be used because atol() is not required to perform any error checking.

The only use I can think of for atol() is in the case where you’ve already scanned the string for some reason and know it contains only digits.

A few of the options in the current release of Valgrind (3.4.0) that take numbers currently accept non-numbers because they are implemented using atoll().  (Most of them use strtoll(), however, and the discrepancy has been removed in the trunk.)  For example, if you pass the option –log-fd=foo it will interpret the “foo” as 0.  Lovely.

One reply on “atol() considered harmful”

Comments are closed.