{"id":58,"date":"2009-03-13T17:20:33","date_gmt":"2009-03-13T06:20:33","guid":{"rendered":"http:\/\/blog.mozilla.org\/nnethercote\/?p=58"},"modified":"2009-03-13T17:20:33","modified_gmt":"2009-03-13T06:20:33","slug":"atol-considered-harmful","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/nnethercote\/2009\/03\/13\/atol-considered-harmful\/","title":{"rendered":"atol() considered harmful"},"content":{"rendered":"<p>Here&#8217;s what sounds like a simple question: in C, what&#8217;s the best way to convert a string representing an integer to an integer?<\/p>\n<h3>The good way<\/h3>\n<p>One way is to use the standard library function called strtol(). There are a number of similar functions, such as strtoll() (convert to a long long) and strtod() (convert to a double), but they all work much the same so I&#8217;ll focus on strtol().<\/p>\n<p>Here&#8217;s the prototype for strtol() from the man page on my Linux box:<\/p>\n<pre>#include &lt;stdlib.h&gt;\r\nlong int strtol(const char *nptr, char **endptr, int base);<\/pre>\n<p>The first argument is the string to convert.\u00a0 The third argument is the base;\u00a0 if you give it zero the base used will be 10 unless the string begins with &#8220;0x&#8221; (hexadecimal) or &#8216;0&#8217; (octal).\u00a0 Any whitespace at the front is skipped over, and a &#8216;+&#8217; or &#8216;-&#8217; just before the digits is allowed.\u00a0 The return value is the long integer value that the string was converted into.<\/p>\n<p>The second argument is a call-by-reference return value.\u00a0 If it&#8217;s non-NULL, it gets filled in with a pointer to the first non-converted char in the string.\u00a0 If the string was entirely numbers, such as &#8220;123&#8221;, endptr will point to the terminating NUL char.\u00a0 If the string had no valid integral prefix, e.g. &#8220;one two three&#8221;, endptr will point to the start of the string.\u00a0 If the string was &#8220;123xyz&#8221;, i.e. 
contains numbers followed by non-numbers, endptr will point to the &#8216;x&#8217; char in the string.<\/p>\n<p>There are two good things about endptr.\u00a0 First, it lets you do error-detection.\u00a0 For example, if you are expecting a number without any extra chars at the end, endptr must point to a NUL char:<\/p>\n<pre>char* endptr;\r\nlong int x = strtol(s, &amp;endptr, 0);\r\nif (*endptr) { \/* error case *\/ }<\/pre>\n<p>Second, it allows more flexible parsing.\u00a0 For example, if you need to parse a string consisting of three comma-separated integers (e.g. &#8220;1, 2, 3&#8221;) you can do this:<\/p>\n<pre>int i1 = strtol(s,\u00a0  \u00a0\u00a0\u00a0\u00a0 &amp;endptr, 0);  if (*endptr != ',')\u00a0 goto bad;\r\nint i2 = strtol(endptr+1, &amp;endptr, 0);  if (*endptr != ',')\u00a0 goto bad;\r\nint i3 = strtol(endptr+1, &amp;endptr, 0);  if (*endptr != '\\0') goto bad;\r\n...\r\nbad: \/* error case *\/<\/pre>\n<p>(Nb: This example allows whitespace before each number but not before each comma.)<\/p>\n<p>Finally, strtol() returns 0 and sets errno to EINVAL if the conversion could not be performed because the given base was invalid.\u00a0 Also, it clamps the value and sets errno to ERANGE if an overflow or underflow occurs (as could be the case in the above code examples).<\/p>\n<h3>The bad way<\/h3>\n<p>Another way is to use the standard library function called atol().\u00a0 (Again, it&#8217;s one of a family of similar functions, such as atoi() and atof().)\u00a0 atol() is equivalent to this call to strtol():<\/p>\n<pre>strtol(str, (char **)NULL, 10);<\/pre>\n<p>Seems like a nice simplification, especially if you know you want base 10 numbers, right?\u00a0 The problem is that there is no scope for error-detection.\u00a0 atol(&#8220;123xyz&#8221;) will return 123.\u00a0 atol(&#8220;John Smith&#8221;) will return 0.\u00a0 Even better, atol() doesn&#8217;t have to set errno in any case!\u00a0 (And this means it&#8217;s actually not quite equivalent to the 
above strtol() call).<\/p>\n<p>It&#8217;s quite amazing, really;\u00a0 I&#8217;m not aware of any other C standard library functions that actually make error-detection impossible.\u00a0 Furthermore, the documentation involving atol() doesn&#8217;t make this clear.\u00a0 Compare this to the dangerous function gets():\u00a0 on my Linux box, the man page has a BUGS section that says &#8220;Never use gets().&#8221;\u00a0 The man page on my Mac is a little less forthright;\u00a0 in the SECURITY CONSIDERATIONS section it says &#8220;The gets() function cannot be used securely.&#8221;<\/p>\n<p>The <a href=\"http:\/\/www.opengroup.org\/onlinepubs\/009695399\/\">POSIX specification<\/a> is a little more forthright about atol():<\/p>\n<blockquote>\n<p>The <em>atol<\/em>() function is subsumed by <em>strtol()<\/em> but is retained because it is used extensively in existing code. If the number is not known to be in range, <em>strtol()<\/em> should be used because <em>atol<\/em>() is not required to perform any error checking.<\/p>\n<\/blockquote>\n<p>The only use I can think of for atol() is in the case where you&#8217;ve already scanned the string for some reason and know it contains only digits.<\/p>\n<p>A few of the options in the current release of Valgrind (3.4.0) that take numbers currently accept non-numbers because they are implemented using atoll().\u00a0 (Most of them use strtoll(), however, and the discrepancy has been removed in the trunk.)\u00a0 For example, if you pass the option --log-fd=foo it will interpret the &#8220;foo&#8221; as 0.\u00a0 Lovely.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Here&#8217;s what sounds like a simple question: in C, what&#8217;s the best way to convert a string representing an integer to an integer? The good way One way is to use the standard library function called strtol(). 
There are a number of similar functions, such as strtoll() (convert to a long long) and strtod (convert [&hellip;]<\/p>\n","protected":false},"author":139,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[540,528,484],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/posts\/58"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/users\/139"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/comments?post=58"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/posts\/58\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/media?parent=58"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/categories?post=58"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/tags?post=58"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}