{"id":238,"date":"2013-11-05T12:02:08","date_gmt":"2013-11-05T17:02:08","guid":{"rendered":"http:\/\/blog.mozilla.org\/nfroyd\/?p=238"},"modified":"2013-11-05T12:02:08","modified_gmt":"2013-11-05T17:02:08","slug":"the-performance-implications-of-strncpy","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/nfroyd\/2013\/11\/05\/the-performance-implications-of-strncpy\/","title":{"rendered":"the performance implications of strncpy"},"content":{"rendered":"<p>Last week, I was working on <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=921040\">making Firefox compile for a OS X target on a Linux host<\/a>.\u00a0 As part of this effort, I ported <a href=\"http:\/\/www.opensource.apple.com\/source\/ld64\/\">Apple&#8217;s opensourced ld64 linker<\/a> to <a href=\"https:\/\/github.com\/froydnj\/ld64\/tree\/ld64-136-linux\">compile and run on a Linux host<\/a>.\u00a0 Since OS X is a BSD-derived operating system, ld64 made use of <a href=\"http:\/\/www.courtesan.com\/todd\/papers\/strlcpy.html\">the <tt>strlcpy<\/tt> and <tt>strlcat<\/tt> functions<\/a>, designed to be safer than the <tt>strcpy<\/tt>\/<tt>strncpy<\/tt>\/<tt>strcat<\/tt>\/<tt>strncat<\/tt> functions.\u00a0 Linux&#8217;s libc <a href=\"https:\/\/sourceware.org\/glibc\/wiki\/FAQ#Why_no_strlcpy_.2BAC8_strlcat.3F\">doesn&#8217;t implement <tt>strlcpy<\/tt> and <tt>strlcat<\/tt><\/a>, so I had to find replacements.\u00a0 <tt>strlcpy<\/tt> seemed easy enough, as <a href=\"http:\/\/shinh.skr.jp\/slide\/ldmac\/036.html\">a presentation on maloader suggested<\/a>:<\/p>\n<pre>size_t strlcpy(char* dst, const char* src, size_t size)\r\n{\r\n    dst[size - 1] = '\\0';\r\n    strncpy(dst, src, size - 1);\r\n    return strlen(dst);\r\n}<\/pre>\n<p>I cribbed <tt>strlcat<\/tt> from someplace else and went about my merry way.<\/p>\n<p>After I got ld64 to compile, then link simple programs, and worked my <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=931043\">way<\/a> <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=931053\">through<\/a> <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=933231\">some<\/a> <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=932127\">configure<\/a> bugs for non-Android cross-compiles, I ran into a problem: the JavaScript shell was taking 8 minutes to link.\u00a0 This was unacceptable; it meant linking libxul was going to take over an hour, maybe over two, to link, which nobody would be happy about.\u00a0 The equivalent link of the JavaScript shell on my Mac mini took about two seconds.<\/p>\n<p>I started investigating what was going on with <a href=\"https:\/\/perf.wiki.kernel.org\/index.php\/Main_Page\">perf<\/a>, just checking into what ld64 was doing for parts of those 8 minutes.\u00a0 99%+ of the time was being spent inside <tt>strncpy<\/tt>.\u00a0 Hm, that&#8217;s strange.<\/p>\n<p>I fiddled around with a couple different things, none of which had much impact.\u00a0 Then I took a close look at the code calling <tt>strlcpy<\/tt> (yes, all the time in <tt>strlcpy<\/tt> was through this function, which should have been a red flag in the first place):<\/p>\n<pre>int32_t StringPoolAtom::add(const char* str)\r\n{\r\n\tint32_t offset = kBufferSize * _fullBuffers.size() + _currentBufferUsed;\r\n\tint lenNeeded = strlcpy(&amp;_currentBuffer[_currentBufferUsed], str, kBufferSize-_currentBufferUsed)+1;\r\n\tif ( (_currentBufferUsed+lenNeeded) &lt; kBufferSize ) {\r\n \t\t_currentBufferUsed += lenNeeded;\r\n \t}\r\n \telse {\r\n \t\tint copied = kBufferSize-_currentBufferUsed-1;\r\n \t\t\/\/ change trailing '\\0' that strlcpy added to real char\r\n \t\t_currentBuffer[kBufferSize-1] = str[copied];\r\n \t\t\/\/ alloc next buffer\r\n \t\t_fullBuffers.push_back(_currentBuffer);\r\n \t\t_currentBuffer = new char[kBufferSize];\r\n \t\t_currentBufferUsed = 0;\r\n \t\t\/\/ append rest of string\r\n \t\tthis-&gt;add(&amp;str[copied+1]);\r\n\t}\r\n\treturn offset;\r\n}<\/pre>\n<p>In this code, <tt>kBufferSize<\/tt> is 16MB, so the size parameter passed to <tt>strlcpy<\/tt> can be rather large compared to the size of the string being copied to the destination.<\/p>\n<p>I forget exactly where I read it, but I saw some small blurb about glibc&#8217;s <tt>strncpy<\/tt> having the crazy behavior of null padding the destination string, rather than just appending a null terminator. If <tt>strlcpy<\/tt> was implemented by calling out to <tt>strncpy<\/tt>, then just that function above would be writing hundreds or even thousands of megabytes of zeros more than required. That would definitely slow things down!<\/p>\n<p>(I later discovered that this &#8220;crazy behavior&#8221; is documented in the <a href=\"http:\/\/linux.die.net\/man\/3\/strncpy\"><tt>strncpy<\/tt> man page<\/a> and <a href=\"http:\/\/pubs.opengroup.org\/onlinepubs\/7990989775\/xsh\/strncpy.html\">is actually required by standards<\/a>.\u00a0 Indeed, the <a href=\"http:\/\/www.courtesan.com\/todd\/papers\/strlcpy.html\">original <tt>strlcpy<\/tt> paper<\/a> cites this problem of <tt>strncpy<\/tt> as a motivating factor for <tt>strlcpy<\/tt>.\u00a0 It is the only way the performance figures they give in the paper are actually relevant to their point.\u00a0 But silly me, I just assumed I knew how <tt>strncpy<\/tt> works rather than actually reading documentation. I am really curious how this behavior of <tt>strncpy<\/tt> came to be and why folks thought it was worth preserving.)<\/p>\n<p>Once I fixed the <tt>strlcpy<\/tt> implementation to do things properly, cross link times became comparable to native link times.\u00a0 And then I could think about linking libxul in a reasonable amount of time. (And I did link libxul in a reasonable amount of time, if you read through <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=921040\">the bug<\/a>. And it even runs on a Mac!)<\/p>\n<p>Lesson learned: don&#8217;t use <tt>strncpy<\/tt>!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Last week, I was working on making Firefox compile for a OS X target on a Linux host.\u00a0 As part of this effort, I ported Apple&#8217;s opensourced ld64 linker to compile and run on a Linux host.\u00a0 Since OS X is a BSD-derived operating system, ld64 made use of the strlcpy and strlcat functions, designed [&hellip;]<\/p>\n","protected":false},"author":320,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/posts\/238"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/users\/320"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/comments?post=238"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/posts\/238\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/media?parent=238"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/categories?post=238"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/nfroyd\/wp-json\/wp\/v2\/tags?post=238"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}