{"id":1108,"date":"2011-08-05T13:35:52","date_gmt":"2011-08-05T02:35:52","guid":{"rendered":"http:\/\/blog.mozilla.org\/nnethercote\/?p=1108"},"modified":"2011-08-05T13:35:52","modified_gmt":"2011-08-05T02:35:52","slug":"clownshoes-available-in-sizes-2101-and-up","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/nnethercote\/2011\/08\/05\/clownshoes-available-in-sizes-2101-and-up\/","title":{"rendered":"Clownshoes available in sizes 2^10+1 and up!"},"content":{"rendered":"<p>I&#8217;ve been working a lot on about:memory lately.\u00a0 It&#8217;s a really useful tool, but one frustrating thing about it is that a decent chunk of our memory usage is not covered by any of the existing memory reporters, and so is falling into the &#8220;heap-unclassified&#8221; bucket:<\/p>\n<p><a href=\"http:\/\/blog.mozilla.org\/nnethercote\/files\/2011\/08\/heap-unclassified.png\"><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-1110\" title=\"heap-unclassified\" src=\"http:\/\/blog.mozilla.org\/nnethercote\/files\/2011\/08\/heap-unclassified.png\" alt=\"\" width=\"639\" height=\"30\" srcset=\"https:\/\/blog.mozilla.org\/nnethercote\/files\/2011\/08\/heap-unclassified.png 639w, https:\/\/blog.mozilla.org\/nnethercote\/files\/2011\/08\/heap-unclassified-300x14.png 300w\" sizes=\"(max-width: 639px) 100vw, 639px\" \/><\/a><\/p>\n<p>This bucket typically accounts for 30&#8211;45% of about:memory&#8217;s &#8220;explicit&#8221; measurement.\u00a0 We&#8217;ve discussed this &#8220;dark matter&#8221; several times in MemShrink meetings.\u00a0 There&#8217;s lots of work underway to <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=563700\">add new reporters to uninstrumented parts of Gecko<\/a>, but I&#8217;ve seen cases where it looks like a decent chunk of the missing memory is due to the JS engine, which is already seemingly thoroughly instrumented.<\/p>\n<p>This week I realized that some of the dark matter could be due to &#8220;slop&#8221; from jemalloc, the browser&#8217;s heap allocator.\u00a0 What is slop?\u00a0 When you ask a heap allocator for a certain number of bytes, it&#8217;ll often give you back more than you asked for, rounding the size up.\u00a0 This wastes some memory, but there are good reasons for it &#8212; it makes the heap allocator much faster and simpler, and helps avoid fragmentation when the memory is freed.<\/p>\n<p>The following comment from jemalloc.c shows jemalloc&#8217;s size classes.\u00a0 Any request that&#8217;s not for one of the listed sizes below is rounded up to the nearest size.<\/p>\n<pre>*\u00a0\u00a0 |=====================================|\r\n*\u00a0\u00a0 | Category | Subcategory\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0 Size |\r\n*\u00a0\u00a0 |=====================================|\r\n*\u00a0\u00a0 | Small\u00a0\u00a0\u00a0 | Tiny\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 2 |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 4 |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 8 |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |----------------+---------|\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | Quantum-spaced |\u00a0\u00a0\u00a0\u00a0\u00a0 16 |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0 32 |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0 48 |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0 ... |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0 480 |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0 496 |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0 512 |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |----------------+---------|\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | Sub-page\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0 1 kB |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0 2 kB |\r\n*\u00a0\u00a0 |=====================================|\r\n*\u00a0\u00a0 | Large\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0 4 kB |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0 8 kB |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0 12 kB |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0 ... |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | 1012 kB |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | 1016 kB |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 | 1020 kB |\r\n*\u00a0\u00a0 |=====================================|\r\n*\u00a0\u00a0 | Huge\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0 1 MB |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0 2 MB |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0 3 MB |\r\n*\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0 ... |\r\n*\u00a0\u00a0 |=====================================|<\/pre>\n<p>In extreme cases, jemalloc will return almost double what you asked for.\u00a0 For example, if you ask for 1,025 bytes, it&#8217;ll give you 2,048.\u00a0 A lot of the time you have to just live with slop;\u00a0 if you need to heap-allocate an object that&#8217;s 680 bytes, jemalloc will give you 1,024 bytes, and you just have to accept the 344 bytes of waste.\u00a0 <em>But if you have some flexibility in your request size, it&#8217;s a good idea to pick a size that&#8217;s a power-of-two, because that always gives you zero slop.<\/em>\u00a0 (Remember this, it&#8217;s important later on.)<\/p>\n<p>So I instrumented jemalloc to print out an entry for every heap allocation, showing the requested amount, the actual allocated amount, and the resulting slop.\u00a0 I then started up Firefox, opened Gmail, then shut down Firefox.\u00a0 Next, I ran the resulting log through a wonderful little <a href=\"http:\/\/en.wikipedia.org\/wiki\/Concordance_%28publishing%29\">concordance<\/a>-like script I have called &#8220;counts&#8221; which analyzes a file and tells you how many times each distinct line occurs.\u00a0 Here&#8217;s the top ten lines of the output:<\/p>\n<pre>909062 numbers:\r\n( 1 ) 205920 (22.7%, 22.7%): small:     24 -&gt;     32 (     8 )\r\n( 2 )  66162 ( 7.3%, 29.9%): small:     72 -&gt;     80 (     8 )\r\n( 3 )  61772 ( 6.8%, 36.7%): small:     40 -&gt;     48 (     8 )\r\n( 4 )  54386 ( 6.0%, 42.7%): small:   1056 -&gt;   2048 (   992 )\r\n( 5 )  48501 ( 5.3%, 48.0%): small:     18 -&gt;     32 (    14 )\r\n( 6 )  47668 ( 5.2%, 53.3%): small:     15 -&gt;     16 (     1 )\r\n( 7 )  24938 ( 2.7%, 56.0%): large:   4095 -&gt;   4096 (     1 )\r\n( 8 )  24278 ( 2.7%, 58.7%): small:     56 -&gt;     64 (     8 )\r\n( 9 )  13064 ( 1.4%, 60.1%): small:    104 -&gt;    112 (     8 )\r\n(10 )  12852 ( 1.4%, 61.6%): small:    136 -&gt;    144 (     8 )<\/pre>\n<p>There were 909,062 lines, which means there were 909,062 allocations.\u00a0 (I didn&#8217;t print information about frees.)\u00a0 The most common line was &#8220;<code>small: 24 -&gt; 32 ( 8 )<\/code>&#8221; which occurred 205,920 times, accounting for 22.7% of all the lines in the file.\u00a0 The &#8220;small&#8221; refers to jemalloc&#8217;s class size categories (see above), and every allocation request of 24 bytes was rounded up to 32 bytes, resulting in 8 bytes of slop.<\/p>\n<p>Looking through the list, I saw a case where 1,048,578 (2^20+2) bytes had been requested, and jemalloc had returned\u00a02,097,152 (2^21) bytes.\u00a0 That&#8217;s a huge (1MB) waste, and also smelled very fishy.\u00a0 And on further inspection there were a lot of smaller but still suspicious cases where a number slightly larger than a power-of-two was being rounded up to the next power-of-two:\u00a0 1,032 to 2048;\u00a0 1,056 to 2,048;\u00a0 2,087 to 4,096;\u00a0 4,135 to 8,192, etc.<\/p>\n<p>I investigated a number of these by adding some code to jemalloc to detect these suspicious request sizes, and then <a href=\"http:\/\/blog.mozilla.org\/nnethercote\/2011\/01\/11\/using-valgrind-to-get-stack-traces\/\">using Valgrind to print out a stack trace<\/a> every time one occurred.\u00a0 What I found was <em>four<\/em> distinct places in the codebase where the code in question had flexibility in the amount of memory it allocated, and so had <em>tried to ask for a power-of-two, but had botched the request and thus ended up asking for slightly more than a power-of-two!<\/em><\/p>\n<ul>\n<li>In js\/src\/vm\/String.cpp, when concatenating two strings, the capacity of the resultant string is rounded up to the next power-of-two (unless the string is really big, in which case the rounding is less aggressive).\u00a0 This is a reasonable thing to do, as it means that if a further concatenation occurs, there&#8217;s a good chance there&#8217;ll be space for the extra chars to be added in-line, avoiding an extra allocation and copy.\u00a0 But the code first did its power-of-two round-up, and then added <code>sizeof(jschar)<\/code> (which is two bytes) to the size, <em>to allow space for the terminating NULL char<\/em>.\u00a0 This code was responsible for the 1,048,578 byte request I mentioned earlier.\u00a0 Luke Wagner <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=675132\">fixed this bug<\/a> with a minimal change earlier this week.\u00a0 I was unable to easily measure the overall cost of this, but avoiding 1MB of slop with some frequency can only be a good thing.<\/li>\n<li>nsprpub\/lib\/ds\/plarena.c implements an <a href=\"http:\/\/en.wikipedia.org\/wiki\/Region-based_memory_management\">arena allocator<\/a>.\u00a0 Each arena pool is given a size for each arena when it&#8217;s created, and the size passed in is almost always a power-of-two such as 1,024 or 4,096.\u00a0 Which is good, except that the allocator then has to add a few more bytes onto the request <em>because each arena has a header that holds book-keeping information<\/em>.\u00a0 When I start Firefox and load Gmail on my Linux64 box, I see that approximately 3MB of space is wasted because of this bug.\u00a0 The fix is simple, it just requires that the arena payload be reduced by the size of the header;\u00a0 my <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=676457\">patch<\/a> is awaiting review.\u00a0 Sadly enough, this problem was <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=414088\">identified<\/a> 3.5 years ago but not fixed.<\/li>\n<li>js\/src\/jsarena.cpp is almost identical to nsprpub\/lib\/ds\/plarena.c and the story is the same:\u00a0 it has the same <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=675150\">problem<\/a>; it was first <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=408921\">identified<\/a> 3.5 years ago but not fixed; and my patch is awaiting review.\u00a0 This didn&#8217;t make much difference in practice because this problem had been separately identified for the particular JSArenaPool that allocates the most memory, and worked around using an <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=558971\">ill-documented hack<\/a>.<\/li>\n<li>db\/sqlite3\/src\/sqlite3.c is the best one of all.\u00a0 SQLite is very careful about measuring its memory usage and provides an API to access those measurements.\u00a0 But in order to measure its own memory usage accurately, it <em>adds 8 bytes to every allocation request in order to store the requested size at the start of the resulting heap block<\/em>.\u00a0 In doing so, it converts a lot of 1,024 byte requests into 1,032 byte requests.\u00a0 Ironically enough, the slop caused by storing the requested size rendered the stored size inaccurate<em>.<\/em>\u00a0 I determined that as a result, SQLite is using roughly 1.15&#8211;1.20x more memory in Firefox than it thinks it is.\u00a0 So, go look at the &#8220;storage\/sqlite&#8221; number in about:memory and mentally adjust it accordingly. \u00a0 (On my Mac laptop which has an old profile and 11 tabs open it&#8217;s currently 70MB, so there&#8217;s probably an additional 10&#8211;14MB of slop).\u00a0 SQLite is an upstream product, and the authors are <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=676189\">working on a fix<\/a>.<\/li>\n<\/ul>\n<p>Once all these fixes have landed, there will be two benefits.\u00a0 First, Firefox&#8217;s memory usage will go down, which is good for everyone.\u00a0 Second, the proportion of dark matter in about:memory will also drop, which makes our memory profiling more accurate and useful.<\/p>\n<p>After that, <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=676732\">converting existing reporters to use malloc_usable_size<\/a> will account for much of the legitimate slop, further reducing the amount of dark matter. And then I have <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=676724\">a master plan<\/a> for identifying whatever&#8217;s left.<\/p>\n<p><a href=\"http:\/\/www.flickr.com\/photos\/29233640@N07\/5131195458\/\"><img decoding=\"async\" loading=\"lazy\" title=\"clownshoes\" src=\"http:\/\/blog.mozilla.org\/nnethercote\/files\/2011\/08\/clownshoes.jpg\" alt=\"clownshoes\" width=\"640\" height=\"457\" \/><\/a><\/p>\n<p>So what&#8217;s up with this post&#8217;s title?\u00a0 Andreas Gal uses the term &#8220;clownshoes&#8221; to refer to code that is laughably bad.\u00a0 When I found and filed the first of these bugs, in a fit of whimsy I put &#8220;[clownshoes]&#8221; in its whiteboard.\u00a0 Only later did a realize how suitable this was: a clown&#8217;s shoes and these botched power-of-two-sized allocations both have lots of empty space in them, and thus make their owner less nimble than they could be.\u00a0 Perfect.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve been working a lot on about:memory lately.\u00a0 It&#8217;s a really useful tool, but one frustrating thing about it is that a decent chunk of our memory usage is not covered by any of the existing memory reporters, and so is falling into the &#8220;heap-unclassified&#8221; bucket: This bucket typically accounts for 30&#8211;45% of about:memory&#8217;s &#8220;explicit&#8221; [&hellip;]<\/p>\n","protected":false},"author":139,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4550,30,4556,4544],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/posts\/1108"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/users\/139"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/comments?post=1108"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/posts\/1108\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/media?parent=1108"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/categories?post=1108"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/tags?post=1108"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}