{"id":4,"date":"2010-12-05T23:40:19","date_gmt":"2010-12-05T22:40:19","guid":{"rendered":"http:\/\/blog.mozilla.org\/jseward\/?p=4"},"modified":"2010-12-15T06:40:36","modified_gmt":"2010-12-15T05:40:36","slug":"fun-n-games-with-dhat","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/jseward\/2010\/12\/05\/fun-n-games-with-dhat\/","title":{"rendered":"Fun &#8216;n&#8217; games with DHAT"},"content":{"rendered":"<p>Back in 2007 Nick Nethercote morphed his <a href=\"http:\/\/www.valgrind.org\/docs\/manual\/ms-manual.html\">Massif<\/a> heap profiler into<br \/>\nits present form.\u00a0 Massif intercepts malloc\/free et al, takes<br \/>\nperiodic snapshots of the heap and shows results using stack trees.<br \/>\nIt answers the questions &#8220;what&#8217;s in the heap?&#8221; and &#8220;who put it there?&#8221;<\/p>\n<p>Since then, I&#8217;d been mulling over a heap profiler that could also tell<br \/>\nme something about block lifetimes and usages.\u00a0 This year I finally<br \/>\ngot around to hacking something up &#8212; <a href=\"http:\/\/www.valgrind.org\/docs\/manual\/dh-manual.html\">DHAT<\/a>.\u00a0 Like Massif, DHAT<br \/>\nintercepts malloc\/free, but it also inspects every (data) memory<br \/>\nreference, to see which block, if any, it is to.\u00a0 By doing that we can<br \/>\nidentify hot blocks and under-used ones.\u00a0 For allocation points which<br \/>\nalways allocate blocks of the same size, DHAT keeps count of how often<br \/>\neach block offset is accessed, thereby giving information on hot and<br \/>\ncold object fields, and showing up probable aligment holes.<\/p>\n<p>DHAT also records block lifetime information.\u00a0 Time is measured in<br \/>\ninstructions executed, as does Massif.\u00a0 DHAT notes the age at death of<br \/>\neach block and shows the average value for each allocation point.<br \/>\nDoing that makes it easy to find allocation points which chew through<br \/>\nlots of heap, but don&#8217;t hold on to it for long, or, conversely, points<br \/>\nthat allocate heap and hold on to it for the entire process lifetime.<\/p>\n<p>DHAT tracks two kinds of entities: blocks and allocation points (APs).\u00a0 A block<br \/>\nis just a heap block.\u00a0 An AP is a stack trace that has allocated one<br \/>\nor more blocks.\u00a0 When a block is freed, its statistics are merged back<br \/>\ninto its AP.\u00a0 At the end of the run, DHAT shows statistics for the top<br \/>\nN APs, as sorted by one of three user-selectable metrics.\u00a0 Most of the<br \/>\nart of using DHAT is in interpreting the mass of numbers it produces.<\/p>\n<p>DHAT perhaps ought to be merged with Massif at some point.\u00a0 For now,<br \/>\nthe emphasis was to get something up and running quickly, to see if it<br \/>\ngenerates any useful information.<\/p>\n<p>So does it show up anything interesting in Firefox?<\/p>\n<p>Here&#8217;s a no-brainer, <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=609905\">bug 609905<\/a>:<\/p>\n<p><code>-------------------- 32 of 5000 --------------------<br \/>\nmax-live:\u00a0\u00a0\u00a0 524,328 in 1 blocks<br \/>\ntot-alloc:\u00a0\u00a0 524,328 in 1 blocks (avg size 524328.00)<br \/>\ndeaths:\u00a0\u00a0\u00a0\u00a0\u00a0 1, at avg age 15,015,989,851 (99.60% of prog lifetime)<br \/>\nacc-ratios:\u00a0 0.00 rd, 0.00 wr\u00a0 (192 b-read, 825 b-written)<br \/>\nat 0x4C27ECA: operator new(unsigned long) (vg_replace_malloc.c:261)<br \/>\nby 0x661D01D: js::InitJIT(js::TraceMonitor*) (jstracer.cpp:7807)<br \/>\nby 0x651FBEB: JSThreadData::init() (jscntxt.cpp:497)<br \/>\nby 0x652049D: js_CurrentThread(JSRuntime*) (jscntxt.cpp:588)<br \/>\nby 0x652085C: js_InitContextThread(JSContext*) (jscntxt.cpp:659)<br \/>\n<\/code><\/p>\n<p>This is a half-megabyte block that&#8217;s allocated, held onto for the<br \/>\nentire browser run, but never accessed.\u00a0 The key is to look at the<br \/>\nacc-ratios field, which shows the average number of times each byte<br \/>\nin the block got read and written &#8212; here, 0.00 for both.\u00a0 Turns out<br \/>\nto be a leftover allocator from the regexp engine that predated YARR.<\/p>\n<p>Here&#8217;s another, <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=611400\">bug 611400<\/a>, that&#8217;s more<br \/>\ninteresting.\u00a0 When profiling the browser I noticed quite a lot heap<br \/>\noccupied by allocations from Jaegermonkey&#8217;s method<br \/>\njs::mjit::Compiler::finishThisUp.\u00a0 I wondered what it was but didn&#8217;t<br \/>\nthink much more about it until I profiled the JS engine running<br \/>\nKraken, and fell across this:<\/p>\n<p><code>-------------------- 4 of 100 --------------------<br \/>\nmax-live:\u00a0\u00a0\u00a0 12,197,056 in 1 blocks<br \/>\ntot-alloc:\u00a0\u00a0 12,197,056 in 1 blocks (avg size 12197056.00)<br \/>\ndeaths:\u00a0\u00a0\u00a0\u00a0\u00a0 1, at avg age 1,432,667,675 (42.98% of prog lifetime)<br \/>\nacc-ratios:\u00a0 0.00 rd, 0.00 wr\u00a0 (59 b-read, 129 b-written)<br \/>\nat 0x47B44FF: calloc (vg_replace_malloc.c:467)<br \/>\nby 0x81FF24D: js::mjit::Compiler::finishThisUp(js::mjit::JITSc<br \/>\nby 0x8215E47: js::mjit::Compiler::performCompilation(js::mjit:<br \/>\nby 0xC2F402F: ???<br \/>\n<\/code><\/p>\n<p>Hmm, a 12.2 MB allocation which is never used!\u00a0 That&#8217;s around 12% of<br \/>\nthe maximum live heap in this run.\u00a0 Turns out the method JIT creates<br \/>\ntables mapping JS bytecodes to the corresponding native code entry<br \/>\npoints.\u00a0 This Kraken test (imaging-darkroom) includes huge tables of<br \/>\nconstants, for which there are no entry points, so the 12MB allocation<br \/>\nis a completely empty table.<\/p>\n<p>This is an extreme case of a more general problem, though.<br \/>\nInstrumenting the method jit shows that about 98% of table entries are<br \/>\nunused when running more &#8220;normal&#8221; Javascript, when<br \/>\nsurfing at fairly complex-looking web pages, with 5 open tabs.<\/p>\n<p>I changed the table representation to only store useful entries.\u00a0 That<br \/>\nsaves the 12.2MB in Kraken.\u00a0 For a browser with 5 tabs, it saves<br \/>\nsomewhere in the region of 1%-2% of the entire C++ heap, which is a<br \/>\nnice outcome.<\/p>\n<p>And there&#8217;s more:<\/p>\n<ul>\n<li>we were allocating a 3MB file buffer when writing Zip files (<a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=610040\">610040<\/a>) (although JOrendorff beat me to it by a few minutes)<\/li>\n<\/ul>\n<ul>\n<li>Inefficient layout of CSS style rules (<a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=596140\">596140<\/a>)<\/li>\n<\/ul>\n<p>The above results were obtained by looking through allocation points<br \/>\nsorted by decreasing maximum-live-volume.\u00a0 Sorting the output on other<br \/>\nmetrics gives other perspectives:<\/p>\n<ul>\n<li>sorting by decreasing max-blocks-live shows places where we allocate large numbers of small blocks (CSS handling, the HTML5 parser)<\/li>\n<\/ul>\n<ul>\n<li>sorting by decreasing total-bytes-allocated tends to show up places that turn over a lot of heap, even if it isn&#8217;t held onto for long (on Linux, the X client libraries seem particularly bad)<\/li>\n<\/ul>\n<p>DHAT is available in Valgrind-3.6.0 (&#8211;tool=exp-dhat).\u00a0 It is stable<br \/>\nbut can sometimes produce misleading numbers, and is unreasonably slow<br \/>\nfor such a simple tool.\u00a0 I have a fixed up and 2-3 x faster version,<br \/>\nwhich I&#8217;ll ship in 3.6.1.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Back in 2007 Nick Nethercote morphed his Massif heap profiler into its present form.\u00a0 Massif intercepts malloc\/free et al, takes periodic snapshots of the heap and shows results using stack trees. It answers the questions &#8220;what&#8217;s in the heap?&#8221; and &hellip; <a class=\"go\" href=\"https:\/\/blog.mozilla.org\/jseward\/2010\/12\/05\/fun-n-games-with-dhat\/\">Continue reading<\/a><\/p>\n","protected":false},"author":240,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/jseward\/wp-json\/wp\/v2\/posts\/4"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/jseward\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/jseward\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/jseward\/wp-json\/wp\/v2\/users\/240"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/jseward\/wp-json\/wp\/v2\/comments?post=4"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/jseward\/wp-json\/wp\/v2\/posts\/4\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/jseward\/wp-json\/wp\/v2\/media?parent=4"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/jseward\/wp-json\/wp\/v2\/categories?post=4"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/jseward\/wp-json\/wp\/v2\/tags?post=4"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}