{"id":74,"date":"2011-01-27T15:01:14","date_gmt":"2011-01-27T14:01:14","guid":{"rendered":"http:\/\/blog.mozilla.org\/jseward\/?p=74"},"modified":"2011-02-21T13:00:03","modified_gmt":"2011-02-21T12:00:03","slug":"profiling-the-browsers-virtual-memory-behaviour","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/jseward\/2011\/01\/27\/profiling-the-browsers-virtual-memory-behaviour\/","title":{"rendered":"Profiling the browser&#8217;s virtual memory behaviour"},"content":{"rendered":"<p>We&#8217;ve been chipping away at memory use of Firefox 4 for a couple of<br \/>\nmonths now, with good results.\u00a0 Recently, though, I&#8217;ve been wondering<br \/>\nif we&#8217;re measuring the right things.\u00a0 It seems to me there&#8217;s two<br \/>\nimportant things to measure:<\/p>\n<ul>\n<li>Maximum virtual address space use for the process.\u00a0 Why is this<br \/>\nimportant?\u00a0 Because if the process runs out of address space, it&#8217;s<br \/>\nin serious trouble.\u00a0 Ditto, perhaps worse, if the process uses up<br \/>\nall the machine&#8217;s swap.<\/li>\n<\/ul>\n<ul>\n<li>But the normal case is different: we don&#8217;t run out of address space<br \/>\nor swap.\u00a0 In this case I don&#8217;t care how much memory the browser<br \/>\nuses.\u00a0 Really.\u00a0 When we talk about memory use in the non-OOM<br \/>\nsituation, we&#8217;re using that measure as a proxy for responsiveness.<br \/>\nExcessive memory use isn&#8217;t intrinsically bad.\u00a0 Rather, it&#8217;s the side<br \/>\neffect that&#8217;s the problem: it causes paging, both for the browser<br \/>\nand for everything else running on the machine, slowing<br \/>\neverything down.<\/li>\n<\/ul>\n<p>Trying to gauge responsiveness by looking at peak RSS figures strikes<br \/>\nme as a losing prospect.\u00a0 The RSS values are set by some more-or-less<br \/>\nopaque kernel page discard algorithm, and depend on the behaviour of<br \/>\nall processes in the system, not just 
Firefox.\u00a0 Worse, it&#8217;s uninformative:<br \/>\nwe get no information about which parts of our code base are causing<br \/>\npaging.<\/p>\n<p>So I hacked up a VM profiler.\u00a0 This tells me the page fault behaviour<br \/>\nwhen running Firefox using a given amount of real memory.\u00a0 It isn&#8217;t as<br \/>\nbig a task as it sounds, since we already have 99.9% of the required<br \/>\ncode in place: Valgrind&#8217;s Cachegrind tool.\u00a0 It just required replacing<br \/>\nthe cache simulator with a virtual-to-physical address map simulator.<\/p>\n<p>The profiler does a pretty much textbook pseudo-LRU clock algorithm<br \/>\nsimulation.\u00a0 It differentiates between page faults caused by data and<br \/>\ninstruction accesses, since these require different fixes &#8212; make the<br \/>\ndata smaller vs make the code smaller.\u00a0 It also differentiates between<br \/>\nclean (page unmodified) and dirty (page modified, requires writeback)<br \/>\nfaults.<\/p>\n<p>Here are some preliminary results.\u00a0 Bear in mind the profiler has only<br \/>\njust started to work, so the potential for bogosity is still large.<\/p>\n<p>First question is: we know that 4.0 uses more memory than 3.6.x.\u00a0 But<br \/>\ndoes that result in more paging?\u00a0 I profiled both, loading 5 cad-comic<br \/>\ntabs (http:\/\/www.cad-comic.com\/cad\/random) and idling for a while, for<br \/>\nabout 8 billion instructions.\u00a0 Results, simulating 100MB of real memory:<\/p>\n<p>3.6.x, release build, using jemalloc:<\/p>\n<p>VM I accesses: 8,250,840,547\u00a0 (3,186 clean faults + 350 dirty faults)<br \/>\nVM D accesses: 3,089,412,941\u00a0 (5,239 clean faults + 552 dirty faults)<\/p>\n<p>M-C, release build, using jemalloc:<\/p>\n<p>VM I accesses: 8,473,182,041\u00a0 ( 8,140 clean faults +\u00a0 4,979 dirty faults)<br \/>\nVM D accesses: 3,372,806,043\u00a0 (22,720 clean faults + 14,335 dirty faults)<\/p>\n<p>Apparently it does page more.\u00a0 Most of the paging is due to data<br 
\/>\nrather than instruction accesses.\u00a0 Requires further investigation.<\/p>\n<p>Second question is: where does that paging come from?\u00a0 Are we missing<br \/>\nany easy wins?\u00a0 From a somewhat longer run with a bigger workload, I got<br \/>\nthis (w\/ apologies for terrible formatting):<br \/>\n<code><br \/>\nDa (# data accesses)<br \/>\n.\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Dfc (# clean data faults)<br \/>\n.\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 function<br \/>\n------------------------------------------<br \/>\n18,921,574,436\u00a0\u00a0 382,023\u00a0\u00a0 PROGRAM TOTALS<\/code><\/p>\n<p><code> <\/code><\/p>\n<p><code>.\u00a0\u00a0 19,339,625\u00a0 \u00a0 60,583\u00a0\u00a0 js::Shape::trace<br \/>\n.\u00a0\u00a0\u00a0 2,228,649\u00a0\u00a0\u00a0 51,635\u00a0\u00a0 JSCompartment::purge<br \/>\n.\u00a0\u00a0 32,583,809\u00a0\u00a0\u00a0 22,223\u00a0\u00a0 js_TraceScript<br \/>\n.\u00a0\u00a0 16,306,348\u00a0\u00a0\u00a0 18,404\u00a0\u00a0 js::mjit::JITScript::purgePICs<br \/>\n.\u00a0\u00a0 18,160,249\u00a0\u00a0\u00a0 12,847\u00a0\u00a0 js::mjit::JITScript::purgePICs<br \/>\n.\u00a0\u00a0 52,155,631\u00a0\u00a0\u00a0 11,727\u00a0\u00a0 memset<br \/>\n.\u00a0\u00a0 27,229,391\u00a0\u00a0\u00a0 10,813\u00a0\u00a0 js::PropertyTree::sweepShapes<br \/>\n.\u00a0 120,482,308\u00a0\u00a0\u00a0 10,256\u00a0\u00a0 js::gc::MarkChildren<br \/>\n.\u00a0 138,049,859\u00a0\u00a0 \u00a0 9,134\u00a0\u00a0 memcpy<br \/>\n.\u00a0\u00a0\u00a0 2,228,649\u00a0\u00a0\u00a0\u00a0 8,779\u00a0\u00a0 JSCompartment::sweep<br \/>\n.\u00a0 179,083,731\u00a0\u00a0\u00a0\u00a0 8,057\u00a0\u00a0 js_TraceObject<br \/>\n.\u00a0\u00a0\u00a0 6,269,454\u00a0\u00a0\u00a0\u00a0  5,949\u00a0\u00a0 js::mjit::JITScript::sweepCallICs<br \/>\n<\/code><\/p>\n<p>About 16% of the clean data faults (60,583 of 382,023) come from js::Shape::trace.<\/p>\n<p>And quite a 
few come from js::mjit::JITScript::purgePICs (two<br \/>\nversions) and js::mjit::JITScript::sweepCallICs.\u00a0 According to Dave<br \/>\nAnderson and Chris Leary, there might be some opportunity to poke<br \/>\nthe code pages in a less jumping-around-y fashion.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We&#8217;ve been chipping away at memory use of Firefox 4 for a couple of months now, with good results.\u00a0 Recently, though, I&#8217;ve been wondering if we&#8217;re measuring the right things.\u00a0 It seems to me there&#8217;s two important things to measure: &hellip; <a class=\"go\" href=\"https:\/\/blog.mozilla.org\/jseward\/2011\/01\/27\/profiling-the-browsers-virtual-memory-behaviour\/\">Continue reading<\/a><\/p>\n","protected":false},"author":240,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/jseward\/wp-json\/wp\/v2\/posts\/74"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/jseward\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/jseward\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/jseward\/wp-json\/wp\/v2\/users\/240"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/jseward\/wp-json\/wp\/v2\/comments?post=74"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/jseward\/wp-json\/wp\/v2\/posts\/74\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/jseward\/wp-json\/wp\/v2\/media?parent=74"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/jseward\/wp-json\/wp\/v2\/categories?post=74"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/jseward\/wp-json\/wp\/v2\/tags?post=74"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}