{"id":306,"date":"2010-06-30T17:22:43","date_gmt":"2010-06-30T06:22:43","guid":{"rendered":"http:\/\/blog.mozilla.org\/nnethercote\/?p=306"},"modified":"2010-06-30T17:22:43","modified_gmt":"2010-06-30T06:22:43","slug":"cg_diff-a-differential-profiling-tool","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/nnethercote\/2010\/06\/30\/cg_diff-a-differential-profiling-tool\/","title":{"rendered":"cg_diff: a differential profiling tool"},"content":{"rendered":"<p>I frequently use the SunSpider and V8 benchmark suites to compare the speed of different versions of TraceMonkey.\u00a0 The best <em>metric<\/em> for speed comparisons is always execution time.\u00a0 However, measuring execution time on modern machines is unreliable &#8212; you get different lots of variation between runs.\u00a0 This is a particular problem in this cae because the run-times of these benchmarks is very small &#8212; SunSpider takes less than 700 ms on my laptop, and V8 takes about 6.5 seconds.\u00a0 Run-to-run variations can be larger than the difference I&#8217;m trying to measure.\u00a0 This is annoying:\u00a0 the best speed metric cannot be measured exactly.<\/p>\n<p>So I frequently use Cachegrind to measure the number of executed instructions.\u00a0 This is a worse <em>metric<\/em> than execution time &#8212; the number of instructions doesn&#8217;t directly relate to the execution time, although it&#8217;s usually a good indicator &#8212; but it has the advantage that it can be measured exactly.\u00a0 Most of the SunSpider and V8 tests are deterministic, and if I measure them twice in a row I&#8217;ll get the same result.\u00a0 Cachegrind also gives instruction counts on a per-function and per-line basis, which is very useful.<\/p>\n<p>So I often run Cachegrind on two different versions of TraceMonkey:\u00a0 an unchanged copy of the current repository tip, and a copy of the current repository tip with a patch applied.\u00a0 I can then compare the results and get a very precise idea of how the patch affects performance.<\/p>\n<p>However, comparing the output of two Cachegrind runs manually is a pain.\u00a0 For example, here is part of Cachegrind&#8217;s output (lightly edited for clarity) for crypto-md5.js with an unchanged repository tip (as of a day or two ago):<\/p>\n<pre>--------------------------------------------------------------------------------\r\n\u00a0Ir\r\n--------------------------------------------------------------------------------\r\n48,923,280\u00a0 PROGRAM TOTALS\r\n--------------------------------------------------------------------------------\r\n\u00a0Ir\u00a0 file:function\r\n--------------------------------------------------------------------------------\r\n5,638,362\u00a0 ???:???\r\n4,746,990\u00a0 \/build\/buildd\/eglibc-2.10.1\/string\/..\/sysdeps\/i386\/i686\/strcmp.S:strcmp\r\n2,032,069\u00a0 jstracer.cpp:js::TraceRecorder::determineSlotType(int*)\r\n1,899,298\u00a0 jstracer.cpp:bool js::VisitFrameSlots&lt;js::CountSlotsVisitor&gt;(...)\r\n1,759,932\u00a0 jstracer.cpp:js::TraceRecorder::checkForGlobalObjectReallocation()\r\n1,232,425\u00a0 jsops.cpp:js_Interpret\r\n 885,168\u00a0 jstracer.cpp:bool js::VisitFrameSlots&lt;js::DetermineTypesVisitor&gt;(...)\r\n 871,197\u00a0 jstracer.cpp:js::TraceRecorder::set(int*, nanojit::LIns*, bool)\r\n\u00a0812,419\u00a0 \/build\/buildd\/eglibc-2.10.1\/iconv\/gconv_conf.c:insert_module\r\n\u00a0758,034\u00a0 jstracer.cpp:js::TraceRecorder::monitorRecording(JSOp)<\/pre>\n<p>At the top we have the total instruction count, and then we have the instruction counts for the top 10 functions.\u00a0 The ???:??? entry represents code generated by TraceMonkey&#8217;s JIT compiler, for which there is no debug information.\u00a0 &#8220;Ir&#8221; is short for &#8220;I-cache reads&#8221;, which is equivalent to &#8220;instructions executed&#8221;.<\/p>\n<p>Cachegrind tracks a lot more than just instruction counts, but I&#8217;m only showing them here to keep things simple.  It also gives per-line counts, but I&#8217;ve omitted them as well.<\/p>\n<p>And here is the corresponding output when a patch from <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=575529\">bug 575529<\/a> is applied:<\/p>\n<pre>--------------------------------------------------------------------------------\r\n        Ir\r\n--------------------------------------------------------------------------------\r\n42,332,998  PROGRAM TOTALS\r\n--------------------------------------------------------------------------------\r\n       Ir  file:function\r\n--------------------------------------------------------------------------------\r\n4,746,990  \/build\/buildd\/eglibc-2.10.1\/string\/..\/sysdeps\/i386\/i686\/strcmp.S:strcmp\r\n4,100,366  ???:???\r\n1,687,434  jstracer.cpp:bool js::VisitFrameSlots(js::CountSlotsVisitor&amp;, JSContext*, unsigned int, js::FrameRegsIter&amp;, JSStackFrame*)\r\n1,343,085  jstracer.cpp:js::TraceRecorder::checkForGlobalObjectReallocation()\r\n1,229,853  jsops.cpp:js_Interpret\r\n1,137,981  jstracer.cpp:js::TraceRecorder::determineSlotType(int*)\r\n  868,855  jstracer.cpp:js::TraceRecorder::set(int*, nanojit::LIns*, bool)\r\n  812,419  \/build\/buildd\/eglibc-2.10.1\/iconv\/gconv_conf.c:insert_module\r\n  755,753  jstracer.cpp:js::TraceRecorder::monitorRecording(JSOp)\r\n  575,200  jsscan.cpp:js::TokenStream::getTokenInternal()<\/pre>\n<p>It&#8217;s easy to see that the total instruction count has dropped from 48.9M to 42.3M, but seeing the changes at a per-function level is more difficult.  For a long time I would make this comparison manually by opening the two files side-by-side and reading carefully.\u00a0 Sometimes I&#8217;d also do some cutting-and-pasting to reorder entries.  The whole process was tedious, but the information revealed is so useful that I did it anyway.<\/p>\n<p>Then three months ago David Baron asked on Mozilla&#8217;s dev-platform mailing list if anybody knew of any good differential profiling tools.  This prompted me to realise that I wanted exactly such a tool for Cachegrind. Furthermore, as Cachegrind&#8217;s author, I was in a good place to understand exactly what was necessary \ud83d\ude42<\/p>\n<p>The end result is a new script, cg_diff, that can be used to compute the difference between two Cachegrind output files.  Here&#8217;s part of the difference between the above two versions:<\/p>\n<pre>--------------------------------------------------------------------------------\r\n        Ir\r\n--------------------------------------------------------------------------------\r\n-6,590,282  PROGRAM TOTALS\r\n--------------------------------------------------------------------------------\r\n        Ir  file:function\r\n--------------------------------------------------------------------------------\r\n-1,537,996  ???:???\r\n  -894,088  jstracer.cpp:js::TraceRecorder::determineSlotType(int*)\r\n  -416,847  jstracer.cpp:js::TraceRecorder::checkForGlobalObjectReallocation()\r\n  -405,271  jstracer.cpp:bool js::VisitFrameSlots(js::DetermineTypesVisitor&amp;, JSContext*, unsigned int, js::FrameRegsIter&amp;, JSStackFrame*)\r\n  -246,047  nanojit\/Containers.h:nanojit::StackFilter::read()\r\n  -238,121  nanojit\/Assembler.cpp:nanojit::Assembler::registerAlloc(nanojit::LIns*, int, int)\r\n   230,419  nanojit\/LIR.cpp:nanojit::interval::of(nanojit::LIns*)\r\n  -226,070  nanojit\/Assembler.cpp:nanojit::Assembler::asm_leave_trace(nanojit::LIns*)\r\n  -211,864  jstracer.cpp:bool js::VisitFrameSlots(js::CountSlotsVisitor&amp;, JSContext*, unsigned int, js::FrameRegsIter&amp;, JSStackFrame*)\r\n  -200,742  nanojit\/Assembler.cpp:nanojit::Assembler::findRegFor(nanojit::LIns*, int)<\/pre>\n<p>This makes it really easy to see what&#8217;s changed.  Negative values mean that the instruction count dropped, positive numbers mean that the instruction count increased.<\/p>\n<p>I&#8217;ve been using this script for a while now, and it&#8217;s really helped me analyse the performance effects of my patches.  Indeed, I have some scripts set up so that, with a single command, I can run all of SunSpider through two different versions of TraceMonkey and produce both normal profiles and difference profiles.  I can also get high-level instruction comparisons such as the one in <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=551885#c4\">this Bugzilla comment<\/a>.<\/p>\n<p>And now everybody else can use cg_diff too, because I just landed it on the Valgrind trunk.  If you want to try it, follow <a href=\"http:\/\/www.valgrind.org\/downloads\/repository.html\">these instructions<\/a> to setup a copy of the trunk.\u00a0 And note that if you want to compare two versions of a program that sit in different directories (as opposed to profiling a program, modifying it, then reprofiling it) you&#8217;ll need to use cg_diff&#8217;s &#8211;mod-filename option to get useful results.\u00a0 Feel free to ask me questions (via email, IRC or in the comments below) if you have troubles.<\/p>\n<p>Happy differencing!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I frequently use the SunSpider and V8 benchmark suites to compare the speed of different versions of TraceMonkey.\u00a0 The best metric for speed comparisons is always execution time.\u00a0 However, measuring execution time on modern machines is unreliable &#8212; you get different lots of variation between runs.\u00a0 This is a particular problem in this cae because [&hellip;]<\/p>\n","protected":false},"author":139,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[311,467,484],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/posts\/306"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/users\/139"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/comments?post=306"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/posts\/306\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/media?parent=306"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/categories?post=306"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/tags?post=306"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}