Nicholas Nethercote – Page 25 – Notes on Rust, Firefox, MemShrink, JavaScript, and more

Memory profiling Firefox with Massif, part 2

Post author By Nicholas Nethercote
Post date January 7, 2011

To follow up from this post: we’ve made some good progress on reducing JaegerMonkey’s memory consumption in Firefox 4, though there’s still a way to go. Julian Seward will blog about this shortly. In the meantime, I thought I’d share a particularly useful Massif invocation that Rob Sayre inspired me to concoct:

  valgrind \
  --smc-check=all --trace-children=yes \
  --tool=massif \
  --pages-as-heap=yes --detailed-freq=1000000 \
  --threshold=0.5 \
  --alloc-fn=mmap \
  --alloc-fn=syscall \
  --alloc-fn=pages_map \
  --alloc-fn=chunk_alloc \
  --alloc-fn=arena_run_alloc \
  --alloc-fn=arena_bin_malloc_hard \
  --alloc-fn=malloc \
  --alloc-fn=realloc \
  --alloc-fn='operator new(unsigned long)' \
  --alloc-fn=huge_malloc \
  --alloc-fn=posix_memalign \
  --alloc-fn=moz_xmalloc \
  --alloc-fn=JS_ArenaAllocate \
  --alloc-fn=PL_ArenaAllocate \
  --alloc-fn=NS_Alloc_P \
  --alloc-fn=NS_Realloc_P \
  --alloc-fn='XPConnectGCChunkAllocator::doAlloc()' \
  --alloc-fn='PickChunk(JSRuntime*)' \
  --alloc-fn='RefillFinalizableFreeList(JSContext*, unsigned int)' \
  --alloc-fn=sqlite3MemMalloc \
  --alloc-fn=mallocWithAlarm \
  --alloc-fn=sqlite3Malloc \
  <insert-firefox-command-here>

Good grief! What a mess. Don’t blame Massif for this, though; it’s because Firefox has so many custom memory allocators.
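A list this long is easier to maintain as data than as a hand-edited shell command. Here is a minimal sketch, assuming Python 3.8+ is available, that builds the same kind of invocation from a list of allocator names. The helper name `massif_command` is hypothetical, the list is only a subset of the one above, and the Firefox command is the one from the earlier Massif post.

```python
import shlex

# A subset of the allocation functions named in the invocation above.
ALLOC_FNS = [
    "mmap",
    "malloc",
    "operator new(unsigned long)",
    "moz_xmalloc",
    "PickChunk(JSRuntime*)",
]

def massif_command(firefox_cmd, alloc_fns=ALLOC_FNS, threshold=0.5):
    """Build the valgrind argument list (hypothetical helper, not part of Valgrind)."""
    argv = [
        "valgrind",
        "--smc-check=all", "--trace-children=yes",
        "--tool=massif",
        "--pages-as-heap=yes", "--detailed-freq=1000000",
        f"--threshold={threshold}",
    ]
    # One --alloc-fn flag per allocator name.
    argv += [f"--alloc-fn={fn}" for fn in alloc_fns]
    return argv + list(firefox_cmd)

cmd = massif_command(["optg64/dist/bin/firefox", "-P", "cad20", "-no-remote"])
# shlex.join quotes names such as 'operator new(unsigned long)' for the shell.
print(shlex.join(cmd))
```

Passing the list straight to `subprocess.run(cmd)` would avoid shell quoting altogether; printing it is just a way to inspect the result.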

With that invocation, the output of ms_print becomes something that is comprehensible to people other than Massif’s author 🙂 Here’s an extraction of the output which gives a high-level view of Firefox’s memory consumption on 64-bit Linux after loading 20 tabs, each with a random comic from http://www.cad-comic.com/cad/, which is a JavaScript-heavy site:

31.04% (366,878,720B) _dl_map_object_from_fd (dl-load.c:1195)
15.73% (185,998,724B) in 3693 places, all below massif's threshold (00.00%)
15.62% (184,639,488B) pthread_create@@GLIBC_2.2.5 (allocatestack.c:483)
05.68% (67,112,960B) pa_shm_create_rw (in /usr/lib/libpulsecommon-0.9.21.so)
04.35% (51,372,032B) JSC::ExecutablePool::systemAlloc(unsigned long) (ExecutableAllocatorPosix.cpp:43)
03.30% (38,993,920B) js::InitJIT(js::TraceMonitor*) (jstracer.cpp:7644)
03.11% (36,741,120B) js::InitJIT(js::TraceMonitor*) (jstracer.cpp:7643)
02.87% (33,935,360B) js::PropertyTree::newShape(JSContext*, bool) (jspropertytree.cpp:97)
02.84% (33,554,432B) js_NewFunction(JSContext*, JSObject*, int (*)(JSContext*, unsigned int, js::Value*), unsigned int, unsigned int, JSObject*, JSAtom*) (jsgcinlines.h:127)
02.79% (32,923,648B) js::InitJIT(js::TraceMonitor*) (jstracer.cpp:7642)
01.99% (23,555,684B) js::mjit::Compiler::finishThisUp(js::mjit::JITScript**) (jsutil.h:213)
01.69% (19,934,784B) JSScript::NewScript(JSContext*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned short, unsigned short) (jsutil.h:209)
01.53% (18,067,456B) pcache1Alloc (sqlite3.c:33368)
01.48% (17,457,388B) nsStringBuffer::Alloc(unsigned long) (nsSubstring.cpp:206)
01.31% (15,478,784B) g_mapped_file_new (in /lib/libglib-2.0.so.0.2400.1)
00.89% (10,486,784B) JS_NewObject (jsgcinlines.h:127)
00.71% (8,388,608B) js::StackSpace::init() (jscntxt.cpp:164)
00.68% (8,093,696B) GCGraphBuilder::NoteScriptChild(unsigned int, void*) (mozalloc.h:229)
00.68% (8,024,064B) NewOrRecycledNode(JSTreeContext*) (jsparse.cpp:495)
00.67% (7,974,936B) js::Vector<unsigned short, 32ul, js::ContextAllocPolicy>::growStorageBy(unsigned long) (jsutil.h:217)
00.53% (6,291,456B) js_CloneRegExpObject(JSContext*, JSObject*, JSObject*) (jsgcinlines.h:127)
00.52% (6,190,836B) nsTArray_base<nsTArrayDefaultAllocator>::EnsureCapacity(unsigned int, unsigned int) (nsTArray.h:68)

The total is 1,182,094,880 bytes.

31.04% is from _dl_map_object_from_fd. This corresponds to code and data segments, mostly from libraries.
15.73% is from allocation points small enough that they fell below the threshold (0.5%) that I used for this run.
15.62% is from pthread_create, i.e. thread stacks. Hopefully most of this space also won’t be mapped in.
5.68% is from pa_shm_create_rw. Bug 617852 is open about this. It won’t be fixed until after Firefox 4.0, but that’s not so bad because /proc/pid/smaps tells me that hardly any of it is mapped into physical memory.
That leaves 31.93% of big, heap-ish allocations. It’s pretty obvious that for this workload, the JS engine is being greedy, accounting for 26.42% of that 31.93%. One piece of good news is that the three js::InitJIT() entries, which together account for 9.2%, will be greatly improved by bug 623428; I’m hoping to reduce them by a factor of 10 or more.
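The arithmetic behind these figures can be checked with a few lines of Python. The percentages and byte counts are copied from the ms_print excerpt above. Which entries count as "JS engine" is an assumption based on the function names; that selection happens to reproduce the 26.42% figure.

```python
# Percentages copied from the ms_print excerpt above.
non_heap = {
    "_dl_map_object_from_fd": 31.04,
    "below threshold": 15.73,
    "pthread_create": 15.62,
    "pa_shm_create_rw": 5.68,
}
heapish = round(100 - sum(non_heap.values()), 2)

# The three js::InitJIT() entries.
init_jit = round(3.30 + 3.11 + 2.79, 2)

# Entries assumed to belong to the JS engine (judged by function name).
js_entries = [4.35, 3.30, 3.11, 2.87, 2.84, 2.79, 1.99, 1.69,
              0.89, 0.71, 0.68, 0.67, 0.53]
js_total = round(sum(js_entries), 2)

# The largest entry's share, recomputed from the byte counts.
total_bytes = 1_182_094_880
largest_pct = round(100 * 366_878_720 / total_bytes, 2)

print(heapish, init_jit, js_total, largest_pct)
```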

If anyone wants Massif’s full output, I’ll be happy to give it to them. The full output contains full stack traces, which can be useful.

Some conclusions.

I’m still worried about our memory consumption, and I intend to keep pushing on it, both before Firefox 4.0 is released and afterwards.
Massif takes a bit of getting used to, particularly when you are profiling a huge, messy program like Firefox. But it’s the only space profiler I know of that gives information that is detailed enough to be really useful in reducing memory consumption. Without it, we wouldn’t have made much progress on reducing Firefox 4.0’s space consumption. I’d love for other people to run it; it works on Linux and Mac (not Windows, unfortunately). I’m happy to help anyone who wants to try it via IRC or email. For all the improvements done lately, I’ve only looked at a single workload on a single machine! There’s much more analysis to be done.
If anyone knows of other decent memory profilers that can handle programs as complex as Firefox, I’d love to hear about them. In particular, note that if you only measure the heap (malloc et al) you’re only getting part of the story; this is again because we have multiple allocators which bypass malloc and use mmap/VirtualAlloc directly.
I wonder if we need better memory benchmarks. I’d like to have some that are as easy to run as, say, SunSpider. Better telemetry would also be great.

Uncategorized

What is this T.3670 function produced by GCC?

Post author By Nicholas Nethercote
Post date December 17, 2010

When I do differential profiling with cg_diff I see a lot of entries like this:

-333,110  js/src/optg32/../assembler/assembler/AssemblerBuffer.h:T.3670
 333,110  js/src/optg32/../assembler/assembler/AssemblerBuffer.h:T.3703

T.3670 and T.3703 are clearly the same function, one which must be auto-generated by GCC but given different randomized suffixes in different builds. These entries are uninteresting and I plan to add an option to cg_diff that allows the user to munge function names with a search-and-replace expression so I can get rid of them.
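To make the idea concrete, here is a minimal sketch of that kind of name munging. It is not cg_diff's actual interface: the regular expression, the `T.N` replacement, and the function names `munge` and `diff_counts` are all illustrative assumptions.

```python
import re

# GCC-generated names such as T.3670: "T." followed by a build-specific number.
_SUFFIX = re.compile(r"\bT\.\d+$")

def munge(name):
    """Replace the build-specific number with a fixed placeholder."""
    return _SUFFIX.sub("T.N", name)

def diff_counts(old, new):
    """Subtract old counts from new counts after munging the names.

    old and new map "file:function" strings to event counts.
    Entries whose difference is zero are dropped.
    """
    totals = {}
    for counts, sign in ((old, -1), (new, 1)):
        for name, n in counts.items():
            key = munge(name)
            totals[key] = totals.get(key, 0) + sign * n
    return {k: v for k, v in totals.items() if v != 0}

prefix = "js/src/optg32/../assembler/assembler/AssemblerBuffer.h:"
old = {prefix + "T.3670": 333110}
new = {prefix + "T.3703": 333110}
print(diff_counts(old, new))
```

With the munging applied, the two entries collapse into one name and cancel out, so they no longer clutter the diff.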

But I’d like to know what they are first, so I can talk about them in the cg_diff documentation. Does anybody know? I tried googling, but it’s a very hard thing to find out with a search engine. (If someone can find something about it with a search engine I’d love to know what search term you used.)

Uncategorized

Memory profiling Firefox with Massif

Post author By Nicholas Nethercote
Post date December 9, 2010

I’m worried about Firefox 4.0’s memory consumption. Bug 598466 indicates that it’s significantly higher than Firefox 3.6’s memory consumption. So I did some profiling with Massif, a memory profiler built with Valgrind.

My test run involved a Firefox profile in which 20 tabs were open to random comics from http://www.cad-comic.com/cad/. This is a site that uses lots of JavaScript code. I ran a 64-bit version of Firefox on a Ubuntu 10.10 machine.

Here’s the command line I used:

  valgrind --smc-check=all --trace-children=yes --tool=massif --pages-as-heap=yes \
    --detailed-freq=1000000 optg64/dist/bin/firefox -P cad20 -no-remote

Here’s what that means:

--smc-check=all tells Valgrind that the program may use self-modifying code (or, in Firefox’s case, overwrite dynamically generated code).
--trace-children=yes tells Valgrind to trace into child processes exec’d by the program. This is necessary because optg64/dist/bin/firefox is a wrapper script.
--tool=massif tells Valgrind to run Massif.
--pages-as-heap=yes tells Massif to profile allocations of all memory at the page level, rather than just profiling the heap (ie. memory allocated via malloc/new). This is important because the heap is less than half of Firefox’s memory consumption.
--detailed-freq=1000000 tells Massif to do detailed snapshots (which are more informative but more costly) only every 1,000,000th snapshot, which is less often than the default of every 10th snapshot. This makes it run a bit faster. This is fine because Massif always takes a detailed snapshot at the peak memory consumption point, and that’s the one I’m interested in.
optg64/ is the name of my build directory.
-P cad20 tells Firefox to use a particular profile that I set up appropriately.
-no-remote tells Firefox to start a new instance; this is necessary because I had a Firefox 3.6 process already running.

Massif produced a number of files, one per invoked process. They have names like massif.out.22722, where the number is the process ID. I worked out which one was the main Firefox executable; this is not hard because the output file has the invoked command on its second line. I then viewed it in two ways. The first was with Massif’s ms_print script, using this command:

    ms_print massif.out.22722

This produces a text representation of the information that Massif collects.
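Picking out the right output file can be automated. The sketch below relies on the fact stated above, that the invoked command is on the second line of each file. The `cmd:` prefix handling reflects my understanding of the file format, and the default `firefox-bin` search string is an assumption about the name of the main executable; adjust both as needed.

```python
import glob

def massif_command_line(path):
    """Return the command recorded on the second line of a Massif output file."""
    with open(path) as f:
        f.readline()                      # first line: description of the run
        second = f.readline().strip()     # second line: the invoked command
    if second.startswith("cmd:"):
        second = second[len("cmd:"):].strip()
    return second

def find_main_outputs(pattern="massif.out.*", needle="firefox-bin"):
    """List output files whose recorded command contains `needle`."""
    return [p for p in sorted(glob.glob(pattern))
            if needle in massif_command_line(p)]

for path in find_main_outputs():
    print(path, "->", massif_command_line(path))
```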

The second was with the massif-visualizer program, which is not part of the Valgrind distribution, but is available here. The information it shows is much the same as that shown by ms_print, but it’s much prettier. Below is a screenshot that shows the basic progression of the memory consumption.

At the peak memory usage point, a total of 1,297,612,808 bytes were mapped.

I went through the detailed snapshot taken at the point of peak memory consumption. Massif takes the stack trace of every memory allocation in the program, and collapses them into a tree structure in which common stack trace prefixes are merged. See the Massif user manual for more details of this tree structure; what’s relevant for this blog post is that I (manually) picked out various places in the code that cause significant amounts of memory to be mapped. These places together cover 68.85% (893,463,472 bytes) of the total mapped memory; the other 31.15% was taken up by smaller allocations that are less worth singling out. Each place contains a partial stack trace as identification. I’ve listed them from largest to smallest.

28.40% (368,484,352 bytes) here:

  mmap (syscall-template.S:82)
  _dl_map_object_from_fd (dl-load.c:1195)
  _dl_map_object (dl-load.c:2234)

This is for the loading of shared objects, both data and code.

12.94% (167,854,080 bytes) here:

  mmap (syscall-template.S:82)
  pthread_create@@GLIBC_2.2.5 (allocatestack.c:483)
  _PR_CreateThread (ptthread.c:424)

This is for thread stacks. Robert O’Callahan and Chris Jones tell me that Linux thread stacks are quite large (8MB?) but although that space is mapped, most of it won’t be committed (i.e. actually used).

6.54% (84,828,160 bytes) here:

  mmap (syscall-template.S:82)
  JSC::ExecutablePool::systemAlloc(unsigned long) (ExecutableAllocatorPosix.cpp:43)
  JSC::ExecutablePool::create(unsigned long) (ExecutableAllocator.h:374)
  js::mjit::Compiler::finishThisUp(js::mjit::JITScript**) (ExecutableAllocator.h:)

This is for native code generated by JaegerMonkey. Bug 615199 has more about this.

5.17% (67,112,960 bytes) here:

  mmap (syscall-template.S:82)
  pa_shm_create_rw (in /usr/lib/libpulsecommon-0.9.21.so)
  pa_mempool_new (in /usr/lib/libpulsecommon-0.9.21.so)
  pa_context_new_with_proplist (in /usr/lib/libpulse.so.0.12.2)
  ??? (in /usr/lib/libcanberra-0.22/libcanberra-pulse.so)
  pulse_driver_open (in /usr/lib/libcanberra-0.22/libcanberra-pulse.so)
  ??? (in /usr/lib/libcanberra.so.0.2.1)
  ??? (in /usr/lib/libcanberra.so.0.2.1)
  ca_context_play_full (in /usr/lib/libcanberra.so.0.2.1)
  ca_context_play (in /usr/lib/libcanberra.so.0.2.1)
  nsSound::PlayEventSound(unsigned int) (nsSound.cpp:467)
  nsMenuPopupFrame::ShowPopup(int, int) (nsMenuPopupFrame.cpp:749)
  nsXULPopupManager::ShowPopupCallback(nsIContent*, nsMenuPopupFrame*, int, int) (nsXULPopupManager.cpp:709)
  nsXULPopupManager::FirePopupShowingEvent(nsIContent*, int, int) (nsXULPopupManager.cpp:1196)
  nsXULPopupShowingEvent::Run() (nsXULPopupManager.cpp:2196)
  nsThread::ProcessNextEvent(int, int*) (nsThread.cpp:626)
  NS_ProcessNextEvent_P(nsIThread*, int) (nsThreadUtils.cpp:250)
  mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (MessagePump.cpp:110)
  MessageLoop::Run() (message_loop.cc:202)
  nsBaseAppShell::Run() (nsBaseAppShell.cpp:192)
  nsAppStartup::Run() (nsAppStartup.cpp:191)
  XRE_main (nsAppRunner.cpp:3691)
  main (nsBrowserApp.cpp:158)

This is sound-related stuff, which is a bit surprising, because the CAD website doesn’t produce any sound, as far as I can tell.

3.58% (46,505,328 bytes) here:

  js::mjit::Compiler::finishThisUp(js::mjit::JITScript**) (jsutil.h:213)
  js::mjit::Compiler::performCompilation(js::mjit::JITScript**) (Compiler.cpp:208)
  js::mjit::Compiler::compile() (Compiler.cpp:134)
  js::mjit::TryCompile(JSContext*, JSStackFrame*) (Compiler.cpp:245)
  js::mjit::stubs::UncachedCallHelper(js::VMFrame&, unsigned int, js::mjit::stubs::UncachedCallResult*) (InvokeHelpers.cpp:387)
  js::mjit::ic::Call(js::VMFrame&, js::mjit::ic::CallICInfo*) (MonoIC.cpp:831)

This is auxiliary info generated by JaegerMonkey. Bug 615199 again has more on this, and some good progress has been made towards reducing this.

2.42% (31,457,280 bytes) here:

  huge_malloc (jemalloc.c:4654)
  posix_memalign (jemalloc.c:4022)
  XPConnectGCChunkAllocator::doAlloc() (xpcjsruntime.cpp:1169)
  PickChunk(JSRuntime*) (jsgcchunk.h:68)
  RefillFinalizableFreeList(JSContext*, unsigned int) (jsgc.cpp:468)
  js_NewFunction(JSContext*, JSObject*, int (*)(JSContext*, unsigned int, js::Value*), unsigned int, unsigned int, JSObject*, JSAtom*) (jsgcinlines.h:127)

2.04% (26,472,576 bytes) here:

  js::PropertyTable::init(js::Shape*, JSContext*) (jsutil.h:213)
  JSObject::addPropertyInternal(JSContext*, long, int (*)(JSContext*, JSObject*, long, js::Value*), int (*)(JSContext*, JSObject*, long, js::Value*), unsigned int, unsigned int, unsigned int, int, js::Shape**) (jsscope.cpp:859)
  JSObject::putProperty(JSContext*, long, int (*)(JSContext*, JSObject*, long, js::Value*), int (*)(JSContext*, JSObject*, long, js::Value*), unsigned int, unsigned int, unsigned int, int) (jsscope.cpp:905)

Bug 610070 is open about this.

1.58% (20,484,096 bytes) here:

  moz_xmalloc (mozalloc.cpp:98)
  GCGraphBuilder::NoteScriptChild(unsigned int, void*) (mozalloc.h:229)

1.44% (18,698,240 bytes) here:

  JS_ArenaAllocate (jsutil.h:209)
  NewOrRecycledNode(JSTreeContext*) (jsparse.cpp:487)
  JSParseNode::create(JSParseNodeArity, JSTreeContext*) (jsparse.cpp:557)

1.38% (17,898,944 bytes) here:

  JSScript::NewScript(JSContext*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned short, unsigned short) (jsutil.h:209)
  JSScript::NewScriptFromCG(JSContext*, JSCodeGenerator*) (jsscript.cpp:1171)
  js_EmitFunctionScript (jsemit.cpp:3767)
  js_EmitTree (jsemit.cpp:4629)

1.20% (15,560,704 bytes) here:

  sqlite3MemMalloc (sqlite3.c:13855)
  mallocWithAlarm (sqlite3.c:17333)

1.10% (14,315,520 bytes) here:

  JS_ArenaAllocate (jsutil.h:209)
  js::PropertyTree::newShape(JSContext*, bool) (jspropertytree.cpp:97)
  js::PropertyTree::getChild(JSContext*, js::Shape*, js::Shape const&) (jspropertytree.cpp:428)
  JSObject::getChildProperty(JSContext*, js::Shape*, js::Shape&) (jsscope.cpp:580)
  JSObject::addPropertyInternal(JSContext*, long, int (*)(JSContext*, JSObject*, long, js::Value*), int (*)(JSContext*, JSObject*, long, js::Value*), unsigned int, unsigned int, unsigned int, int, js::Shape**) (jsscope.cpp:829)
  JSObject::putProperty(JSContext*, long, int (*)(JSContext*, JSObject*, long, js::Value*), int (*)(JSContext*, JSObject*, long, js::Value*), unsigned int, unsigned int, unsigned int, int) (jsscope.cpp:905)

1.06% (13,791,232 bytes) here:

  mmap (syscall-template.S:82)
  g_mapped_file_new (in /lib/libglib-2.0.so.0.2400.1)
  ??? (in /usr/lib/libgtk-x11-2.0.so.0.2000.1)
  ??? (in /usr/lib/libgtk-x11-2.0.so.0.2000.1)
  ??? (in /usr/lib/libgtk-x11-2.0.so.0.2000.1)
  ??? (in /usr/lib/libgtk-x11-2.0.so.0.2000.1)
  ??? (in /usr/lib/libgtk-x11-2.0.so.0.2000.1)
  gtk_icon_theme_lookup_icon (in /usr/lib/libgtk-x11-2.0.so.0.2000.1)
  gtk_icon_theme_load_icon (in /usr/lib/libgtk-x11-2.0.so.0.2000.1)
  gtk_icon_set_render_icon (in /usr/lib/libgtk-x11-2.0.so.0.2000.1)
  gtk_widget_render_icon (in /usr/lib/libgtk-x11-2.0.so.0.2000.1)
  nsIconChannel::Init(nsIURI*) (nsIconChannel.cpp:497)
  nsIconProtocolHandler::NewChannel(nsIURI*, nsIChannel**) (nsIconProtocolHandler.cpp:115)
  nsIOService::NewChannelFromURI(nsIURI*, nsIChannel**) (nsIOService.cpp:609)
  NewImageChannel(nsIChannel**, nsIURI*, nsIURI*, nsIURI*, nsILoadGroup*, nsCString const&, unsigned int, nsIChannelPolicy*) (nsNetUtil.h:228)
  imgLoader::LoadImage(nsIURI*, nsIURI*, nsIURI*, nsILoadGroup*, imgIDecoderObserver*, nsISupports*, unsigned int, nsISupports*, imgIRequest*, nsIChannelPolicy*, imgIRequest**)(imgLoader.cpp:1621)
  nsContentUtils::LoadImage(nsIURI*, nsIDocument*, nsIPrincipal*, nsIURI*, imgIDecoderObserver*, int, imgIRequest**) (nsContentUtils.cpp:2550)
  nsCSSValue::Image::Image(nsIURI*, nsStringBuffer*, nsIURI*, nsIPrincipal*, nsIDocument*) (nsCSSValue.cpp:1290)
  nsCSSValue::StartImageLoad(nsIDocument*) const (nsCSSValue.cpp:549)
  nsCSSCompressedDataBlock::MapRuleInfoInto(nsRuleData*) const (nsCSSDataBlock.cpp:190)
  nsRuleNode::WalkRuleTree(nsStyleStructID, nsStyleContext*, nsRuleData*, nsCSSStruct*) (nsRuleNode.cpp:2050)
  nsRuleNode::GetListData(nsStyleContext*) (nsRuleNode.cpp:1869)
  nsRuleNode::GetStyleData(nsStyleStructID, nsStyleContext*, int) (nsStyleStructList.h:81)
  nsRuleNode::WalkRuleTree(nsStyleStructID, nsStyleContext*, nsRuleData*, nsCSSStruct*) (nsRuleNode.cpp:2141)
  nsRuleNode::GetListData(nsStyleContext*) (nsRuleNode.cpp:1869)
  nsRuleNode::GetStyleList(nsStyleContext*, int) (nsStyleStructList.h:81)
  nsImageBoxFrame::DidSetStyleContext(nsStyleContext*) (nsStyleStructList.h:81)
  nsFrame::Init(nsIContent*, nsIFrame*, nsIFrame*) (nsFrame.cpp:369)
  nsLeafBoxFrame::Init(nsIContent*, nsIFrame*, nsIFrame*) (nsLeafBoxFrame.cpp:98)
  nsImageBoxFrame::Init(nsIContent*, nsIFrame*, nsIFrame*) (nsImageBoxFrame.cpp:233)

I’m hoping that someone (or multiple people) who reads this will be able to identify places where memory consumption can be reduced. I’m happy to provide the raw data file to anyone who asks for it, though I also encourage people to try Massif themselves on different workloads.
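For anyone unfamiliar with the tree structure Massif builds from stack traces, here is a minimal sketch of the idea: traces that share leading frames (starting from the allocation point) are merged, and byte counts are summed at each node. This illustrates the concept only and is not Massif's implementation. The byte counts in the example are invented; the function names are borrowed from the traces above.

```python
def build_tree(allocations):
    """Merge stack traces into a tree.

    allocations: iterable of (frames, nbytes), with frames listed
    innermost first, e.g. ["mmap", "pthread_create"].
    """
    root = {"bytes": 0, "children": {}}
    for frames, nbytes in allocations:
        node = root
        node["bytes"] += nbytes
        for frame in frames:
            # Traces with the same leading frames share the same nodes.
            node = node["children"].setdefault(frame, {"bytes": 0, "children": {}})
            node["bytes"] += nbytes
    return root

def render(node, total, depth=0, lines=None):
    """Format the tree, largest children first, one line per node."""
    lines = [] if lines is None else lines
    children = sorted(node["children"].items(), key=lambda kv: -kv[1]["bytes"])
    for name, child in children:
        pct = 100.0 * child["bytes"] / total
        lines.append(f"{'  ' * depth}{pct:05.2f}% ({child['bytes']:,}B) {name}")
        render(child, total, depth + 1, lines)
    return lines

# Invented byte counts, for illustration only.
allocs = [
    (["mmap", "_dl_map_object_from_fd"], 600),
    (["mmap", "pthread_create"], 300),
    (["mmap", "pthread_create"], 100),
]
tree = build_tree(allocs)
print("\n".join(render(tree, tree["bytes"])))
```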

Update. I forgot to mention that there was also the plugin-container process that ran in parallel to the main Firefox process. It was invoked by Firefox like so:

  /home/njn/moz/ws0/optg64/dist/bin/plugin-container \
    /home/njn/.mozilla/plugins/libflashplayer.so 22658 false plugin

It allocated 458,676,874 bytes at peak. 62.7% was due to the loading of shared objects, 12.8% was due to thread stacks, and the remaining 24.5% was due to unknown allocation sites (probably because the flash binary was missing symbols) and allocations too small to be worth singling out.

Firefox JägerMonkey Performance Tracemonkey

Multi-faceted JavaScript speed improvements

Post author By Nicholas Nethercote
Post date November 15, 2010

Firefox 4.0 beta 7’s release announcement was accompanied by the following graphs that show great improvements in JavaScript speed:

Impressive! The graphs claim speed-ups of 3x, 3x and 5x; by my calculations the more precise numbers are 3.49x, 2.94x and 5.24x.

The Sunspider and V8bench results are no surprise to anyone who knows about JägerMonkey and has been following AWFY, but the excellent Kraken results really surprised me. Why?

Sunspider and V8bench have been around for ages. They are the benchmarks most commonly used (for better or worse) to gauge JavaScript performance and so they have been the major drivers of performance improvements. To put it more bluntly, like all the other browser vendors, we tune for these benchmarks a lot. In contrast, Kraken was only released on September 14th, and so we’ve done very little tuning for it yet.
Unlike Sunspider and V8bench, Kraken contains a lot of computationally intensive code such as image and audio processing. These benchmarks are dominated by tight loops containing numerous array accesses. As a result, they trace really well, and so even 4b7 spends most of its Kraken time (I’d estimate 90%+) in code generated by TraceMonkey, the trace JIT.

We can draw two happy conclusions from Kraken’s improvement.

Our speed-ups apply widely, not just to Sunspider and V8bench.
Our future performance eggs are not all in one basket: the JavaScript team has made and will continue to make great improvements to the non-JägerMonkey parts of the JavaScript engine.

Firefox 4.0 is going to be a great release!

Ubuntu

Upgrading Ubuntu (sigh)

I finally got around to upgrading my desktop from Ubuntu 9.10 to Ubuntu 10.04.

First of all, the bluetooth connection to my wireless Logitech keyboard broke. It’s always the first thing to break, so I did what I’ve done in the past: I repeated the “set up a new bluetooth device” steps about 30 times in a row, with minor variations, until it finally registered correctly. I may have rebooted once or twice along the way, I can’t remember now.

Second, my wired connection has been flaky, sometimes working and sometimes not. (Wireless has been fine, however.) Rebooting had fixed the problem until today. Eventually I found this thread which told me that I needed to not just reboot, but hard reboot: turn off my machine and disconnect it from all power sources for several minutes. Apparently this causes the network card’s firmware to be reloaded. Miraculously enough, this worked. Fingers crossed it’ll continue to work next time I reboot or restart.

Unfortunately, the machine’s speaker still doesn’t beep when someone pings me in Chatzilla. That stopped working when I upgraded to 9.10. Actually, the machine’s speaker stopped working at all after that upgrade, so I investigated and modified some config file so that it worked again with the ‘beep’ command, but it still didn’t work in Chatzilla. It’s the one time when I’d like any kind of sound to work in Linux (I can use my Mac laptop for watching videos and all that stuff).

Having used Ubuntu 8.10, 9.04, 9.10 and now 10.04, I’m glad 10.04 is an LTS release. It’ll be good not having to go through these steps every 6 months.

Bugzilla Gmail

Using Gmail filters to identify important Bugzilla mail

Post author By Nicholas Nethercote
Post date September 16, 2010

Like most Mozilla developers, I get a lot of bugmail. Maybe 10% of that is important, e.g. bugs I filed, bugs I have to review patches for, etc. The other 90% of that is stuff I have a passing interest in.

I have a couple of Gmail filters that I use to separate these two streams of email. They’re non-obvious, so I promised Paul Biggar that I would blog about these so that he and others could do the same thing.

To catch interesting bugmail, on Gmail’s “Create a Filter” screen, in the “From:” field put:

bugzilla-daemon@mozilla.org

and in the “Has the words:” field put:

"review?(nnethercote" OR "you are the assignee" OR "you reported" OR "you are on the CC list" OR subject:"review granted" OR subject:"review requested" OR subject:"review canceled" OR subject:"feedback requested" OR subject:"feedback granted" OR subject:"feedback canceled"

To catch less interesting bugmail, on Gmail’s “Create a Filter” screen, in the “From:” field put:

bugzilla-daemon@mozilla.org

and in the “Doesn’t have:” field put:

("you are the assignee" OR "you reported" OR "you are on the CC list" OR subject:"review granted" OR subject:"review requested" OR subject:"review canceled" OR subject:"feedback requested" OR subject:"feedback granted" OR subject:"feedback canceled")

I’ve modified them a few times and they work very well for me. It’s possible there are some cases that they miscategorize but I haven’t seen that happen for a long time.

Update: In the first “Has the words:” field, you’ll obviously need to change nnethercote to something else.
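The logic of the first filter can be written out as a small function, which may make it easier to adapt. This is only a sketch using plain case-insensitive substring matching; it does not reproduce how Gmail actually tokenizes or matches search terms, and the function name `is_important` is made up for illustration.

```python
# Phrases that may appear anywhere in the message.
ANYWHERE_PHRASES = [
    "you are the assignee",
    "you reported",
    "you are on the CC list",
]

# Phrases that must appear in the subject line.
SUBJECT_PHRASES = [
    "review granted", "review requested", "review canceled",
    "feedback requested", "feedback granted", "feedback canceled",
]

def is_important(subject, body, username="nnethercote"):
    """Approximate the "interesting bugmail" filter with substring tests."""
    subject_l = subject.lower()
    text = (subject + "\n" + body).lower()
    if f"review?({username}".lower() in text:
        return True
    if any(p.lower() in text for p in ANYWHERE_PHRASES):
        return True
    return any(p in subject_l for p in SUBJECT_PHRASES)

print(is_important("[Bug 1] review requested", "..."))
print(is_important("[Bug 2] Some bug", "A comment was added."))
```

In this simplified model, anything for which `is_important` returns False is treated as the less interesting stream.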

Firefox

Reasons not to worry (part 2)

Post author By Nicholas Nethercote
Post date September 10, 2010

About a month ago I wrote about the negative reaction Firefox often gets on sites like Slashdot, Reddit and Metafilter, and how I found this reaction dispiriting.

The post received lots of interesting comments. There was a huge variety of reactions (suggestions for improving Firefox, explanations for how Firefox had gone wrong, etc.), but there was certainly nothing like a consensus of opinion.

This got me thinking about why people would use each of the main five browsers. My overly short, tongue-in-cheek list looked like this:

Firefox: add-ons!
Chrome: speed! (and, for the moment, new shiny!)
IE: I’m a Windows user and I don’t know how to change my browser.
Safari: I’m a Mac user and I don’t know how to change my browser.
Opera: Hey, look how quirky I am!

More seriously, on a technical level the five main browsers are converging. When one of them implements a new compelling feature, the others will get something similar eventually. Firefox introduced the awesome bar, and now all the browsers track history in a sophisticated way in the address bar. Chrome pushed the envelope on JS speed, but once Firefox 4.0 and IE9 are released the gap will have mostly closed. And so on.

(An aside: Firefox 1.5 and 2.0 had bad memory behaviour, ie. lots of leaks. That was mostly fixed by Firefox 3.0, but the reputation has stuck, primarily through the word “bloat”. But “bloat” has various meanings, so any time a new feature is added to Firefox that someone thinks isn’t useful, they’ll cry “bloat!” even though that feature may not affect memory footprint at all. Cue Twain’s quote: “Give a man a reputation as an early riser, and he can sleep until noon.”)

So if you assume technical convergence (which isn’t entirely true, but it’s not so far off) then Firefox really is special, because it’s the only browser made by a non-profit organisation whose desire to create good software isn’t sullied by commercial interests. As a single example, consider Firefox Sync, which allows you to synchronize browser history, passwords, etc., between different machines. The Sync protocol is encrypted, so Mozilla can’t read it. Furthermore, if you don’t believe that, you can run your own Sync server. I don’t see Google implementing that in Chrome. (Indeed, although I have Chrome installed on my laptop I’ve barely used it because I’m uncomfortable wondering exactly what information it’s sending back to Google HQ. I already have a gmail account, they’ve got enough on me already without knowing my browsing history, thanks very much.)

In other words: It’s the mission, stupid!

I was reminded of this with my favourite comment on my earlier post, from Ryan:

The add-ons are nice, sure. But to me, Mozilla is about hard working, smart web-wonks undeterred by hairballs of code from netscape, miniscule market share vs. Microsoft or really, reality in general. That’s awesome – and worth celebrating.

In a similar vein, Phil Ringnalda on IRC pointed me at a blog post that ended with this quote:

I believe in keeping the web free and open. I believe in building a better Internet, and helping people take control. These ideas align with those of Mozilla, btw… and it’s one more reason I’m sticking with Firefox as my browser (and Mozilla) instead of abandoning it for Chrome or Safari, or another browser created by a for-profit company interested in controlling my browsing experience. Mozilla was there for us, they saved us from the big bad IE Monster, and helped keep the web open and free, and they’re still doing that.

Words to live by!

Go Programming

Another Go at language design

Post author By Nicholas Nethercote
Post date September 10, 2010

The other day I attended a very interesting talk at the University of Melbourne given by Rob Pike. The title was “Another Go at language design” and it was all about Google’s new programming language, Go. It was a high level overview of the language with an emphasis on why certain design decisions were made.

Here is a random selection of the things that struck me as most interesting.

Pike lamented the state of modern industrial languages, by which he meant C++ and Java. He started with a quote from Dick Gabriel: “Old programs read like quiet conversations between a well-spoken research worker and a well-studied mechanical colleague, not as a debate with a compiler.” In particular, one of the aims of Go is to show programmers whose only exposure to static typing is through C++ and Java that statically typed languages can have the simpler, more concise feel (fewer declarations!) of dynamically typed languages like JavaScript and Python.
Compile times of Go programs are small. This is because the compiler doesn’t need to know about transitive dependencies between packages (their name for modules). For example, if you have a file A.go which depends on B.go which depends on C.go, at first you have to compile C.go, then B.go, then A.go. But all the information from C.go that A.go needs is stored in the compiled B.o file, which means that compiling A.go only requires reading B.o, not C.o. Hmm, now I’m unsure if that’s exactly right. But the broader point is that the language avoids C++’s problem where bazillions of header files have to be read for every module. He said with a completely straight face that their goal was to have a 1,000,000x speed-up over C++ for the compilation time of large programs, though they’d probably be satisfied with 100,000x. Impressive! Imagine if Firefox compiled in less than a second.
Identifiers that start with an upper-case letter are public, and identifiers that start with a lower-case letter are private. The other language I’m familiar with that has a similar distinction is Haskell, where identifiers that start with an upper-case letter are used for types, and identifiers that start with a lower-case letter are used for values. The nice thing about Go’s approach is that it gives you strictly more information: you can determine from an identifier’s use point whether it’s public or private, which saves you from having to find its declaration.
There is no automatic conversion between numeric types. This is for simplicity; Pike said (probably exaggerating) that 1/3 of the C standard deals with this topic. But the handling of numeric constants avoids many cases where explicit conversions are needed, because numeric constants are platonic in the sense that they don’t have a particular type until they are assigned to a variable.
They have a reformatting program, gofmt, that rewrites Go code into an “approved” layout. This ensures that all Go code looks consistent, and avoids fights over style. (Robert O’Callahan would approve!) Interestingly, gofmt is separate from the compiler, and the idea is that you run it once you have finished a change. At Google when code is committed into a repository, gofmt is run and if the layout doesn’t match the code is rejected. I asked why they made it a separate program, rather than having stronger syntax/layout checking in the compiler. He said that (a) they hadn’t thought of putting it in the compiler, (b) it seemed like it would be less painful to just run it occasionally rather than having to worry about it every time you compile, and (c) it would slow down the compiler.
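The point about compile order and transitive dependencies can be illustrated with a small model. This is a sketch in Python (3.9+ for `graphlib`), not anything from the Go toolchain: the dependency table mirrors the A.go/B.go/C.go example above, and the helper names are invented.

```python
from graphlib import TopologicalSorter

# Each file maps to the files it depends on directly.
DEPS = {"A.go": {"B.go"}, "B.go": {"C.go"}, "C.go": set()}

def compile_order(deps):
    """Dependencies come before the files that use them."""
    return list(TopologicalSorter(deps).static_order())

def direct_deps(target, deps):
    """What a compiler needs to read if it only looks at direct dependencies."""
    return sorted(deps[target])

def transitive_deps(target, deps):
    """What it would need to read if it followed dependencies all the way down."""
    seen, stack = set(), list(deps[target])
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(deps[name])
    return sorted(seen)

print(compile_order(DEPS))
print(direct_deps("A.go", DEPS))
print(transitive_deps("A.go", DEPS))
```

The gap between the last two results is tiny here, but in a large program the transitive set can be far larger than the direct one, which is where the compile-time savings would come from.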

It was a good talk; Pike is an engaging speaker. I’m glad he made the trip to Melbourne.

Firefox

Reasons not to worry

The higher-ups at Mozilla like to say that we should focus on building the best browser we can, and not pay too much attention to our competitors. This is probably wise when you consider that we are competing against Microsoft, Apple and Google.

But I can’t help reading comment threads relating to browsers on sites like Slashdot, Reddit and Metafilter. Judging from these threads, few people love Firefox; a few say things like “I can’t switch because I couldn’t live without Adblock Plus/NoScript/etc.” Lots of people complain about Firefox being slow/bloated/a memory hog. Lots of people praise Google Chrome, mostly for its speed.

I realize these communities aren’t reflective of web users overall, but they do represent something of a leading edge. As do Linux users, which is why I am troubled by Ubuntu’s plan to use Chromium as their default browser. Also, IE9 looks like it will be a high-quality modern browser; the sleeping Microsoft giant has finally awoken.

It’s all rather depressing. So, gentle reader, please tell me why my perception is wrong. What am I overlooking?

Performance Tracemonkey Valgrind

cg_diff: a differential profiling tool

Post author By Nicholas Nethercote
Post date June 30, 2010
1 Comment on cg_diff: a differential profiling tool

I frequently use the SunSpider and V8 benchmark suites to compare the speed of different versions of TraceMonkey. The best metric for speed comparisons is always execution time. However, measuring execution time on modern machines is unreliable: you get lots of variation between runs. This is a particular problem in this case because the run-times of these benchmarks are very short; SunSpider takes less than 700 ms on my laptop, and V8 takes about 6.5 seconds. Run-to-run variations can be larger than the difference I’m trying to measure. This is annoying: the best speed metric cannot be measured exactly.

So I frequently use Cachegrind to measure the number of executed instructions. This is a worse metric than execution time — the number of instructions doesn’t directly relate to the execution time, although it’s usually a good indicator — but it has the advantage that it can be measured exactly. Most of the SunSpider and V8 tests are deterministic, and if I measure them twice in a row I’ll get the same result. Cachegrind also gives instruction counts on a per-function and per-line basis, which is very useful.

So I often run Cachegrind on two different versions of TraceMonkey: an unchanged copy of the current repository tip, and a copy of the current repository tip with a patch applied. I can then compare the results and get a very precise idea of how the patch affects performance.

However, comparing the output of two Cachegrind runs manually is a pain. For example, here is part of Cachegrind’s output (lightly edited for clarity) for crypto-md5.js with an unchanged repository tip (as of a day or two ago):

--------------------------------------------------------------------------------
        Ir
--------------------------------------------------------------------------------
48,923,280  PROGRAM TOTALS
--------------------------------------------------------------------------------
        Ir  file:function
--------------------------------------------------------------------------------
 5,638,362  ???:???
 4,746,990  /build/buildd/eglibc-2.10.1/string/../sysdeps/i386/i686/strcmp.S:strcmp
 2,032,069  jstracer.cpp:js::TraceRecorder::determineSlotType(int*)
 1,899,298  jstracer.cpp:bool js::VisitFrameSlots<js::CountSlotsVisitor>(...)
 1,759,932  jstracer.cpp:js::TraceRecorder::checkForGlobalObjectReallocation()
 1,232,425  jsops.cpp:js_Interpret
   885,168  jstracer.cpp:bool js::VisitFrameSlots<js::DetermineTypesVisitor>(...)
   871,197  jstracer.cpp:js::TraceRecorder::set(int*, nanojit::LIns*, bool)
   812,419  /build/buildd/eglibc-2.10.1/iconv/gconv_conf.c:insert_module
   758,034  jstracer.cpp:js::TraceRecorder::monitorRecording(JSOp)

At the top we have the total instruction count, and then we have the instruction counts for the top 10 functions. The ???:??? entry represents code generated by TraceMonkey’s JIT compiler, for which there is no debug information. “Ir” is short for “I-cache reads”, which is equivalent to “instructions executed”.

Cachegrind tracks a lot more than just instruction counts, but I’m only showing instruction counts here to keep things simple. It also gives per-line counts, but I’ve omitted them as well.

And here is the corresponding output when a patch from bug 575529 is applied:

--------------------------------------------------------------------------------
        Ir
--------------------------------------------------------------------------------
42,332,998  PROGRAM TOTALS
--------------------------------------------------------------------------------
       Ir  file:function
--------------------------------------------------------------------------------
4,746,990  /build/buildd/eglibc-2.10.1/string/../sysdeps/i386/i686/strcmp.S:strcmp
4,100,366  ???:???
1,687,434  jstracer.cpp:bool js::VisitFrameSlots(js::CountSlotsVisitor&, JSContext*, unsigned int, js::FrameRegsIter&, JSStackFrame*)
1,343,085  jstracer.cpp:js::TraceRecorder::checkForGlobalObjectReallocation()
1,229,853  jsops.cpp:js_Interpret
1,137,981  jstracer.cpp:js::TraceRecorder::determineSlotType(int*)
  868,855  jstracer.cpp:js::TraceRecorder::set(int*, nanojit::LIns*, bool)
  812,419  /build/buildd/eglibc-2.10.1/iconv/gconv_conf.c:insert_module
  755,753  jstracer.cpp:js::TraceRecorder::monitorRecording(JSOp)
  575,200  jsscan.cpp:js::TokenStream::getTokenInternal()

It’s easy to see that the total instruction count has dropped from 48.9M to 42.3M, but seeing the changes at a per-function level is more difficult. For a long time I would make this comparison manually by opening the two files side-by-side and reading carefully. Sometimes I’d also do some cutting-and-pasting to reorder entries. The whole process was tedious, but the information revealed is so useful that I did it anyway.

Then three months ago David Baron asked on Mozilla’s dev-platform mailing list if anybody knew of any good differential profiling tools. This prompted me to realise that I wanted exactly such a tool for Cachegrind. Furthermore, as Cachegrind’s author, I was in a good place to understand exactly what was necessary 🙂

The end result is a new script, cg_diff, that can be used to compute the difference between two Cachegrind output files. Here’s part of the difference between the above two versions:

--------------------------------------------------------------------------------
        Ir
--------------------------------------------------------------------------------
-6,590,282  PROGRAM TOTALS
--------------------------------------------------------------------------------
        Ir  file:function
--------------------------------------------------------------------------------
-1,537,996  ???:???
  -894,088  jstracer.cpp:js::TraceRecorder::determineSlotType(int*)
  -416,847  jstracer.cpp:js::TraceRecorder::checkForGlobalObjectReallocation()
  -405,271  jstracer.cpp:bool js::VisitFrameSlots(js::DetermineTypesVisitor&, JSContext*, unsigned int, js::FrameRegsIter&, JSStackFrame*)
  -246,047  nanojit/Containers.h:nanojit::StackFilter::read()
  -238,121  nanojit/Assembler.cpp:nanojit::Assembler::registerAlloc(nanojit::LIns*, int, int)
   230,419  nanojit/LIR.cpp:nanojit::interval::of(nanojit::LIns*)
  -226,070  nanojit/Assembler.cpp:nanojit::Assembler::asm_leave_trace(nanojit::LIns*)
  -211,864  jstracer.cpp:bool js::VisitFrameSlots(js::CountSlotsVisitor&, JSContext*, unsigned int, js::FrameRegsIter&, JSStackFrame*)
  -200,742  nanojit/Assembler.cpp:nanojit::Assembler::findRegFor(nanojit::LIns*, int)

This makes it really easy to see what’s changed. Negative values mean that the instruction count dropped; positive values mean that it increased.
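
The core idea of a differential profile can be sketched in a few lines. The Go program below is only an illustration under stated assumptions, not how cg_diff itself is implemented: it skips parsing Cachegrind output files entirely and starts from two in-memory maps. The `Profile` and `Delta` types and the `Diff` function are hypothetical names. It subtracts the "before" count from the "after" count for each function, drops entries that did not change, and sorts the rest by magnitude, which is the order the listing above appears to use.

```go
package main

import (
	"fmt"
	"sort"
)

// Profile maps a "file:function" name to its instruction count (Ir).
type Profile map[string]int64

// Delta is the change in instruction count for one function.
type Delta struct {
	Name string
	Ir   int64
}

// Diff returns after-minus-before for every function found in either
// profile, sorted by magnitude with the largest change first.
// Functions whose count did not change are dropped.
func Diff(before, after Profile) []Delta {
	changes := make(map[string]int64)
	for name, ir := range after {
		changes[name] += ir
	}
	for name, ir := range before {
		changes[name] -= ir
	}
	var out []Delta
	for name, ir := range changes {
		if ir != 0 {
			out = append(out, Delta{name, ir})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		a, b := abs(out[i].Ir), abs(out[j].Ir)
		if a != b {
			return a > b
		}
		return out[i].Name < out[j].Name // deterministic order for ties
	})
	return out
}

func abs(n int64) int64 {
	if n < 0 {
		return -n
	}
	return n
}

func main() {
	// Counts copied from the two listings above; the strcmp path is
	// abbreviated here for readability.
	before := Profile{
		"???:???":        5638362,
		"strcmp.S:strcmp": 4746990,
		"jstracer.cpp:js::TraceRecorder::determineSlotType(int*)": 2032069,
		"jsops.cpp:js_Interpret":                                  1232425,
	}
	after := Profile{
		"???:???":        4100366,
		"strcmp.S:strcmp": 4746990,
		"jstracer.cpp:js::TraceRecorder::determineSlotType(int*)": 1137981,
		"jsops.cpp:js_Interpret":                                  1229853,
	}
	for _, d := range Diff(before, after) {
		fmt.Printf("%10d  %s\n", d.Ir, d.Name)
	}
}
```

Functions that appear in only one profile are treated as having a count of zero in the other, so new and removed functions show up with their full count as the difference.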

I’ve been using this script for a while now, and it’s really helped me analyse the performance effects of my patches. Indeed, I have some scripts set up so that, with a single command, I can run all of SunSpider through two different versions of TraceMonkey and produce both normal profiles and difference profiles. I can also get high-level instruction comparisons such as the one in this Bugzilla comment.

And now everybody else can use cg_diff too, because I just landed it on the Valgrind trunk. If you want to try it, follow these instructions to set up a copy of the trunk. And note that if you want to compare two versions of a program that sit in different directories (as opposed to profiling a program, modifying it, then reprofiling it) you’ll need to use cg_diff’s --mod-filename option to get useful results. Feel free to ask me questions (via email, IRC or in the comments below) if you have trouble.
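
To see why filenames matter when the two builds live in different directories, here is a small Go sketch. Everything in it is an assumption made for illustration: the directory names `tm-orig` and `tm-patched`, the `/home/me` paths, and the `normalize` helper are all hypothetical. Because each profile entry is keyed by "file:function", the same function in two checkouts gets two different keys unless the differing part of the path is rewritten first, which is the kind of rewriting the --mod-filename option is there for.

```go
package main

import (
	"fmt"
	"regexp"
)

// versionDir matches the hypothetical checkout directories
// "tm-orig" and "tm-patched" inside a path.
var versionDir = regexp.MustCompile(`/tm-(orig|patched)/`)

// normalize rewrites a "file:function" key so that entries from both
// checkouts end up with the same path.
func normalize(key string) string {
	return versionDir.ReplaceAllString(key, "/tm/")
}

func main() {
	a := "/home/me/tm-orig/js/src/jsops.cpp:js_Interpret"
	b := "/home/me/tm-patched/js/src/jsops.cpp:js_Interpret"

	// Without rewriting, the keys differ and the entries cannot be paired.
	fmt.Println("raw keys match:       ", a == b)
	// After rewriting, both refer to the same logical file.
	fmt.Println("normalized keys match:", normalize(a) == normalize(b))
}
```

Without this step every function would appear twice in the difference, once as removed and once as added.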

Happy differencing!