Categories
Uncategorized

Community Service Announcement

Prepending ‘@’ to somebody’s name does something useful in Twitter.  Outside of Twitter, it’s extra typing and looks silly.

Categories
Firefox Memory consumption MemShrink

MemShrink

Memory consumption is really important in a web browser.  Firefox has some room for improvement on that front, and so Jeff Muizelaar and I are working to start up an effort, called “MemShrink”, to reduce memory consumption in Firefox 5 (and beyond).

We’ve started a wiki page outlining some ideas on ways to improve our tracking of memory consumption.  Please read it and comment.

I’ve also opened bug 640452, which is a tracking bug for memory leaks in Firefox 5, and bug 640457, which is a tracking bug for other memory improvements in Firefox 5.  Please CC yourself if you’re interested.

Update: I just added bug 640791, which is a tracking bug for improvements to memory profiling.

Categories
Mercurial

Mercurial is getting slower

I swear Mercurial is getting slower.  I regularly wait 10 seconds or more for basic operations like ‘hg diff’, ‘hg revert --all’, ‘hg qref’ (and don’t ask about ‘hg up’).  I haven’t done any real measurements but I’m sure it didn’t use to be like that.

I wonder if it’s because (a) we have more revisions in the tree (more than twice as many as when I started working for Mozilla two years ago), or (b) we have more code in the repository, or (c) something else.  I’m pretty sure it’s not (d) something wrong with my machine, because I’ve done disk tests that indicate it’s working fine.

Categories
Correctness Cplusplus Firefox

Limits of reliability

Julian Seward asked me an interesting question a while ago:  “what are the factors that limit Firefox’s reliability?”  (You can use “crash rate” as a reasonable definition of “reliability”.)

He suggested two things:

  1. Firefox depends on external code, such as plug-ins.
  2. Many crashes are hard to reproduce and so don’t get fixed.

For the first, Electrolysis (a.k.a. process separation) is on track to pretty much make it a non-problem.  It’s already in place for Flash, and will eventually be for other plug-ins.  So that’s good.

For the second, I see two main sub-factors.

  1. Firefox is implemented in C++, which is prone to memory-related bugs and data races, both of which can make crash reproduction difficult.  Using a safer language like Rust would make many (all?) of these bugs impossible.  Unfortunately, Rust isn’t production-ready, and rewriting even parts of the browser is a huge undertaking.  So we better get started ASAP 🙂
  2. Firefox has some nasty low-level code like the garbage collector;  bugs in it can be very difficult to reproduce.  I don’t see an obvious way to improve this other than the usual:  testing, code review, using simple algorithms, etc.

Categories
about:memory Firefox Massif Memory consumption

A vision for better memory profiling with about:memory

I’ve been doing lots of memory profiling of Firefox.  I’ve mostly used Massif and about:memory for this.  Both have their uses, but also big problems.  I have some ideas for combining their best features into a killer profiling tool.

Massif’s pros and cons

I’ve mostly used Massif.  It’s both invaluable and a total pain.  More specifically, it has the following advantages:

  • Time series.  It shows how memory consumption changes over time, both graphically (for total allocations) and with periodic detailed snapshots.
  • Level of detail.  Each detailed snapshot consists of an allocation tree that shows which parts of the code are responsible for every single byte allocated.

But it has lots of disadvantages.

  • Slow.  There’s somewhere between a 10x and 100x slowdown;  that’s a rough estimate, I haven’t actually measured it.
  • It’s implemented with Valgrind, so it doesn’t work on Windows.
  • It’s not easy to use.  The command-line needed to get good results with Firefox is huge.
  • There’s little control over when snapshots occur.  That could be improved relatively easily with client requests, though they require code modifications so they’re a bit painful.
  • The superblock allocation problem.  You can profile with Massif at the OS page level, or at the heap (malloc) level.  In both cases, the results can be misleading because an allocation request doesn’t always result in a visible allocation occurring.  For example, if profiling at the OS page level, most calls to malloc won’t result in pages being allocated with mmap, because when jemalloc needs pages from the OS it will usually request more than necessary for the current request, and then hand out pieces of those pages itself on subsequent calls to malloc.  If profiling at the heap level, this problem also occurs with custom allocators such as JSArenaPool that are layered on top of malloc.  As a result, many allocation requests don’t get recorded by Massif, and a small number of them are blamed for allocating much more memory than they actually did.  (The whole question of “what level do I measure at?” is one of the trickiest things about memory profiling.)
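
To make the superblock problem concrete, here’s a toy sketch (nothing like jemalloc’s real code) of an allocator that grabs one big block up front and bump-allocates within it;  a page-level profiler attributes all the memory to the single large allocation, and sees nothing for the many small requests carved out of it:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Toy superblock allocator: one big block is obtained from the
// OS-level allocator up front, then small requests are satisfied by
// bumping a cursor within it.  A profiler watching OS-level
// allocations sees only the single superblock allocation.
class SuperblockAllocator {
public:
    static const size_t kSuperblockSize = 64 * 1024;

    SuperblockAllocator()
      : block_(static_cast<char*>(malloc(kSuperblockSize))),
        used_(0), osAllocs_(1) {}
    ~SuperblockAllocator() { free(block_); }

    void* allocate(size_t n) {
        n = (n + 7) & ~size_t(7);          // round up for alignment
        if (used_ + n > kSuperblockSize)
            return NULL;                   // a real allocator would grab another superblock
        void* p = block_ + used_;
        used_ += n;
        return p;
    }

    size_t osAllocationCount() const { return osAllocs_; }

private:
    char* block_;
    size_t used_;
    size_t osAllocs_;
};
```

A hundred calls to allocate() here produce exactly one OS-visible allocation, which is the misattribution in a nutshell.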

The result is that using Massif and understanding its output is difficult for anyone who isn’t an expert.

about:memory’s pros and cons

The other tool I sometimes use is about:memory.  It has the following advantages:

  • Lightweight, trivial to view.
  • Full control over when measurements occur.

And the disadvantages:

  • Minimal data, mostly just the heap.
  • The values are not as easy to understand as they seem.
  • No time series.

A better about:memory

I want the best of both worlds:  time series, detailed measurements, control over when measurements occur, and ease of use.  Plus a bit more: both OS page and heap level measurements, accurate numbers, global and per-compartment measurements, and good visualizations of both snapshots and time series data.  Basically, I want to re-implement a decent chunk of Massif within about:memory.  What follows is a 4am braindump;  apologies for the density, hopefully it’s mostly understandable.

Merge about:memory and Shaver’s nascent about:compartments, because they’re really doing the same thing.

More global measurements:

  • Add current and peak total memory and RSS (resident set size), as measured by the system (eg. /proc/pid/status on Linux).
  • Keep malloc stats as they currently are (eg. malloc/allocated).
  • More individual counts.  Eg. for JS we currently have js/gc-heap, js/string-data, js/mjit-code.  I want to split js/mjit-code into several counts: JaegerMonkey code (inline, out-of-line, ICs) and JaegerMonkey data (JITScripts and IC data).  And also add counts for: TraceMonkey code and data, Shapes, JSFunctions, JSScripts, cx->tempAlloc (which holds JSParseNodes plus other things), js::Vector, js::HashTable… basically anything that shows up reasonably high in Massif profiles.  We currently have 19 individual counts;  I imagine having 50+.  Obviously there’ll be lots of non-JS counters added as well.
  • Allow individual counts to be shown in a standard order, or in order from biggest to smallest.
  • For a lot of these, both the number of bytes allocated and the number of allocations might be useful.
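
As a sketch of what I mean by individual counts (the names and interface here are invented for illustration, not Firefox’s real reporter code), each subsystem would report its bytes and allocation counts under a path like “js/gc-heap”:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical counter registry.  Each subsystem accumulates bytes
// and allocation counts under a slash-separated path.
struct MemCounter {
    size_t bytes;
    size_t allocations;
};

class CounterRegistry {
public:
    void report(const std::string& path, size_t bytes, size_t allocations) {
        MemCounter& c = counters_[path];   // value-initialized to zero on first use
        c.bytes += bytes;
        c.allocations += allocations;
    }
    size_t bytesFor(const std::string& path) const {
        std::map<std::string, MemCounter>::const_iterator it = counters_.find(path);
        return it == counters_.end() ? 0 : it->second.bytes;
    }
private:
    std::map<std::string, MemCounter> counters_;
};
```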

Per-compartment measurements: show all the individual counts, but on a per-compartment basis.

Clearer meanings of measurements:

  • Add links/tooltips/whatever to every measurement with a brief description so that it’s clear what it means.
  • For individual counts, clearly distinguish heap allocations from non-heap allocations (eg. executable code space allocated by the JITs).
  • Have “everything else” measurements for both the heap and the total, found by subtracting individual counts from the overall and heap totals.
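
The “everything else” number is just the relevant total minus the sum of the individual counts;  a minimal sketch:

```cpp
#include <cassert>
#include <cstddef>

// Compute the unclassified remainder: the part of the heap (or
// overall) total not covered by any individual counter.  Clamped at
// zero in case the counters over-count the total.
size_t everythingElse(size_t total, const size_t* counts, size_t nCounts) {
    size_t covered = 0;
    for (size_t i = 0; i < nCounts; i++)
        covered += counts[i];
    return total > covered ? total - covered : 0;
}
```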

Visualization of measurements.  E.g.:

  • Show the heap as proportion of total memory.
  • Show each individual global count as a proportion of the total.
  • Show each compartment’s individual count sum as a proportion of the global individual count sum.
  • All this via pie charts or similar.

Time series:

  • Allow about:memory snapshots to be dumped to file in some manner, probably as JSON.
  • Do it in a way that allows multiple snapshots to be processed easily by an external tool.
  • Basically, Sayre’s membuster on steroids.
  • Furthermore, make that data available to the browser itself as well.
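
For illustration, dumping a snapshot as one JSON object per line might look something like this (the format here is invented, not a real about:memory format):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Serialize one snapshot's counters as a single JSON object, e.g.
//   {"timestamp":5,"counts":{"js/gc-heap":1024}}
// One object per line makes it trivial for an external tool to
// process a whole run's worth of snapshots.
std::string snapshotToJSON(long timestamp,
                           const std::map<std::string, size_t>& counts) {
    std::ostringstream out;
    out << "{\"timestamp\":" << timestamp << ",\"counts\":{";
    for (std::map<std::string, size_t>::const_iterator it = counts.begin();
         it != counts.end(); ++it) {
        if (it != counts.begin())
            out << ",";
        out << "\"" << it->first << "\":" << it->second;
    }
    out << "}}";
    return out.str();
}
```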

Visualization of time series data:

  • Use the time series data to show a graph in about:memory.
  • Make the graph interactive, eg. allow drilling down to individual counts, going back to previous snapshots.
  • I have a hacky prototype canvas implementation of something like this that reads Massif’s output files.  SVG would probably be better, though.

Diffs of some kind between snapshots would be great.  It would allow you to answer questions like “how much memory is allocated when I open a new tab?”
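
A snapshot diff is just a per-counter subtraction;  a minimal sketch:

```cpp
#include <cassert>
#include <map>
#include <string>

// Per-counter difference between two snapshots (later minus earlier).
// Counters absent from one snapshot are treated as zero, so counters
// that appear or disappear between snapshots still show up.
typedef std::map<std::string, long> Snapshot;

Snapshot diffSnapshots(const Snapshot& before, const Snapshot& after) {
    Snapshot d;
    for (Snapshot::const_iterator it = after.begin(); it != after.end(); ++it)
        d[it->first] += it->second;
    for (Snapshot::const_iterator it = before.begin(); it != before.end(); ++it)
        d[it->first] -= it->second;
    return d;
}
```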

If telemetry is ever implemented, sending this data back from users would be great, though that’s a lot harder.

Potential difficulties:

  • Not that many, as far as I can tell.  A lot of the infrastructure is already in place.
  • Keeping the counter code up-to-date may be tricky.  If it’s used frequently by many people, that’ll increase the likelihood that it’ll be kept up to date.  Better descriptions will help make it clearer if the counters are counting what they’re supposed to.
  • about:memory will itself use some memory.  It’s unclear how to avoid measuring that.  Putting the more advanced features, like the graphical stuff, in a separate about:morememory page might mitigate this;  you could take a bunch of snapshots via about:memory and then open up about:morememory.
  • Performance… will more counters be noticeable?  Hopefully not since we already have a bunch of them anyway.
  • Unsophisticated users might file unhelpful “hey, Firefox is using too much memory” bugs.  But sophisticated users might file helpful “hey, Firefox is using too much memory” bugs.

Basically, if this can be made as attractive and useful in reality as it currently is in my imagination, I figure no-one will ever need to use an external memory profiler for Firefox again.

Categories
Uncategorized

A note about umlauts

Umlauts appear on three German letters: ä, ö, ü.  If typing non-English characters is difficult, you can substitute ‘ae’, ‘oe’, ‘ue’ respectively.

If you can’t type “JägerMonkey”, you should type “JaegerMonkey”.  “JagerMonkey” is wrong.

Categories
Uncategorized

Memory profiling Firefox on TechCrunch

Rob Sayre suggested TechCrunch to me as a good stress test for Firefox’s memory usage:

  {sayrer} take a look at techcrunch.com if you have a chance. that one is brutal.

So I measured space usage with Massif for a single TechCrunch tab, on 64-bit Linux. Here’s the high-level result:

  34.65% (371,376,128B) _dl_map_object_from_fd (dl-load.c:1199)
  21.14% (226,603,008B) pthread_create@@GLIBC_2.2.5 (allocatestack.c:483)
  08.93% (95,748,276B) in 4043 places, all below massif's threshold (00.20%)
  06.26% (67,112,960B) pa_shm_create_rw (in /usr/lib/libpulsecommon-0.9.21.so)
  03.10% (33,263,616B) JSC::ExecutablePool::systemAlloc(unsigned long) (ExecutableAllocatorPosix.cpp:43)
  02.67% (28,618,752B) NewOrRecycledNode(JSTreeContext*) (jsparse.cpp:670)
  01.90% (20,414,464B) js::PropertyTree::newShape(JSContext*, bool) (jspropertytree.cpp:97)
  01.57% (16,777,216B) GCGraphBuilder::AddNode(void*, nsCycleCollectionParticipant*) (nsCycleCollector.cpp:596)
  01.48% (15,841,208B) JSScript::NewScript(JSContext*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned short, unsigned short, JSVersion) (jsutil.h:210)
  01.45% (15,504,752B) ChangeTable (pldhash.c:563)
  01.44% (15,478,784B) g_mapped_file_new (in /lib/libglib-2.0.so.0.2400.1)
  01.41% (15,167,488B) GCGraphBuilder::NoteScriptChild(unsigned int, void*) (mozalloc.h:229)
  01.37% (14,680,064B) js_NewFunction(JSContext*, JSObject*, int (*)(JSContext*, unsigned int, js::Value*), unsigned int, unsigned int, JSObject*, JSAtom*) (jsgcinlines.h:127)
  00.97% (10,383,040B) js::mjit::Compiler::finishThisUp(js::mjit::JITScript**) (jsutil.h:214)
  00.78% (8,388,608B) js::StackSpace::init() (jscntxt.cpp:164)
  00.69% (7,360,512B) pcache1Alloc (sqlite3.c:33491)
  00.62% (6,601,324B) PL_DHashTableInit (pldhash.c:268)
  00.59% (6,291,456B) js_NewStringCopyN(JSContext*, unsigned short const*, unsigned long) (jsgcinlines.h:127)
  00.59% (6,287,516B) nsTArray_base::EnsureCapacity(unsigned int, unsigned int) (nsTArray.h:88)
  00.52% (5,589,468B) gfxImageSurface::gfxImageSurface(gfxIntSize const&, gfxASurface::gfxImageFormat) (gfxImageSurface.cpp:111)
  00.49% (5,292,184B) js::Vector::growStorageBy(unsigned long) (jsutil.h:218)
  00.49% (5,283,840B) nsHTTPCompressConv::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned int, unsigned int) (nsMemory.h:68)
  00.49% (5,255,168B) js::Parser::markFunArgs(JSFunctionBox*) (jsutil.h:210)
  00.49% (5,221,320B) nsStringBuffer::Alloc(unsigned long) (nsSubstring.cpp:209)
  00.43% (4,558,848B) _dl_map_object_from_fd (dl-load.c:1250)
  00.42% (4,554,740B) nsTArray_base::EnsureCapacity(unsigned int, unsigned int) (nsTArray.h:84)
  00.39% (4,194,304B) js_NewGCString(JSContext*) (jsgcinlines.h:127)
  00.39% (4,194,304B) js::NewCallObject(JSContext*, js::Bindings*, JSObject&, JSObject*) (jsgcinlines.h:127)
  00.39% (4,194,304B) js::NewNativeClassInstance(JSContext*, js::Class*, JSObject*, JSObject*) (jsgcinlines.h:127)
  00.39% (4,194,304B) JS_NewObject (jsgcinlines.h:127)
  00.35% (3,770,972B) js::PropertyTable::init(JSRuntime*, js::Shape*) (jsutil.h:214)
  00.35% (3,743,744B) NS_NewStyleContext(nsStyleContext*, nsIAtom*, nsCSSPseudoElements::Type, nsRuleNode*, nsPresContext*) (nsPresContext.h:306)
  00.34% (3,621,704B) XPT_ArenaMalloc (xpt_arena.c:221)
  00.31% (3,346,532B) nsCSSSelectorList::AddSelector(unsigned short) (mozalloc.h:229)
  00.30% (3,227,648B) js::InitJIT(js::TraceMonitor*) (jsutil.h:210)
  00.30% (3,207,168B) js::InitJIT(js::TraceMonitor*) (jsutil.h:210)
  00.30% (3,166,208B) js_alloc_temp_space(void*, unsigned long) (jsatom.cpp:689)
  00.28% (2,987,548B) nsCSSExpandedDataBlock::Compress(nsCSSCompressedDataBlock**, nsCSSCompressedDataBlock**) (mozalloc.h:229)
  00.27% (2,883,584B) js::detail::HashTable::SetOps, js::SystemAllocPolicy>::add(js::detail::HashTable::SetOps, js::SystemAllocPolicy>::AddPtr&, unsigned long const&) (jsutil.h:210)
  00.26% (2,752,512B) FT_Stream_Open (in /usr/lib/libfreetype.so.6.3.22)
  00.24% (2,564,096B) PresShell::AllocateFrame(nsQueryFrame::FrameIID, unsigned long) (nsPresShell.cpp:2098)
  00.21% (2,236,416B) nsRecyclingAllocator::Malloc(unsigned long, int) (nsRecyclingAllocator.cpp:170)

Total memory usage at peak was 1,071,940,088 bytes.  Let’s go through some of these entries one by one.

  34.65% (371,376,128B) _dl_map_object_from_fd (dl-load.c:1199)
  21.14% (226,603,008B) pthread_create@@GLIBC_2.2.5 (allocatestack.c:483)
  06.26% (67,112,960B) pa_shm_create_rw (in /usr/lib/libpulsecommon-0.9.21.so)

These three, although the biggest single entries, can be more or less ignored;  I explained why previously.

  03.10% (33,263,616B) JSC::ExecutablePool::systemAlloc() (ExecutableAllocatorPosix.cpp:43)

This is for code generated by JaegerMonkey.  I know very little about JaegerMonkey’s code generation so I don’t have any good suggestions for reducing it.  As I understand it very little effort has been made to minimize the size of the generated code so there may well be some easy wins there.

  02.67% (28,618,752B) NewOrRecycledNode() (jsparse.cpp:670)

This is for JSParseNode, the basic type from which JS parse trees are constructed.  Bug 626932 is open to shrink JSParseNode;  there are a couple of good ideas but not much progress has been made.  I hope to do more here but probably not in time for Firefox 4.0.

  01.90% (20,414,464B) js::PropertyTree::newShape() (jspropertytree.cpp:97)

Shapes are a structure used to speed up JS property accesses.  Increasing the MAX_HEIGHT constant from 64 to 128 (which reduces the number of JS objects that are converted to “dictionary mode”, and thus the number of Shapes that are allocated) may reduce this by 3 or 4 MB with negligible speed cost.  I opened bug 630456 for this.

  01.57% (16,777,216B) GCGraphBuilder::AddNode() (nsCycleCollector.cpp:596)
  01.41% (15,167,488B) GCGraphBuilder::NoteScriptChild() (mozalloc.h:229)

This is the cycle collector.  I know almost nothing about it, but I see it allocates 32,768 PtrInfo structs at a time.  I wonder if that strategy could be improved.

  01.48% (15,841,208B) JSScript::NewScript() (jsutil.h:210)
  01.37% (14,680,064B) js_NewFunction() (jsgcinlines.h:127)

Each JS function has a JSFunction associated with it, and each JSFunction has a JSScript associated with it.  Each of them stores various bits of information about the function.  I don’t have any good ideas for how to shrink these structures.  Both of them are reasonably large, with lots of fields.

  00.97% (10,383,040B) js::mjit::Compiler::finishThisUp() (jsutil.h:214)

Each function compiled by JaegerMonkey also has some additional information associated with it, including all the inline caches.  This is allocated here.  Some good progress has already been made here, and I have some more ideas for getting it down a bit further.

  00.59% (6,291,456B) js_NewStringCopyN() (jsgcinlines.h:127)
  00.49% (5,292,184B) js::Vector::growStorageBy() (jsutil.h:218)

These entries are for space used during JS scanning (a.k.a. lexing, tokenizing).  Identifiers and strings get atomized, i.e. put into a table so there’s a single copy of each one.  Take an identifier as an example.  It starts off stored in a buffer of characters.  It gets scanned and copied into a js::Vector, with any escaped chars being converted along the way.  Then the copy in the js::Vector is atomized, which involves copying it again into a malloc’d buffer of just the right size.  I thought about avoiding this copying in bug 588648, but it turned out to be difficult.  (I did manage to remove another extra copy of every character, though!)
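
In outline, atomization is just a lookup in a table of previously-seen strings;  here’s a toy sketch (SpiderMonkey’s real atom table is considerably more involved):

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy atom table: atomize() returns a pointer to a canonical copy of
// the string, so equal identifiers share one allocation and can be
// compared by pointer.  std::set guarantees stable addresses for its
// elements, so the returned pointers stay valid.
class AtomTable {
public:
    const std::string* atomize(const std::string& s) {
        return &*atoms_.insert(s).first;
    }
private:
    std::set<std::string> atoms_;
};
```

Note that even this sketch copies the string once on insertion;  the extra copies described above come from the escape-conversion step that happens before this point.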

In summary, there is definitely room for more improvement.  I hope to get a few more space optimizations in before Firefox 4.0 is released, but there’ll be plenty of other work to do afterwards.  If anyone can see other easy wins for the entries above, I’d love to hear about them.

Categories
Cplusplus Firefox

The dangers of -fno-exceptions

When Firefox is built with GCC, the -fno-exceptions option is used, which means that exception-handling is disabled.  I’ve been told that this is because the performance of code that uses exceptions is unacceptable.

Sounds simple, until you realize that libraries such as libstdc++.so are not built with this option.  This means, for example, that the vanilla operator new will throw an exception if it fails, because it’s in libstdc++.so, but Firefox code cannot catch the exception, because -fno-exceptions is specified.  (If you write a try-block, GCC will give you an error.)

This has important consequences:  if you compile your application with -fno-exceptions, you cannot use any standard library functions that might throw exceptions.  SpiderMonkey’s C++ coding standard is succinct, perhaps overly so: “No exceptions, so std is hard to use.”

Another fine example of the “so you think you’ll be able to use a subset of C++, eh?” fallacy.  See bug 624878 for a specific manifestation of this problem.  I wonder if there are other cases like that in Firefox.
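
For what it’s worth, one common workaround under -fno-exceptions (illustrating the constraint, not Firefox’s exact policy) is the non-throwing form of operator new, which returns null on failure instead of throwing:

```cpp
#include <cassert>
#include <cstddef>
#include <new>      // std::nothrow

// With -fno-exceptions you cannot catch std::bad_alloc, so the
// throwing form of operator new is off-limits.  The nothrow form
// returns NULL on failure, which can be tested for explicitly.
int* allocateInts(size_t n) {
    int* p = new (std::nothrow) int[n];
    if (!p)
        return NULL;            // handle failure without exceptions
    for (size_t i = 0; i < n; i++)
        p[i] = 0;
    return p;
}
```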

Categories
Correctness Firefox

My best patch of 2010

I was going to write one of those “everything I did last year” posts, but now I don’t feel like it.  Here’s a “one thing I did last year” post instead.

The most important patch I landed in 2010 was probably the one that added LIR type-checking to Nanojit (which was based on an earlier patch from Julian Seward).  At the time of writing, it’s caught at least 14 type errors, most of which could have caused a crash or security problem.

Consistency checks within compilers are wonderful things.

Categories
Programming Valgrind

Using Valgrind to get stack traces

Sometimes I want to do some printf-style debugging where I print not only some values, but also the stack trace each time a particular code point is hit.  glibc provides a backtrace() function that supposedly does this, but when I tried it I got hopeless results, little more than raw code addresses.

Fortunately, you can do this pretty easily with Valgrind.  First, add this line somewhere in your source code:

  #include <valgrind/valgrind.h>

Then, at the point where you want to print the stack trace, add this:

  VALGRIND_PRINTF_BACKTRACE("foo");

You can of course print something other than “foo”.  In fact, VALGRIND_PRINTF_BACKTRACE is a variadic printf-style function, so you can do stuff like this:

  VALGRIND_PRINTF_BACKTRACE("%s: %d\n", str, i);

You then have to run the program under Valgrind as usual, except you probably should use --tool=none because that’ll run the quickest.

This is a trick I find occasionally invaluable.