OS X and virtual “bloat”

There’s a lot of work going on these days to improve Mozilla’s memory usage, and it’s a complicated issue with different facets. When discussing this with users, one thing that sometimes comes up is the difference between a process’s working set, and its total virtual memory size. To simplify grossly, the working set is often the more important number, as it’s the amount of physical memory actually being used. A process could have gigabytes of virtual memory assigned to it without any measurable performance impact to the system, as long as the working set stays small. I’m skimming over a lot of details, but the point is that a large virtual memory size may or may not be a practical problem.
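The gap between the two numbers is easy to reproduce. The sketch below is not taken from Mozilla's code; it assumes a Unix-like system (Linux or OS X) and Python 3, and the 64MB size and the helper name peak_rss_bytes are arbitrary choices for illustration. It reserves a block of address space, then touches it, and reports the peak resident size at each step.

```python
import mmap
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of this process, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on OS X and in kilobytes on Linux.
    return peak if sys.platform == "darwin" else peak * 1024


SIZE = 64 * 1024 * 1024  # 64MB of address space (arbitrary, for illustration)

before = peak_rss_bytes()

# Reserve address space only; no page has been touched yet.
region = mmap.mmap(-1, SIZE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
after_reserve = peak_rss_bytes()

# Write one byte per page, which forces the OS to back each page with memory.
for offset in range(0, SIZE, mmap.PAGESIZE):
    region[offset] = 1
after_touch = peak_rss_bytes()

mb = 1024 * 1024
print(f"after reserving: +{(after_reserve - before) / mb:.1f}MB resident")
print(f"after touching:  +{(after_touch - before) / mb:.1f}MB resident")
```

On a typical system the first figure should stay near zero while the second approaches the size of the mapping, though the exact values depend on the OS and on what else the interpreter has allocated.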

I’ve noticed that on OS X, in particular, the amount of virtual memory a process is using seems to be a rather strange value. Here are a few lines of output from the “top” command on my MacBook. Note the rightmost VSIZE column (total address space allocated) and the RSIZE column (the resident size, or working set) next to it. You can also use the OS X “Activity Monitor” tool, which reports the same numbers as “Real Memory” and “Virtual Memory”.

  PID COMMAND      %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
  436 bash         0.0%  0:00.00   1    14    16   212K   832K   784K  27.1M
  425 firefox-bin  0.7%  5:32.43  12   151   758  98.9M  59.0M   125M   542M 
  349 Terminal     1.9%  0:48.52   6    94   150  2.39M  17.4M  15.0M+  370M 
  230 Colloquy     0.0%  5:57.88   7   151   952  46.5M  23.6M  53.1M   422M 
  228 iCal         0.0%  0:14.24   5   121   332  19.2M  15.2M  28.4M   383M 
  196 ntpd         0.0%  0:00.07   1     8    19  68.0K   708K   236K  27.1M

Gosh, there’s Firefox with 542MB of virtual memory. I’ve been browsing a while with lots of tabs, so maybe I shouldn’t expect it to be tiny. Then again, starting it with a blank page results in just a 39MB RSIZE, but VSIZE is still over 540MB. Look at iCal and Colloquy (an IRC client), which both weigh in around 400MB… Hmm, that seems like a lot. Quite a few other processes are also in the 350MB ballpark; in fact, top reports a total of over 10GB of virtual memory assigned on my system. And, hmmmmmm, even standard Unix programs like bash and ntpd are grabbing 27MB of VM — what’s going on?
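To put the two columns side by side, here is a small sketch that computes what fraction of each process's VSIZE is resident. The rows are copied from the top output above; the function names and the unit table are my own, and the parser assumes sizes always end in K, M, or G, optionally followed by a "+" or "-" marker as on the Terminal line.

```python
import re

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Convert a top-style size such as '784K' or '15.0M+' to bytes."""
    match = re.fullmatch(r"([\d.]+)([KMG])[+-]?", text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def resident_fraction(rsize, vsize):
    """Fraction of the virtual size that is resident in physical memory."""
    return parse_size(rsize) / parse_size(vsize)


# (command, RSIZE, VSIZE) copied from the top output above.
ROWS = [
    ("bash", "784K", "27.1M"),
    ("firefox-bin", "125M", "542M"),
    ("Terminal", "15.0M+", "370M"),
    ("Colloquy", "53.1M", "422M"),
    ("iCal", "28.4M", "383M"),
    ("ntpd", "236K", "27.1M"),
]

for command, rsize, vsize in ROWS:
    fraction = resident_fraction(rsize, vsize)
    print(f"{command:12s} {fraction:6.1%} of VSIZE is resident")
```

Every process in the sample has well under a quarter of its address space resident, and for bash and ntpd the figure is under 3%.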

OS X has a nifty little utility called vmmap that lets you see exactly what’s consuming address space in a process. The full output is rather verbose, but it has a summary too:

==== Summary for process 436
ReadOnly portion of Libraries: Total=2960KB resident=2684KB(91%) swapped_out_or_unallocated=276KB(9%)
Writable regions: Total=26816KB written=76KB(0%) resident=220KB(1%) swapped_out=0KB(0%) unallocated=26596KB(99%)

REGION TYPE             [ VIRTUAL]
===========             [ =======]
MALLOC                  [  18536K]
Stack                   [   8192K]
__DATA                  [    188K]
__IMPORT                [     24K]
__LINKEDIT              [    500K]
__PAGEZERO              [      4K]
__TEXT                  [   2460K]

That’s the summary for the “27MB” bash process. It looks like 8MB is reserved for the stack, 18.5MB is reserved for the “DefaultMallocZone”, and about 2.5MB (__TEXT) is code and static data. [The full listing shows that the bash code is only about 500K, the rest of the 2.5MB is all system libraries.] Another nifty OS X utility, heap, confirms that only 85K of that 18.5MB malloc area is actually being used. So, the conclusion here is that most of the alarming 27MB of bash’s VM size is just unused address space (which is dirt cheap) and default system stuff. The amount of memory usage directly attributable to bash is really quite small. Smaller, in fact, than the 784K working set top reports.
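The region table is simple enough to tally with a script. This is only a sketch: the text is pasted from the bash summary above, the parse_regions helper is my own, and the regular expression assumes every row looks like "NAME [ 1234K]".

```python
import re

# Region table copied from the vmmap summary for bash above.
SUMMARY = """\
MALLOC                  [  18536K]
Stack                   [   8192K]
__DATA                  [    188K]
__IMPORT                [     24K]
__LINKEDIT              [    500K]
__PAGEZERO              [      4K]
__TEXT                  [   2460K]
"""


def parse_regions(text):
    """Map each region type to its virtual size in kilobytes."""
    regions = {}
    for line in text.splitlines():
        match = re.match(r"(.+?)\s*\[\s*(\d+)K\]", line)
        if match:
            regions[match.group(1)] = int(match.group(2))
    return regions


regions = parse_regions(SUMMARY)
total_kb = sum(regions.values())

# Largest regions first, with each region's share of the listed total.
for name, size_kb in sorted(regions.items(), key=lambda item: -item[1]):
    print(f"{name:12s} {size_kb:6d}K  {size_kb / total_kb:6.1%}")
```

The MALLOC and Stack reservations together come to roughly 89% of the listed regions, which matches the conclusion that most of the total is reserved but unused address space.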

So, now the $542,000,000 question… What’s up with Mozilla’s virtual memory size? (after the jump, to avoid annoying planet.mozilla.org readers!)

vmmap dumps out over a thousand lines of data for the firefox-bin process like:

...
__TEXT                 931d4000-931e0000 [   48K] r-x/r-x SM=COW  /System/Library/Frameworks/OpenGL.framework/Versions/A/OpenGL
__LINKEDIT             931e0000-931e4000 [   16K] r--/r-- SM=COW  /System/Library/Frameworks/OpenGL.framework/Versions/A/OpenGL
__TEXT                 9326f000-93270000 [    4K] r-x/r-x SM=COW  /System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa
__LINKEDIT             93270000-93271000 [    4K] r--/r-- SM=COW  /System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa
__TEXT                 93271000-93928000 [ 6876K] r-x/r-x SM=COW  /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit
__IMAGE                93928000-93a38000 [ 1088K] r--/r-- SM=COW  /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit
__LINKEDIT             93a38000-93ca8000 [ 2496K] r--/r-- SM=COW  /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit
__TEXT                 93ca8000-93d24000 [  496K] r-x/r-x SM=COW  /System/Library/Frameworks/CoreData.framework/Versions/A/CoreData
__LINKEDIT             93d24000-93d5c000 [  224K] r--/r-- SM=COW  /System/Library/Frameworks/CoreData.framework/Versions/A/CoreData
__TEXT                 93d5c000-93e17000 [  748K] r-x/r-x SM=COW  /System/Library/Frameworks/AudioToolbox.framework/Versions/A/AudioToolbox
__LINKEDIT             93e17000-93e59000 [  264K] r--/r-- SM=COW  /System/Library/Frameworks/AudioToolbox.framework/Versions/A/AudioToolbox
...

Again, here’s just the summary section: :-)

ReadOnly portion of Libraries: Total=108092KB resident=60900KB(56%) swapped_out_or_unallocated=47192KB(44%)
Writable regions: Total=455848KB written=48836KB(11%) resident=109256KB(24%) swapped_out=0KB(0%) unallocated=346592KB(76%)

REGION TYPE             [ VIRTUAL]
===========             [ =======]
ATS (font support)      [  34796K]
Carbon                  [   1148K]
CoreGraphics            [  12384K]
IOKit                   [ 262144K]
MALLOC                  [ 104512K]
STACK GUARD             [     40K]
Stack                   [  13312K]
VM_ALLOCATE ?           [  10388K]
__DATA                  [   6484K]
__IMAGE                 [   1088K]
__IMPORT                [    504K]
__LINKEDIT              [  17520K]
__OBJC                  [    884K]
__PAGEZERO              [      4K]
__TEXT                  [  90572K]
mapped file             [  49904K]
shared memory           [  16908K]

The full output’s breakout shows a few specific large items (a combined 72MB):
* 9.6MB /usr/share/icu/icudt32l.dat
* 5.2MB Flash Player
* 10.2MB AppKit Framework
* 18.0MB QuickTime
* 29MB of various memory-mapped font files [in addition to the "ATS (font support)" item in the summary]

The heap utility reports that 29% (17MB) of the 96MB of malloc space is unused.

I think the most surprising thing about these numbers is just how much VM space is consumed by system stuff: 256MB (!) for IOKit, 64MB for fonts, and 13MB for Carbon/CoreGraphics. [A peek at the vmmap for Colloquy, iCal, and Terminal shows basically the same thing.] That, plus the unused heap space, accounts for 66% of the 534MB address space. [I'm really curious what the 256MB IOKit item is. Maybe graphics memory mapped into the process? Anyone know?]
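For anyone who wants to check the 66% figure, the arithmetic is sketched below using the rounded megabyte values quoted in this post; the variable names are mine.

```python
# Figures in MB, as quoted in the post.
iokit = 256
fonts = 64                 # ATS font support plus memory-mapped font files
carbon_coregraphics = 13
unused_heap = 17           # reported by the heap utility
address_space = 534

system_and_unused = iokit + fonts + carbon_coregraphics + unused_heap
share = system_and_unused / address_space
print(f"{system_and_unused}MB of {address_space}MB = {share:.0%}")
```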

So, what do the numbers say about performance and bloat? Well, that’s hard to say. But I hope it is clearer that there’s more to a program’s memory usage than just a single, simple number.

About Justin Dolske

Mostly harmless.
This entry was posted in Firefox, PlanetMozilla.

7 Responses to OS X and virtual “bloat”

  1. fredrik says:

    Tautology alert: “a large virtual memory size may or may be a problem”.

    [fixed, thanks :-) --dolske]

  2. Hoa says:

    Hello,

    1. For malloc memory:
    On Mac OS X, you can use MallocDebug:
    /Developer/Applications/Performance\ Tools/MallocDebug.app

    You have to preload libMallocDebug when you are running your application:
    export DYLD_INSERT_LIBRARIES=/usr/lib/libMallocDebug.A.dylib

    MallocDebug can show where the malloc allocations were initiated, so you can see where most of the malloc memory is going.

    2. For other memory:

    $ vmmap -resident
    REGION TYPE [ VIRTUAL/RESIDENT]
    =========== [ =======/========]
    ATS (font support) [ 33856K/ 1076K]
    Carbon [ 1144K/ 1144K]
    CoreGraphics [ 5016K/ 3232K]
    IOKit [ 262144K/ 0K]
    MALLOC [ 33884K/ 21944K]
    STACK GUARD [ 48K/ 0K]
    Stack [ 14336K/ 152K]
    VM_ALLOCATE ? [ 2028K/ 608K]
    __DATA [ 5300K/ 3244K]
    __IMAGE [ 1088K/ 480K]
    __IMPORT [ 500K/ 500K]
    __LINKEDIT [ 18328K/ 18028K]
    __OBJC [ 864K/ 848K]
    __PAGEZERO [ 4K/ 0K]
    __TEXT [ 83044K/ 47560K]
    mapped file [ 13476K/ 7700K]
    shared memory [ 16376K/ 128K]

    In short, this shows that the IOKit allocation is only an address space allocation, and it seems that it is never mapped into real memory. So we can ignore the IOKit allocation.

    It looks like your output has a lot of mapped file address space.
    You can find out which files are mapped using the following:

    $ vmmap -w -resident | grep mapped
    mapped file 00fcd000-00fe1000 [ 80K/ 76K] r--/rwx SM=COW /System/Library/CoreServices/CharacterSets/CFUnicodeData-L.mapping
    mapped file 0185d000-018b4000 [ 348K/ 348K] r--/rwx SM=COW /System/Library/CoreServices/CharacterSets/CFCharacterSetBitmaps.bitmap
    mapped file 12808000-1280d000 [ 20K/ 8K] r--/rwx SM=COW /System

  3. Håkan Waara says:

    IIRC we link IOKit into widget just for something silly like the idle service. If it’s costing us a lot, maybe we should reconsider?

    Shouldn’t most of these frameworks (which by and large are just collections of shared libraries) be in shared memory anyway? Or is this the allocation of private memory that every user of these shared libraries needs to put up with?

    /H

  4. Hoa:

    Yikes, I can’t believe I overlooked the “-resident” flag for vmmap! Very useful. It does seem to confirm my suspicion that most of the biggest VM regions are not being used. Basically what one would expect, but now with numbers to prove it.

    I deliberately avoided getting into examining malloc usage; that’s a topic for some other blog post! Rather, the interesting data point here is how much *isn’t* being used. I’ve been planning to poke around with libumem on Solaris, and it would be interesting to compare its capabilities with libMallocDebug and leaks(1)… in a different blog post. :-)

    Håkan:

    Yeah, IOKit is what originally motivated me to post this, as it looked like an interesting example of how “cost” can mean different things in different contexts. It felt unlikely to be a serious problem (I think we would have noticed if OS X was thrashing around an extra 256MB, compared to Linux/Windows), not to mention the “vmmap -resident” data. But still, it does have some sort of cost, as measured by the impression users get if they look at the VM size and think Firefox is using gobs of memory.

    Eliminating or pruning IOKit would reduce that “cost”… But it’s sort of bullshit performance tuning (because the only effect is in someone’s head, and I think we’re far more interested in real gains). Then again, a big reduction in a fairly visible and confusing number like VSIZE might be worth taking if it was easy. I’d first want to understand it better.

    Finally (phew!), yes, most of the framework and system stuff should be shared data, and so the real cost of having it in physical memory is spread across all the processes sharing it. Yet another reason why a single number like VSIZE doesn’t tell the whole story!

  5. Jeff Walden says:

    Wait, you put an after-the-break link to *not* annoy planet.mozilla.org readers? Thanks for forcing me to break out of my feed reader! :-P

  6. Brendan Eich says:

    When I asked about IOKit, hyatt pointed me at

    http://www.usenix.org/events/bsdcon02/full_papers/gerbarg/gerbarg_html/index.html

    See section 6.

    Large VM mappings are not free; they take up page table space. Probably not enough to worry about, but it depends on the OS.

    /be

  7. Hoa says:

    After playing for a long time with mallocdebug/leaks and the Mac OS X debugging tools, I can say that they are really good tools.
    – leaks can operate at runtime (when using MallocStackLogging, you can get the stack trace of allocation)
    – the same for mallocdebug and it can tell you what memory blocks have been added between two moments.
    – you can also have a look at objectalloc which is also useful for malloc besides being useful for objective-C object allocations. This may be a better user interface than mallocdebug
