What does the OS X Activity Monitor’s “Energy Impact” actually measure?

[Update: this post has been updated with significant new information. Look to the end.]

Activity Monitor is a tool in Mac OS X that shows a variety of real-time process measurements. It is well-known and its “Energy Impact” measure (which was added in Mac OS X 10.9) is often consulted by users to compare the power consumption of different programs. Apple support documentation specifically recommends it for troubleshooting battery life problems, as do countless articles on the web.

However, despite its prominence, the exact meaning of the “Energy Impact” measure is unclear. In this blog post I use a combination of code inspection, measurements, and educated guesses to hypothesize how it is computed in Mac OS X 10.9 and 10.10.

What is known about “Energy Impact”?

The following screenshot shows the Activity Monitor’s “Energy” tab.

There are no units given for “Energy Impact” or “Avg Energy Impact”.

The Activity Monitor documentation says the following.

Energy Impact: A relative measure of the current energy consumption of the app. Lower numbers are better.

Avg Energy Impact: The average energy impact for the past 8 hours or since the Mac started up, whichever is shorter.

That is vague. Other Apple documentation says the following.

The Energy tab of Activity Monitor displays the Energy Impact of each open app based on a number of factors including CPU usage, network traffic, disk activity and more. The higher the number, the more impact an app has on battery power.

More detail, but still vague. Enough so that various other people have wondered what it means. The most precise description I have found says the following.

If my recollection of the developer presentation slide on App Nap is correct, they are an abstract unit Apple created to represent several factors related to energy usage meant to compare programs relatively.

I don’t believe you can directly relate them to one simple unit, because they are from an arbitrary formula of multiple factors.

[…] To get the units they look at CPU usage, interrupts, and wakeups… track those using counters and apply that to the energy column as a relative measure of an app.

This sounds plausible, and we will soon see that it appears to be close to the truth.

A detour: top

First, a necessary detour. top is a program that is similar to Activity Monitor, but it runs from the command-line. Like Activity Monitor, top performs periodic measurements of many different things, including several that are relevant to power consumption: CPU usage, wakeups, and a “power” measure. To see all these together, invoke it as follows.

top -stats pid,command,cpu,idlew,power -o power -d

(A non-default invocation is necessary because the wakeups and power columns aren’t shown by default unless you have an extremely wide screen.)

It will show real-time data, updated once per second, like the following.

PID            COMMAND                  %CPU         IDLEW        POWER
50300          firefox                  12.9         278          26.6
76256          plugin-container         3.4          159          11.3
151            coreaudiod               0.9          68           4.3
76505          top                      1.5          1            1.6 
76354          Activity Monitor         1.0          0            1.0

The PID, COMMAND and %CPU columns are self-explanatory.

The IDLEW column is the number of package idle exit wakeups. These occur when the processor package (containing the cores, GPU, caches, etc.) transitions from a low-power idle state to the active state. This happens when the OS schedules a process to run due to some kind of event. Common causes of wakeups include scheduled timers going off and blocked I/O system calls receiving data.

What about the POWER column? top is open source, so its meaning can be determined conclusively by reading the powerscore_insert_cell function in the source code. (The POWER measure was added to top in OS X 10.9.0 and the code has remained unchanged all the way through to OS X 10.10.2, which is the most recent version for which the code is available.)

The following is a summary of what the code does, and it’s easier to understand if the %CPU and POWER computations are shown side-by-side.

|elapsed_us| is the length of the sample period
|used_us| is the time this process was running during the sample period

  %CPU = (used_us * 100.0) / elapsed_us

  POWER = if is_a_kernel_process()
            0
          else
            ((used_us + IDLEW * 500) * 100.0) / elapsed_us
          

The %CPU computation is as expected.

The POWER computation is a function of CPU and IDLEW. It’s basically the same as %CPU but with a “tax” of 500 microseconds for each wakeup and an exception for kernel processes. The value of this function can easily exceed 100 — e.g. a program with zero CPU usage and 3,000 wakeups per second will have a POWER score of 150 — so it is not a percentage. In fact, POWER is a unitless measure because it is a semi-arbitrary combination of two measures with incompatible units.
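
In code form, the computation looks roughly like the following. This is a paraphrase in C++ of what powerscore_insert_cell does, not the literal source; it also reproduces the worked example just given.

// A paraphrase of top's POWER computation; not the literal source code.
double power_score(bool is_kernel_process,
                   double used_us,       // CPU time used during the sample period (microseconds)
                   double idle_wakeups,  // package idle exit wakeups during the period
                   double elapsed_us)    // length of the sample period (microseconds)
{
  if (is_kernel_process)
    return 0;
  return ((used_us + idle_wakeups * 500) * 100.0) / elapsed_us;
}

// Example: zero CPU usage and 3,000 wakeups over a one second period:
//   power_score(false, 0, 3000, 1000000) == 150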

Back to Activity Monitor and “Energy Impact”

MacBook Pro running Mac OS X 10.9.5

First, I did some measurements with a MacBook Pro with an i7-4960HQ processor running Mac OS X 10.9.5.

I did extensive testing with a range of programs: ones that trigger 100% CPU usage; ones that trigger controllable numbers of idle wakeups; ones that stress the memory system heavily; ones that perform frequent disk operations; and ones that perform frequent network operations.

In every case, Activity Monitor’s “Energy Impact” was the same as top’s POWER measure. Every indication is that the two are computed identically on this machine.

For example, consider the data in the following table. The data was gathered with a small test program that fires a timer N times per second; other than in extreme cases (see below), each timer firing causes an idle platform wakeup.

-----------------------------------------------------------------------------
Hz     CPU ms/s   Intr        Pkg Idle   Pkg Power  Act.Mon. top
-----------------------------------------------------------------------------
     2     0.14        2.00       1.80     2.30W     0.1    0.1
   100     4.52      100.13      95.14     3.29W       5      5
   500     9.26      499.66     483.87     3.50W      25     25
  1000    19.89     1000.15     978.77     5.23W      50     50
  5000    17.87     4993.10    4907.54    14.50W     240    240
 10000    32.63     9976.38    9194.70    17.61W     485    480
 20000    66.66    19970.95   17849.55    21.81W     910    910
 30000    99.62    28332.79   25899.13    23.89W    1300   1300
 40000   132.08    37255.47   33070.19    24.43W    1610   1650
 50000   160.79    46170.83   42665.61    27.31W    2100   2100
 60000   281.19    58871.47   32062.39    29.92W    1600   1650
 70000   276.43    67023.00   14782.03    31.86W     780    750
 80000   304.16    81624.60     258.22    35.72W      43     45
 90000   333.20    90100.26     153.13    37.93W      40     42
100000   363.94    98789.49      44.18    39.31W      38     38

The table shows a variety of measurements for this program for different values of N. Columns 2–5 are from powermetrics, and show CPU usage, interrupt frequency, package idle wakeup frequency, and package power, respectively. Column 6 is Activity Monitor’s “Energy Impact”, and column 7 is top’s POWER measurement. Columns 6 and 7 are approximate measurements, and they are identical, modulo small variations due to the noisiness of these measurements.
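
For reference, a test program of this kind can be as simple as a loop that sleeps for 1/N seconds per iteration, so that each iteration typically causes one idle wakeup on an otherwise quiet system. The following is a minimal sketch of that idea; it is my own reconstruction, not the exact program used for the measurements above.

// A minimal timer-firing test program: sleep for 1/N seconds per iteration.
// (Hypothetical reconstruction, not the program used for the measurements.)
#include <cstdlib>
#include <unistd.h>

int main(int argc, char** argv)
{
  long hz = (argc > 1) ? atol(argv[1]) : 1000;
  useconds_t interval_us = static_cast<useconds_t>(1000000 / hz);
  for (;;) {
    usleep(interval_us);  // each return from usleep is typically an idle wakeup
  }
  return 0;
}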

MacBook Air running Mac OS X 10.10.4

I also tested a MacBook Air with an i5-4250U processor running Mac OS X 10.10.4. The results were substantially different.

-----------------------------------------------------------------------------
Hz     CPU ms/s   Intr        Pkg Idle   Pkg Power Act.Mon. top
-----------------------------------------------------------------------------
     2     0.21        2.00       2.00     0.63W   0.0     0.1
   100     6.75       99.29      96.69     0.81W   2.4     5.2
   500    22.52      499.40     475.04     1.15W   10       25
  1000    44.07      998.93     960.59     1.67W   21       48
  3000   109.71     3001.05    2917.54     3.80W   60      145
  5000    65.02     4996.13    4781.43     3.79W   90      230
  7500   107.53     7483.57    7083.90     4.31W   140     350
 10000   144.00     9981.25    9381.06     4.37W   190     460

The results from top are very similar to those from the other machine. But Activity Monitor’s “Energy Impact” no longer matches top’s POWER measure. As a result it is much harder to say with confidence what “Energy Impact” represents on this machine. I tried tweaking the previous formula so that the idle wakeup “tax” drops from 500 microseconds to 180 or 200 microseconds, and that gives results that appear to be in the ballpark but don’t match exactly. It may be that Activity Monitor’s measurements aren’t taken at exactly the same moments as those of the other tools, which would add noise. But it’s also quite possible that other inputs have been added to the function that computes “Energy Impact”.

What about “Avg Energy Impact”?

What about the “Avg Energy Impact”? It seems reasonable to assume it is computed in the same way as “Energy Impact”, but averaged over a longer period. In fact, we already know that period from the Apple documentation that says it is the “average energy impact for the past 8 hours or since the Mac started up, whichever is shorter.”

Indeed, when the Energy tab of Activity Monitor is first opened, the “Avg Energy Impact” column is empty and the title bar says “Activity Monitor (Processing…)”. After a few seconds the “Avg Energy Impact” column is populated with values and the title bar changes to “Activity Monitor (Applications in last 8 hours)”. If you have top open during those 5–10 seconds you can see that systemstats is running and using a lot of CPU, and so presumably the measurements are obtained from it.

systemstats is a program that runs all the time and periodically measures, among other things, CPU usage and idle wakeups for each running process (visible in the “Processes” section of its output). I’ve done further tests that indicate that the “Avg Energy Impact” is almost certainly computed using the same formula as “Energy Impact”. The difference is that the measurements are from the past 8 hours of wake time — i.e. if a laptop is closed for several hours and then reopened, those hours are not included in the calculation — as opposed to the 1, 2 or 5 seconds of wake time used for “Energy Impact”.

The battery status menu

Even more prominent than Activity Monitor is OS X’s battery status menu. When you click on the battery icon in the OS X menu bar you get a drop-down menu which includes a list of “Apps Using Significant Energy”.

Screenshot of the OS X battery status menu

How is this determined? When you open this menu for the first time in a while it says “Collecting Power Usage Information” for a few seconds, and if you have top open during that time you see that, once again, systemstats is running and using a lot of CPU. Furthermore, if you click on an application name in the menu Activity Monitor will be opened and that application’s entry will be highlighted. Based on these facts it seems reasonable to assume that “Energy Impact” is again being used to determine which applications show up in the battery status menu.

I did some more tests (on my MacBook Pro running 10.9.5) and it appears that once an energy-intensive application is started it takes about 20 or 30 seconds for it to show up in the battery status menu. And once the application stops using high amounts of energy I’ve seen it take between 4 and 10 minutes to disappear. The exception is if the application is closed, in which case it disappears immediately.

Finally, I tried to determine the significance threshold. It appears that a program with an “Energy Impact” of roughly 20 or more will eventually show up as significant, and programs that have much higher “Energy Impact” values tend to show up more quickly.

All of these battery status menu observations are difficult to make reliably and so should be treated with caution. They may also be different in OS X 10.10. It is clear, however, that the window used by the battery status menu is measured in seconds or minutes, which is much less than the 8 hour window used for “Avg Energy Impact”.

An aside: systemstats is always running on OS X. The particular invocation used for the long-running instance — the one used by both Activity Monitor and the battery status menu — takes the undocumented --xpc flag. When I tried running it with that flag I got an error message saying “This mode should only be invoked by launchd”. So it’s hard to know how often it’s making measurements. The output from vanilla command-line invocations indicates it’s about every 10 minutes.

But it’s worth noting that systemstats has a -J option which causes the CPU usage and wakeups for child processes to be attributed to their parents. It seems likely that the --xpc option triggers the same behaviour because the Activity Monitor does not show “Avg Energy Impact” for child processes (as can be seen in the screenshot above for the login, bash and vim processes that are children of the Terminal process). This hypothesis also matches up with the battery status menu, which never shows child processes. One consequence of this is that if you ssh into a Mac and run a power-intensive program from the command line it will not show up in Activity Monitor’s energy tab or the battery status menu, because it’s not attributable to a top-level process such as Terminal! Such processes will show up in top and in Activity Monitor’s CPU tab, however.

How good a measure is “Energy Impact”?

We’ve now seen that “Energy Impact” is used widely throughout OS X. How good a measure is it?

The best way to measure power consumption is to actually measure power consumption. One way to do this is to use an ammeter, but this is difficult. Another way is to measure how long it takes for the battery to drain, which is easier but slow and requires steady workloads. Alternatively, recent Intel hardware provides high-quality estimates of processor and memory power consumption that are relatively easy to obtain.

These approaches all have the virtue of measuring or estimating actual power consumption (i.e. Watts). But the big problem is that they are machine-wide measures that cannot be used on a per-process basis. This is why Activity Monitor uses several proxy measures — ones that correlate with power consumption — which can be measured on a per-process basis. “Energy Impact” is a hybrid of at least two different proxy measures: CPU usage and wakeup frequency.

The main problem with this is that “Energy Impact” is an exaggerated measure. Look at the first table above, with data from the 10.9.5 machine. The variation in the “Pkg Power” column — which shows the package power from the above-mentioned Intel hardware estimates — is vastly smaller than the variation in the “Energy Impact” measurements. For example, going from 1,000 to 10,000 wakeups per second increases the package power by 3.4x, but the “Energy Impact” increases by 9.7x, and the skew gets even worse at higher wakeup frequencies. “Energy Impact” clearly weights wakeups too heavily. (In the second table, with data from the 10.10.4 machine, the weight given to wakeups is less, but still too high.)

Also, in the first table “Energy Impact” actually decreases when the timer frequency gets high enough. Presumably this is because the timer interval is so short that the OS has trouble putting the package into an idle power state. This leads to the absurd result that firing a timer at 1,000 Hz has about the same “Energy Impact” value as firing one at 100,000 Hz, when the package power of the latter is about 7.5x higher.

Having said all that, it’s understandable why Apple uses formulations of this kind for “Energy Impact”.

  • CPU usage and wakeup frequency are probably the two most important factors affecting a process’s power consumption, and they are factors that can be measured on a per-process basis.
  • Having a single measure makes things easy for users; evaluating the relative importance of multiple measures is more difficult.
  • The exception for kernel processes (which always have an “Energy Impact” of 0) avoids OS X itself being blamed for high power consumption. This makes a certain amount of sense — it’s not like users can close the kernel — while also being somewhat misleading.

If I were in charge of Apple’s Activity Monitor product, I’d do two things.

  1. I would compute a new formula for “Energy Impact”. I would measure the CPU usage, wakeup frequency (and any other inputs) and actual power consumption for a range of real-world programs, on a range of different Apple machines. From this data, hopefully a reasonably accurate model could be constructed. It wouldn’t be perfect, and it wouldn’t need to be perfect, but it should be possible to come up with something that reflects actual power consumption better than the existing formulations. Once formulated, I would then test the new version against synthetic microbenchmarks, like the ones I used above, to see how it holds up. Given the choice between accurately modelling real-world applications and accurately modelling synthetic microbenchmarks, I would definitely favour the former.
  2. I would publicly document the formula that is used so that developers can actually tell how their applications are being evaluated, and can optimize for that measure. You may think “but then developers will be optimizing for a synthetic measure rather than a real one” and you’d be right. That’s an inevitable consequence of giving a synthetic measure such prominence, and all the more reason for improving it.

Conclusion

“Energy Impact” is a flawed measure of an application’s power consumption. Nonetheless, it’s what many people use at this moment to evaluate the power consumption of OS X applications, so it’s worth understanding. And if you are an OS X application developer who wants to reduce the “Energy Impact” of your application, it’s clear that it’s best to focus first on reducing wakeup frequency, and then on reducing CPU usage.

Because Activity Monitor is closed source I don’t know if I’ve characterized “Energy Impact” exactly correctly. The evidence given above indicates that I am close on 10.9.5, but not as close on 10.10.4. I’d love to hear if anybody has evidence that either corroborates or contradicts the conclusions I’ve made here. Thank you.

Update

A commenter named comex has done some great detective work and found that on 10.10 and 10.11 Activity Monitor consults a Mac model-specific file in the /usr/share/pmenergy/ directory. (Thank you, comex.)

For example, my MacBook Air has a model number 7DF21CB3ED6977E5 and the file Mac-7DF21CB3ED6977E5.plist has the following list of key/value pairs under the heading “energy_constants”.

kcpu_time               1.0
kcpu_wakeups            2.0e-4

This matches the previously seen formula, but with the wakeups “tax” being 2.0e-4 seconds — i.e. 200 microseconds — per wakeup, which matches what I hypothesized above.

kqos_default            1.0e+00
kqos_background         5.2e-01
kqos_utility            1.0e+00
kqos_legacy             1.0e+00         
kqos_user_initiated     1.0e+00
kqos_user_interactive   1.0e+00

“QoS” refers to quality of service classes, which allow an application to mark some of its own work as lower priority. I’m not sure exactly how this is factored in, but from the numbers above it appears that operations done in the lowest-priority “background” class are considered to have an energy impact of about half that of operations done in the other classes.

kdiskio_bytesread       0.0
kdiskio_byteswritten    5.3e-10

These ones are straightforward. Note that the “tax” for disk reads is zero, and for disk writes it’s a very small number. I wrote a small program that wrote endlessly to disk and saw that the “Energy Impact” was slightly higher than the CPU percentage alone, which matches expectations.
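
For the curious, that disk test was of the following flavour: a loop that writes a buffer and forces it out to the device, over and over. This is a sketch of the idea, not the exact program I used.

// Write endlessly to disk: rewrite the same 64 KiB region and fsync it,
// generating continuous disk write traffic with little CPU usage.
// (A sketch of the idea only; not the exact program used above.)
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

int main()
{
  int fd = open("/tmp/energy-impact-disk-test", O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return 1;
  char buf[65536];
  memset(buf, 'x', sizeof(buf));
  for (;;) {
    pwrite(fd, buf, sizeof(buf), 0);  // rewrite the same range so the file doesn't grow
    fsync(fd);                        // force the bytes out to the device
  }
}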

kgpu_time               3.0e+00

It makes sense that GPU usage is included in the formula. It’s not clear if this refers to the integrated GPU or the separate (higher performance, higher power) GPU. It’s also interesting that the weighting is 3x.

knetwork_recv_bytes     0.0 
knetwork_recv_packets   4.0e-6
knetwork_sent_bytes     0.0
knetwork_sent_packets   4.0e-6

These are also straightforward. In this case the number of bytes sent and received is ignored; only the number of packets matters, and sending and receiving a packet are costed equally.

So, in conclusion, on 10.10 and 10.11, the formula used to compute “Energy Impact” is machine model-specific, and includes the following factors: CPU usage, wakeup frequency, quality of service class usage, and disk, GPU, and network activity.
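
As a rough illustration, here is a sketch of how such a formula might be assembled from the constants above. This is my guess at the shape of the computation, not Apple’s actual code; the per-sample field names are hypothetical, and I have ignored the QoS scaling for simplicity. Plugging in the numbers from the 10.10.4 table gives values in the right ballpark, consistent with the earlier observation that a 200 microsecond wakeup tax is close but not an exact match.

// A guess at the shape of the 10.10/10.11 "Energy Impact" computation,
// using the energy_constants from Mac-7DF21CB3ED6977E5.plist.
// All field names and the normalization are hypothetical.
struct ProcessSample {
  double cpu_seconds;        // CPU time used during the sample period
  double wakeups;            // package idle exit wakeups
  double gpu_seconds;        // GPU time
  double disk_bytes_written; // bytes written to disk
  double packets_sent;       // network packets sent
  double packets_received;   // network packets received
  double elapsed_seconds;    // length of the sample period
};

double energy_impact(const ProcessSample& s)
{
  double cost =
      1.0     * s.cpu_seconds +         // kcpu_time (possibly scaled per QoS class)
      2.0e-4  * s.wakeups +             // kcpu_wakeups: 200 microseconds per wakeup
      3.0     * s.gpu_seconds +         // kgpu_time
      5.3e-10 * s.disk_bytes_written +  // kdiskio_byteswritten
      4.0e-6  * s.packets_sent +        // knetwork_sent_packets
      4.0e-6  * s.packets_received;     // knetwork_recv_packets
  // Normalized like the 10.9 formula: a percentage-like value per unit time.
  return (cost * 100.0) / s.elapsed_seconds;
}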

This is definitely an improvement over the formula used in 10.9, which is great to see. The parameters are also visible, if you know where to look! It would be wonderful if all these inputs, along with their relative weightings, could be seen at once in Activity Monitor. That way developers would have a much better sense of exactly how their application’s “Energy Impact” is determined.

Source code for powermetrics?

Mac OS X has a nice command-line utility called powermetrics that can perform a range of power-related measurements. Does anybody know if the source code is available for it? I’ve looked on Apple’s Open Source site but couldn’t find it. (In contrast, I did find the source code for top, which has some much simpler power measurements.)

I’d also love to see source code for the Activity Monitor, but given that it’s a graphical application rather than a command-line application, it seems less likely it will be available.

A work-around for Tree Style Tab breakage on Firefox Nightly caused by mozRequestAnimationFrame removal

This post is aimed at Firefox Nightly users who also use the Tree Style Tab extension. Bug 909154 landed last week. It removed support for the prefixed mozRequestAnimationFrame function, and broke Tree Style Tab. The GitHub repository that hosts Tree Style Tab’s code has been updated, but that has not yet made it into the latest Tree Style Tab build, which has version number 0.15.2015061300a003855.

Fortunately, it’s fairly easy to modify your installed version of Tree Style Tab to fix this problem. (“Fairly easy”, at least, for the technically-minded users who run Firefox Nightly.)

  • Find the Tree Style Tabs .xpi file. On my Linux machine, it’s at ~/.mozilla/firefox/ndbcibpq.default-1416274259667/extensions/treestyletab@piro.sakura.ne.jp.xpi. Your profile name will not be exactly the same. (In general, you can find your profile with these instructions.)
  • That file is a zip file. Edit the modules/lib/animationManager.js file within that file, and change the two occurrences of mozRequestAnimationFrame to requestAnimationFrame. Save the change.

I did the editing in vim, which was easy because vim has the ability to edit zip files in place. If your editor does not support that, it might work if you unzip the code, edit the file directly, and then rezip, but I haven’t tried that myself. Good luck.

“Thank you” is a wonderful phrase

Last year I contributed a number of patches to pdf.js. Most of my patches were reviewed and merged to the codebase by Yury Delendik. Every single time he merged one of my patches, Yury wrote “Thank you for the patch”. It was never “thanks” or “thank you!” or “thank you :)”. Nor was it “awesome” or “great work” or “+1” or “\o/” or “\m/”. Just “Thank you for the patch”.

Oddly enough, this unadorned use of one of the most simple and important phrases in the English language struck me as quaint, slightly formal, and perhaps a little old-fashioned. Not a bad thing by any means, but… notable. And then, as Yury merged more of my patches, I started getting used to it. Then I started enjoying it. Each time he wrote it — I’m pretty sure he wrote it every time — it made me smile. I felt a small warm glow inside. All because of a single, simple, specific phrase.

So I started using it myself. (“Thank you for the patch.”) I tried to use it more often, in situations I previously wouldn’t have. (“Thank you for the fast review”.) I mostly kept to this simple form and eschewed variations. (“Thank you for the additional information.”) I even started using it each time somebody answered one of my questions on IRC. (“glandium: thank you”)

I’m still doing this. I don’t always use this exact form, and I don’t always remember to thank people who have helped me. But I do think it has made my gratitude to those around me more obvious, more consistent, and more sincere. It feels good.

Compacting GC

Go read Jon Coppeard’s description of the compacting GC algorithm now used by SpiderMonkey!

Firefox 41 will use less memory when running AdBlock Plus

Last year I wrote about AdBlock Plus’s effect on Firefox’s memory usage. The most important part was the following.

First, there’s a constant overhead just from enabling ABP of something like 60–70 MiB. […] This appears to be mostly due to additional JavaScript memory usage, though there’s also some due to extra layout memory.

Second, there’s an overhead of about 4 MiB per iframe, which is mostly due to ABP injecting a giant stylesheet into every iframe. Many pages have multiple iframes, so this can add up quickly. For example, if I load TechCrunch and roll over the social buttons on every story […], without ABP, Firefox uses about 194 MiB of physical memory. With ABP, that number more than doubles, to 417 MiB.

An even more extreme example is this page, which contains over 400 iframes. Without ABP, Firefox uses about 370 MiB. With ABP, that number jumps to 1960 MiB.

(This description was imprecise; the overhead is actually per document, which includes both top-level documents in a tab and documents in iframes.)

Last week Mozilla developer Cameron McCormack landed patches to fix bug 77999, which was filed more than 14 years ago. These patches enable sharing of CSS-related data — more specifically, they add data structures that share the results of cascading user agent style sheets — and in doing so they entirely fix the second issue, which is the more important of the two.

For example, on the above-mentioned “extreme example” (a.k.a. the Vim Color Scheme Test) memory usage dropped by 3.62 MiB per document. There are 429 documents on that page, which is a total reduction of about 1,550 MiB, reducing memory usage for that page down to about 450 MiB, which is not that much more than when AdBlock Plus is absent. (All these measurements are on a 64-bit build.)

I also did measurements on various other sites and confirmed the consistent saving of ~3.6 MiB per document when AdBlock Plus is enabled. The number of documents varies widely from page to page, so the exact effect depends greatly on workload. (I wanted to test TechCrunch again, but its front page has been significantly changed so it no longer triggers such high memory usage.) For example, for one of my measurements I tried opening the front page and four articles from each of nytimes.com, cnn.com and bbc.co.uk, for a total of 15 tabs. With Cameron’s patches applied Firefox with AdBlock Plus used about 90 MiB less physical memory, which is a reduction of over 10%.

Even when AdBlock Plus is not enabled this change has a moderate benefit. For example, in the Vim Color Scheme Test the memory usage for each document dropped by 0.09 MiB, reducing memory usage by about 40 MiB.

If you want to test this change out yourself, you’ll need a Nightly build of Firefox and a development build of AdBlock Plus. (Older versions of AdBlock Plus don’t work with Nightly due to a recent regression related to JavaScript parsing). In Firefox’s about:memory page you’ll see the reduction in the “style-sets” measurements. You’ll also see a new entry under “layout/rule-processor-cache”, which is the measurement of the newly shared data; it’s usually just a few MiB.

This improvement is on track to make it into Firefox 41, which is scheduled for release on September 22, 2015. For users on other release channels, Firefox 41 Beta is scheduled for release on August 11, and Firefox 41 Developer Edition is scheduled to be released in the next day or two.

Measuring data structure sizes: Firefox (C++) vs. Servo (Rust)

Firefox’s about:memory page presents fine-grained measurements of memory usage. Here’s a short example.

725.84 MB (100.0%) -- explicit
├──504.36 MB (69.49%) -- window-objects
│ ├──115.84 MB (15.96%) -- top(https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound, id=2147483655)
│ │ ├───85.30 MB (11.75%) -- active
│ │ │ ├──84.75 MB (11.68%) -- window(https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound)
│ │ │ │ ├──36.51 MB (05.03%) -- dom
│ │ │ │ │ ├──16.46 MB (02.27%) ── element-nodes
│ │ │ │ │ ├──13.08 MB (01.80%) ── orphan-nodes
│ │ │ │ │ └───6.97 MB (00.96%) ++ (4 tiny)
│ │ │ │ ├──25.17 MB (03.47%) -- js-compartment(https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound)
│ │ │ │ │ ├──23.29 MB (03.21%) ++ classes
│ │ │ │ │ └───1.87 MB (00.26%) ++ (7 tiny)
│ │ │ │ ├──21.69 MB (02.99%) ++ layout
│ │ │ │ └───1.39 MB (00.19%) ++ (2 tiny)
│ │ │ └───0.55 MB (00.08%) ++ window(https://login.persona.org/communication_iframe)
│ │ └───30.54 MB (04.21%) ++ js-zone(0x7f131ed6e000)

A typical about:memory invocation contains many thousands of measurements. Although they can be hard for non-experts to interpret, they are immensely useful to Firefox developers. For this reason, I’m currently implementing a similar system in Servo, which is a next-generation browser engine that’s implemented in Rust. Although the implementation in Servo is heavily based on the Firefox implementation, Rust has some features that make the Servo implementation a lot nicer than the Firefox implementation, which is written in C++. This blog post is a deep dive that explains how and why.

Measuring data structures in Firefox

A lot of the measurements done for about:memory are of heterogeneous data structures that live on the heap and contain pointers. We want such data structures to be able to measure themselves. Consider the following simple example.

struct CookieDomainTuple
{
  nsCookieKey key;
  nsRefPtr<nsCookie> cookie;
 
  size_t SizeOfExcludingThis(mozilla::MallocSizeOf aMallocSizeOf) const;
};

The things to immediately note about this type are as follows.

  • The details of nsCookieKey and nsCookie don’t matter here.
  • nsRefPtr is a smart pointer type.
  • There is a method, called SizeOfExcludingThis, for measuring the size of a CookieDomainTuple.

That measurement method has the following form.

size_t
CookieDomainTuple::SizeOfExcludingThis(MallocSizeOf aMallocSizeOf) const
{
  size_t amount = 0;
  amount += key.SizeOfExcludingThis(aMallocSizeOf);
  amount += cookie->SizeOfIncludingThis(aMallocSizeOf);
  return amount;
}

Things to note here are as follows.

  • aMallocSizeOf is a pointer to a function that takes a pointer to a heap block and returns the size of that block in bytes. Under the covers it’s implemented with a function like malloc_usable_size. Using a function like this is superior to computing the size analytically, because (a) it’s less error-prone and (b) it measures the actual size of heap blocks, which is often larger than the requested size because heap allocators round up some request sizes. It will also naturally measure any padding between members. (A minimal sketch of such a function is shown just after this list.)
  • The two data members are measured by invocations to size measurement methods that they provide.
  • The first of these is called SizeOfExcludingThis. The “excluding this” here is necessary because key is an nsCookieKey that sits within a CookieDomainTuple. We don’t want to measure the nsCookieKey struct itself, just any additional heap blocks that it has pointers to.
  • The second of these is called SizeOfIncludingThis. The “including this” here is necessary because cookie is just a pointer to an nsCookie struct, which we do want to measure, along with any additional heap blocks it has pointers to.
  • We need to be careful with these calls. If we call SizeOfIncludingThis when we should call SizeOfExcludingThis, we’ll likely get a crash due to calling aMallocSizeOf on a non-heap pointer. And if we call SizeOfExcludingThis when we should call SizeOfIncludingThis, we’ll miss measuring the struct.
  • If this struct had a pointer to a raw heap buffer — e.g. a char* member — it would measure it by calling aMallocSizeOf directly on the pointer.

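As an aside, a MallocSizeOf function is conceptually just a thin wrapper around the heap allocator’s introspection function. A minimal sketch might look like the following; Firefox’s actual definition is not shown here and does some extra bookkeeping.

// A minimal MallocSizeOf-style function: report the actual size of a live
// heap block. (A sketch only; Firefox's real version does more than this.)
#include <malloc/malloc.h>   // OS X; on Linux use <malloc.h> and malloc_usable_size

static size_t ExampleMallocSizeOf(const void* aPtr)
{
  return malloc_size(aPtr);  // malloc_usable_size(const_cast<void*>(aPtr)) on Linux
}
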
With that in mind, you can see that this method is itself a SizeOfExcludingThis method, and indeed, it doesn’t measure the memory used by the struct instance itself. A method that did include that memory would look like the following.

size_t
CookieDomainTuple::SizeOfIncludingThis(MallocSizeOf aMallocSizeOf)
{ 
  return aMallocSizeOf(this) + SizeOfExcludingThis(aMallocSizeOf);
}

All it does is measure the CookieDomainTuple struct itself — i.e. this — and then call the SizeOfExcludingThis method, which measures all child structures.

There are a few other wrinkles.

  • Often we want to ignore a data member. Perhaps it’s a scalar value, such as an integer. Perhaps it’s a non-owning pointer to something and that thing would be better measured as part of the measurement of another data structure. Perhaps it’s something small that isn’t worth measuring. In these cases we generally use comments in the measurement method to explain why a field isn’t measured, but it’s easy for these comments to fall out-of-date. It’s also easy to forget to update the measurement method when a new data member is added.
  • Every SizeOfIncludingThis method body looks the same: return aMallocSizeOf(this) + SizeOfExcludingThis(aMallocSizeOf);
  • Reference-counting complicates things, because you end up with pointers that conceptually own a fraction of another structure.
  • Inheritance complicates things.

(The full documentation goes into more detail.)

Even with all the wrinkles, it all works fairly well. Having said that, there are a lot of SizeOfExcludingThis and SizeOfIncludingThis methods that are boilerplate-y and tedious to write.

Measuring data structures in Servo

When I started implementing a similar system in Servo, I naturally followed a similar design. But I soon found I was able to improve upon it.

With the same functions defined for lots of types, it was natural to define a Rust trait, like the following.

pub trait HeapSizeOf { 
  fn size_of_including_self(&self) -> usize;
  fn size_of_excluding_self(&self) -> usize; 
}

Having to repeatedly define size_of_including_self when its definition always looks the same is a pain. But heap pointers in Rust are handled via the parameterized Box type, and it’s possible to implement traits for this type. This means we can implement size_of_excluding_self for all Box types — thus removing the need for size_of_including_self — in one fell swoop, as the following code shows.

impl<T: HeapSizeOf> HeapSizeOf for Box<T> {
  fn size_of_excluding_self(&self) -> usize {
    heap_size_of(&**self as *const T as *const c_void) + (**self).size_of_excluding_self()
  }
}

The pointer manipulations are hairy, but basically it says that if T implements the HeapSizeOf trait, then we can measure Box<T> by measuring the T struct itself (via heap_size_of, which is similar to the aMallocSizeOf function in the Firefox example), and then measuring the things hanging off T (via the size_of_excluding_self call). Excellent!

With the including/excluding distinction gone, I renamed size_of_excluding_self as heap_size_of_children, which I thought communicated the same idea more clearly; it seems better for the name to describe what it is measuring rather than what it is not measuring.

But there was still a need for a lot of tedious boilerplate code, as this example shows.

pub struct DisplayList {
  pub background_and_borders: LinkedList<DisplayItem>,
  pub block_backgrounds_and_borders: LinkedList<DisplayItem>,
  pub floats: LinkedList<DisplayItem>,
  pub content: LinkedList<DisplayItem>,
  pub positioned_content: LinkedList<DisplayItem>,
  pub outlines: LinkedList<DisplayItem>,
  pub children: LinkedList<Arc<StackingContext>>,
}

impl HeapSizeOf for DisplayList {
  fn heap_size_of_children(&self) -> usize {
    self.background_and_borders.heap_size_of_children() +
    self.block_backgrounds_and_borders.heap_size_of_children() +
    self.floats.heap_size_of_children() +
    self.content.heap_size_of_children() +
    self.positioned_content.heap_size_of_children() +
    self.outlines.heap_size_of_children() +
    self.children.heap_size_of_children()
  }
}

However, the Rust compiler has the ability to automatically derive implementations for some built-in traits. Even better, the compiler lets you write plug-ins that do arbitrary transformations of the syntax tree, which makes it possible to write a plug-in that does the same for non-built-in traits on request. And the delightful Manish Goregaokar has done exactly that. This allows the example above to be reduced to the following.

#[derive(HeapSizeOf)]
pub struct DisplayList {
  pub background_and_borders: LinkedList<DisplayItem>,
  pub block_backgrounds_and_borders: LinkedList<DisplayItem>,
  pub floats: LinkedList<DisplayItem>,
  pub content: LinkedList<DisplayItem>,
  pub positioned_content: LinkedList<DisplayItem>,
  pub outlines: LinkedList<DisplayItem>,
  pub children: LinkedList<Arc<StackingContext>>,
}

The first line is an annotation that triggers the plug-in to do the obvious thing: generate a heap_size_of_children definition that just calls heap_size_of_children on all the struct fields. Wonderful!

But you may remember that I mentioned that sometimes in Firefox’s C++ code we want to ignore a particular member. This is also true in Servo’s Rust code, so the plug-in supports an ignore_heap_size_of annotation which can be applied to any field in the struct definition; the plug-in will duly ignore any such field.

If a new field is added which has a type for which HeapSizeOf has not been implemented, the compiler will complain. This means that we can’t add a new field to a struct and forget to measure it. The ignore_heap_size_of annotation also requires a string argument, which (by convention) holds a brief explanation why the member is ignored, as the following example shows.

#[ignore_heap_size_of = "Because it is a non-owning reference."]
pub image: Arc<Image>,

(An aside: the best way to handle Arc is an open question. If one of the references is clearly the owner, it probably makes sense to count the full size for that one reference. Otherwise, it is probably best to divide the size equally among all the references.)

The plug-in also has a known_heap_size_of! macro that lets us easily dictate the heap size of built-in types (such as integral types, whose heap size is zero). This works because Rust allows implementations of custom traits for built-in types. It provides additional uniformity because built-in types don’t need special treatment. The following line says that all the built-in signed integer types have a heap_size_of_children value of zero.

known_heap_size_of!(0, i8, i16, i32, i64, isize);

Finally, if there is a type for which the measurement needs to do something more complicated, we can still implement heap_size_of_children manually.

Conclusion

The Servo implementation is much nicer than the Firefox implementation, in the following ways.

  • There is no need for an including/excluding split, thanks to the trait implementation on Box. This avoids some boilerplate code and makes it impossible to accidentally call the wrong method.
  • Struct fields that use built-in types are handled the same way as all others, because Rust trait implementations can be defined for built-in types.
  • Even more boilerplate is avoided thanks to the compiler plug-in that auto-derives HeapSizeOf implementations; it can even ignore fields.
  • For ignored fields, the required string parameter makes it impossible to forget to explain why the field is ignored.

These are possible due to several powerful language and compiler features of Rust that C++ lacks. There may be some C++ features that could improve the Firefox code — and I’d love to hear suggestions — but it’s never going to be as nice as the Rust code.

On vacation for a month

I’m taking a month of vacation. Today is my last working day for a month, and I will be back on April 30th. While I won’t be totally incommunicado, for the most part I won’t be reading email. While I’m gone, any management-type inquiries can be passed on to Naveed Ihsannullah.

Fix your damned data races

Nathan Froyd recently wrote about how he has been using ThreadSanitizer to find data races in Firefox, and how a number of Firefox developers — particularly in the networking and JS GC teams — have been fixing these.

This is great news. I want to re-emphasise and re-state one of the points from Nathan’s post, which is that data races are a class of bug that almost everybody underestimates. Unless you have, say, authored a specification of the memory model for a systems programming language, your intuition about the potential impact of many data races is probably wrong. And I’m going to give you three links to explain why.

Hans Boehm’s paper How to miscompile programs with “benign” data races explains very clearly that it’s possible to correctly conclude that a data race is benign at the level of machine code, but it’s almost impossible at the level of C or C++ code. And if you try to do the latter by inspecting the code generated by a C or C++ compiler, you are not allowing for the fact that other compilers (including future versions of the compiler you used) can and will generate different code, and so your conclusion is both incomplete and temporary.

Dmitri Vyukov’s blog post Benign data races: what could possibly go wrong? covers similar ground, giving more examples of how compilers can legitimately compile things in surprising ways. For example, at any point the storage used by a local variable can be temporarily used to hold a different variable’s value (think register spilling). If another thread reads this storage in a racy fashion, it could observe the value of an unrelated variable.
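
To make this concrete, here is a sketch of the kind of transformation these articles describe, using a hypothetical unsynchronized counter. The transformed form is one a compiler is permitted to produce under the C++ memory model; whether any particular compiler actually does so today is beside the point.

// An unsynchronized variable that another thread reads concurrently.
int count = 0;

void maybe_increment(bool flag)
{
  if (flag)
    count++;
}

// Because racy accesses have undefined behaviour, a compiler may transform
// the function into something like:
//
//   int tmp = count;
//   count = tmp + 1;         // speculative, unconditional store
//   if (!flag)
//     count = tmp;           // undo the store if the branch wasn't taken
//
// which writes to `count` even when `flag` is false, so a concurrent reader
// can observe values that the source code apparently never produces.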

Finally, John Regehr’s blog has many posts that show how C and C++ compilers take advantage of undefined behaviour to do surprising (even shocking) program transformations, and how the extent of these transformations has steadily increased over time. Compilers genuinely are getting smarter, and are increasingly pushing the envelope of what a language will let them get away with. And the behaviour of a C or C++ program is undefined in the presence of data races. (This is sometimes called “catch-fire” semantics, for reasons you should be able to guess.)

So, in summary: if you write C or C++ code that deals with threads in Firefox — and that probably includes everybody who writes C or C++ code in Firefox — you should have read at least the first two links I’ve given above. If you haven’t, please do so now. If you have read them and they haven’t made you at least slightly terrified, you should read them again. And if TSan identifies a data race in code that you are familiar with, please take it seriously, and fix it. Thank you.

Using GDB to get stack traces at particular program points

A while back I wrote how I sometimes use Valgrind to print a stack trace every time a particular program point is reached. I just learned how to do likewise with GDB. Here’s an example session that illustrates what to do.

(gdb) break je_chunk_alloc_mmap     # set a breakpoint
Breakpoint 1 at 0x1aa32c0
(gdb) commands                      # enter breakpoint command sequence
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>bt                                 # print stack trace
>c                                  # continue execution
>end                                # end command sequence
(gdb) set pagination off            # don't pause when the screen fills up
(gdb) set logging on                # copy output to a file
Copying output to gdb.txt.

This is better than the Valgrind technique because (a) it doesn’t require modifying the source code, and (b) programs tend to run faster under GDB than they do under Valgrind.

Thanks to Mike Hommey for helping with this.