Servo – Nicholas Nethercote

Firefox’s about:memory page presents fine-grained measurements of memory usage. Here’s a short example.

725.84 MB (100.0%) -- explicit
├──504.36 MB (69.49%) -- window-objects
│ ├──115.84 MB (15.96%) -- top(https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound, id=2147483655)
│ │ ├───85.30 MB (11.75%) -- active
│ │ │ ├──84.75 MB (11.68%) -- window(https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound)
│ │ │ │ ├──36.51 MB (05.03%) -- dom
│ │ │ │ │ ├──16.46 MB (02.27%) ── element-nodes
│ │ │ │ │ ├──13.08 MB (01.80%) ── orphan-nodes
│ │ │ │ │ └───6.97 MB (00.96%) ++ (4 tiny)
│ │ │ │ ├──25.17 MB (03.47%) -- js-compartment(https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound)
│ │ │ │ │ ├──23.29 MB (03.21%) ++ classes
│ │ │ │ │ └───1.87 MB (00.26%) ++ (7 tiny)
│ │ │ │ ├──21.69 MB (02.99%) ++ layout
│ │ │ │ └───1.39 MB (00.19%) ++ (2 tiny)
│ │ │ └───0.55 MB (00.08%) ++ window(https://login.persona.org/communication_iframe)
│ │ └───30.54 MB (04.21%) ++ js-zone(0x7f131ed6e000)

A typical about:memory invocation contains many thousands of measurements. Although they can be hard for non-experts to interpret, they are immensely useful to Firefox developers. For this reason, I’m currently implementing a similar system in Servo, which is a next-generation browser engine that’s implemented in Rust. Although the implementation in Servo is heavily based on the Firefox implementation, Rust has some features that make the Servo implementation a lot nicer than the Firefox implementation, which is written in C++. This blog post is a deep dive that explains how and why.

Measuring data structures in Firefox

A lot of the measurements done for about:memory are of heterogeneous data structures that live on the heap and contain pointers. We want such data structures to be able to measure themselves. Consider the following simple example.

struct CookieDomainTuple
{
  nsCookieKey key;
  nsRefPtr<nsCookie> cookie;
 
  size_t SizeOfExcludingThis(mozilla::MallocSizeOf aMallocSizeOf) const;
};

The things to immediately note about this type are as follows.

The details of nsCookieKey and nsCookie don’t matter here.
nsRefPtr is a smart pointer type.
There is a method, called SizeOfExcludingThis, for measuring the size of a CookieDomainTuple.

That measurement method has the following form.

size_t
CookieDomainTuple::SizeOfExcludingThis(MallocSizeOf aMallocSizeOf) const
{
  size_t amount = 0;
  amount += key.SizeOfExcludingThis(aMallocSizeOf);
  amount += cookie->SizeOfIncludingThis(aMallocSizeOf);
  return amount;
}

Things to note here are as follows.

aMallocSizeOf is a pointer to a function that takes a pointer to a heap block and returns the size of that block in bytes. Under the covers it’s implemented with a function like malloc_usable_size. Using a function like this is superior to computing the size analytically, because (a) it’s less error-prone and (b) it measures the actual size of heap blocks, which is often larger than the requested size because heap allocators round up some request sizes. It will also naturally measure any padding between members.
The two data members are measured by invocations to size measurement methods that they provide.
The first of these is called SizeOfExcludingThis. The “excluding this” here is necessary because key is an nsCookieKey that sits within a CookieDomainTuple. We don’t want to measure the nsCookieKey struct itself, just any additional heap blocks that it has pointers to.
The second of these is called SizeOfIncludingThis. The “including this” here is necessary because cookie is just a pointer to an nsCookie struct, which we do want to measure, along with any additional heap blocks it has pointers to.
We need to be careful with these calls. If we call SizeOfIncludingThis when we should call SizeOfExcludingThis, we’ll likely get a crash due to calling aMallocSizeOf on a non-heap pointer. And if we call SizeOfExcludingThis when we should call SizeOfIncludingThis, we’ll miss measuring the struct.
If this struct had a pointer to a raw heap buffer — e.g. a char* member — it would measure it by calling aMallocSizeOf directly on the pointer.

With that in mind, you can see that this method is itself a SizeOfExcludingThis method, and indeed, it doesn’t measure the memory used by the struct instance itself. A method that did include that memory would look like the following.

size_t
CookieDomainTuple::SizeOfIncludingThis(MallocSizeOf aMallocSizeOf)
{ 
  return aMallocSizeOf(this) + SizeOfExcludingThis(aMallocSizeOf);
}

All it does is measure the CookieDomainTuple struct itself — i.e. this — and then call the SizeOfExcludingThis method, which measures all child structures.

There are a few other wrinkles.

Often we want to ignore a data member. Perhaps it’s a scalar value, such as an integer. Perhaps it’s a non-owning pointer to something and that thing would be better measured as part of the measurement of another data structure. Perhaps it’s something small that isn’t worth measuring. In these cases we generally use comments in the measurement method to explain why a field isn’t measured, but it’s easy for these comments to fall out-of-date. It’s also easy to forget to update the measurement method when a new data member is added.
Every SizeOfIncludingThis method body looks the same: return aMallocSizeOf(this) + SizeOfExcludingThis(aMallocSizeOf);
Reference-counting complicates things, because you end up with pointers that conceptually own a fraction of another structure.
Inheritance complicates things.

(The full documentation goes into more detail.)

Even with all the wrinkles, it all works fairly well. Having said that, there are a lot of SizeOfExcludingThis and SizeOfIncludingThis methods that are boilerplate-y and tedious to write.

Measuring data structures in SERVO

When I started implementing a similar system in Servo, I naturally followed a similar design. But I soon found I was able to improve upon it.

With the same functions defined for lots of types, it was natural to define a Rust trait, like the following.

pub trait HeapSizeOf { 
  fn size_of_including_self(&self) -> usize;
  fn size_of_excluding_self(&self) -> usize; 
}

Having to repeatedly define size_of_including_self when its definition always looks the same is a pain. But heap pointers in Rust are handled via the parameterized Box type, and it’s possible to implement traits for this type. This means we can implement size_of_excluding_this for all Box types — thus removing the need for size_of_including_this — in one fell swoop, as the following code shows.

impl<T: HeapSizeOf> HeapSizeOf for Box<T> {
  fn size_of_excluding_self(&self) -> usize {
    heap_size_of(&**self as *const T as *const c_void) + (**self).size_of_excluding_self()
  }
}

The pointer manipulations are hairy, but basically it says that if T implements the HeapSizeOf trait, then we can measure Box<T> by measuring the T struct itself (via heap_size_of, which is similar to the aMallocSizeOf function in the Firefox example), and then measuring the things hanging off T (via the size_of_excluding_self call). Excellent!

With the including/excluding distinction gone, I renamed size_of_excluding_self as heap_size_of_children, which I thought communicated the same idea more clearly; it seems better for the name to describe what it is measuring rather than what it is not measuring.

But there was still a need for a lot of tedious boilerplate code, as this example shows.

pub struct DisplayList {
  pub background_and_borders: LinkedList<DisplayItem>,
  pub block_backgrounds_and_borders: LinkedList<DisplayItem>,
  pub floats: LinkedList<DisplayItem>,
  pub content: LinkedList<DisplayItem>,
  pub positioned_content: LinkedList<DisplayItem>,
  pub outlines: LinkedList<DisplayItem>,
  pub children: LinkedList<Arc<StackingContext>>,
}

impl HeapSizeOf for DisplayList {
  fn heap_size_of_children(&self) -> usize {
    self.background_and_borders.heap_size_of_children() +
    self.block_backgrounds_and_borders.heap_size_of_children() +
    self.floats.heap_size_of_children() +
    self.content.heap_size_of_children() +
    self.positioned_content.heap_size_of_children() +
    self.outlines.heap_size_of_children() +
    self.children.heap_size_of_children()
  }
}

However, the Rust compiler has the ability to automatically derive implementations for some built-in traits. Even better, the compiler lets you write plug-ins that do arbitrary transformations of the syntax tree, which makes it possible to write a plug-in that does the same for non-built-in traits on request. And the delightful Manish Goregaokar has done exactly that. This allows the example above to be reduced to the following.

#[derive(HeapSizeOf)]
pub struct DisplayList {
  pub background_and_borders: LinkedList<DisplayItem>,
  pub block_backgrounds_and_borders: LinkedList<DisplayItem>,
  pub floats: LinkedList<DisplayItem>,
  pub content: LinkedList<DisplayItem>,
  pub positioned_content: LinkedList<DisplayItem>,
  pub outlines: LinkedList<DisplayItem>,
  pub children: LinkedList<Arc<StackingContext>>,
}

The first line is an annotation that triggers the plug-in to do the obvious thing: generate a heap_size_of_children definition that just calls heap_size_of_children on all the struct fields. Wonderful!

But you may remember that I mentioned that sometimes in Firefox’s C++ code we want to ignore a particular member. This is also true in Servo’s Rust code, so the plug-in supports an ignore_heap_size annotation which can be applied to any field in the struct definition; the plug-in will duly ignore any such field.

If a new field is added which has a type for which HeapSizeOf has not been implemented, the compiler will complain. This means that we can’t add a new field to a struct and forget to measure it. The ignore_heap_size_of annotation also requires a string argument, which (by convention) holds a brief explanation why the member is ignored, as the following example shows.

#[ignore_heap_size_of = "Because it is a non-owning reference."]
pub image: Arc<Image>,

(An aside: the best way to handle Arc is an open question. If one of the references is clearly the owner, it probably makes sense to count the full size for that one reference. Otherwise, it is probably best to divide the size equally among all the references.)

The plug-in also has a known_heap_size_of! macro that lets us easily dictate the heap size of built-in types (such as integral types, whose heap size is zero). This works because Rust allows implementations of custom traits for built-in types. It provides additional uniformity because built-in types don’t need special treatment. The following line says that all the built-in signed integer types have a heap_size_of_children value of zero.

known_heap_size_of!(0, i8, i16, i32, i64, isize);

Finally, if there is a type for which the measurement needs to do something more complicated, we can still implement heap_size_of_children manually.

Conclusion

The Servo implementation is much nicer than the Firefox implementation, in the following ways.

There is no need for an including/excluding split thanks to trait implementations on Box. This avoids boilerplate some code and makes it impossible to accidentally call the wrong method.
Struct fields that use built-in types are handled the same way as all others, because Rust trait implementations can be defined for built-in types.
Even more boilerplate is avoided thanks to the compiler plug-in that auto-derives HeapSizeOf implementations; it can even ignore fields.
For ignored fields, the required string parameter makes it impossible to forget to explain why the field is ignored.

These are possible due to several powerful language and compiler features of Rust that C++ lacks. There may be some C++ features that could improve the Firefox code — and I’d love to hear suggestions — but it’s never going to be as nice as the Rust code.