In light of the recent announcement that all of our Tier-1 platforms now have a C++11-supporting standard library, I received some questions about whether we should continue encouraging the use of Gecko-specific data structures. My answer was “yes”, and as I was writing the justification for said answer, I felt that the justification was worth broadcasting to a wider audience. Here are the reasons I came up with; feel free to agree or disagree in the comments.
- Gecko’s data structures can be customized extensively for our purposes, whereas we don’t have the same control over the standard library. Our string classes, for instance, permit sharing structure between strings (whether via something like nsDependentString or reference-counted string buffers); that functionality isn’t currently supported in the standard library. While the default behavior on allocation failure in Gecko is to crash, our data structures provide interfaces for failing gracefully when allocations fail. Allocation failures in standard library data structures are reported via exceptions, which we don’t use. If you’re not using exceptions, allocation failures in those data structures simply crash, which isn’t acceptable in a number of places throughout Gecko.
- Gecko data structures can assume things about the environment that the standard library can’t. We ship the same memory allocator on all our platforms, so our hashtables and our arrays can attempt to make their allocation behavior line up with what the memory allocator efficiently supports. It’s possible that the standard library implementations we’re using do things like this, but it’s not guaranteed by the standard.
- Along similar lines as the first two, Gecko data structures provide better visibility for things like debug checks and memory reporting. Some standard libraries we support come with built-in debug modes, but not all of them, and not all debug modes are equally complete. Where possible, we should have consistent support for these sorts of things across all our platforms.
- Custom data structures may provide better behavior than standard data structures by relaxing the specifications provided by the standard. The WebKit team had a great blog post on their new mutex implementation, which optimizes for cases that OS-provided mutexes aren’t optimized for, either because of compatibility constraints or because of outside specifications. Chandler Carruth has a CppCon talk where he mentions the non-ideal interfaces in many of the standard library data structures. We can do better with custom data structures.
- Data structures in the standard library may provide inconsistent performance across platforms, or disagree on the finer points of the standard. Love them or hate them, Gecko’s data structures at least provide consistent behavior everywhere.
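To make the first few points concrete, here is a minimal sketch of what "failing gracefully" and "lining up with the allocator" can look like in a container that never throws. Everything in it is hypothetical: `FallibleIntArray`, `Append`, and `Grow` are names invented for illustration and are not Gecko's actual classes or APIs, and the power-of-two growth policy assumes an allocator that serves power-of-two sizes efficiently, which is an assumption of this sketch rather than a statement about our allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical illustration only; this is not nsTArray or any real Gecko class.
class FallibleIntArray {
 public:
  FallibleIntArray() = default;
  FallibleIntArray(const FallibleIntArray&) = delete;
  FallibleIntArray& operator=(const FallibleIntArray&) = delete;
  ~FallibleIntArray() { std::free(mData); }

  // Reports allocation failure through the return value, so the caller
  // decides how to recover. No exception is thrown and nothing crashes.
  bool Append(int aValue) {
    if (mLength == mCapacity && !Grow()) {
      return false;
    }
    mData[mLength++] = aValue;
    return true;
  }

  size_t Length() const { return mLength; }
  size_t Capacity() const { return mCapacity; }

  // A debug-only bounds check that behaves the same on every platform.
  int operator[](size_t aIndex) const {
    assert(aIndex < mLength);
    return mData[aIndex];
  }

 private:
  bool Grow() {
    const size_t kMaxCapacity = SIZE_MAX / sizeof(int);
    if (mCapacity > kMaxCapacity / 2) {
      return false;  // Doubling would overflow the byte count.
    }
    // Start at 8 elements and double, so the element count stays a power
    // of two (assumed here to suit the allocator's size classes).
    size_t newCapacity = mCapacity ? mCapacity * 2 : 8;
    void* buffer = std::realloc(mData, newCapacity * sizeof(int));
    if (!buffer) {
      return false;  // The old buffer is untouched and still valid.
    }
    mData = static_cast<int*>(buffer);
    mCapacity = newCapacity;
    return true;
  }

  int* mData = nullptr;
  size_t mLength = 0;
  size_t mCapacity = 0;
};
```

A caller would write something like `if (!array.Append(value)) { /* report out-of-memory and bail */ }`, which is the kind of recovery path that an exception-based interface cannot offer when exceptions are disabled.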
Most of these arguments are not new; if you look at the documentation for Facebook’s open-source Folly library, for instance, you’ll find a number of these arguments, if not expressed in quite the same way. Browsing through WebKit’s WTF library shows they have a number of the same things that we do in xpcom/ or mfbt/ as well, presumably for some of the same reasons.
All of this is not to say that our data structures are perfect: the APIs for our hashtables could use some improvements, our strings and nsTArray do a poor job of separating “data structure” from “algorithm”, nsDeque serves as an excellent excuse to go use the standard library instead, and XPCOM’s synchronization primitives should stop going through NSPR and use the underlying OS’s primitives directly (or simply be rewritten to use something like WebKit’s locking primitives, above). This is a non-exhaustive list; I have more ideas if people are interested.
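As a rough illustration of what separating “data structure” from “algorithm” could mean, here is a small sketch in which the container only stores elements and exposes iterators, while searching and sorting live outside it. `SmallBuffer` and `Contains` are hypothetical names made up for this example; this is not a proposal for a specific Gecko API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-capacity container. It knows how to store elements
// and hand out iterators, and nothing else.
template <typename T, size_t N>
class SmallBuffer {
 public:
  // Returns false when the buffer is full.
  bool Append(const T& aValue) {
    if (mLength == N) {
      return false;
    }
    mElements[mLength++] = aValue;
    return true;
  }

  size_t Length() const { return mLength; }

  const T& operator[](size_t aIndex) const {
    assert(aIndex < mLength);
    return mElements[aIndex];
  }

  T* begin() { return mElements; }
  T* end() { return mElements + mLength; }
  const T* begin() const { return mElements; }
  const T* end() const { return mElements + mLength; }

 private:
  T mElements[N] = {};
  size_t mLength = 0;
};

// The algorithm is a free function that works on anything with
// begin()/end(), rather than a member of one particular container.
template <typename Range, typename T>
bool Contains(const Range& aRange, const T& aValue) {
  return std::find(aRange.begin(), aRange.end(), aValue) != aRange.end();
}
```

With this shape, `std::sort(buffer.begin(), buffer.end())` and `Contains(buffer, value)` work without the container having to grow a member function for each algorithm.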
Having a C++11 standard library on all platforms brings opportunities to remove dead polyfills; MFBT contains a number of these (Atomics.h, Tuple.h, TypeTraits.h, UniquePtr.h, etc.). But we shouldn’t flock to the standard library’s functionality just because it’s the standard. If the standard library’s functionality doesn’t fit our use cases, we should definitely write our own replacement(s) and use them widely.