why gecko data structures should be preferred to std:: ones

In light of the recent announcement that all of our Tier-1 platforms now have a C++11-supporting standard library, I received some questions about whether we should continue encouraging the use of Gecko-specific data structures. My answer was “yes”, and as I was writing the justification for said answer, I felt that the justification was worth broadcasting to a wider audience. Here are the reasons I came up with; feel free to agree or disagree in the comments.

1. Gecko’s data structures can be customized extensively for our purposes, whereas we don’t have the same control over the standard library. Our string classes, for instance, permit sharing structure between strings (whether via something like nsDependentString or reference-counted string buffers); that functionality isn’t currently supported in the standard library. While the default behavior on allocation failure in Gecko is to crash, our data structures provide interfaces for failing gracefully when allocations fail. Allocation failures in standard library data structures are reported via exceptions, which we don’t use. If you’re not using exceptions, allocation failures in those data structures simply crash, which isn’t acceptable in a number of places throughout Gecko.
2. Gecko data structures can assume things about the environment that the standard library can’t. We ship the same memory allocator on all our platforms, so our hashtables and our arrays can attempt to make their allocation behavior line up with what the memory allocator efficiently supports. It’s possible that the standard library implementations we’re using do things like this, but it’s not guaranteed by the standard.
3. Along similar lines as the first two, Gecko data structures provide better visibility for things like debug checks and memory reporting. Some standard libraries we support come with built-in debug modes, but not all of them, and not all debug modes are equally complete. Where possible, we should have consistent support for these sorts of things across all our platforms.
4. Custom data structures may provide better behavior than standard data structures by relaxing the specifications provided by the standard. The WebKit team had a great blog post on their new mutex implementation, which optimizes for cases that OS-provided mutexes aren’t optimized for, either because of compatibility constraints or because of outside specifications. Chandler Carruth has a CppCon talk where he mentions the non-ideal interfaces in many of the standard library data structures. We can do better with custom data structures.
5. Data structures in the standard library may provide inconsistent performance across platforms, or disagree on the finer points of the standard. Love them or hate them, Gecko’s data structures at least provide consistent behavior everywhere.
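
To make the points about graceful allocation failure and allocator-friendly sizing a bit more concrete, here is a minimal sketch in plain C++. It is not Gecko code: `FallibleIntArray`, `TryAppend`, and the byte budget are hypothetical names invented for this illustration, and the budget merely stands in for a real out-of-memory condition so the failure path is easy to exercise. The sketch also assumes an allocator that serves power-of-two sizes efficiently, which may or may not match any particular allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical illustration only; this is not a Gecko class.
class FallibleIntArray {
 public:
  // aByteBudget simulates an allocator that refuses larger requests.
  explicit FallibleIntArray(std::size_t aByteBudget = SIZE_MAX)
      : mByteBudget(aByteBudget) {}
  FallibleIntArray(const FallibleIntArray&) = delete;
  FallibleIntArray& operator=(const FallibleIntArray&) = delete;
  ~FallibleIntArray() { std::free(mData); }

  // Reports failure to the caller instead of throwing or crashing.
  bool TryAppend(int aValue) {
    if (mLength == mCapacity && !Grow()) {
      return false;
    }
    mData[mLength++] = aValue;
    return true;
  }

  std::size_t Length() const { return mLength; }
  std::size_t Capacity() const { return mCapacity; }

  int operator[](std::size_t aIndex) const {
    // A debug check that behaves the same on every platform.
    assert(aIndex < mLength);
    return mData[aIndex];
  }

 private:
  bool Grow() {
    const std::size_t kMaxElements = SIZE_MAX / sizeof(int);
    if (mCapacity > kMaxElements / 2) {
      return false;  // Doubling would overflow the byte count.
    }
    // Start at 8 elements and double, so the element count stays a power
    // of two (assumed here to suit the allocator).
    const std::size_t newCapacity = mCapacity ? mCapacity * 2 : 8;
    if (newCapacity * sizeof(int) > mByteBudget) {
      return false;  // Simulated allocation failure.
    }
    void* newData = std::realloc(mData, newCapacity * sizeof(int));
    if (!newData) {
      return false;  // Real allocation failure; the old buffer stays valid.
    }
    mData = static_cast<int*>(newData);
    mCapacity = newCapacity;
    return true;
  }

  int* mData = nullptr;
  std::size_t mLength = 0;
  std::size_t mCapacity = 0;
  std::size_t mByteBudget;
};
```

A caller checks the return value of `TryAppend` and backs out cleanly when it is false. By contrast, `std::vector::push_back` signals the same condition by throwing `std::bad_alloc`, which terminates the program when exceptions are not in use.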

Most of these arguments are not new; if you look at the documentation for Facebook’s open-source Folly library, for instance, you’ll find a number of these arguments, if not expressed in quite the same way. Browsing through WebKit’s WTF library shows they have a number of the same things that we do in xpcom/ or mfbt/ as well, presumably for some of the same reasons.

All of this is not to say that our data structures are perfect: the APIs for our hashtables could use some improvements, our strings and nsTArray do a poor job of separating “data structure” from “algorithm”, nsDeque serves as an excellent excuse to go use the standard library instead, and XPCOM’s synchronization primitives should stop going through NSPR and use the underlying OS’s primitives directly (or simply be rewritten to use something like WebKit’s locking primitives, above). This is a non-exhaustive list; I have more ideas if people are interested.

Having a C++11 standard library on all platforms brings opportunities to remove dead polyfills; MFBT contains a number of these (Atomics.h, Tuple.h, TypeTraits.h, UniquePtr.h, etc.). But we shouldn’t flock to the standard library’s functionality just because it’s the standard. If the standard library’s functionality doesn’t fit our use cases, we should definitely write our own replacement(s) and use them widely.

Tags: c++, data structures, mfbt, mozilla, xpcom

This entry was posted on Tuesday, May 31st, 2016 at 11:48 am and is filed under Uncategorized.

10 comments

Benoit Girard

2016/05/31 at 12:59 pm

Thanks, I knew some of the reasons behind it but not all. Very informative. It’s nice to have this written down to refer back to as this discussion comes up from time to time.
- Nathan Froyd
  
  2016/05/31 at 3:46 pm
  
  You’re welcome!
Chris Pearce

2016/05/31 at 7:47 pm

Additionally, our string classes handle different character sets correctly. As I discovered in https://bugzilla.mozilla.org/show_bug.cgi?id=1184333 (after several months trying to remote-debug the issue in various ways), std::string can mangle non-ASCII characters, producing unexpected results, whereas with Mozilla strings these cases are well tested and work as expected.
jonasfj

2016/05/31 at 7:57 pm

Those are some solid reasons. But as you also point out, rolling your own data structures can have downsides, like quality issues.
My favorite example of classic computer science results being neglected is the bug where nsTArray didn’t have amortized O(1) complexity for append/delete:
https://blog.mozilla.org/nnethercote/2014/11/04/please-grow-your-buffers-exponentially/
Another downside is the lack of documentation.

Both of these risks could be mitigated by developing custom data structures as a separate library with extensive unit tests and documentation.

When I played with Gecko, the lack of documentation for data structures and strings was a huge issue for me. For example, string splitting and searching was pretty much guesswork, or a fallback to C strings, which I then managed to leak 🙂
- Nathan Froyd
  
  2016/06/01 at 3:06 pm
  
  Yeah, documentation is a powerful argument for using the standard containers. I’m definitely open to suggestions as to what would help!
Mayank

2016/05/31 at 10:31 pm

Just looked at the WebKit blog post on their new mutexes. If they are as efficient as claimed, shouldn’t Gecko be using those?
In case Gecko’s hand-rolled mutexes are better than the standard/WebKit ones, maybe somebody should write a blog post like that, highlighting where we are better.
- Nathan Froyd
  
  2016/06/01 at 2:59 pm
  
  Somebody would have to port WebKit’s locking data structures over to Gecko, and then take measurements showing that they’re better. Thanks to WebKit, we are guessing that the second part is true, but we’d need to verify that. And of course somebody needs to do the first part. 🙂
Jim Porter

2016/06/01 at 2:51 pm

“If the standard library’s functionality doesn’t fit our use cases, we should definitely write our own replacement(s) and use them widely.”

If the standard library is lacking in some way, we should be expressing this to the ISO C++ WG so that they’re aware of the issues and can address them. This is especially relevant given the recent discussions in the WG about a “version 2” of the standard library. N4456 has a pretty good summary of issues with the standard library from the PoV of games and other low-latency systems.
- Nathan Froyd
  
  2016/06/01 at 3:04 pm
  
  Thanks for the pointer to N4456 and the note about a standard library version 2; I always wondered what happened to the EASTL feedback.
Aaron Klotz

2016/06/02 at 4:53 pm

I’ve got a 90% done WIP to move Mutex and Monitor over to Win32 primitives.

I’m also interested in being involved with porting / writing new userspace locks if that ever comes about.

Nathan's Blog writing code to help other people write code
