31
May 16

why gecko data structures should be preferred to std:: ones

In light of the recent announcement that all of our Tier-1 platforms now have a C++11-supporting standard library, I received some questions about whether we should continue encouraging the use of Gecko-specific data structures. My answer was “yes”, and as I was writing the justification for said answer, I felt that the justification was worth broadcasting to a wider audience. Here are the reasons I came up with; feel free to agree or disagree in the comments.

  • Gecko’s data structures can be customized extensively for our purposes, whereas we don’t have the same control over the standard library.  Our string classes, for instance, permit sharing structure between strings (whether via something like nsDependentString or reference-counted string buffers); that functionality isn’t currently supported in the standard library.  While the default behavior on allocation failure in Gecko is to crash, our data structures provide interfaces for failing gracefully when allocations fail.  Allocation failures in standard library data structures are reported via exceptions, which we don’t use.  If you’re not using exceptions, allocation failures in those data structures simply crash, which isn’t acceptable in a number of places throughout Gecko.
  • Gecko data structures can assume things about the environment that the standard library can’t.  We ship the same memory allocator on all our platforms, so our hashtables and our arrays can attempt to make their allocation behavior line up with what the memory allocator efficiently supports.  It’s possible that the standard library implementations we’re using do things like this, but it’s not guaranteed by the standard.
  • Along similar lines as the first two, Gecko data structures provide better visibility for things like debug checks and memory reporting.  Some standard libraries we support come with built-in debug modes, but not all of them, and not all debug modes are equally complete. Where possible, we should have consistent support for these sorts of things across all our platforms.
  • Custom data structures may provide better behavior than standard data structures by relaxing the specifications provided by the standard.  The WebKit team had a great blog post on their new mutex implementation, which optimizes for cases that OS-provided mutexes aren’t optimized for, either because of compatibility constraints or because of outside specifications.  Chandler Carruth has a CppCon talk where he mentions the non-ideal interfaces in many of the standard library data structures.  We can do better with custom data structures.
  • Data structures in the standard library may provide inconsistent performance across platforms, or disagree on the finer points of the standard.  Love them or hate them, Gecko’s data structures at least provide consistent behavior everywhere.

Most of these arguments are not new; if you look at the documentation for Facebook’s open-source Folly library, for instance, you’ll find a number of these arguments, if not expressed in quite the same way.  Browsing through WebKit’s WTF library shows they have a number of the same things that we do in xpcom/ or mfbt/ as well, presumably for some of the same reasons.

All of this is not to say that our data structures are perfect: the APIs for our hashtables could use some improvements, our strings and nsTArray do a poor job of separating “data structure” from “algorithm”, nsDeque serves as an excellent excuse to go use the standard library instead, and XPCOM’s synchronization primitives should stop going through NSPR and use the underlying OS’s primitives directly (or simply be rewritten to use something like WebKit’s locking primitives, above).  This is a non-exhaustive list; I have more ideas if people are interested.

Having a C++11 standard library on all platforms brings opportunities to remove dead polyfills; MFBT contains a number of these (Atomics.h, Tuple.h, TypeTraits.h, UniquePtr.h, etc.)  But we shouldn’t flock to the standard library’s functionality just because it’s the standard.  If the standard library’s functionality doesn’t fit our use cases, we should definitely write our own replacement(s) and use them widely.


27
Jan 15

examples of poor API design, 1/N – pldhash functions

The other day in the #content IRC channel:

<bz> I have learned so many things about how to not define APIs in my work with Mozilla code ;)
<bz> (probably lots more to learn, though)

I, too, am still learning a lot about what makes a good API. Like a lot of other things, it’s easier to point out poor API design than to describe examples of good API design, and that’s what this blog post is about. In particular, the venerable XPCOM datastructure PLDHashTable has been undergoing a number of changes lately, all aimed at bringing it up to date. (The question of why we have our own implementations of things that exist in the C++ standard library is for a separate blog post.)

The whole effort started with noticing that PL_DHashTableOperate is not a well-structured API. It’s necessary to quote some more of the API surface to fully understand what’s going on here:

typedef enum PLDHashOperator {
    PL_DHASH_LOOKUP = 0,        /* lookup entry */
    PL_DHASH_ADD = 1,           /* add entry */
    PL_DHASH_REMOVE = 2,        /* remove entry, or enumerator says remove */
    PL_DHASH_NEXT = 0,          /* enumerator says continue */
    PL_DHASH_STOP = 1           /* enumerator says stop */
} PLDHashOperator;

typedef PLDHashOperator
(* PLDHashEnumerator)(PLDHashTable *table, PLDHashEntryHdr *hdr, uint32_t number,
                      void *arg);

uint32_t
PL_DHashTableEnumerate(PLDHashTable *table, PLDHashEnumerator etor, void *arg);

PLDHashEntryHdr*
PL_DHashTableOperate(PLDHashTable* table, const void* key, PLDHashOperator op);

(PL_DHashTableOperate no longer exists in the tree due to other cleanup bugs; the above is approximately what it looked like at the end of 2014.)

There are several problems with the above slice of the API:

  • PL_DHashTableOperate(table, key, PL_DHASH_ADD) is a long way to spell what should have been named PL_DHashTableAdd(table, key)
  • There’s another problem with the above: it’s making a runtime decision (based on the value of op) about what should have been a compile-time decision: this particular call will always and forever be an add operation. We shouldn’t have the (admittedly small) runtime overhead of dispatching on op. It’s worth noting that compiling with LTO and a quality inliner will remove that runtime overhead, but we might as well structure the code so non-LTO compiles benefit and the code at callsites reads better.
  • Given the above definitions, you can say PL_DHashTableOperate(table, key, PL_DHASH_STOP) and nothing will complain. The PL_DHASH_NEXT and PL_DHASH_STOP values are really only for a function of type PLDHashEnumerator to return, but nothing about the above definition enforces that in any way. Similarly, you can return PL_DHASH_LOOKUP from a PLDHashEnumerator function, which is nonsensical.
  • The requirement to always return a PLDHashEntryHdr* from PL_DHashTableOperate means doing a PL_DHASH_REMOVE has to return something; it happens to return nullptr always, but it really should return void. In a similar fashion, PL_DHASH_LOOKUP always returns a non-nullptr pointer (!); one has to check PL_DHASH_ENTRY_IS_{FREE,BUSY} on the returned value. The typical style for an API like this would be to return nullptr if an entry for the given key didn’t exist, and a non-nullptr pointer if such an entry did. The free-ness or busy-ness of a given entry should be a property entirely internal to the hashtable implementation (it’s possible that some scenarios could be slightly more efficient with direct access to the busy-ness of an entry).

We might infer corresponding properties of a good API from each of the above issues:

  • Entry points for the API produce readable code.
  • The API doesn’t enforce unnecessary overhead.
  • The API makes it impossible to talk about nonsensical things.
  • It is always reasonably clear what return values from API functions describe.

Fixing the first two bulleted issues, above, was the subject of bug 1118024, done by Michael Pruett. Once that was done, we really didn’t need PL_DHashTableOperate, and removing PL_DHashTableOperate and related code was done in bug 1121202 and bug 1124920 by Michael Pruett and Nicholas Nethercote, respectively. Fixing the unusual return convention of PL_DHashTableLookup is being done in bug 1124973 by Nicholas Nethercote. Maybe once all this gets done, we can move away from C-style PL_DHashTable* functions to C++ methods on PLDHashTable itself!

Next time we’ll talk about the actual contents of a PL_DHashTable and how improvements have been made there, too.


08
Sep 14

xpcom and move constructors

Benjamin Smedberg recently announced that he was handing over XPCOM module ownership duties to me.  XPCOM contains basic data structures used throughout the Mozilla codebase, so changes to its code can have wide-ranging effects.  I’m honored to have been given responsibility for a core piece of the Gecko platform.

One issue that’s come up recently and I’m sure will continue to come up is changing XPCOM data structures to support two new C++11 features, rvalue references and their killer app, move constructors.  If you aren’t familiar with C++11’s new rvalue references feature, I highly recommend C++ Rvalue References Explained.  Move constructors are already being put to good use elsewhere in the codebase, notably mozilla::UniquePtr, which can be used to replace XPCOM’s nsAutoPtr and nsAutoRef (bug 1055035).  And some XPCOM data structures have received the move constructor treatment, notably nsRefPtr (bug 980753) and nsTArray (bug 982212).

A recent discussion and the associated bug, however, decided that the core referenced-counted smart pointer class in XPCOM, nsCOMPtr, shouldn’t support move constructors.  While move constructors could have replaced the already_AddRefed usage associated with nsCOMPtr, such as:

already_AddRefed<nsIMyInterface>
NS_DoSomething(...)
{
  nsCOMPtr<nsIMyInterface> interface = ...;
  // do some initialization stuff
  return interface.forget();
}

with the slightly shorter:

nsCOMPtr<nsIMyInterface>
NS_DoSomething(...)
{
  nsCOMPtr<nsIMyInterface> interface = ...;
  // do some initialization stuff
  return interface;
}

There were two primary arguments against move constructor support.  The first argument was that the explicitness of having to call .forget() on an nsCOMPtr (along with the explicitness of the already_AddRefed type), rather than returning it, is valuable for the code author, the patch reviewer, and subsequent readers of the code.  When dealing with ownership issues in C++, it pays to be more explicit, rather than less.  The second argument was that due to the implicit conversion of nsCOMPtr<T> to a bare T* pointer (a common pattern in smart pointer classes), returning nsCOMPtr<T> from functions makes it potentially easy to write buggy code:

// What really happens in the below piece of code is something like:
//
// nsIMyInterface* p;
// {
//   nsCOMPtr<nsIMyInterface> tmp(NS_DoSomething(...));
//   p = tmp.get();
// }
//
// which is bad if NS_DoSomething is returning the only ref to the object.
// p now points to deleted memory, which is a security risk.
nsIMyInterface* p = NS_DoSomething(...);

(I should note that we can return nsCOMPtr<T> from functions today, and in most cases, thanks to compiler optimizations, it will be as efficient as returning already_AddRefed.  But Gecko culture is such that a function returning nsCOMPtr<T> would be quite unusual, and therefore unlikely to pass code review.)

The changes to add move constructors to nsRefPtr and nsTArray?  They were reviewed by me.  And the nixing of move constructors for nsCOMPtr?  That was also done by me (with a lot of feedback from other people).

I accept the charge of inconsistency.  However, I offer the following defense.  In the case of nsTArray, there are no ownership issues like there are with nsCOMPtr: you either own the array, or you don’t, so many of the issues raised about nsCOMPtr don’t apply in that case.

For the case of nsRefPtr, it is true that I didn’t seek out as much input from other people before approving the patch.  But the nsRefPtr patch was also done without the explicit goal of removing already_AddRefed from the code base, which made it both smaller in scope and more palatable.  Also, my hunch is that nsRefPtr is used somewhat less than nsCOMPtr (although this may be changing somewhat given other improvements in the codebase, like WebIDL), and so it provides an interesting testbed for whether move constructors and/or less explicit transfers of ownership are as much of a problem as argued above.