A few notes on string APIs

1. For my own comprehension, I used Dehydra GCC + Graphviz to print a class hierarchy diagram for the Mozilla string classes (i.e., everything defined in xpcom/string). The results are here.

I was a little surprised that nsSubstring_base is the root, because in nsTAString.h, the root is nsTAString_CharT. It turns out that name is #defined to nsSubstring_base in string-template-def-unichar.h. That particular bit of confusion should go away in Mozilla 2.0, because it was apparently created for a legacy API that is already disabled.

2. All my staring at Mozilla string code has created in me the crazy idea of replacing the current string classes with templates as part of Mozilla 2.0 codebase modernization. Here’s a proposal for your consideration and comments:

As shown in the diagram, there are two parallel string class hierarchies: the nsAString versions for (16-bit) PRUnichar strings, and nsACString versions for (8-bit) char strings. This is the kind of thing a C++ programmer would expect to be done in templates. Current trunk does something completely different, for good reasons it turns out, but I think C++ and its compilers are now good enough to replace it.

The History. The something different works like this: There is a single implementation of the hierarchy in nsTSubstring{.h,.cpp}. (T is for “template”, or “template-like-thing”, I think.) Then, nsSubstring{.h,.cpp} have patterns like this:

// declare nsSubstring
#include "string-template-def-unichar.h"
#include "nsTSubstring.h"
#include "string-template-undef.h"

// declare nsCSubstring
#include "string-template-def-char.h"
#include "nsTSubstring.h"
#include "string-template-undef.h"

string-template-def-unichar.h #defines the “type names” (such as charT) in nsTSubstring.h to the PRUnichar/nsAString versions, while string-template-def-char.h #defines the same names to the char/nsACString versions. This renaming provides the effect of templates.

The reason for doing it this way instead of using templates, I’m told, is that string-template-def*.h also #defines either CharT_is_PRUnichar or CharT_is_char, and then nsTSubstring selects code accordingly using #ifdef. For example, nsTSubstring_CharT::AssignASCII (usually known as nsAString::AssignASCII) has completely different implementations for the two character types. And that’s not the kind of thing you get with the commonly-known template container pattern, which just substitutes different types into the same implementation.

A Brave New World. But the C++ people have come up with a new way to do this: C++ traits. The concept seems to be very general, but for Mozilla strings, I can explain it this way: Traits allow a template class C<T> to select variant code and data for different types T. And it’s done by creating a traits template class Traits<T> with partial specialization to create the variants. The article linked above explains the details pretty well, so I’m not going to duplicate the author’s work.
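To make the traits idea concrete, here is a minimal sketch of how it could look. This is not the actual Mozilla code or my prototype: the names StringTraits, String, Char16 (a stand-in for PRUnichar) and TraitsExample are all made up for illustration, and the storage is just a std::vector to keep things short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for PRUnichar (a 16-bit code unit).
typedef unsigned short Char16;

// Traits template: declared but not defined, so only the
// specialized character types below can be used.
template <typename CharT> struct StringTraits;

// Variant for 8-bit strings: ASCII bytes are copied in bulk.
template <> struct StringTraits<char> {
    static void AppendASCII(std::vector<char>& dest,
                            const char* src, std::size_t n) {
        dest.insert(dest.end(), src, src + n);
    }
};

// Variant for 16-bit strings: each ASCII byte is widened
// to one 16-bit code unit.
template <> struct StringTraits<Char16> {
    static void AppendASCII(std::vector<Char16>& dest,
                            const char* src, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            dest.push_back(
                static_cast<Char16>(static_cast<unsigned char>(src[i])));
        }
    }
};

// One string template; the type-specific work is delegated
// to the traits class selected by CharT.
template <typename CharT>
class String {
public:
    void AppendASCII(const char* src, std::size_t n) {
        StringTraits<CharT>::AppendASCII(mData, src, n);
    }
    std::size_t Length() const { return mData.size(); }
    CharT At(std::size_t i) const { return mData[i]; }
private:
    std::vector<CharT> mData;
};

// Usage: same class template, different code per instantiation.
inline void TraitsExample() {
    String<char> narrow;
    narrow.AppendASCII("abc", 3);
    String<Char16> wide;
    wide.AppendASCII("abc", 3);
    assert(narrow.Length() == 3);
    assert(wide.Length() == 3);
    assert(wide.At(2) == 'c');
}
```

The point of the sketch is that String<CharT> itself contains no #ifdef: everything that differs between the two character types lives in the StringTraits specializations.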

To test this out, I wrote a simple template string class with the key feature of the Mozilla strings, namely an AppendASCII method that is just completely different for the variants. And it works perfectly on MacOS 10.5 g++ 4.0.1. It even optimizes nicely: the compiler was able to inline everything, so I’m pretty sure there is no performance cost for the method delegation.

I’m sure a template guru could improve my code. And I know there are other options. One interesting idea is for the traits class to have a boolean-valued constant CharT_is_char (probably implemented as an enum). The code in String<CharT> could then use if statements to branch on that constant. In each instantiation of the template, the branch condition is a compile-time constant, and any decent compiler, including all the ones we use, should fold the branch away. The resulting code would have the same form as the current version, without all the cpp aftertaste.
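Here is a rough sketch of that boolean-constant variant, again with invented names (CharKind, BranchString, Char16 as a PRUnichar stand-in, BranchExample); only CharT_is_char comes from the existing code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for PRUnichar (a 16-bit code unit).
typedef unsigned short Char16;

// Traits class that only carries a compile-time flag.
template <typename CharT> struct CharKind;
template <> struct CharKind<char>   { enum { CharT_is_char = 1 }; };
template <> struct CharKind<Char16> { enum { CharT_is_char = 0 }; };

template <typename CharT>
class BranchString {
public:
    void AppendASCII(const char* src, std::size_t n) {
        // The condition is constant within each instantiation,
        // so the compiler can drop the branch that is not taken.
        if (CharKind<CharT>::CharT_is_char) {
            // 8-bit case: bulk copy of the ASCII bytes.
            mData.insert(mData.end(), src, src + n);
        } else {
            // 16-bit case: widen each ASCII byte to one code unit.
            for (std::size_t i = 0; i < n; ++i) {
                mData.push_back(
                    static_cast<CharT>(static_cast<unsigned char>(src[i])));
            }
        }
    }
    std::size_t Length() const { return mData.size(); }
    CharT At(std::size_t i) const { return mData[i]; }
private:
    std::vector<CharT> mData;
};

// Usage: both instantiations go through the same method body.
inline void BranchExample() {
    BranchString<char> narrow;
    narrow.AppendASCII("xyz", 3);
    BranchString<Char16> wide;
    wide.AppendASCII("xyz", 3);
    assert(narrow.Length() == 3);
    assert(wide.Length() == 3);
    assert(wide.At(0) == 'x');
}
```

One difference from the #ifdef version is worth keeping in mind: with a plain if, both branches still have to compile for every CharT, even though only one of them ever runs. Code that is only valid for one character type would need to go into a traits specialization instead.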

The only potential gotcha I see is whether all of our compilers accept traits code and the partial specialization it requires. I’d bet they do; this stuff has been around for a few years at this point.


Comment from Brendan Eich
Time: February 28, 2008, 11:06 pm

The approach you prototyped looks good to me. Thanks for blogging about this, it’s good stuff.


Comment from Robert O’Callahan
Time: February 29, 2008, 3:24 am

We could do that, but I’d rather simplify things even more by just using UTF8 for everything :-)

Comment from Håkan W
Time: March 2, 2008, 4:13 pm

Are we gonna move towards a string API that has its backend in the STL, or do we have good reason for reimplementing (?) most of this stuff, even in the Mozilla 2 world?