moving away from x-macros

In any large codebase (and even many small ones), there comes a point where some knowledge about an entity X is needed at various points throughout the codebase, but the knowledge required at each point is ever-so-slightly different.  You could leave comments at each point listing the other locations that need to be updated when that point changes, but that gets tedious quickly; it’s also error-prone, since it’s easy to forget one of the updates.  One technique for handling this in Mozilla’s C++ codebase is known as X-macros: a header file filled with calls to some macro X:

X(name1, data1, ...)
X(name2, data2, ...)
X(name3, data3, ...)
...

This header is then included at various points.  At each point, X is defined to extract whatever data is necessary:

enum ID {
#define X(name, data, size) name,
#include "Xmacro_header.h"
#undef X
};
...
static const uint32_t sizes[] = {
#define X(name, data, size) size,
#include "Xmacro_header.h"
#undef X
};
...
void method(enum ID id, ...)
{
  uint32_t size = sizes[id];
  ...
}

A prime example of this is the header content/events/public/nsEventNameList.h, which gets included at various points to generate method declarations, method definitions, and name tables; we use a similar header, toolkit/components/telemetry/TelemetryHistograms.h, to define Telemetry histograms and the IDs those histograms are identified by.

But we are moving away from this technique in Telemetry: we’re soon going to define our histograms using JSON and generate the necessary information from that JSON with Python scripts, rather than relying on the C preprocessor.
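
To give a rough idea of what that looks like, here’s a minimal sketch of JSON-driven generation.  The field names, histogram names, and generation script below are illustrative only; they’re not the actual Telemetry definitions or build machinery.

import json

# Hypothetical histogram definitions; the real field names may differ.
definitions = json.loads("""
{
  "EXAMPLE_TIMER_MS": { "kind": "exponential", "low": 1, "high": 10000,
                        "n_buckets": 50, "description": "Time spent doing X (ms)" },
  "EXAMPLE_FLAG":     { "kind": "boolean", "n_buckets": 2,
                        "description": "Whether X happened" }
}
""")

def emit_header(defs):
    # Emit the same enum and sizes table the X-macro expansion produced above.
    lines = ["enum ID {"]
    for name in defs:
        lines.append("  %s," % name)
    lines += ["};", "", "static const uint32_t sizes[] = {"]
    for name, info in defs.items():
        lines.append("  %d, // %s" % (info["n_buckets"], name))
    lines += ["};"]
    return "\n".join(lines)

print(emit_header(definitions))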

The main motivation for doing this is validation: we don’t have anything on the server side of Telemetry that defines schemas for Telemetry data.  The client knows the bucket ranges for individual histograms, for instance, but the server doesn’t.  So the server has to accept all kinds of bogus data, when it could reject that data if some sort of schema were provided.  And generating that information is much easier to do from a JSON definition than from a C preprocessor definition.
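
For instance, a schema-generation step could compute the expected bucket boundaries for each histogram and let the server reject submissions that don’t match.  The sketch below assumes a simplified linear bucketing formula; it is not necessarily the algorithm Telemetry actually uses.

def linear_buckets(low, high, n_buckets):
    # Simplified linear bucketing: a 0 bucket, then evenly spaced boundaries
    # from low up to high, n_buckets boundaries in total.
    step = (high - low) / float(n_buckets - 2)
    return [0] + [int(low + step * i) for i in range(n_buckets - 1)]

definition = {"kind": "linear", "low": 1, "high": 100, "n_buckets": 10}
schema = set(linear_buckets(definition["low"], definition["high"],
                            definition["n_buckets"]))

def validate(reported_buckets):
    # The server can reject any submission that uses bucket boundaries
    # it doesn't know about.
    return all(b in schema for b in reported_buckets)

print(validate([0, 1, 100]))   # True
print(validate([0, 7, 100]))   # False: 7 is not a boundary this histogram can produce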

Some forms of client-side validation are eased by this process as well.  For example, there’s a quirk in how “linear” histograms are defined that makes it easy to lose data when trying to capture data that runs from 1 to N.  There are ways around this, but they don’t always get used properly.  With the histograms defined in JSON, we can make this sort of “enumerated” histogram a first-class citizen and error out on misdefinitions of “linear” histograms used to capture enumerated data.
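
As a sketch of what that check might look like (the field names and the exact rule here are assumptions, not Telemetry’s actual checks), a JSON-based parser can refuse definitions that try to express enumerated data as a hand-rolled “linear” histogram:

def check_definition(name, definition):
    kind = definition.get("kind")
    if kind == "enumerated":
        # First-class enumerated histograms: the definition only states how
        # many values there are; the bucket layout is derived automatically.
        if "n_values" not in definition:
            raise ValueError("%s: enumerated histograms require n_values" % name)
    elif kind == "linear":
        # Heuristic: a linear histogram starting at 1 with roughly one bucket
        # per value looks like an attempt to capture enumerated data, which is
        # exactly the case where the quirk above loses data.
        if definition.get("low", 1) == 1 and definition["n_buckets"] >= definition["high"]:
            raise ValueError("%s: use kind 'enumerated' for enumerated data" % name)

try:
    check_definition("EXAMPLE_STATE",
                     {"kind": "linear", "low": 1, "high": 5, "n_buckets": 6})
except ValueError as e:
    print(e)   # EXAMPLE_STATE: use kind 'enumerated' for enumerated data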

Other future enhancements are also easier to do when you’re not limited by the C preprocessor, like eliminating relocations (and generally making the histogram information take up less space in the binary), automating field trials, providing labels for individual buckets, and so forth.  There are some downsides, notably that defining related histograms can’t be done with a little #define trickery, but the benefits more than make up for it.
