22 Aug 12

humane practices for review requests

I dislike seeing “review not granted” or “review cancelled” in my bugmail.

Even if the reviewer provides helpful comments and points out things that you missed, those few words are the first thing you see about your patch.  Part of me understands that these headlines are informative messages, that they are not judgements on me as a person, merely some sort of judgement on the patch that I have written.  But the language doesn’t encourage that perspective.  Permissions are not granted, entrance is not granted, your favorite show on television is cancelled, and so forth.  These are not heartening words: they are words that shut you out, that indicate your contribution was not desirable.

I took a mathematics class in college where individual problems on homework sets were scored out of a maximum of three points.  Half points were virtually unheard of.  In the usual grade/percentage terms, the scores for a problem looked roughly like this: A, D, F, F (3/3 is 100%, but 2/3 is only 67%, and it goes downhill from there).  Not encouraging.  One day, the professor announced that there had been some grumbling from students about this scale.  He then described a colleague’s grading scale for problems in a class of similar difficulty: check-plus, check, check-minus, and zero.  “Now it is trivial to show that these two scales are isomorphic to one another,” he said, smiling.  “But I don’t hear any [grumbling] from Dr. L’s students about his grading scale!”  The message was clear: the scale isn’t that bad, just get over it.[1][2]

However, it matters how you say things, not just what you say.  Some organizations use a plus/delta feedback framework instead of a plus/minus framework.  That is, instead of dividing things into good and bad, you divide things into good and “changes needed”.  Phrasing things as “deltas” nudges people to provide actionable items; it’s the difference between saying “it sucked that the meeting ran half an hour over” versus “the meeting should stay within the time allotted”.

With all this in mind, if I get r?’d on a patch, I will provide one of three responses (within 24 hours, natch):

  • r+ the patch.  Minor changes may be requested; whether I ask to see the patch again depends on how confident I am in the author and/or how trivial the changes are.  For instance, syntax changes and so forth don’t usually require a re-review.  Documentation changes often do.
  • f+ the patch.  As I understand it, this shows up in bugmail as “feedback granted”, not “review cancelled” (props to the Bugzilla team for doing this!).  Doing this lets the patch author know that they are making progress and that their contribution is appreciated (you took the time to provide feedback), but there are still some changes to be made.
    If at all possible, once the author makes the changes you have requested, r+ the patch.  Your feedback can be thought of as a contract with the patch author: you do these things and the patch is good enough.  Don’t change the terms of the contract.  File a followup bug, if need be.
  • Cancel the review.  I don’t like doing this due to reasons given above, but sometimes you have to, e.g. the wrong patch is provided.  The following scenario is also, I think, reasonable grounds for cancellation, provided you make it clear:
    • A patch depends on prior patches in the same bug;
    • You have reviewed those prior patches and requested significant changes;
    • Those changes would require equally significant changes to the patch in question, thereby rendering any review that you would do moot.

[1] This class was Real Analysis.  Real Analysis is often graded on a very…generous curve, owing to the difficulty of the material and the fact that the course is typically students’ first introduction to rigorous mathematical proof.  So your mental model of what a “good” grade is requires some adjustment anyway.  But still.

[2] I was taking Dr. L’s course simultaneously with Reals.  I can attest that his grading scale did not bother me; indeed, I don’t think I recognized the isomorphism until it was pointed out.


17 Aug 12

moving away from x-macros

In any large codebase (and even many small ones), there comes a point where some knowledge about an entity X is needed at various points throughout the codebase, but the knowledge required at each point is ever-so-slightly different.  You could leave comments at each such point listing the other locations that need to be updated in tandem, but that gets tedious quickly; it’s also error-prone, since it’s easy to forget an update.  One technique for handling this in Mozilla’s C++ codebase is known as X-macros: a header file filled with calls to some macro X:

X(name1, data1, ...)
X(name2, data2, ...)
X(name3, data3, ...)
...

This header is then included at various points.  At each point, X is defined to extract whatever data is necessary:

enum ID {
#define X(name, data, size) name,   // note the trailing comma
#include "Xmacro_header.h"
#undef X
};
...
static const uint32_t sizes[] = {
#define X(name, data, size) size,
#include "Xmacro_header.h"
#undef X
};
...
void method(ID id, ...)
{
  // The enum value doubles as an index into the parallel array.
  uint32_t size = sizes[id];
  ...
}

A prime example of this is the header content/events/public/nsEventNameList.h, which gets included at various points for generating method declarations, method definitions, and name tables; we use a similar header, toolkit/components/telemetry/TelemetryHistograms.h, for defining Telemetry histograms and their associated IDs.

But we are moving away from this technique in Telemetry: we’re soon going to define our histograms using JSON and generate the necessary information from that JSON with Python scripts, rather than relying on the C preprocessor.
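
To make the shape of this concrete, here is a minimal sketch of what such a pipeline could look like.  Everything in it (the file contents, the histogram names, the field names, and the generator itself) is a hypothetical illustration, not the actual Telemetry format:

import json

# Hypothetical definitions; the real JSON format may differ entirely.
HISTOGRAMS_JSON = """
{
  "CACHE_LOCK_WAIT_MS":     {"kind": "linear", "low": 1, "high": 10000, "n_buckets": 50},
  "STARTUP_CRASH_DETECTED": {"kind": "boolean"}
}
"""

def emit_histogram_header(definitions, out):
    # Generate the C++ enum and parallel data arrays that the X-macro
    # header used to provide.
    names = list(definitions)
    out.write("enum ID {\n")
    for name in names:
        out.write("  %s,\n" % name)
    out.write("};\n\n")
    out.write("static const uint32_t sizes[] = {\n")
    for name in names:
        out.write("  %d,\n" % definitions[name].get("n_buckets", 2))
    out.write("};\n")

if __name__ == "__main__":
    import sys
    emit_histogram_header(json.loads(HISTOGRAMS_JSON), sys.stdout)

The property that made X-macros attractive survives: the enum and the parallel arrays still come from a single source of truth, so they can’t drift out of sync; the build simply runs the script to regenerate the header.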

The main motivation for doing this is validation: we don’t have anything on the server side of Telemetry that defines schemas for Telemetry data.  The client knows what the bucket ranges for individual histograms are, for instance, but the server doesn’t.  So the server has to accept all kinds of bogus data, whereas it could reject that data if some sort of schema for the data were provided.  And generating that information is (relatively) more easily done from a JSON definition than from a C preprocessor definition.
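
For example, the same definitions could drive a server-side check that rejects submissions whose shape doesn’t match.  This is only a sketch: the payload shape, field names, and helper below are assumptions for illustration, not the real Telemetry wire format.

def validate_submission(definitions, name, bucket_counts):
    # Reject data for histograms the schema doesn't know about
    # (illustrative only).
    definition = definitions.get(name)
    if definition is None:
        return "unknown histogram: %s" % name
    # A submission must have exactly as many buckets as the schema
    # says, and counts can't be negative.
    expected = definition.get("n_buckets", 2)
    if len(bucket_counts) != expected:
        return "%s: expected %d buckets, got %d" % (name, expected, len(bucket_counts))
    if any(count < 0 for count in bucket_counts):
        return "%s: negative bucket count" % name
    return None  # data looks sane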

Some forms of client-side validation are eased by this process as well.  For example, there’s a quirk in how “linear” histograms are defined that makes it easy to lose data when trying to capture data that runs from 1 to N.  There are ways around this, but they don’t always get used properly.  With the histograms defined in JSON, we can make this sort of “enumerated” histogram a first-class citizen and error out on misdefinitions of “linear” histograms for capturing enumerated data.
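
The generator is also a natural place to hang that check.  My understanding of the quirk is that capturing values 1 to N with a linear histogram takes N+1 buckets, because the topmost bucket absorbs everything at or above the maximum; the expansion rule and heuristic below are my reading of it, with hypothetical field names as before.

def expand_histogram(name, definition):
    # Lower an "enumerated" histogram into linear parameters: one
    # bucket per value, plus one more so the top value isn't folded
    # into the overflow bucket.
    if definition["kind"] == "enumerated":
        n = definition["n_values"]
        return {"kind": "linear", "low": 1, "high": n, "n_buckets": n + 1}
    if definition["kind"] == "linear" and definition["n_buckets"] == definition["high"]:
        # One bucket per value with no room for overflow is the classic
        # misdefinition; the author almost certainly wanted an
        # enumerated histogram.
        raise ValueError('%s: use "kind": "enumerated" for enumerated data' % name)
    return definition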

Other future enhancements are also easier to do when you’re not limited by the C preprocessor, like eliminating relocations (and generally making the histogram information take up less space in the binary), automating field trials, providing labels for individual buckets, and so forth.  There are some downsides, notably that defining related histograms can’t be done with a little #define trickery, but the benefits more than make up for it.