01 Feb 13

gcc version comparison, part 1/n: libxul sizes

Some questions on #perf this morning got me wondering about how different versions of GCC compared in terms of the size of libxul.  (I have a lot of versions of GCC lying about for a sekret performance comparison project, so merely comparing sizes was pretty straightforward.)

The GCC versions I used are listed in the table below.  The Mozilla sources were from mozilla-central r120478, compiled with --disable-debug --disable-debug-symbols --enable-optimize.  The target was x86-64 Linux, the build did not use PGO, and the system linker, GNU ld 2.20.1, was used.  Here's what the size command has to say about libxul in each case; all sizes are in bytes:

GCC version | Text size | Data size | Bss size | .text section size | .eh_frame section size
4.4.7       | 39120354  | 3410456   | 1611420  | 22969414           | 4212924
4.5.4       | 44833935  | 3791400   | 1625996  | 23449960           | 7481052
4.6.3       | 42819600  | 3774272   | 1625996  | 22970408           | 6467652
4.7.2       | 42103108  | 3769576   | 1631244  | 22297992           | 6519596
4.8 HEAD    | 39638390  | 3415424   | 1617260  | 21300806           | 6209220

The terms “text”, “data”, and “bss” aren’t just the similarly-named sections in binaries. “text” encompasses all code and read-only sections, so things like string constants (!) would be included in this number. “data” is everything non-constant that’s stored on disk: tables of function pointers, tables of non-constant data, and so forth. “bss” is everything that’s initialized to zero and can therefore be allocated by the system at program run time. I’ve provided the .text section sizes as a more useful (IMHO) number for the purposes of this comparison.
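To make this concrete, here's a small example of my own (not from the Mozilla tree) showing where typical globals land; size counts the first two definitions under "text", the next two under "data", and the last under "bss":

// sections.cpp -- compile with: g++ -c sections.cpp
// then inspect the totals with: size sections.o
#include <stdint.h>

int32_t add(int32_t a, int32_t b) { return a + b; }   // code: .text

extern const char kGreeting[] = "hello";   // read-only data: .rodata, counted as "text"

int32_t gCounter = 42;                         // initialized, non-constant: .data
int32_t (*gFuncPtr)(int32_t, int32_t) = add;   // function pointer: .data, needs a relocation

int32_t gScratch[1024];   // zero-initialized: .bss, allocated by the system at run time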

If you look at just the size of the .text section, that is, actual compiled code, there's not much variation between the compiler versions. 4.5 is the outlier here, with a ~2% increase over 4.4, but 4.6 is back to 4.4's code size and 4.8 is smaller still. Whether (and how) these differences in code size translate into differences in performance will have to wait for another blog post. So why does size report such a huge jump in "text" size for 4.5 through 4.7?

The .eh_frame section sizes help explain this increase. The corresponding .eh_frame_hdr sections show similar percentage-wise increases, but the absolute increases are somewhat smaller there, so I opted to not show data for those. GCC 4.5 started emitting unwind data for function epilogues and does so unconditionally whenever unwind data is emitted. This data is not needed for normal operation, since you never have to unwind the stack from epilogues. However, for unwinding the stack from arbitrary points in the program (e.g. as a sampling profiler similar to oprofile or perf might do), such data is absolutely necessary. (You could fake it by parsing the instruction stream, but that gets very messy very quickly. Been there, done that, don’t want to do it again.) So, extra unwind data leads to bigger section sizes.  No surprises there.

Before getting to other sources of the “text” size increase, we need to examine another interesting statistic: the data size increase seen in 4.5-4.7.  Why should different versions of the compiler differ so much in an essentially static figure?  I generated easy-to-compare lists of symbols from each version:

# keep defined ($7), non-weak ($5) data symbols ($4), then print
# "size name" ($3, $8) and sort by size, largest first
readelf --syms -W build-mozilla-gcc-${version}/dist/bin/libxul.so \
  | gawk '$4 == "OBJECT" && $7 != "UND" && $5 != "WEAK" {printf("% 6d %s\n", $3, $8); }' \
  | sort -n -k 1 -r > gcc-${version}-all-syms.txt

and diff'd the 4.4 and the 4.5 versions. (Comparing 4.4 to 4.6 or 4.7 provides roughly the same data, and starting from a base of 4.8 provides the same information in the reverse direction.) While there were a few instances of user-specified variables that the compiler didn't eliminate, the bulk of the hunks of the diff looked like this:

@@ -721,6 +791,10 @@
   1080 vtable for nsPrintSettings
   1080 keywords
   1080 sip_tcp_conn_tab
+  1072 vtable for js::ion::LIRGeneratorX86Shared
+  1072 vtable for js::ion::MInstructionVisitor
+  1072 vtable for js::ion::LIRGeneratorShared
+  1072 vtable for js::ion::LIRGeneratorX64
   1072 vtable for js::ion::LIRGenerator
   1072 vtable for nsBox
   1064 vtable for nsBaseWidget

or:

@@ -842,6 +934,10 @@
    864 vtable for imgRequestProxy
    864 mozilla::dom::NodeBinding::sAttributes_specs
    864 g_sip_table
+   856 vtable for nsIDOMSVGTextPositioningElement
+   856 vtable for nsIDOMSVGFECompositeElement
+   856 vtable for nsIDOMSVGTSpanElement
+   856 vtable for nsIDOMSVGTextElement
    855 sLayerMaskVS
    848 vtable for nsJARURI
    848 vtable for nsXMLHttpRequestUpload

And if you add up all the sizes of the vtables we’re now retaining:

# each added line looks like "+  1072 vtable for ...", so $2 is the size
diff -u gcc-44-all-syms.txt gcc-45-all-syms.txt \
  | c++filt | egrep '^\+' \
  | grep vtable | awk '{ sum += $2 } END { print sum }'

you get a total of about 325K, which accounts for a good chunk of the 375K difference between GCC 4.4’s generated data and GCC 4.5’s generated data.

How did GCC 4.4 make the vtables go away? [UPDATE: There's a simple explanation for what happened here.] I haven't analyzed the code, but I can see two possibilities. The first is that the compiler devirtualizes all the function calls associated with those classes and can tell that instances of the classes never escape outside of the library; if you don't have virtual function calls, you don't need a vtable. The second is that the compiler can see that instances of the associated classes are never created, which is probably what happened for all the nsIDOM* vtables in the example hunk above. Either way, the vtables are never referenced, so they get discarded at link time or are never generated for any compilation unit in the first place. Whether these suspicions are correct or there's some other mechanism at work, the key point is that 4.5-4.7 lost the ability to do this in some (all?) cases and dramatically increased data sizes as a result.
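Here's a toy example of my own (not reduced from Gecko) of the first possibility; whether the vtable actually disappears is compiler-dependent, which is exactly the kind of behavior that seems to have regressed in 4.5:

// toy.cpp -- compile with: g++ -O2 -fPIC -shared -o libtoy.so toy.cpp
// then look for the vtable with: nm -C libtoy.so | grep vtable
struct Visitor {
  virtual int visit(int x) { return x + 1; }
};

// The only instance is local and never escapes, so the call below can be
// devirtualized (and inlined); once no virtual calls remain, the vtable
// and its relocations are dead weight the toolchain can drop.
int run(int x) {
  Visitor v;
  return v.visit(x);
}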

Also, since 4.5-4.7 are generating spurious vtables, there are a lot of unnecessary relocations associated with those tables: the values of the function pointers in the vtables can't be known until the binary is loaded, so relocations are necessary. This increase in relocations can be partially seen in the "text" numbers in the table above (relocation sections are read-only data, so size counts them as "text"). Going from 4.4 to 4.5 added about 1MB of relocation data; 4.8 benefits from eliminating the need for those extra relocations.

Between the changes in .text section size, the extra relocations, and the extra .eh_frame information, we’ve accounted for a good chunk of the fluctuations seen in the “text” and “data” numbers between compiler versions.  There’s other nickel-and-dime stuff that accounts for the remainder of the fluctuations, but I’m not going to cover those bits here. This post is already long enough! Ideally, the next post will have some Talos performance comparisons.


22 Jan 13

analyzing linker max vsize

mozilla-inbound is currently approval-only due to issues with Windows PGO builds.  The short explanation is that we turn on aggressive code optimization for our Windows builds, and this aggressive optimization causes the linker that comes with Visual Studio to run out of virtual memory.  The current situation is especially problematic because we can't increase the amount of virtual memory available to the linker (unlike last time, when we "just" moved the builds to 64-bit machines).

We don’t really have a good handle on what causes these issues (other than the obvious “more code”), but at least we are tracking the linker’s vsize and we’ll soon have pretty pictures of the same.  We hadn’t expected to have to deal with this problem for several more months.  The graph below helps explain why we’re hitting this problem a little sooner than before.  The data for this graph was taken from the Windows nightly build logs.

Win32 Linker max vsize

Notice the massive spike in October, as well as the ~100MB worth of growth in early January.  While the data is not especially fine-grained (nightly builds can include tens of changesets, and we’d really like information on the vsize growth on a per-changeset basis), looking at the biggest increases over the last ten months might prove helpful.  There have been ~300 nightly builds since we started recording data; below is a list of the top 20 daily increases in linker max vsize.  The date in the table is the date the nightly build was done; the newly-included changeset range is linked to for your perusal.

Nightly build date | vsize increase (MB)
2012-05-18 | 282.363281
2012-10-06 | 103.609375
2012-10-08 |  90.769531
2013-01-10 |  49.699219
2012-06-02 |  49.199219
2012-10-19 |  32.976562
2012-12-25 |  32.332031
2013-01-06 |  32.015625
2013-01-20 |  30.144531
2013-01-22 |  27.222656
2012-10-04 |  19.273438
2012-05-10 |  18.234375
2012-11-23 |  17.937500
2012-08-03 |  17.738281
2013-01-07 |  17.671875
2012-09-08 |  17.386719
2012-12-23 |  17.269531
2012-12-27 |  17.156250
2012-11-11 |  17.085938
2012-12-06 |  17.003906

Mike Hommey suggested that trying to divine the whys and hows of extra memory usage would be a fruitless endeavor. Looking at the above pushlogs, I am inclined to agree with him. There’s nothing in any of them that jumps out. I didn’t try clicking through to individual changesets to figure out what might have added large chunks of code, though.


14 Jan 13

64-bit multiplication pitfalls

I’ve seen several instances of code recently that look something like this:

void madd(int64_t *sum, int32_t x, int32_t y)
{
  *sum += x * y;
}

Or this:

void func(int64_t);
...
  int32_t x, y = ...;
  ...
  func(x * y);

Unfortunately, neither of these cases does what the programmer intended.  The intended result was to compute the full 64-bit product of two 32-bit numbers.  What gets computed instead is the lower 32 bits of the desired product, sign-extended to 64 bits, which is quite different! The assembly produced by x86-64 GCC at -O2 for the first example looks like:

    imull   %edx, %esi    # int32_t multmp = x * y
    movslq  %esi, %rsi    # int64_t exttmp = static_cast<int64_t>(multmp)
    addq    %rsi, (%rdi)  # *sum += exttmp
    ret

If the full 64-bit product is desired, one of the arguments needs to be cast to a 64-bit value first.

void madd(int64_t *sum, int32_t x, int32_t y)
{
  *sum += static_cast<int64_t>(x) * y;
}

(The standard-ese for this is that operands are automatically promoted based on the types of the operands, not on the type of the result.  Integers smaller than int are promoted to int, which is what you want most of the time. Of course, here we’re dealing with things that are already int-sized[*], so we have to explicitly ask for promotion.)

With the cast in place, GCC produces the desired:

    movslq  %esi, %rsi    # int64_t xtmp = static_cast<int64_t>(x)
    movslq  %edx, %rdx    # int64_t ytmp = static_cast<int64_t>(y)
    imulq   %rdx, %rsi    # int64_t multmp = xtmp * ytmp
    addq    %rsi, (%rdi)  # *sum += multmp
    ret
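A quick standalone demonstration of my own (not from the code I ran across) of just how different the two results are:

#include <stdint.h>
#include <stdio.h>

int main()
{
  int32_t x = 100000, y = 100000;               // true product is 10^10, too big for 32 bits
  int64_t wrong = x * y;                        // 32-bit multiply, then sign-extend
  int64_t right = static_cast<int64_t>(x) * y;  // full 64-bit multiply
  printf("wrong = %lld\n", (long long) wrong);  // prints 1410065408 (10^10 mod 2^32)
  printf("right = %lld\n", (long long) right);  // prints 10000000000
  return 0;
}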

The above examples are semi-obvious instances, but similar problems occur when dealing with types whose sizes are not specified. Consider replacing int64_t with off_t and int32_t with size_t in the example above. While such code will mostly work (most files are well under 2GB or 4GB in size), off_t and size_t do not need to be the same size: try printing sizeof(off_t) with and without -D_FILE_OFFSET_BITS=64 on your favorite 32-bit Linux sometime.
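A minimal way to see this for yourself (assuming you have a 32-bit Linux toolchain handy):

// offsizes.cpp -- try: g++ -m32 offsizes.cpp && ./a.out
//           then: g++ -m32 -D_FILE_OFFSET_BITS=64 offsizes.cpp && ./a.out
#include <sys/types.h>
#include <stdio.h>

int main()
{
  // With -D_FILE_OFFSET_BITS=64, off_t grows to 8 bytes while size_t stays 4,
  // so a size_t * size_t product silently truncates before reaching an off_t.
  printf("sizeof(off_t)  = %zu\n", sizeof(off_t));
  printf("sizeof(size_t) = %zu\n", sizeof(size_t));
  return 0;
}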

[*] Assuming we’re on a fairly standard 32-bit or 64-bit machine, of course.


01 Nov 12

running js tests faster

I burned the tree today.  To verify my proposed fix, I had to run make check, which I don't normally do.  Excuse me, make -srj20 check, because that's what my development machine can handle.  Which is all fine until I got to check-jit-test in js/src.  Which requires running nearly 30 thousand tests.  (N tests x 10 different JS engine option combinations starts to add up rather quickly…)

In serial.  A load average of 1 has never looked so pitiful.

Surely there must be a better way.  While my machine churned through these tests, one at a time, I had a chance to take a look around, and found bug 686240.  Said bug is mostly about reducing overhead by cutting down the number of processes spawned, but it does give a nod to running the tests in parallel.  But I wanted to improve the way the test harness runs first; fixing the innards of the test harness on top of that would just be gravy.

I considered modifying jit_test.py to handle spawning multiple processes and managing them.  But that’s just silly; we already have a tool for doing that: it’s called make.  So let’s use it:

diff --git a/js/src/Makefile.in b/js/src/Makefile.in
index 31dc5fb..7264cc5 100644
--- a/js/src/Makefile.in
+++ b/js/src/Makefile.in
@@ -602,7 +602,7 @@ endif
 endif

 check-jit-test::
-	$(wildcard $(RUN_TEST_PROGRAM)) $(PYTHON) -u $(srcdir)/jit-test/jit_test.py \
+	+$(wildcard $(RUN_TEST_PROGRAM)) $(PYTHON) -u $(srcdir)/jit-test/jit_test.py \
 	        --no-slow --no-progress --tinderbox --tbpl $(JITTEST_VALGRIND_FLAG) \
 	        $(DIST)/bin/js$(BIN_SUFFIX)

diff --git a/js/src/jit-test/jit_test.py b/js/src/jit-test/jit_test.py
index 7690548..e7d323b 100755
--- a/js/src/jit-test/jit_test.py
+++ b/js/src/jit-test/jit_test.py
@@ -502,6 +502,33 @@ def main(argv):
     if not OPTIONS.run_slow:
         test_list = [ _ for _ in test_list if not _.slow ]

+    if 'JS_PARALLELIZE' in os.environ and len(test_list) > 1:
+        makefile_path = tmppath('jsmakefile')
+        with open(makefile_path, 'w') as f:
+            makefile_targets = []
+            for (i, test) in enumerate(test_list):
+                target = "test%d" % i
+                subcmd = [sys.executable, sys.argv[0]]
+                if OPTIONS.tinderbox:
+                    subcmd.append('--tinderbox')
+                if OPTIONS.tbpl:
+                    subcmd.append('--tbpl')
+                if OPTIONS.ion:
+                    subcmd.append('--ion')
+                if OPTIONS.hide_progress:
+                    subcmd.append('--no-progress')
+                subcmd.append(JS)
+                subcmd.append(os.path.basename(test.path))
+                makefile_targets.append((target, subcmd))
+            f.write(".PHONY: %s\n" % ' '.join([x for (x, y) in makefile_targets]))
+            f.write("check: %s\n" % ' '.join([x for (x, y) in makefile_targets]))
+            for (target, cmdline) in makefile_targets:
+                f.write("%s:\n\t@%s\n" % (target, subprocess.list2cmdline(cmdline)))
+        make_cmd = ['make', '-f', makefile_path, 'check']
+        env = os.environ.copy()
+        del env['JS_PARALLELIZE']
+        os.execvpe(make_cmd[0], make_cmd, env)
+
     # The full test list is ready. Now create copies for each JIT configuration.
     job_list = []
     if OPTIONS.tbpl:

The above code writes out a makefile that executes all the tests found by the test harness in parallel. There's one target for each test, and the check target depends on all of them being run. The script then runs a copy of make that executes all the commands found in this new makefile. The '+' added to the recipe in Makefile.in tells make that the command invokes a recursive make, so things like the jobserver get passed down properly. Credit where credit is due: I cribbed this idea from GCC, which uses it to run bits of LTO compilation in parallel.

Anyway, how fast does it go? JS_PARALLELIZE=1 time -p make -srj20 check-jit-test says:

real 539.33
user 5492.89
sys 653.84

Those times are in seconds. That’s for a --disable-optimize --enable-debug build, by the way.  An --enable-optimize --enable-debug build looks like:

real 144.20
user 1389.44
sys 308.25

So about a 10x speedup either way. Not too shabby for 30 extra lines of code!

The above patch is obviously only a proof-of-concept:

  • We leave our temporary makefile lying around;
  • Propagation of options to the sub-make needs to be expanded and cleaned up;
  • Output from the sub-make may need to be handled sanely;
  • I have no idea whether this works properly with pymake;
  • …and probably a few other things I haven’t thought about.

06 Oct 12

looking at talos differently, part 3

A few thoughts from several days of staring at these charts; I'm going to focus on the tests that generate the most email.  Because Talos generates so much email, developers are prone to ignore it.  And that's not what we want: getting a Talos email should be cause for consternation or celebration, not callous indifference.  By focusing first on the tests that generate the most email, we'll weed out the bulk of the redundancy in Talos emails.

Having said that:

  • The Dromaeo tests are very noisy. I understand they’re important, and we ought to have some way of testing JS/DOM/CSS performance. But consider the FF17 Dromaeo DOM tests: we can see three different changesets causing 6-7% swings in these tests. And those changesets don’t touch code anywhere near the JS/DOM bits being tested! If we’re going to keep these tests, we need to find some way of making them more stable.
  • Trace Malloc Allocs/MaxHeap/Leaks all send too much email for trivial changes: they’re often telling you about changes of tenths or even hundredths of a percent (to four significant digits!). I understand that many small changes eventually accumulate into big ones, but this seems a little excessive. There should be some sort of threshold, say a ~2% change either way, before warning emails get sent.
  • The "Number of Constructors" numbers have been identical across x86 and x86-64 Linux for the last three release cycles, and the numbers keep going up. In the abstract, sure, x86 and x86-64 can have different behavior, but in practice, we just don't add static constructors dependent on the word size of the platform. We should cut this back to one platform at the very least, and consider setting a threshold here as well.
  • The Tp5 No Network Row Major MozAfterPaint tests all generate a lot of email; it’s a bit of a tossup as to which one is going to stand out. Some of the numbers may be skewed due to DLBI’s cycles of landing and backouts, too. I will say that the more detailed tests, measuring Private Bytes, Main RSS, Content RSS, and %CPU, don’t identify regression candidates that our other tests don’t catch and are therefore not that useful.
  • a11y Row Major MozAfterPaint has the same problem: it’s meant to identify issues in the a11y implementation, but more often than not, winds up complaining about regressions that other tests have already caught for us.

All that is to say we could cut down the amount of email significantly with a couple of simple changes:

  • Set thresholds before email alerts are sent for the Trace Malloc tests;
  • Pick x86-64 or x86 Linux for Number of Constructors, possibly set a threshold here too;
  • Remove the specific measurements in the Tp5 test.

Other ideas:

  • The above analysis was only for Mozilla-Inbound; there are of course statistics from other trees that are sent to dev-tree-management. Maybe it’s worth splitting dev-tree-management up? Must compute statistics on what trees generate the most mail to the list.
  • The graphserver links sent in the emails are helpful; it would be even better if they featured multiple platforms. That way developers would have an easy(ier) way of assessing the usefulness of pursuing a given regression.
  • It’d be even better if regression emails weren’t sent unless there were regressions on multiple platforms. This would be a little tricky.

05 Oct 12

looking at talos differently, part 2

Over the past couple of weeks, I’ve been working on refining the Talos summary script I blogged about earlier.  The fruits of the labor can be seen in summary pages for the last three release cycles:

(It’s worth pointing out that the coloring for the JS-related tests is wrong; I think those tests are “bigger is better” tests.)

One meta-point before diving into suggestions for various tests: emails don’t always get triggered for the correct changesets.  To see what I mean, take a look at the following examples:

All this suggests that there’s a bug in how we’re benchmarking our trees and generating our results; I haven’t investigated any further.


19 Sep 12

looking at talos differently

At a meeting a couple of weeks ago, I volunteered to look at the Talos data for the last release cycle to see how stable the tests were.  I wasn’t going to do extensive statistical analysis on the numbers, since the Talos servers already do that when deciding whether a change is significant or not.  What I initially came up with was:

  1. Detecting important changes: do multiple platforms change consistently?  If the numbers for platform X go up by 5% while the numbers for platform Y go down by 3%, maybe the test isn’t very stable.
  2. Do the numbers wobble around a lot?  If the numbers for platform X were up one day and down the next, and this happened several times, then the test isn’t particularly stable on that platform.  However, if the test numbers keep going one direction, then our confidence in that test increases somewhat.

(I realize that the specifics of the changesets involved may invalidate the generalizations made above.  But as a good first cut, e.g. for somebody who’s just trying to see if there are significant Talos regressions, this seems like a good start.)

I started out trying to catalog these numbers manually and quickly decided that was for the birds.  I found myself mixing up numbers for different tests, different trees, and different platforms (to say nothing of PGO versus non-PGO builds on inbound!), due mostly to the large number of emails to look at and the similarity between the subjects of the emails.  A better approach was called for.

So I wrote talos-summarize for parsing dev.tree-management archives for some date range and generating useful visualizations from them.  Wanting some insight into the questions above, I chose a table whose rows represent ranges of changesets and whose columns contain data for how a particular platform’s numbers change.  This makes it fairly easy to address point 1 above; point 2 is somewhat addressed by eyeballing the table and looking at the distribution of +/-s, but it should also be addressed by computing cumulative changes from the beginning of the chosen range.  That last point hasn’t been implemented yet.

Showing pictures would be more useful; I generated visualizations for all the Talos tests that ran on inbound during the start of the Firefox 15 release cycle.  The correspondence between the filenames and the tests themselves should be pretty obvious.

I initially started looking at the Ts, MED Dirty Profile table.  This table made me happy, because you can see the consistent jump across all platforms as a result of bug 769960 landing, and the corresponding decrease when the fix for the startup regression from that bug landed (bug 778855).  That test also looks like it has fairly stable numbers; they don't jump around too much.  The table also suggests that we significantly regressed startup time on multiple platforms with this pushlog; that regression didn't get addressed during the initial cycle of 15, even though it was about as severe as the regression from bug 769960 (and touched our more critical Windows platforms, too).

For the most part (and this is just eyeballing), the numbers from all tests don't jump around or provide contradictory information.  The one exception would be Tp5 No Network Row Major MozAfterPaint (what a mouthful!), where the row in the middle looks quite odd.  Unfortunately, "stable" across all the tests often means steadily increasing across all the tests, which is not great.

I’m still looking at the numbers and tweaking the script; it doesn’t get everything right (I know it does strange things when summarizing the Number of Constructors test, for instance).  What do you think could be improved about the visualization or what extra information should the summary try to present?



03 Sep 12

running talos locally

We had a meeting about not regressing Talos last week.  One of the concerns brought up at the meeting was that reproducing Talos regressions was hard: dissimilar machines, not knowing where Talos is, inability to run Talos locally, etc. etc.

Joel Maher noted that Talos has its very own wiki page, complete with checkout instructions and information about running the tests.  In the interest of looking like a functioning member of the performance team (“benchmarks?  what are those?”), I decided to verify those instructions.  The monospace commands below were run on a Linux/x86-64 box.

  1. Check out the repository: hg clone http://hg.mozilla.org/build/talos && cd talos
  2. Run the INSTALL.py script: ./INSTALL.py
  3. Run the activation script for the virtualenv: . bin/activate  This step will modify your shell prompt to include a (talos) string at the beginning; it also modifies PATH to include the location of the talos scripts.
  4. Create a test configuration: PerfConfigurator --develop --executablePath /path/to/firefox --activeTests ts --results_url file:///${HOME}/ts.txt --output ts_desktop.yaml Note that providing a path to a firefox you just compiled (~/src/build-mc/dist/bin/firefox in my case) works just fine.
  5. Run tests with your configuration: talos -n ts_desktop.yaml Unfortunately, you have to shut down any Firefox instances you have running or the test harness will complain. I filed bug 787980 for fixing this.
  6. Sit back and relax while you watch the test harness print progress messages on your terminal.  You may see messages about logs being posted to http://datazilla.mozilla.org/talos; the --develop flag ought to prevent that from happening.  It's not clear to me whether the bug is in the messages being wrong or in --develop failing to actually inhibit the uploading.

And that’s it!  You’ve now run the Talos startup tests.  I haven’t tried running the pageload tests (yet!) because doing them properly requires downloading quite a number of webpages and arranging things just so.


22 Aug 12

humane practices for review requests

I dislike seeing “review not granted” or “review cancelled” in my bugmail.

Even if the reviewer provides helpful comments and points out things that you missed, those few words are the first thing you see about your patch.  Part of me understands that these headlines are informative messages, that they are not judgements on me as a person, merely some sort of judgement on the patch that I have written.  But the language doesn’t encourage that perspective.  Permissions are not granted, entrance is not granted, your favorite show on television is cancelled, and so forth.  These are not heartening words: they are words that shut you out, that indicate your contribution was not desirable.

I took a mathematics class in college where individual problems on homework sets were scored out of a maximum of three points.  Half points were virtually unheard of.  In the usual grade/percentage terms, the scores for a problem looked roughly like this: A, D, F, F.  Not encouraging.  One day, the professor announced that there had been some grumbling from students about this scale.  He then described a colleague’s grading scale for problems in a class of similar difficulty: check-plus, check, check-minus, and zero.  “Now it is trivial to show that these two scales are isomorphic to one another,” he said, smiling.  “But I don’t hear any [grumbling] from Dr. L’s students about his grading scale!”  The message was clear: the scale isn’t that bad, just get over it.[1][2]

However, it matters how you say things, not just what you say.  Some organizations use a plus/delta feedback framework instead of a plus/minus framework.  That is, instead of dividing things into good and bad, you divide things into good and “changes needed”.  Phrasing things as “deltas” nudges people to provide actionable items; it’s the difference between saying “it sucked that the meeting ran half an hour over” versus “the meeting should stay within the time allotted”.

With all this in mind, if I get r?‘d on a patch, I will provide one of three responses (within 24 hours, natch):

  • r+ the patch.  Minor changes may be requested; whether I ask to see the patch again depends on how confident I am in the author and/or the triviality of the changes.  For instance, syntax changes and so forth don’t usually require a re-review.  Documentation changes often can.
  • f+ the patch.  As I understand it, this shows up in bugmail as “feedback granted”, not “review cancelled” (props to the Bugzilla team for doing this!).  Doing this lets the patch author know that they are making progress and that their contribution is appreciated (you took the time to provide feedback), but there are still some changes to be made.
    If at all possible, once the author makes the changes you have requested, r+ the patch.  Your feedback can be thought of as a contract with the patch author: you do these things and the patch is good enough.  Don’t change the terms of the contract.  File a followup bug, if need be.
  • Cancel the review.  I don’t like doing this due to reasons given above, but sometimes you have to, e.g. the wrong patch is provided.  The following scenario is also, I think, reasonable grounds for cancellation, provided you make it clear:
    • A patch depends on prior patches in the same bug;
    • You have reviewed those prior patches and requested significant changes;
    • Those changes would require equally significant changes to the patch in question, thereby rendering any review that you would do moot.

[1] This class was Real Analysis.  Real Analysis is often graded on a very…generous curve, owing to the material and that the course is typically students’ first introduction to rigorous mathematical proof.  So your mental model of what a “good” grade is requires some adjustment anyway.  But still.

[2] I was taking Dr. L’s course simultaneously with Reals.  I can attest that his grading scale did not bother me; indeed, I don’t think I recognized the isomorphism until it was pointed out.


17 Aug 12

moving away from x-macros

In any large (and even many small) codebases, there comes a point where some knowledge about an entity X is needed at various points throughout the codebase, but the knowledge required at each point is ever-so-slightly different.  You could provide comments at each point, stating the list of other locations that need to be updated appropriately when changes occur at that point, but that gets tedious quickly; it’s also error-prone, since it’s easy to forget the updating.  One technique for handling this in Mozilla’s C++ codebase is known as X-macros: a header file filled with calls to some macro X:

X(name1, data1, ...)
X(name2, data2, ...)
X(name3, data3, ...)
...

This header is then included at various points.  At each point, X is defined to extract whatever data is necessary:

enum ID {
#define X(name, data, size) name,
#include "Xmacro_header.h"
#undef X
};
...
static const uint32_t sizes[] = {
#define X(name, data, size) size,
#include "Xmacro_header.h"
#undef X
};
...
void method(enum ID id, ...)
{
  uint32_t size = sizes[id];
  ...
}

A prime example of this is the header content/events/public/nsEventNameList.h, which gets included at various points for generating method declarations, method definitions, and name tables; we use a similar header, toolkit/components/telemetry/TelemetryHistograms.h, for defining Telemetry histograms and their associated IDs.

But we are moving away from this technique in Telemetry: we’re soon going to define our histograms using JSON and generate the necessary information from that JSON with Python scripts, rather than relying on the C preprocessor.

The main motivation for doing this is validation: we don't have anything on the server side of Telemetry that defines schemas for Telemetry data.  The client knows the bucket ranges for individual histograms, for instance, but the server doesn't.  So the server has to accept all kinds of bogus data, whereas it could reject that data if some sort of schema for the data were provided.  And generating that information is (relatively) more easily done from a JSON definition than from a C preprocessor definition.

Some forms of client-side validation are eased by this process as well.  For example, there's a quirk in how "linear" histograms are defined that makes it easy to lose data when trying to capture data that runs from 1 to N.  There are ways around this, but they don't always get used properly.  With the histograms defined in JSON, we can make this sort of "enumerated" histogram a first-class citizen and raise an error on misdefinitions of "linear" histograms used to capture enumerated data.
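To illustrate the quirk, here's a minimal sketch of linear bucketing of my own devising (not Telemetry's actual implementation): bucket 0 catches values below min, the last bucket catches values at or above max, and the interior buckets split [min, max) evenly.

#include <stdio.h>

// Which bucket does `value` land in for a linear histogram declared
// as (min, max, nbuckets)?  (A sketch only, not Telemetry's real code.)
int bucket_for(int value, int min, int max, int nbuckets)
{
  if (value < min) return 0;              // underflow bucket
  if (value >= max) return nbuckets - 1;  // overflow bucket
  return 1 + (value - min) * (nbuckets - 2) / (max - min);
}

int main()
{
  // Naively declaring (min=1, max=8, buckets=8) to capture enumerated data
  // running from 1 to 8 loses information: adjacent values share buckets,
  // and 8 lands in the overflow bucket.
  for (int v = 1; v <= 8; v++)
    printf("naive:  %d -> bucket %d\n", v, bucket_for(v, 1, 8, 8));
  // Padding the declaration (min=1, max=9, buckets=10) gives each value its
  // own bucket, which a first-class "enumerated" histogram type could
  // guarantee by construction.
  for (int v = 1; v <= 8; v++)
    printf("padded: %d -> bucket %d\n", v, bucket_for(v, 1, 9, 10));
  return 0;
}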

Other future enhancements are also easier to do when you’re not limited by the C preprocessor, like eliminating relocations (and generally making the histogram information take up less space in the binary), automating field trials, providing labels for individual buckets, and so forth.  There are some downsides, notably that defining related histograms can’t be done with a little #define trickery, but the benefits more than make up for it.