A few thoughts from several days of staring at these charts; I’m going to focus on the tests that generate the most email. Because Talos generates so much email, developers are prone to ignore it. And that’s not what we want them to do: getting a Talos email should be cause for consternation or celebration, not callous indifference. By focusing first on the tests that generate the most email, we’ll weed out the bulk of the redundancy in Talos emails.
Having said that:
- The Dromaeo tests are very noisy. I understand they’re important, and we ought to have some way of testing JS/DOM/CSS performance. But consider the FF17 Dromaeo DOM tests: we can see three different changesets causing 6-7% swings in these tests. And those changesets don’t touch code anywhere near the JS/DOM bits being tested! If we’re going to keep these tests, we need to find some way of making them more stable.
- Trace Malloc Allocs/MaxHeap/Leaks all send too much email for trivial changes: they’re often telling you about changes of tenths or even hundredths of a percent (to four significant digits!). I understand that many small changes eventually accumulate into big ones, but this seems a little excessive. There should be some sort of threshold, say a ~2% change either way, before warning emails get sent.
- The Number of Constructors…numbers have been identical across x86 and x86-64 Linux the last three release cycles and the numbers keep going up. In the abstract, sure, x86 and x86-64 can have different behavior, but in practice, we just don’t add static constructors dependent on the word size of the platform. We should cut this back to one platform at the very least, and consider setting a threshold here as well.
- The Tp5 No Network Row Major MozAfterPaint tests all generate a lot of email; it’s a bit of a tossup as to which one is going to stand out. Some of the numbers may be skewed due to DLBI’s cycles of landing and backouts, too. I will say that the more detailed tests, measuring Private Bytes, Main RSS, Content RSS, and %CPU, don’t identify regression candidates that our other tests don’t catch and are therefore not that useful.
- a11y Row Major MozAfterPaint has the same problem: it’s meant to identify issues in the a11y implementation, but more often than not, winds up complaining about regressions that other tests have already caught for us.
All that is to say we could cut down the amount of email significantly with a few simple changes:
- Set thresholds before email alerts are sent for the Trace Malloc tests;
- Pick x86-64 or x86 Linux for Number of Constructors, possibly set a threshold here too;
- Remove the specific measurements in the Tp5 test.
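To make the threshold idea concrete, here is a minimal sketch in Python. It is not how Talos or the graphserver decides anything today; the function names and the 2% default are just illustrations of the rule proposed above.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent (hypothetical helper)."""
    if old == 0:
        raise ValueError("old value must be nonzero")
    return (new - old) / old * 100.0


def should_send_alert(old, new, threshold_pct=2.0):
    """Return True only when the change is at least threshold_pct either way.

    The 2% default mirrors the rough figure suggested above; it is an
    assumption, not a tuned value.
    """
    return abs(percent_change(old, new)) >= threshold_pct


# A change of a tenth of a percent stays quiet; a 3% swing does not.
print(should_send_alert(1000, 1001))  # 0.1% increase
print(should_send_alert(1000, 1030))  # 3% increase
print(should_send_alert(1000, 970))   # 3% decrease
```

A per-test threshold would probably be better than a single global one, since what counts as trivial for Trace Malloc Allocs may not be trivial for Number of Constructors.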
Other ideas:
- The above analysis was only for Mozilla-Inbound; there are of course statistics from other trees that are sent to dev-tree-management. Maybe it’s worth splitting dev-tree-management up? I’d first need to compute statistics on which trees generate the most mail to the list.
- The graphserver links sent in the emails are helpful; it would be even better if they featured multiple platforms. That way developers would have an easy(ier) way of assessing the usefulness of pursuing a given regression.
- It’d be even better if regression emails weren’t sent unless there were regressions on multiple platforms. This would be a little tricky.
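The multi-platform idea in the last item could look something like the sketch below, again in Python and again purely hypothetical: the tuple layout, the function name, and the sample changeset IDs are all invented for illustration. It also dodges the tricky part, since it assumes the results for every platform have already arrived, which is not how they show up in practice.

```python
from collections import defaultdict


def confirmed_regressions(alerts, min_platforms=2):
    """Keep only regressions seen on at least min_platforms platforms.

    alerts is an iterable of (test, changeset, platform) tuples; this
    layout is an assumption, not the format of the real alert data.
    Returns a dict mapping (test, changeset) to a sorted platform list.
    """
    platforms = defaultdict(set)
    for test, changeset, platform in alerts:
        platforms[(test, changeset)].add(platform)
    return {
        key: sorted(seen)
        for key, seen in platforms.items()
        if len(seen) >= min_platforms
    }


# Made-up sample data: one regression on two platforms, one on a single platform.
sample = [
    ("Tp5 No Network Row Major MozAfterPaint", "abc123", "Linux x86"),
    ("Tp5 No Network Row Major MozAfterPaint", "abc123", "Linux x86-64"),
    ("Trace Malloc Allocs", "def456", "Linux x86"),
]
for (test, changeset), seen in confirmed_regressions(sample).items():
    print(test, changeset, seen)
```

A real version would have to decide how long to wait for the slower platforms before concluding that a regression is confined to one of them.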