Over the past couple of weeks, I’ve been working on refining the Talos summary script I blogged about earlier. The fruits of that labor can be seen in the summary pages for the last three release cycles:
(It’s worth pointing out that the coloring for the JS-related tests is wrong; I think those tests are “bigger is better” tests.)
One meta-point before diving into suggestions for various tests: emails don’t always get triggered for the correct changesets. To see what I mean, take a look at the following examples:
- FF17 Ts, MED Dirty Profile: We were quite fortunate during the FF17 cycle; our Talos emails identified a single changeset as causing significant regressions in several areas, notably startup. But if you look at the chart linked above, you’ll see that on x86-64 Linux, the regression is linked to changesets occurring after the regressing changeset. I’m not sure how this happens, but it’s clearly a problem for identifying regressions.
- FF16 SVG, Row Opacity Major: The (first?) DLBI backout shows a significant improvement on a good number of platforms, but Win XP-PGO’s improvement is attributed to changesets after the backout that have nothing to do with SVG.
- FF18 Trace Malloc Allocs: This changeset shows a significant improvement in the number of allocations on Linux. Problem is, that changeset touched only non-Linux code.
- FF16 Number of Constructors: Mike Hommey did heroic work to significantly reduce the number of static constructors in the tree. Problem is, Bobby Holley’s bugfixing push got all the credit.
- FF18 DHTML Row Major MozAfterPaint: XP shows an improvement that’s almost certainly related to DLBI landing, except that the improvement is attributed to the changesets before the landing, which is bizarre.
All this suggests that there’s a bug in how we’re benchmarking our trees and generating our results; I haven’t investigated any further.
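My best guess is that the detection step compares windows of runs on either side of each changeset, so the blame can land a few pushes away from the real culprit. Here’s a minimal sketch of that idea; this is not the actual Talos analysis code, and the window size, threshold, and function names are all my own assumptions:

```python
# Sketch of windowed regression detection; purely illustrative,
# not the real Talos/graphserver analysis. WINDOW, THRESHOLD, and
# detect_regressions() are hypothetical.
from statistics import mean, stdev

WINDOW = 12      # hypothetical: number of runs compared on each side
THRESHOLD = 9.0  # hypothetical t-value cutoff

def t_value(before, after):
    """Welch's t statistic between the two windows."""
    vb = stdev(before) ** 2 / len(before)
    va = stdev(after) ** 2 / len(after)
    return abs(mean(after) - mean(before)) / max((vb + va) ** 0.5, 1e-9)

def detect_regressions(results):
    """results: list of (changeset, value) pairs in push order.

    Yields every changeset whose before/after windows differ
    significantly. With noisy data or a gradual ramp, several
    consecutive changesets around the true step can all clear the
    threshold, so the one actually reported may be a few pushes off.
    """
    values = [v for _, v in results]
    for i in range(WINDOW, len(values) - WINDOW + 1):
        before = values[i - WINDOW:i]
        after = values[i:i + WINDOW]
        if t_value(before, after) > THRESHOLD:
            yield results[i][0]
```

If the real analysis does anything like this, whichever of those consecutive candidates wins the tie-breaking gets the email, which would be consistent with the off-by-a-few attributions above.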