Over the past couple of weeks, I’ve been working on refining the Talos summary script I blogged about earlier. The fruits of that labor can be seen in summary pages for the last three release cycles:
(It’s worth pointing out that the coloring for the JS-related tests is wrong; I think those tests are “bigger is better” tests.)
One meta-point before diving into suggestions for various tests: emails don’t always get triggered for the correct changesets. To see what I mean, take a look at the following examples:
- FF17 Ts, MED Dirty Profile: We were quite fortunate during the FF17 cycle; our Talos emails identified a single changeset as causing significant regressions in several areas, notably startup. But if you look at the above-linked chart, you’ll see that on x86-64 Linux, the regression is linked to changesets occurring after the regressing changeset. I’m not sure how this happens, but it’s clearly a problem for identifying regressions.
- FF16 SVG, Row Opacity Major: The (first?) DLBI backout shows a significant improvement on a good number of platforms, but Win XP-PGO’s improvement is attributed to changesets after the backout that have nothing to do with SVG.
- FF18 Trace Malloc Allocs: This changeset shows a significant improvement in the number of allocations on Linux. Problem is, that changeset touched non-Linux-related code.
- FF16 Number of Constructors: Mike Hommey did heroic work to significantly reduce the number of static constructors in the tree. Problem is that Bobby Holley’s bugfixing push got all the credit.
- FF18 DHTML Row Major MozAfterPaint: XP shows an improvement that’s almost certainly related to DLBI landing, except that the improvement is attributed to the changesets before the landing, which is bizarre.
All this suggests that there’s a bug in how we’re benchmarking our trees and generating our results; I haven’t investigated any further.
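I haven’t looked at the analysis code, so this is pure speculation with made-up numbers, but one mechanism that would produce exactly this pattern is window-based detection: if an alert only fires after several post-regression runs have accumulated, the blame naturally lands on whatever landed most recently rather than on the changeset that actually caused the regression. A minimal sketch of that failure mode:

```python
# Hypothetical sliding-window detector -- NOT the actual Talos/graphserver
# analysis code, just an illustration of how windowed detection can blame
# the wrong pushes.

def detect_regressions(scores, window=3, threshold=0.10):
    """Flag a push when the mean of the `window` runs ending at it exceeds
    the mean of the `window` runs just before those by over `threshold`."""
    alerts = []
    for i in range(2 * window, len(scores) + 1):
        baseline = sum(scores[i - 2 * window:i - window]) / window
        recent = sum(scores[i - window:i]) / window
        if recent > baseline * (1 + threshold):
            alerts.append(i - 1)  # blame the newest push in the window
    return alerts

# Ts-like numbers: a real 20% regression lands at push 4 ...
scores = [100, 101, 99, 100, 120, 121, 119, 120, 121, 120]
# ... but the alerts finger pushes 5, 6, and 7 instead.
print(detect_regressions(scores))  # [5, 6, 7]
```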
The constructors one is particularly weird, given that it’s a quantity that can be computed precisely, unlike performance numbers, which have some variance.
Yeah, I don’t understand how the “previous” score for constructors is calculated; I’d have to go and look at the code. It certainly doesn’t encourage people to look at it when you tell them their changeset added .68 of a constructor.
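If I had to guess (again, unverified against the graphserver code), the “previous” score is an average over a window of prior runs, which turns an integer metric into a fractional delta as soon as the window straddles the change:

```python
# Toy numbers: constructor counts are exact integers per run, but if the
# baseline is a mean over the previous five runs (my guess, not necessarily
# what the graphserver code does), the reported delta goes fractional.
runs = [130, 130, 131, 131, 131, 131]

window = 5
baseline = sum(runs[:window]) / window  # (130+130+131+131+131)/5 = 130.6
delta = runs[-1] - baseline
print(delta)  # ~0.4 -- "your changeset added 0.4 of a constructor"
```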
Watch out, you’re at risk of making Talos actually useful!
So I’ll ask questions and make a bunch of requests to try to slow you down:
1. Is the changeset range a rollup, or the finest available granularity? (As in, does each range correspond to an interval between talos runs?)
2. I’d like to be able to look at the table to find an interesting platform, then click on the platform heading to show a graph of just that platform over time. Maybe that points to the graph server with appropriate parameters? (I haven’t looked at the graph server in ages, ever since I gave up on interpreting whatever the heck its mass of lines was trying to show me.)
3. I want to be able to feed in a changeset hash and have it highlight the row containing it on all graphs. (“Did changeset X break/improve things?”) The lookup half of that seems simple enough; see the sketch after this list.
4. I’d like push datetime ranges for the rev ranges.
5. DD/MM/YYYY sucks almost as much as MM/DD/YYYY. Pretty please use YYYY/MM/DD?
6. I sort of want to be able to star the big jumps (comment on them with any known explanation). But that raises all kinds of issues.
7. Is the final row a percent? If so, add a percent sign so I don’t have to wonder.
8. White gaps mean talos was not run for anything in that changeset range? Then what does a multi-row cell just after a white gap mean?
9. Could you add a changeset count to the rows? Just to have a feel for whether it was a merge or individual change. I suppose it would be more direct to label merges vs regular commits (vs backouts?)
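For the lookup in (3), something like this would do; the row structure here is invented for the sake of the example, not the visualizer’s actual data model:

```python
# Sketch of "which row covers changeset X?". The (row index, changesets)
# layout is invented for the example, not the visualizer's actual model.

def find_row(rows, query):
    """Accepts a full or abbreviated hash; returns a row index or None."""
    q = query.lower()
    for index, changesets in rows:
        if any(cset.lower().startswith(q) for cset in changesets):
            return index
    return None

rows = [
    (0, ["0a1b2c3d4e5f", "112233445566"]),
    (1, ["deadbeef0001"]),
]
print(find_row(rows, "deadbeef"))  # -> 1
```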
Glad to see the visualizer is getting some comments! I should write a post explaining things in a bit more detail; the graphserver ought to show all this stuff too…
1: Each colored block in a table corresponds to an email sent to dev-tree-management. Whitespace is assumed to mean that there was no significant change for that platform over whatever range of changesets gets covered, given the lack of emails to dev-tree-management. (There’s a small sketch of this at the end of these answers.)
2: Correlating this with the graphserver would be useful. Even just letting the individual changes link back to graphserver output.
3,4,5,7,9: Yeah, the UI could use some work/more information.
6: I am not that competent of an Ajax-y programmer. (Yet?)
8: Yes to the first part; I don’t understand the second question.
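Here’s the promised sketch for 1: a simplified, illustrative version of how one platform’s column gets assembled, not the actual summary-script code.

```python
# Simplified, illustrative version of how one platform's column gets
# assembled -- not the actual summary-script code. One colored block per
# dev-tree-management email; whitespace wherever no email covered a range.

# Hypothetical emails: (first push, last push) -> reported % change.
emails = {
    (100, 105): -3.2,
    (112, 113): 7.9,
}

def build_column(all_ranges):
    column = []
    for push_range in all_ranges:
        if push_range in emails:
            column.append(("block", emails[push_range]))
        else:
            # No email for these pushes: assume no significant change.
            column.append(("gap", None))
    return column

print(build_column([(100, 105), (106, 111), (112, 113)]))
# [('block', -3.2), ('gap', None), ('block', 7.9)]
```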
The ranges associated with the number of constructors are almost always wrong, which I generically filed as bug 721387.