looking at talos differently, part 2

Over the past couple of weeks, I’ve been working on refining the Talos summary script I blogged about earlier.  The fruits of the labor can be seen in summary pages for the last three release cycles:

(It’s worth pointing out that the coloring for the JS-related tests is wrong; I think those tests are “bigger is better” tests.)

One meta-point before diving into suggestions for various tests: emails don’t always get triggered for the correct changesets.  To see what I mean, take a look at the following examples:

All this suggests that there’s a bug in how we’re benchmarking our trees and generating our results; I haven’t investigated any further.

6 comments

  1. The constructors one is particularly weird given that it is a quantity that can be computed precisely, compared to performance which can have some variance.

    • Nathan Froyd

      Yeah, I do not understand how the “previous” score for constructors is calculated. Would have to go and look at the code. It certainly doesn’t encourage people to look at it when you tell them their changeset added .68 of a constructor.

  2. Watch out, you’re at risk of making Talos actually useful!

    So I’ll ask questions and make a bunch of requests to try to slow you down:

    1. Is the changeset range a rollup, or the finest available granularity? (As in, does each range correspond to an interval between Talos runs?)

    2. I’d like to be able to look at the table to find an interesting platform, then click on the platform heading to show a graph of just that platform over time. Maybe that points to the graph server with appropriate parameters? (I haven’t looked at the graph server in ages, ever since I gave up on interpreting whatever the heck its mass of lines was trying to show me.)

    3. I want to be able to feed in a changeset hash and have it highlight the row containing it on all graphs. (“Did changeset X break/improve things?”)

    4. I’d like push datetime ranges for the rev ranges.

    5. DD/MM/YYYY sucks almost as much as MM/DD/YYYY. Pretty please use YYYY/MM/DD?

    6. I sort of want to be able to star the big jumps (comment on them with any known explanation). But that raises all kinds of issues.

    7. Is the final row a percent? If so, add a percent sign so I don’t have to wonder.

    8. White gaps mean talos was not run for anything in that changeset range? Then what does a multi-row cell just after a white gap mean?

    9. Could you add a changeset count to the rows? Just to have a feel for whether it was a merge or individual change. I suppose it would be more direct to label merges vs regular commits (vs backouts?)

    • Nathan Froyd

      Glad to see the visualizer is getting some comments! I should write a post explaining things in a bit more detail; the graphserver ought to show all this stuff too…

      1: Each colored block in a table corresponds to an email sent to dev-tree-management. Whitespace is assumed to mean that there was no significant change for that platform over whatever range of changesets gets covered, given the lack of emails to dev-tree-management.

      2: Correlating this with the graphserver would be useful. Even just letting the individual changes link back to graphserver output.

      3,4,5,7,9: Yeah, the UI could use some work/more information.

      6: I am not that competent of an Ajax-y programmer. (Yet?)

      8: Yes to the first part; I don’t understand the second question.

  3. The ranges associated with the number of constructors are almost always wrong, which I generically filed as bug 721387.

  4. [...] few thoughts from several days of staring at these charts; I’m going to focus on the tests that generate the most email.  Because Talos generates so [...]