{"id":485,"date":"2023-12-13T02:01:21","date_gmt":"2023-12-13T10:01:21","guid":{"rendered":"https:\/\/blog.mozilla.org\/performance\/?p=485"},"modified":"2023-12-13T04:01:35","modified_gmt":"2023-12-13T12:01:35","slug":"new-sheriffing-feature-and-significant-updates-to-kpi-reporting-queries","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/performance\/2023\/12\/13\/new-sheriffing-feature-and-significant-updates-to-kpi-reporting-queries\/","title":{"rendered":"New Sheriffing feature and significant updates to KPI reporting queries"},"content":{"rendered":"<p>A year ago I was <a href=\"https:\/\/blog.mozilla.org\/performance\/2022\/09\/15\/a-different-perspective\/\">sharing<\/a> how a Mozilla Performance Sheriff catches performance regressions, the entire <a href=\"https:\/\/wiki.mozilla.org\/TestEngineering\/Performance\/Sheriffing\/Workflow\">Workflow<\/a> they go through, and the incoming improvements. Since I joined the <a href=\"https:\/\/wiki.mozilla.org\/Performance\/Tools\">Performance Tools Team<\/a> (formerly Performance Test) almost five years ago, a <a href=\"https:\/\/blog.mozilla.org\/performance\/?s=Improvements+treeherder\">whole lot of improvements<\/a> have been made and features have been added.<\/p>\n<p>In this article, I want to focus on a special set of features that give the Performance Sheriffs more control over the Sheriffing Workflow (from when an alert is triggered, through triage, to when the regression bug is filed and linked to the alert). We call them <b>time-to-triage<\/b> (from alert to triage) and <b>time-to-bug<\/b> (from alert to bug). They are the subject of our <b>Sheriffing Team&#8217;s KPIs<\/b>, the KPIs that measure the performance of the Performance Sheriffs team (I like puns).<\/p>\n<p>The <b>time-to-triage<\/b> KPI measures the time from when an alert is triggered by a performance change to when it is triaged (basically, its first-time analysis). 
The target is at most <i>3 days<\/i>, and at least <i>80%<\/i> of the sheriffed alerts have to meet this deadline (up to 20% are allowed to miss it). However, our team does not work weekends, so weekends have to be excluded from the measured time. For example, if an alert is created on a Friday, the three-day triage window ends on Monday instead of Wednesday, when the three business days actually expire. This means we basically get only a single working day to triage it. With the old queries of the KPI report, which do not exclude weekends from those times, we had to manually exclude such alerts every time this happened. The new queries do this exclusion automatically.<\/p>\n<p>&nbsp;<\/p>\n<div id=\"attachment_486\" style=\"width: 258px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-486\" decoding=\"async\" loading=\"lazy\" class=\" wp-image-486\" src=\"http:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.19.46.png\" alt=\"Triage Response Times (time-to-triage) Year To Date\" width=\"248\" height=\"310\" \/><p id=\"caption-attachment-486\" class=\"wp-caption-text\">Triage Response Times (time-to-triage)<br \/>Year To Date<\/p><\/div>\n<div id=\"attachment_487\" style=\"width: 262px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-487\" decoding=\"async\" loading=\"lazy\" class=\" wp-image-487\" src=\"http:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.20.45.png\" alt=\"Triage Response Times (New Query) Year To Date\" width=\"252\" height=\"312\" \/><p id=\"caption-attachment-487\" class=\"wp-caption-text\">Triage Response Times (New Query)<br \/>Year To Date<\/p><\/div>\n<div id=\"attachment_488\" style=\"width: 559px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-488\" decoding=\"async\" loading=\"lazy\" class=\" wp-image-488\" src=\"http:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-06-at-10.41.32.png\" 
alt=\"Alerts Exceeding Triage Target Year To Date\" width=\"549\" height=\"342\" srcset=\"https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-06-at-10.41.32.png 956w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-06-at-10.41.32-580x362.png 580w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-06-at-10.41.32-940x586.png 940w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-06-at-10.41.32-768x479.png 768w\" sizes=\"(max-width: 549px) 100vw, 549px\" \/><p id=\"caption-attachment-488\" class=\"wp-caption-text\">Alerts Exceeding Triage Target<br \/>Year To Date<\/p><\/div>\n<p>The same is true for an alert created on a weekend, where part of the alert-to-triage time falls on the weekend. In fact, the only alerts whose triage window cannot include a weekend are the ones created on Monday or Tuesday.<\/p>\n<p>The <b>time-to-bug<\/b> KPI measures the time from when an alert is triggered by a performance change to when a bug is linked to the alert. The target is at most <i>5 days<\/i>, and at least <i>80%<\/i> of the valid regression alerts must meet this deadline (up to 20% are allowed to miss it). 
The only alerts whose time-to-bug window cannot include a weekend are the ones created in the first hour of Monday morning, whose deadline falls in the last hour of Friday.<\/p>\n<div id=\"attachment_489\" style=\"width: 271px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-489\" decoding=\"async\" loading=\"lazy\" class=\" wp-image-489\" src=\"http:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.20.05.png\" alt=\"Regression Bug Response Times Year To Date\" width=\"261\" height=\"327\" \/><p id=\"caption-attachment-489\" class=\"wp-caption-text\">Regression Bug Response Times<br \/>Year To Date<\/p><\/div>\n<div id=\"attachment_490\" style=\"width: 266px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-490\" decoding=\"async\" loading=\"lazy\" class=\" wp-image-490\" src=\"http:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.20.45-1.png\" alt=\"Regression Bug Response Times (New Query) Year To Date\" width=\"256\" height=\"317\" \/><p id=\"caption-attachment-490\" class=\"wp-caption-text\">Regression Bug Response Times (New Query)<br \/>Year To Date<\/p><\/div>\n<div id=\"attachment_491\" style=\"width: 450px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-491\" decoding=\"async\" loading=\"lazy\" class=\" wp-image-491\" src=\"http:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-06-at-10.44.05.png\" alt=\"Regressions Exceeding Bug Target Year To Date\" width=\"440\" height=\"273\" srcset=\"https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-06-at-10.44.05.png 958w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-06-at-10.44.05-580x360.png 580w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-06-at-10.44.05-940x583.png 940w, 
https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-06-at-10.44.05-768x476.png 768w\" sizes=\"(max-width: 440px) 100vw, 440px\" \/><p id=\"caption-attachment-491\" class=\"wp-caption-text\">Regressions Exceeding Bug Target<br \/>Year To Date<\/p><\/div>\n<p>In the images above, you can see a difference in the percentages of <b>time-to-triage<\/b> (<span style=\"text-decoration: underline;\">86.9%<\/span> vs. <span style=\"text-decoration: underline;\">97.9%<\/span> <i>old query vs. new query<\/i>) and <b>time-to-bug<\/b> (<span style=\"text-decoration: underline;\">75.7%<\/span> vs. <span style=\"text-decoration: underline;\">97%<\/span> <i>old query vs. new query<\/i>). This is not because the Sheriffing Team started doing a better job; they were doing it the whole time. It is because the feature we developed measures the percentages accurately by excluding the weekends from the calculated times. Going strictly by the percentages, the impact of this feature is significant: it takes us from an average, maybe struggling, performance to a really good one. 
Of course, we had known for a while that the KPI report included the weekends, but having the bigger picture and concrete metrics is more revealing.<\/p>\n<p>The development of these time-to-triage\/time-to-bug features was full-stack and involved:<\/p>\n<ul>\n<li aria-level=\"1\"><b>Helping<\/b> our manager\u2019s Sheriffing report calculate the times more accurately (I am grateful to our manager for supporting this initiative);<\/li>\n<li aria-level=\"1\"><b>Modifying<\/b> the <a href=\"https:\/\/github.com\/mozilla\/treeherder\/blob\/master\/treeherder\/perf\/models.py#L260\">performance_alert_summary<\/a> database table to store due dates;<\/li>\n<li aria-level=\"1\"><b>Implementing<\/b> the accurate calculation in the <a href=\"https:\/\/github.com\/mozilla\/treeherder\/blob\/master\/treeherder\/perf\/utils.py#L8\">backend<\/a> as described above;<\/li>\n<li aria-level=\"1\"><b>Showing<\/b> in the UI the countdown until the alert goes overdue, which gives the Performance Sheriffs more control and the ability to better organize themselves throughout the Sheriffing Workflow.<\/li>\n<\/ul>\n<p>I haven\u2019t mentioned the <b>countdown<\/b> feature yet. It is shown in the image below, right next to the status dropdown of the alert summary (top-right corner). It displays:<\/p>\n<ul>\n<li aria-level=\"1\">The type of due date that is in effect (<b>Triage<\/b> in this case);<\/li>\n<li aria-level=\"1\">The amount of time left. 
When the remaining time goes under <i>24 hours<\/i>, the timer switches to showing the hours left.<\/li>\n<\/ul>\n<p>The alert becomes triaged, and the counter switches from triage to bug, when the first-time analysis is performed on it (star, assign, add tag, or add note).<\/p>\n<div id=\"attachment_492\" style=\"width: 950px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-492\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-492 size-large\" src=\"http:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.30.30-940x200.png\" alt=\"Alert with Triage due date status\" width=\"940\" height=\"200\" srcset=\"https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.30.30-940x200.png 940w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.30.30-580x123.png 580w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.30.30-768x163.png 768w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.30.30-1536x327.png 1536w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.30.30-2048x436.png 2048w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.30.30-1000x213.png 1000w\" sizes=\"(max-width: 940px) 100vw, 940px\" \/><p id=\"caption-attachment-492\" class=\"wp-caption-text\">Alert with <strong>Triage due<\/strong> date status<\/p><\/div>\n<p>&nbsp;<\/p>\n<p>Below is an example of a <b>time-to-bug<\/b> timer (the time left until linking the alert to a bug becomes overdue). 
By default, the timer is <span style=\"color: #808000;\"><b>green<\/b><\/span>, but when it goes under 24 hours, it turns <span style=\"color: #ff6600;\"><b>orange<\/b><\/span>.<\/p>\n<div id=\"attachment_493\" style=\"width: 950px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-493\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-493 size-large\" src=\"http:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.33.22-940x208.png\" alt=\"Alert with Bug due date status\" width=\"940\" height=\"208\" srcset=\"https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.33.22-940x208.png 940w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.33.22-580x129.png 580w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.33.22-768x170.png 768w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.33.22-1536x341.png 1536w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.33.22-2048x454.png 2048w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.33.22-1000x222.png 1000w\" sizes=\"(max-width: 940px) 100vw, 940px\" \/><p id=\"caption-attachment-493\" class=\"wp-caption-text\">Alert with <strong>Bug due<\/strong> date status<\/p><\/div>\n<p>When the timer goes <b>overdue<\/b>, the counter icon becomes <span style=\"color: #ff0000;\"><b>red<\/b><\/span> and the \u201c<i>Overdue<\/i>\u201d status is shown, as in the image below.<\/p>\n<div id=\"attachment_494\" style=\"width: 950px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-494\" decoding=\"async\" loading=\"lazy\" class=\"size-large wp-image-494\" src=\"http:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.39.01-940x445.png\" alt=\"Alert with Overdue status (this is for demo 
purposes only, the alert wasn\u2019t overdue for real)\" width=\"940\" height=\"445\" srcset=\"https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.39.01-940x445.png 940w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.39.01-580x275.png 580w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.39.01-768x364.png 768w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.39.01-1536x728.png 1536w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.39.01-2048x970.png 2048w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.39.01-1000x474.png 1000w\" sizes=\"(max-width: 940px) 100vw, 940px\" \/><p id=\"caption-attachment-494\" class=\"wp-caption-text\">Alert with <strong>Overdue<\/strong> status<br \/>(this is for demo purposes only, the alert wasn\u2019t overdue for real)<i><\/i><\/p><\/div>\n<p>Lastly, after the alert is finally linked to a bug, the counter will turn into a <span style=\"color: #808000;\"><strong>green<\/strong><\/span> checkmark and the countdown status will be \u201cReady for acknowledge\u201d.<\/p>\n<div id=\"attachment_495\" style=\"width: 950px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-495\" decoding=\"async\" loading=\"lazy\" class=\"size-large wp-image-495\" src=\"http:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.34.09-940x201.png\" alt=\"Alert with Ready for acknowledge status\" width=\"940\" height=\"201\" srcset=\"https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.34.09-940x201.png 940w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.34.09-580x124.png 580w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.34.09-768x164.png 768w, 
https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.34.09-1536x329.png 1536w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.34.09-2048x438.png 2048w, https:\/\/blog.mozilla.org\/performance\/files\/2023\/12\/Screenshot-2023-12-05-at-15.34.09-1000x214.png 1000w\" sizes=\"(max-width: 940px) 100vw, 940px\" \/><p id=\"caption-attachment-495\" class=\"wp-caption-text\">Alert with <strong>Ready for acknowledge<\/strong> status<\/p><\/div>\n<p>Now, instead of manually excluding the times inflated by the weekends, we have an automated feature that lets us closely track the alert lifecycle and report the KPI percentages more accurately.<\/p>\n<p>The development of this feature was a personal initiative, encouraged by our manager and by the whole team (without their support I couldn\u2019t have done this). It is part of a wider initiative I support, <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1772825\">improvements to Performance Sheriffing Workflow<\/a>. That initiative improves the developer experience of working with performance regressions and helps the Performance Sheriffs be more efficient by improving their tools and automating as much of their workflow as possible.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A year ago I was sharing how a Mozilla Performance Sheriff catches performance regressions, the entire Workflow they go through, and the incoming improvements. 
Since I joined the Performance Tools &hellip; <a class=\"go\" href=\"https:\/\/blog.mozilla.org\/performance\/2023\/12\/13\/new-sheriffing-feature-and-significant-updates-to-kpi-reporting-queries\/\">Read more<\/a><\/p>\n","protected":false},"author":1808,"featured_media":25,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/performance\/wp-json\/wp\/v2\/posts\/485"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/performance\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/performance\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/performance\/wp-json\/wp\/v2\/users\/1808"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/performance\/wp-json\/wp\/v2\/comments?post=485"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/performance\/wp-json\/wp\/v2\/posts\/485\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/performance\/wp-json\/wp\/v2\/media\/25"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/performance\/wp-json\/wp\/v2\/media?parent=485"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/performance\/wp-json\/wp\/v2\/categories?post=485"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/performance\/wp-json\/wp\/v2\/tags?post=485"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}