{"id":187,"date":"2020-05-08T15:16:08","date_gmt":"2020-05-08T15:16:08","guid":{"rendered":"https:\/\/blog.mozilla.org\/data\/?p=187"},"modified":"2020-05-08T15:16:08","modified_gmt":"2020-05-08T15:16:08","slug":"this-week-in-glean-mozregression-telemetry-part-2","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/data\/2020\/05\/08\/this-week-in-glean-mozregression-telemetry-part-2\/","title":{"rendered":"This Week in Glean: mozregression telemetry (part 2)"},"content":{"rendered":"<p><em>(\u201cThis Week in Glean\u201d is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. You can find <a href=\"https:\/\/mozilla.github.io\/glean\/book\/appendix\/twig.html\">an index of all TWiG posts online.<\/a>)<\/em><\/p>\n<p><em>This is a special guest post by non-Glean-team member William Lachance!<\/em><\/p>\n<p>This is a continuation of an exploration of adding Glean-based telemetry to a Python application, in this case <a href=\"https:\/\/mozilla.github.io\/mozregression\">mozregression<\/a>, a tool for automatically finding the source of Firefox regressions (breakage).<\/p>\n<p>When we left off <a href=\"https:\/\/wlach.github.io\/blog\/2020\/02\/this-week-in-glean-special-guest-post-mozregression-telemetry-part-1\/\">last time<\/a>, we had written some test scripts and verified that the data was visible in the debug viewer.<\/p>\n<h2 id=\"adding-telemetry-to-mozregression-itself\">Adding Telemetry to mozregression itself<\/h2>\n<p>In many ways, this is pretty similar to what I did inside the sample application: the only significant difference is that the metric and ping definitions are shipped inside a Python application that is meant to be installable via <a href=\"https:\/\/pypi.org\/project\/pip\/\">pip<\/a>. 
This means we need to specify the <code>pings.yaml<\/code> and <code>metrics.yaml<\/code> (located inside the <code>mozregression<\/code> subdirectory) as package data inside <code>setup.py<\/code>:<\/p>\n<div class=\"brush: py\">\n<div class=\"source\">\n<pre><span class=\"n\">setup<\/span><span class=\"p\">(<\/span>\r\n    <span class=\"n\">name<\/span><span class=\"o\">=<\/span><span class=\"s2\">\"mozregression\"<\/span><span class=\"p\">,<\/span>\r\n    <span class=\"o\">...<\/span>\r\n    <span class=\"n\">package_data<\/span><span class=\"o\">=<\/span><span class=\"p\">{<\/span><span class=\"s2\">\"mozregression\"<\/span><span class=\"p\">:<\/span> <span class=\"p\">[<\/span><span class=\"s2\">\"*.yaml\"<\/span><span class=\"p\">]},<\/span>\r\n    <span class=\"o\">...<\/span>\r\n<span class=\"p\">)<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<p>There were also a number of Glean SDK enhancements which we determined were necessary. Most notably, Michael Droettboom added 32-bit Windows wheels to the Glean SDK, which we needed in order to build the <a href=\"https:\/\/mozilla.github.io\/mozregression\/quickstart.html#gui\">mozregression GUI<\/a> on Windows. 
In addition, some minor changes needed to be made to Glean\u2019s behaviour for it to work correctly with a command-line tool like mozregression. For example, Glean used to assume that telemetry would always be disabled via a GUI action, at which point it would send a deletion ping. That obviously doesn\u2019t work for an application like mozregression, where the setting lives only in a configuration file, so Glean needed to be modified to check whether it had been disabled <em>between<\/em> runs.<\/p>\n<p>Many thanks to Mike (and others on the Glean team) for so patiently listening to my concerns and modifying Glean accordingly.<\/p>\n<h2 id=\"getting-data-review\">Getting Data Review<\/h2>\n<p>At Mozilla, we don\u2019t just allow random engineers like myself to start collecting data in a product that we ship (even a semi-internal tool like mozregression). We have <a href=\"https:\/\/wiki.mozilla.org\/Firefox\/Data_Collection\">a process<\/a>, overseen by Data Stewards, to make sure the information we gather actually answers important questions and doesn\u2019t unnecessarily collect personally identifiable information (e.g. 
email addresses).<\/p>\n<p>You can see the specifics of how this worked out in the case of mozregression in <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1581647#c9\">bug 1581647<\/a>.<\/p>\n<h2 id=\"documentation\">Documentation<\/h2>\n<p>Glean has some fantastic utilities for generating Markdown documentation describing what information is being collected. I have made mozregression\u2019s generated documentation available on GitHub:<\/p>\n<p><a href=\"https:\/\/github.com\/mozilla\/mozregression\/blob\/master\/docs\/glean\/metrics.md\">https:\/\/github.com\/mozilla\/mozregression\/blob\/master\/docs\/glean\/metrics.md<\/a><\/p>\n<p>The generation of this documentation is <a href=\"https:\/\/github.com\/mozilla\/mozregression\/blob\/3454e1eafe83f53a84cb6b10f46649320d5ed097\/.travis.yml#L57\">hooked up to mozregression\u2019s continuous integration<\/a>, so we can be sure it\u2019s up to date.<\/p>\n<p>I also added <a href=\"https:\/\/mozilla.github.io\/mozregression\/documentation\/telemetry.html\">a quick note<\/a> to mozregression\u2019s web site describing the feature, along with (very importantly) instructions on how to turn it off.<\/p>\n<h2 id=\"enabling-data-ingestion\">Enabling Data Ingestion<\/h2>\n<p>Once a Glean-based project has passed data review, getting our infrastructure to ingest it is pretty straightforward. 
Normally <a href=\"https:\/\/mozilla.github.io\/glean\/book\/user\/adding-glean-to-your-project.html#adding-metadata-about-your-project-to-the-pipeline\">we would suggest just filing a bug<\/a> and letting us (the data team) handle the details, but since I\u2019m <em>on<\/em> that team, I\u2019m going to go into a (little) bit of detail on how the sausage is made.<\/p>\n<p>Behind the scenes, we have a collection of ETL (extract-transform-load) scripts in the <a href=\"https:\/\/github.com\/mozilla\/probe-scraper\/\">probe-scraper repository<\/a>. These scripts parse the ping and probe metadata files that I added to mozregression in the step above, then automatically create BigQuery tables and update our ingestion machinery to insert incoming data into those tables.<\/p>\n<p>There\u2019s quite a bit of complicated machinery involved in making this all work, but since it\u2019s already in place, adding a new application like this is relatively simple. The changeset I submitted as part of a <a href=\"https:\/\/github.com\/mozilla\/probe-scraper\/pull\/184\">pull request<\/a> to probe-scraper was all of 9 lines long:<\/p>\n<div class=\"brush: diff\">\n<div class=\"source\">\n<pre><span class=\"gh\">diff --git a\/repositories.yaml b\/repositories.yaml<\/span>\r\n<span class=\"gh\">index dffcccf..6212e55 100644<\/span>\r\n<span class=\"gd\">--- a\/repositories.yaml<\/span>\r\n<span class=\"gi\">+++ b\/repositories.yaml<\/span>\r\n<span class=\"gu\">@@ -239,3 +239,12 @@ firefox-android-release:<\/span>\r\n     - org.mozilla.components:browser-engine-gecko-beta\r\n     - org.mozilla.appservices:logins\r\n     - org.mozilla.components:support-migration\r\n<span class=\"gi\">+mozregression:<\/span>\r\n<span class=\"gi\">+  app_id: org-mozilla-mozregression<\/span>\r\n<span class=\"gi\">+  notification_emails:<\/span>\r\n<span class=\"gi\">+    - wlachance@mozilla.com<\/span>\r\n<span class=\"gi\">+  url: 'https:\/\/github.com\/mozilla\/mozregression'<\/span>\r\n<span 
class=\"gi\">+  metrics_files:<\/span>\r\n<span class=\"gi\">+    - 'mozregression\/metrics.yaml'<\/span>\r\n<span class=\"gi\">+  ping_files:<\/span>\r\n<span class=\"gi\">+    - 'mozregression\/pings.yaml'<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<h2 id=\"a-pretty-graph\">A Pretty Graph<\/h2>\n<p>With the probe scraper change merged and deployed, we can now start querying! A number of tables are automatically created from the metric and ping definitions registered above: notably \u201clive\u201d and \u201cstable\u201d tables corresponding to the usage ping. Using <a href=\"https:\/\/docs.telemetry.mozilla.org\/tools\/stmo.html\">sql.telemetry.mozilla.org<\/a>, we can start exploring what\u2019s out there. Here\u2019s a quick query I wrote up:<\/p>\n<div class=\"brush: sql\">\n<div class=\"source\">\n<pre><span class=\"k\">SELECT<\/span> <span class=\"nb\">DATE<\/span><span class=\"p\">(<\/span><span class=\"n\">submission_timestamp<\/span><span class=\"p\">)<\/span> <span class=\"k\">AS<\/span> <span class=\"nb\">date<\/span><span class=\"p\">,<\/span>\r\n       <span class=\"n\">metrics<\/span><span class=\"p\">.<\/span><span class=\"n\">string<\/span><span class=\"p\">.<\/span><span class=\"n\">usage_variant<\/span> <span class=\"k\">AS<\/span> <span class=\"n\">variant<\/span><span class=\"p\">,<\/span>\r\n       <span class=\"k\">count<\/span><span class=\"p\">(<\/span><span class=\"o\">*<\/span><span class=\"p\">),<\/span>\r\n<span class=\"k\">FROM<\/span> <span class=\"o\">`<\/span><span class=\"n\">moz<\/span><span class=\"o\">-<\/span><span class=\"n\">fx<\/span><span class=\"o\">-<\/span><span class=\"k\">data<\/span><span class=\"o\">-<\/span><span class=\"n\">shared<\/span><span class=\"o\">-<\/span><span class=\"n\">prod<\/span><span class=\"o\">`<\/span><span class=\"p\">.<\/span><span class=\"n\">org_mozilla_mozregression_stable<\/span><span class=\"p\">.<\/span><span class=\"n\">usage_v1<\/span>\r\n<span class=\"k\">WHERE<\/span> <span class=\"nb\">DATE<\/span><span 
class=\"p\">(<\/span><span class=\"n\">submission_timestamp<\/span><span class=\"p\">)<\/span> <span class=\"o\">&gt;=<\/span> <span class=\"s1\">'2020-04-14'<\/span>\r\n  <span class=\"k\">AND<\/span> <span class=\"n\">client_info<\/span><span class=\"p\">.<\/span><span class=\"n\">app_display_version<\/span> <span class=\"k\">NOT<\/span> <span class=\"k\">LIKE<\/span> <span class=\"s1\">'%.dev%'<\/span>\r\n<span class=\"k\">GROUP<\/span> <span class=\"k\">BY<\/span> <span class=\"nb\">date<\/span><span class=\"p\">,<\/span> <span class=\"n\">variant<\/span><span class=\"p\">;<\/span>\r\n<\/pre>\n<\/div>\n<\/div>\n<p>This query generates a chart like this:<\/p>\n<div class=\"figure\"><img decoding=\"async\" src=\"https:\/\/wlach.github.io\/files\/2020\/05\/mozregression-variant-usage.png\" alt=\"\" \/><\/div>\n<p>This chart represents the absolute volume of mozregression usage since April 14th 2020 (around the time when we first released a version of mozregression with Glean telemetry), excluding development versions, grouped by mozregression \u201cvariant\u201d (GUI, console, and mach) and date. You can see that (unsurprisingly?) the GUI has the highest usage.<\/p>\n<h2 id=\"next-steps\">Next Steps<\/h2>\n<p>We\u2019re not done yet! Next time, we\u2019ll look into building a public-facing dashboard demonstrating these results and publishing an aggregated version of the mozregression telemetry data for researchers and the general public. If we\u2019re lucky, there might even be a bit of <em>data science<\/em>. Stay tuned!<\/p>\n<p>(( This is a syndicated copy of <a href=\"https:\/\/wlach.github.io\/blog\/2020\/05\/this-week-in-glean-mozregression-telemetry-part-2\/\">the original post<\/a>. ))<\/p>\n","protected":false},"excerpt":{"rendered":"<p>(\u201cThis Week in Glean\u201d is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. 
They could be release &hellip; <a class=\"go\" href=\"https:\/\/blog.mozilla.org\/data\/2020\/05\/08\/this-week-in-glean-mozregression-telemetry-part-2\/\">Read more<\/a><\/p>\n","protected":false},"author":1528,"featured_media":168,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[448297],"tags":[448297,288565],"coauthors":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/posts\/187"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/users\/1528"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/comments?post=187"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/posts\/187\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/media\/168"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/media?parent=187"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/categories?post=187"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/tags?post=187"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/coauthors?post=187"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}