{"id":327,"date":"2021-09-07T15:27:58","date_gmt":"2021-09-07T15:27:58","guid":{"rendered":"https:\/\/blog.mozilla.org\/data\/?p=327"},"modified":"2021-09-07T15:28:55","modified_gmt":"2021-09-07T15:28:55","slug":"this-week-in-glean-data-reviews-are-important-glean-parser-makes-them-easy","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/data\/2021\/09\/07\/this-week-in-glean-data-reviews-are-important-glean-parser-makes-them-easy\/","title":{"rendered":"This Week in Glean: Data Reviews are Important, Glean Parser makes them Easy"},"content":{"rendered":"\r\n<p>(\u201cThis Week in Glean\u201d is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. All \u201cThis Week in Glean\u201d blog posts are listed in the <a href=\"https:\/\/mozilla.github.io\/glean\/book\/appendix\/twig.html\">TWiG index<\/a>.)<\/p>\r\n\r\n<p>At Mozilla we put a lot of stock in Openness. Source? <a href=\"https:\/\/firefox-source-docs.mozilla.org\/contributing\/contributing_to_mozilla.html\">Open<\/a>. Bug tracker? <a href=\"https:\/\/bugzilla.mozilla.org\">Open<\/a>. Discussion Forums (Fora?)? 
Open (<a href=\"https:\/\/chat.mozilla.org\/\">synchronous<\/a> and <a href=\"https:\/\/discourse.mozilla.org\/\">asynchronous<\/a>).<\/p>\r\n<p>We also have an open process for determining if a new or expanded data collection in a Mozilla project is in line with our <a href=\"https:\/\/www.mozilla.org\/privacy\/principles\/\">Privacy Principles<\/a> and <a href=\"https:\/\/www.mozilla.org\/privacy\/\">Policies<\/a>: <a href=\"https:\/\/wiki.mozilla.org\/Data_Collection\">Data Review<\/a>.<\/p>\r\n<p>Basically, when a new piece of instrumentation is put up for code review (or before, or after), the instrumentor fills out <a href=\"https:\/\/github.com\/mozilla\/data-review\/blob\/main\/request.md\">a form<\/a> and asks a volunteer Data Steward to review it. If the instrumentation (as explained in the filled-in form) is obviously in line with our privacy commitments to our users, the Data Steward gives it the go-ahead to ship.<\/p>\r\n<p>(If it isn\u2019t <em>obviously<\/em> okay then we kick it up to our Trust Team to make the decision. They sit next to Legal, in case you need to find them.)<\/p>\r\n<p>The Data Review Process and its forms are very generic. They\u2019re designed to work for any instrumentation (tab count, bytes transferred, theme colour) being added to any project (<a href=\"https:\/\/firefox.com\">Firefox Desktop<\/a>, <a href=\"https:\/\/mozilla.org\">mozilla.org<\/a>, <a href=\"https:\/\/en.wikipedia.org\/wiki\/Firefox_Focus\">Focus<\/a>) and being collected by any data collection system (<a href=\"https:\/\/firefox-source-docs.mozilla.org\/toolkit\/components\/telemetry\">Firefox Telemetry<\/a>, <a href=\"https:\/\/firefox-source-docs.mozilla.org\/toolkit\/crashreporter\/crashreporter\/index.html\">Crash Reporter<\/a>, <a href=\"https:\/\/mozilla.github.io\/glean\/book\/index.html\">Glean<\/a>). This is great for the process as it means we can use it and rely on it anywhere.<\/p>\r\n<p>It isn\u2019t so great for users <em>of<\/em> the process. 
If you only ever write Data Reviews for one system, you\u2019ll find yourself answering the same questions with the same answers every time.<\/p>\r\n<p>And Glean makes this worse (better?) by including in its metrics definitions almost every piece of information you need in order to answer the review. So now you get to write the answers first in YAML and then in English during Data Review.<\/p>\r\n<p>But no more! Introducing <code>glean_parser data-review<\/code> and <code>mach data-review<\/code>: command-line tools that will generate for you a Data Review Request skeleton with all the easy parts filled in. It works like this:<\/p>\r\n\r\n\r\n\r\n<ol>\r\n<li>Write your instrumentation, providing full information in the metrics definition.<\/li>\r\n<li>Call <code>python -m glean_parser data-review &lt;bug_number&gt; &lt;list of metrics.yaml files&gt;<\/code> (or <code>mach data-review &lt;bug_number&gt;<\/code> if you\u2019re adding the instrumentation to Firefox Desktop).<\/li>\r\n<li><a href=\"https:\/\/github.com\/mozilla\/glean_parser\/\">glean_parser<\/a> will parse the metrics definitions files, pull out only the definitions that were added or changed in &lt;bug_number&gt;, and then output a partially-filled-out form for you.<\/li>\r\n<\/ol>\r\n<p>Here\u2019s an example. 
Say I\u2019m working on <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1664461\">bug 1664461<\/a> and add a new piece of instrumentation to Firefox Desktop:<\/p>\r\n\r\n\r\n\r\n<div class=\"wp-block-group is-layout-flow wp-block-group-is-layout-flow\"><div class=\"wp-block-group__inner-container\">\r\n<pre class=\"wp-block-code\"><code>fog.ipc:\r\n  replay_failures:\r\n    type: counter\r\n    description: |\r\n      The number of times the ipc buffer failed to be replayed in the\r\n      parent process.\r\n    bugs:\r\n      - https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1664461\r\n    data_reviews:\r\n      - https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1664461\r\n    data_sensitivity:\r\n      - technical\r\n    notification_emails:\r\n      - chutten@mozilla.com\r\n      - glean-team@mozilla.com\r\n    expires: never<\/code><\/pre>\r\n<\/div><\/div>\r\n\r\n\r\n\r\n<p>I\u2019ve made sure to fill in the <code>bugs<\/code> field correctly (because that\u2019s important on its own <em>and<\/em> it\u2019s what <code>glean_parser data-review<\/code> uses to find which data I added), and have categorized the <code>data_sensitivity<\/code>. I also included a helpful description. (The <code>data_reviews<\/code> field currently points at the bug I\u2019ll attach the Data Review Request to. I\u2019d better remember to come back before I land this code and update it to point at the specific comment\u2026)<\/p>\r\n<p>Then I can simply use <code>mach data-review 1664461<\/code> and it spits out:<\/p>\r\n\r\n\r\n\r\n<pre class=\"wp-block-code\"><code>!! Reminder: it is your responsibility to complete and check the correctness of\r\n!! this automatically-generated request skeleton before requesting Data\r\n!! Collection Review. See https:\/\/wiki.mozilla.org\/Data_Collection for details.\r\n\r\nDATA REVIEW REQUEST\r\n1. What questions will you answer with this data?\r\n\r\nTODO: Fill this in.\r\n\r\n2. Why does Mozilla need to answer these questions? 
Are there benefits for users?\r\n   Do we need this information to address product or business requirements?\r\n\r\nTODO: Fill this in.\r\n\r\n3. What alternative methods did you consider to answer these questions?\r\n   Why were they not sufficient?\r\n\r\nTODO: Fill this in.\r\n\r\n4. Can current instrumentation answer these questions?\r\n\r\nTODO: Fill this in.\r\n\r\n5. List all proposed measurements and indicate the category of data collection for each\r\n   measurement, using the Firefox data collection categories found on the Mozilla wiki.\r\n\r\nMeasurement Name | Measurement Description | Data Collection Category | Tracking Bug\r\n---------------- | ----------------------- | ------------------------ | ------------\r\nfog_ipc.replay_failures | The number of times the ipc buffer failed to be replayed in the parent process.  | technical | https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=1664461\r\n\r\n\r\n6. Please provide a link to the documentation for this data collection which\r\n   describes the ultimate data set in a public, complete, and accurate way.\r\n\r\nThis collection is Glean so is documented\r\n[in the Glean Dictionary](https:\/\/dictionary.telemetry.mozilla.org).\r\n\r\n7. How long will this data be collected?\r\n\r\nThis collection will be collected permanently.\r\n**TODO: identify at least one individual here** will be responsible for the permanent collections.\r\n\r\n8. What populations will you measure?\r\n\r\nAll channels, countries, and locales. No filters.\r\n\r\n9. If this data collection is default on, what is the opt-out mechanism for users?\r\n\r\nThese collections are Glean. The opt-out can be found in the product's preferences.\r\n\r\n10. Please provide a general description of how you will analyze this data.\r\n\r\nTODO: Fill this in.\r\n\r\n11. Where do you intend to share the results of your analysis?\r\n\r\nTODO: Fill this in.\r\n\r\n12. Is there a third-party tool (i.e. 
not Telemetry) that you\r\n    are proposing to use for this data collection?\r\n\r\nNo.<\/code><\/pre>\r\n<p>As you can see, this Data Review Request skeleton comes partially filled out. Everything you previously had to mechanically fill out has been done for you, leaving you more time to focus on the interesting questions like \u201cWhy do we need this?\u201d and \u201cHow are you going to use it?\u201d<\/p>\r\n<p>Also, this saves you from having to remember the URL to the Data Review Request Form Template each time you need it. We\u2019ve got you covered.<\/p>\r\n<p>And since this is part of Glean, it is already available to <a href=\"https:\/\/dictionary.telemetry.mozilla.org\/\">every project you can see here<\/a>. This isn\u2019t just a Firefox Desktop thing.<\/p>\r\n<p>Hope this saves you some time! If you can think of other time-saving improvements we could add once to Glean so that every Mozilla project can take advantage of them, please <a href=\"https:\/\/chat.mozilla.org\/#\/room\/#glean:mozilla.org\">tell us on Matrix<\/a>.<\/p>\r\n<p>If you\u2019re interested in how this is implemented, glean_parser\u2019s part of this is <a href=\"https:\/\/github.com\/mozilla\/glean_parser\/blob\/main\/glean_parser\/data_review.py\">over here<\/a>, while the mach command part is <a href=\"https:\/\/searchfox.org\/mozilla-central\/source\/toolkit\/components\/glean\/build_scripts\/mach_commands.py\">here<\/a>.<\/p>\r\n<p>:chutten<\/p>\r\n<p>(( This is a syndicated copy of <a href=\"https:\/\/chuttenblog.wordpress.com\/2021\/09\/07\/this-week-in-glean-data-reviews-are-important-glean-parser-makes-them-easy\/\">the original post<\/a>. 
))<\/p>\r\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1437,"featured_media":197,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[448297],"tags":[525,448323,315987,448297],"coauthors":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/posts\/327"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/users\/1437"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/comments?post=327"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/posts\/327\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/media\/197"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/media?parent=327"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/categories?post=327"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/tags?post=327"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/coauthors?post=327"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}