{"id":372,"date":"2021-10-22T14:16:25","date_gmt":"2021-10-22T14:16:25","guid":{"rendered":"https:\/\/blog.mozilla.org\/data\/?p=372"},"modified":"2021-10-22T14:16:25","modified_gmt":"2021-10-22T14:16:25","slug":"this-week-in-glean-the-three-roles-of-data-engagements","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/data\/2021\/10\/22\/this-week-in-glean-the-three-roles-of-data-engagements\/","title":{"rendered":"This Week in Glean: The Three Roles of Data Engagements"},"content":{"rendered":"<p>(\u201cThis Week in Glean\u201d is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean.) All \u201cThis Week in Glean\u201d blog posts are listed in the <a href=\"https:\/\/mozilla.github.io\/glean\/book\/appendix\/twig.html\">TWiG index<\/a>.<\/p>\n<p>I\u2019ve just recently started my sixth year working at Mozilla on data and data-adjacent things. In those years I\u2019ve started to notice some patterns in how data is approached, so I thought I\u2019d set them down in a TWiG because Glean\u2019s got a role to play in them.<\/p>\n<h2>Data Engagements<\/h2>\n<p>A Data Engagement is when there\u2019s a question that needs to engage with data to be answered. Something like \u201cHow many bookmarks are used by Firefox users?\u201d.<\/p>\n<p>(No one calls these Data Engagements but me, and I only do because I need to call them <i>something<\/i>.)<\/p>\n<p>I\u2019ve noticed three roles in Data Engagements at Mozilla:<\/p>\n<ol>\n<li aria-level=\"1\">Data Consumer: The Question-Asker. The Temperature-Taker. This is the one who knows what questions are important, and is frustrated without an answer until and unless data can be collected and analysed to provide it. 
\u201cWe need to know how many bookmarks are used to see if we should invest more in bookmark R&amp;D.\u201d<\/li>\n<li aria-level=\"1\">Data Analyst: The Answer-Maker. The Stats-Cruncher. This is the one who can use Data to answer a Consumer\u2019s Question. \u201cBookmarks are used by Canadians more than Mexicans most of the time, but only amongst profiles that have at least one bookmark.\u201d<\/li>\n<li aria-level=\"1\">Data Instrumentor: The Data-Digger. The Code-Implementor. This one can sift through product code and find the correct place to collect the right piece of data. \u201cThe Places database holds many things, we\u2019ll need to filter for just bookmarks to count them.\u201d<\/li>\n<\/ol>\n<p><img decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/lh6.googleusercontent.com\/fFe-UchE_Xy3d1LhX0shV41TJeABBBxJZigDH9_HXKAhu-m0JaM9fhfS8PQvW2WknXlMfk8lSIheZ-YMtT-NaQcLfdLYZnHC_f3LCkAIb-yYN0qWVvi0UPjQnz9C77sX5r0VLbsR=s1600\" alt=\"\" \/><\/p>\n<p>(diagrams courtesy of :brizental)<\/p>\n<p>It\u2019s through these three working in concert &#8212; The Consumer having a question that the Instrumentor instruments to generate data the Analyst can analyse to return an answer back to the Consumer &#8212; that a Data Engagement succeeds.<\/p>\n<p>At Mozilla, Data Engagements succeed very frequently <i>in certain circumstances<\/i>. The Graphics team answers many deeply-technical questions about Firefox running in the wild to determine how well WebRender is working. The Telemetry team examines the health of the data collection system as a whole. Mike Conley\u2019s old <a href=\"https:\/\/mikeconley.github.io\/bug1310250\/\">Tab Switcher Dashboard<\/a> helped find and solve performance regressions in (unsurprisingly) Tab Switching. 
These go well, and there\u2019s a common thread here that I think is the secret of why:<\/p>\n<p>In these and the other high-success-rate Data Engagements, all three roles (Consumer, Analyst, and Instrumentor) are embodied <i>by the same person<\/i>.<\/p>\n<p><img decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/lh5.googleusercontent.com\/laonFTnKBH6lmRWFRhcjUGx2aTG8iZbf3Wp99ulVqsu5J4qwZuq2pRaJ9WtBoXTEeAeDtui1yFn2gqMxxoFFZ1F87pLUXgmsymS9alcMqH0QBD7mz1bsTINN5FuVW1s9L0KSew8j=s1600\" alt=\"\" \/><\/p>\n<p>It\u2019s a common problem in the industry. It\u2019s hard to build anything at all, but it\u2019s least hard to build something for yourself. When you are in yourself the Question-Asker, Answer-Maker, and Data-Digger, you don\u2019t often mistakenly dig the wrong data to create an answer that isn\u2019t to the question you had in mind. And when you accidentally do make a mistake (because, remember, this is hard), you can go back in and change the instrumentation, update the analysis, or reword the question.<\/p>\n<p>But when these three roles are in different parts of the org, or different parts of the planet, things get harder. Each role is now trying to speak the others\u2019 languages and infer enough context to do their jobs independently.<\/p>\n<p>In comes the Data Org at Mozilla which has had great successes to date on the theme of \u201cMaking it easier for anyone to be their own Analyst\u201d. Data Democratization. 
When you\u2019re your own Analyst, there are fewer situations where the roles are disparate: Instrumentors who are their own Analysts know when data won\u2019t be the right shape to answer their own questions, and Consumers who are their own Analysts know when their questions aren\u2019t well-formed.<\/p>\n<p>Unfortunately, we haven\u2019t had as much success in making the other roles more accessible. Everyone can theoretically be their own Consumer: curiosity in a data-rich environment is as common as lanyards at an industry conference[1]. Asking <i>good<\/i> questions is hard, though. Possible, but hard. You could just about imagine someone in a mature data organization becoming able to tell the difference between questions that are important and questions that are just interesting through self-serve tooling and documentation.<\/p>\n<p>As for being your own Instrumentor\u2026 that is something that only a small fraction of folks have the patience to do. I (and Mozilla\u2019s Community Managers) welcome you to try: it is possible to download and build Firefox yourself. It\u2019s possible to find out which part of the codebase controls which pieces of UI. It\u2019s\u2026 well, it\u2019s more than possible, it\u2019s actually quite pleasant to add instrumentation using Glean\u2026 but on the whole, if you are someone who <i>can<\/i> Instrument Firefox Desktop you probably already have a copy of the source code on your hard drive. If you check right now and it\u2019s not there, then there\u2019s precious little likelihood that will change.<\/p>\n<p>(Unless you <a href=\"https:\/\/careers.mozilla.org\/\">come and work for Mozilla<\/a>, that is.)<\/p>\n<p>So let\u2019s assume for now that democratizing instrumentation is impossible. Why does it matter? 
Why should it matter that the Consumer is a separate person from the Instrumentor?<\/p>\n<h2>Communication<\/h2>\n<p>Each role communicates with each other role in a different language:<\/p>\n<ul>\n<li aria-level=\"1\">Consumers talk to Instrumentors and Analysts in units of Questions and Answers. \u201cHow many bookmarks are there? We need to know whether people are using bookmarks.\u201d<\/li>\n<li aria-level=\"1\">Analysts speak Data, Metadata, and Stats. \u201cThe median number of bookmarks is, according to a representative sample of Firefox profiles, twelve (confidence interval 99.5%).\u201d<\/li>\n<li aria-level=\"1\">Instrumentors speak Data and Code. \u201cThere\u2019s a few ways we delete bookmarks, we should cover them all to make sure the count\u2019s correct when the next ping\u2019s sent.\u201d<\/li>\n<\/ul>\n<p>Some more of the Data Org and Mozilla\u2019s greatest successes involve supplying context at the points in a Data Engagement where it\u2019s most needed. We\u2019ve gotten exceedingly good at loading context about data (metadata) to facilitate communication between Instrumentors and Analysts with tools like <a href=\"http:\/\/dictionary.telemetry.mozilla.org\/\">Glean Dictionary<\/a>.<\/p>\n<p>Ah, but once again the weak link appears to be the communication of Questions and Answers between Consumers and Instrumentors. Taking the above example, does the number of bookmarks include folders?<\/p>\n<p>The Consumer knows, but the further away they sit from the Instrumentor, the less likely it is that the data coming from the product and fueling the analysis will be the \u201ccorrect\u201d one.<\/p>\n<p>(Either including or excluding folders would be \u201ccorrect\u201d for different cases. Which one do you think was \u201cmore correct\u201d?)<\/p>\n<p>So how do we improve this?<\/p>\n<h2>Glean<\/h2>\n<p>Well, actually, Glean doesn\u2019t have a solution for this. I don\u2019t actually know what the solutions are. I have some ideas. 
Maybe we should share more context between Consumers and Instrumentors somehow. Maybe we should formalize the act of question-asking. Maybe we should build into the Glean SDK a high-enough level of metric abstraction that instead of asking questions, Consumers learn to speak a language of metrics.<\/p>\n<p>The one thing I do know is that Glean is absolutely necessary to making any of these solutions possible. Without Glean, we have too many systems that are fractally complex for any context to be relevantly shared. How can we talk about sharing context about bookmark counts when we aren\u2019t even counting things consistently[2]?<\/p>\n<p>Glean brings that consistency. And from there we get to start solving these problems.<\/p>\n<p>Expect me to come back to this realm of Engagements and the Three Roles in future posts. I\u2019ve been thinking about:<\/p>\n<ul>\n<li aria-level=\"1\">how tooling affects the languages the roles speak amongst themselves and between each other,<\/li>\n<li aria-level=\"1\">how the roles are distributed on the org chart,<\/li>\n<li aria-level=\"1\">which teams support each role,<\/li>\n<li aria-level=\"1\">how Data Stewardship makes communication easier by adding context and formality,<\/li>\n<li aria-level=\"1\">how Telemetry and Glean handle the same situations in different ways, and<\/li>\n<li aria-level=\"1\">what roles Users play in all this. No model about data is complete without considering where the data comes from.<\/li>\n<\/ul>\n<p>I\u2019m not sure how many I\u2019ll actually get to, but at least I have ideas.<\/p>\n<p>:chutten<\/p>\n<p>[1] Other rejected similes include \u201cas common as\u201d: maple syrup on Canadian breakfast tables, frustration in traffic, sense isn\u2019t.<\/p>\n<p>[2] Counting is harder than it looks.<\/p>\n<p>(( This post is a syndicated copy of <a href=\"https:\/\/chuttenblog.wordpress.com\/2021\/10\/22\/this-week-in-glean-the-three-roles-of-data-engagements\/\">the original<\/a>. 
))<\/p>\n","protected":false},"excerpt":{"rendered":"<p>(\u201cThis Week in Glean\u201d is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release &hellip; <a class=\"go\" href=\"https:\/\/blog.mozilla.org\/data\/2021\/10\/22\/this-week-in-glean-the-three-roles-of-data-engagements\/\">Read more<\/a><\/p>\n","protected":false},"author":1437,"featured_media":197,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[448297],"tags":[525,448297,448330,448329],"coauthors":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/posts\/372"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/users\/1437"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/comments?post=372"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/posts\/372\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/media\/197"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/media?parent=372"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/categories?post=372"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/tags?post=372"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/coauthors?post=372"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}