{"id":148,"date":"2020-03-30T06:56:26","date_gmt":"2020-03-30T06:56:26","guid":{"rendered":"https:\/\/blog.mozilla.org\/data\/?p=148"},"modified":"2020-05-05T08:23:37","modified_gmt":"2020-05-05T08:23:37","slug":"opening-data-to-understand-social-distancing","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/data\/2020\/03\/30\/opening-data-to-understand-social-distancing\/","title":{"rendered":"Opening data to understand social distancing"},"content":{"rendered":"<p>As researchers work to understand and hopefully control the covid-19 pandemic, collaboration and sharing of data is essential.\u00a0 Organizations like <a href=\"https:\/\/github.com\/CSSEGISandData\/COVID-19\">Johns Hopkins University Center for Systems Science and Engineering <\/a>and Italy\u2019s <a href=\"https:\/\/github.com\/pcm-dpc\/COVID-19\">Dipartimento della Protezione Civile<\/a> are publishing data sets to help others make sense of the impact of this pandemic.\u00a0 At Mozilla, we\u2019ve noticed a recent increase in desktop usage of Firefox. 
Because this data may have some value to researchers investigating social distancing measures in the current pandemic, we are releasing a dataset to support this collaborative effort.<\/p>\n<p>To explain visually, below is a plot of daily active users (DAU) of Firefox Desktop in France.\u00a0 We include both the actual metric and the forecasted metric, which represents what we would expect to see if everything were normal.\u00a0 Following is a plot of the metric deviation from forecast.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-extra-large wp-image-153\" src=\"http:\/\/blog.mozilla.org\/data\/files\/2020\/03\/Screen-Shot-2020-03-30-at-9.50.42-AM-1000x331.png\" alt=\"DAU forecast and actual graph\" width=\"1000\" height=\"331\" srcset=\"https:\/\/blog.mozilla.org\/data\/files\/2020\/03\/Screen-Shot-2020-03-30-at-9.50.42-AM-1000x331.png 1000w, https:\/\/blog.mozilla.org\/data\/files\/2020\/03\/Screen-Shot-2020-03-30-at-9.50.42-AM-300x99.png 300w, https:\/\/blog.mozilla.org\/data\/files\/2020\/03\/Screen-Shot-2020-03-30-at-9.50.42-AM-600x199.png 600w, https:\/\/blog.mozilla.org\/data\/files\/2020\/03\/Screen-Shot-2020-03-30-at-9.50.42-AM-768x254.png 768w, https:\/\/blog.mozilla.org\/data\/files\/2020\/03\/Screen-Shot-2020-03-30-at-9.50.42-AM-1536x508.png 1536w, https:\/\/blog.mozilla.org\/data\/files\/2020\/03\/Screen-Shot-2020-03-30-at-9.50.42-AM-2048x678.png 2048w\" sizes=\"(max-width: 1000px) 100vw, 1000px\" \/><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-extra-large wp-image-152\" src=\"http:\/\/blog.mozilla.org\/data\/files\/2020\/03\/Screen-Shot-2020-03-30-at-9.50.24-AM-1000x352.png\" alt=\"Deviation Graph\" width=\"1000\" height=\"352\" srcset=\"https:\/\/blog.mozilla.org\/data\/files\/2020\/03\/Screen-Shot-2020-03-30-at-9.50.24-AM-1000x352.png 1000w, https:\/\/blog.mozilla.org\/data\/files\/2020\/03\/Screen-Shot-2020-03-30-at-9.50.24-AM-300x106.png 300w, 
https:\/\/blog.mozilla.org\/data\/files\/2020\/03\/Screen-Shot-2020-03-30-at-9.50.24-AM-600x211.png 600w, https:\/\/blog.mozilla.org\/data\/files\/2020\/03\/Screen-Shot-2020-03-30-at-9.50.24-AM-768x271.png 768w, https:\/\/blog.mozilla.org\/data\/files\/2020\/03\/Screen-Shot-2020-03-30-at-9.50.24-AM-1536x541.png 1536w, https:\/\/blog.mozilla.org\/data\/files\/2020\/03\/Screen-Shot-2020-03-30-at-9.50.24-AM-2048x722.png 2048w\" sizes=\"(max-width: 1000px) 100vw, 1000px\" \/><\/p>\n<p>We see there is some normal range of variation, but there is a clear increase in usage outside the normal range starting on March 16th.\u00a0 Given the timing and unprecedented nature of the increase in desktop usage we have recorded, we think it may be related to social distancing measures being taken worldwide, although the relationship is not certain and Mozilla has not investigated alternative explanations for this increase in usage.\u00a0 It is possible that this data may be useful to help evaluate adherence with these measures, or even, by examining how the rate of disease growth changes following changes in Firefox usage, the impact that these measures are having.<\/p>\n<p>As data scientists, we would caution other researchers that our user base may not be representative of the general population and that we are still working to understand how other factors that affect online life (like the proliferation of online education or the increased consumption of news or other covid-19 related material) affect our data, but we hope that this data will provide some value to the research community and other interested organizations. 
We also hope to set an example and encourage our industry colleagues to collaborate openly and share data that may be of value in ways that are consistent with respect for the privacy of their users.<\/p>\n<p>We are making a table available that identifies deviations in our metrics over time for a large set of geographical units at the country and city level.\u00a0 As an organization steeped in openness and respect for individuals\u2019 privacy, we are publishing this non-personal data in accordance with our commitment to an internet that <a href=\"https:\/\/www.mozilla.org\/en-US\/about\/manifesto\/\">catalyzes collaboration for the common good <\/a>and with data practices that preserve anonymity for our users, by ensuring that all released data are based on an aggregation of at least 5000 users.\u00a0 Mozilla is making this database available under the<a href=\"https:\/\/creativecommons.org\/publicdomain\/zero\/1.0\/\"> Creative Commons CC0 public domain dedication<\/a>. That means we have waived all copyrights to the extent we can under the law and the data is public and free to use.\u00a0 The data are currently available as a CSV download <a href=\"https:\/\/docs.google.com\/spreadsheets\/d\/1jHWW9QYAOCNTVwyWF29YiVGDf4uX3TcLgREVrQ1bkHI\/export?format=csv\">here<\/a> or as a JSON endpoint <a href=\"https:\/\/public-data.telemetry.mozilla.org\/api\/v1\/tables\/telemetry_derived\/deviations\/v1\/files\/000000000000.json\">here<\/a>.\u00a0 This will be replaced with a public BigQuery table and a JSON endpoint soon (this post will be updated). We describe the data briefly below. Inquiries can be directed to <a href=\"mailto:publicdata@mozilla.com\">publicdata@mozilla.com<\/a>.<\/p>\n<p><b>Details of the data<\/b><\/p>\n<p>Our metrics have normal variation over time for many reasons, so rather than sharing the raw metrics, we are providing the deviation from a forecast. 
The forecast models what would have happened to our metrics since January 30th (the date the WHO declared a global public-health emergency) if \u201ceverything was normal,\u201d so deviations from the forecast represent an anomaly. Not all anomalies will be related to social distancing measures, but by analyzing the patterns over time, it may be possible to develop confidence that a deviation is attributable to the measures.<\/p>\n<p>The code used to produce this data is available <a href=\"https:\/\/github.com\/mozilla\/dscontrib\/tree\/master\/src\/dscontrib\/jmccrosky\/anomdtct\">here<\/a>. We would like to acknowledge our use of <a href=\"http:\/\/facebook.github.io\/prophet\/\">Facebook\u2019s Prophet forecasting library<\/a>. Its robustness makes it possible to develop useful forecasts for thousands of different geographical units over multiple metrics in a very short time.\u00a0 The library is freely available under an MIT open source license and does not require sharing any data with Facebook.<\/p>\n<p>The data describe the deviation from forecast of various usage metrics. Currently we include:<\/p>\n<ul>\n<li>\u201cdesktop_dau\u201d, or daily active users of our desktop browser.\u00a0 Note that increases in DAU may be due to more frequent use by existing users or due to an influx of new users.<\/li>\n<li>&#8220;mean_active_hours_per_client&#8221; attempts to measure the mean number of hours that users in the region were actively using the browser each day.<\/li>\n<li>We hope to add more metrics in the near future.<\/li>\n<\/ul>\n<p>The deviation is described as a proportion of the actual, thus the computation of deviation would be (actual &#8211; forecast) \/ actual. 
A value of 0 indicates no deviation while more extreme positive or negative values indicate unexpectedly high or low values respectively.<\/p>\n<p>As it is normal that the actual values will deviate slightly from the forecasted value, the forecast provides an \u201c80% credible interval\u201d to capture the range of \u201cnormal\u201d values. We also include a deviation relative to the forecast credible interval. This is calculated as (actual &#8211; forecast) \/ (ci_upper &#8211; forecast) where actual &gt; forecast or (actual &#8211; forecast) \/ (forecast &#8211; ci_lower) otherwise. A metric just on the upper edge of the credible interval would have a value of \u201c1\u201d here and a metric just on the lower edge would have a value of \u201c-1\u201d. Depending on your analysis, you may wish to consider all deviations with an absolute value of ci_deviation less than 1 (or some larger threshold) to be non-anomalous.<\/p>\n<p><b>Schema<\/b><\/p>\n<table border=\"1\">\n<tbody>\n<tr>\n<td>date<\/td>\n<td>We provide the metric deviation values for dates ranging from the beginning of 2020 to the most recent data available.<\/td>\n<\/tr>\n<tr>\n<td>metric<\/td>\n<td>The metric being analyzed.\u00a0 Currently either \u201cdesktop_dau\u201d or \u201cmean_active_hours_per_client\u201d.<\/td>\n<\/tr>\n<tr>\n<td>deviation<\/td>\n<td>The deviation of the metric from its forecast.<\/td>\n<\/tr>\n<tr>\n<td>ci_deviation<\/td>\n<td>The deviation relative to the credible interval.<\/td>\n<\/tr>\n<tr>\n<td>geography<\/td>\n<td>Either:<\/p>\n<ul>\n<li>the ISO country code for country-level data<\/li>\n<li>the ISO country code, level-1 subdivision, level-2 subdivision, and city name separated with \u201c:\u201ds for city-level data.<\/li>\n<\/ul>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p><b>Caveats<\/b><\/p>\n<p>As with any observational data, there are many caveats and interpretation must be done carefully.\u00a0 Below is a list of issues we have considered, but it is not exhaustive.<\/p>\n<ul>\n<li>Firefox users may not be 
representative of the general population in their region.<\/li>\n<li>We are applying a single, identically configured forecast model across more than a thousand geographical units.\u00a0 The forecast is remarkably robust, but will not perform perfectly in every case. Some deviations may be due to forecast misspecification.<\/li>\n<li>Geo data is based on IPGeo databases.\u00a0 These databases are imperfect, so some activity may be attributed to the wrong location.\u00a0 In addition, updates in the IPGeo database create noise that introduces many artifactual anomalies. To avoid this, we use only the most recent geo per profile for this analysis, which may interact with pandemic-related travel.\u00a0 Further, proxy and VPN usage can create geo-attribution errors.<\/li>\n<li>Our desktop product can be used on portable devices (such as notebook computers).\u00a0 In our data we assume a fixed location for each profile based on the most recent telemetry ping from that profile, which may introduce some errors for users who have traveled during this time.<\/li>\n<\/ul>\n<p><strong>Updates<\/strong><\/p>\n<ul>\n<li>2020-4-1: Added &#8220;mean_active_hours_per_client&#8221; metric.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>As researchers work to understand and hopefully control the covid-19 pandemic, collaboration and sharing of data is essential.\u00a0 Organizations like Johns Hopkins University Center for Systems Science and Engineering and &hellip; <a class=\"go\" href=\"https:\/\/blog.mozilla.org\/data\/2020\/03\/30\/opening-data-to-understand-social-distancing\/\">Read 
more<\/a><\/p>\n","protected":false},"author":1749,"featured_media":154,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[323282,315987],"tags":[448303,448304],"coauthors":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/posts\/148"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/users\/1749"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/comments?post=148"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/posts\/148\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/media\/154"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/media?parent=148"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/categories?post=148"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/tags?post=148"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blog.mozilla.org\/data\/wp-json\/wp\/v2\/coauthors?post=148"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}