As with many software companies, we are keenly interested in gauging active installations and understanding how people use our products. However, as a non-profit organization with a strong interest in promoting privacy, we also recognize there’s a fine line between this activity and tracking people in unwelcome ways. Mozilla has developed a new mechanism that further enhances privacy, while still meeting its objective to create usage metrics.
1. Measuring Usage Activity
1.1 Our Current Approach: Blocklist Cookies
Mozilla has a mechanism for maintaining installed add-ons called a “blocklist” [https://wiki.mozilla.org/Blocklisting]. This involves a scheduled request to retrieve an updated blocklist file from Mozilla servers. The request is currently performed not more than once per 24 hours by several Gecko-powered applications maintained by Mozilla (e.g., Firefox Desktop and Mobile). Because the request happens at most once per day, we can study the pattern and volume of requests to understand how many active installations of a product there are on a particular day. While the requests do not collect or use any personal user information, they currently carry a cookie that lets us study how many unique active installations we see during a given time period.
1.2 From FF4 Onwards: Days Since Last Ping
It’s been our experience that using cookies for tracking unique usage is somewhat unreliable, and cookies raise privacy issues for our users. Cookies can be cleared by the user and are sometimes even corrupted by proxies. For these reasons, the Mozilla Metrics team filed bug 616835 [1] to implement a new method for tracking unique installations without any need for a cookie or any other form of identifier. This not only gives us better usage metrics, but also strengthens user privacy by removing the old cookie entirely.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=616835
Design and Implementation of Days Since Last Ping
Each time a request is made for the blocklist data, the request includes a new parameter that indicates how many days it has been since the last request. There is very little possibility of deriving a fingerprint from this new parameter: it carries only a small number of bits, it changes on every request, and users will not maintain outlier values unless they consistently have a pattern of extremely occasional usage (i.e. months between uses of the application). Note that if Firefox is left open unattended for 2 weeks, the days last ping value will be 1 on every day, even though the user might never have been at their computer.
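To make the behaviour of the parameter concrete, here is a minimal sketch of how a client might derive it. This is not the Firefox implementation: the function names, the "new" marker for a first-ever request, and the dates are assumptions made for illustration only.

```python
from datetime import date, timedelta

def days_since_last_ping(last_ping, today):
    """Return the value a client would send with its blocklist request.

    Hypothetical helper: "new" marks a first-ever request, otherwise the
    value is the whole number of days since the previous request.
    """
    if last_ping is None:
        return "new"
    return (today - last_ping).days

def simulate(request_dates):
    """Return the parameter values sent over a sequence of request dates."""
    values = []
    last_ping = None
    for today in request_dates:
        values.append(days_since_last_ping(last_ping, today))
        last_ping = today
    return values

start = date(2011, 3, 1)  # arbitrary illustrative date

# An installation left running for two weeks pings once a day.
always_on = [start + timedelta(days=i) for i in range(14)]
print(simulate(always_on))

# An installation that is used again only after 45 days.
occasional = [start, start + timedelta(days=45)]
print(simulate(occasional))
```

The always-on installation sends "new" once and then 1 every day, which is why a value of 1 says nothing about whether a person was actually at the computer. Only the occasional installation produces a large value, and only on the single request that follows the gap.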
Computing Active Installations
For each day in the desired time period, we add all the requests with a value indicating that either this is the first ever request to blocklist (a new installation), or the last time the application made the request was before the time period we are analyzing. This means that on the first day, we count all requests with a valid parameter (i.e. between 1 and max_valid). On the second day of the time period, we count all requests with a parameter between 2 and max_valid. After iterating through each day of the time period using this algorithm, we sum all the counts together and we have the number of unique active installations in that time period.
Example
Consider the date range 04 March – 07 March. We proceed as follows:
1. For March 4th, add the ‘new’ count and the counts of requests with days last ping (dlp) == n for all n>=1.
2. For March 5th, add the ‘new’ count and the counts of dlp==n, n>=2. We ignore dlp==1 because those installations would have made the blocklist check with some value of dlp on March 4th.
3. Similarly, for March 6th and March 7th, add ‘new’ and the counts of dlp==n, n>=3 and n>=4 respectively.
4. Add all the counts in (1)-(3) to get the number of unique active installations in the above period.
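The steps above can be sketched in a few lines of Python. The data layout (a dict of per-day counts keyed by dlp value), the "new" key, the max_valid default and all of the numbers are made up for illustration; they are not real blocklist data.

```python
from datetime import date, timedelta

NEW = "new"  # hypothetical key for first-ever requests

def unique_active_installations(daily_counts, start, end, max_valid=10000):
    """Count unique active installations between start and end (inclusive).

    daily_counts maps a date to a dict of {dlp value or NEW: request count}.
    On the k-th day of the period (k = 1, 2, ...) we count new installations
    plus requests with k <= dlp <= max_valid, because a smaller dlp means the
    installation was already counted on an earlier day of the period.
    """
    total = 0
    day = start
    offset = 1
    while day <= end:
        counts = daily_counts.get(day, {})
        total += counts.get(NEW, 0)
        for dlp, n in counts.items():
            if dlp != NEW and offset <= dlp <= max_valid:
                total += n
        day += timedelta(days=1)
        offset += 1
    return total

# Made-up counts for 04 March - 07 March (the year is arbitrary).
example_counts = {
    date(2011, 3, 4): {NEW: 5, 1: 80, 2: 10, 6: 5},   # counts 5 + 80 + 10 + 5
    date(2011, 3, 5): {NEW: 4, 1: 85, 2: 6, 3: 2},    # counts 4 + 6 + 2
    date(2011, 3, 6): {NEW: 3, 1: 88, 2: 5, 3: 4},    # counts 3 + 4
    date(2011, 3, 7): {NEW: 6, 1: 90, 3: 2, 4: 1, 9: 1},  # counts 6 + 1 + 1
}

total = unique_active_installations(
    example_counts, date(2011, 3, 4), date(2011, 3, 7)
)
print(total)  # 100 + 12 + 7 + 8 = 127
```

Note that the many dlp==1 requests on March 5th to 7th contribute nothing, since each of those installations was already counted on the previous day.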
With this metric we can also compute the number of new installations being added on a daily basis. Also, we would like to confirm that we have a high proportion of profiles with days last ping equal to 1. This is not the same as a ‘daily user’ but it is encouraging to see a lot of installations using Firefox for two consecutive days.
Limitations
With this metric, we cannot compute the number of installations using Firefox for exactly ‘k’ days in a week, nor can we compute retention patterns.
2. Visualizing the Behavior of the Days Last Ping Metric
Firefox 4 was released on 22nd March, though betas and release candidates were available before that. This is a great opportunity to visualize the dynamics of a metric for a new product.
2.1 New Installations
We are eager to see what proportion of our daily ‘blocklist pings’ come from new installations; Figure 1 displays this proportion. The heartening observation is the positive slope of the red smoother curve. The peaks are not day-of-week effects but correspond to release dates. It is difficult to comment on day-of-week effects here because of the significant events that occurred in the time period. Nevertheless, the new profile percentage appears to be around 3.5-4.5% on a daily basis, with a slight increasing trend towards the end of the period.
2.2 ‘Daily’ Usage
Figure 2 displays the proportion of blocklist updates that come from installations with days last ping equal to 1, which means the installation was active both today and the previous day. The proportion varies from 72% to 86%, with a mean hovering around 82%. The red smoother indicates not much change. The 1st, 3rd and 5th weeks have a similar pattern: a low at the beginning of the week (a value of dlp equal to 1 on Monday means that the profile used FF on Sunday), a peak towards the middle of the week, and a dip as we approach the weekend. The 4th week was the week of the release, so understandably it looks very different. In both Figures 1 and 2, week 2 also looks different, probably because of the RC release. I would like to say there is no weekly effect; the shape is indeed the same (except for the two exceptional weeks), but the highs and lows are different.
2.3 ‘Recent’ Usage
Combined with the dlp==1 counts, the counts with days last ping between 2 and 7 capture more than 90% of the users. Figure 3 shows the proportion of blocklist pings that come from installations that last made contact between 2 and 7 days back. The peaks and troughs are aligned opposite to those of the dlp==1 display (Figure 2). There does not seem to be an increasing trend, with the mean around 12-14%, peaking on Thursdays. Why is that? Because the bulk come from dlp==3. On average (across day of week), 95.3% of the blocklist pings have dlp<=3 and 98.9% have dlp<=7. Figure 4 displays the mean cumulative percent of different values of days last ping (between 2 and 7) by the day of week. The key observation is that the curve doesn’t change much, meaning there is little interaction between day of week and days last ping.
2.4 ‘Infrequent’ Usage
Finally, in Figure 5, we get to see the dynamics of the days last ping distribution for a new product. In Figure 5, we plot the density of days last ping greater than 14 for all the days. Each row is 7 days so we can fix a day of week by moving along columns.
Firstly, we see the distribution shifting out to the tails. This is expected, as there is more time available for installations to be used after a long period of inactivity. Also, the maximum proportion (see Figure 6) decreases steadily over time from 0.1% to about 0.03% on 3rd April, and dramatically so a few days after the FF4 launch. However, in both Figures 5 and 6, we see the peak starting to rise again. This means more installations that were inactive for over 14 days are now using FF4. Whether this panel stabilizes is something we can see over the next few months.
2.5 Daily Actives vs. Weekly Actives
Surprisingly, if we compare unique actives over a week with the mean daily actives in that week, the numbers are relatively stable. For the 6 weeks, the ratios are: 0.692, 0.593, 0.666, 0.994, 0.685 and 0.672.
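As a rough sketch of how such a ratio could be computed, the snippet below divides the mean daily actives by the weekly unique actives. That direction of the ratio is an assumption (it is consistent with the reported values being below 1), as are the function names, the "new" key and the made-up counts.

```python
def weekly_unique(week):
    """Unique active installations over a week.

    week is a list of 7 dicts, one per day, mapping a dlp value (or the
    hypothetical key "new") to a request count.
    """
    total = 0
    for offset, counts in enumerate(week, start=1):
        for dlp, n in counts.items():
            # Count new installations and those not yet seen this week.
            if dlp == "new" or dlp >= offset:
                total += n
    return total

def mean_daily(week):
    """Mean number of requests per day (one request per active installation)."""
    return sum(sum(counts.values()) for counts in week) / len(week)

def daily_to_weekly_ratio(week):
    """Assumed definition: mean daily actives divided by weekly uniques."""
    return mean_daily(week) / weekly_unique(week)

# Made-up week: 100 actives on day 1, 90 on each of the following days.
example_week = [
    {1: 70, 3: 30},
    {1: 80, 2: 10},
    {1: 90},
    {1: 90},
    {1: 90},
    {1: 90},
    {1: 90},
]

print(weekly_unique(example_week))  # 100 + 10 = 110
print(round(daily_to_weekly_ratio(example_week), 3))
```

Under this definition, a ratio close to 1 would mean that almost every installation seen during the week was active every day of it.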
In future, we can look at rolling 7-day periods and other window lengths (e.g. 14 days, monthly, etc.) and at week-on-week growth for unique actives.
Reassuring results, and we are all very eager to monitor the progress as FF4’s adoption increases.
Credits:
Thanks to Daniel Einspanjer for the introduction to the cookie usage and the background on the metrics ping.