What Does the Fox Weigh? ♫

Ali Almossawi


Unlike with physical objects, it might not always be apparent what wear and tear looks like in software. Parts don’t squeak or crack or moan in the same way, and to the chagrin of many a researcher, there are no fundamental laws in software like there are in nature. And so best practices and good advice are generally the camels on whose backs we travel, occasionally sneering at onlookers for lacking the requisite level of faith in our idealized theories of how things ought to be done.

Yet the road from mere development to engineering, a characterization that I first read in Garlan and Shaw’s introduction to software architecture, is filled with noble efforts like devising metrics and building models of quality to help track and manage the complexity that is inherent in software. Complexity that is essential, as Brooks might say.

But quality is ambiguous. No one can say what single thing it is meant to measure, which is why we define quality in concrete and local terms. “Building for quality means building for maintainability,” one may proclaim, to which another may respond, “To us, it’s more about performance.” Often, though, it is a combination of attributes that span multiple categories.

Maintainability is what I’m interested in, for personal reasons, as I have witnessed the wrath of code that rips through muscle and bone and condemns the souls of all who dare approach it to eternal damnation—technical debt as it is sometimes called. The metaphor is a useful one, since it posits that software development malpractices incur a debt that must be paid at some point in the future in the form of time, effort or defects.

This accumulation of debt can take months to become noticeable and so detecting it is only possible when we observe a system over a period of time. It is by way of this sort of transparency that we see things like drops and rises in a system’s level of maintainability, indicators that we can then use to inform decisions about the product, process and project.

In order to provide such a lens for those interested in tracking the maintainability of the Firefox codebase, I have put together a dashboard that tracks six measures of architectural complexity. Studies have shown these measures to be good predictors of quality; many other measures exist too.


  1. Lines of code: The number of executable lines of code, not counting comments and blank lines. Though lines of code is the simplest measure of a system’s complexity, some practitioners argue that it remains one of the best predictors of quality.
  2. Files: The number of files, not counting filtered files and unit tests. A file is analogous to a component in other engineered systems, seeing as it is not as atomic as, say, a screw or a bolt and not as large as a module.

Intra-component complexity

  3. Cyclomatic complexity: The number of linearly independent paths of execution within a system per line of code. These are paths that occur as a result of branching constructs like if-else statements. For the sake of readability, the measure is per 1,000 lines of code, so a cyclomatic complexity value of 200 means that there are around 200 independent paths in every 1,000 lines of code.

Inter-component complexity

  4. Dependencies: The number of files that the average file can directly impact. A file depends on another if it includes, calls, sets, uses, casts, or refers to one or more items in that file. We can determine the average number of dependencies in a system by building an adjacency matrix of its components.
  5. Propagation: The proportion of files in a system that are connected, either directly or indirectly. In practical terms, propagation gives a sense of the total reach of a change to a file. We calculate propagation through a process of matrix multiplication (a small sketch follows this list).
  6. Highly interconnected files: Files that are interconnected via a chain of cyclic dependencies. These are pairs of files in a system that have a lot of dependencies between each other. Highly interconnected files may be correlated with propagation.
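
To make the inter-component measures concrete, here is a minimal sketch in Python. The toy adjacency matrix and the exclusion of self-reachability are assumptions for illustration; this shows the general technique, not the dashboard’s actual implementation.

import numpy as np

# Toy adjacency matrix for a hypothetical four-file system:
# A[i, j] = 1 if file i depends directly on file j.
A = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
])

# Dependencies: the average number of files a file can directly impact,
# i.e. the mean out-degree (mean row sum) of the adjacency matrix.
avg_dependencies = A.sum(axis=1).mean()

# Propagation: the share of ordered file pairs that are connected directly
# or indirectly. Repeated matrix multiplication builds the transitive
# closure ("visibility") matrix V; self-reachability is excluded here.
V = A.copy()
for _ in range(len(A)):                  # paths of length up to n
    V = np.clip(V + V @ A, 0, 1)
propagation = V.sum() / V.size

print(f"average dependencies: {avg_dependencies:.2f}, propagation: {propagation:.2f}")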

*  *  *

The analysis runs daily on revisions in Mozilla’s central tree, first on the entire codebase and then on a set of top-level directories, which are meant to constitute individual modules. In addition to the interface, the dashboard includes an endpoint that allows one to specify the path to a file in the Firefox codebase, for which it returns the set of inward and outward dependencies. This information can be useful for things like determining what subset of tests to run for a particular commit.
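
As a sketch of that test-selection use case, the snippet below asks a dependency endpoint for the files affected by a change and keeps the ones that look like tests. The endpoint URL, the JSON response shape and the filtering heuristic are hypothetical stand-ins, not the dashboard’s actual interface.

import json
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical endpoint; the real dashboard exposes its own URL and format.
ENDPOINT = "https://example.org/dependencies?path={path}"

def tests_to_run(changed_file):
    """Return the test files among the inward dependencies of changed_file,
    i.e. the files that a change to changed_file could impact."""
    url = ENDPOINT.format(path=quote(changed_file, safe=""))
    with urlopen(url) as response:
        deps = json.load(response)  # assumed shape: {"inward": [...], "outward": [...]}
    return [f for f in deps.get("inward", []) if "test" in f.lower()]

# Example (hypothetical path):
# print(tests_to_run("some/module/File.cpp"))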

The entire code for the analyzer and the interface, most of which is Python and JavaScript, is available in a public repository, as is the documentation for how to set things up and modify default behavior.

By following the documentation, you should be able to run the analysis on your own codebases. A previous project, for instance, ran it on 23 releases of Chromium, which served as a useful benchmark.

Seeing as the work is still at a fairly embryonic stage, if you’re interested in this sort of thing, I invite you to get in touch or contribute your thoughts on GitHub.

Introducing MetricsGraphics.js

Hamilton Ulmer


MetricsGraphics.js is an opinionated library that aims to take the mystery and complication out of presenting and visualizing data. It offers line charts, scatterplots, bar charts, histograms and data tables*, and it elevates the layout and explanation of these graphics to the same level of priority as the graphics themselves. The emergent philosophy is one of efficiency and practicality. By following the standards embodied by the library, you will make beautiful, concise and impactful presentations and dashboards.

Today marks the release of v1.0—our first public release. We began building the library somewhat inadvertently earlier this year, during which time we found ourselves copy-and-pasting bits of code in various projects. Naturally, this led to errors and inconsistent features. We decided that it made sense to develop a single library that provides common functionality and aesthetics to all of our internal projects.

With that in mind, MetricsGraphics.js was born. The library encapsulates what we believe are best practices for effective data presentation, practices that have guided our development process. MetricsGraphics.js follows four principles:

  1. You only need a few types of graphics to tell most business stories.
  2. Usability and conciseness are absolutely necessary in making a successful data graphic. Users should be guided toward customization options that enhance the data presentation, not superfluous visual tweaks.
  3. Layout, annotation, and explanation are as important as graphing and should be given as high a priority as the data. Presentations should work across a wide variety of contexts and devices.
  4. Development of the library should follow real needs, not imagined ones.

MetricsGraphics.js focuses on a few types of graphics that we believe are important. We have deliberately left out a number of common graphic types because we feel that they aren’t particularly useful and can sometimes be liable to misinterpretation. Each of the graphic types that we offer comes with a wealth of useful options that cover a wide range of use-cases. Our API is very straightforward. Producing a simple line chart, for instance, is as simple as:

var data = [{'date': new Date('2014-10-01'), 'value': 434034},...]

MG.data_graphic({
  target: 'div#line-plot',
  data:   data,
  title: "The first line plot.",
  description: "This is a line plot, for sure.", // appears in tooltip
  x_accessor: 'date',
  y_accessor: 'value'
});

Updating an existing graphic’s options or data is just as easy, seeing as the same function that’s used to create the data graphic is also used to update it. We don’t maintain state. 

We consider layout to be just as important as visualization. This is an aspect that many charting libraries leave out. The library ships with tooltips and textual descriptions, as well as a custom layout that you may wish to use in your own project. We believe that a good layout and a concise story are vital to acclimating customers to the data, and so the sample layout in the demo aims to do just that.

As a final point, we follow a ‘real needs’ approach to development. Right now, we have mostly implemented features that have been important to us internally. Having said that, our work is available on Github, as are many of our discussions, and we take any and all pull requests and issues seriously.

There is still a lot of work to be done. But this little library already has serious teeth. We invite you to try it out!

Check out the MetricsGraphics.js repository on Github.

*Data tables aren’t available just yet, but they are on the horizon.

Firefox Health Report



At Mozilla we believe that openness, innovation, and opportunity are key to the continued health of the Internet, and we are committed to building Web products and services that provide outstanding functionality and capability to the user. This post describes a new feature that we plan to release into Firefox, called the Firefox Health Report (FHR), which will share Firefox product information with Mozilla and its users to provide a better browsing experience.

Better “Motoring” on the Open Web with Firefox Health Report

The modern car provides a good analogy for what we are planning to achieve with the Firefox Health Report. Earlier in its 100-year history, the car was seen as novel and exciting, opening new opportunities for individuals and society. However, the car was also seen as often capricious and sometimes dangerous, making users endure unwelcome anxieties. Today, cars have become a differentiated product maintaining all the positive promises of an earlier age. But much of the angst of earlier times is diminished by improved reliability, increased safety measures, and better maintenance approaches. A key to this improved state is better data, efficiently used by the car manufacturer to deliver an excellent driving experience.

All cars today come with the capability of logging or recording critical information on the car’s on-board computer.  This information relates to the performance of critical sub-systems, the condition of key mechanical characteristics, and the occurrence of anomalous or dangerous events.  This information is used by on-board control systems to advise the driver of potential problems or areas for improvement (e.g., check engine light) and used by service personnel to diagnose problems and determine repair actions (i.e., clearing service codes).  In more sophisticated instances the information can be used to improve or programmatically optimize the vehicle by providing field upgrades for the on-board control software.

Moreover, the information collected from a vast number of cars is invaluable to each manufacturer in optimizing their products.  The information supports better maintenance and warranty programs and facilitates product recalls, if necessary.  The information gives comprehensive insights into the empirical driving conditions encountered by their drivers.  All the aggregated evidence from the field operation of cars is consumed in the design process and impacts positively the quality and functionality of future products.

Most importantly, these on-board data systems serve the in-car needs of drivers and riders.  Better summary instrumentation delivers maximum relevant data for a fixed budget of driver attention – car cockpits exhibit marvelous economies of design.  Under the hood, automatic closed loop systems on the power-train perform constrained optimization – for safety, fuel efficiency, performance, or other composite driving styles that the driver can select – economy, sport, off-road, etc…

Why Your Browser is Like Your Car

Today, Mozilla’s ability to deliver excellence to our Firefox users is quite limited.

Up to now, Mozilla has counted only Firefox installations and has had some very basic information allowing limited cross-tabulations of these installations, without any ability to assess longitudinal trends in these population characteristics. Metaphorically speaking, the standard of our product statistics is frozen in the 1940s or 1950s.

Modern evidence-based approaches to delivering a viable, let alone optimized, Firefox product demand more Firefox installation data, but acquired in a very carefully considered manner and with full disclosure of our motivations. We are transparent about our argument that this functionality is existentially necessary for Firefox, and about our explicit social contract with the community around data and its ownership and stewardship.

The Firefox Health Report

Our philosophy and mission set a very high standard of respecting user data and privacy (see Mitchell’s recent post). We are also commanded to make our products not just good but excellent, providing the best user experience in a secure manner. This new product feature will allow us to deliver an improved Firefox product that better serves users, both individually and collectively. Our proposal is driven by the best of scientific and analytical intent and takes the greatest of pains to minimize the amount of data collected. Data needs are set to the minimum necessary level. So let me explain what FHR will do.

FHR will collect data on the following aspects of the browser instance:

  • Configuration data – for example, device hardware, operating system, Firefox version
  • Customizations data – for example, add-ons, count and type
  • Performance data – for example, timing of browser events, rendering, session restores
  • Wear and Tear data – for example, length of session, how old a profile is, count of crashes

The car analogy drives home the point that we are interested in the browser instance (car) rather than the user (driver). In fact, the information recorded is a pooled blend of the characteristics of all browser instances of a given profile. Needless to say, we – as in the auto case – have no interest in where the browser has been: search terms, keywords and location are not collected.

The Firefox Health Report provides the following benefits:

  • User insights exhibited on-board the browser instance through visualizations and comparative graphics.
  • Product insights conveyed to Mozilla – the manufacturer or designer of the car – to help in improving existing browser instances and especially to more fully inform future design and development of Firefox.
  • The ability for Mozilla to streamline and reduce duplicate information it collects across other products such as Telemetry.

The Firefox Health Report will land in the Nightly build soon. For more information about it please take a look at this FAQ, or ask questions about it here by posting a comment. We’ll provide further updates when FHR becomes available in Nightly.

How is the Time to Start Firefox affected by EARLY_GLUESTARTUP_READ_OPS?

Saptarshi Guha


We were asked to determine whether EARLY_GLUESTARTUP_READ_OPS affects the startup time for Firefox. It is expected that when this measure is 0, the startup time is shorter.

The bug https://bugzilla.mozilla.org/show_bug.cgi?id=757215 describes this in more detail (and provides a place for the reader to comment).

For our analysis on this, see http://people.mozilla.org/~sguha/757215.html .


Sampling Crash Volumes, Rates and Rarity for Socorro Samples

Saptarshi Guha


1 Introduction

The Socorro crash report accumulation pipeline does not process all the crash reports. Though every report is stored on disk, only 10% are processed and saved in HBase as JSON objects. Each crash report has a crash signature (Crash Report Signature or CRS for short). The relationship between crash reports and CRSs is many to one.

Consumers of the crash reports (engineers working on bug fixes and product managers, to name a few) had concerns regarding the use of samples. For example, some asked whether 10% is a viable sampling rate for accurately estimating the frequencies of the CRSs, and if not for all of them, then how accurate are the estimates for the top N most frequently observed crash report signatures? With FF’s usage running into the hundreds of millions, we can expect new CRSs to come in every day. Some are very rare (occurring for a small user base) and others more frequent. How many days can we expect to wait until we have seen 50% of all the CRSs that come in (for a given version)?

To answer these questions, the #breakpad team processed every crash report for the week of 03/22-03/29/2011, just after the Firefox 4 release. This served as a full enumeration of the crash report data. The full enumeration contained 2.2MM crash reports belonging to 84,760 CRSs.

Primarily, the crash-stats dashboard lists the top 100 most frequent crashes by OS. Some questions:

  • How accurate are the sample estimates? Does the top 100 from a sample equal the top 100 from the full enumeration (population), and are the proportion estimates accurate? (The toy simulation after this list illustrates the question.)
  • Given the estimates, can we say something about their accuracy?
  • How many distinct crash types are there? Throttling takes a random sample of incoming crash reports. If we observe N CRSs in a 10% sample, can we estimate how many there are in the population, i.e., how many we haven’t seen? Estimating the number of unique CRSs is an entirely different problem from estimating their proportions.
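
As a purely illustrative toy simulation of the first question, the sketch below checks how much of the true top 100 a 10% throttled sample recovers. The signature and report counts are taken from the post, but the Zipf-like frequency distribution is a made-up stand-in, not the Socorro data.

import collections
import random

random.seed(0)

# Toy population: 84,760 signatures with long-tailed (Zipf-like) frequencies,
# scaled to roughly 2.2MM crash reports.
n_signatures = 84760
weights = [1.0 / rank for rank in range(1, n_signatures + 1)]
population = random.choices(range(n_signatures), weights=weights, k=2_200_000)

full = collections.Counter(population)                                    # full enumeration
sample = collections.Counter(r for r in population if random.random() < 0.10)  # 10% throttle

top_full = {sig for sig, _ in full.most_common(100)}
top_sample = {sig for sig, _ in sample.most_common(100)}
print(f"overlap of top 100: {len(top_full & top_sample)} / 100")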


For more, read: http://people.mozilla.org/~sguha/species.crash.report.html

Some Results for Memshrink

Saptarshi Guha


As the website (https://wiki.mozilla.org/Performance/MemShrink) describes, MemShrink is a project to reduce Firefox memory consumption. A summary (taken from the webpage):

  • Speed. Firefox will be faster due to less cache pressure, less paging, and fewer/smaller GC and CC pauses. Changes that reduce memory consumption but make Firefox slower are not desirable.
  • Stability. Firefox will suffer fewer aborts/crashes due to virtual or physical memory exhaustion. The former is mostly a problem on 32-bit Windows builds with a 2GB or 4GB virtual memory limit, the latter is mostly a problem on mobile devices that lack swap space.

The engineers working on MemShrink asked the Metrics team to help discover and quantify which variables affect the measures related to MemShrink. Key among these is RESIDENT_MEMORY, which is the resident memory that Firefox occupies. For a given installation, multiple measurements are taken before the data is submitted. The data for a given installation is recorded as a histogram (so we don’t have serial correlations between observations …), and the final value used in modeling is the weighted mean.
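
As a minimal illustration of that last summarization step (the bucket values and counts below are made up, not actual Telemetry data), the per-installation value is simply the histogram’s weighted mean:

# Hypothetical RESIDENT_MEMORY histogram for one installation:
# bucket value (MB) -> number of measurements that fell into that bucket.
histogram = {100: 2, 200: 5, 400: 9, 800: 3}

total_count = sum(histogram.values())
weighted_mean = sum(bucket * count for bucket, count in histogram.items()) / total_count
print(f"weighted mean: {weighted_mean:.1f} MB")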



More discussion and the full analysis can be found at http://people.mozilla.org/~sguha/memshrink.analysis.html. In short, there is a lot of variation in the data, and the variables used do a poor job of explaining it. That said:

  • version 11 reduces memory consumption by about 2% over FF10 (on average, but keep in mind there is a lot of variation)
  • version 12 increases it by about 11% over v.10 (see the distribution of the log of RESIDENT_MEMORY by version at the top)
  • the presence of the Firebug extension causes an increase (of about 12% on average), but the difference decreases with FF12
  • if one doubles the quantity (number of addons + 1), RESIDENT_MEMORY increases by approximately 33%

For more details and plots see http://people.mozilla.org/~sguha/memshrink.analysis.html .

Comparing the Bias in Telemetry Data vs The Typical Firefox User

Saptarshi Guha


Telemetry is a feature in Firefox that captures performance metrics such as start up time and DNS latency, among others. The number of metrics captured is on the order of a couple hundred. The data is sent back to the Mozilla Bagheera servers, where it is then analyzed by engineers.

The Telemetry feature asks Nightly/Aurora (pre-release) users if they would like to submit their anonymized performance data. This resulted in a response rate (the number of people who opted in divided by the number of people who were asked) of less than 3%. This led to two concerns: the small number of responses (which changed when Telemetry became part of the Firefox release) and, more importantly, representativeness: are the performance measurements collected from the 3% representative of those of the people who chose not to opt in?

Measuring the bias is not easy unless we have measurements about the users who did not opt in. Firefox sends the following pieces of information to the Mozilla servers: operating system, Firefox version, extension identifiers and the time for the session to be restored. This is sent by all Firefox installations unless the distribution or the user has the feature turned off (this is called the services AMO ping). The Telemetry data contains the same pieces of information.

What this implies is that we have start up times for i) the users who opted in to Telemetry and ii) everyone. We can now answer the question “Are the startup times for the people who opted into Telemetry representative of the typical Firefox user?”

Note: ‘everyone’ is almost everyone. Very few have this feature turned off.

Data Collection

We collected start up times for Firefox 7, 8 and 9 for November 2011 from the log files of services.addons.mozilla.org (SAMO). We also took the same information for the same period from the Telemetry data contained in HBase (some code examples can be found at the end of the article).


Are start up times different by Firefox version and/or source, where source is either SAMO or Telemetry?


Figure 1 is a boxplot of the log of start up time for Telemetry (tele) vs. SAMO (samo) by Firefox version. At first glance, it appears the start up times from Telemetry are less than those of SAMO, but the length of the bars makes it difficult to stand by this conclusion.

Figure 1: Boxplot of Log SessionRestored for Telemetry/SAMO by FF Version

Figure 2 shows the differences in the deciles of the log of start up time; approximately speaking, these are the deciles of the ratio of Telemetry start up time to SAMO start up time. The medians hover in the 0.8 region, though the bars are very wide and do not support the quick conclusion that Telemetry start up time is smaller.

Figure 2: Difference of Deciles of Logs

In Figure 3, we have the mean of the medians of 1000 samples: red circles are for Telemetry and black for SAMO. The ends of the line segments correspond to the 95% confidence interval (based on the sample of sample medians). The CI for the SAMO data lies entirely within that of the Telemetry data. This makes one believe that the two groups are not different.

Figure 3: Mean of the medians (circles) with their 95% confidence intervals. Red is Telemetry, black is SAMO

Analysis of Variance

For a more numerical approach, we can estimate the analysis of variance components. The model is

log(startup time) ~ version + src

(we ignore interaction). Since the data is on the order of billions of rows, I instead take 1000 samples of approximately 20,000 rows each (a sampling rate of 0.001%), compute the ANOVA for each, and then average the summary tables produced by R’s lm function. In other words, we make our conclusions based on the average over the 1000 samples of ~20,000 rows each. (I should point out that the residuals, as per a quick visual check, were roughly Gaussian and other diagnostics came out clean.)

The averaged ANOVA does not support a version effect or a source effect (at the 1% level). In other words, the log of start up time is affected neither by the version nor by the source (Telemetry/SAMO). For instance, the srctele coefficient of -0.023 on the log scale corresponds to a difference of roughly 2%, and it is far from significant.

               Estimate Std. Error     t value   Pr(>|t|)
(Intercept)  8.62635472 0.01171420 736.4390937 0.00000000
vers8       -0.05995627 0.01928947  -3.1089666 0.02922402
vers9       -0.03382135 0.10466330  -0.3247165 0.48286903
vers10      -0.03862282 0.29308642  -0.1418623 0.48228122
srctele     -0.02290538 0.03946150  -0.5811779 0.45300964

This is good news! Insofar as start up time is concerned, Telemetry is representative of SAMO.

A Different Approach and Some Checks

By now, the reader should note that we have answered our question (see last line of previous section). Two questions remain:

1. Are the samples representative? We are sampling on 3 dimensions: startup time, src and version. Consider the 1000 quantiles of startup time, the 2 levels of src and the 4 levels of version. All in all, we have 1000x2x4 or 8000 cells. Sampling from the population might result in several empty cells, so much so that the joint distribution of the sample might be very different from that of the population. To confirm that the cell distribution of the samples reflects the cell distribution of the population, we computed chi-square tests comparing the sample cell counts with those of the parent. All 1000 samples passed!

2. Why use samples? We can instead do a log-linear regression on the 8000 cell counts (i.e., using all 1.9 BN data points). This of course loses a lot of power: we are binning the data, and all monotonic transformations are equivalent. The equivalent (using R’s formula language) of the ANOVA described above is

log(cell count) ~ src+ver+binned_startup:(src+ver)

If the effects of binned_startup:src and binned_startup:ver are not significant, this corresponds to our conclusion in the previous section. And nicely enough, they are not! The output of summary(aov(glm(…))) is

summary(aov(glmout <- glm(n~ver+src+sesscut:(ver+src)
                          , family=poisson
                          , data=cells3.parent)))
              Df     Sum Sq    Mean Sq   F value Pr(>F)
ver            3 4.6465e+14 1.5488e+14 1131.8666 <2e-16 ***
src            1 3.2705e+14 3.2705e+14 2390.0704 <2e-16 ***
ver:sesscut 3952 5.4969e+13 1.3909e+10    0.1016      1
src:sesscut  988 2.0009e+13 2.0252e+10    0.1480      1
Residuals   2967 4.0600e+14 1.3684e+11

Some R Code and Data Sizes:

1. The data for SAMO was obtained from Hive, sent to a text file and then imported to blocked R data frames using RHIPE. All subsequent analysis was done using RHIPE.

2. The data for Telemetry was obtained from HBase using Pig (RHIPE can read HBase, but I couldn’t install it on this particular cluster). The text data was then imported as blocked R data frames and placed in the same directory as the imported SAMO data.

3. Data sizes were in the few hundreds of gigabytes. All computations were done using RHIPE (R not on the nodes) on a 350TB/33-node Hadoop cluster.

4. I include some sample code to give a flavor of RHIPE.

Importing text data as Data Frames

map         <- expression({
  # each input line is a \001-delimited record from the SAMO logs
  ln        <- strsplit(unlist(map.values), "\001")
  a         <- do.call("rbind", ln)
  addonping <- data.frame(ds = a[,1])   # remaining columns elided in the original
  rhcollect(map.keys[[1]], addonping)   # emit the parsed data frame
})
z <- rhmr(map = map)                    # input/output folders elided in the original

Creating Random Samples

map         <- expression({
  y         <- do.call('rbind', map.values)
  p         <- 20000/1923725302          # target sample size / total number of rows
  for(i in 1:1000){                      # draw 1000 independent ~20,000-row samples
    zz      <- runif(nrow(y)) < p
    mu      <- y[zz, , drop = FALSE]
    rhcollect(i, mu)                     # key each piece by its sample number
  }
})
reduce      <- expression(
    pre    = { x <- NULL }
    ,reduce = { x <- rbind(x, do.call('rbind', reduce.values)) }
    ,post   = { rhcollect(reduce.key, x) }
)
z <- rhmr(map = map, reduce = reduce)    # input/output folders elided in the original

Run Models Across Samples

map        <- expression({
  cuts     <- unserialize(charToRaw(Sys.getenv("mcuts")))
  lapply(map.values, function(y){
    y$tval <- sapply(y$sesssionrestored
                     ,function(r) {
                       if(is.na(r)) return(r)
                       as.numeric(r)     # non-NA handling elided in the original
                     })
    mdl    <- lm(log(tval) ~ vers + src, data = y)
    rhcollect(NULL, summary(mdl))
  })
})
z <- rhmr(map = map)                     # input/output folders elided in the original

Computing Cell Counts For A Log Linear Model

# cut points for binning startup time (wtd.quantile is from the Hmisc package);
# the probs argument was truncated in the original -- 1000 quantiles per the text
cuts2                <- wtd.quantile(tms$x, tms$n, probs = seq(0, 1, by = 0.001))
cuts2[1]             <- cuts[1]
cuts2[length(cuts2)] <- cuts[2]
map.count <- expression({
  cuts       <- unserialize(charToRaw(Sys.getenv("mcuts")))
  z          <- do.call(rbind, map.values)
  z$tval     <- sapply(z$sesssionrestored, function(r) as.numeric(r))  # NA handling elided
  z$sessCuts <- cut(z$tval, cuts)        # bin startup times at the cut points
  f          <- split(z, list(z$vers, z$sessCuts, z$src), drop = FALSE)
  for(i in seq_along(f)){
    y <- strsplit(names(f)[[i]], "\\.")[[1]]
    rhcollect(y, nrow(f[[i]]))           # emit (version, bin, src) cell counts
  }
})
z <- rhmr(map = map.count
          ,inout = c("seq", "seq")
          ,mapred = list(mcuts = rawToChar(serialize(cuts2, NULL, ascii = TRUE))))

Understanding DNT Adoption within Firefox



UPDATED 2011-09-08 11:55am PST: changed the description of how we store and retain IP address to be more accurate

On March 23rd, Mozilla launched its newest and most awesome browser: Firefox 4. Along with a plethora of features, including faster performance, better security and the whole nine yards, Firefox 4 included a cutting-edge privacy feature called Do Not Track (DNT). For the uninitiated, DNT simply tells sites “I don’t want to be tracked” via an HTTP header visible to all advertisers and publishers.
Mozilla’s new Privacy Blog has several posts on the feature, including a new one today releasing a Do Not Track Field Guide for developers. Based on our current numbers, we’ve been seeing for several weeks now just under 5% of our users with DNT turned on within Firefox.
The Mozilla team is all about experimenting. We love innovating new technologies that do good and benefit the community as a whole. As Firefox 4 kept breaking records (at its peak, 5,500 downloads per minute; source: http://blog.mozilla.org/blog/2011/03/25/the-first-48-hours-of-mozilla-firefox-4/), we felt that it would be important to understand whether people were enabling DNT. Every Metrics guy lives and dies by the data. In late 2010, the Metrics team gave a small talk on how we collect log data (click here for the video ppt). While that project has gone through multiple iterations over time, the basic premise is still the same:

  • Grab logs from multiple data-centers.
  • Split out anonymized and non-anonymized data into two separate files
  • Store both sets of files in HDFS
  • Create relevant partitions inside HIVE
  • Query the data.
  • Drool over the stats.

(Non-anonymized data such as IP address has a 6-month retention policy and is deleted on expiration)

We decided to follow the same approach for calculating DNT stats. Once every day, each Firefox instance pings the AUS servers with respect to its DNT status. The ping request looks something like this:

  • “DNT:-”: User has NOT set DNT
  • “DNT:1”: User HAS set DNT and does *not* wish to be tracked.

Armed with these data points, a simple HIVE query gives us DNT stats for a given day:

   SELECT ds, dnt_type, count(distinct ip_address) FROM web_logs WHERE (request_url LIKE '%Firefox/4.0%' OR request_url LIKE '%Firefox/5.0%' OR request_url LIKE '%Firefox/6.0%') AND dnt_type != 'DNT:1, 1' AND ds = '$dateTime' GROUP BY ds, dnt_type ORDER BY ds desc;

The above script is run on a nightly basis and the result is then plotted over a time graph, as included with this post.

One BIG caveat:

The DNT numbers are being undercounted, primarily because we use a hashed IP address as a proxy for counting a unique user. This means that while there can be multiple users with DNT set behind a given NAT, the counter is incremented only once. This may account for why our numbers are a bit lower than those being reported by other groups, including the recent study of 100 million Firefox users conducted by Krux Digital.

Possible Fix, NOT:

While it is possible to uniquely identify each browser instance, doing so would require that we start tracking users, thereby defeating the very purpose for which DNT was created in the first place.


Feel free to leave us a comment or email: (aphadke at_the_rate mozilla dot com – Anurag Phadke) for more information.

Do 90% of People Not Use CTRL+F?

Diyang Tang


According to an article in The Atlantic floating around the internet, 90% of users don’t know how to use CTRL+F or Command+F to search a webpage. We were surprised at that percentage. Fortunately, Mozilla has TestPilot studies with open data, and we can see if Firefox users behave similarly. One relevant 7-day TestPilot study of about 69,000 Windows users focused on Firefox’s user interface. Along with seeing how users interacted with the navigation bar, their bookmarks, etc., the study looked at how often people used keyboard shortcuts.

What we found is that about 81% of TestPilot users didn’t use CTRL+F during the course of the study. While 81% is lower than the 90% in the article, TestPilot users are usually more technologically experienced than the general population, since they are largely Firefox Beta users. When we look at TestPilot users who consider themselves beginners, the percentage goes up to 85%. Therefore, our 81% figure does not belie the Atlantic piece.

In addition, those who use CTRL+F use keyboard shortcuts on average twice as much as those who don’t, even when we ignore people who don’t use any keyboard shortcuts at all. This implies that people who use CTRL+F are more comfortable with keyboard shortcuts in general. The only keyboard shortcut in which CTRL+F users lag behind is Full Screen, or F11.

Feel free to take a look at the data yourself and let us know about any interesting trends you discover!

Text mining users’ definitions of browsing privacy

Rebecca Weiss


One issue that’s been on everyone’s mind lately is privacy.  Privacy is extremely important to us at Mozilla, but it isn’t exactly clear how Firefox users define privacy.  For example, what do Firefox users consider to be essential privacy issues?  What features of a browsing experience lead users to consider a browser to be more or less private?

To answer these questions, we asked users to give us their definitions of privacy, specifically privacy while browsing. The assumption was that users would have different definitions, but that there would be enough similarities between groups of responses that we could identify “themes” among them. By text mining user responses to an open-ended survey question asking for definitions of browsing privacy, we were able to identify themes directly from the users’ mouths:

  1. Regarding privacy issues, people know that tracking and browser history are different issues, validating the need for browser features that address these issues independently (“private browsing” and “do not track”).
  2. People’s definitions of personal information vary, but we can group people according to the different ways they refer to personal information (this leads to a natural follow-up question: what makes some information more personal than other information?).
  3. Previous focus group research, contracted by Mozilla, showed that users are aware that spam indicates a security risk, but what didn’t come out of the focus group research was that users also consider spam to be an invasion of their privacy (a follow-up question: what do users define as “spam?” Do they consider targeted ads to be spam?).
  4. There are users who don’t distinguish privacy and security from each other.

Some previous research on browsing and privacy

We knew from our own focus group research that users are concerned about viruses, theft of their personal information and passwords, that a website might misuse their information, that someone may track their online “footprint”, or that their browser history is visible to others. Users view things like targeted ads, spam, browser crashes, popups, and windows imploring them to install updates as security risks.

But it’s difficult to broadly generalize findings from focus groups. One group may or may not have the same concerns as the general population. The quality of the discussion moderator, or some unique combination of participants, the moderator, and/or the setting can also influence the findings you get from focus groups.

One way of validating the representativeness of focus group research is to use surveys.  But while surveys may increase the representativeness of your findings, they are not as flexible as focus groups.  You have to give survey respondents their answer options up front.  Therefore, by providing the options that a respondent can endorse, you are limiting their voice.

A typical way to approach this problem in surveys is to use open-ended survey questions. In the pre-data mining days, we would have to manually code each of these survey responses: a first pass of all responses to get an idea of respondent “themes” or “topics” and a second pass to code each response according to those themes. This approach is costly in terms of time and effort, plus it also suffers from the problem of reproducibility; unless themes are extremely obvious, different coders might not classify a response as part of the same theme. But with modern text mining methods, we can simulate this coding process much more quickly and reproducibly.

Text mining open-ended survey questions

Because text mining is growing in popularity primarily due to its computational feasibility, it’s important to review the methods in some detail. Text mining, as with any machine learning-based approach, isn’t magic. There are a number of caveats to make about the text mining approach used. First, the clustering algorithm I chose requires an arbitrary, a priori decision regarding the number of clusters. I looked at 4 to 8 clusters and decided that 6 provided the best trade-off between themes expressed and redundancy. Second, there is a random component to clustering, meaning that one clustering of the same set of data may not produce exactly the same results as another. Theoretically, there shouldn’t be tremendous differences between the themes expressed in one clustering and another, but it’s important to keep these details in mind.

The general idea of text mining is to assume that you can represent documents as “bags of words”, that bags of words can be represented or coded quantitatively, and that the quantitative representation of text can be projected into a multi-dimensional space. For example, I can represent survey respondents in two dimensions, where each point is a respondent’s answer.  Points that are tightly clustered together mean that these responses are theoretically very similar with respect to lexical content (e.g., commonality of words).

I also calculated a score that identifies the relative frequency of each word in a cluster, which is reflected in the size of the word on each cluster’s graph. In essence, the larger the word, the more it “defines” the cluster (i.e. its location and shape in the space).

Higher resolution .pdf files of these graphs can be found here and here.
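
A minimal sketch of this kind of pipeline in Python with scikit-learn follows. The toy responses, the TF-IDF weighting and the use of k-means are assumptions made for illustration; the post does not specify which preprocessing or clustering algorithm was actually used.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

responses = [
    "I don't want websites tracking my history",
    "keeping my personal information and passwords safe",
    "nobody can see which sites I visit",
    # ... the real input is the full set of open-ended survey responses
]

# Bag-of-words (TF-IDF) representation of each response.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(responses)

# Cluster the responses into k themes. k is an arbitrary, a priori choice
# (the post settled on 6); fixing random_state pins down the random
# component of the clustering so the run is reproducible.
k = 3  # small here only because the toy input is tiny
model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)

# Words that "define" each cluster: the terms with the largest weights in
# each cluster centre, akin to the per-cluster word scores described above.
terms = vectorizer.get_feature_names_out()
for i, centre in enumerate(model.cluster_centers_):
    top = centre.argsort()[::-1][:5]
    print(f"cluster {i}:", ", ".join(terms[j] for j in top))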

Cluster summaries

  • “Privacy and Personal information”: Clusters 1, 4, and 5 are dominated by, unsurprisingly, concerns about information. What’s interesting are the lower-level associations between the clusters and the words. The largest, densest cluster (cluster 4) deals mostly with access to personal information, whereas cluster 1 addresses personal information as it relates to identity issues (such as when banking). Cluster 5 is subtly different from both 1 and 4. The extra emphasis on “share” could imply that users have different expectations of privacy for personal information that they explicitly choose to leak onto the web as opposed to personal information that they aren’t aware they are expressing. One area of further investigation would be to seek out user definitions of personal information; what makes some information more “personal” than others?
  • “Privacy and Tracking”: Cluster 6 clearly shows that people consider being tracked a privacy issue. The lower-scored words indicate what kind of tracked information concerns them (e.g., keystrokes, cookies, site visits), but in general the notion of “tracking” is paramount to respondents in this cluster. Compare this with cluster 2, which is more strongly defined by the words “look” and “history.” This is obviously a reference to the role that browsing history has in defining privacy. It’s interesting that these clusters are so distinct from each other, because it implies that users are aware there is a difference between their browser history and other behaviors they exhibit that could be tracked. It’s also interesting that users who consider browser history a privacy issue also consider advertising and ads (presumably a reference to targeted ads) to be privacy issues as well. We can use this information to extend the focus group research on targeted ads; in addition to a security risk, some users also view targeted ads as an invasion of privacy. One interesting question naturally arises: do users differentiate between spam and targeted advertisements?
  • “Privacy and Security”: The most weakly defined group is cluster 3, which can be interpreted in many ways. The least controversial inference could be that these users simply don’t have a strong definition of privacy aside from a notion that privacy is related to identity and security. This validates a notion from our focus group research that some users really don’t differentiate between privacy and security.

Final thoughts

User privacy and browser security are very important to us at Mozilla, and developing a product that improves on both requires a deep and evolving understanding of what those words mean to people of all communities – our entire user population. In this post, we’ve shown how text mining can enhance our understanding of pre-existing focus group research and generate novel directions for further research. Moreover, we’ve also shown how it can provide insight into users’ perception by looking at the differences in the language they use to define a concept. In the next post, I’ll be using the same text mining approach to evaluate user definitions of security while browsing the web.