Introducing MetricsGraphics.js

Hamilton Ulmer


MetricsGraphics.js is an opinionated library that aims to take the mystery and complication out of presenting and visualizing data. It offers line charts, scatterplots, bar charts, histograms and data tables* and elevates the layout and explanation of these graphics to the same level of priority as the graphics. The emergent philosophy is one of efficiency and practicality. By following the standards embodied by the library, you will make beautiful, concise and impactful presentations and dashboards.

Today marks the release of v1.0—our first public release. We began building the library somewhat inadvertently earlier this year, during which time we found ourselves copy-and-pasting bits of code in various projects. Naturally, this led to errors and inconsistent features. We decided that it made sense to develop a single library that provides common functionality and aesthetics to all of our internal projects.

With that in mind, MetricsGraphics.js was born. The library encapsulates what we believe are best practices for effective data presentation, practices that have guided our development process. MetricsGraphics.js follows four principles:

  1. You only need a few types of graphics to tell most business stories.
  2. Usability and conciseness are absolutely necessary in making a successful data graphic. Users should be guided toward customization options that enhance the data presentation, not superfluous visual tweaks.
  3. Layout, annotation, and explanation are as important as graphing and should be given as high a priority as the data. Presentations should work across a wide variety of contexts and devices.
  4. Development of the library should follow real needs, not imagined ones.

MetricsGraphics.js focuses on a few types of graphics that we believe are important. We have deliberately left out a number of common graphic types because we feel that they aren’t particularly useful and can sometimes be liable to misinterpretation. Each of the graphic types that we offer comes with a wealth of useful options that cover a wide range of use-cases. Our API is very straightforward. Producing a simple line chart, for instance, is as simple as:

var data = [{'date': new Date('2014-10-01'), 'value': 434034},...]

data_graphic({
  target: 'div#line-plot',
  data:   data,
  title: "The first line plot.",
  description: "This is a line plot, for sure.", // appears in tooltip
  x_accessor: 'date',
  y_accessor: 'value'
})



Updating an existing graphic’s options or data is just as easy, seeing as the same function that’s used to create the data graphic is also used to update it. We don’t maintain state. 

We consider layout to be just as important as visualization. This is an aspect that many charting libraries leave out. The library ships with tooltips and textual descriptions, as well as a custom layout that you may wish to use in your own project. We believe that a good layout and a concise story are vital to acclimating customers to the data, and so the sample layout in the demo aims to do just that.

As a final point, we follow a ‘real needs’ approach to development. Right now, we have mostly implemented features that have been important to us internally. Having said that, our work is available on GitHub, as are many of our discussions, and we take any and all pull requests and issues seriously.

There is still a lot of work to be done. But this little library already has serious teeth. We invite you to try it out!

Check out the MetricsGraphics.js repository on GitHub.

*Data tables aren’t available just yet, but they are on the horizon.

Firefox Health Report

gfitzgerald


At Mozilla we believe that openness, innovation, and opportunity are key to the continued health of the Internet, and we are committed to building Web products and services that provide outstanding functionality and capability to the user. This post describes a new feature that we plan to release into Firefox called the Firefox Health Report (FHR), which will share Firefox product information with Mozilla and its users to provide a better browsing experience.

Better “Motoring” on the Open Web with Firefox Health Report

The modern car provides a good analogy for what we are planning to achieve with the Firefox Health Report. Earlier in its 100-year history, the car was seen as novel and exciting, opening new opportunities for individuals and society. However, the car was also often seen as capricious and sometimes dangerous, making users endure unwelcome anxieties. Today, cars have become a differentiated product maintaining all the positive promises of an earlier age, and much of the angst of earlier times has been diminished by improved reliability, increased safety measures, and better maintenance approaches. A key to this improved state is better data, efficiently used by the car manufacturer to deliver an excellent driving experience.

All cars today come with the capability of logging or recording critical information on the car’s on-board computer.  This information relates to the performance of critical sub-systems, the condition of key mechanical characteristics, and the occurrence of anomalous or dangerous events.  This information is used by on-board control systems to advise the driver of potential problems or areas for improvement (e.g., check engine light) and used by service personnel to diagnose problems and determine repair actions (i.e., clearing service codes).  In more sophisticated instances the information can be used to improve or programmatically optimize the vehicle by providing field upgrades for the on-board control software.

Moreover, the information collected from a vast number of cars is invaluable to each manufacturer in optimizing their products.  The information supports better maintenance and warranty programs and facilitates product recalls, if necessary.  The information gives comprehensive insights into the empirical driving conditions encountered by their drivers.  All the aggregated evidence from the field operation of cars is consumed in the design process and impacts positively the quality and functionality of future products.

Most importantly, these on-board data systems serve the in-car needs of drivers and riders.  Better summary instrumentation delivers maximum relevant data for a fixed budget of driver attention – car cockpits exhibit marvelous economies of design.  Under the hood, automatic closed loop systems on the power-train perform constrained optimization – for safety, fuel efficiency, performance, or other composite driving styles that the driver can select – economy, sport, off-road, etc…

Why Your Browser is Like Your Car

Today, Mozilla’s ability to deliver excellence to our Firefox users is quite limited.

Up to now, Mozilla has only counted Firefox installations and has had some very basic information allowing limited cross-tabulations of those installations, without any ability to assess longitudinal trends in these population characteristics. Metaphorically speaking, the standard of our product statistics is frozen in the 1940s or 1950s.

Modern evidence-based approaches to delivering a viable, let alone optimized, Firefox product demand more installation data, acquired in a carefully considered manner and with full disclosure of our motivations. We want to be transparent both about why we believe this data is existentially necessary for Firefox and about our explicit social contract with the community around data ownership and stewardship.

The Firefox Health Report

Our philosophy and mission set a very high standard for respecting user data and privacy (see Mitchell’s recent post). We are also committed to making our products not just good but excellent, providing the best user experience in a secure manner. This new feature will allow us to deliver an improved Firefox that better serves users, both individually and collectively. Our proposal is driven by the best of scientific and analytical intent and takes the greatest of pains to keep the amount of data collected down: data needs are set to the minimum necessary level. So let me explain what FHR will do.

FHR will collect data on the following aspects of the browser instance:

  • Configuration data – for example, device hardware, operating system, Firefox version
  • Customizations data – for example, add-ons (count and type)
  • Performance data – for example, timing of browser events, rendering, session restores
  • Wear and Tear data – for example, length of session, how old a profile is, count of crashes

The car analogy drives home the point that we are interested in the browser instance (the car) rather than the user (the driver).  In fact, the information recorded is a pooled blend of the characteristics of all browser instances of a given profile.  Needless to say, we – as in the auto case – have no interest in where the browser has been: search terms, keywords, and location are not collected.

The Firefox Health Report provides the following benefits:

  • User insights exhibited on-board the browser instance through visualizations and comparative graphics.
  • Product insights conveyed to Mozilla – the manufacturer or designer of the car – to help in improving existing browser instances and especially to more fully inform future design and development of Firefox.
  • The ability for Mozilla to streamline and reduce duplicate information it collects across other products, such as Telemetry.

The Firefox Health Report will land in the Nightly build soon. For more information about it please take a look at this FAQ, or ask questions about it here by posting a comment. We’ll provide further updates when FHR becomes available in Nightly.

How is the Time to Start Firefox affected by EARLY_GLUESTARTUP_READ_OPS?

Saptarshi Guha


We were asked to determine whether EARLY_GLUESTARTUP_READ_OPS (a Telemetry measure that counts read operations during early startup) affects the startup time for Firefox. The expectation is that when this count is 0, startup time is shorter.

The bug https://bugzilla.mozilla.org/show_bug.cgi?id=757215 describes this in more detail (and provides a place for readers to comment).

For our analysis on this, see http://people.mozilla.org/~sguha/757215.html .

 

Sampling Crash Volumes, Rates and Rarity for Socorro Samples

Saptarshi Guha


1 Introduction

The Socorro crash report accumulation pipeline does not process all the crash reports. Though every report is stored on disk, only 10% are processed and saved in HBase as JSON objects. Each crash report has a crash signature (Crash Report Signature, or CRS for short). The relationship between crash reports and CRSs is many-to-one: many reports can share the same signature.

Consumers of the crash reports (engineers working on bug fixes and product managers, to name a few) had concerns regarding the use of samples. For example, some asked whether 10% sampling is a viable rate for accurately estimating the frequencies of the CRSs, and if not all of them, then how accurate are the estimates for the top N most frequently observed signatures? With Firefox usage running into the hundreds of millions, we can expect new CRSs to arrive every day. Some are very rare (occurring for a small user base), others more frequent. How many days should we expect to wait until we have seen 50% of all the CRSs that come in (for a given version)?

To answer these questions, the #breakpad team processed every crash report for the week of 2011-03-22 to 2011-03-29, just after the Firefox 4 release. This served as a full enumeration of the crash report data. The full enumeration contained 2.2MM crash reports belonging to 84,760 CRSs.

Primarily, the crash-stats dashboard lists the top 100 most frequent crashes by OS. Some questions:

  • How accurate are the sample estimates? Does the top 100 from a sample equal the top 100 from the full enumeration (population), and are the proportion estimates accurate?
  • Given the estimates, can we say something about their accuracy? (A minimal sketch below the list illustrates the idea for a single signature.)
  • How many distinct crash types are there? Throttling is a random sample of incoming crash reports. If in a 10% sample we observe ‘N’ CRSs, can we estimate how many there are in the population, i.e., how many we haven’t seen? Estimating the number of unique CRSs is an entirely different problem from estimating their proportions.
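
To make the accuracy question concrete, here is a minimal R sketch of the uncertainty in a single signature's share when it is estimated from a 10% sample. The 0.1% share is hypothetical; the 2.2MM weekly total comes from the full enumeration described above.

n  <- 0.10 * 2.2e6                    # crash reports processed in a week under 10% throttling
p  <- 0.001                           # hypothetical signature making up 0.1% of all crashes
se <- sqrt(p * (1 - p) / n)           # binomial standard error of the sample proportion
round(c(estimate = p, lower = p - 1.96 * se, upper = p + 1.96 * se), 5)

For very rare signatures the interval is wide relative to the estimate itself, which is part of why the questions above were checked against the full enumeration.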

 

For more, read: http://people.mozilla.org/~sguha/species.crash.report.html

Some Results for Memshrink

Saptarshi Guha


As the website (https://wiki.mozilla.org/Performance/MemShrink) describes, MemShrink is a project to reduce Firefox memory consumption. A summary (taken from the webpage):

  • Speed. Firefox will be faster due to less cache pressure, less paging, and fewer/smaller GC and CC pauses. Changes that reduce memory consumption but make Firefox slower are not desirable.
  • Stability. Firefox will suffer fewer aborts/crashes due to virtual or physical memory exhaustion. The former is mostly a problem on 32-bit Windows builds with a 2GB or 4GB virtual memory limit, the latter is mostly a problem on mobile devices that lack swap space.

The engineers working on MemShrink asked the Metrics team to help discover and quantify which variables affect the measures related to MemShrink. Key among these is RESIDENT_MEMORY, the resident memory that Firefox occupies. For a given installation, multiple measurements are taken before the data is submitted. The data for a given installation is recorded as a histogram (so we don’t have serial correlations between observations), and the final value used in modeling is the weighted mean of that histogram.
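
As a minimal sketch of that last step (the bucket values and counts below are made up purely for illustration), the per-installation value fed into the model is just the weighted mean of the histogram:

buckets <- c(200e6, 400e6, 800e6, 1600e6)              # hypothetical RESIDENT_MEMORY bucket values (bytes)
counts  <- c(5, 12, 3, 1)                              # how often each bucket was observed for one installation
resident_memory <- weighted.mean(buckets, w = counts)  # the value used in the model for this installation
resident_memory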

 

Summary

More discussion and the full analysis can be found at http://people.mozilla.org/~sguha/memshrink.analysis.html. In short, there is a lot of variation in the data, and the variables used do a poor job of explaining it. That said,

  • Version 11 reduces memory consumption by about 2% over FF10 (on average; keep in mind there is a lot of variation).
  • Version 12 increases it by about 11% over v10 (see the distribution of the log of RESIDENT_MEMORY by version at the top of the linked analysis).
  • Presence of the Firebug extension causes an increase (of 12% on average), but the difference decreases with FF12.
  • Doubling the quantity (number of add-ons + 1) is associated with an increase in RESIDENT_MEMORY of approximately 33%.

For more details and plots see http://people.mozilla.org/~sguha/memshrink.analysis.html .

Comparing the Bias in Telemetry Data vs The Typical Firefox User

Saptarshi Guha


Telemetry is a feature in Firefox that captures performance metrics such as startup time and DNS latency, among others. The number of metrics captured is on the order of a couple hundred. The data is sent back to the Mozilla Bagheera servers, where it is analyzed by engineers.

The Telemetry feature asks Nightly/Aurora (pre-release) users if they would like to submit their anonymized performance data. This resulted in a response rate (the number of people who opted in divided by the number of people who were asked) of less than 3%. That led to two concerns: the small number of responses (which changed when Telemetry became part of the Firefox release channel), and, more importantly, representativeness: are the performance measurements collected from the 3% representative of those of the people who chose not to opt in?

Measuring the bias is not easy unless we have measurements about the users who did not opt in. Firefox sends the following pieces of information to the Mozilla servers: operating system, Firefox version, extension identifiers, and the time for the session to be restored. This is sent by all Firefox installations unless the distribution or the user has the feature turned off (this is called the services AMO ping). The Telemetry data contains the same pieces of information.

What this implies is that we have start up times for i) the users who opted in to Telemetry and ii) everyone. We can now answer the question “Are the startup times for the people who opted into Telemetry representative of the typical Firefox user?”

Note: ‘everyone’ is almost everyone. Very few have this feature turned off.

Data Collection

We collected startup times for Firefox 7, 8, and 9 for November 2011 from the log files of services.addons.mozilla.org (SAMO). We also took the same information for the same period from the Telemetry data contained in HBase (some code examples can be found at the end of the article).

Objective

Are startup times different by Firefox version and/or source, where the source can be SAMO or Telemetry?

Displays

Figure 1 is a boxplot of the log of startup time for Telemetry (tele) vs. SAMO (samo) by Firefox version. At first glance it appears that the startup times from Telemetry are smaller than those from SAMO, but the length of the bars makes it difficult to stand by this conclusion.

Figure 1: Boxplot of Log SessionRestored for Telemetry/SAMO by FF Version


Figure 2 shows the differences in the deciles of the log of startup time; since a difference of logs is the log of a ratio, these correspond, approximately speaking, to the deciles of the ratio of Telemetry startup time to SAMO startup time. The medians hover in the 0.8 region, though the bars are very wide and do not support the quick conclusion that Telemetry startup time is smaller.

Figure 2: Difference of Deciles of Logs


In Figure 3, we have the mean of the medians of 1000 samples: red circles are for Telemetry and black for SAMO. The ends of the line segments correspond to the sample 95% confidence interval (based on the sample of sample medians). The CI for the SAMO data lies entirely within that of the Telemetry data, which suggests that the two groups are not different.

Figure 3: Mean of the medians (circles) with their 95% confidence intervals. Red is Telemetry, black is SAMO


Analysis of Variance

For a more numerical approach, we can estimate the analysis of variance components. The model is

log(startup time) ~ version + src

(we ignore interaction). Since the data is on the order of billions of rows, I instead take 1000 samples of approximately 20,000 rows each (a sampling rate of about 0.001%), compute the ANOVA results for each, and then average the summary tables produced by the lm function in R. In other words, we base our conclusions on the average across the 1000 samples of ~20,000 rows each. (I should point out that the residuals, as per a quick visual check, were roughly Gaussian, and other diagnostics came out clean.)
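
A minimal sketch of that sample-and-average step (assuming `samples` is a list holding the 1000 sampled data frames, with the column names used in the RHIPE code later in the post, and that every sample contains all versions and both sources):

fits <- lapply(samples, function(d)
  coef(summary(lm(log(sesssionrestored) ~ vers + src, data = d))))
avg_table <- Reduce(`+`, fits) / length(fits)   # element-wise mean of the 1000 coefficient tables
avg_table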

The averaged ANOVA does not support a version effect or a source effect (at the 1% level). In other words, the log of startup time is affected neither by the version nor by the source (Telemetry/SAMO).

               Estimate Std. Error     t value   Pr(>|t|)
(Intercept)  8.62635472 0.01171420 736.4390937 0.00000000
vers8       -0.05995627 0.01928947  -3.1089666 0.02922402
vers9       -0.03382135 0.10466330  -0.3247165 0.48286903
vers10      -0.03862282 0.29308642  -0.1418623 0.48228122
srctele     -0.02290538 0.03946150  -0.5811779 0.45300964

This is good news! Insofar as startup time is concerned, Telemetry is representative of SAMO.

A Different Approach and Some Checks

By now, the reader should note that we have answered our question (see last line of previous section). Two questions remain:

1. Are the samples representative? We are sampling on 3 dimensions: startup time, src, and version. Consider the 1000 quantiles of startup time, the 2 levels of src, and the 4 levels of version. All in all, we have 1000x2x4, or 8000, cells. Sampling from the population might leave several cells empty, so much so that the joint distribution of a sample could be very different from that of the population. To confirm that the cell distribution of each sample reflects the cell distribution of the population, we computed chi-square tests comparing the sample cell counts with those of the parent (a sketch of this check appears at the end of this section). All 1000 samples passed!

2. Why use samples at all? We can do a log-linear regression on the 8000 cell counts (i.e., all 1.9 BN data points). This of course loses a lot of power: we are binning the data, and all monotonic transformations are equivalent. The equivalent of the ANOVA described above (using R’s formula language) is

log(cell count) ~ src+ver+binned_startup:(src+ver)

If the effects of binned_startup:src and binned_startup:ver are not significant, this corresponds to our conclusion in the previous section. And nicely enough, that is what we find! The output of summary(aov(glm(…))) is:

summary(aov(glmout <- glm(n~ver+src+sesscut:(ver+src)
                          , family=poisson
                          , data=cells3.parent)))
              Df     Sum Sq    Mean Sq   F value Pr(>F)
ver            3 4.6465e+14 1.5488e+14 1131.8666 <2e-16 ***
src            1 3.2705e+14 3.2705e+14 2390.0704 <2e-16 ***
ver:sesscut 3952 5.4969e+13 1.3909e+10    0.1016      1
src:sesscut  988 2.0009e+13 2.0252e+10    0.1480      1
Residuals   2967 4.0600e+14 1.3684e+11
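
For completeness, the goodness-of-fit check from point 1 can be sketched as follows; `pop` and `samp` are hypothetical vectors holding the 8000 population and sample cell counts (non-empty population cells assumed):

chisq.test(x = samp, p = pop / sum(pop))   # non-significant => the sample's cell distribution matches the parent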

Some R Code and Data Sizes:

1. The data for SAMO was obtained from Hive, sent to a text file and then imported to blocked R data frames using RHIPE. All subsequent analysis was done using RHIPE.

2. The data for Telemetry was obtained from HBase using Pig (RHIPE can read HBase, but I couldn’t install it on this particular cluster). The text data was then imported as blocked R data frames and placed in the same directory as the imported SAMO data.

3. Data sizes were in the few hundreds of gigabytes. All computations were done using RHIPE (R not on the nodes) on a 350 TB, 33-node Hadoop cluster.

4. I include some sample code to give a flavor of RHIPE.

Importing text data as Data Frames

map         <- expression({
  ln        <- strsplit(unlist(map.values),"\001")
  a         <- do.call("rbind",ln)
  addonping <- data.frame(ds=a[,1]
                         ,vers=a[,3]
                         ,sesssionrestored=as.numeric(a[,6])
                         ,src=rep("samo",length(a[,6]))
                         ,stringsAsFactors=FALSE)
  rhcollect(runif(1),addonping)
})
z <- rhmr(map=map
          ,ifolder="/user/sguha/somequants"
          ,ofolder="/user/sguha/teledf/samo"
          ,zips="/user/sguha/Rfolder.tar.gz"
          ,inout=c("text","seq")
          ,mapred=list(mapred.reduce.tasks=120
             ,rhipe_map_buff_size=5000))
rhstatus(rhex(z,async=TRUE),mon.sec=4)

Creating Random Samples

map         <- expression({
  y         <- do.call('rbind', map.values)
  p         <- 20000/1923725302
  for(i in 1:1000){
    zz      <- runif(nrow(y)) < p
    mu      <- y[zz,,drop=FALSE]
    if(nrow(mu)>0)
      rhcollect(i,mu)
  }
})
reduce      <- expression(
    pre={ x <- NULL}
    ,reduce = {
      x     <- rbind(x,do.call('rbind',reduce.values))
    }
    ,post={ rhcollect(reduce.key,x) }
    )
z <- rhmr(map=map,reduce=reduce
          ,ifolder="/user/sguha/teledfsubs/p*"
          ,ofolder="/user/sguha/televers/dfsample"
          ,inout=c('seq','seq')
          ,orderby='integer'
          ,partition=list(lims=1,type='integer')
          ,zips="/user/sguha/Rfolder.tar.gz"
          ,mapred=list(mapred.reduce.tasks=72
             ,rhipe_map_buff_size=20))
rhstatus(rhex(z,async=TRUE),mon.sec=5)

Run Models Across Samples

map        <- expression({
  cuts     <- unserialize(charToRaw(Sys.getenv("mcuts")))
  lapply(map.values, function(y){
    y$tval <- sapply(y$sesssionrestored
                     ,function(r) {
                       if(is.na(r)) return( r)
                       max(min(r,cuts[2]),cuts[1])
                     })
    mdl    <- lm(log(tval)~vers+src,data=y)
    rhcollect(NULL, summary(mdl))
  })})
z <- rhmr(map=map
          ,ifolder="/user/sguha/televers/dfsample/p*"
          ,ofolder="/user/sguha/televers2",
          ,zips="/user/sguha/Rfolder.tar.gz"
          ,inout=c("seq","seq")
          ,mapred=list(mapred.reduce.tasks=0))
rhstatus(rhex(z,async=TRUE),mon.sec=4)

Computing Cell Counts For A Log Linear Model

library(Hmisc)   # provides wtd.quantile; `tms` and `cuts` come from earlier steps (not shown here)
cuts2                <- wtd.quantile(tms$x,tms$n,
                                     p=seq(0,1,length=1000))
cuts2[1]             <- cuts[1]
cuts2[length(cuts2)] <- cuts[2]
map.count <- expression({
  cuts       <- unserialize(charToRaw(Sys.getenv("mcuts")))
  z          <- do.call(rbind,map.values)
  z$tval     <- sapply(z$sesssionrestored,function(r)
                  max(min(r,cuts[length(cuts)]),cuts[1]))
  z$sessCuts <-
    factor(findInterval(z$tval,
                        cuts),ordered=TRUE)
  f          <- split(z,list(z$vers,z$sessCuts,z$src),drop=FALSE)
  for(i in seq_along(f)){
    y <-strsplit(names(f)[[i]],"\\.")[[1]]
    rhcollect(y,nrow(f[[i]])) }
})
z <-
  rhmr(map=map.count,reduce=rhoptions()$templates$scalarsummer
       ,combiner=TRUE,
       ifolder="/user/sguha/teledfsubs/p*"
       ,ofolder="/user/sguha/telecells",
       ,zips="/user/sguha/Rfolder.tar.gz"
       ,inout=c("seq","seq") ,mapred=
       list(mapred.task.timeout=0
            ,rhipe_map_buff_size=40
            ,mcuts=rawToChar(serialize(cuts2, NULL,
                                ascii=TRUE))))

Understanding DNT Adoption within Firefox

aphadke


UPDATED 2011-09-08 11:55am PST: changed the description of how we store and retain IP address to be more accurate

On March 23rd, Mozilla launched its newest and most awesome browser: Firefox 4. Along with a plethora of features, including faster performance, better security and the whole nine yards, Firefox 4 included a cutting-edge privacy feature called Do Not Track (DNT). For the uninitiated, DNT simply tells sites “I don’t want to be tracked” via an HTTP header visible to all advertisers and publishers.
Mozilla’s new Privacy Blog has several posts on the feature, including a new one today releasing a Do Not Track Field Guide for developers. Based on our current numbers, we’ve been seeing just under 5% of our users with DNT turned on within Firefox for several weeks now.
The Mozilla team is all about experimenting. We love innovating new technologies that do good and benefit the community as a whole. As Firefox 4 kept breaking records (the peak was 5,500 downloads/minute; source: http://blog.mozilla.org/blog/2011/03/25/the-first-48-hours-of-mozilla-firefox-4/), we felt that it would be important to understand whether people were enabling DNT. Every Metrics guy lives and dies by the data. In late 2010, the Metrics team gave a small talk on how we collect log data (click here for the video ppt). While that project has gone through multiple iterations over time, the basic premise is still the same:

  • Grab logs from multiple data-centers.
  • Split out anonymized and non-anonymized data into two separate files
  • Store both sets of files in HDFS
  • Create relevant partitions inside HIVE
  • Query the data.
  • Drool over the stats.

(Non-anonymized data such as IP address has a 6-month retention policy and is deleted on expiration)

We decided to follow the same approach for calculating DNT stats. Once every day, each Firefox instance pings the AUS servers with respect to its DNT status. The ping request looks something like this:
"DNT:-"  User has NOT set DNT
"DNT:1"  User HAS set DNT and does *not* wish to be tracked.

Armed with the following data points, a simple HIVE query gives us DNT stats for a given day:

   SELECT ds, dnt_type, count(distinct ip_address)
   FROM web_logs
   WHERE (request_url LIKE '%Firefox/4.0%'
          OR request_url LIKE '%Firefox/5.0%'
          OR request_url LIKE '%Firefox/6.0%')
     AND dnt_type != 'DNT:1, 1'
     AND ds = '$dateTime'
   GROUP BY ds, dnt_type
   ORDER BY ds DESC;

The above script is run on a nightly basis and the result is then plotted over a time graph, as included with this post.

One BIG caveat:

The DNT numbers are being undercounted, primarily because we use hashed IP addresses as a proxy for counting unique users. This means that while there can be multiple users behind a given NAT with DNT set, the counter is incremented only once. This may account for why our numbers are a bit lower than those being reported by other groups, including the recent study of 100 million Firefox users conducted by Krux Digital.

Possible Fix, NOT:

While it is possible to uniquely identify each browser instance, doing so would require that we start tracking users, thereby defeating the exact purpose for which DNT was created in the first place.

 

Feel free to leave us a comment or email: (aphadke at_the_rate mozilla dot com – Anurag Phadke) for more information.

Do 90% of People Not Use CTRL+F?

Diyang Tang


According to an article in The Atlantic floating around the internet, 90% of users don’t know how to use CTRL+F or Command+F to search a webpage. We were surprised at that percentage. Fortunately, Mozilla has TestPilot studies with open data, and we can see if Firefox users behave similarly. One relevant 7-day TestPilot study of about 69,000 Windows users focused on Firefox’s user interface. Along with seeing how users interacted with the navigation bar, their bookmarks, etc., the study looked at how often people used keyboard shortcuts.

What we found is that about 81% of TestPilot users didn’t use CTRL+F during the course of the study. While 81% is lower than the 90% in the article, TestPilot users are usually more technologically experienced than the general population, since they are largely Firefox Beta users. When we look at TestPilot users who consider themselves beginners, the percentage goes up to 85%. Therefore, our 81% figure does not belie the Atlantic piece.

In addition, those who use CTRL+F on average use keyboard shortcuts twice as much as those who don’t, even when we ignore those people who don’t use any keyboard shortcuts at all. This implies that people who use CTRL+F are more comfortable with keyboard shortcuts in general. The only keyboard shortcut the users who use CTRL+F lag behind in is Full Screen, or F11.

Feel free to take a look at the data yourself and let us know about any interesting trends you discover!

Text mining users’ definitions of browsing privacy

Rebecca Weiss


One issue that’s been on everyone’s mind lately is privacy.  Privacy is extremely important to us at Mozilla, but it isn’t exactly clear how Firefox users define privacy.  For example, what do Firefox users consider to be essential privacy issues?  What features of a browsing experience lead users to consider a browser to be more or less private?

To answer these questions, we asked users to give us their definitions of privacy, specifically privacy while browsing.  The assumption was that users would have different definitions, but that there would be enough similarities between groups of responses for us to identify “themes” among them. By text mining user responses to an open-ended survey question asking for definitions of browsing privacy, we were able to identify themes directly from the users’ mouths:

  1. Regarding privacy issues, people know that tracking and browser history are different issues, validating the need for browser features that address these issues independently (“private browsing” and “do not track”).
  2. People’s definitions of personal information vary, but we can group people according to the different ways they refer to personal information (this leads to a natural follow-up question: what makes some information more personal than other information?).
  3. Previous focus group research, contracted by Mozilla, showed that users are aware that spam indicates a security risk; what didn’t come out of the focus group research was that users also consider spam to be an invasion of their privacy (a follow-up question: what do users define as “spam”? Do they consider targeted ads to be spam?).
  4. There are users who don’t distinguish privacy and security from each other.

Some previous research on browsing and privacy

We knew from our own focus group research that users are concerned about viruses, theft of their personal information and passwords, that a website might misuse their information, that someone may track their online “footprint”, or that their browser history is visible to others.  Users view things like targeted ads, spam, browser crashes, popups, and windows imploring them to install updates as security risks.

But it’s difficult to broadly generalize findings from focus groups.  One group may or may not have the same concerns as the general population.  The quality of the discussion moderator, or some unique combination of participants,  the moderator, and/or the setting can also influence the findings you get from focus groups.

One way of validating the representativeness of focus group research is to use surveys.  But while surveys may increase the representativeness of your findings, they are not as flexible as focus groups.  You have to give survey respondents their answer options up front.  Therefore, by providing the options that a respondent can endorse, you are limiting their voice.

A typical way to approach this problem in surveys is to use open-ended survey questions.  In the pre-data-mining days, we would have to manually code each of these survey responses: a first pass of all responses to get an idea of respondent “themes” or “topics” and a second pass to code each response according to those themes.  This approach is costly in terms of time and effort, plus it also suffers from the problem of reproducibility; unless themes are extremely obvious, different coders might not classify a response as part of the same theme.  But with modern text mining methods, we can simulate this coding process much more quickly and reproducibly.

Text mining open-ended survey questions

Because text mining is growing in popularity primarily due to its computational feasibility, it’s important to review the methods in some detail.  Text mining, as with any machine-learning-based approach, isn’t magic.  There are a number of caveats to make about the text mining approach used. First, the clustering algorithm I chose to use requires an arbitrary and a priori decision regarding the number of clusters.  I looked at 4 to 8 clusters and decided that 6 provided the best trade-off between themes expressed and redundancy.  Second, there is a random component to clustering, meaning that one clustering of the same set of data may not produce the exact same results as another clustering. Theoretically, there shouldn’t be tremendous differences between the themes expressed in one clustering over another, but it’s important to keep these details in mind.

The general idea of text mining is to assume that you can represent documents as “bags of words”, that bags of words can be represented or coded quantitatively, and that the quantitative representation of text can be projected into a multi-dimensional space. For example, I can represent survey respondents in two dimensions, where each point is a respondent’s answer.  Points that are tightly clustered together mean that these responses are theoretically very similar with respect to lexical content (e.g., commonality of words).

I also calculated a score that identifies the relative frequency of each word in a cluster, which is reflected in the size of the word on each cluster’s graph.  In essence, the larger the word, the more it “defines” the cluster (i.e., its location and shape in the space).
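
As a rough illustration of this pipeline (not the exact preprocessing or algorithm used for this study; the `responses` vector below is made up, and k-means stands in for whichever clustering method was actually chosen):

library(tm)                                   # corpus handling and document-term matrices
responses <- c("no one tracking my browsing history",
               "my personal information stays private",
               "sites cannot see what I do online",
               "no spam or targeted ads following me",
               "my passwords and banking details are secure",
               "nobody can look at my history",
               "control over what personal data I share",
               "being safe from viruses and identity theft")
corpus <- VCorpus(VectorSource(responses))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("en"))
dtm <- DocumentTermMatrix(corpus, control = list(weighting = weightTfIdf))
m   <- as.matrix(dtm)                         # each row is one respondent's "bag of words"
set.seed(1)
km  <- kmeans(m, centers = 6)                 # 6 clusters, mirroring the choice described above
for (k in 1:6) {                              # words with the highest average weight "define" each cluster
  top <- sort(colMeans(m[km$cluster == k, , drop = FALSE]), decreasing = TRUE)
  cat("cluster", k, ":", head(names(top), 5), "\n")
}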

Higher resolution .pdf files of these graphs can be found here and here.

Cluster summaries

  • “Privacy and Personal information”: Clusters 1, 4, and 5 are dominated, unsurprisingly, by concerns about information.  What’s interesting are the lower-level associations between the clusters and the words.  The largest, densest cluster (cluster 4) deals mostly with access to personal information, whereas cluster 1 addresses personal information as it relates to identity issues (such as when banking).  Cluster 5 is subtly different from both 1 and 4.  The extra emphasis on “share” could imply that users have different expectations of privacy for personal information that they explicitly choose to leak onto the web as opposed to personal information that they aren’t aware they are expressing.  One area of further investigation would be to seek out user definitions of personal information; what makes some information more “personal” than other information?
  • “Privacy and Tracking”: Cluster 6 clearly shows that people see being tracked as a privacy issue.  The lower-scored words indicate what kind of tracked information concerns them (e.g., keystrokes, cookies, site visits), but in general the notion of “tracking” is paramount to respondents in this cluster.  Compare this with cluster 2, which is more strongly defined by the words “look” and “history.”  This is obviously a reference to the role that browsing history plays in defining privacy.  It’s interesting that these clusters are so distinct from each other, because it implies that users are aware there is a difference between their browser history and other behaviors they exhibit that could be tracked.  It’s also interesting that users who consider browser history a privacy issue also consider advertising and ads (presumably a reference to targeted ads) to be privacy issues as well.  We can use this information to extend the focus group research on targeted ads; in addition to a security risk, some users also view targeted ads as an invasion of privacy.  One interesting question naturally arises: do users differentiate between spam and targeted advertisements?
  • “Privacy and Security”: The most weakly defined group is cluster 3, which can be interpreted in many ways.  The least controversial inference could be that these users simply don’t have a strong definition of privacy aside from a notion that privacy is related to identity and security.  This validates a notion from our focus group research that some users really don’t differentiate between privacy and security.

Final thoughts

User privacy and browser security are very important to us at Mozilla, and developing a product that improves on both requires a deep and evolving understanding of what those words mean to people of all communities – our entire user population.  In this post, we’ve shown how text mining can enhance our understanding of pre-existing focus group research and generate novel directions for further research. Moreover, we’ve also shown how it can provide insight into users’ perceptions by looking at the differences in the language they use to define a concept.  In the next post, I’ll be using the same text mining approach to evaluate user definitions of security while browsing the web.

 

Test Pilot New Tab Study Results

Lilian Weng


[Cross-posted at Mozilla User Research]

The new tab page in Firefox is intentionally left blank, while some browsers present rich information on a newly opened tab.

The decision to leave new tab pages in Firefox blank was driven, in part, by a suspicion that too much information in the new tab may distract users from getting to the destination they opened the tab for. To test whether this suspicion is true and to learn more about user behavior after opening a new tab, Test Pilot recently released the New Tab Study and will soon release a multivariate test on the new tab page. Test Pilot is a platform for collecting structured user feedback through Firefox. It currently has about 3 million users, and all studies are opt-in. You can help us better understand how people use their web browser and the Internet, and so build better products, by participating in studies. The Test Pilot add-on is available here. The study ran for 5 days and, in all, we collected 256,282 valid submissions.
Results of the study show that on average each user daily:
  • opens 11 new blank tabs
  • loads 7 pages
  • visits 2 unique domains
  • visits 2 pages in a new tab before they leave or close it

Below are details on how users load a page in a new tab, their intentions when opening a new tab, and the time spent on new tabs.

How do users load a page in new tabs?

We detected 11 different methods of loading a web page in a blank tab. Actions in the URL bar include pressing ENTER on the keyboard, clicking the Go button on the right side of the bar, clicking a page suggestion in the dropdown menu, and pressing the ENTER key on a dropdown suggestion. Similarly, 4 actions can be performed in the search bar. Users can also load a previously saved page from the bookmark bar in the toolbar or from Bookmarks/History in the menu bar.

Note:

  • The URL bar is most used when navigating to new websites.
  • The Search bar is also popular. Users rarely use the search bar dropdown to look for old search terms.
  • The Bookmark toolbar is used more often than the bookmark menu button.
  • The History Menu button is seldom used.

We can also classify all the methods for loading web pages into either a keyboard-based or a mouse-based category. Generally speaking, users have a slight preference for the mouse.

 

Why do users open new tabs?

1.    Are they looking for a specific URL?

13.95% of new tabs (13,941,404) are opened while the text in the clipboard starts with “http” or “www”, which very likely indicates a URL string. The number is surprisingly high, although the clipboard contents may be left over from previous actions rather than copied in order to paste and load a specific URL.

2.    Users browse a limited set of domains, and only a small proportion of domains attract most visits

If we represent each user as a single point, where the x-axis is the number of pageloads and the y-axis is the number of unique domains visited, we get the following graph. The dashed line (the diagonal) is what would happen if users always visited a different domain for each page load. When users are not very active (pageloads around a few hundred or fewer), the number of unique domains grows roughly linearly. However, once users browse more, the number of distinct domains tends to stabilize and saturate.

Globally, we checked the visit frequencies of all domains and found that only 17.38% of domains (out of 461,133 unique domains in total) account for 80% of the total page loads (8,291,541 pageloads in total). This verifies the famous “20-80” law of long-tail phenomena.

On the individual level, we are interested in whether a single user’s browsing follows the 20-80 law. For each individual, the smallest set of domains accounting for 80% of their total page visits is defined as their “main domains”. A user conforms to the 20-80 law if the ratio of the number of their main domains to the number of distinct domains they visit is around 20% (a sketch of this calculation follows). According to the following figure, active users browse more web pages every day, but the number of primary sites they go to decreases proportionally. This suggests that when users visit more sites, they prefer to go to the same sites more frequently. The result lends some support to the idea of a speed-dial new tab page.
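
A minimal sketch of that per-user calculation; the function name and the toy visit counts are ours, purely for illustration:

main_domain_ratio <- function(visits, share = 0.80) {
  v      <- sort(visits, decreasing = TRUE)
  n_main <- which(cumsum(v) >= share * sum(v))[1]   # number of "main domains" covering 80% of loads
  n_main / length(v)                                # ~0.2 for a user who follows the 20-80 law
}
main_domain_ratio(c(news = 120, mail = 80, search = 60, blog = 10, shop = 5, misc = 3))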

 

Time Spent on New Tabs

According to the study results, users open, on average, 2 pages in a new tab before they leave or close it. They load the first web page 6 seconds (median) after they open a new tab, and stay on the tab for 1 minute (median) once they start browsing. The distributions of these two reaction timings display broad tails, and the mean values are much higher than the medians: users load the first web page 45 seconds (mean) after they open a new tab and stay on the tab for 7 minutes (mean) once they start browsing, since outliers and expected noise can move the mean a lot.
Meanwhile, how users open a new tab distinguishes two groups: mouse-based users and keyboard-based users. Tabs invoked via the “Plus Button” and “Double Click on TabBar” represent the mouse-based group, and tabs invoked via “Command+T” represent the keyboard-based group. It turns out that keyboard-based users act slightly faster than mouse-based ones, and they stay on the same new tab a bit longer.

This is a preliminary study for the redesign of the new tab page in Firefox. We detected patterns in how users use new tabs, including how they load a new page, the breadth of domains visited, and the timing of different actions. In the upcoming New Tab Multivariate Test, we will compare several designs of the new tab page, and more research questions will be answered, including whether too much information in the new tab distracts users from their original target.