about:profile – Analyzing Data in Firefox

The Prospector team has always been interested in analyzing the data stored locally in an instance of Firefox to experiment with improved interfaces that leverage that data. Previous prototypes both processed data and changed the interface in a single add-on, but the team sees a benefit in splitting the two apart: more people can get involved and brainstorm ideas for what can be done with each individual piece.

Focusing on the first piece, we would like to get your input on how the data in Firefox should be analyzed. There is a lot of existing data in Firefox such as titles of bookmarks, auto-completion for forms, times you’ve visited pages, cookies from sites, and much more. And even focusing on just one type of data, there are many ways to analyze it.

We’re releasing about:profile to get the conversation started. It only looks at the domains of pages you’ve visited and cross-references them with two packaged sources of data: ODP categories and Alexa siteinfo. All the analysis is done within the add-on and no data is sent out from Firefox, so you can take a look at about:profile even when offline.
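To make the lookup step concrete, here is a minimal sketch in Python of mapping visited URLs to categories by domain. It is not the add-on's actual code: the `ODP_CATEGORIES` table, its two entries, and the function names are hypothetical stand-ins for the packaged data, and the domain handling is deliberately simplistic (it only strips a leading `www.`).

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the packaged ODP data; the real add-on
# ships its own, much larger tables.
ODP_CATEGORIES = {
    "mozilla.org": "Computers/Software",
    "nytimes.com": "News/Newspapers",
}


def domain_of(url):
    """Return the host of a URL, lowercased and without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def categorize(urls, categories=ODP_CATEGORIES):
    """Map each URL to its category, or None if the domain is not packaged."""
    return {url: categories.get(domain_of(url)) for url in urls}
```

Everything here runs on local data only, which matches the point above that nothing needs to leave Firefox. Domains missing from the table simply map to `None`.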

Overall categorization and detailed/recent interests

As a proof of concept, the exact details of how we combine the data aren’t too interesting, and we don’t expect the visualized results above to be accurate for everyone. But briefly, about:profile shows your overall browsing interest based on the top-level ODP categories, your largest sub-categories given all the domains in your history, the largest increases of sub-category interest over the last few days, and estimated demographics for the domains visited.
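One way the three category views could be computed is sketched below in Python. This is an illustration under stated assumptions, not the add-on's method: the post does not say how an "increase" is measured, so the sketch assumes it is a sub-category's share of recent visits minus its share of all visits. The `summarize` function, its parameters, and the three-day window are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_level(category):
    """'News/Newspapers' -> 'News'."""
    return category.split("/", 1)[0]


def summarize(visits, now, recent_days=3):
    """Summarize (category, visit_time) pairs.

    Returns (overall, subcats, increases):
      overall   - visit counts per top-level category
      subcats   - visit counts per full sub-category
      increases - assumed metric: share of recent visits minus
                  share of all visits, per sub-category
    """
    cutoff = now - timedelta(days=recent_days)
    overall, subcats, recent = Counter(), Counter(), Counter()
    for category, when in visits:
        overall[top_level(category)] += 1
        subcats[category] += 1
        if when >= cutoff:
            recent[category] += 1

    total = sum(subcats.values())
    recent_total = sum(recent.values())
    increases = {}
    if total and recent_total:
        for category, count in subcats.items():
            increases[category] = recent[category] / recent_total - count / total
    return overall, subcats, increases
```

Sorting `increases` by value in descending order would surface the sub-categories whose interest has grown the most over the chosen window.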

We would like your input on other ways to analyze data in Firefox, but also keep in mind the goal is to improve Firefox while supporting Mozilla’s principles of openness and user privacy. You can install about:profile without restarting Firefox, and it’ll open a tab with a user profile based on your browsing history. Take a look at the visualization and click around to see if that triggers any ideas.

Next week, we’ll focus on the second piece of using the data to improve Firefox. For example, Firefox currently analyzes browsing patterns to make the AwesomeBar super-smart with predictive suggestions for where you want to go.

As always, you can check the source on GitHub, provide feedback, and submit issues or suggestions!

Ed Lee on behalf of the Prospector team

7 responses

  1. Anonymous wrote on :

    The pie charts do not load in about:profile.

  2. unekdoud wrote on :

    The estimated demographic data is spot-on for me. Creepy!

    I would suggest using the most popular pages from social networking sites (as well as YouTube channels, Wikipedia articles, etc.) as another source of data – although this might get outdated every few months.

  3. Matt wrote on :

    This is interesting. Is it possible to add an analysis of your productivity like http://www.rescuetime.com/? I like the idea, but I don’t want to transmit my browser history to some company and pay them for that.

    1. Edward Lee wrote on :

      What would you say is the type of analysis rescuetime is doing on what data? At a quick glance, it looks like one way to analyze the data in Firefox is to segment the browsing history into day-by-day chunks and look for patterns in the types of pages visited. This could reuse the ODP categorization to group the data while adding in a time, perhaps hourly, aspect.

      1. Matt wrote on :

        The time-based analysis is the interesting thing. See the reports at http://projecthamster.wordpress.com/screenshots/ for example. The aim is to measure working efficiency or distraction of some kind. It also focuses on self-help for internet addicts in the age of distractions, or maybe a web developer wants to automatically generate reports to bill his clients.

  4. Anonymous wrote on :

    The sites I visit the most aren’t among the estimated demographic data, so the results are a bit wonky. Is the relative difference between the amount of history entries that hit estimated demographic data and the amount of all history entries available anywhere?

    1. Edward Lee wrote on :

      We only packaged 500 sites for demographics and 5000 sites for categories for this proof of concept, so it’s somewhat expected that it might not cover everybody’s most visited sites. The included data likely has a US-audience bias given how we picked out the top 500/5000 sites. You can check the source to see what sites have been packaged: