
I’m sure you’ve heard by now that Firefox 4 has officially been released. The Metrics team has done our part by working with webdev to release a new real-time download visualization:

World map visualizing real-time Firefox 4 downloads

http://glow.mozilla.org/

 

The basic backend flow is like this:

  1. The various load balancing clusters that host download.mozilla.org are configured to log download requests to a remote syslog server.
  2. The remote server runs rsyslog, with a config that filters those remote syslog events into a dedicated file that rolls over hourly.
  3. SQLStream is installed on that server and tails those log files as they appear.
  4. The SQLStream pipeline does the following for each request:
    1. filters out anything other than valid download requests
    2. uses MaxMind GeoIP to get a geographic location from the IP address
    3. uses a streaming group by to aggregate the number of downloads by product, location, and timestamp
    4. every 10 seconds, sends a stream of counter increments to HBase for the timestamp row, with the column qualifiers being each distinct location that had downloads in that time interval
  5. The glow backend is a Python app that pulls the data out of HBase using the Python Thrift interface and writes a file containing a JSON representation of the data every minute.
  6. That JSON file can be cached on the front end forever, since each minute of data has a distinct filename.
  7. The glow website pulls down that data and plays back the downloads, or lets you browse the geographic totals in the arc chart view.
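
For anyone who wants to see the shape of steps 4 through 6 in code, here is a minimal sketch in plain Python. It is not the production pipeline, which runs as SQLStream queries and writes to HBase. The request fields, the validity rule, the `geo_table` dict standing in for MaxMind GeoIP, the `product:timestamp` row key format, and the `downloads-<minute>.json` naming are all assumptions made up for illustration.

```python
import json
from collections import Counter

WINDOW_SECONDS = 10  # flush interval named in the post


def is_valid_download(request):
    # Hypothetical validity rule; the post does not say what counts as valid.
    return request.get("status") == 302 and bool(request.get("product"))


def aggregate(requests, geo_table):
    """Count valid downloads per (product, window start, location)."""
    counts = Counter()
    for req in requests:
        if not is_valid_download(req):
            continue
        # geo_table is a plain dict standing in for a MaxMind GeoIP lookup.
        location = geo_table.get(req["ip"], "unknown")
        window = req["ts"] - req["ts"] % WINDOW_SECONDS
        counts[(req["product"], window, location)] += 1
    return counts


def to_increments(counts):
    """One (row key, column qualifier, amount) triple per counter increment."""
    return [
        (f"{product}:{window}", location, amount)
        for (product, window, location), amount in sorted(counts.items())
    ]


def minute_filename(ts):
    # Each minute gets its own name, so a published file never changes.
    return f"downloads-{ts - ts % 60}.json"


def minute_payload(counts, minute_start):
    """JSON totals per location for the 60 seconds starting at minute_start."""
    totals = Counter()
    for (_product, window, location), amount in counts.items():
        if minute_start <= window < minute_start + 60:
            totals[location] += amount
    return json.dumps(dict(totals), sort_keys=True)
```

Feeding `aggregate` a list of request dicts with `ts`, `ip`, `product`, and `status` keys gives the per-window counts. `to_increments` turns those into the row and column layout described in step 4, and `minute_filename` shows why the front end can cache each file forever: a finished minute never maps to the same name as a later one.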

Some links for people interested in the code:

 

28 Responses to “How glow.mozilla.org gets its data”

  1. on 22 Mar 2011 at 11:18 am Skoua

    Impressive.

    How much time did it take you to build that?

  2. on 22 Mar 2011 at 11:21 am Renato Iwashima

    Great job guys! Congrats!

  3. on 22 Mar 2011 at 11:51 am deinspanjer

    The website parts were pretty quick because our webdevs rock. Some of the finer points of the backend took a long time, mostly just working out weird little kinks and getting systems to talk to each other.
    If we had a dedicated backend engineer, it could have been done in probably two or three weeks tops? Building on the framework we’ve done so far though, it would be possible to stand something like this up very quickly.

  4. […] Mozilla’s Daniel Einspanjer has also blogged about their new real-time download visualization …. The blog explains the overall architecture of the real-time application using SQLstream, and the SQLstream integration with HBase. […]

  5. on 22 Mar 2011 at 11:58 am John Patrick

    Great job guys – it’s strangely addictive watching the downloads spread across the globe like this in real time.

  6. on 22 Mar 2011 at 1:49 pm expert

    Interesting data visualization!
    Can’t wait for FF4 being pushed to public update.

  7. […] not coding) looking at Mozilla’s Glow visualization (Firefox 4 downloads), and reading about how it works.  I’m feeling particularly inspired by this and have a great idea for a project to give me […]

  8. on 22 Mar 2011 at 3:38 pm Eric Hauser

    Before choosing SQLStream, did you evaluate alternatives – open source or otherwise? I believe SQLStream is pretty pricey and you guys are very much an open source shop, so I’m interested as to why you chose it.

  9. on 22 Mar 2011 at 3:46 pm deinspanjer

    There are some decent alternatives and also many more ways we could roll our own now. Back in 2009, that wasn’t quite the case.
    We engaged them back then on the project of real time download visualization and it has continued to work well so far.
    I definitely agree that price and closed source are issues that keep hobbyists and small companies from being able to easily follow in our footsteps here and that doesn’t make me happy.
    I just ran across this very interesting github project that is related to this conversation: https://github.com/stagas/maptail

  10. on 22 Mar 2011 at 8:31 pm Marvin

    Will you release the source code of this tremendous website as Open Source?

  11. on 22 Mar 2011 at 8:35 pm Marvin

    Just a comment… looking now at the glow globe: Japan’s internet is working really well even though they had this earthquake. They are downloading a lot of Firefoxes :-)

  12. […] The Mozilla team has more details on their blog. Thanks to Alex Parvulescu for pointing that out. Possibly related posts: […]

  13. on 23 Mar 2011 at 6:00 am deinspanjer

    Check the content of this article again. It includes links to the open source repositories.

  14. on 23 Mar 2011 at 7:15 am Shox

    http://glow.mozilla.org/ works faster on Chrome than on FF4!!
    Try it, please? (Ubuntu 10.10 on an IBM ThinkPad)

  15. […] Read more at Mozilla’s Blog of Data […]

  16. on 23 Mar 2011 at 11:32 pm Vizualization | Home Gym Fan

    […] Blog of Data » Blog Archive » How glow.mozilla.org gets its data […]

  17. on 24 Mar 2011 at 12:19 am Sam Wei

    A cool visualization! I like it.

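  18. […] IE9 was downloaded about 2.35 million times in its first 24 hours. According to the download map provided by Mozilla, less than 24 hours after the release of Firefox 4 its downloads had already passed 5.64 million, more than twice IE9's figure. Firefox 3 once set a record of 8 million downloads in 24 hours, but that was the result of Mozilla's heavy promotion of "Download Day". The official Mozilla blog explains how the download map works: the load balancing server clusters that host download.mozilla.org are set up to send download request logs to a remote syslog server; that server uses SQLStream to filter out invalid download requests, uses MaxMind GeoIP to locate the IP addresses, and aggregates download counts, locations, and timestamps. […]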

  19. on 24 Mar 2011 at 2:59 am َArash

    Thanks for the good work,

    But not to show the download statistics from a country (in this case Iran) is not even politically correct. I believe it is completely against Mozilla Foundation spirit which is openness to everybody.

    Please reconsider your glow map statistics and add those 7 countries to your map. here is not the stupid politicians battle field.

    Cheers!

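  20. on 24 Mar 2011 at 4:37 am Implementation of the Firefox 4 download statistics system

    […] The official Mozilla blog explains how the download map works: the load balancing server clusters that host download.mozilla.org are set up to send download request logs to a remote syslog server; that server uses SQLStream to filter out invalid download requests, uses MaxMind GeoIP to locate the IP addresses, and aggregates download counts, locations, and timestamps. […]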

  21. […] – How glow.mozilla.org visualizes real time download of Firefox […]

  22. […] A technological breakdown is here: http://blog.mozilla.org/data/2011/03/22/how-glow-mozilla-org-gets-its-data/ […]

  23. […] quick explanation of how it is working can be seen at http://blog.mozilla.org/data/2011/03/22/how-glow-mozilla-org-gets-its-data/ Filed under: news, Uncategorized Leave a comment Comments (0) Trackbacks (0) ( subscribe to […]

  24. What are the different ways to build a real time download counter just like the one in mozilla’s download website (glow.mozilla.org ) using SQL-Stream or any other?…

    Blog about how the architecture works: http://blog.mozilla.org/data/2011/03/22/how-glow-mozilla-org-gets-its-data/ The code for glow: https://github.com/jbalogh/glow

  25. […]   English original: How glow.mozilla.org gets its data […]

  26. […]   English original: How glow.mozilla.org gets its data […]

  27. […] For the launch of Firefox 4, Firefox has put a very impressive visualization online. At http://glow.mozilla.org you can follow the downloads in real time. Each dot stands for one download, a bar chart at the bottom of the screen shows the downloads minute by minute, and anyone who wants the exact details can use the pie chart at the bottom left to break the data down from continents to city level. Now that is data visualization! Details on the technical implementation can be found here, by the way: How glow.mozilla.org gets its data […]

  28. […] The first version of Glow used several technologies including SQLStream and HBase to process download request logs in real time and make them available for the site to display. If you are interested in learning the technical details behind Glow read this article from the Mozilla Metrics team, which has links to the code repositories: http://blog.mozilla.org/data/2011/03/22/how-glow-mozilla-org-gets-its-data/ […]
