
Archive for the 'Uncategorized' Category

I’m sure you’ve heard by now: Firefox 4 is officially released. The Metrics team has done our part by working with webdev to release a new real-time download visualization. The basic backend flow is like this: the various load balancing clusters that host download.mozilla.org are configured to log download requests to a remote syslog […]

The Background During my work as metrics liaison with the Firefox Input team, an exciting requirement came up: scalable online clustering of the millions of feedback items that the users of Firefox share with us. When designing a service on the Metrics team, besides functional requirements (accept text messages, produce clusters), we consider scalability […]

We recently had a situation where we needed to copy a lot of HBase data while migrating from our old datacenter to our new one. The old cluster was running Cloudera’s CDH2 with HBase 0.20.6 and the new one is running CDH3b3. Usually I would use Hadoop’s distcp utility for such a job. As it […]

A few months ago, at Hadoop World 2010, the Metrics team gave a talk on Flume + Hive integration and how we plan to integrate it with other projects. As we were nearing our production date, the BuildBot/TinderBox team came to us with an interesting, albeit pragmatic, requirement. “Flume + Hive really solves our needs, but we would ideally […]

As documented in THRIFT-601, sending random data to Thrift can cause it to leak memory. At Mozilla, we use a web load balancer to distribute traffic to our Thrift machines, and the default liveness check it uses is a simple TCP connect. We also had Nagios performing TCP connect checks on these nodes for general […]
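The mechanism behind THRIFT-601 is that Thrift’s framed transport reads the first four bytes of an incoming connection as a big-endian frame length, so non-Thrift traffic (a load balancer or Nagios probe, for instance) can make the server attempt a huge buffer allocation. A minimal sketch of that decoding step, using a hypothetical HTTP health-check payload for illustration:

```python
import struct

# A non-Thrift probe (here, an illustrative HTTP health check)
# arriving on a Thrift port.
probe = b"GET / HTTP/1.0\r\n\r\n"

# A framed Thrift transport interprets the first 4 bytes as a
# big-endian signed frame size.
frame_len = struct.unpack(">i", probe[:4])[0]

print(frame_len)  # "GET " decodes to 1195725856, i.e. a ~1.1 GB "frame"
```

A bare TCP connect-then-disconnect probe avoids sending such bytes, but any checker that writes arbitrary data down the socket can trip this allocation path.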

Introduction: Using A Riak Cluster for the Mozilla Test Pilot Project As part of integrating Test Pilot into the Firefox 4.0 beta, we needed a production-worthy back-end for storing the experiment results and performing analysis on them. As discussed in the previous blog post, Riak and Cassandra and Hbase, oh my!, we decided on Riak […]

Exponential growth, one of the few problems every organization loves, is usually alleviated by scaling out using clustered computing (Hadoop), CDNs, EC2, and a myriad of other solutions. While a lot of cycles are spent making sure each scaled-out machine contains the requisite libraries, the latest code deployments, matching configs, and the whole nine yards, very […]

Socorro + HBase = WIN

Socorro: Mozilla’s Crash Reporting System Laura just posted a fantastic article on the Mozilla WebDev blog talking about the past and future of Socorro. It covers all the points that I wanted to blog about here regarding what our integration of HBase brings to the table for Socorro. Please, if you haven’t yet read […]

Pentaho announced this morning that they were going to be adding some features to Pentaho Data Integration (Kettle) and to their BI suite to make it easy for people to use Kettle to retrieve, manipulate, and store data in Hadoop, and to integrate Hadoop communication into the reporting and analysis layer. They posted a nice […]

We are marching along in our integration of HBase with the Socorro Crash Stats project, but I wanted to take a minute away from that to talk about a separate project the Metrics team has also been involved with. Mozilla Labs Test Pilot is a project to run experiments on and analyze data from real-world Firefox […]
