Mozilla Stumbler 1.0 geolocation crowd-sourcing app now in Google Play Store

Hanno Schlichting

Mozilla Stumbler is an open-source wireless network scanner that collects GPS data for the Mozilla Location Service, our crowd-sourced location database. We have just released version 1.0 on the Google Play Store, making it easier for users to install Mozilla Stumbler and stay updated. If you prefer the alternative f-droid catalog, you can also get Mozilla Stumbler on f-droid.

As you move around, the app “stumbles” upon new Wi-Fi networks and cell towers. The Mozilla Location Service combines these wireless observations to provide geolocation services such as showing your location on a map and Find My Device for Firefox OS. The Stumbler’s own map view shows our database coverage (as blue clouds) on your Android device, so you can see which individual streets are yet to be stumbled. For more information about the service visit our project page.

How to use Mozilla Stumbler

  • Go to un-stumbled areas (not covered in blue) to fill in the coverage map
  • Try walking, biking, or driving new routes with Mozilla Stumbler every day.
  • Start with major streets and intersections, then explore side streets. But make sure you stay safe and away from shady places.
  • Your contributions will take a day to show up on the coverage map.
  • The app stops automatically when your battery is low.

Competitive contributors can enter an optional nickname in Stumbler’s settings and track their progress on our leaderboard. The weekly leaderboards are very active. New contributors, because they are often stumbling in new areas and reporting undiscovered wireless networks, have a good chance of ranking at the top of weekly leaderboards.
To conserve cellular data usage, the Stumbler’s map view only shows high resolution map tiles on Wi-Fi. To always show the high resolution maps, you can change the Stumbler’s Developer Settings.

Please report bugs on our GitHub bug tracker or the #geo IRC channel on Mozilla’s IRC server. Mozilla Stumbler is open-source software. The code is available on GitHub.

Firefox Accounts, Sync 1.5, and Self Hosting



Firefox Sync offers best-in-class security and privacy: Mozilla’s servers never see your raw password, and all of your Sync data is encrypted locally, before it leaves your computer. However, some users want to go one step further and host all of their Sync data themselves. The ability to self-host is central to Mozilla’s mission, as it ensures that the convenience of Sync doesn’t come with hidden costs like strict lock-in.

With Sync 1.1, self-hosting was as easy as setting up ownCloud, using one of several Python modules, or for the more intrepid users, running the same code that Mozilla used in production. OpenBSD even includes a Sync 1.1 server in its ports tree!

With Sync 1.5’s recent release, we’re back to square one: all of Mozilla’s code is open source, and people are already running their own Sync 1.5 stacks, but it will take a few months before easier-to-use, self-contained projects emerge.

Of course, if you’re currently using using a self-hosted Sync 1.1 server, Firefox will continue to work with it for several months while we sort out self-hosting of Sync 1.5.

The Relationship Between Sync and Firefox Accounts

One challenge with self-hosting Sync 1.5 is understanding the relationship between Sync 1.5 and Firefox Accounts (FxA). Indeed, this is a frequent topic in the #sync IRC channel.

The Firefox Accounts system serves two purposes:

  1. It is a single, common authentication system for Mozilla services like Sync, the Firefox Marketplace, and Firefox OS’s find-my-device feature.
  2. It stores a password-wrapped version of your Sync key, so that you (and only you) can recover your data, even if you lose all of your devices.

By keeping authentication in one well-defined place, and sharing that infrastructure between projects, we’re able to dramatically reduce complexity and maintenance costs associated with running many services. Sync can focus on Sync, while letting Firefox Accounts handle accounts.

Unfortunately, this happens to increase complexity if all you care about is self-hosting a single service, like Sync.

The Moving Parts

Right now, the Firefox Accounts system is mostly comprised of a database and three servers written in Node.JS:

  1. fxa-auth-server, which handles the Firefox Accounts backend and API, including provisioning identity certificates for users.
  2. fxa-content-server, which serves up static content, like the UI that you see when you visit about:accounts in Firefox.
  3. browserid-verifier, which makes it easy for services like Sync to verify identity certificates issued by the auth server.

Sync has a database and two servers written in Python:

  1. tokenserver, which handles incoming auth requests, issues long-lived bearer tokens, and directs the browser to an appropriate storage server.
  2. syncstorage, which accepts bearer tokens and allows the browser to store and retrieve encrypted Sync data.

Sync and FxA are decoupled, which leads to two interesting self-hosting options:

Method 1: Just the Data

It’s possible to host just the Sync components, while still relying on Mozilla’s hosted Firefox Accounts infrastructure for authentication.

With this method, you still store all of your encrypted Sync data on your own server, but you use Mozilla’s servers for authentication and for backing up a password-wrapped version of your Sync key.

Because Sync uses client-side (PBKDF2 + HKDF) key stretching, Mozilla never learns your plaintext password, and thus cannot unwrap your Sync key. Additional server-side stretching (scrypt + HKDF) further protects against brute-force attacks if the Firefox Accounts database is ever compromised.

Method 2: Full Stack

Alternatively, you could host both the Sync and Firefox Accounts components, completely removing any dependence on Mozilla for Sync.

Self-hosting is limited to browser-based services like Sync. Web sites, like the Firefox Marketplace, may require an additional account with the centralized Firefox Accounts server before allowing you to log in.

The Plan

Right now, we’re focusing on getting Method 1 working as well and as quickly as possible. Most of that work is happening in the syncserver repo, which is a Python project that combines both the tokenserver and syncstorage projects into a single package.

We also need to resolve Bug 1003877, which is currently preventing either method from working on Firefox for Android.

Lastly, we need to ensure that the work we do to facilitate self-hosted Sync is the right work. Namely, how do we go from “technically possible” to “reasonably easy” for advanced users? To get this right, we’ll need to draw on the experience of projects like ownCloud, which is at the forefront of making Sync 1.1 easy to self-host, as well as users who have self-hosted in the past or who are interested in self-hosting now.

If you want to roll up your sleeves and get involved, please join the sync-dev mailing list and introduce yourself in #sync on!

Firefox Sync’s New Security Model



Yesterday’s release of Firefox 29 features a brand new Firefox Sync experience that is much easier to use while maintaining the high standard of safety, security, and openness that you expect from Mozilla.

How does the new Sync differ from the old? Read on!

Sync 1.0

Since its 2010 debut in Firefox 4, Firefox Sync had been powered by a distinctive encryption system which didn’t use passwords: instead it created a unique secret key, which was used to encrypt and decrypt all your data. The only way to get at your data was to know this key. Even the Mozilla servers which held your data could not decrypt the contents.

You (almost) never saw this key, known as the “recovery key,” because the normal way to set up a new device was with a technique called “pairing.” When you set up a new device, you saw a single-use, 12-character “pairing code,” which you could then type into the other device. Through some crypto magic, the recovery key and everything else necessary to set up Sync was safely copied to the new device, ensuring that both devices knew the secret key and could talk securely about your bookmarks and other data.

Problems with Sync

In the last four years, we’ve seen many problems with this scheme. The greatest is that it didn’t do you much good if you only had one device: pairing is about pairs (or threes or fours). If you lost your only device, you probably also lost the only copy of your secret key, and without that key, there was no way to recover your Sync data.

Pairing presented other usability issues as well: you had to be near two devices when setting it up, and many people mistook the pairing code for some sort of computer-generated password that they would need to remember.

The New Firefox Sync

This year, the Services group introduced Firefox Accounts, which are based on a traditional email address and a password, just like the hundreds of other account systems you’re already familiar with.

The new Firefox Sync is the first service to use Firefox Accounts. The security goals remain the same: there is still a strong random secret key, and Mozilla’s servers cannot decrypt your data. However, instead of using pairing, a “wrapped” version of your secret key, protected by your password, is stored alongside your Firefox Account. This means you can recover all your data, even if you lose all your devices at the same time. Setting up a new device only requires typing your Firefox Account email and password into it.

This is a significant change from the previous Firefox Sync. The security of your data now depends upon your password. If your password is guessable, somebody else could connect to your account and decrypt your data. Of course, the best passwords are randomly generated.

Given the importance of your password, we’ve designed Firefox Accounts such that Mozilla’s services never see your password’s clear text. Instead, Firefox first strengthens the password through client-side stretching with PBKDF2, and then derives several purpose-specific keys via HKDF. Neither your password nor the derived “unwrapping” key are ever transmitted to Mozilla. You can read more about the protocol in its technical description on GitHub.

We hope you’ll agree that this is a step in the right direction for Sync. Try it out today and let us know about your experience in the comments!

Special thanks to Brian Warner for his contributions to this post.

Mozilla Location Service – The Next Wave

Hanno Schlichting

Today, we celebrate the first birthday of the Mozilla Location Service (MLS), our experiment in geolocation at Mozilla. Thanks to a lot of help from volunteers, MLS covers a large part of the Earth, proving that our work together can build a usable location service.

Starting from these foundations, we’re now working on building a production service that can provide location services to millions of web users. This is a good time to pause and look back on what we’re doing it, why we’re doing it, and how you can help.

First of all, what is a “location service”? A location service is how your device knows where it is relative to the things around it, like WiFi access points and cell towers. Having a location service is crucial when when you can’t get a GPS signal (like in some urban areas) or don’t have a GPS chip (like in a laptop). Right now, location services are closed, proprietary services, run by Google, Apple, Skyhook, and a few other companies.

There are a few reasons that Mozilla is developing a location service. Location in general has become a very important part of how we use the Internet, and the Mobile Internet is becoming an even more critical part of our lives. So it’s important for the world to have an open and trustworthy location service.

Having an open location service is also critical for some of our key initiatives such as:

  • The FirefoxOS initiative, which aims to bring user choice and the principles of the open web to mobile phones and the mobile ecosystem. MLS in conjunction with FirefoxOS has the potential to provide location services to the millions of users who are transitioning from a feature phone to a smartphone.
  • Providing location information to open-source operating systems.

However, as important as this project is, it can’t succeed without you. The MLS can’t locate phones if it doesn’t know where access points and towers are and we rely on the community to collect this data. We call the process of gathering data “stumbling”. When you stumble for Mozilla, it means you’re traveling around with a GPS device (like a GPS-enabled cell phone) and “listening” for WiFi and cellular signals. When a signal is found, it’s matched up with its GPS location and uploaded and stored in the location service database. Then whenever another device using the service sees that WiFi or cellular tower, the MLS server knows where it is (since it knows where the WiFi or cellular tower is).

You can help MLS get better in a few ways:

  • Stumble: If you’ve got an Android phone, you can install our Stumbler app and start collecting data. Check out the coverage map and see if you can brighten up some dark spots.
  • Contribute: Like all Mozilla software, the Stumbler is an open-source project, with many contributors around the world. Check it out on github and send us a pull request.
  • Discuss: We like to talk about geolocation, whether it’s about how to make MLS better, how to improve stumbling, or how to do cool stuff with location information. You can find us on our dedicated mailing list or come talk to us in our IRC room #geo on Mozilla’s IRC server.

This is an exciting time for geolocation in the open Web. Through the MLS, we’re making location services available to millions more devices, and with more transparency than ever. If you take a look at our roadmap for the year, we’ve got some other big ideas coming up, like an open, community-sourced IP-geo database.

Richard Barnes

Mozilla Location Service – First Anniversary

Hanno Schlichting

The Mozilla Location Service is an open service to provide geolocation lookups based on publicly observable radio signals. It complements geolocation lookups based on global navigation satellite systems like GPS. The project is both built as open-source software and relies on an open community crowd-sourcing approach to gathering signal data across the globe.

We publicly announced the project six month ago, but started work on it a year ago to this day.

In order to celebrate our tireless contributors, we created a little video to show how far we have come and what a small group of dedicated volunteers can accomplish (wait for October, around 16 seconds into the video):


We started with a small group of Bay Area and Vancouver Mozillians, early on attracting volunteers from Moscow and Athens. Over time a small number of especially generous contributors shared their pre-collected data with us in addition to using our own MozStumbler Android app.

The MozStumbler application attracted not only users, but over time an active number of contributors, improving the application and localizing it to 19 different languages. As a fun way to visualize the vibrant community, we created a small video using the gource code visualization tool:


And of course no community is complete without all the members engaging in discussions, sharing the news and some going above and beyond in creating alternative stumbler applications or integrating the service into operating system libraries.

If you have questions or want to engage with us, please contact us on our dedicated mailing list or come talk to us in our IRC room #geo on Mozilla’s IRC server.

Thanks to all past, present and future contributors. This project wouldn’t be possible without all of you!

Hanno Schlichting

P.S. We also generated bigger versions of the map progress in a 1024×768 resolution. You can download the webm or mp4 files.

Heka: Loading log files with Logstreamer

Rob Miller

Heka is a general purpose data processing tool, so it supports a variety of ways to get data into its processing pipeline. But, loading and parsing files from a filesystem is the primary use case for many users. We’ll talk about parsing in the future; this post is going to explore some of the challenges involved with loading.

At first blush it might not seem like there is much of a challenge. It’s a file on a disk somewhere, how hard can it be? You open a file handle and read in the data. Real world cases are rarely so simple, however. Log files don’t grow indefinitely. They’re usually subject to rotation and eventual deletion, and rotation schemes vary widely. Sometimes files are renamed with every rotation tick (e.g. `access.log`, `access.log.0`, `access.log.1`, etc.). Other times new files with new names are periodically created (e.g. `access-20140321.log`, `access-20140322.log`, `access-20140323.log`, etc).

Then there’s the issue of tracking the current location in a file. In some cases we’re loading up a significant backlog of historical log data. In others we’re tracking the tail of a file as it’s being generated in real time. Things get tricky if the process is stopped and restarted. Do we have to start our long import all over again, manually dealing with duplicates? Do we lose records that were generated while the process was down, however long that was? We’d rather be able to pick up where we left off. That’s not too hard in the single file case, but it gets complicated if the files may have rotated while the process was down.

Finally, sometimes different sets of files are actually of the same type. A single web server might be serving dozens of domains, each with its own set of access log files and error log files. All of the access log files use the same format, as do all of the error logs. We’d like to be able to express this elegantly, without having to copy and paste nearly identical configuration settings for each of the domains. We’d also like our log file loader to notice if a new domain is added, without the need to explicitly reconfigure and restart the loader every time.

With version 0.5, Heka introduces the LogstreamerInput to try and address these complexities. As the name implies, the LogstreamerInput’s basic logical unit isn’t a log file but a log stream. A log stream is a single linear data stream made up of one or more non-overlapping files with a clearly defined order. In our web server example, the full set of access log files for a single domain would be one log stream and the error log files would be another. Files for the other domains would be separate streams, though all access logs would be of the same type (ditto for all the error logs).

A single LogstreamerInput can manage and load many different log streams of a single type. You point a LogstreamerInput at the root of a directory tree and provide a regular expression that (relative to that root) matches the files that comprise the streams you’d like to track. The expression’s match groups are used to define a “differentiator” that distinguishes between the separate streams, and a “priority” that defines the ordering within the streams. You can also define one or more translation maps, which allow you to map from string values in your regex match groups to numeric values that specify the file order. Full details about how to set this up can be found in our documentation.

If this sounds like it might be a bit fiddly, well, it is. To simplify things, we’ve also included a standalone `heka-logstreamer` command line utility. You point this utility at your Heka configuration. It will extract any LogstreamerInput config settings and output all of the files that your config will match and the order in which they will be loaded. This will let you verify that your data will be processed correctly before you spin Heka up to start the real crunching.

When Heka is started, LogstreamerInput plugins that you’ve set up will scan their directories, looking for files that match the specified regular expressions and converting them to streams of data to be injected into the Heka pipeline. The folders will be periodically rescanned for file rotations and to see if there have been any new folders and/or files added. As data is being pulled in from the files, Logstreamer will keep track of how far it has advanced in each file, maintaining a ring buffer of the last 500 bytes read. The file location and a hash of the ring buffer contents will be flushed out to disk periodically (because crashes happen) and at shutdown, enabling seamless continuation when the Heka process restarts.

We’ve tested a wide variety of situations and are confident that the Logstreamer performs as expected in common scenarios. There will always be edge cases, of course. There are many fewer edge cases when your rotation scheme creates new files without renaming existing ones, so if losing even a single log line is unacceptable we recommend this approach. We’d love to have your help with testing. Try it out, find creative ways to break it, and let us know when you do.

Sane Concurrency with Go

Rob Miller


Glyph Lefkowitz wrote an enlightening post recently in which he expounds in some detail about the challenges of writing highly concurrent software. If you write software and you haven’t read it yet I recommend you do. It’s a very well written piece, full of wisdom that no modern software engineer should be without.

There are many tidbits to extract, but if I may be so bold as to provide a summary of its main point, it would be this: The combination of preemptive multitasking and shared state generally leads to unmanageable complexity, and should be avoided by those who would prefer to maintain some amount of their own sanity. Preemptive scheduling is fine for truly parallel activities, but explicit cooperative multitasking is much preferred when mutable state is being shared across multiple concurrent threads.

With cooperative multitasking, your code will still likely be complex, it just has a chance at staying manageably complex. When control transfer is explicit a reader of the code at least has some visible indication of where things might go off the rails. Without explicit markers every new statement is a potential landmine of “What if this operation isn’t atomic with the last?” The spaces between each command become endless voids of darkness from which frightening Heisenbugs arise.

For the last year-plus while working on Heka (a high performance data, log, and metrics processing engine) I’ve been mostly programming in Go. One of Go’s selling points is that there are some very useful concurrency primitives baked right into the language. But how does Go’s approach to concurrency fare when viewed through the lens of encouraging code that supports local reasoning?

Not very well, I’m afraid. Goroutines all have access to the same shared memory space, state is mutable by default, and Go’s scheduler makes no guarantees about when, exactly, context switching will occur. In a single core setting I’d say Go’s runtime falls into the “implicit coroutines” category, option 4 in Glyph’s list of oft-presented async programming patterns. When goroutines can be run in parallel across multiple cores, all bets are off.

Go may not protect you, but that doesn’t mean you can’t take steps to protect yourself. By using some of the primitives that Go provides it’s possible to write code that minimizes unexpected behavior related to preemptive scheduling. Consider the following Go port of Glyph’s example account transfer code (ignoring that floats aren’t actually a great choice for storing fixed point decimal values):

    func Transfer(amount float64, payer, payee *Account,
        server SomeServerType) error {

        if payer.Balance() < amount {
            return errors.New("Insufficient funds")
        log.Printf("%s has sufficient funds", payer)
        log.Printf("%s received payment", payee)
        log.Printf("%s made payment", payer)
        server.UpdateBalances(payer, payee) // Assume this is magic and always works.
        return nil

This is clearly unsafe to be called from multiple goroutines, because they might concurrently get the same result from the Balance calls, and then collectively ask for more than the balance available with the Withdraw calls. It’d be better if we made it so the dangerous part of this code can’t be executed from multiple goroutines. Here’s one way to accomplish that:

    type transfer struct {
        payer *Account
        payee *Account
        amount float64

    var xferChan = make(chan *transfer)
    var errChan = make(chan error)
    func init() {
        go transferLoop()

    func transferLoop() {
        for xfer := range xferChan {
            if xfer.payer.Balance < xfer.amount {
                errChan <- errors.New("Insufficient funds")
            log.Printf("%s has sufficient funds", xfer.payer)
            log.Printf("%s received payment", xfer.payee)
            log.Printf("%s made payment", xfer.payer)
            errChan <- nil

    func Transfer(amount float64, payer, payee *Account,
        server SomeServerType) error {

        xfer := &transfer{
            payer: payer,
            payee: payee,
            amount: amount,

        xferChan <- xfer
        err := <-errChan
        if err == nil  {
            server.UpdateBalances(payer, payee) // Still magic.
        return err

There’s quite a bit more code here, but we’ve eliminated the concurrency problem by implementing a trivial event loop. When the code is first executed, it spins up a goroutine running the loop. Transfer requests are passed into the loop over a channel created for that purpose. Results are returned back to outside of the loop via an error channel. Because the channels are unbuffered, they block, and no matter how many concurrent transfer requests come in via the Transfer function, they will be served serially by the single running event loop.

The code above is a little bit awkward, perhaps. A mutex would be a better choice for such a simple case, but I’m trying to demonstrate the technique of isolating state manipulation to a single goroutine. Even with the awkwardness, it performs more than well enough for most requirements, and it works, even with the simplest of Account struct implementations:

    type Account struct {
        balance float64

    func (a *Account) Balance() float64 {
        return a.balance

    func (a *Account) Deposit(amount float64) {
        log.Printf("depositing: %f", amount)
        a.balance += amount

    func (a *Account) Withdraw(amount float64) {
        log.Printf("withdrawing: %f", amount)
        a.balance -= amount

It seems silly that the Account implementation would be so naive, however. It might make more sense to have the Account struct itself provide some protection, by not allowing any withdrawals that are greater than the current balance. What if we changed the Withdraw function to the following?:

    func (a *Account) Withdraw(amount float64) {
        if amount > a.balance {
            log.Println("Insufficient funds")
        log.Printf("withdrawing: %f", amount)
        a.balance -= amount

Unfortunately, this code suffers from the same issue as our original Transfer implementation. Parallel execution or unlucky context switching means we might end up with a negative balance. Luckily, the internal event loop idea applies equally well here, perhaps even more neatly because an event loop goroutine can be nicely coupled with each individual Account struct instance. Here’s an example of what that might look like:

    type Account struct {
        balance float64
        deltaChan chan float64
        balanceChan chan float64
        errChan chan error
    func NewAccount(balance float64) (a *Account) {
        a = &Account{
            balance:     balance,
            deltaChan:   make(chan float64),
            balanceChan: make(chan float64),
            errChan:     make(chan error),

    func (a *Account) Balance() float64 {
        return <-a.balanceChan

    func (a *Account) Deposit(amount float64) error {
        a.deltaChan <- amount
        return <-a.errChan

    func (a *Account) Withdraw(amount float64) error {
        a.deltaChan <- -amount
        return <-a.errChan

    func (a *Account) applyDelta(amount float64) error {
        newBalance := a.balance + amount
        if newBalance < 0 {
            return errors.New("Insufficient funds")
        a.balance = newBalance
        return nil

    func (a *Account) run() {
        var delta float64
        for {
            select {
            case delta = <-a.deltaChan:
                a.errChan <- a.applyDelta(delta)
            case a.balanceChan <- a.balance:
                // Do nothing, we've accomplished our goal w/ the channel put.

The API is slightly different, with both the Deposit and Withdraw methods now returning errors. Rather than directly handling their requests, though, they put the account balance adjustment amount onto a deltaChan, which feeds into the event loop running in the run method. Similarly, the Balance method requests data from the event loop by blocking until it receives a value over the balanceChan.

The important point to note in the code above is that all direct access to and mutation of the struct’s internal data values is done *within* code that is triggered by the event loop. If the public API calls play nicely and only use the provided channels to interact with the data, then no matter how many concurrent calls are being made to any of the public methods, we know that only one of them is being handled at any given time. Our event loop code is much easier to reason about.

This pattern is central to Heka’s design. When Heka starts, it reads the configuration file and launches each plugin in its own goroutine. Data is fed into the plugins via channel, as are time ticks, shutdown notices, and other control signals. This encourages plugin authors to implement their functionality with an event loop-type structure like the example above.

Again, Go doesn’t protect you from yourself. It’s entirely possible to write a Heka plugin (or any struct) that is loose with its internal data management and subject to race conditions. But with a bit of care, and liberal application of Go’s race detector, you can write code that behaves predictably even in the face of preemptive scheduling.

Rob Miller

Heka v0.5 Released

Rob Miller


The last few months have been bustling here in Heka-land. Our community of users and contributors has been growing steadily, and real world use is paying off in feedback and pull requests. Those of us who have been working full time on Heka are taking turns instead working with Heka, deploying it and actually using it to solve operational problems. Our experiences have helped us to understand the rough edges, and to be excited about how useful Heka has already become. It’s also given us great ideas about improving usability, some of which we’ve already implemented.

It’s our pleasure to announce the release of Heka v0.5. This is a significant release, full of so many goodies that they won’t all fit into a single announcement. Expect to see additional posts on some of the new features soon. Here are some highlights (full details are available in the changelog):

  • The new LogstreamerInput allows you to specify any layout, ordering, and rotation scheme for your log files. It will read the files in order, keeping track of its location in the data stream, even through restarts and/or file rotations.
  • We’ve made major improvements to the Lua environment we expose for real time data processing and graphing. Our work building Lua filters for internal customers allowed us to start abstracting out some of the more common tasks that need to happen, and some of these we’ve put into modules available for use in every Heka SandboxFilter.
  • Among these Lua modules are our rsyslog and nginx decoder modules. Setting them up is easy: copy the log format configuration string from the other server’s config file and paste it into your Heka config file. Then wire your decoder up to a LogstreamerInput to read those server’s files from the filesystem (or one of UdpInput, TcpInput, AMQPInput, etc. to handle a stream over the network).
  • We’ve added the ProcessDirectoryInput, which manages a set of processes to be run at specified intervals, generating Heka messages from their output. You can add, remove, or change any of the data collectors without needing to restart or reconfigure Heka itself.
  • The Heka -> Heka TCP transport story is much improved. The TcpOutput now supports reconnecting with exponential back-off, along with queuing to disk to prevent data loss during connection drops. We’ve also added TLS support with full client cert authentication to the TcpInput; the TcpOutput; and the heka-flood, heka-sbmgr, and heka-sbmgrload command line tools.

Release packages for Linux and OSX are available on github. As always, we’d love to hear from you on the Heka mailing list, or in the #heka channel on, or (by popular demand) in the new #heka channel on

A Better Firefox Sync



We’re pleased to announce that the new version of Firefox Sync is available to test in Firefox Aurora. Current Firefox Sync users love the service, but have given us feedback that it wasn’t easy enough to setup or add devices, and in particular recover from a lost device. We listened to this feedback and built an easier way to safely synchronize data between the desktop and mobile versions of Firefox.

The new Firefox Sync makes it much easier to do initial setup and to add multiple devices. To test the new Firefox Sync you can simply enter an email address and choose a strong, memorable password in Firefox for Windows, Mac or Linux. Then you can easily add more computers or Android devices to sync.


How do I use the new Firefox Sync?
The new Firefox Sync feature is available in Firefox Aurora. For more details on how to test Firefox Sync, read this Sumo article.

Strong Security

We believe in trust through transparency, that’s why we’ve worked in the open to develop a strong security system around the new sync.

In simplifying the Firefox Sync set up and sign in flow, it was important not to compromise on the security of a user’s data. This release brings the same level of end to end encryption our current sync product employs, but is much easier to set up.

The key to improved convenience in the new Firefox Sync is data encryption based on a key that is derived from your password. This means the stronger your password is, the better your protection. We’ve taken this basic approach and hardened it in three ways:

First, client side key stretching is a technique that allows us to protect against man in the middle attacks, even when SSL credentials are compromised.

Second, end to end encryption means even if our servers are compromised, it is extremely difficult to access a user’s data.

And finally, public key cryptography and the BrowserID protocol allow for separation between authentication, authorization, and data storage servers – minimizing the number of servers that handle authentication material, and reducing our attack surface.

You can read a whole lot more about the security architecture of new sync in the technical documentation on github.


As with the previous version of Firefox sync, users still have the option to take their data with them and host their own sync service using the open source server-side software.

As we gain experience with this new security architecture, we’re eager to continue to find the best way to have both convenient data access and whole-system security.

What’s Next?
We’re currently in the process of preparing the new Firefox Sync feature to ship to our browser users. After that, we’ll integrate synchronization features into Firefox OS.

Finally, to help build out additional Firefox Sync features more easily, we’ve created a robust account system for Firefox users and for partners to build on our user relationships. We are excited to explore what new services we can build on top of this system, to bring new interesting features to Firefox users. We promise to keep the Mozilla mission goals about putting users first, and advancing the open Web, in all service work we and our partners do.

Mark Mayo and Cloud Services Team

Circus 0.10 released



Circus LogoI am very happy to announce the release of Circus 0.10.

Circus is a process & socket manager we’ve been developing at Cloud Services to supervise our different processes on our servers It is built using Python and ZeroMQ.

More information is available at

This new version is a major step forward for the project for two reasons. First, we’ve added Python 3 compatibility and second, we did a major refactoring of the core to make it fully asynchronous.

What’s exciting is that most of the work in this release has been done by contributors external to the Cloud Services team, making Circus what we intended at its inception: an open source tool that’s built by and for the Python & Mozilla communities at large.

Below are 3 interviews of Circus contributors. I have asked them the same set of questions.

Fabien Marty

Meet Fabien Marty from Météo France, the French national weather forecast company. He’s one of the main instigators of the core refactoring and explains us what he’s been doing to make it happen.

Tarek: Hello Fabien. Can you tell us who you are?

Fabien: I am Fabien Marty, I am 34 years old and I’ve been working at Météo France as a technical lead for 7 years.

Tarek: How did you end up using Circus?

Fabien: We’re working on a big internal project in my company, where we need to run and supervise a lot of processes (more than 80 per server) and make sure we control every aspect of the processes’ start and stop sequences.

Unlike a classical web stack, we’re working on a stack that receives a huge amount of data — satellite images, numerical modeling data, sensor data, etc. When a server is being stopped, we need to make sure we don’t lose any incoming data. That’s why the stop sequence can last for more than 5 minutes.

We built our own tool to deal with processes, called “launcher”, but it was a bit clunky. Instead of spending more time fixing it, we looked at existing open source projects and found Circus.

Tarek: How did it go with Circus at first?

Fabien: The first tries were very positive. The documentation was clear and detailed and we were able to install it and start using it right away, replacing our own solution.

Tarek: Did you have some issues?

Fabien: Yeah, we had some issues when we tested the stop sequence. As I’ve explained earlier, we have a very specific use case:

we don’t stop our processes with signals – but with flags we’re setting in a Redis server the stop sequence can last for over 5 minutes, because our processes might have to finish processing some data before shutting down we have to start and stop our processes quite often to avoid any memory issues.

Circus was not quite suited for this use case, but we kind of knew we’d hit this problem.

Tarek: What did you do?

Fabien: We started with Alex Marandon (one of the project developers) fixing a few bugs in Circus here and there and pushed them upstream. We also fixed some documentation bugs we found along the way.

Then we wrote a plugin to deal with reloading processes automatically if the process command line is changed in the configuration.

Finally, we worked on the stop sequence: since our processes can take up to 5 minutes to stop and since Circus’ core was not fully asynchronous, this would lock up the event loop and Circus would become unresponsive to any other command during that window.

After some serious design talks with the Cloud Services team, we started to work on a branch to fix this.

Tarek : Was it easy?

Fabien: Not at all ! We knew it was going to be a lot of work. But it was even bigger than we thought. Our first attempt failed: we tried to add a simple callback system hooked into the PyZMQ event loop but that made the code harder to read and understand.

That’s mainly because Circus has high level methods that are doing basic operations, so adding callbacks on those made the whole thing a callback hell.

For our second attempt, we decided to move to a pure Tornado event loop and use its coroutine decorator. That drastically simplified making Circus’ core asynchronous. Moreover, moving Circus code to Tornado coroutines was a no brainer for its PyZMQ compatibility. The library has its own bundled version of a Tornado eventloop but you can use a plain Tornado eventloop if you want, and everything stays compatible.

The bottom line is that we ended up with even simpler code!

Tarek : Are you happy with the result?

Fabien: Yes, very much. There are still a few rough edges,but it works well now and is way better than our initial custom tool. We spent more time than we initially planned but that was worth it – and we don’t regret that investment. :-)

Rémy Hubscher

Meet Rémy Hubscher from Novapost, a French Software company. Rémy is a long time Circus contributor

Tarek: Hello Rémy. Can you tell us who you are?

Rémy: Hey Tarek, I am Rémy HUBSCHER, 26 years old and I work for Novapost as an R&D engineer.

Tarek: How did you end up using Circus?

Rémy : We’re deploying our apps in clusters from 3 to 25 servers, in a private cloud on Amazon and Rackspace. We previously used Supervisor and Gunicorn to deploy them.

Our main motivation for using Circus is its decentralised behavior: every server has its own circus daemon and manages processes there. With a single circus-web dashboard it’s easy to manage them all; we can watch the socket, cpu and memory loads in realtime.

Circus is also easy to configure with Saltstack and it’s dead easy to add new processes and sockets in our stacks.

Tarek: How did it go with Circus at first?

Rémy: Great! Installing Circus in our environment was easy and the clear documentation helped us a lot there.

Tarek: Did you have some issues?

Rémy: Yes, we did have some but my colleague Boris Feld and I contributed fixes to the project that took care of them.

One issue that remains is the fact that it’s not possible to automatically close a socket when all the processes that use it are shut down. But this is going to be fixed soon.

Tarek : What did you contribute in Circus?

Rémy: We started contributing a year ago, by adding simple features like bash autocompletion and the shell in circusctl. We were also involved in brainstorming about the clustering feature, and added automatic UDP discovery of circus daemons.

We also organized a three-day hackaton on Circus in our office to work on the clustering, since we needed it.

Circus-web (the web dashboard) was initially built with Bottle, but we moved it to Tornado for simpler integration with PyZMQ

Tarek : Was it easy?

Rémy: Well, it’s still Python so we found solutions eventually. But it’s important to understand the overall architecture and design, and how Tornado’s async works. But, when there’s an issue, the services team is quite responsive on IRC

Tarek : Are you happy with the result?

Rémy: Very happy. We’ve been using Circus for three apps in production and we’re gradually moving everything else to it.

Tarek : What’s next?

Rémy: We’ll help in reviewing the pull requests on the project and answer any questions – and we will be organizing a new hackaton in early 2014, to tackle more clustering features.

Scott Maxwell

Meet Scott Maxwell, who made Circus python 3 compatible!

Tarek: Hello Scott. Can you tell us who you are?

Scott: My name is Scott Maxwell. I’m a veteran of the video game industry (since 1983) but, recently I have been working on the app store for one of the American car companies.

Tarek: How did you end up using Circus?

Scott: We are currently using Supervisord to run our servers. I used supervisor at Sony Online Entertainment and it worked fairly well for me. We are on Python 3.3 so I had to port supervisor to Py3 about a year ago.

Lately, our restart logic has gotten more complex, since we want to bounce uWSGI without losing any connections. We had to build more and more outside of Supervisor. Also, we started querying Supervisor through the RPC mechanism for monitoring purposes, and it started to crash periodically. The Supervisor team never accepted my changes so Supervisord is still limited to Py2 today and I cannot easily get any fixes they might make.

I discovered Circus through a post on the uWSGI site. When I saw that it was using ZeroMQ and that it supported signals and much richer restart hooks, it seemed like exactly what we were looking for.

Tarek: How did it go with Circus at first?

Scott: The functionality looked great, so we were very excited about the potential.

Tarek: Did you have some issues?

Scott: It was a bit of arough start because of our Py3 requirement. Porting a client/server application to Py3 is much harder because exceptions are caught in one process and sent to the other, losing much of the context. Also, right around the time I was finishing up, the big async upgrade dropped.

Once I got the Py3 port done, I realized that a few other features were missing. For instance, supervisor lets you specify the signal to use for stopping the process. Fortunately this was a very easy feature to add to circus.

Tarek: What did you do?

Scott: I just put my head down and got the initial port done. Once I had basic functionlity in place, I issued a pull request and got everything integrated. From there, the real collaboration began.

Since I was new to the project, I lacked deep understanding of how everything fit together. I spent many hours trying to resolve the resource warnings that the Py3 runtime exposed, without great success. But you, Tarek, were able to fix the majority of them very quickly. When I ran into trouble with the flapping plugin, Alex (@amarandon) jumped straight in. Rémy (@natim) was also very helpful.

Tarek: Was it easy?

Scott: The work was hard, but the collaboration was easy.

Tarek: Are you happy with the result?

Scott: Very happy so far. I expect to have my entire stack moved over to circus on one environment in the next few days. Then I will be able to fully judge the result.

Tarek: What’s next?

Scott: I think Circus is complete for my needs at this point. But it is very comforting to know that if I need a new feature that is general purpose enough, the Circus team will take my change in a timely manner. It gives me great confidence moving forward.

If you are interested in contributing to Circus – visit

Many thanks to Toby Elliott for all the English proofreading. :)

– Tarek, on behalf of the Circus team