Author Archives: James Socol

About James Socol

James Socol leads the Community Platforms group, which encompasses the Support, MDN, and Community Tools teams. He lives in New York, talks about deployment a lot, and sometimes takes pictures.

SUMO is Deploying Continuously

As of a couple of weeks ago, the Support Engineering team is deploying changes to SUMO continuously. This is the result of over a year of work by developers, sysadmins, QA, and more, and I couldn’t be prouder of everyone involved.

Continuous deployment is a complicated term, but at its core it means the engineers have the ability to update the software that runs SUMO at any time.

Not only do we have the ability, but we’re using it. Instead of pushing changes once per week, or when a feature was “complete,” we’re deploying changes as often as we can. That means bug fixes and new features will make it to our users and contributors much, much faster. The first week we pushed seven code updates in four days with zero downtime.

[Figure: performance graphs showing an example of a mid-day performance improvement.]

Beyond getting fixes, features, and improvements out to users faster, we have made a number of improvements on the road to getting here.

  • Simplified and automated our deployment process.
  • Automated test coverage of the client-side parts of the code.
  • Made our tests run as close to check-in time as currently possible.
  • Added the ability to dark launch and use feature flags to integrate continuously.
  • Started graphing hundreds of application metrics in real-time.
  • Made our staging servers consistent and up-to-date environments.
  • Improved our planning process for predictability.
  • And a lot more…

Those are all necessary for us to deploy continuously, but they are also great things to have by themselves. The road to get here has been long, but filled with positive changes along the way.
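To illustrate the feature-flag item in the list above, here is a minimal sketch in plain Python of how a flag with a percentage rollout might gate a dark-launched feature. It is not SUMO's actual implementation: the flag names, the `FLAGS` table, and the helper functions are all hypothetical and exist only for this example.

```python
import hashlib

# Hypothetical flag table: flag name -> percentage of users who get the feature.
# 0 means dark (code is deployed but nobody sees it), 100 means fully launched.
FLAGS = {
    "new-kb-editor": 0,
    "faster-search": 100,
    "new-profile-page": 25,
}


def bucket(flag_name, user_id):
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    key = f"{flag_name}:{user_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % 100


def flag_is_active(flag_name, user_id):
    """Return True if the flag is on for this user. Unknown flags are off."""
    percent = FLAGS.get(flag_name, 0)
    return bucket(flag_name, user_id) < percent


def render_profile(user_id):
    """Both code paths ship together; the flag decides which one runs."""
    if flag_is_active("new-profile-page", user_id):
        return "new profile page"
    return "old profile page"
```

Because the bucket is derived from a hash rather than a random number, a given user keeps seeing the same version between requests, and raising the percentage only ever adds users to the new code path.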

So what will be different? How will this affect our users and contributors?

Primarily, our users will have a better experience, as issues we discover won’t be around very long. More subtly, during this process we have eliminated downtime from all but the rarest and most complex of code changes, we’ve dramatically improved our confidence in the code we push, and we’re able to objectively monitor and improve in ways we haven’t been able to before.

Again, I can’t express enough how proud I am of everyone involved for reaching this point, and how happy I am that the transition was an utter non-event—just like our deploys are now.

We will continue to improve this process, and refine our planning around it. And we will share and export everything we’ve learned on the way, both to other web projects within Mozilla and to the web development and sysadmin communities at large.

SUMO 2.4 – The end of the tunnel

Two weeks ago, we released SUMO 2.4, completing a year-long project to replace the SUMO platform!

SUMO 2.4 moved the last bits of functionality into Kitsune, our Django-based platform. These include user features like login/logout, registration, and profiles.

This represents a significant milestone and success for the SUMO project, and is particularly meaningful to the development team. We’ve been working toward this since January 2010, and seeing it completed is an amazing feeling.

Over the past year we’ve progressively replaced pieces of our old platform with new code:

  • In May, we took our first step by transitioning to new Search Result Pages.
  • In July, we switched the Discussion Forums, and started authenticating users in both systems.
  • In August, we turned on the new Support Forum section.
  • In September, we added the new Army of Awesome, built very rapidly on the new platform.
  • Just recently, in November, we brought the new Knowledge Base, the largest and most complex part of SUMO, online.
  • And with 2.4, we’ve brought over the last piece, User Accounts.

This final step in the migration to Kitsune opens up a bunch of new doors for features and improvements. For example, user registration is much simpler now. We’re transitioning data internally to be more secure. The entire site is faster and puts a lot less load on our servers, meaning we can serve more traffic with the same hardware.

We are especially happy we were able to complete this transition before the Firefox 4 release. Being entirely on the new platform gives us more confidence in our ability to keep helping users even with traffic spikes from the release.

We devoted 2010 to investing in this new platform, designed specifically to make it easier for our awesome community to help 400 million Firefox users worldwide. In 2011, we’ll start seeing the payoff of that investment, for our developers, contributors, and users, and expect to see SUMO really take off!

SUMO 2.1: New Discussion Forums

On Wednesday this week, we migrated the discussion forums on support.mozilla.com to our new platform.

The forums we moved over are

We did not move the Firefox support forum: that is coming in our next major milestone. These discussion forums join search results as the second component of the site to move over to the new platform.

The new discussion forums are very similar to the old discussion forums. We tried to keep big changes to a minimum. If you see something that doesn’t seem like it’s working right, let us know in the comments here, or in the Contributors’ forum.

The Evolution of SUMO

(Guest post by James Socol of the webdev team. Comments or questions? Head on over to James’ original blog post and comment there!)

When I joined the SUMO team six months ago, the team was just starting a discussion of “where do we go from here?” SUMO was built on a CMS called TikiWiki, and had diverged pretty significantly from it in two years. (David Tenser wrote a more detailed history if you’re interested.)

After a few months of talking and testing—and a few changes of direction—we’ve decided that SUMO will follow our colleagues on AMO and move to a custom web application, built on Django, a development framework in Python.

Why are we committing to such a dramatic new direction? Three major reasons. Keep in mind that SUMO was built on TikiWiki 1.10, a little more than two years out of date.

Performance

TikiWiki is a very feature-rich application. An unfortunate trade-off for us is performance, especially on a site serving 16 million users every week. As our European users in particular know, SUMO can be unacceptably slow at times, especially when editing articles. Many of the changes we made to the platform—most of which were contributed back over the past few months—were to improve performance via tools like output caching, database replication, and just refactoring. When we evaluated the latest version of TikiWiki, we found that performance was around the same, on average.

In the new platform, we’ll be taking advantage of techniques now available, including query caching and template/fragment caching, and we expect to see dramatic performance improvements. With improvements to the security, database, and templating layers, among others, we’ll also avoid some of the performance pitfalls that TikiWiki fell into over the years.

But the biggest performance impact—I expect—will be moving from a general-purpose CMS to a dedicated web application, focused on providing the SUMO experience.
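To make the fragment caching idea concrete, here is a minimal sketch in plain Python. It is not Django's cache framework and not Kitsune code: the `FragmentCache` class, its parameters, and the sidebar example are hypothetical names chosen for illustration. The idea is simply that an expensive piece of a page is rendered once and reused until a timeout expires.

```python
import time


class FragmentCache:
    """Tiny in-memory cache for rendered page fragments (illustrative only)."""

    def __init__(self, timeout_seconds=300, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._store = {}     # key -> (expiry time, rendered HTML)

    def get_or_render(self, key, render):
        """Return the cached fragment for key, calling render() only on a miss."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        html = render()
        self._store[key] = (now + self._timeout, html)
        return html


def render_sidebar():
    # Stand-in for an expensive step, such as several database queries.
    return "<ul><li>Top articles</li></ul>"


cache = FragmentCache(timeout_seconds=60)
sidebar_html = cache.get_or_render("sidebar:en-US", render_sidebar)
```

A real deployment would more likely keep fragments in a shared store such as memcached so that every web server benefits from the same cached copy; the in-process dictionary here only keeps the sketch self-contained.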

Hackability

To work on SUMO, you have to overcome a steep learning curve. Components tend to be tightly-coupled, or grouped in unintuitive ways, and are not as extensible as we’d like. The lack of a comprehensive test suite leaves changes to important sections of code open to introducing regressions in otherwise unrelated, dependent areas. SUMO 1.x also fails to function without a relatively complete copy of its database, which makes it difficult for community members outside the company to contribute.

With the new platform, and some discipline from the team, our goal is to improve all of these and make it easier for someone to get started hacking on SUMO.

  • We’ll be striving to keep code loosely-coupled and extensible—including using existing or external libraries whenever possible, and turning our own contributions into external libraries where possible.
  • We’re adopting a test-driven development workflow to make our components safer to hack on and to lighten the load on our QA team by reducing regressions.
  • TDD and Django will make it easier to work without a copy of the database, using fixtures and migrations to minimize the dependence on real data.

The net effect of these decisions will be to lower the barrier to entry to SUMO development, and hopefully make useful code available to other projects. Wil Clouser listed more strengths of Django as a platform when the AMO team decided to switch.
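As a rough picture of what testing against fixtures instead of a full database copy can look like, here is a small sketch using Python's standard `unittest` module. The fixture data, the `unanswered` function, and the test names are hypothetical and are not taken from Kitsune; Django's own test tools offer richer fixture loading than this.

```python
import unittest

# Hypothetical fixture: a handful of rows standing in for real forum data.
FIXTURE_THREADS = [
    {"id": 1, "title": "Firefox crashes on startup", "replies": 3},
    {"id": 2, "title": "How do I import bookmarks?", "replies": 0},
    {"id": 3, "title": "Where did my toolbar go?", "replies": 1},
]


def unanswered(threads):
    """Return the threads that have no replies yet."""
    return [thread for thread in threads if thread["replies"] == 0]


class UnansweredThreadsTest(unittest.TestCase):
    def setUp(self):
        # Each test gets a fresh copy, so tests cannot affect one another.
        self.threads = [dict(thread) for thread in FIXTURE_THREADS]

    def test_only_threads_without_replies(self):
        result = unanswered(self.threads)
        self.assertEqual([thread["id"] for thread in result], [2])

    def test_empty_input(self):
        self.assertEqual(unanswered([]), [])
```

Saved in a file, the tests can be run with `python -m unittest`, and a new contributor needs nothing beyond the repository itself to do so.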

Strength in Numbers

By using the same platform as AMO, both teams will benefit from sharing code and resources. We’re already using the same template adapter, database router, caching layer, and HTML sanitizer. As open source developers often say, “given enough eyeballs, all bugs are shallow,” and by sharing code we get more eyes on it. We’ll benefit from insights the AMO team has gleaned by starting the process of moving from a PHP framework to Python just ahead of us. We’ll even be able to send code reviews across teams and benefit from deeper knowledge of the various problem domains we share: have a question about localization? Both teams can share expertise and best practices.

Solving problems once and sharing the solution directly reduces the amount of work both teams have to do. And when SUMO writes code in such a way that AMO can use it, we can also release it separately so others can benefit from our solutions—and point out flaws and contribute improvements.

Other Changes

Also among the changes coming in the next year:

  • Version Control System. Though we don’t have a specific plan in place, it seems likely that SUMO will be moving from SVN to Git for source control. Because Git is distributed, it allows us to use a more collaborative workflow, and it’s easier for us to push our code to public repositories like GitHub.
  • Continuous Integration. We’ll be using Hudson for continuous integration, which will automate our tests and alert us to potential issues and regressions. The web QA team has also been working to make sure our Selenium tests can run through Hudson, greatly increasing test coverage for a web application like SUMO.
  • Interface Localization. One of the ways we plan to improve the SUMO experience this year is by moving our interface localization to gettext, which is an industry-standard tool for localization. As we move parts of the site from TikiWiki to Django, those new sections will be localized via gettext, which helps us take advantage of our great community with tools like Verbatim.
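For readers who haven't used gettext, here is a minimal sketch of the interface localization item above, using Python's standard `gettext` module. The `locale` directory, the `messages` domain, and the example strings are assumptions for illustration; no catalogs ship with this snippet, so it falls back to the original English strings.

```python
import gettext


def get_translations(locale_code, localedir="locale", domain="messages"):
    """Load the catalog for a locale, falling back to the source strings.

    With fallback=True, a missing .mo file yields a NullTranslations object,
    which returns each message unchanged instead of raising an error.
    """
    return gettext.translation(
        domain, localedir=localedir, languages=[locale_code], fallback=True
    )


translations = get_translations("fr")
_ = translations.gettext

# Strings are marked for translation by wrapping them in _().
heading = _("Search the Knowledge Base")

# Plural forms are chosen by the catalog for the target language.
count = 3
summary = translations.ngettext(
    "%d article found", "%d articles found", count
) % count
```

Localizers work on the extracted `.po` files rather than on the code, which is what lets web-based translation tools fit into the workflow.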

A Foundation for the Future

The goal of all of this work—and it will be a lot of work—is to put SUMO on a solid foundation for future growth and, at the same time, improve the experience for everyone—from developers to contributors to localizers to visitors. We have a daunting and aggressive road ahead of us, but I’m confident that we’ll emerge in a better place.

SUMO 2 is codenamed Kitsune, and is already up on GitHub.
