How do you get a dynamic website in another country without any servers?

Back in December I went out to Amsterdam to set up Mozilla’s first international datacenter. The build-out included two database machines, three webservers, two VMware ESX servers and two Citrix Netscalers (along with various PDUs and networking gear).

When I got back I was on a kick to get as much usage out of that facility as we could, and we very quickly got all of our static websites replicated out there. As expected, response times to www.mozilla.com for our European users dropped dramatically. We also set up a CVS mirror and eventually a community l10n server.

Among the last things to make their way out there are the dynamically generated websites, most notably addons.mozilla.org. The difficulty is that all of these sites rely on a MySQL database, and replicating the read-write portion of it is not for the faint of heart (master-master across 6000 miles?).

I took an initial stab at replicating just the read-only slave databases out to Amsterdam. This is doable and has been running since February, taking a consistent 60Kbps to maintain.

But reworking all the webapps to know about a read-only site vs. a read-write site is challenging and requires time that hasn’t been available. And the more I thought about it the more I came to think that this method wasn’t right. The more datacenters Mozilla has, the more complicated database replication becomes and the more bandwidth I need just to maintain content syncs and databases.

I started wondering how other sites have solved this problem and turned to Wikipedia to see what they were doing. I went so far as to email them to ask how they had invented this wheel. I got a very good email back from Mark and Brion, who described how they use Squid to proxy and cache content from their primary datacenter in Florida. They issue explicit cache invalidations when a page’s content has changed and use a static geographic database for geographic load balancing (versus the dynamic method we’re using).
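The two ideas in that reply, explicit invalidation and a static geographic table, can be sketched roughly in Python. This is only an illustration of the concepts, not how Squid or Wikipedia actually implement them: the class name, the `purge` method, the country codes and the datacenter labels are all made up for the example.

```python
from typing import Callable, Dict


class InvalidatingCache:
    """Toy cache in front of an origin; entries live until explicitly purged."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch = fetch_from_origin   # hypothetical call to the primary datacenter
        self._store: Dict[str, str] = {}
        self.origin_hits = 0              # how often we had to go back to the origin

    def get(self, path: str) -> str:
        # Serve from the local copy if we have one; otherwise fetch and remember it.
        if path not in self._store:
            self.origin_hits += 1
            self._store[path] = self._fetch(path)
        return self._store[path]

    def purge(self, path: str) -> None:
        # Explicit invalidation: the origin tells the cache that a page changed.
        self._store.pop(path, None)


# Static geographic table (hypothetical entries): country code -> datacenter label.
DATACENTER_BY_COUNTRY = {"US": "primary", "NL": "europe", "DE": "europe", "FR": "europe"}


def pick_datacenter(country_code: str, default: str = "primary") -> str:
    """Look up a datacenter from the static table, falling back to the default."""
    return DATACENTER_BY_COUNTRY.get(country_code.upper(), default)


if __name__ == "__main__":
    pages = {"/some/page": "version 1"}
    cache = InvalidatingCache(lambda path: pages[path])
    cache.get("/some/page")             # first request goes to the origin
    cache.get("/some/page")             # second request is served locally
    pages["/some/page"] = "version 2"   # content changes at the origin
    cache.purge("/some/page")           # the origin sends an explicit invalidation
    print(cache.get("/some/page"), pick_datacenter("nl"))
```

The point of the sketch is that nothing expires on a timer: a cached page stays valid until the origin says otherwise, which is why a mostly-read site can be served from far away with very little traffic back to the primary datacenter.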

In a lot of ways, their application faces the same issues AMO does. AMO is a highly cacheable site too.

For the past week we’ve been testing this method using the Netscalers and are hoping to switch to production next week. Morgamic sent the following out earlier today:

Matthew has set up AMO on the .nl cluster using the netscaler there. He asked us to take a look. The IP there is 63.245.213.31 — should be able to test by changing your hosts file.

Notes about the install:

  • public pages are cached entries delivered from the local NS
  • logged-in pages are actually from the sjc cluster
  • admin/dev/editor pages are all still from the sjc cluster

This is an overseas cache that should help offload a lot of the public traffic, so it’s good news. Can you guys think of any reason why we can’t push this live next week? Can you take a look and see if everything looks alright w/ the .nl pages?
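The split Morgamic describes (public pages from the local cache, everything logged-in or administrative from San Jose) boils down to a per-request decision. Here is a minimal Python sketch of that decision. The real logic lives in the Netscaler configuration, not in Python, and the path prefixes and cookie name below are assumptions made up for illustration.

```python
from typing import Mapping

# Hypothetical values: the real AMO paths and session cookie name may differ.
ORIGIN_ONLY_PREFIXES = ("/admin", "/developers", "/editors")
SESSION_COOKIE = "amo_session"


def choose_backend(path: str, cookies: Mapping[str, str]) -> str:
    """Return 'local-cache' for public pages and 'sjc-origin' for everything else."""
    # Admin, developer and editor pages always go to the San Jose cluster.
    if path.startswith(ORIGIN_ONLY_PREFIXES):
        return "sjc-origin"
    # A session cookie means the page is personalised, so it is not safe to cache.
    if SESSION_COOKIE in cookies:
        return "sjc-origin"
    # Anonymous public traffic can be answered from the local cache.
    return "local-cache"


if __name__ == "__main__":
    print(choose_backend("/firefox/addon/1865", {}))
    print(choose_backend("/firefox/addon/1865", {SESSION_COOKIE: "abc123"}))
    print(choose_backend("/admin/flagged", {}))
```

Only the last branch touches the local cache, so any request the rules do not recognise as public still ends up in San Jose, which is the safe default.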

I’m really excited about this! If it works as expected, it drastically lowers the barrier to opening POPs in other countries and means we can start improving user experience without a complicated infrastructure build-out.

If you happen to test the site out, please let me know your feedback.

Categories: Mozilla

10 responses

  1. Mike Beltzner wrote on :

    This is really fascinating stuff. It would be cool to build up some best practises documentation on MDC to help other website authors grow and scale over time like this.

  2. Adrian Chadd wrote on :

    I’m glad to hear another open source project has discovered just how flexible and powerful the Squid web cache is.

    We’d like to build a repository of information showing off just the sorts of things you’re doing. It’ll come online with the new Squid website I’ve almost finished (http://new.squid-cache.org/).

    Let me know if you’d like to share your experiences!

  3. Neil wrote on :

    So if it takes 60Kbps to maintain a partial database sync, how much bandwidth would it take by comparison for the web server to talk to the sjc MySQL server?

  4. mrz wrote on :

    Based on traffic graphs in San Jose on the master and slave MySQL server I can infer bandwidth.

    mrdb03 is the AMO master read-write database server – it’s pushing ~88Mbps outbound and pulling in ~6Mbps.

    mrdb04 is mrdb03’s slave – it averages about 1.45Mbps in either direction.

    For whatever reason, from San Jose to Amsterdam, I’m only able to sustain something less than 5Mbps (it took about 1GB/hour for the initial MySQL replica tarball to get from San Jose to Amsterdam).

  5. mrz wrote on :

    Not to mislead you but we’re using Citrix Netscalers as our load balancer and since they do caching as well, I’m using them for that instead of building Squid boxes.

    We went non-open source for load balancing mostly for performance and costs (I would have spent nearly as much on a farm of machines doing ssl offload/proxy/whatever else as I did on the Netscalers).

    That said, I’m really interested in Varnish (http://varnish.projects.linpro.no/) as a high performance open source solution.

  6. shaver wrote on :

    This is pretty great to see. I’ve always been a fan of the explicit invalidation model, and it’s excellent to see this first step towards it playing out.

    Squid would also be a harder thing for us to use here because it doesn’t seem to support an HTTP API for invalidation-by-pattern, which is critical to having the update-check responses handled correctly. (I think we could get all the rest of the performance we need with a 4-core Dell machine and an SSL offload card, but I’m not the one who’d get paged when it choked to death on a Firefox update, so I’m not going to push my luck!) Sure would be nice to have Vary:, though! :)

  7. Frédéric Wenzel wrote on :

    I really like to hear this as well. From my first tests I must say the Amsterdam AMO works quite well and I couldn’t notice any drawbacks.

    Though, this only helps with the read-portion of a page, right? Luckily AMO is largely read-only. But in case you end up having a lot of write access to the Amsterdam datacenter wouldn’t you end up having a bottleneck between SJC and AMS?

  8. mrz wrote on :

    Re: bottleneck – possibly but that’s not much different than how it is now (you have to go to San Jose for writes). I over provisioned the bandwidth out of Amsterdam so that shouldn’t ever be a bottleneck.

    I think we’re on track to flip this this coming Tuesday though!

  9. Pingback from mrz’s noise » Blog Archive » Where in the world is AMO? on :

    [...] flip the switch on getting addons.mozilla.org to be served out of Amsterdam as well as San Jose. I talked about how we’re doing this last week if you’re interested. We’ll make this change [...]

  10. Pingback from mrz’s noise » Blog Archive » I'm a Mac? and why the open Web rocks. on :

    [...] to a black MacBook that no one wanted (it seemed the perfect size to take with my on my trip to Amsterdam) and now I’ve replaced my desktop with a Mac [...]