Back in December I went out to Amsterdam to set up Mozilla’s first international datacenter. The build-out included two database machines, three webservers, two VMware ESX servers and two Citrix Netscalers (along with various PDUs and networking gear).
When I got back I was on a kick to get as much use out of that facility as we could, and we very quickly got all of our static websites replicated out there. As expected, response times to www.mozilla.com for our European users dropped dramatically. We also set up a CVS mirror and eventually a community l10n server.
Among the last things to make their way out there are the dynamically generated websites, most notably addons.mozilla.org. The difficulty is that all of these sites rely on a MySQL database, and replicating the read-write portion of it is not for the faint of heart (master-master across 6000 miles?).
I took an initial stab at replicating just the read-only slave databases out to Amsterdam. This is doable and has been running since February, taking a consistent 60Kbps to maintain.
But reworking all the webapps to know about a read-only site vs. a read-write site is challenging and requires time that hasn’t been available. And the more I thought about it the more I came to think that this method wasn’t right. The more datacenters Mozilla has, the more complicated database replication becomes and the more bandwidth I need just to maintain content syncs and databases.
I started wondering how other sites have solved this problem and turned to Wikipedia to see if I could figure out what they were doing. I went so far as to email them to ask how they invented this wheel. I got a very good email back from Mark and Brion, which described how they use Squid to proxy and cache content from their primary datacenter in Florida. They use explicit cache invalidations when a page’s content has changed, and a static geographic database for geographic load balancing (versus the dynamic method we’re using).
In a lot of ways, their application mirrors the issues AMO has. AMO is a highly cacheable site too.
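To make the explicit-invalidation idea concrete, here is a toy sketch in Python. It is not Wikipedia's Squid setup or anything we run; the `EdgeCache` class, the `purge` method and the example path are all made up for illustration. The point is only that the remote cache keeps serving its stored copy until the origin tells it a page has changed.

```python
from typing import Callable, Dict


class EdgeCache:
    """Toy stand-in for a caching proxy in a remote datacenter (hypothetical)."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, str] = {}
        self.origin_hits = 0  # how many requests had to cross the ocean

    def get(self, path: str) -> str:
        # Serve the local copy if we have one; otherwise go back to the origin.
        if path not in self._store:
            self.origin_hits += 1
            self._store[path] = self._fetch(path)
        return self._store[path]

    def purge(self, path: str) -> None:
        # Explicit invalidation: the origin calls this when a page changes.
        self._store.pop(path, None)


# A dict plays the role of the primary datacenter (assumed example data).
origin_pages = {"/addon/1": "version 1.0"}
cache = EdgeCache(lambda path: origin_pages[path])

first = cache.get("/addon/1")    # miss: fetched from the origin
second = cache.get("/addon/1")   # hit: served locally

origin_pages["/addon/1"] = "version 1.1"
stale = cache.get("/addon/1")    # still the old copy, nobody purged it

cache.purge("/addon/1")          # origin announces the change
fresh = cache.get("/addon/1")    # miss again: picks up the new content
```

Without the purge, the edge would keep handing out the old page indefinitely, which is why the invalidation has to be wired into whatever updates the content.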
For the past week we’ve been testing this method using the Netscalers and are hoping to switch to production next week. Morgamic sent the following out earlier today:
Matthew has set up AMO on the .nl cluster using the netscaler there. He asked us to take a look. The IP there is 126.96.36.199 — should be able to test by changing your hosts file.
Notes about the install:
- public pages are cached entries delivered from the local NS
- logged-in pages are actually from the sjc cluster
- admin/dev/editor pages are all still from the sjc cluster
This is an overseas cache that should help offload a lot of the public traffic, so it’s good news. Can you guys think of any reason why we can’t push this live next week? Can you take a look and see if everything looks alright w/ the .nl pages?
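The split in those notes boils down to a per-request decision. Here is a rough Python sketch of that decision; the real policy lives in the Netscaler configuration, and the path prefixes and cookie name below are assumptions I made up for the example, not AMO's actual values.

```python
from typing import Mapping

# Hypothetical values, for illustration only.
ORIGIN_ONLY_PREFIXES = ("/admin", "/developers", "/editors")
SESSION_COOKIE = "session"


def serve_from(path: str, cookies: Mapping[str, str]) -> str:
    """Return "sjc" if the request must go to the origin, else "nl-cache"."""
    # Admin/dev/editor pages always come from the sjc cluster.
    if path.startswith(ORIGIN_ONLY_PREFIXES):
        return "sjc"
    # Logged-in users get pages from the sjc cluster too.
    if SESSION_COOKIE in cookies:
        return "sjc"
    # Everything else is a public page and can be served from the local cache.
    return "nl-cache"


public = serve_from("/firefox/addon/1", {})
logged_in = serve_from("/firefox/addon/1", {"session": "abc123"})
admin = serve_from("/admin/queue", {})
```

Only anonymous requests for public pages stay in Amsterdam; anything personalized or privileged still makes the trip back to sjc.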
I’m really excited about this! If it works as expected, it drastically lowers the barrier to entry for opening POPs in other countries and means we can start improving user experience without a complicated infrastructure build-out.
If you happen to test the site out, please let me know your feedback.