China, Amsterdam, San Jose and global load balancing

Mozilla’s current GSLB (global server load balancing[1]) solution (Citrix Netscalers) is a mix of active proximity probes and static map assignments.

The algorithm first checks for a match in the static maps and then falls back to proximity metrics. If neither produces an answer, it’ll round-robin through all the GSLB sites (effectively three: San Jose, Amsterdam and China; technically it’s six, and it’ll only round-robin through sites that have that service/site defined).
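To make that lookup order concrete, here is a minimal Python sketch of the fallback chain. It is only an illustration of the order of checks, not how the Netscalers implement it: the function name, the two dictionaries, the site labels and the documentation-range addresses are all made up for the example, and the proximity table is keyed on the resolver's /16 as an assumption.

```python
import ipaddress
from itertools import cycle


def make_picker(static_map, proximity, sites):
    """Build a picker that chooses a GSLB site for a resolver IP (hypothetical sketch)."""
    fallback = cycle(sites)  # plain round-robin over every site

    def pick(resolver_ip):
        addr = ipaddress.ip_address(resolver_ip)
        # 1. Static map: the first network containing the address wins.
        for net, site in static_map.items():
            if addr in ipaddress.ip_network(net):
                return site, "static"
        # 2. Learned proximity, keyed on the resolver's /16 (assumption).
        key = str(ipaddress.ip_network(f"{resolver_ip}/16", strict=False))
        if key in proximity:
            return proximity[key], "proximity"
        # 3. Nothing known about this network: round-robin.
        return next(fallback), "round-robin"

    return pick


pick = make_picker(
    static_map={"203.0.113.0/24": "china"},      # made-up static entry
    proximity={"198.51.0.0/16": "amsterdam"},    # made-up probe result
    sites=["san-jose", "amsterdam", "china"],
)
results = [pick(ip) for ip in ("203.0.113.7", "198.51.100.9", "192.0.2.1", "192.0.2.2")]
for ip_result in results:
    print(ip_result)
```

The last two lookups have neither a static entry nor a proximity entry, so they simply take the next site in the rotation, whichever that happens to be.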

In practice, this round-robin fallback hasn’t had much user impact. You’d only ever hit it if no one in your network (a /16 in this case, based on the source address of whatever name server you used) had ever accessed any of Mozilla’s web properties. And while that’s possible, the performance hit is really minimal. But then, up until now we’ve only had two GSLB sites to send users to, and San Jose and Amsterdam are well connected.

China changes things. Connectivity to mainland China can often be congested and can induce a lot of latency (which matters less for downloads than it does for interactive sites). If you’re in New York and fall through to round-robin, you very likely don’t want to end up taking 80ms to the west coast and another 300ms-400ms to China. This is a bad user experience.

If you don’t believe me, look at my round-trip times from San Jose to some Mozilla gear in China:

mrz@boris [~/] 15> ping mozcn01
PING mozcn01 ( 56(84) bytes of data.
64 bytes from mozcn01 ( icmp_seq=6 ttl=237 time=389 ms
64 bytes from mozcn01 ( icmp_seq=9 ttl=237 time=439 ms

mrz@boris [~/] 17> ping
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=0 ttl=231 time=206 ms
64 bytes from icmp_seq=1 ttl=231 time=212 ms
64 bytes from icmp_seq=2 ttl=231 time=205 ms

Since December I’ve been working on ways to exclude China from the fallback round-robin method while making sure our Chinese users don’t go to San Jose or Amsterdam. The latter was easier – using MaxMind’s GeoLite database I’ve built a static proximity location map assigning all IP addresses in China to the China datacenter. The former, excluding China from round-robin, took longer to figure out.
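For the static-map half, the idea can be sketched roughly like this in Python. It assumes a country CSV with the columns start IP, end IP, start integer, end integer, country code and country name; the sample rows below are invented (documentation address ranges with arbitrary country codes), and `ranges_for_country` is a hypothetical helper, not part of any MaxMind or Netscaler tooling. Turning the resulting networks into an actual Netscaler location file is left out.

```python
import csv
import io
import ipaddress

# Invented sample rows in the assumed six-column layout; not real GeoLite data.
SAMPLE = '''"192.0.2.0","192.0.2.255","3221225984","3221226239","CN","China"
"198.51.100.0","198.51.100.255","3325256704","3325256959","US","United States"
"203.0.113.0","203.0.113.127","3405803776","3405803903","CN","China"
'''


def ranges_for_country(csv_text, country_code):
    """Collect every range for one country code as a list of CIDR networks."""
    nets = []
    for start, end, _start_int, _end_int, cc, _name in csv.reader(io.StringIO(csv_text)):
        if cc == country_code:
            # A start/end pair may need more than one CIDR block to cover it.
            nets.extend(ipaddress.summarize_address_range(
                ipaddress.ip_address(start), ipaddress.ip_address(end)))
    return nets


cn_nets = ranges_for_country(SAMPLE, "CN")
# Every matching network is pinned to the China site (label is made up).
static_map = {str(net): "china" for net in cn_nets}
print(static_map)
```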

One of the GSLB methods the Netscalers have is a “weighted round-robin” and while it’s not documented, it turns out that weights apply even to the fallback round-robin method.  Weights range from 1 to 100.  So now I can do something like:

bind gslb vserver glb-mozcom -serviceName mozcom-80-sj -weight 100
bind gslb vserver glb-mozcom -serviceName mozcom-80-cn01
bind gslb vserver glb-mozcom -serviceName mozcom-80-cn02
bind gslb vserver glb-mozcom -serviceName mozcom-80-nl01 -weight 50
bind gslb vserver glb-mozcom -serviceName mozcom-80-nl02 -weight 50

which will send 200 requests to either San Jose or Amsterdam before sending one request to China.
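A quick way to sanity-check those numbers is to simulate a naive weighted round-robin in Python. Two assumptions are baked in: the two China services, which are bound without `-weight`, are treated as weight 1, and traffic is split strictly in proportion to weight. The order in which a real Netscaler interleaves services is not modelled here; only the ratio is.

```python
from collections import Counter
from itertools import cycle, islice

# Weights from the bind commands; 1 for the China services is an assumption,
# since those bindings carry no explicit -weight.
weights = {
    "mozcom-80-sj": 100,
    "mozcom-80-cn01": 1,
    "mozcom-80-cn02": 1,
    "mozcom-80-nl01": 50,
    "mozcom-80-nl02": 50,
}

# Naive schedule: each service appears `weight` times per cycle.
schedule = [name for name, weight in weights.items() for _ in range(weight)]
cycle_len = len(schedule)

# Replay ten full cycles and count where the requests landed.
hits = Counter(islice(cycle(schedule), cycle_len * 10))
total = sum(hits.values())
china = hits["mozcom-80-cn01"] + hits["mozcom-80-cn02"]
china_share = china / total
print(f"{china} of {total} fallback requests ({china_share:.2%}) went to China")
```

Under those assumptions each China service gets one request for every 200 that go to San Jose or Amsterdam, so China as a whole sees roughly 1% of the fallback traffic.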

It’ll be interesting to see what the actual user impact of this is and if anyone not in mainland China ends up hitting China.  We’ll be testing this for a week, starting next Tuesday, with and looking at the web server logs to see where users are coming from.

I’d also be interested in feedback from you if you find yourself in China when you should be elsewhere, or if you find yourself in China and it is quicker for you!

[1] GSLB works by looking up the IP address of the host that made a DNS request and determining which data center is closest. This is a fairly common method to do global load balancing.

Categories: load balancing, Mozilla, Networking