Fx 3.0.7 release & this morning’s network performance issues


In computer systems (and in others) there are often bottlenecks, and removing one often reveals the next. Today was an example of just that.

During a normal release we have tools to adjust the rate at which we offer updates. We use them to reduce load on the back-end systems or on the download mirrors.

Our preference is to do a release completely unthrottled so users get timely updates.
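The post doesn't describe how the throttling tools work internally. One common way to offer an update to only a fraction of clients is to hash each client into a fixed bucket and compare that bucket against a percentage. The sketch below does this. `should_offer_update`, `client_id` and `throttle_percent` are hypothetical names for illustration, not Mozilla's actual update service:

```python
import hashlib


def should_offer_update(client_id: str, throttle_percent: int) -> bool:
    """Hypothetical throttle: decide whether to offer the update to this client.

    The client ID is hashed into one of 100 buckets, so a given client gets a
    stable answer on repeated update checks. 100 means unthrottled, 0 means
    nobody is offered the update yet.
    """
    if not 0 <= throttle_percent <= 100:
        raise ValueError("throttle_percent must be between 0 and 100")
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return bucket < throttle_percent


if __name__ == "__main__":
    clients = [f"client-{i}" for i in range(10_000)]
    offered = sum(should_offer_update(c, 25) for c in clients)
    print(f"{offered} of {len(clients)} clients offered the update at 25%")
```

Because a client's bucket never changes, raising the percentage only ever adds clients. Anyone offered the update at 10% is still offered it at 50%, which makes it safe to open the throttle gradually as back-end and mirror load allows.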

During the Firefox 3.0.6 release we had a number of system problems that prevented us from releasing updates unthrottled. These were all detailed in the post-mortem.

To the Operations Team’s credit (and I’m serious here), most of those issues were removed prior to yesterday’s Firefox 3.0.7 release and by 9am this morning we were cranking along – no throttling.

Unfortunately the Mirror Network started showing pressure and instead of throttling back on the release, we opted to augment the Mirror Network with our own download servers in San Jose.

That pushed our aggregate bandwidth out of San Jose to nearly 3Gbps:

[Figure: Global bandwidth over 2Gbps]

Around this time, offsite monitors started alerting on a sharp increase in page load times for various Mozilla websites. It took a while to track down, but the newly turned-up Level 3 peer was saturated:

[Figure: Level3]

Any outbound traffic whose best route was out through Level3 was impacted. We fixed this temporarily by turning down Level3.

(I should note that our design requirement for upstream transit is at least two connections per provider so we can push 2Gbps. Level 3 is no exception; however, the second connection has been offline because Derek was seeing a lot of packet loss across the optical connection, which, coincidentally, got resolved today.)
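The arithmetic behind the Level 3 saturation can be sketched as follows. The function names, the 90% `headroom` threshold and the 1.2Gbps demand figure are hypothetical illustrations, not measurements from this incident. The 1Gbps per link assumes the 1GE upstream connections mentioned in the list below.

```python
LINK_GBPS = 1.0  # assumption: each upstream link is 1GE


def provider_capacity_gbps(links_up: int, link_gbps: float = LINK_GBPS) -> float:
    """Total outbound capacity of one transit provider with `links_up` working links."""
    return links_up * link_gbps


def is_saturated(demand_gbps: float, links_up: int,
                 link_gbps: float = LINK_GBPS, headroom: float = 0.9) -> bool:
    """Hypothetical check: flag a provider once demand reaches `headroom` of capacity."""
    return demand_gbps >= headroom * provider_capacity_gbps(links_up, link_gbps)


if __name__ == "__main__":
    demand = 1.2  # hypothetical Gbps of traffic whose best route is via this provider
    for links in (2, 1):
        print(f"{links} link(s): capacity {provider_capacity_gbps(links)} Gbps, "
              f"saturated={is_saturated(demand, links)}")
```

With both links up, the provider meets the 2Gbps design requirement. With one link offline, it tops out at 1Gbps, so any traffic preferring that route beyond that point queues up or gets dropped.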

These problems are solvable and we’ve had plans to put tools in place to balance load during situations like this. Unfortunately, today’s issues came up a lot quicker than we had planned.

A couple of things we’ll be looking at before the next release:

  1. Evaluating Internap’s FCP to dynamically shift traffic based on cost and performance metrics. (And as luck would have it, this showed up this afternoon!)
  2. Looking to see how we can better balance outbound traffic outside of using FCP.
  3. Adding capacity to our Mirror Network (can you help?).
  4. Evaluating options around upgrading from several 1GE upstream connections to 10GE connections.

This is a great problem to have, to be sure, and a far cry from the panic three years ago of “OMG we’re about to push 100Mbps!”.

I’m really interested in how others have gone about solving problems like this. Leave me comments.


Categories: Mozilla, Networking

6 responses

  1. Stephen Donner wrote:

    Is FCP the stuff they acquired from NetVMG or Sockeye?

  2. mrz wrote:

    NetVMG.

  3. Peter wrote:

    See if you can use peer-to-peer services behind the scenes. Modify your update scripts to check BitTorrent first (the client gets a .torrent tracker when it checks for updates), then maybe another P2P network. If the download is too slow, try a different method. Vuze/Azureus uses P2P to download updates for their product, and they are always fast, unlike much of the other content available on BitTorrent.

  4. Arthur wrote:

    The thought of sending GB after GB of identical data down the same tubes sometimes makes me cringe. A better kind of multicast support would be nice, but I’m not seeing it coming from anywhere.

  5. Gen Kanai wrote:

    I’ve been working hard to get new mirrors in Asia for many months with some good success, but it’s a long and involved process that takes time, effort and knowing the right people. I think our current documentation is adequate but pretty basic. Perhaps we should revise it to better highlight that we want more mirrors and how to easily become one? We could also brainstorm where to look for more mirrors. I’m going through my contacts, but it’s ad hoc, and I’m pretty sure there are better places to look or ask for help that I don’t know about.

  6. Gen Kanai wrote:

    Also, the comment that I get most often from mirrors is for Bouncer to have more targeted geo-location capability. I know it is being worked on but the sooner we can serve local users with local mirrors, wherever they are in the world, the happier and cheaper it will be for those mirrors who have committed to supporting Mozilla. Whatever we can do to fast-track this will be a big win for both the user and the mirror.