IT: General update

jakem

A lot has been going on lately in IT, and we haven’t had a chance to make a post about it. I thought it was about time to get out an update about the team and what we’ve been working on.

Personnel changes:

For starters, the IT team has grown quite a bit since I started, and that was only back in March. There are plenty of new folks I haven’t met in person yet, and there have also been some departures. There’s some training time of course, but our new hires are all smart folks and are learning the ropes very well. Some are digging into old projects that have been back-burnered for a while, and others are diving into current issues.

At the same time, it’s baby fever in IT! Just a month ago my wife and I had our first, our daughter Zoey. Jeremy, also a new parent, is out on leave right now with his new daughter Mira. Rob will be taking leave any day now for *his* new daughter… just as soon as she’s born. He’s not a new parent, but I don’t expect the first couple of weeks will be much easier on him than on Jeremy or me. Finally, Justin (jabba) is on track to have their first baby in late February.

Major Projects:

Something I’ve worked on myself: the www.mozilla.org and www.mozilla.com site merge! This is now largely complete, and we’re just in cleanup mode. There’s a fairly major follow-up project to rebuild this site as a nice Django / Playdoh Python app, instead of the old-and-crufty PHP that it is right now. I was the main IT lead on this, but it wouldn’t have been possible without a lot of work from webdev… especially James Long, Anthony Ricaud, and Fred Wenzel. All I did was deploy their code a few times and tweak some Apache configs… they had to make the two codebases actually play nicely together. :)

AMO has been moved to Phoenix! Huge project, and the IT credit goes to Jeremy Orem. On a related note, thanks to his efforts the AMO webdev team is now actually able to do their own code pushes, generally without any significant involvement from IT.

There’s been a lot of work on other new clusters in Phoenix… off the top of my head, there is a new Engagement cluster designed to host short-run sites (glow and twitterparty would have been here, webifyme is here, etc). There is also a new Generic cluster, designed to replace the existing one in SJC. It’s got a good bit more horsepower, and it’s based on RHEL 6 instead of RHEL 5. Props to Corey Shields for leading both of those cluster rollouts.

We have started a pattern of rolling out “admin” nodes with each cluster. The admin nodes are responsible for pushing new content, running cronjobs, and generally managing the cluster as a whole. In the past we’ve centralized these things onto just a couple of admin nodes that handled all of our clusters. This works, but gets convoluted fast and doesn’t scale as well as we’d like. So far the new Generic, Engagement, and Addons clusters in Phoenix are set up this way, with more to come. A lot of people have been involved with this, from the puppet modules to the servers themselves.
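
For anyone who wants a picture of the per-cluster admin node idea, here’s a rough sketch in Python. To be clear, this is not our actual tooling: the `Cluster` class, the hostnames, the paths, and the choice of rsync as the push mechanism are all made up for illustration. The only point is that each admin node knows about its own cluster’s nodes and nothing else.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    """Hypothetical description of one cluster and its dedicated admin node."""
    name: str
    admin_node: str
    web_nodes: List[str]


def push_commands(cluster: Cluster, src: str, dest: str) -> List[List[str]]:
    """Build (but do not run) one rsync command per web node in this cluster.

    The commands are meant to be run from the cluster's own admin node, so
    a push for one cluster never touches the nodes of another cluster.
    """
    return [
        ["rsync", "-a", "--delete", src, f"{node}:{dest}"]
        for node in cluster.web_nodes
    ]


if __name__ == "__main__":
    # Made-up hostnames and paths, purely for illustration.
    engagement = Cluster(
        name="engagement",
        admin_node="admin1.engagement.example",
        web_nodes=["web1.engagement.example", "web2.engagement.example"],
    )
    for cmd in push_commands(engagement, "/data/www/", "/var/www/"):
        print(" ".join(cmd))
```

With one `Cluster` per admin node, adding a new cluster means adding a new admin node with its own small node list, rather than teaching a central box about yet another set of servers.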

The puppet training from a couple months ago has come in very handy, and a good number of our classes and modules have been reworked. I’m already noticing it “feels” easier to find things now… not sure if it’s just me getting a handle on our puppet deployment or if it’s actually better, but either way the change is very good. I want to say Justin Dow is largely responsible for this, but honestly so many people have been committing to our puppet repo that it’s hard to keep up with.

Lots of work has gone into our internal inventory system. It is now actually possible to control DHCP allocations from within the inventory system, as well as to define your own key/value pairs for systems in it. This is pretty huge, and there are plans to expand this further, so that inventory becomes more and more of a single source of truth for our infrastructure. This is almost entirely due to the efforts of Rob Tucker.
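
To give a feel for what “inventory as a source of truth” can look like, here’s a minimal sketch in Python. It is not the real inventory code: the key names (`nic.0.mac_address`, `nic.0.ipv4_address`), the hostnames, and the addresses are invented, and I’m assuming ISC dhcpd-style `host` entries as the output format purely as an example. The idea is just that DHCP config gets generated from the key/value pairs stored on each system, instead of being edited by hand.

```python
from typing import Dict, List

# Hypothetical inventory: hostname -> free-form key/value pairs.
Inventory = Dict[str, Dict[str, str]]


def render_dhcp_hosts(inventory: Inventory) -> str:
    """Render one dhcpd-style host entry per system that has both a MAC and an IP.

    Systems missing either key are skipped rather than guessed at.
    """
    stanzas: List[str] = []
    for hostname in sorted(inventory):
        kv = inventory[hostname]
        mac = kv.get("nic.0.mac_address")   # assumed key name
        ip = kv.get("nic.0.ipv4_address")   # assumed key name
        if not mac or not ip:
            continue
        stanzas.append(
            f"host {hostname} {{\n"
            f"  hardware ethernet {mac};\n"
            f"  fixed-address {ip};\n"
            f"}}"
        )
    return "\n\n".join(stanzas)


if __name__ == "__main__":
    # Made-up systems, purely for illustration.
    example: Inventory = {
        "web1.example": {
            "nic.0.mac_address": "00:11:22:33:44:55",
            "nic.0.ipv4_address": "10.0.0.5",
            "owner": "webdev",  # arbitrary user-defined key
        },
        "spare1.example": {"owner": "it"},  # no NIC data, so it is skipped
    }
    print(render_dhcp_hosts(example))
```

Because the extra keys are free-form, the same records could feed other generated configs later without changing how the systems are stored.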

The Mozilla Developer Network has finally been upgraded to a much newer version of MindTouch. This has been on the plate since at least April, when I took it over, and I believe for quite a while before that. Some of the MDN guys are referring to it as the most stable and fastest MDN they’ve had in a long, long time… thanks in large part to some good detective work by one of our technical contacts at MindTouch, Brian.

Upcoming projects that I can think of off the top of my head:

We are working towards merging our multiple Zeus LB clusters together, to form one super mega-cluster. This will let us improve our global load balancing capabilities, and potentially bring better global caching to a larger number of websites. In a way this is kinda like hosting our own mini-CDN. No official timeline on this, but it’s on the plate… and has been for a few months.

We’re planning to move into a new datacenter very soon. Not sure how much info on this can be public, but as anyone could guess this will mean downtime for some systems, replacement for others, and generally things getting swapped around to make the migration as painless as possible. There’s an insanely complicated Gantt chart for this. Datacenter migrations are serious business, and we’ll do our best to minimize any disruptions.

I’m sure there are many other IT projects I’ve forgotten… both recently completed and upcoming. Feel free to drop me a line if you know of any, and I’ll update this post. :)

- Jake

One response

  1. Brian Hourigan wrote:

    I am very excited about all the changes that are going on, and I look forward to all the exciting stuff I keep reading about going into production.