- hg push server had an issue last week; failover to a redundant push node happened with little impact
Details: the way hg is set up, there is a single server for changes over ssh (pushes) and a whole pool of servers for consumption through http. This single server has a passive replica that we’ve never had to rely on before last week. Failover to the backup node worked fine, as did failing back to the primary node when it was fixed.
- Mozilla is now contributing to pool.ntp.org
We make use of GPS-based time servers in our primary data centers to ensure that our infrastructure is as close to being in sync as possible. These servers are now contributing back to the greater Internet community.
Software Carpentry is running a 2-day software skills boot camp in Boston, June 24-25th 2013, for women in science, engineering, medicine, and related research areas. Registration is $20.
Boot camps alternate short tutorials with hands-on practical exercises. You are taught tools and concepts you can use immediately to increase your productivity and improve confidence in your results. Topics covered include the Unix shell, version control, basic Python programming, testing, and debugging — the core skills needed to write, test and manage research software.
This boot camp is open to women at all stages of their research careers, from graduate students, post-docs, and faculty to staff scientists at hospitals and in the public, private, and non-profit sectors.
Registration is $20; to sign up, or find out more, please visit the announcement at http://software-carpentry.org/blog/2013/04/announcing-wise-bootcamp.html. If you have questions, there is an e-mail link on the announcement page.
For those curious, they are using SQLite, not MySQL or PostgreSQL, and I will be helping out with the SQL parts. There are about 2 months left but the boot camp is about 2/3 full right now, so I wanted to make sure word of this opportunity reaches as many people as possible so they do not hear about it too late.
I am not at Percona Live this week because I opted to stay home after a crazy year of travel (41 talks in 11 different countries on 3 continents in the past year). However, I realized today that Percona Live has no Code of Conduct.
I will not be attending any Percona Live events until there is an acceptable Code of Conduct. MySQL is the world’s most popular open source database; the community deserves a Code of Conduct.
ETA: I have contacted Kortney, the conference organizer for Percona Live, and asked for a Code of Conduct to be put in place ASAP.
ETA: If you want to know why this is an issue, see http://adainitiative.org/what-we-do/conference-policies/
ETA: This is my personal statement, and not a statement of what any of my Mozilla colleagues may feel. Other colleagues, including employees under me, may choose to attend or even present at any events they wish. I personally do not feel comfortable at a conference with no Code of Conduct; this is not a reflection on the technical merits of any conference.
At this month’s Back Bay LISA, Matt Simmons (aka Standalone Sysadmin) set up Lightning Talks. There were 9 presentations, and the videos are up! The playlist is on YouTube but here is a list of all the videos, with the descriptions taken from Matt’s blog post:
Back Bay LISA Lightning Talks
Mentoring by Matt Finnigan
Matt Finnigan gave a talk discussing the LOPSA Mentorship program. If you aren’t familiar, the mentorship program is a free service offered by LOPSA, where any admin who needs help, either with a project or just general career guidance, can sign up to be connected to someone with experience in their target area. You need to be a LOPSA member in order to be a mentor, but being a protege is open to anyone, regardless of LOPSA membership.
Cooking by Adam Moskowitz
Adam Moskowitz gave a talk discussing cooking for system administrators. He appealed to our sense of making things as well as our need for healthy food and good value. Adam encouraged us to try cooking, and although most people thought it was expensive to properly outfit a kitchen, he reminded us that it was actually a fraction of the price of our new laptops, and the kitchen gear would last a lot longer.
Amazon SMS by KM Peterson
This talk is a result of KM Peterson’s search for a provider-agnostic method to send SMS messages that didn’t break the bank or involve maintaining an array of modems. He ended up setting up a script to talk to Amazon’s SMS service, and provided us example code in his slides.
SmartOS by Nahum Shalman
Nahum Shalman gave a really nice introduction to SmartOS, a derivative of OpenSolaris which is maintained by Joyent. Interestingly, the Linux-native KVM was ported to the SmartOS kernel, allowing creative and secure uses of jails and virtual sandboxes, all taking advantage of native ZFS, dtrace, and all kinds of delicious Solaris-y goodness.
MySQL and Puppet by Sheeri Cabral
Sheeri Cabral came from Mozilla to talk with us about how they’re deploying MySQL using Puppet. Her slides had example code, and she walked us through the abstracted object and up to the deployment on the actual nodes.
Secrets by KM Peterson
KM Peterson’s second talk was on Shamir’s Secret Sharing Scheme, aka ‘SSSS’. The idea behind this crypto tech is that you have a secret which you want to ensure can only be recovered by the collaboration of a minimum number of involved people – say three of your team of five. You encrypt the plaintext and generate as many keys as people you have, and tell the app how many should be required to release the information. To pull the data out, you provide any of the generated keys, as long as the number of different keys meets the minimum determined when the data was encrypted.
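For the curious, here is a minimal Python sketch of the textbook version of Shamir's scheme. It is not the tool shown in the talk: the prime modulus, the function names `split_secret` and `recover_secret`, and the sample secret are all assumptions made for illustration. In the textbook scheme the secret is the constant term of a random polynomial of degree `threshold - 1`, each share is one point on that polynomial, and any `threshold` points recover the secret through Lagrange interpolation at x = 0.

```python
import secrets

# Assumption for this sketch: a Mersenne prime larger than any secret we split.
PRIME = 2**127 - 1


def split_secret(secret, threshold, num_shares, prime=PRIME):
    """Split an integer secret into num_shares points; any threshold of them recover it."""
    if not 0 <= secret < prime:
        raise ValueError("secret must be in the range [0, prime)")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        # Evaluate the polynomial at x with Horner's rule, modulo the prime.
        y = 0
        for coeff in reversed(coeffs):
            y = (y * x + coeff) % prime
        shares.append((x, y))
    return shares


def recover_secret(shares, prime=PRIME):
    """Recover the constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        # Modular inverse of den via Fermat's little theorem (prime modulus).
        inverse = pow(den, prime - 2, prime)
        secret = (secret + yi * num * inverse) % prime
    return secret


# Hypothetical example: a team of five, any three of whom can recover the secret.
shares = split_secret(123456789, threshold=3, num_shares=5)
print(recover_secret(shares[:3]))
print(recover_secret([shares[0], shares[2], shares[4]]))
```

Any three of the five shares give back the same secret; with only two, the interpolation almost certainly returns an unrelated value.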
Stick Destroyer by John Jarvis
John Jarvis talked to us about a creative use for his Raspberry Pi – he securely erases flash media using Stick Destroyer. He rigged up a light so that you have a nice visual indicator of when the stick is being erased, and when it’s done.
Sensu by Pat Cable
Pat Cable showed up to talk about Sensu, a ruby-based monitoring solution that uses AMQP queues to distribute tasks around a monitoring infrastructure that can scale out horizontally to monitor extremely large numbers of machines. It’s definitely a “next gen” monitoring solution that you should be aware of.
Sysadmins and Doctors by Matt Simmons
I got up in front of everyone and talked briefly about something that I’ve noticed – mainly about how I see our profession splintering, but that the splintered elements (such as network and storage administrators) aren’t actually specialties of “system administrators”, it’s much more like the specialized administrators are specialist doctors, and system administrators are like general practitioners. The idea is still half baked, but that’s the fun of a lightning talk, right? I didn’t offer any answers, but I asked a lot of questions.
This morning I watched Tomas Ulin’s Keynote at Percona Live: MySQL Conference and Expo, delivered yesterday. I missed this live as I am not at Percona Live (I am on a conference hiatus from March through September for personal reasons). As far as the technical content in it, there have been a few posts about the Hadoop Applier and MySQL 5.7, so there’s not much of a need to delve in there.
Message #1: Failure
I was impressed that Ulin spoke of failure. Around 7:27 in the video above, Ulin says, “We really failed with 5.0,” and “even 5.1 we weren’t fully and back on track when we released.” He spoke about the new way MySQL 5.5 and 5.6 were engineered, a hybrid agile/milestone development cycle. There are some hidden messages here:
Hidden Message #1: Oracle is a great steward for MySQL
MySQL 5.0 was GA in October 2005 and MySQL 5.1 was GA in Nov 2008. This was before Oracle was ever in the picture. Ulin said MySQL 5.0 and 5.1 failed, mentioning that the ship cycle was rushed and features were released when they were not ready, causing technical debt. MySQL 5.5 and 5.6 are different, and the hidden message is that Oracle had a part in making this better. And honestly, I believe that. Say what you will about Oracle, but this cannot be argued: they do know how to develop and ship a product.
When Sun bought MySQL, I was pretty hopeful. I knew a bunch of folks within MySQL that were unhappy, and from what I gathered, MySQL did not really need a parent company, they needed a *parent*. It looks like Oracle has been great for getting MySQL releases in shape – MySQL 5.5 had a LOT of great features from the community, when previously it could take years before a community patch was accepted, and MySQL 5.6 has a lot of innovative features from strong developers.
Hidden Message #2: Oracle is more reliable for MySQL releases
With 5.5 and 5.6, the 2-year development cycle has been almost exact – MySQL 5.5 was GA in Dec 2010, 25 months after 5.1, and MySQL 5.6 was GA in Feb 2013, 26 months after 5.5. I remember the agonizing wait for MySQL 5.0, and it looks like under Oracle we will not have a debacle like that again. Ulin specifically mentioned a 24-month cycle.
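The month counts are easy to double-check. Here is a small Python sketch using the GA months given in this post; the day of the month is set to 1 as a placeholder because only the month matters, and the helper name `months_between` is made up for this example.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from start to end, ignoring the day of the month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# GA months as stated in this post; the day is a placeholder.
ga_dates = {
    "5.0": date(2005, 10, 1),
    "5.1": date(2008, 11, 1),
    "5.5": date(2010, 12, 1),
    "5.6": date(2013, 2, 1),
}

versions = list(ga_dates)
for previous, current in zip(versions, versions[1:]):
    gap = months_between(ga_dates[previous], ga_dates[current])
    print(f"{previous} -> {current}: {gap} months")
```

The 5.0 to 5.1 gap comes out at 37 months, against 25 and 26 months for the two releases that followed.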
Speaking about cycles, have you noticed that Oracle has not stopped providing the MySQL binaries and code, even for the EOL’d products? I have a blog post I want to write about the lifecycle policy and how it has evolved, so stay tuned for that.
Message #2: Oracle’s Investment in MySQL
Ulin mentioned “Oracle’s investment in MySQL” a lot. Why? Well, in 2009 Oracle made a written 5-year commitment to MySQL. It is now 2013, and some folks have been wanting Oracle to make another promise. Frankly, I think it is ridiculous to ask a company to make a commitment in writing so far ahead, and nobody demands that of any other company. Oracle has doubled the number of MySQL engineers and tripled the number of MySQL QA staff, and has the largest team of MySQL developers of any company anywhere. Unfortunately we did not get exactly how many people that is…it is only a little bit impressive if you tripled the team from 1 person to 3 people, but more impressive if you tripled the team from 10 people to 30 people.
Let’s take a number we did get – the QA team now has 400 person-years of experience on it. Let’s say the QA team was 10 people before, and now it is tripled to 30 people. That means the average QA person has over 13 years’ experience in QA, which is about a year longer than my entire post-college IT career. If there are more engineers with less experience, that’s pretty impressive for the number of people working on finding and fixing bugs, and if there are fewer engineers, they have even more years of experience.
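To play with that back-of-the-envelope math, here is a tiny Python sketch. The 400 person-years figure is from the keynote; the team sizes are pure guesses, exactly as in the paragraph above, and `average_experience` is just a name picked for the example.

```python
def average_experience(total_person_years, team_size):
    """Average years of experience per person for a guessed team size."""
    if team_size <= 0:
        raise ValueError("team_size must be positive")
    return total_person_years / team_size


TOTAL_PERSON_YEARS = 400  # figure quoted in the keynote

# Hypothetical team sizes; the real head count was not given.
for team_size in (10, 30, 60):
    average = average_experience(TOTAL_PERSON_YEARS, team_size)
    print(f"{team_size} people -> {average:.1f} years each")
```

The bigger the guessed team, the lower the average, which is the trade-off described above.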
Hidden Message #3: Oracle has an open-ended commitment to MySQL
Oracle has MySQL trainings, events and tech tours on 6 continents (none in Antarctica, but plenty in at least 3 different cities throughout Africa – Nairobi, Johannesburg, Pretoria, at least that I’m aware of because I mention it on the podcast along with SkySQL, Percona, FromDual and Tungsten events). They have doubled the engineering staff and tripled the QA staff and are still hiring. In the past year they sponsored over 40 events, delivered over 70 talks at conferences, and of course they have a huge investment in MySQL Connect – just as Percona Live added a day in 2013, MySQL Connect is adding a day as well. With all that time and money invested in people and events, they are not going to stop working on MySQL any time soon.
(BTW if you missed it, MySQL Connect has a “super saver” registration before May 3rd, save 45%. Hard to believe it’s almost half price if you register now!)
Note that the hidden messages above are completely my interpretation, and represent nothing other than my opinion.
Ever wonder what Mozilla IT is or what we do?
We keep the lights on.
We’re like your landlord – when the roof blows off, the ceiling leaks or the outside needs repainting, we do it. But we have no control over the furniture. Sometimes we see tenants on their way in or out or we’re called in when they’re having a loud party but other than that, we stay out of your way.
In other words, “we make sure the house is in good shape but we can always give tips about the indoors.”
If you want to have a loud party, let us know! We love parties!
File under “What you don’t see might be more important than what you do see”
Last week Mozilla suffered a huge multi-day outage with our fiber provider between Northern California and Arizona. It took down both fiber connections.
Everything failed over to the backup VPN.
I bet no one even knew this!
The DevOpsDays series of events is, as the name implies, centred around the “devops” movement, and is intended as a way to introduce people to this style of IT workflow, project, and people management. Since this is not related to a programming language, this conference falls outside of the normal sorts of events that Mozilla generally finds itself involved in. I believe this to be a very good thing, and I am proud to have represented us both as a sponsor of the event, and as a European devops community member.
The audience was mostly French in composition with a not-insignificant number of attendees from both Francophone and non-Francophone European countries. Said audience was a healthy mix of developers, IT operations, and managers across a wide spectrum of company sizes, types, and even industries; frankly, I was impressed at how broad the composition was, and it was refreshing to see interest in the devops movement from such a wide group.
Though the event was held in Paris, all of the talks (with the exception of some of the ignites) were done in English. This was by design – an internally contentious decision that, in my opinion, ultimately proved itself to be the correct one. The open spaces during the afternoon were in a mix of English and French in order to ensure that everybody could participate equally. Concerning the open spaces, we weren’t sure if the format would work here in France, but they were a smash success! Everybody seemed to really enjoy the format as a platform for discussion, debate, and idea-generation. I’d wager that for many of the attendees, it was the first time they’d ever been exposed to such a thing, and my hope is that they can bring the format to others in the future.
Since devops is so new to France, the majority of the presentations themselves were entry-level, and thus not particularly interesting to me directly. That said, there were two presentations that really stood out (and would have held their own even at a more “advanced” event): “CustomerOps” by Alexis Lê-Quôc, and “Map & Territory” by Pierre-Yves Ritschard.
Alexis’ presentation on “CustomerOps” centred around the concept of providing customer support using engineering principles – and, indeed, delivered by engineers themselves. This really hit home for me because in Mozilla IT/Ops, we’re not only the people who build and provide technical infrastructure, but are also the people who provide direct support to the consumers of that infrastructure – a situation that is absolutely not a given in many other companies (i.e. the admins and the customer reps are not the same people). Alexis illustrated the importance of communication, and how to measure success (read: customer satisfaction) in meaningful ways.
Pierre-Yves’ presentation was based on a very interesting philosophical conjecture: that our mental model of the world is not the same as the reality said model attempts to describe. Put another way, a map isn’t actually land, it’s a representation of the territory it describes (hence the title). Therefore, the most valuable models are the ones that can describe reality in useful ways, and it’s in defining “usefulness” that the real effort must be made. In a more applicable sense his thesis was simple: identify your “key metrics” – the numbers which literally describe the success or failure of your business – and make sure you are collecting, analysing, and modelling them above all. Every other metric is either secondary or potentially uninteresting in the first place.
Personally, I spent a lot of time mingling with the attendees, talking about Mozilla, our projects, and our mission. Generally speaking, the first question was, “Can I have one of those Firefox stickers?”, but the second question was, “When can I get my hands on a FirefoxOS phone?” As usual, everybody wanted to see one, and (unfortunately) as usual, I didn’t have one to show them. The more events that I attend on behalf of Mozilla, the more I realise that this continues to be a wasted opportunity to promote our most exciting new project. I’ll have to get my hands on one for upcoming events – but I digress.
Of course, since this was a devops-related event, people were also very curious about if and how Mozilla is implementing devops internally. The overarching theme of devops is communication, so this event was an excellent opportunity to talk about IT at Mozilla, and to promote not only our successes, but dig into our failures as well. This sort of interaction is vital in order to avoid stagnation.
In summary, it was a fine showing for our first Parisian event, and I am looking forward to the next edition. Hopefully I’ll see you there!
A glance into 2013 Q2
- We’re building a NOC.
- We’re starting a second data center construction phase.
- Building a second phase at Vantage/SCL3 to support a consolidation project (collapsing two data centers into SCL3) & building sufficient room for more Release Engineering test hardware.
- Check out the graph!
We didn’t sit still last week
- blog.mozilla.org performance & responsiveness improvements
Every Monday Mozilla holds its weekly Project Meeting.
For the past three weeks Mozilla IT has been sharing 60 Seconds with Mozilla IT. We’ll be cross-posting those notes here.
From April 8:
2013 Q1 Review, in 3 bullet points
- Rolled out Bugzilla upgrade from 4.0 to 4.2. And moved it to faster hardware. Shout out to Byron, David & Mark from the BMO team.
- Moved to a new office in Paris. Monumental amount of effort to have Internet, wifi and video conferencing up in a matter of days.
- Our product delivery CDN peaked at nearly 200Gbps. What’s a “jigga bit”? That’s like downloading a full length movie every second. Or watching 5600 Blu-ray movies AT THE SAME TIME. Or like downloading 5 DVDs every second.
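Those comparisons are easy to sanity-check. Here is a rough Python sketch; the 4.7 GB single-layer DVD capacity and decimal units (1 Gb = 10^9 bits) are assumptions of this example, and the function names are made up for illustration.

```python
BITS_PER_BYTE = 8
DVD_GIGABYTES = 4.7  # assumption: single-layer DVD, decimal gigabytes


def gigabytes_per_second(link_gbps):
    """Convert a link rate in gigabits per second to gigabytes per second."""
    return link_gbps / BITS_PER_BYTE


def dvds_per_second(link_gbps, dvd_gigabytes=DVD_GIGABYTES):
    """How many DVDs' worth of data the link carries each second."""
    return gigabytes_per_second(link_gbps) / dvd_gigabytes


def per_stream_mbps(link_gbps, streams):
    """Bitrate left for each stream if the link is shared evenly."""
    return link_gbps * 1000 / streams


PEAK_GBPS = 200  # peak rate quoted above

print(f"{gigabytes_per_second(PEAK_GBPS):.0f} GB per second")
print(f"{dvds_per_second(PEAK_GBPS):.1f} DVDs per second")
print(f"{per_stream_mbps(PEAK_GBPS, 5600):.1f} Mbps per stream across 5600 streams")
```

At 200Gbps that works out to 25 GB every second, a little over 5 DVDs per second, and a bit under 36 Mbps for each of 5600 simultaneous streams.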
We didn’t sit still last week
- Both the Web Tools and DBA teams helped Mozilla IT proactively upgrade our PostgreSQL infrastructure, in response to a major security vulnerability release, starting with Socorro crash reporting systems
- Pushed out the first official release of the Firefox Health Report
Mozilla IT has data centers all over the world: in China, the Netherlands, Phoenix, Santa Clara and Mountain View.
Many of these locations are subject to Daylight Saving Time where the time shifts ahead an hour in the Spring and shifts back an hour in the Fall. This causes all sorts of havoc with computers and makes it very difficult to schedule meetings with such a large and geo-distributed team. Someone’s always having to take a meeting in the middle of the night.
In Q2, in order to provide the best service for Mozillians, Mozilla IT will be moving our entire infrastructure to one location in the South Pacific: the Republic of Kiribati (UTC+14), a small island nation in Oceania.
This move will take some coordination, but we have been working with the government and Telecom Services Kiribati Limited to make sure the transition is smooth. Says Derek Moore, Mozilla’s Data Center Operations Manager, “Kiribati’s tropical location will provide a large amount of sunlight necessary for Mozilla’s new solar powered data center.”
Other than the obvious benefits of consolidation, we chose Kiribati for two reasons:
- Kiribati does not do any time changes for Daylight Saving Time.
- Kiribati is at the highest time zone. It is the first time zone to see a new day. We feel that this represents Mozilla IT’s commitment to using new technologies to brighten everyone’s day.
This small island nation sees the first light of day and Mozilla IT will literally be on the bleeding edge. As I’ve said, “the web moves fast, and Mozilla needs to move fast too.”
We expect to have this consolidation completed by 2014 April 1.