At a recent Mozilla all-hands, Laura Thomson gave a short presentation titled “Working with IT”. Laura was the right person to give it. The feedback we gathered is that we need to help people understand how to work with IT, and help you all understand how our infrastructure works. Expect more brown bags and posts on this topic.
So, let’s start by talking about bugs.
Whether Bugzilla is the right tool to track and manage IT projects and requests is up for debate. The benefit of using Bugzilla is that it integrates with the rest of the project, since it is used for everything else at Mozilla. That said, let’s talk about how IT works with bugs and how you can help us when you file them:
Please do not assume tribal knowledge in bugs.
In the past 30 days, the IT Systems and Ops teams have grown by 5 new sysadmins. This is great, and they are all ramping up quickly. While we are throwing out numbers, 119 new bugs were added to the Server Operations component in the past 7 days. We want our new people to help out on these bugs as quickly as they can, so when you submit a bug, please assume as little tribal knowledge as possible on the other side. For instance, if you ask for a setting change on a production site without telling us which site you work on, the bug stalls while someone either asks you for clarification or asks around the team what you mean. Each delay is minor, of course, but when this happens several times a day it becomes very inefficient. If you have a document that gives background on your request, please link to it. If you know which system your change applies to, please say so.
Where does my bug go?
The IT team is growing quickly, and so is the need to sort our bugs into components so that we don't all spin our wheels working from a single one. Here is the current layout of the components for bugs coming from you (note the change to Web Operations):
- Server Operations: Web Operations – this is where all web-related bugs should go. This component is new: it replaces the old “web content push” component and expands it to cover web server problems, new web projects, and any general request regarding the serving of our websites.
- Server Operations: Desktop Issues – this is where the desktop team currently works. Laptop issues, software license requests, and help with the office environment should all go here.
- Server Operations: RelEng – Any issues regarding the release engineering build systems (aka “the build network”) should go here.
- Server Operations: Netops – Network requests and issues should be filed here.
- Server Operations: Labs – Mozilla Labs IT requests go here.
- Server Operations: ACL Request – Firewall requests for Netops.
- Server Operations – Everything else that does not fall into one of the above.
Priority and escalation
The default priority for our bugs is “normal”. We will get to these as soon as we can, and by the nature of filing a request we assume that you want it done as soon as possible. If your request is more of a “nice to have someday”, mark it as an enhancement. Anything higher than normal demands prompt attention, and our SLA for addressing those bugs is as follows:
- Major – 24 hours
- Critical – 8 hours
- Blocker – immediately
These timers run around the clock: if a bug sits unaddressed beyond its window, our on-call admin is paged, and blocker IT bugs page on-call immediately. We cannot guarantee that the request will be resolved within this time (e.g. if you file a critical bug for a new cluster of servers, we first have to procure them), but admins will be aware of it and will start working on it. In addition, we apply our own internal prioritization to incoming issues: a critical bug on a dev site may have to wait for work we are doing on a production site.
That was a lot to read.
If you are still with me, thanks for taking the time to understand how we work in Bugzilla. The more efficiently bugs are filed, the less time we spend refining them and the more time we spend fixing them.