Dec 11

Growing reviewers

One of the challenges of a growing organization is that people become managers and have less time for coding. A scary proportion of module owners are managers now.

We were discussing this with Dietrich, and he came up with a really simple solution: a module owner’s entire team should be able to review patches for that module. Obviously new people can’t review every little detail; for those they can pass the buck to their manager (and learn in the process). I really like the new Firefox review policy of having a large set of candidates for reviews, each with the option of passing the patch along.

Perhaps bugzilla can change r?:taras queries into r?:taras+his+team and do this automagically.
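
If Bugzilla had such an owner-to-team mapping, the expansion could be as simple as the sketch below. This is purely illustrative: the team mapping, the names in it, and the helper function are all made up, not real Bugzilla data or APIs.

```python
# Hypothetical sketch: expand an r? request aimed at a module owner into
# the owner's whole team, any of whom may review or pass the patch along.
# The TEAMS mapping and its names are invented for illustration.
TEAMS = {
    "taras": ["taras", "vladan", "cheng"],
}

def expand_review_request(reviewer):
    """Return all candidate reviewers for an r? flag set on `reviewer`."""
    # Fall back to just the named reviewer when no team is known.
    return TEAMS.get(reviewer, [reviewer])
```

So `r?:taras` would effectively become `r?` for everyone on that (hypothetical) team, while reviewers without a registered team behave as today.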

Dec 11

Snappy Dec 22

Work is continuing from last week: jank profiler, DOM storage fixes, font enumeration and SQL telemetry analysis. This was the last Snappy meeting of the year.

We plan to hold a perf + snappy hack week at hackerspace.be Jan 31 – Feb 3, followed by attending FOSDEM.

I’m off until January 3rd; see #perf for taras-substitutes in the meantime.

Dec 11

FOSDEM Anyone?

Team Perf and a significant number of other Mozillians will be attending FOSDEM this year. My team will be giving talks on Android linkers, IO, toolchain work, and other TBD topics. This will be my first FOSDEM, and I’m really excited about it. There are a lot of European open sourcers to meet who don’t make it to this side of the pond much.

Team perf & other snappy people (10+ people) plan to spend the week before FOSDEM hacking on Snappy and other perf projects. Does anyone have suggestions on a cool hackerspace (I sent an email to hackerspaces.be folks) or a coworking facility that could host us in Brussels?

Update: Looks like hackerspaces.be will be hosting us. Looking forward to checking out HSBXL-NG 🙂

Dec 11

Snappy update for Dec 15

Meeting notes.

Not much exciting new stuff happened this week. There was some progress on the responsiveness profiler, snappy scrolling, and DOM storage. We are starting investigations into reporting per-tab overhead, interruptible JavaScript, etc.

Slow (>100ms) SQL telemetry landed in Thursday’s nightly. Here are some preliminary results:

These files are of the format: sql, count, average time per query.
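
Rows in that format can be ranked by total time spent (count × average) to surface the worst offenders. A minimal sketch of that aggregation; the function name is mine, and it assumes comma-separated rows where the last two fields are the count and average:

```python
def worst_queries(rows):
    """Rank slow-SQL telemetry rows by total time spent.

    Each row is "sql, count, average time per query"; we split from the
    right so that commas inside the SQL text itself are preserved.
    """
    parsed = []
    for row in rows:
        sql, count, avg_ms = row.rsplit(",", 2)
        parsed.append((sql.strip(), int(count) * float(avg_ms)))
    # Most total time first.
    return sorted(parsed, key=lambda item: item[1], reverse=True)
```

A query that runs rarely but slowly can still outrank one that runs constantly but cheaply, which is exactly what you want when deciding what to fix first.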

Dec 11

Slow SQL tracking

Vladan is working on adding slow SQL tracking to Telemetry; part of it has already landed. He recently exposed this information via the about:telemetry addon. We are working out a few remaining issues so we can send this data to our telemetry servers. In the meantime, install the addon in your nightly builds (if you are upgrading from an older version, you need to restart Firefox) and check for yourself whether SQL is to blame for pauses in Firefox.

Dec 11

Snappy summary for Dec 8

Our last meeting was last Thursday, at 11am PST (meeting notes). This blog post is late because of a combination of email disasters and travel.

Cheng from the SUMO team mined SUMO inputs to see what our users complain about most. There were complaints about things being slow, unresponsive, frozen, etc. See the meeting notes for the complete blurb.

The networking team has identified issues leading to slow shutdown and startup.

Most of the responsiveness profiler landed.

The UX team provided an extensive list of Firefox features that need tweaking; see the meeting notes for details. We are working on improving tab interactions and scrolling.

I’ll post a beefier summary of plans/accomplishments next week.

Dec 11

24-hour reviews

I would like to see Firefox developers switch to 24-hour review turn-around times. Note that in my definition review turn-around means any of the following:

  • r+/r-
  • unset/reassign r? to someone else

With our new, faster release cycle it is ridiculous for a patch to spend half (or more) of the cycle loitering in the review queue. I believe that a shorter review cycle is the simplest way to accelerate Firefox evolution.

I view fast review times as a matter of respect. Posting a patch usually requires a significant time/effort commitment; reviewers should act accordingly. There is no bigger buzzkill than having your work pushed back to the bottom of somebody’s TODO list like some annoying chore.

As far as I can tell there are three main reasons that lead to long review times:

1) People like gavin, bz, and dbaron have disproportionately high review loads. We need a process to hand off patches to other reviewers. High-load people shouldn’t shy away from passing the r? on to someone else when possible.
2) Bugzilla-phobic people (like myself) lose track of bugzilla r? requests because they don’t have bugzilla whines set up. Bugzilla whines should be enabled by default.
3) Bad review habits. I have met a number of Mozilla developers who like to batch their reviews up and then do them all on a single weekday. Please stop; you are killing all kinds of coding momentum/fun/etc. Let’s make it our policy to set aside time every day to clear the review queue.

Clearly people with existing backlogs will take a while to catch up, but most MoCo employees should be capable of this.
I have yet to hear a good reason against doing daily reviews.

It has been a few months since I proposed this on dev.platform. I have tried to live by the 24-hour rule, and I think a few others have tried it too. I find that morning bugzilla r? whines work best for me. I still occasionally lose track of a patch for a few days, but nobody is perfect. I think people appreciate fast reviews, though nobody has thanked me yet.

Dec 6 Update: My goal is to have *some* response within 24 hours, with an ETA for the next follow-up in the worst case.

Dec 11

Introducing Project Snappy

Two weeks ago I started Project Snappy. The purpose of the project is to help us focus on eradicating jarring pauses in Firefox.

Today we had our second meeting (notes).  A surprising amount of work has happened between the first and second meeting:

  • Chromehang was briefly turned on. This converted browser stalls of >30 seconds into crashes, and showed that a number of issues are worse than previously assumed
  • We are about to start tracking slow SQL queries via telemetry
  • Even though IndexedDB is the new hotness, existing websites use the evil old DOM Storage API. This API is not asynchronous and degrades browser performance. The workaround is to tell the backend to use async IO.
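
The async-IO workaround for DOM Storage amounts to a write-behind cache: the API call updates an in-memory copy immediately, and the slow disk write happens later on a background thread. Here is a rough Python stand-in for the pattern, not the actual Gecko code; the class name and structure are mine:

```python
import queue
import threading

class WriteBehindStorage:
    """Synchronous-looking key/value API with asynchronous persistence."""

    def __init__(self, flush):
        self._data = {}
        self._pending = queue.Queue()
        self._flush = flush  # the slow IO, e.g. a disk or SQL write
        # Background writer drains the queue off the main thread.
        threading.Thread(target=self._writer, daemon=True).start()

    def set_item(self, key, value):
        self._data[key] = value          # caller never blocks on IO
        self._pending.put((key, value))  # persisted later, in order

    def get_item(self, key):
        return self._data.get(key)       # always served from memory

    def _writer(self):
        while True:
            self._flush(*self._pending.get())
```

The trade-off is the usual one for write-behind: reads and writes stop janking the caller, at the cost of a window where data is in memory but not yet on disk.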

Just like MemShrink, Snappy bugs fall into three categories:

  • P1: Should be fixed ASAP
  • P2: Should be fixed as soon as developers have cycles for it
  • P3: Everything else

Multiple people have asked whether Snappy is appropriate for bugs caught by Chromehang: i.e. should we focus on one-off long delays (e.g. font enumeration) or small delays that happen frequently (e.g. tab animations)? After reflecting on this I decided that UI jank can be thought of as a risk of frustrating the user. Risk has a scientific definition: the severity of an event multiplied by its probability. Thus a long, occasional pause during browsing is equivalent to a frequent short pause.
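
The arithmetic behind that equivalence is trivial, but worth making concrete. A quick illustration of the risk definition; the numbers here are invented, not measured:

```python
def jank_risk(pause_ms, pauses_per_hour):
    """Risk = severity of an event multiplied by its probability.

    Here: expected milliseconds of UI jank per hour of browsing.
    """
    return pause_ms * pauses_per_hour

# A rare 2-second stall (say, font enumeration) vs. frequent 20 ms
# hitches (say, tab animations) -- both cost 2000 ms of jank per hour.
rare_long = jank_risk(2000, 1)
frequent_short = jank_risk(20, 100)
```

Under this definition the two failure modes score identically, which is why Snappy treats both as fair game.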

For next week we plan to wrap up a profiler, come up with a fix for cache IO on startup/shutdown, and look into submitting hang stacks in a less brutal way.