14
Jun 12

Snappy, June 14th: Telemetry Investigations

There is no news from the Firefox frontend team this week.

Adventures in Measuring Changes

The necko team spent this week investigating why the recent big cache fix was not showing up as a win in telemetry.

We were on the verge of a big backout when Saptashi Guha’s analysis in bug 762576 suggested that we might actually be winning. It’s frustrating to have data point us in different directions. However, it is better to try to make sense of data than to have no data at all, as was the case only a year ago. I’ll have more on this next week.

William McCloskey landed a fix to turn on incremental GC for real (bug 761739). This might fix the mysterious recent user-responsiveness regression spotted by telemetry (bug 761722). He also landed another GC speedup in bug 743396.

Mark Cote met with metrics analysts to discuss reporting peptest results robustly. The goal is to avoid noise in reporting, so that responsiveness regressions are acted upon.
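The post does not say which statistics the analysts recommended. One common way to keep noise out of this kind of report is to compare medians and scale the alarm threshold by a robust spread estimate such as the median absolute deviation (MAD), so that a single outlier run neither raises nor hides an alarm. A minimal sketch, assuming timings in milliseconds; `is_regression`, `k` and `min_delta_ms` are hypothetical names and values, not the metrics team's actual method:

```python
from statistics import median


def mad(samples):
    """Median absolute deviation: a spread estimate that one outlier cannot inflate."""
    m = median(samples)
    return median(abs(x - m) for x in samples)


def is_regression(baseline, candidate, k=3.0, min_delta_ms=1.0):
    """Hypothetical rule: flag a regression only if the candidate median is
    slower than the baseline median by more than k * MAD (and min_delta_ms)."""
    threshold = max(k * mad(baseline), min_delta_ms)
    return median(candidate) - median(baseline) > threshold


baseline = [100, 102, 98, 101, 99, 250]  # one noisy outlier run
print(is_regression(baseline, [103, 104, 102]))  # False: within noise
print(is_regression(baseline, [120, 118, 122]))  # True: a real slowdown
```

Here the 250 ms outlier leaves the MAD at 1.5 ms, so candidate runs around 103 ms are treated as noise while runs around 120 ms are flagged. A mean-based threshold would have been dragged upward by that single outlier.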

Interactivity Profiler

Benoit Girard added badges to mark known stacks in the profiler; see his blog post. A few weeks ago, Vladan taught the symbolication server to serve data from local .pdb files, allowing developers to use Benoit’s profiler with their own builds. Mike Conley added partial Thunderbird support to the profiler.



11
Jun 12

Snappy, June 7

Notes.

Justin’s FUEL fix will help add-ons avoid leaks and shutdown hangs: bug 750454.

Jared plans to start landing the Australis tab strip (bug 738491) on the UX branch this week. Australis is our new, faster UI theme.

We landed a cache locking fix recently (bug 722034), but telemetry is now showing a regression (bug 761736), so this will likely be backed out and reworked.

Vladan blogged about the first results from our non-destructive chromehang. Last year we briefly made our nightly builds crash if they hung for over 30 seconds, which got us a lot of useful data (and some of the initial Snappy bugs). This piggybacked on our crash-handling infrastructure, so it was a very effective experiment (if a bit brutal). This year Vladan worked on the plumbing needed to get the same sort of data non-destructively. As a result, we are looking to turn on frame pointers in nightly builds and dial hang detection down to 5 seconds (bug 763124).
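The core idea of non-destructive hang detection can be sketched as a watchdog: the main thread reports progress after each event it processes, and another thread periodically checks how long it has been since the last report. Past a threshold (30 seconds in last year's experiment, 5 seconds in the proposal above), it records a report instead of crashing the browser. This is a hypothetical Python sketch, not Firefox's actual chromehang code; `HangMonitor`, `heartbeat()` and `check()` are illustrative names, and the clock is injectable so the logic can be tested without sleeping:

```python
import time


class HangMonitor:
    """Hypothetical non-destructive hang detector (not Firefox's chromehang code)."""

    def __init__(self, threshold_s=5.0, clock=time.monotonic):
        self.threshold_s = threshold_s
        self.clock = clock          # injectable for testing
        self.last_beat = clock()
        self.reports = []           # stall durations recorded so far
        self._reported = False      # report each hang only once

    def heartbeat(self):
        """Called by the main thread after it finishes processing an event."""
        self.last_beat = self.clock()
        self._reported = False

    def check(self):
        """Called periodically by a watchdog thread; returns True while hung."""
        stalled = self.clock() - self.last_beat
        hung = stalled >= self.threshold_s
        if hung and not self._reported:
            # Record the hang instead of crashing. A real detector would also
            # capture the main thread's stack at this point.
            self.reports.append(stalled)
            self._reported = True
        return hung
```

A watchdog thread would call `check()` every second or so; the `_reported` flag keeps one long hang from producing a flood of reports.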


04
Jun 12

Snappy, May 31st – Less lag

On Friday, the necko team finally landed a fix that makes the cache less likely to freeze the UI thread during reads: bug 722034. Cache writes and other less common cache use cases remain problematic (tracked by bug 717761). Poor cache/main-thread interactions are one of the main causes of UI lag tracked by the Snappy project, so this is very exciting. Barring a backout, this fix will appear in Firefox 15.

Help Wanted: The necko team is looking for help determining the optimal disk cache size; please see Nick’s post. We need users to install an extension and submit detailed stats on our cache lifecycle.

There are various Firefox frontend fixes in progress: session restore improvements (working towards bugs 669603 and 669034), FUEL (bug 750454), the search service (bug 722332) and the new theme (bug 732583). I will blog about these in more detail as they land.

Bill turned on incremental GC again. Hopefully it will stay on in Firefox 15.

Andrew is making progress on reducing CC pauses while closing tabs: bug 754495.

Brian has instrumented our event loop to measure how much Firefox lags when responding to user events: bug 759449. This differs from measuring general event-loop lag in that it focuses on lag that the user would actually notice. Look for the EVENTLOOP_UI_LAG_EXP_MS histogram in our telemetry dashboard (yes, we are the only browser vendor to make this sort of data public). This should help us track progress as we tweak heuristics to delay background processing during user interaction (e.g. bug 712478).
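The distinction above can be sketched as: measure the delay between the moment an event is queued and the moment the event loop starts handling it, and record that delay only for user-input events. The `EXP` in the histogram name presumably refers to exponentially spaced buckets, which keep resolution at short lags while still covering multi-second stalls. A minimal sketch; the bucket parameters, `LagHistogram` and `record_event` are hypothetical, not Firefox's telemetry implementation:

```python
import bisect


def exponential_buckets(low, high, n_buckets):
    """Bucket lower bounds [0, low, ..., high], geometrically spaced.
    Requires n_buckets >= 3 and 0 < low < high."""
    ratio = (high / low) ** (1.0 / (n_buckets - 2))
    bounds = [0]
    for i in range(n_buckets - 1):
        b = round(low * ratio ** i)
        bounds.append(max(b, bounds[-1] + 1))  # keep bounds strictly increasing
    return bounds


class LagHistogram:
    def __init__(self, bounds):
        self.bounds = bounds
        self.counts = [0] * len(bounds)

    def record(self, lag_ms):
        # Bucket i holds [bounds[i], bounds[i+1]); the last bucket is open-ended.
        i = bisect.bisect_right(self.bounds, lag_ms) - 1
        self.counts[max(i, 0)] += 1

    def record_event(self, queued_ms, started_ms, is_user_input):
        # Only user-input events count: the point is lag a user would notice.
        if is_user_input:
            self.record(started_ms - queued_ms)
```

For example, `exponential_buckets(1, 1000, 5)` yields the bounds `[0, 1, 10, 100, 1000]`, so a 5 ms lag lands in the second bucket and a 150 ms lag in the fourth.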

Brian also landed a way to bypass the Windows prefetch service via our privileged silent update service; see bug 692255. In my testing, Windows prefetch tends to prefetch too many files, slowing down startup for complex apps like Firefox. Hopefully we can do better with our own prefetch.


29
May 12

Snappy, May 24 – meetingless

Frank and Jared are aiming to have the Australis theme up for review next week in bug 732583. There are no computed borders or gradients in the redesign, so it will be faster. The current theme generates new borders/gradients on every tab interaction, which is very inefficient. Bas is working on fixing our graphics backend to render borders/gradients more efficiently in bug 750871.

Wlad spotted some unintended bloat in the addon database in bug 752868. Blair fixed it, which should speed up startup and other addon manager interactions.

Bill did some further IGC fixes: bug 757483, bug 754588, bug 756732, bug 731423.

We started a new project to let Firefox diagnose common Windows (and other) misconfiguration issues that severely impact Firefox performance. Our new intern, Nicholas, is working on this in bug 684646. The immediate plan is to release an addon that detects when Firefox startup is unusually slow, checks for known Windows issues and pops up a link to a Mozilla support article on how to fix the problem. If this turns out to be successful, we’ll integrate this functionality into Firefox.

The gecko profiler is now ready for general consumption. See Benoit’s announcement. This will allow users running the profiling variant of Firefox nightly builds to capture and report performance problems in a way that developers can act on. I suspect that Benoit will blog about this.

Update: I originally linked the wrong bug for windows misconfiguration detection.


21
May 12

Snappy, May 17 – Physical Room Edition

Misc

This was an unusual meeting for the Snappy project: everybody was in the same physical room (though someone dialed in 5 minutes before the end). I love the distributed nature of Mozilla, but it’s nice to have everybody in the same room for a change.

Vlad investigated some super-slow startups. We have even more evidence that loading pages before the UI is up is a bad idea: bug 715402.

Jet sped up browser chrome by converting SVG masks to clip-paths: bug 752918. With a name like that, how can he not work on performance bugs :)

The necko team is looking for feedback on test builds that reduce cache-related pauses on the main thread. If you suffer from cache-related lag, give these a spin: bug 722034.

Benoit made our profiling builds useful on Linux and Android (in addition to Windows and 64-bit Mac). Work is happening on extending our debug protocol with profiling abilities; unfortunately I do not have bug numbers to link to. The Windows symbol server has almost finished security review, after which it can be exposed to the web. For more profiler details see bugs 753588, 751355, 751034, 751779.

Rafael is getting close to calling exit(0) in bug 662444. Much work remains, but it’ll be the most significant change since we embarked on this project almost a year ago. Our current application shutdown situation is not pretty.

GC/CC

Bill landed incremental GC again; it was promptly backed out: bug 735099.

We now do compartmental GC more often: bug 716014.

Andrew is working on reducing CC overhead (by 80% in his benchmark) when closing tabs: bug 754495.

LagBlock Plus

I’ve been running Wlad’s extension for over a week now. The browser is so much more pleasant now. Background tabs used to make text-entry a painful process. Can’t wait until we can approach a similar level of responsiveness by scheduling background tab events more intelligently.

I feel that letting tabs run out of control is a serious misfeature in the current web ‘architecture’. Modern OSes (e.g. Android, iOS) require background apps to suspend. It is about time that browsers enforced similar behavior (see bug 675539). Web developers should be given a way to request to run background tasks, and users should be able to veto that.


21
May 12

Snappy Workweek

Workweek

The perf team and Lawrence held a Snappy workweek at Mozilla HQ last week. We spent most of the week meeting with various people involved with the Snappy project. I expect lots of good things to happen in the near future.

The most immediate outcome of all these meetings was the public unveiling of our telemetry dashboards. See Lawrence’s post on how to log in with your Persona account. We started work on telemetry about a year ago; it felt great to finally reach this milestone. Lots of work remains on UI polish, data validation, etc. Telemetry is our primary mechanism for gathering Snappy performance data, and every telemetry infrastructure improvement leads to better Snappy decisions.

Additionally, we discussed the Australis theme work (and why it should be faster than the existing theme), DOM event scheduling, main-thread I/O work, the feasibility of switching to an FTS (full-text search) backend for Places, etc.

Vlad wrote some good posts on analyzing startup and other data exposed by about:telemetry. Planet Mozilla could use more technical posts like that.



14
May 12

Snappy, May 10: Suspending activity in background tabs

Tim landed a fix to avoid setTimeout()s when handling tab clicks: bug 743877. This should significantly improve our tab strip responsiveness.

Incremental GC is making progress towards being turned on by default again: bugs 750959, 752098.

There was also progress on cancellable SQL (bug 722243). This should result in faster shutdown.
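Cancellable SQL matters for shutdown because a long-running query can otherwise hold things up until it finishes. SQLite supports cancellation through a progress handler that it calls periodically while executing a statement; if the handler returns non-zero, the statement is aborted. The sketch below uses Python's `sqlite3` module rather than Firefox's own storage API, and `run_cancellable` is a hypothetical helper:

```python
import sqlite3
import threading


def run_cancellable(conn, sql, cancel_event, check_every=1000):
    """Run a query that aborts promptly once cancel_event is set.

    Returns the rows, or None if the query was cancelled."""
    # SQLite calls the handler every `check_every` VM instructions;
    # a true return value makes SQLite abort the running statement.
    conn.set_progress_handler(cancel_event.is_set, check_every)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        if cancel_event.is_set():
            return None  # aborted because of cancellation
        raise  # a genuine SQL error
    finally:
        conn.set_progress_handler(None, check_every)  # remove the handler
```

At shutdown, another thread would call `cancel_event.set()`, and the running query stops within a few thousand VM instructions. `Connection.interrupt()` is an alternative that aborts the running statement directly from another thread.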

Progress was made towards fixing cache locking, bug 722034.

Lawrence posted a summary of snappy work that went into Firefox 13.

LagBlock Plus

Some of the lag in Firefox is caused by background tabs processing timeouts willy-nilly. We are working on teaching Firefox to cope with overactive background tabs in bug 715376. The plan is to allow Firefox to throttle/group background events, especially in tabs that are CPU hogs.
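One simple way to throttle and group background work, assumed here for illustration rather than taken from bug 715376, is to round timer deadlines in background tabs up to a coarse boundary, so that many timers fire in one batch instead of waking the main thread repeatedly. Foreground tabs keep their exact deadlines. `throttled_fire_time`, `group_timers` and the 1000 ms granularity are hypothetical:

```python
import math


def throttled_fire_time(deadline_ms, in_background, granularity_ms=1000):
    """Round a background tab's timer deadline up to the next multiple of
    granularity_ms; foreground timers keep their exact deadline."""
    if not in_background:
        return deadline_ms
    return math.ceil(deadline_ms / granularity_ms) * granularity_ms


def group_timers(timers, granularity_ms=1000):
    """timers: iterable of (tab_id, deadline_ms, in_background).
    Returns {fire_time_ms: [tab_id, ...]} ordered by fire time."""
    batches = {}
    for tab_id, deadline_ms, in_background in timers:
        fire_at = throttled_fire_time(deadline_ms, in_background, granularity_ms)
        batches.setdefault(fire_at, []).append(tab_id)
    return dict(sorted(batches.items()))
```

A fuller policy could widen the granularity for tabs that use a lot of CPU, which is the case singled out above.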

To help us along, the author of Adblock Plus released an experimental addon that freezes activity in background tabs. Since this addon halts all background tab activity, it is a useful gauge of baseline performance that we’ll try to get asymptotically close to. It is also helpful for isolating responsiveness issues that are not caused by background tabs.



03
May 12

Snappy, May 3rd: Faint hope of handling mousedown events without a setTimeout

Meeting notes.

MemShrink had a good idea to switch to bi-weekly meetings. We are going to try the same for Snappy. My plan is to still solicit status updates and blog weekly, but only meet in person every two weeks. The next meeting will be on May 17.

Borders/Gradients

It turns out that most of the painting overhead in accelerated versions of Firefox is spent rendering borders and gradients. I blogged a little about this earlier. It’s a combination of us not caching gradients and being overly picky about rendering border corners perfectly (i.e. to spec). Our chrome renders particularly slowly because, as our chrome CSS changed (after we implemented D2D acceleration and optimized for the existing code paths), we started hitting more slow paths in the border code. We need telemetry to notice when things are rendering slower than expected.

According to Bas, we need to enable Azure for content and then start implementing the respective caches. We should get significant speedups within 3-4 weeks, but getting close to the baseline performance of the no-borders/gradients build will take 3-4 months. In the meantime, we should look into simplifying our chrome to use less expensive CSS.
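Not caching gradients means the same pixels get regenerated on every tab interaction. The caches Bas describes live in the C++ graphics backend; the sketch below only illustrates the general idea of keying rendered results by their parameters, using Python's `functools.lru_cache` and a hypothetical `render_linear_gradient` that produces a single row of RGB pixels:

```python
from functools import lru_cache


@lru_cache(maxsize=256)
def render_linear_gradient(width, start_rgb, end_rgb):
    """Render one row of a horizontal linear gradient as RGB tuples.

    Results are cached by (width, start_rgb, end_rgb), so redrawing the same
    gradient reuses the stored row instead of recomputing every pixel."""
    if width == 1:
        return (start_rgb,)
    row = []
    for x in range(width):
        t = x / (width - 1)  # 0.0 at the left edge, 1.0 at the right edge
        row.append(tuple(round(s + (e - s) * t) for s, e in zip(start_rgb, end_rgb)))
    return tuple(row)  # immutable, so callers cannot corrupt a cached result
```

The arguments must be hashable (hence tuples for colors); a second call with identical parameters returns the cached row instead of recomputing it.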

Longer term, Frank will look into reimplementing the tab bar in pure HTML instead of XUL to maximize responsiveness.

CC/GC Pauses

Incremental GC should be turned back on soon (bugs 750424, 750416). Olli relanded bug 747675, which should reduce CC times somewhat.

Kyle’s big memory leak fix from last week turned out to occasionally cause leaks where there were none before: see bug 751466.

Frontend

Tim spent the week in a seemingly infinite r?/r- cycle attempting to prove that one can handle clicks on tabs without a setTimeout: bug 743877. Tim also moved thumbnail storage away from the network cache, which reduced cache contention (and browser freezes): bug 744388. Paulo continued nuking sync favicon API usage: bug 728168.


01
May 12

Pushing the borders [out] of Firefox performance

As I mentioned before, we are back to investigating some gfx deceleration issues. Preliminary investigation shows that our border+gradient code is inefficient, and since Firefox uses these features heavily, we get some epic slowdowns when switching tabs.

To test this theory, Bas put up a test build that simply does not draw gradients or borders. See bug 75087 for a screenshot demonstrating the drastic reduction in browser attractiveness. There is also a test build for people suffering from slow drawing to try out.


26
Apr 12

Snappy, April 26

Notes from today’s meeting are here.

No major snappy fixes landed this week. However, if you look in the notes, there are quite a few projects going through the review cycle.

Personally, I’m most excited by progress in getting rid of the setTimeout on tab click (bug 743877). Neil posted a diagnosis of why we need a setTimeout while switching tabs. Tim followed up with a patch to avoid the setTimeout for the non-focus bits.

On the subject of setTimeouts: we devised a plan for managing setTimeout overhead in background tabs. This will involve breaking up our global event queue into a global queue plus smaller per-page queues (bug 715376). This will not be a pleasant task, but Nathan aims to have a proof of concept ready next week. With this infrastructure we should be able to start prioritizing which events we handle and punish misbehaving tabs.
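A minimal sketch of such a queue split, under stated assumptions: events without a page (e.g. user input or chrome work) go to a global queue that always runs first, the foreground page comes next, and background pages are served round-robin, one event at a time, so a page that floods its queue cannot starve the others. `Scheduler`, `post` and `next_event` are hypothetical names; the real Gecko design may prioritize differently:

```python
from collections import OrderedDict, deque


class Scheduler:
    """Hypothetical scheduler: one global queue plus one queue per page."""

    def __init__(self):
        self.global_queue = deque()
        self.page_queues = OrderedDict()  # page_id -> deque, in round-robin order

    def post(self, event, page_id=None):
        if page_id is None:
            self.global_queue.append(event)  # e.g. user input, chrome work
        else:
            self.page_queues.setdefault(page_id, deque()).append(event)

    def next_event(self, foreground=None):
        """Return the next event to run, or None if every queue is empty."""
        if self.global_queue:
            return self.global_queue.popleft()
        fg_queue = self.page_queues.get(foreground)
        if fg_queue:
            return fg_queue.popleft()
        # Background pages: take one event from the first non-empty queue,
        # then move that page to the back so the others get a turn.
        for page_id, queue in list(self.page_queues.items()):
            if queue:
                self.page_queues.move_to_end(page_id)
                return queue.popleft()
        return None
```

Punishing misbehaving tabs would build on this, for example by skipping a page's turn once it exceeds a time budget.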

The graphics team is wrapping up the big Android push, freeing up cycles for other work. Bas is back to looking at slowdowns due to hardware acceleration (bug 721273). Bas is also looking into changing our chrome CSS to be less expensive to paint.

Ehsan is working with Paul to change Firefox themes so they are not horrible performance hogs.

Update: A very significant Snappy fix landed this week as part of MemShrink. It should significantly reduce memory usage and thus cycle-collector pauses, etc.

Update #2: I missed another very cool Snappy fix: bug 729133. It is based on revising old assumptions about the disk cache always being faster than the network. We learned from telemetry data that a significant portion of disk cache requests are processed more slowly than they would be if we went straight to the network. Firefox now hedges its bets and warms up a TCP connection while checking the cache. For details see Patrick’s blog.
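The bet-hedging idea can be sketched with `asyncio`: start the speculative connection before consulting the cache, so the connection latency overlaps the cache lookup, and only use the network if the cache misses. `hedged_fetch` and the stand-in coroutines are hypothetical; Firefox's networking code is C++ and the actual heuristics in bug 729133 may differ:

```python
import asyncio


async def hedged_fetch(url, cache_lookup, connect, send_request):
    """Check the cache and warm up a connection concurrently (hypothetical API)."""
    # Start the speculative connection first so its latency overlaps the cache check.
    conn_task = asyncio.create_task(connect(url))
    cached = await cache_lookup(url)
    if cached is not None:
        conn_task.cancel()  # a real client might keep the socket for reuse instead
        return ("cache", cached)
    conn = await conn_task  # already partly (or fully) established
    return ("network", await send_request(conn, url))


# Hypothetical stand-ins for a cache and a network, for demonstration only.
async def cache_hit(url):
    await asyncio.sleep(0)
    return "cached body"


async def cache_miss(url):
    await asyncio.sleep(0.01)  # a slow disk-cache miss
    return None


async def tcp_connect(url):
    await asyncio.sleep(0.02)
    return "conn"


async def http_get(conn, url):
    return f"body via {conn}"
```

On a miss, the 10 ms cache check and the 20 ms connection setup run concurrently, so the request can go out after roughly 20 ms rather than 30 ms.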