26
Apr 12

Snappy, April 26

Notes from today’s meeting are here.

No major snappy fixes landed this week. However, if you look in the notes, there are quite a few projects going through the review cycle.

Personally, I’m most excited by progress in getting rid of the setTimeout on tab click (bug 743877). Neil posted a diagnosis of why we need setTimeout while switching tabs. Tim followed up with a patch to avoid the setTimeout for non-focus bits.
On the subject of setTimeouts: we devised a plan for managing setTimeout overhead in background tabs. This will involve breaking up our global event queue into a global queue + smaller per-page queues (bug 715376). This will not be a pleasant task, but Nathan aims to have a proof of concept ready next week. With this infrastructure we should be able to start prioritizing which events we handle and punish misbehaving tabs.
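To make the queue split concrete, here is a rough sketch of a global queue plus per-page queues. Everything in it (the EventScheduler class, the page ids, the strict priority order) is a made-up illustration, not the design from bug 715376.

```javascript
// Hypothetical sketch: one global queue for browser-level events plus one
// queue per page. All names here are illustrative assumptions.
class EventScheduler {
  constructor() {
    this.globalQueue = [];        // browser/UI events, always served first
    this.pageQueues = new Map();  // pageId -> array of pending events
    this.foregroundPage = null;
  }

  setForeground(pageId) {
    this.foregroundPage = pageId;
  }

  enqueueGlobal(event) {
    this.globalQueue.push(event);
  }

  enqueueForPage(pageId, event) {
    if (!this.pageQueues.has(pageId)) {
      this.pageQueues.set(pageId, []);
    }
    this.pageQueues.get(pageId).push(event);
  }

  // Pick the next event: global first, then the foreground page,
  // then background pages in the order their queues were created.
  next() {
    if (this.globalQueue.length > 0) {
      return this.globalQueue.shift();
    }
    const foreground = this.pageQueues.get(this.foregroundPage);
    if (foreground && foreground.length > 0) {
      return foreground.shift();
    }
    for (const queue of this.pageQueues.values()) {
      if (queue.length > 0) {
        return queue.shift();
      }
    }
    return undefined;
  }
}
```

A strict priority order like this can starve background pages entirely, so a real scheduler would presumably give them a small budget rather than nothing.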

The graphics team is wrapping up the big Android push, freeing up cycles for other work. Bas is back to looking at slowdowns due to hw acceleration (bug 721273). Bas is also looking into changing our chrome CSS to be less expensive to paint.

Ehsan is working with Paul to change Firefox themes to not be horrible performance hogs.

Update: A very significant snappy fix landed this week as part of memshrink. It should significantly reduce memory usage and thus cycle-collector pauses, etc.

Update #2: I missed another very cool Snappy fix: bug 729133. This is based on revising old assumptions about cache being faster than network. We learned from telemetry data that a significant portion of disk cache requests are processed slower than they would be if we just went straight to the network. Firefox now hedges bets and warms up a TCP connection while checking cache. For details see Patrick’s blog.
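The idea of hedging bets can be sketched in a few lines. This is only an illustration of overlapping a cache lookup with connection setup; lookupCache and openConnection are hypothetical helpers passed in by the caller, not necko APIs.

```javascript
// Hypothetical helpers (assumptions, not real Firefox interfaces):
//   lookupCache(url)    resolves to cached data, or null on a miss
//   openConnection(url) resolves to an object with an async fetch(url) method
async function loadWithWarmConnection(url, lookupCache, openConnection) {
  // Start the connection without awaiting it, so it warms up
  // while the cache lookup is still in flight.
  const connectionPromise = openConnection(url);
  // Avoid an unhandled rejection if the connection fails but the cache hits.
  connectionPromise.catch(() => {});

  const cached = await lookupCache(url);
  if (cached !== null) {
    return { source: "cache", data: cached };
  }

  // Cache miss: the connection has had a head start.
  const connection = await connectionPromise;
  return { source: "network", data: await connection.fetch(url) };
}
```

On a cache hit the warmed connection goes unused, which is the price of the hedge.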


19
Apr 12

Snappy, April 19

At 15 minutes, this might’ve been our shortest meeting yet (notes).

Most of the work happened in the frontend. The most notable improvements are reduced screenshot overhead (by taking fewer screenshots, bug 744152) and a brand new IE migrator (a step towards a fully async places API, bug 710895).

Work resumed on making peptest return more consistent results.

The necko team decided to take a more involved approach to solving our cache locking (bug 722034), which means the fix will land later than we originally hoped.

My little investigation into setTimeout overhead exposed more overhead than I expected. After our regularly scheduled snappy meeting, we had a follow-up meeting to spec out how to change our event handling to cope with this (bug 715376). Our best people are on this :) I asked for someone to prototype an extension to suspend background tab activity. Sounds like Wladimir of Adblock Plus might lend a hand.


18
Apr 12

Web 2.0: A Collection of SetTimeouts

Earlier I blogged about terrible UI responsiveness resulting from a poorly placed setTimeout. I’ve long suspected that setTimeouts are to blame for everything. Now with the help of bz and sfink I have proof.

Turns out webpages like to keep users entertained and spin setTimeout loops to poll the servers to synchronize news tickers, social networking shoe-ins, collaborative editing, etc. Most pages do this regardless of whether they are a background or a foreground tab. Firefox tries to mitigate this by not allowing background tabs to schedule setTimeouts shorter than 2 seconds. Turns out this is a pretty weak defense.

In my personal browsing: etherpad, twitter, zimbra burn through CPU cycles. See this bug comment for an example of setTimeout terrorism with fewer than 10 tabs. In Firefox these setTimeouts cause significant UI lag, but they will also eat your battery life, overheat your laptop, etc. I filed bug 715376 to add functionality to cope with this. The plan is to prioritize foreground tab activity and do exponential setTimeout decay on abusive background tabs. We basically have to write something similar to an OS scheduler. Ideally we’d also follow that up with unloading idle tabs, i.e. bug 675539.
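For the curious, exponential setTimeout decay could look roughly like the sketch below. The constants, the notion of counting "violations", and both function names are assumptions made up for illustration; nothing here is taken from the bug.

```javascript
// Hypothetical policy: each time a background tab is caught abusing timers,
// double its minimum timer delay, up to a cap. Constants are illustrative.
const BASE_DELAY_MS = 2000;
const MAX_DELAY_MS = 60000;

// Minimum delay allowed for a background tab with the given violation count.
function backgroundMinDelay(violations) {
  return Math.min(BASE_DELAY_MS * 2 ** violations, MAX_DELAY_MS);
}

// Foreground tabs get what they asked for; background tabs get clamped.
function clampTimeout(requestedMs, isForeground, violations) {
  if (isForeground) {
    return requestedMs;
  }
  return Math.max(requestedMs, backgroundMinDelay(violations));
}
```

With these numbers a background tab asking for a 10 ms timer after two violations would wait 8 seconds, while the foreground tab is left alone.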

If you are curious about which tabs are abusing your browser, my diagnostic builds will be available on try in a few hours. Install a modified version of about:telemetry to see the report.

What should well-behaved web apps do?

You can detect when your tab becomes inactive via window.onblur, then throttle or disable your page activity. I know using focus is suboptimal for this. There is also a vendor-prefixed visibility API (thanks!).
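Here is a minimal sketch of a polling loop that backs off when the tab is hidden. It assumes the unprefixed document.hidden property and visibilitychange event from the standardized Page Visibility API (prefixed variants would need their prefixed names). The intervals and the startPolling/pollInterval names are made up for illustration.

```javascript
// Illustrative intervals, not recommendations.
const ACTIVE_POLL_MS = 5000;
const HIDDEN_POLL_MS = 60000;

function pollInterval(isHidden) {
  return isHidden ? HIDDEN_POLL_MS : ACTIVE_POLL_MS;
}

// doc is normally the page's document. schedule and cancel default to the
// real timer functions and are parameters only so the loop can be tested.
function startPolling(doc, poll, schedule = setTimeout, cancel = clearTimeout) {
  let timer = null;

  function tick() {
    poll();
    timer = schedule(tick, pollInterval(doc.hidden));
  }

  // Re-arm the timer with the new interval whenever visibility changes.
  function onVisibilityChange() {
    cancel(timer);
    timer = schedule(tick, pollInterval(doc.hidden));
  }

  doc.addEventListener("visibilitychange", onVisibilityChange);
  timer = schedule(tick, pollInterval(doc.hidden));

  return function stop() {
    cancel(timer);
    doc.removeEventListener("visibilitychange", onVisibilityChange);
  };
}
```

In a page this would be called as startPolling(document, refreshTicker), where refreshTicker is whatever hypothetical function talks to your server.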

Now you know why your browser keeps your CPU wide awake :)

 

Update:

Wrote this post in a rush on the way out, thanks for the visibility links. I am aware that some webpages need to do work in the background. However, that work should be as minimal as possible, and this is often not the case.

Project Idea:

Would be nice if someone wrote an extension that goes through tabs and nukes any outstanding setTimeouts/XHRs/etc.


16
Apr 12

Snappy, April 12

Notes are here. Time was spent on administrative issues like overlap/conflict between Snappy and Kilimanjaro, plans for security review for Vladan’s symbol server and bug triage.

I’m usually really bad at noticing UI lag, but now my mind is focused on tab switching and I notice the lag every time I click on tabs (keyboard switching is unaffected). We discussed how to fix the tab lag I blogged about earlier. It’s hard because there is some funny interaction between focus and drag & drop and us switching tabs on mousedown. Neil will look into addressing focus issues this week.

Tim is addressing thumbnail capture slowness in bugs: 744388, 742594, 726347.

Paolo is busy switching code away from synchronous places APIs: 728168, 728174, 728142, 739213.

The networking team is expected to make a decision this week on how to fix the big networking cache lock. See Nick’s notes on options presented.

Thanks for feedback on preferred snappy update format: we’ll stick with cherries.


10
Apr 12

ARGH at our unresponsive tab strip: setTimeout(foo, 0) can be very harmful

As I mentioned before, I’ve been a manager for a year. I’ve focused solely on paper pushing for the past 6 months. Even though I love programming, it’s surprisingly enjoyable to merely tell others what to do :) However, as all technical managers find out, eventually one gets very bored without doing something technical. So here it goes…

Why The Frick Are My Tabs So Damn Laggy?

I noticed that tab switching has become unbearable lately. I finally filed a bug and with Gavin’s help investigated the brokenness. Turned out that we awesomely setTimeout(do_stuff, 0) in the mousedown handler (bug 743877). This means that I click on my tab, only to suggest to the browser to schedule an event to be handled some time in the future. The browser will also go ahead and flush the existing event queue before getting to handling my event. I measured lag anywhere from 30 to 160+ milliseconds before the browser even started handling my click.
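A toy model shows why this hurts. The sketch below is not browser code: it is a made-up FIFO queue where deferring work (the moral equivalent of setTimeout(do_stuff, 0)) appends it behind whatever is already pending. The event names are illustrative.

```javascript
// Toy single-threaded event loop: events run strictly in FIFO order.
function runQueue(queue) {
  const log = [];
  while (queue.length > 0) {
    const event = queue.shift();
    event.run(queue, log);
  }
  return log;
}

const queue = [
  {
    run: (q, log) => {
      log.push("mousedown");
      // Equivalent of setTimeout(do_stuff, 0): go to the back of the line.
      q.push({ run: (_q, l) => l.push("switch tab") });
    },
  },
  { run: (_q, log) => log.push("pending event A") },
  { run: (_q, log) => log.push("pending event B") },
];

const log = runQueue(queue);
// log is ["mousedown", "pending event A", "pending event B", "switch tab"]
```

The actual tab switch only runs after everything that was queued before the click has been handled, however long that takes.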

This code is pretty ancient, so why has tab switching gotten slow within the last few months? Turns out we now take a tab thumbnail on every tabselect, which takes >100ms on my machine… We then carefully use async IO to store that image in our network disk cache. Unfortunately our cache uses locks in creative ways, effectively making that code path synchronous (723577, 723582, 722033, 722034). See 742594 for thumbnail jank. Naturally, all of the above + cycle collection + garbage collection + etc gets scheduled right in the middle of handling tab switching :)

Thanks to a strategically misplaced setTimeout, the browser currently can spend a very long time not responding to user input (seconds sometimes). I bet we have quite a few places that “solve” problems with setTimeouts like above.

There is hope

See bug 743069 for some proof of concept patches on making tab switching more responsive. As far as I can see, there is no technical reason for us not to have a buttery-smooth tab strip. Next steps are figuring out why XUL draws slowly, throttling other browser activity while interacting with tab strips, etc.


05
Apr 12

Snappy, April 5: Change in meeting format

Lawrence, thanks for posting snappy updates while I was on leave.

Snappy meetings have gotten a bit dull lately. There hasn’t been much arguing about what needs fixing, or much discussion; everybody gave status updates and was in agreement. Going forward we will do less status updating, except for major developments, to save some energy for discussion.

Discussion centered around personas…err themes murdering Firefox performance and goals for Q2. Ehsan will post details on how to reimplement themes. I made a separate post about goals.

Vlad raised a question on whether we should proceed with cancellable SQL queries given SQLite limitations. SQLite can only cancel all outstanding queries, which requires us to track carefully what requests are outstanding and/or do some SQLite feature development. Consensus was that we should proceed to avoid embarrassing situations like reading the places database backwards during shutdown.

Lawrence also revamped the meeting notes to be more readable. As an experiment I’m going to paste them in here. Let me know if you prefer my cherry-picking summaries from before.

Snappy Apr. 5 Minutes

Actions

No actions.

Incoming

  • hardware acceleration: i disabled hwa on all macs in my family, and Firefox has been noticeably snappier across multiple types of macs. related bugs: bug 600763, bug 721273, bug 721892. should have someone from perf team or gfx dig in and at least confirm.
    • Taras asked to put on gfx q2 goals

Projects

Persona slowness(ehsan?)

Results from the past week
  • issues discovered with animated personas and those that heavily use svg and css
  • Ehsan to summarize the issue in a blog/wiki
  • look at moving image decoding off of the main thread

Mainthread+Slow SQL (gavin/taras/vlad)

Results from the past week
  • Worked on cancellable sql (bug 722243), have to figure out if benefit from cancellable queries justifies additional locking
Todo this week
  • Ask SQLite guys about fine-grained cancels

Better DOM event/task scheduling – jst (telemetry)

Results from the past week
  • starting to work on slowing down parser in background tabs

Super-slow-startup investigations – vlad, taras

Results from the past week
  • Received batch of slow startup data from March

Startup optimizations – bbondy

Results from the past week
  • bbondy: Bug 692255 – Will have implemented super review comments for prefetch

Front-end – Dietrich/bbondy

Results from the past week
Todo this week
  • telemetry for home tab vs session restored
  • telemetry for # of tabs restored
  • taras: investigate if startup cache/omnijar is still of benefit

Profiler – jrmuizel/BenWa/Ehsan (and more)

Todo this week
  • There seems to be an issue with symbolication with latest SPS extension version

Nondestructive chromehang – vlad

Results from the past week
  • Co-ordinate with Softronics QA people & Moz privacy review
  • bug 742008: Nightly profiling updates consistently failing
Todo this week
  • Integrate about-telemetry into Firefox as a bundled addon
  • Add a pass through mode so local symbolication server can pass symbolication requests to remote server (e.g. local Firefox symbols + remote Windows system symbols)

Peptest – mcote

Results from the past week
  • revision numbers, with links, now in peptest graphs as of March 30
  • results keep coming in!

GC pause reduction – billm

Results from the past week
  • bug 641025: incremental GC – disabled due to leaks
  • investigating leaks due to incremental GC
  • landed bug 716142 (allow multi-compartmental GCs), which enables:
  • worked on bug 739899, to keep compartment creation from stopping incremental GC
  • Multi compartment GCs should also enable scheduling smaller chunks of GC and CC.

CC pause reduction – smaug, mccr8 (meta bug 698919)

Results from the past week
  • smaug worked on trying to reduce the impact of leaked documents on the CC
  • mccr8 worked on bug 653191 (collapse SCCs of JS in CC graph)
  • remove more stuff from the CC graph (bug 740185)
  • QA is working on plan to test for leaked documents

 

bug 710935 – measure lag in handling user input (needs owner)

  • bbondy: I’ll probably be starting on this next week, but if you have someone else please feel more than free to take :)

05
Apr 12

Snappy, Apr 5: Snappy Goals for Q2

It is time to set goals for the next 3 months at Mozilla. A lot of them should be Snappy-related.

As I mentioned before, we have made a lot of snappy progress lately. We identified a lot of problematic areas, fixed some of them and the end is in sight for others. It is extremely important that we maintain the current Snappy momentum, such that we can wrap up Snappy this year and move on to scaling Firefox on multiple cores, etc.

The following platform goals have been proposed:

  • Graphics: More rendering off main thread (this work is split between graphics/layout), Fix GFX acceleration lag (not yet on wiki)
  • Layout: More rendering off main thread (this work is split between graphics/layout), Invalidation via DisplayList Analysis
  • Video: Off main thread rendering (not yet on wiki)
  • DOM: Prevent [to a reasonable extent] background tabs from starving the main thread, Reduce CC pauses significantly when there are cycles to collect
  • Perf Team: Async local storage via blocking pageload, Combine IndexedDB/LocalStorage quotas to allow IndexedDB to remove prompt, Provide JS file API (in workers) for all supported platforms, Reorder xul.dll on Windows to speed up startup, Continue exit(0) progress
  • Networking: Resolve listed high priority cache locking/async issues
  • Firefox: Fix top three Snappy offenders – lightweight themes, add-on manager, main thread SQL (under discussion, not on wiki)

We do a lot of work outside of our goal process, so the above goals represent only the big ticket items that we’d like to see accomplished in the near future. There will be other snappy work going on too.