Blocking calls into the Flash plugin can temporarily hang Firefox. This is a problem because sometimes the user would be happy to kill the plugin to access their webpage, while at other times waiting it out is the only way to get certain Flash apps/games to load. If you suffer from Flash-related hangs, see Aaron’s blog post for some builds to try. He is working on a new feature to provide an option to kill hanging Flash instances.
I joined our GFX+Layout teams for a workweek in Vancouver. Since profiling is most effective on slow machines, I brought along my trusty Acer Aspire 722 (slow 1.3GHz CPU + fast GPU) as my primary laptop. This hardware is great because the combination of a weak CPU + decent GPU means that if we accelerate things right the browser can perform quite well, and if we don’t, things get really slow. (An analogous situation exists when fast CPUs are matched with slow GPUs.)
In the beginning of the week I quickly demoed menu lag and slow Gmail tab switching (bug 811472). Later in the week we looked at problematic Facebook tab switch times (bug 811474) and Australis performance (see Matt’s post). By the end of the week tab switching improved by over 2x for both Facebook and Gmail. I don’t have exact figures because while we can measure general tab switch trends via telemetry, there isn’t a convenient way to do it on individual browsers yet. Help wanted: it would be great if someone could do up a barebone addon to monitor tab switching in bug 812381; we’ll fill in the rest.
Matt made sure that we no longer draw layers with opacity of 0 in bug 811831. Turns out rendering lots of invisible text can be expensive.
Workweeks are more about communication than getting code landed, so it is impressive that Jeff, Matt and their reviewers managed to diagnose, fix, review and land such significant optimizations in a couple of days. My laptop of pain feels much faster already.
In the coming weeks expect to see smoother tab switching, smoother animations, lower profiling overhead as we work through issues discussed during the workweek.
Our median startup performance (as measured by SIMPLE_MEASURES_FIRST_PAINT) improved by 20-25% at the end of the Firefox 18 cycle (~Oct 26). Strangely, most of the speedup seems to have come from a 50% speedup in library loading (measured by SIMPLE_MEASURES_MAIN).
We spent a lot of time focusing on tab smoothness recently. We still have a long way to go, but I checked telemetry data and the improvements are staggering. In the picture below I’m comparing the tab closing animation (FX_TAB_ANIM_CLOSE) between 17, 18 and 19. Ideally there should be 0 entries to the right of 154 (that’s our problematic performance tail). In 2.5 months, we went from having almost 20% of our tab animations taking > 400ms to complete to ~3%.
In addition to median perf improving, tab animations are now much less likely to vary in duration, etc. This required fixes in layout, gfx, frontend code. It’s really great to see a cross-team effort producing tangible results. It’s too bad that our analysis infrastructure makes it so hard to pinpoint specific changes that contributed most to an improvement like this.
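A tail figure like the "> 400ms" share quoted above can be computed from histogram buckets roughly as follows. This is only a sketch: the bucket layout and the counts are made up for illustration and are not real FX_TAB_ANIM_CLOSE data, and `tail_fraction` is a hypothetical helper, not part of our telemetry tooling.

```python
def tail_fraction(histogram, threshold_ms):
    """Return the share of samples in buckets whose lower bound is >= threshold_ms.

    histogram maps each bucket's lower bound (in ms) to its sample count.
    """
    total = sum(histogram.values())
    if total == 0:
        return 0.0
    tail = sum(count for lower, count in histogram.items() if lower >= threshold_ms)
    return tail / total


# Made-up bucket counts, keyed by each bucket's lower bound in milliseconds.
example = {0: 10, 100: 50, 154: 20, 400: 15, 1000: 5}
print(f"{tail_fraction(example, 400):.0%} of samples at or above 400ms")
```

Because buckets are coarse, the result depends on where the bucket boundaries fall, so a number like this is best read as a trend rather than an exact measurement.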
Dão Gottwald landed bug 756313 which, similarly to his work in the last Snappy update, postpones doing content work until Firefox chrome is painted.
Benoit Girard changed the profiler so it now updates the url as the treeview is being navigated. This makes it much easier to discuss what we are seeing in the profile.
Thanks for the feedback on blogging platform alternatives. I turned on a better captcha plugin to deflect spam. A lot less spam gets through now (but I think it’s also preventing legitimate users from getting through). If you are having trouble commenting, use twitter for now.
I thought our frontend optimization people did not have spare cycles for snappy UI fixes due to other important projects atm, but they proved me wrong this week.
Jared Wein landed bug 804968, which fixes jank where our awesomebar popup would appear then disappear while typing in the location bar. We were flushing layout for the top and bottom result on each adjustment to the awesomebar results. Those flushes weren’t necessary each time, so they are now skipped after the first pass in the browser session.
Profiler-assisted Bug Reporting
I looked at bug 642257 and gave up figuring out what causes the problem because I could not reproduce it. I asked the reporter to try to record a profile of the problem with the gecko profiler. Within 2.5 hours of the profile being posted in the bug, Timothy Nikkel identified the problem and posted a patch for it.
I’m very excited about this because the reporter has never used a profiler and yet on the first try helped fix a hard to reproduce bug. Thanks to a dedicated bug reporter, keen layout hackers and our new profiling infrastructure Flash in background tabs will no longer slow down our layout calculations. For many types of bugs identifying the problem is the hardest part, this is very promising.
Moving Blogs Soon
I will be moving to a new blog location as soon as I decide on a better blog setup. I’ve been irritated by WordPress since I started at Mozilla in 2006. The volume of comment spam has increased exponentially this year. After 6 years of suffering a terrible UI, spam, slowness and lossiness, I’m ready to move on to a blogging service elsewhere. If you have any suggestions for blog providers, ping me on twitter as I likely won’t see your comment in the mountain of spam.
Jeff Muizelaar may not have cut tab switch times in half in my last update. The overhead moved to a later part of the process that we were not measuring before the change landed. We’ll be able to tell what the magnitude of the tab-switch improvement was by landing bug 800031 on Aurora.
Matt Woodrow reduced tab-close animation jank in bug 750417.
This was an interesting week. On one hand, all performance aspects of tab strip work were suspended until end of year on the frontend team; on the other, impressive gains were made on the platform side of things.
Jeff Muizelaar cut tab switch times in half
Data came in on Jeff’s optimization I mentioned last week: bug 792199. This halved our median tab switch time. Since this also landed at the end of our 18 cycle, a comparison between 18/19 Nightlies gives us an idea of how this changed our tab switching times overall. Note the actual difference would be greater since both 18 & 19 include data points with Jeff’s patch, but the majority of data in 18 is without Jeff’s patch.
Black: v18, blue: v19. The X-axis represents time in milliseconds to do a tab switch, excluding time to paint.
The above graph shows a shift towards fast tab switch times across the board, with particularly nice improvements in the tail. See the corresponding 50% fall in medians on our telemetry evolution dashboard.
Unfortunately, Jeff’s patch was too good. Instead of decoding fewer images, it ended up decoding no images at all, causing unnecessary flicker when switching tabs. The patch got backed out, but this accident provided us with a good baseline of how fast tab switching can be without decoding images. Jeff landed a correction in bug 799335.
As I mentioned, the above graph does not include paint times. Jeff also landed bug 800031 which measures the complete tab switch duration (including paint time).
Timothy Nikkel’s Visible Image Decoding
Currently Firefox tends to decode too many images while browsing image-heavy sites. This hurts our total memory consumption, increases tab switch times, etc. Timothy has been posting test builds in bug 689623 which try to only decode visible images. Please give those builds a spin if you suffer from poor Firefox performance while browsing image-heavy sites.
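The idea of decoding only visible images can be illustrated with a simple rectangle-intersection test. This is a toy sketch and not how the patches in bug 689623 work: `Rect`, `intersects` and `images_to_decode` are hypothetical names, and real layout code has to deal with scrolling, transforms and nested frames.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def intersects(a, b):
    """True if the two rectangles overlap by a non-empty area."""
    return (a.x < b.x + b.width and b.x < a.x + a.width and
            a.y < b.y + b.height and b.y < a.y + a.height)


def images_to_decode(images, viewport):
    """Return the names of images whose rectangle overlaps the viewport.

    images maps an image name to its Rect in page coordinates.
    """
    return [name for name, rect in images.items() if intersects(rect, viewport)]


viewport = Rect(0, 0, 800, 600)
page_images = {
    "header": Rect(0, 0, 800, 100),     # on screen
    "footer": Rect(0, 5000, 800, 100),  # far below the fold
}
print(images_to_decode(page_images, viewport))
```

Everything outside the viewport is skipped, which is where the memory and tab-switch savings would come from on image-heavy pages.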
Startup Time Profiling
Benoit Girard taught the Gecko profiler how to capture most of browser startup in bug 799638. This might be the most important achievement so far in tackling startup problems caused by extensions. This means that every Nightly user (and in a few weeks every Firefox user) can install the Gecko profiler, click ‘Profile Startup’ and get a report on what makes Firefox startup slow. This can then be posted to bugzilla, SUMO, AMO or this blog so we can easily identify problematic addons, problematic APIs used by addons and the extent of startup overhead contributed by them.
Making extension-aware startup profiling easy has incredible potential for making Firefox startup faster for extension-addicted users. If you can’t wait to try this out, you can install a development snapshot of the profiler extension. Feel free to post your startup profile links in a comment.
As I mentioned last time, Patrick McManus fixed proxy-related jank in bug 769764. Last week Vladan Djeric analyzed our chromehang data and confirmed that proxy jank went from being one of our top offenders to not happening.
We should no longer do proxy-related IO on the main thread now that Patrick McManus landed bug 769764. Synchronous proxy code resulted in a miserable user experience for people using proxies, but it also affected normal users during proxy-detection. This was one of the top intermittent freezes that we’ve seen.
Tab switching should be much faster in Firefox 18 than before. Jared Wein got rid of an expensive regexp that was applied during a tab switch in bug 781588. Jeff Muizelaar landed bug 792199 which should make switching to image-heavy sites much faster. Jared’s change is already on nightlies; Jeff’s change should show up tomorrow if everything goes well.
Matt Woodrow landed the huge change that is DLBI in bug 539356 (yet again). DLBI speeds up reflows, reduces repaints resulting in a more responsive browser. For more details see Robert O’Callahan’s announcement.
After working on improving startup for the last couple of releases we seem to have regressed it during the 18 cycle. I filed bug 798130 on this. I just noticed the regression a few hours ago. If anyone has ideas on what might’ve caused it, please comment.
Gecko profiler now lives on AMO.
After MozCamp, we held a snappy meet-up at NoaCowork in Warsaw. I believe this was one of the most productive weeks I had the pleasure of participating in since I started at Mozilla. My only regret is that I was not motivated to organize any memorable after-work activities while suffering the MozCamp.EU plague (Mozilla gatherings are great for exchanging global influenza strains).
Benoit Girard went through existing and upcoming profiler features. We made sure that everyone in attendance knew how to use the profiler. We also discussed potential UX improvements.
Markus Stange is a community contributor who originally designed and implemented the current profiler UI. He attended MozCamp and spent most of Monday with us planning future profiler improvements with Benoit.
Bas-tool: Azure Drawing Tracer
Bas Schouten presented his work-in-progress graphics tracing tool. Our graphics people have been using the Microsoft PIX tool to debug accelerated drawing issues with Direct2D. I believe Bas got fed up with the bugginess and limitations of an otherwise excellent tool and wrote a similar Azure-specific tool with some special Bas-sauce.
Bas-tool presents a graphics trace so one can see how Firefox draws on the screen. Seeing how something is drawn step-by-step helps us see when we are not using efficient graphics primitives, are doing redundant invalidations, etc. The tool can also do tricks like bruteforcing graphics operations to find redundant ones.
I expect Bas will present this tool + accompanying patches soon.
OMTC & Tab Strip
The current Firefox tab-strip implementation is crufty. It uses expensive graphics primitives, inefficient CSS transitions, implements scrolling/overflow animations in JS and does other non-performant things (tracked by bug 593680). These things happen when one keeps adding features without having good profiling/tracing tools.
Tim Taubert led the effort to prototype a new tab strip that is implemented without JS animations and uses OMTC-friendly, efficient graphics primitives. Bas-tool was used heavily to see whether CSS transitions were animating efficiently. We sorely missed having a layout person around to help diagnose layerizing issues, etc. Turns out CSS transition scheduling is very jank-sensitive. We may also need to come up with + implement some new CSS transitions to make an attractive tab strip. The good news is that any backend improvements we make in this area should make it easier to implement fluid, responsive web apps.
Tim Taubert, Benoit Girard & Jared Wein cobbled together a desktop OMTC throbber demo where the tab throbber was implemented using CSS rotations which made it animate smoothly through content jank.
Josh Aas, Vladan Djeric, Lawrence Mandel and I went through our new non-destructive chromehang report. Chromehangs are multi-second browser stalls that we report via telemetry. See the complete list that we went through here.
Looks like our recently-discovered synchronous proxy code and Flash are to blame for most of our temporary hangs. Proxy stuff should disappear once bug 769764 is fixed. Click-to-play will help with some of the plugin-caused hangs. We will be discussing how to deal with the rest of the plugin-jank in the coming weeks.
My favourite chromehang was the one that pinpointed why downloads jank Firefox so much: bug 789932. We tried to pin this on anti-virus scans and download manager sqlite activity, but the main reason turned out to be very simple. Turns out we do network traffic on a networking thread only to write out file contents to disk on the main thread.
Paulo Amadini, Lawrence Mandel, Gavin Sharp and I made plans to get rid of main thread SQL usage in the download and addon managers.
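The general remedy for this kind of jank is to hand file writes to a background thread so the main thread never blocks on disk. The following is a minimal sketch of that pattern only; it is not the fix for bug 789932, and `start_writer` is a hypothetical helper invented for illustration.

```python
import queue
import threading


def start_writer(path):
    """Start a background thread that appends queued chunks to the file at path.

    Returns the queue to feed and the worker thread. Put None on the queue
    to signal that the download is finished.
    """
    chunks = queue.Queue()

    def run():
        with open(path, "ab") as out:
            while True:
                chunk = chunks.get()
                if chunk is None:  # sentinel: no more data
                    break
                out.write(chunk)  # disk I/O happens off the caller's thread

    worker = threading.Thread(target=run)
    worker.start()
    return chunks, worker
```

The caller just does `chunks.put(data)` as network data arrives, then `chunks.put(None)` followed by `worker.join()` when the download completes. Queueing a chunk is cheap, so the thread receiving data stays responsive even when the disk is slow.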
Vladan Djeric explained his plans to speed up & reduce jank caused by DOM Local Storage.
Margaret Leibovic worked on removing synchronous cache API usage, added pageload telemetry. She also filed a bug that resulted in 20% faster link navigation in Fennec (bug 789889). Perhaps we should do the same on our Metro build?
Olli Pettay & Felipe Gomes worked on making our social api features not leak memory.
Julian Seward, Mike Hommey, Benoit Girard worked on improving our profiling infrastructure and making it work on Android, B2G, Linux.
Josh Aas, Lawrence & I coordinated on Snappy priorities for the necko team.
I’m sure I missed a few projects, I hope other attendees blog about their work last week.
Last week a few of us attended MozCamp.EU in Warsaw. Benoit, Vlad & I presented a talk on performance work. The primary aim of our talk was to inform our community about various performance tools that came to fruition over the past year and how to use them to investigate Firefox performance problems. Hopefully we’ll see a spike in bug reports with detailed performance information (profiler traces, telemetry histogram + chromehang excerpts, etc).
My favourite part of MozCamp was finally meeting a couple of the impressive community contributors in person. I finally had the pleasure of literally buying beer to thank someone for cleaning up some nasty code. I hope some day we can do a developer-oriented MozCamp-like conference.
My favourite talk was Anant & Tim’s presentation on WebRTC. There is something incredibly attractive about having an encrypted, cross-browser, firewall-punching p2p implementation (realtime open video conferencing is a nice bonus). See Anant’s blog post for more details.
End of summer is a tough time to make progress because a lot of people are on vacation. Surprisingly, Firefox got some good fixes in since the last update.
Less Slow Startups
Bug 726125: should get rid of a lot of super-slow startups. Due to an abstraction accident we ended up validating jars more eagerly than expected. Firefox would go on the net (on the main thread) to check the certificate every time a signed jar was opened. There are over 500 signed extensions on AMO with over 14 million active users. See the following for background on the (now dead) feature that caused our jar code to go nuts: signed scripts and note on removal of signed script support. Thanks to Nicholas Chaim and Vladan Djeric for fixing this.
Less Proxy Lag (WIP)
Bug 769764. We have received a lot of strange complaints about Firefox network performance that we could never reproduce. Turned out this was because none of us used proxies. Patrick McManus discovered a lot of synchronous proxy and DNS code in our network stack.
The fix for this should also improve performance for people without proxies, since proxy-autodetection code was also doing main thread IO. As a result of us replacing sync APIs with async ones, all of the existing proxy-related addons will have to be updated. Patrick is reaching out to addon authors to make sure addons are updated in time for the next release.
Less UI Repaint Lag
Bug 786421: Nightlies got unbearably slow for me recently. Turned out we ended up continuously resizing + applying theme + redrawing invisible tooltips on every paint. Thanks to Timothy Nikkel for fixing this. This bug never affected anyone outside of the Nightly/Aurora testers, but it serves as yet another example of how the Gecko Profiler makes it easier than ever to diagnose weird performance problems. The single biggest contribution anyone can make at the moment is to provide instructions on how to reproduce lag, with accompanying profiler traces.
Less Gradient Lag
Bug 761393: Paul Adenot implemented a gradient cache. This was landed as a Telemetry experiment so we can determine what the optimal cache retention strategy is. We’ll be watching the relationship between GRADIENT_DURATION and GRADIENT_RETENTION_TIME in the coming weeks.
Currently, rendering gradients causes stalls in the GPU pipeline. In previous experiments we found out that most of the tab-switch rendering time in hardware-accelerated Firefox is spent rendering gradients :(. Gradients are hard to notice for casual users, but they are heavily used in our tab strip and on Google web properties.
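To make the retention question concrete, here is a toy cache where entries that go unused for longer than a retention time get evicted. It is a sketch under my own assumptions, not the implementation from bug 761393: the `GradientCache` class, its `render` callback and the injectable `clock` are all hypothetical.

```python
import time


class GradientCache:
    """Toy cache: entries unused for longer than retention_s are evicted."""

    def __init__(self, retention_s, clock=time.monotonic):
        self.retention_s = retention_s
        self.clock = clock  # injectable so the behaviour can be tested
        self.entries = {}   # key -> (rendered gradient, last-used timestamp)

    def get(self, key, render):
        """Return the cached gradient for key, rendering it on a miss."""
        now = self.clock()
        self._evict(now)
        if key in self.entries:
            value, _ = self.entries[key]
        else:
            value = render(key)  # the expensive path we want to avoid
        self.entries[key] = (value, now)  # refresh the last-used time
        return value

    def _evict(self, now):
        """Drop every entry that has not been used within retention_s."""
        stale = [k for k, (_, used) in self.entries.items()
                 if now - used > self.retention_s]
        for k in stale:
            del self.entries[k]
```

A longer retention time means fewer re-renders but more memory held by gradients nobody is drawing, which is the trade-off the telemetry experiment is meant to quantify.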
I may not have a chance to post the next snappy update as I’ll be hopping on the plane to Warsaw right after our meeting. If you are attending MozCamp come to our ‘All About Performance’ session. Our goal for the talk is to significantly expand the pool of people who can diagnose Firefox (and web) performance problems.