New about:telemetry page in Firefox 19

Firefox 19 now has a built in “about:telemetry” page which displays the data gathered by the “Telemetry” system. If the user has opted in, Telemetry will gather information about Firefox performance, hardware, usage and customizations and submit it to Mozilla. The information in the about:telemetry page is meant to inform users about the data being sent to Mozilla, but it can also help technical users report performance bugs to Mozilla.

For example, the “Slow SQL Statements” section identifies slow SQLite operations which have caused Firefox “jank” (temporary hangs):

about:telemetry "Slow SQL Statements" section

In Nightly builds which support profiling, the “Browser Hangs” section will show call stacks from the main thread whenever it took longer than 5 seconds to process an event. Clicking on “Fetch function names for hang stacks” converts the stack PCs to human-readable function names:

about:telemetry "Browser Hangs" section

The Histograms sections contains various performance and usage measures expressed in histogram form. The histogram data is primarily intended for Firefox developers, but the brave can look up the histogram definitions in Histograms.json and compare their own values to the Firefox population using the Telemetry Dashboard.

about:telemetry "Histograms" section

I invite you to leave a comment or file a bug if there are any changes you would like to see made to the about:telemetry page to make it more useful for you.

Please note that the old about:telemetry extension has been obsoleted by the built-in page and that the extension is now marked incompatible with Firefox 19+.

Cache, plugin, font operations most common in chrome hang reports

A few months ago, in bug 712109, I added functionality for reporting temporary main thread hangs to Telemetry and recently I obtained the first batch of reports. Hang reporting works by monitoring the event loop for inactivity — if the event loop has not started processing a new event for 10 seconds, then the hang monitor will capture a snapshot of the main thread’s call stack and attach the addresses to the next Telemetry report along with the measured duration of the hang. The default minimum hang duration is 10 seconds, but this value is configurable through the hangmonitor.timeout preference. The chrome hang reporter only captures a single snapshot of the call stack, so the captured stack may not always be representative of all of Firefox’s activities during the hang. Generally, it is enough to identify the offending code.

The Hang Reporter Population

Over the 2-month period covered by this batch of data, there were only 42 Telemetry reports containing chrome hangs. This is because hang reporting is only enabled on the nightly-profiling branch as it requires frame pointers for  unwinding the call stack. Additionally, there aren’t many users on the nightly-profiling branch — in fact, yesterday, there were only 14 unique Telemetry submissions by users of the nightly-profiling branch.  Further, we currently don’t save chrome hangs in persistent Telemetry, so some of the reports were likely lost during browser restarts (bug 763113).

Some details immediately jumped out of the hang report data:

  • The median hang duration is 20 seconds.
  • Two underpowered hardware configurations kept popping up in the hang reports, so these two users are over-represented in the data. Together, they contributed 25 of the 42 Telemetry reports.
  • Most Telemetry pings with a chrome hang have only a single hang, but some reports have as many as 10 hangs during a single browsing session.

In order to symbolicate the PCs in the hang reports, we rely on Breakpad symbols. Unfortunately, it seems that symbols for nightly builds are deleted after 30 days, so most of the hangs from April and early May can not be symbolicated. That left 17 Telemetry pings out of 42 whose hangs could be symbolicated. Going forward, the plan is to automate this process and have symbolication done every day (bug 763116).

I think it would also make sense to lower the hang threshold (bug 763124). A threshold of 5 seconds likely wouldn’t overwhelm the Metrics servers even if we deployed hang reporting on the regular nightly channel and it would allow us to detect more subtle performance issues.


Cache Operations

Out of the 33 symbolicated hang stacks, call stacks ending with cache operations are the most common (12 hang stacks). A variety of cache operations are represented in these stacks:

  • nsCacheEntryDescriptor::GetMetaDataElement x3
  • nsCacheEntryDescriptor::GetStoragePolicy x3
  • nsCacheEntryDescriptor::GetDeviceID x2
  • nsCacheEntryDescriptor::Release x2
  • nsCacheEntryDescriptor::GetExpirationTime x2

All of the stacks were reported by a single Windows XP machine, however the reports are over the span of a week and this machine is responsible for almost half of the hang stacks in the entire data set. I think these stacks are a good argument for devoting additional resources to investigating and improving the performance of our caching mechanisms.


The second most common category of hangs are those involving plugins: loading plugins, destroying plugins, setting the window for a plugin and scripting plugins. In each of these cases, the main thread gets stuck waiting for the plugin running in the plugin-container process. It’s not possible to identify the plugin using only the callstack and unfortunately, we currently don’t collect the list of installed plugins.


There were 2 hangs reported while getting font tables. I’ve also recently personally witnessed brief hangs lasting a couple of seconds while fetching the font list. I filed bug 763134.

Excerpt from a sample hang, lasting 16 seconds:

KiFastSystemCallRet (in ntdll.dll)
GDIFontEntry::GetFontTable(unsigned int,FallibleTArray<unsigned char> &) (in xul.dll)
GDIFontEntry::ReadCMAP() (in xul.dll)
gfxFontFamily::ReadAllCMAPs() (in xul.dll)
gfxPlatformFontList::RunLoader() (in xul.dll)
gfxFontInfoLoader::LoaderTimerFire() (in xul.dll)
gfxFontInfoLoader::LoaderTimerCallback(nsITimer *,void *) (in xul.dll)
nsTimerImpl::Fire() (in xul.dll)


The data set contained two hangs where GC took 15 seconds and 31 seconds, and another hang where GC took 24 seconds but it also encompassed destructing an HTML document.


There was a single Telemetry report containing 10 chrome hangs with very similar hang stacks of about ~30 seconds each. All of the stacks had to do with painting gradients via D2D. I think this issue might already be covered in bug 750871.

DrawingContext::FillRectangle(D2D_RECT_F const *,ID2D1Brush *) (in d2d1.dll)
D2DRenderTargetBase<ID2D1BitmapRenderTarget>::FillRectangle(D2D_RECT_F const *,ID2D1Brush *) (in d2d1.dll)
_cairo_d2d_fill (in gkmedias.dll)
_cairo_gstate_fill (in gkmedias.dll)
_moz_cairo_fill_preserve (in gkmedias.dll)
nsCSSRendering::PaintGradient(nsPresContext *,nsRenderingContext &,nsStyleGradient *,nsRect const &,nsRect const &,nsRect const &) (in xul.dll)
nsImageRenderer::Draw(nsPresContext *,nsRenderingContext &,nsRect const &,nsRect const &,nsPoint const &,nsRect const &) (in xul.dll)
nsCSSRendering::PaintBackgroundWithSC(nsPresContext *,nsRenderingContext &,nsIFrame *,nsRect const &,nsRect const &,nsStyleContext *,nsStyleBorder const &,unsigned int,nsRect *) (in xul.dll)
nsCSSRendering::PaintBackground(nsPresContext *,nsRenderingContext &,nsIFrame *,nsRect const &,nsRect const &,unsigned int,nsRect *) (in xul.dll)
nsDisplayCanvasBackground::Paint(nsDisplayListBuilder *,nsRenderingContext *) (in xul.dll)
mozilla::FrameLayerBuilder::DrawThebesLayer(mozilla::layers::ThebesLayer *,gfxContext *,nsIntRegion const &,nsIntRegion const &,void *) (in xul.dll)
mozilla::layers::ThebesLayerD3D10::DrawRegion(nsIntRegion &,mozilla::layers::Layer::SurfaceMode) (in xul.dll)
mozilla::layers::ThebesLayerD3D10::Validate(mozilla::layers::ReadbackProcessor *) (in xul.dll)
mozilla::layers::ContainerLayerD3D10::Validate() (in xul.dll)
mozilla::layers::ContainerLayerD3D10::Validate() (in xul.dll)
mozilla::layers::LayerManagerD3D10::Render() (in xul.dll)
mozilla::layers::LayerManagerD3D10::EndTransaction(void (*)(mozilla::layers::ThebesLayer *,gfxContext *,nsIntRegion const &,nsIntRegion const &,void *),void *,mozilla::layers::LayerManager::EndTransactionFlags) (in xul.dll)
nsDisplayList::PaintForFrame(nsDisplayListBuilder *,nsRenderingContext *,nsIFrame *,unsigned int) (in xul.dll)
nsLayoutUtils::PaintFrame(nsRenderingContext *,nsIFrame *,nsRegion const &,unsigned int,unsigned int) (in xul.dll)
PresShell::Paint(nsIView *,nsIWidget *,nsRegion const &,nsIntRegion const &,bool) (in xul.dll)
nsViewManager::Refresh(nsView *,nsIWidget *,nsIntRegion const &,bool) (in xul.dll)
nsViewManager::DispatchEvent(nsGUIEvent *,nsIView *,nsEventStatus *) (in xul.dll)

It’s also possible that a developer had a debugger attached and that these pauses are from code hitting breakpoints. Telemetry submissions should add a flag to indicate whether a debugger is attached (bug  763138).

Other Hangs

There were half a dozen other hangs in this data set that I still need to investigate further as their causes are somewhat perplexing and I am not familiar with the code in question. For example, in one hang stack, the CreateToolhelp32Snapshot API call used for gathering a list of libraries took 19 seconds while waiting to enter a critical section. In another, it looked like JavaScript ran for 63 seconds without being interrupted by the slow-script dialog box.

If anyone is interested in diagnosing these hangs, I’d appreciate it if you could take a look at the stacks and point me in the right direction or file bugs if you recognize a potential cause. You can leave me a comment on this post below.

These are the remaining stacks:


Profiling with the Built-in Gecko Profiler and Local Symbols on Windows

If you would like to use the built-in Gecko Profiler with a local build of Firefox for Windows, you will need to point the profiler to a local Snappy Symbolication Server instead of the official Mozilla symbolication server. The server will host the local Firefox symbols for the profiler and it will fetch any symbols not available locally from the official Mozilla symbolication server (e.g. symbols for Windows DLLs and plugin DLLs).


Looking at my own about:telemetry data

After I looked at Telemetry data from users with super-slow startups, I decided to take another look at my own about:telemetry page for comparison.

First off, there were 9 SQL statements on the main thread which required longer than 100ms to execute, as well as 46 slow SQL statements on other threads. This is a bit worse than usual for my about:telemetry page + a bit surprising since my machine is a recent i7 laptop with a 7200rpm drive, running on a “maximum performance” power profile. My first suspicion fell on a busy session restore period with 5 windows and ~50 tabs but my browser was not configured to actually load the pages in the background tabs on session restore. I took some comfort in knowing that most of the components which executed those main-thread queries are already being refactored to do async I/O instead.

I was also pretty surprised by the distributions found in a few of the histograms in about:telemetry. My CACHE_DISK_SEARCH histogram, which represents the number of milliseconds to search the disk cache (grouped into buckets), had a surprisingly long tail, with 11 cache reads requiring 100ms or more:

(click for full size)

The CACHE_SERVICE_LOCK_WAIT_MAINTHREAD histogram was also concerning with a few dozen accesses requiring 100ms or more. I didn’t get a screenshot of the original very-long-tail histogram, but I do tend to see this kind of tail show up fairly often:

(click for full size)

This is a known bug and a patch is getting reviewed in bug 722034.
GC_MS is also showing about half the GC operations taking more than 100ms, but  incremental GC is going to be re-enabled soon from what I understand.
Finally, there was the issue of startup time. Startup always takes a few seconds on this machine if I’m restoring a big mess of tabs, but it felt extra slow this time so I took a look at the startup timestamps:
  20ms            main
1856ms            createTopLevelWindow
2806ms            firstLoadURI
4025ms            delayedStartupStarted
4011ms            firstPaint
4060ms            sessionRestoreInitialized
4062ms            sessionRestoreRestoring
4604ms            delayedStartupFinished
6461ms            sessionRestored
We can see a URI being loaded before first paint and adding up to 1.2 seconds to the startup time. I mentioned this in my previous blog post and I’ve filed bug 756313.
In general, looking at about:telemetry can be a bit daunting at first but it can really help narrow down sources of performance problems and give insight into how well the browser is performing in your specific environment. In the near future, we’ll have about:telemetry integrated into the browser (bug 661881) and with a makeover from the UX team, it will get a lot easier to extract meaning & file detailed bugs.

More Telemetry from Super-Slow Startups

Last week, I revisited the issue of super-slow startups by poring over another batch of Telemetry data from February/March of this year. There were 2263 Telemetry submissions matching the following criteria:

  • uptime < 5minutes
  • firstpaint > 30 seconds
  • uninterrupted startup
  • Windows OS
  • containing “appUpdateChannel” field (Firefox 11+)

I further limited my investigation to submissions with fewer than 5 addons, bringing the total down to 1209 such Telemetry submissions — a decidedly small number compared to the number of daily Telemetry submissions over a 2 month period. I was motivated to take another look at super-slow startup reports because Telemetry now reports Firefox startup metrics separately: select performance histograms from startup are now reported separately, as are slow SQL statements from startup.

As in the previous analysis, part of the startup woes seen in these reports can be blamed on Firefox occasionally trying to restore the previous session before painting the UI. There is a fix available in bug 715402 but it hasn’t landed yet. However, my overall impression is that these super-slow startups are primarily caused by very slow computers or computer resources temporarily being in very short supply while Firefox is starting. As such, we may not have many options for helping these users. This is a fairly typical timeline of a very slow startup:

14.406s                         main
51.110s                         createTopLevelWindow
62.344s                         firstLoadURI
62.543s                         delayedStartupStarted
62.547s                         firstPaint
62.584s                         sessionRestoreInitialized
62.614s                         sessionRestoreRestoring
62.684s                         delayedStartupFinished
62.688 s                        sessionRestored

It takes 14 seconds just to reach Firefox’s main and then another 37 seconds to create the top level window, strongly suggesting an over-loaded system. It would be nice if we could confirm this through Telemetry, for example by gathering Firefox page fault stats but it seems that’s pretty tricky to do on Windows. Another very common pattern in the reports (also seen in the sequence above) is that the first real URI (not about:blank) most often starts to load before first paint is complete (filed bug 756313).

Slow SQL statements and other Firefox I/O activities definitely do contribute to the super slow startup times but, as before, their totals are roughly an order of magnitude smaller than the total time required for startup. In particular, STARTUP_NETWORK_DISK_CACHE_OPEN and STARTUP_CACHE_DEVICE_SEARCH frequently appear in super-slow startup reports, contributing several seconds each to startup times. Additionally, MOZ_SQLITE_COOKIES_READ_MAIN_THREAD_MS, MOZ_SQLITE_OTHER_READ_MAIN_THREAD_MS, MOZ_SQLITE_OTHER_SYNC_MAIN_THREAD_MS, MOZ_SQLITE_OPEN_MAIN_THREAD_MS also very commonly report multi-second values. Surprisingly, it also seems very common in these reports for Firefox to read over a megabyte from the cookies.sqlite database during startup alone. This flurry of activity likely comes from the first URI (the homepage?) being loaded and the previous session being restored.

Interestingly, the addons system also seems to be a source of slow main-thread SQL during startup:

  • INSERT INTO locale (name, description, creator, homepageURL) VALUES (:name, :description, :creator, :homepageURL)
  • DELETE FROM addon WHERE internal_id=:internal_id

I believe these SQL statements are associated with addon updates during startup. There may be an opportunity to improve performance by moving more of these operations to a background thread.

Finally, there also appears to be lock contention between the main thread and the async thread on cookies.sqlite DB accesses, e.g. the following statements:

SELECT name, value, host, path, expiry, lastAccessed, creationTime, isSecure, isHttpOnly FROM moz_cookies WHERE baseDomain = :baseDomain

SELECT name, value, host, path, expiry, lastAccessed, creationTime, isSecure, isHttpOnly, baseDomain FROM moz_cookies WHERE baseDomain NOTNULL

This analysis was a lot quicker than the last since I had scripts for sifting through the data and generating the reports. The next step will be to follow up on some of these issues (homepage load before first paint, lock contention, synchronous SQL). Ideally, it will eventually be possible to automate analysis of Telemetry data and have scripts automatically flag troublesome SQL queries or regressions in reported startup times.

Setting up Snappy Symbolication Server locally

UPDATE: If you have been granted access to Mozilla’s jump host, you can create an SSH tunnel to the internal symbolication server:

ssh -L -N

You will then be able to use Benoit’s SPS extension and the custom about:telemetry extension out of the box on Windows profiling builds.

UPDATE #2: Refreshed the links to point to a more recent build of Firefox, the old version was incompatible with the latest SPS extensions

In my previous blog post, I gave an overview of the new Hang Detector feature and the Snappy Symbolication Server. Since then, more than one person has asked me for info on deploying the server locally so I’ll reproduce a quick and dirty walkthrough here for easy reference. The instructions below are the minimum required to see a demo of the functionality.

Setting up a profiling Nightly:
1. Download and install the profiling Nightly from March 28th, 2012:
2. Disable updates in the profiling Nightly from Firefox -> Options -> Options -> Advanced -> Update -> “Never check for updates”.
This will prevent this test version of Nightly from getting out of sync with the debug symbols you will download in the next step.
3. Opt into Telemetry from Firefox -> Options -> Options -> Advanced -> General -> “Submit performance data”. Restart the browser.
4. Install a custom version of the about:telemetry extension:

Setting up a Symbolication Server locally:
5. Download some of the symbols for this Nightly profiling build from Extract the zip file locally.
6. From Windows command line: “git clone git://”. You can get “git” for Windows from or you can install Cygwin.
7. Edit the Snappy Symbolication Server’s “sample.conf” file so that firefoxSymbolsPath and osSymbolsPath point to the “symbols_ffx” directory extracted in step #5. You might also want to set enableTracing = 1 to help with identifying issues.
8. Run the symbolication server locally with “python sample.conf”. You can get python for Windows from or you can install Cygwin.

9. Click on “Force Hang” from the “about:telemetry” URL in the profiling Nightly, acknowledge all alerts. Reload the page and click “Symbolicate Stacks”. You should now see a stack that looks something like this:

Hang report #1 (18 seconds):

??? (in ntdll.dll)
PRMJ_Now (in mozjs.dll)
js_Date(JSContext *,unsigned int,JS::Value *) (in mozjs.dll)

The question marks above are for OS libraries (ntdll.dll) for which you don’t have local symbols and JIT-ed code on the bottom of stack which is not possible to stackwalk reliably.

Note: To generate your own Firefox symbols instead of downloading a sample as above, you’ll first have to make a local debug build of Firefox with an additional “ac_add_options –enable-profiling” line in your .mozconfig. After you’ve built this profiling version of Firefox, you can generate symbols for all your Firefox libraries by running “make buildsymbols” from your object directory.

Introducing Chrome Hang Reporting and the Symbolication Server

The Hang Reporter is a new Firefox feature that reports transient main thread hangs to Telemetry. It relies on the existing hang monitor thread to detect when the main thread has not processed any new events in the last X seconds. If it detects a hang, it will suspend the main thread to walk its call stack for PCs and to gather version and address information about the modules loaded in process memory. This data along with the hang’s duration is reported to Telemetry, assuming the user has opted in to submitting performance data. This Telemetry hang information will help us identify and reduce common causes of browser “stalls”. In the interest of saving Telemetry bandwdith, only the information about the modules in memory directly involved in chrome hangs is reported + the memory maps in each hang report are a diff against the memory map in the first hang report of the session.

Currently, the Hang Reporter is only active on Windows Nightly builds from the Profiling branch because it relies on having Firefox compiled with frame pointers to unwind the call stacks. The Hang Reporter functionality is #ifdef’d out on regular release builds. The default minimum interval for capturing a hang is 10 seconds, but the value can be changed via the “hangmonitor.timeout” config preference.

This next bit of functionality is not quite ready for prime time, but you can examine your own browser’s hang reports locally by installing this custom version of the about:telemetry extension that adds a “Main Thread Hangs” section to the about:telemetry page and a button for causing hangs at will for testing.  You have to use either a local debug build of Firefox with symbols or a Nightly Profiling build to see the hang reports.

The hang reports contain raw PCs from the hung stack and a list of the modules involved:

Main Thread Hangs:

Hang report #1 (1 seconds):

Stack: 0x6027b8df 0x6027b897 0x602ce2b7 0x602cf482 0x602d2a7a 0x602d24c1 0x1d6cef50

Memory map:

Hang report #2 (8 seconds):

Stack: 0x75b9f5be 0x10e9051d 0x10e90928 0x112441a6 0x111cb534 0x11018c6f 0x112a4f4e 0x112a4e72 0x112a4d7d 0x10e905b0 0x10e33907 0x10b59f4a 0xf884cac 0x12324ac 0x1231d26 0x1231319 0x1237cf8 0x1237b3f 0x74cf339a 0x775d9ef2 0x775d9ec5

Memory map:

Pending security & privacy review, a symbolication server will eventually be available publicly to symbolicate these hang reports for about:telemetry users. This symbolication server will also be used by the SPS profiler. If you are a developer comfortable with Firefox development and you feel like killing a bit of time :), you can already symbolicate the Firefox portions of these hang reports by running our new Snappy Symbolication Server locally (available on Github). You will need to run the server locally on port 8000 with the config file pointing to your local .sym symbol files. You can generate these .sym files by running “make buildsymbols” from the object directory of your local build. You can then use the “Symbolicate Stacks” functionality in the about:telemetry page.

I don’t expect many any people to go through the trouble of setting all this up :), but the Hang Reporter really does become informative to use after configuring the hang interval to its minimum (1 second) and observing how often and why the browser hangs during regular browsing.

EDIT: Further instructions on setting up the Snappy Symbolication Server can be found here.

Telemetry from Super-slow Startups, First Impressions

I have been investigating very slow startups by looking at Telemetry reports of startups taking longer than 30 seconds. There were 49 reports of super-slow startups during a one month period starting Dec 15th that met the following criteria:

  • First paint* > 30 seconds
  • Uninterrupted startup
  • Windows OS
  • Recent Nightly or Aurora build reporting on slow SQL Telemetry
  • Uptime < 5minutes

  * There are 3 metrics used for representing startup speed: “main”, “firstPaint” and “sessionRestore”. The names are mostly self-explanatory, but you can find more detail here.

Two thirds of the slow startup reports had more than 5 addons. While addons may be contributing to slow startups in some cases, there doesn’t seem to be much difference between Telemetry reports for browsers with few addons and many addons. The were also two reports with negative values for uptime (filed bug 722651, similar to bug 670008), but this is a minor bug and not a sign of corruption in the reports.

UPDATE: Some of these reports may come from developer builds with Telemetry manually enabled (Bug 722240) which might explain some of the long startup times. It doesn’t look like there is any way to distinguish local builds from official builds in this dataset (fixed in Bug 720875).


The root causes behind the long startups are not immediately obvious from these Telemetry reports. This is partly because Telemetry doesn’t measure everything of interest yet and partly because the Telemetry data was not divided into startup and post-startup measurements (now fixed in bug 720456). Currently, the Telemetry histograms contain measurements both from startup and the initial 5 minutes of browser use.

Most of the startups (as measured by firstPaint) were between 30-60s long. In 32 of the 49 reports, there was a delay of at least 10 seconds before XRE_main was called, perhaps suggesting that these systems were very slow or the startup took place during a time of very high contention for the computer’s resources, perhaps during boot up or a resume from hibernation. In 7 reports, the time to reach XRE_main was even greater than the additional time for producing the first paint. The additional time required for session restore was either negligible (nothing to restore) or surprisingly relatively small compared to total startup time (less than 15%).

One observation of note is that sessionRestore sometimes occurs before firstPaint (bug 715402). This likely further degrades the user experience during slow startups.


The total times recorded by Telemetry histograms in the slow startup reports are not large enough to fully explain the lengthy start up times. The only histograms reporting times on par with the total start up times (or much larger!) are those for asynchronous SQL execution (MOZ_STORAGE_ASYNC_REQUESTS_MS), URL classifier work (URLCLASSIFIER_PS_CONSTRUCT_TIME, MOZ_SQLITE_URLCLASSIFIER_READ_MS, …) and HTTP page loading (HTTP_SUBITEM_OPEN_LATENCY_TIME, HTTP_SUB_COMPLETE_LOAD_CACHED, …) None of these operations should be affecting startup times: page-loading should not be happening before first paint (barring bug 715402 above) and URL classifier operations &  asynchronous SQL are executed on worker threads and should not delay startup.

There are, however, a few operations lasting multiple seconds that should not be, such as DWRITEFONT_DELAYEDINITFONTLIST_TOTAL and DWRITEFONT_DELAYEDINITFONTLIST_COLLECT taking 17 seconds each in one report. Similar font performance issues are already being addressed in bug 705594.

The table below shows a few histograms of interest with total times in excess of 1 second during the initial 2-4 minutes:

# of Telemetry reports






























































Although the operations measured by the histograms above are likely not solely responsible for the long startup times, they merit further performance investigation.


Adding to the slow system I/O hypothesis, all of the slow startup reports contain multiple SQL statements with execution times over 100ms. Nevertheless, the time taken by these slow SQL statements is still not enough to adequately explain the slow startups.

These are the top 10 most common slow SQL statements across the slow startup reports:

Main Thread

# of Telemetry reports


SQL string




SELECT, h.url, IFNULL(b.title, h.title), h.rev_host, h.visit_count, h.last_visit_date, f.url, null,, b.dateAdded, b.lastModified, b.parent, null, h.frecency, b.position, b.type,, b.guid FROM moz_bookmarks b LEFT JOIN moz_places h ON = LEFT JOIN moz_favicons f ON h.favicon_id = WHERE b.parent = :parent ORDER BY b.position ASC


INSERT OR IGNORE INTO moz_places (url, title, rev_host, hidden, typed, frecency, guid) VALUES (:page_url, :page_title, :rev_host, :hidden, :typed, :frecency, GENERATE_GUID())


SELECT session FROM moz_historyvisits ORDER BY visit_date DESC


SELECT COUNT(1) AS numEntries FROM moz_formhistory


DELETE FROM addon WHERE internal_id=:internal_id


DELETE FROM moz_formhistory WHERE lastUsed <= :expireTime


SELECT a.item_id FROM moz_anno_attributes n JOIN moz_items_annos a ON = a.anno_attribute_id WHERE = :anno_name


UPDATE moz_bookmarks SET lastModified = :date WHERE id = :item_id


PRAGMA page_size

Other Threads

# of Telemetry reports


SQL string


SELECT domain, partial_data, complete_data FROM moz_classifier


DELETE FROM moz_classifier WHERE table_id=?1 AND chunk_id=?2


UPDATE moz_places SET frecency = ROUND(frecency * .975) WHERE frecency > 0


SELECT 1 FROM moz_places h WHERE url = ?1 AND last_visit_date NOTNULL


INSERT INTO expiration_notify (v_id, url, guid, visit_date, expected_results) SELECT, h.url, h.guid, v.visit_date, :limit_visits FROM moz_historyvisits v JOIN moz_places h ON = v.place_id WHERE (SELECT COUNT(*) FROM moz_places) > :max_uris AND visit_date < strftime(‘%s’,’now’,’localtime’,’start of day’,’-7 days’,’utc’) * 1000000 ORDER BY v.visit_date ASC LIMIT :limit_visits


UPDATE moz_places SET frecency = CALCULATE_FRECENCY(:page_id) WHERE id = :page_id


SELECT id, title, hidden, typed, guid FROM moz_places WHERE url = :page_url


ANALYZE moz_places


INSERT INTO moz_historyvisits (from_visit, place_id, visit_date, visit_type, session) VALUES (:from_visit, :page_id, :visit_date, :visit_type, :session)


UPDATE moz_places SET frecency = CALCULATE_FRECENCY(id) WHERE frecency < 0

Most of the statements above were executed during regular browser use and not during startup time. Perhaps the most interesting query in the list above is the main thread “PRAGMA page_size” query. It should be a quick read-only query, but in one report it took 4.888s on the main thread while another, more complex query took the exact same amount of time on a different thread. As it turns out, several main thread queries suffer from lock contention with queries on other threads (e.g. bug 722242).
Another interesting point is the factor of 10 or even 100 difference between MOZ_STORAGE_ASYNC_REQUESTS_MS and the slow SQL telemetry times. This is caused by slow SQL telemetry limiting itself to reporting only on prepared statements executed against known Firefox databases. We should also track slow SQL executed from dynamic strings, perhaps by reporting the state of the JavaScript stack instead of the SQL string itself which may contain privacy-sensitive argument values (bug 722368).


The obvious next steps are to 1) eliminate the SQL blind spots by tracking times for all SQL executed in the browser and 2) analyze startup data from the most recent Nightlies containing separate startup metrics. I will also be posting more startup analysis from the individual slow startup reports in this batch of data.

Hello World

I have been postponing blogging for a while now, but I would like to start 2012 on the right foot, so here goes :) I’m Vladan, a new-ish member of Mozilla’s Performance team, having started back in October 2011. I did my schooling at the University of Waterloo (undergrad) and University of Toronto (grad) and worked on a foreign exchange retail trading platform before joining Mozilla.

So far I’ve had the opportunity to work on a couple of interesting performance projects. My first major bug was to extend Telemetry performance reporting to report slow SQL statements (bug 699051). As a privacy precaution, Telemetry will only record the strings of prepared SQL statements executed against known Firefox DBs. So, for example, query arguments will never be recorded and neither will any SQL executed against extension-specific DBs.

For the time being, only Firefox Nightlies have the new slow SQL instrumentation. You can examine the data gathered by Telemetry by installing the (restartless!) about:telemetry extension and then navigating to the the “about:telemetry” URL. If any SQL statements in your browser needed more than 100ms to execute, you will see tables looking something like this:

Slow SQL Statements on Main Thread
Hits Avg. Time (ms) Statement
1 109 UPDATE moz_bookmarks SET lastModified = :date WHERE id = :item_id

Slow SQL Statements on Other Thread
Hits Avg. Time (ms) Statement
3 135 DELETE FROM moz_classifier WHERE table_id=?1 AND chunk_id=?2

In addition to work on SQLite performance, I am also currently working on a few other tasks such as reporting main thread hangs (bug 712109) and helping speed up Firefox shutdown (bugs 684513, 662444). I will try to blog about some of this work next week.