Studying Fennec Performance

Working on Fennec performance on the N810 has been very educational. I have been learning more and more about performance profiling on crappy platforms. I define a platform as crap if it has poor development tools, a limited OS or other significant limitations. Linux is a crappy platform in this case because it’s running on ARM, where oprofile barely does anything and there are no other performance tools for the N810 (more modern ARM CPUs should have user-space perf counters and be more useful for instrumentation).

JSD Secret Sauce for JS Optimization

In general there is a misconception that implementing stuff in JavaScript will result in slower code than doing the same in C++. That may be true if the code is implemented in the exact same manner. But in real life the expressiveness and safety of a high-level language like JavaScript permits algorithmic optimizations that would often not be realistic to do in C++ because of time/safety constraints.

So how does one figure out what’s slow in JS? timeless suggested that I check out the JavaScript Debugging API. Using the API and a small hack in SpiderMonkey (JSD doesn’t expose fast DOM calls), I was able to hook into chrome code to get a timed trace of JavaScript functions being run.

Once I had a trace it was relatively easy to do the “do not call this slow thing all the time” dance (aka optimize code). I collected that work in bug 470116. Last I checked there was relatively little room for optimization left on the JS side of Fennec, so then I went to look at what’s lurking in C++.

PS. Firebug is a popular consumer of the JSD API for those times when one isn’t willing to write JS components to figure out why something is slow.

C++ Is Harder

I’ve had some success with inserting probes into C++ code. I would find interesting code by running oprofile on the desktop (while doing things that I felt were slow on the N810). Oprofile would then provide me with a callgraph, which I would visualize with this awesome little script. Then I would stick MeasurerOfTime timing blocks into interesting “hot” code and hope that I would learn something useful.
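To give a feel for what a timing block can look like, here is a minimal sketch of a scope-based timer. It is not the actual MeasurerOfTime code: the names ScopedTimer, ProbeStats, probeTable, dumpProbes and sumTo are all made up for illustration, and the sketch assumes a C++11 compiler and single-threaded use.

```cpp
#include <chrono>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical accumulator: total elapsed time and call count per probe label.
struct ProbeStats {
    long long totalMicros = 0;
    long calls = 0;
};

// Global table of probes. Not thread-safe; fine for a quick experiment only.
inline std::map<std::string, ProbeStats>& probeTable() {
    static std::map<std::string, ProbeStats> table;
    return table;
}

// Starts timing on construction, records the elapsed time on destruction,
// so a single local variable times the rest of the enclosing scope.
class ScopedTimer {
public:
    explicit ScopedTimer(const char* label)
        : label_(label), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        auto elapsed = std::chrono::steady_clock::now() - start_;
        ProbeStats& stats = probeTable()[label_];
        stats.totalMicros +=
            std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
        stats.calls += 1;
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    const char* label_;
    std::chrono::steady_clock::time_point start_;
};

// Prints one line per probe: label, number of calls, accumulated microseconds.
void dumpProbes() {
    for (const auto& entry : probeTable()) {
        std::printf("%s: %ld calls, %lld us total\n", entry.first.c_str(),
                    entry.second.calls, entry.second.totalMicros);
    }
}

// Stand-in for a "hot" function: adds up the integers 1..n.
long sumTo(long n) {
    ScopedTimer timer("sumTo");
    long total = 0;
    for (long i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```

Dropping a ScopedTimer at the top of a suspicious function and calling dumpProbes() at shutdown gives per-function totals. The probe has its own overhead (a clock read and a map lookup), so very small functions will look slower than they really are.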

This got me thinking. Wouldn’t it be nice if there existed a JSD for C++? It’d be cool to inspect the C++ callgraph just like one does for JS. It seems like it would help on platforms that aren’t gcc and can’t inject tracing code via -finstrument-functions. Even -finstrument-functions is of limited use due to the pain of looking up symbols in shared libraries. Stay tuned.
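For reference, this is roughly what the -finstrument-functions hooks look like. It is a minimal sketch assuming a GCC-compatible compiler, with the code under study built with -finstrument-functions; the g_depth and g_enterCount counters are my own illustrative names, and the sketch is not thread-safe. Note that it only prints raw addresses: turning those into symbol names, especially across shared libraries, is exactly the painful part mentioned above.

```cpp
#include <cstdio>

// Illustrative bookkeeping (hypothetical names): nesting depth and total calls.
static int g_depth = 0;
static long g_enterCount = 0;

extern "C" {

// The hooks must not be instrumented themselves, or they would recurse forever.
void __cyg_profile_func_enter(void* fn, void* callSite)
    __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void* fn, void* callSite)
    __attribute__((no_instrument_function));

// Called on entry to every instrumented function.
void __cyg_profile_func_enter(void* fn, void* callSite) {
    // Indent by depth so the output reads as a crude call tree.
    std::fprintf(stderr, "%*s> %p (called from %p)\n", g_depth * 2, "", fn, callSite);
    ++g_depth;
    ++g_enterCount;
}

// Called on exit from every instrumented function.
void __cyg_profile_func_exit(void* fn, void* callSite) {
    (void)callSite;  // unused in this sketch
    if (g_depth > 0) {
        --g_depth;
    }
    std::fprintf(stderr, "%*s< %p\n", g_depth * 2, "", fn);
}

}  // extern "C"
```

Adding timestamps to each line would turn this into a timed trace, at the cost of slowing down every single function call.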

Measuring Progress

The worst part of doing optimizations is knowing that some time in the future an innocent programmer will slightly change some seemingly innocent code and things will no longer happen quickly. Short of having every single patch policed by the people who previously optimized the code in question, there is only one thing one can do: performance tests.

Fennecmark is a benchmark for measuring responsiveness of the Fennec features that I worked on most: panning, zooming and lag during pageload. I blogged about it before. Since then Joel Maher has gotten Fennecmark to run automatically and produce results on the graph server. I think we should be logging more numbers (Tpan, Tzoom), but it’s an excellent first step in monitoring performance regressions.


  1. Curtis Bartley

    I’ve been using source-level instrumentation to get callgraphs out of Firefox. This approach is deficient in all sorts of ways but it does save you the trouble of looking up symbols. For me the real problem is not getting decent data, it’s simply being able to visualize the data that I can collect.

  2. Hi Tara,

    I want to measure the time needed to load a site (page).
    Can you help me out?

  3. > Linux is a crappy platform in this case because it’s running on ARM where oprofile barely does anything and there are no other performance tools for N810

    The best thing in optimizing code is understanding what it does. Oprofile can show the bottleneck functions for the whole system. However, trying to take callgraphs on a slow system like the N810 is not really that useful or much fun, so I typically use valgrind / callgrind / kcachegrind on the desktop to analyze how a particular process’ code arrives at and uses the bottleneck functions that Oprofile highlighted.

    As long as the code logic is the same on the two platforms, this works really well.

    Strace and functracer also give interesting information, although they don’t directly tell you about performance: