Over the last couple of weeks, I’ve been working with Jan Bambas’s visual event tracer and using it to analyze why we’re slower than the stock Android browser on Eideticker’s New York Times pageload test.
The first order of business was to add trace-saving capabilities to the tracer. With this change, I could capture traces on my Android device and view them on my desktop, which is much more convenient than looking at them on my phone. (Especially because Simile Timeline, which the about:timeline extension uses, is difficult to use on mobile and slow in Firefox, so it needs all the CPU power it can get. If any JS hackers out there have a replacement timeline widget recommendation, I am all ears!)
The second order of business was to modify Firefox to start and stop tracing automatically, rather than requiring the user to push buttons in about:timeline. Pushing buttons works fine on desktop, not so much on automated tests on mobile devices. I experimented with way too many fully general solutions before settling on the straightforward path: modifying nsDocShell::LoadURI to start and stop the tracer when certain URIs were fetched.
Now the tracer would automatically record profiles, on to viewing them! I modified Jan’s about:timeline extension to function properly in a browser not built with event tracing support. I also added the ability to use a trace saved in a file, rather than recording one in a browser. You can see the changes I made in my about:timeline github repository.
With all those changes in hand, we can view pretty pictures of what’s going on when the browser runs Eideticker tests. First, a normal Firefox for Android.
That almost-4-second delay you see there is the cache getting opened. Yes, that’s happening on the main thread and yes, it’s blocking everything else from moving forward. This is a great illustration of why we are rewriting the network cache. If the cache is so slow, what happens if we disable the cache?
It’s hard to tell from the picture, but if you looked at the Eideticker log of network requests, you’d see that everything completes somewhere around a 0.5-1 second faster (!). Why doesn’t it speed up more than that, since we’ve removed all that cache blocking by disabling the cache? Not sure yet, although you can see that cache entries are still getting opened. Also, you can see highlighted in the picture that the first network request takes about 250ms to start executing from the time that it’s fired off. That also needs to be investigated (and fixed!).
What about this new cache rewrite, how does it fare? Thought you’d never ask.
Again, you have to check out the HTTP request log, but this is still another 0.3 seconds or so faster than the uncached case that we see above. So the new network cache is a clear win in this instance. It’s worth pointing out that the network request delays are still present here and are taking slightly longer than the cache-disabled case. Variance in profiles? Effects of the new cache? Not sure yet!
I’m still figuring out my way around the tracer, but it’s a useful tool to peer into the Firefox network stack. It would be great if other subsystems (images, CSS, layout) would hook into the event tracer so we could have visibility into what they’re doing, too. The API is pretty simple and causes no runtime overhead in our usual release builds.