After talking to a couple of people at all-hands, it became clear that writing your own profiler was a popular activity. (Jeff Muizelaar informed me that last year, the pet project was heap analyzers. What’s next for 2012?) A short, non-exhaustive list:
- Patrick Walton’s Piranha profiler for Android
- Benoit Girard’s Simple Profiling System
- Luke Wagner’s measure profiler
- The ever-venerable jprof
…and so forth. So it seemed like getting interested parties together to talk about One Profiler to Rule Them All would be good. And it worked; we had probably 20 people come to the BoF. Below is my summary/recollection of what we discussed. All the omissions and/or misrepresentations in the notes are my own; please leave a comment if you feel I left something out or need to describe something better.
What does the ideal profiler look like?
- Low overhead, both when profiling and not
- Collects more-or-less complete call stacks (this is fairly easy everywhere except x86/Linux)
- Built with the browser itself, not an external process or loaded via LD_PRELOAD or similar; this means we can ship it with the browser and diagnose problems in the field
- Pretty pictures for viewing collected data. Who doesn’t love pretty pictures, right?
Bas Schouten also pointed out that it might be much more efficient to just buy suitable profiling technology from a vendor; there was some skepticism that a profiler fulfilling the desiderata existed, though.
Sprinkling annotations all over the tree sounded like a tedious process. Somebody pointed out that Chrome does this, though they only place annotations at “entry points” for modules, so you might have one entry point for layout, one for graphics, etc. That way, given a profile on some random performance bug, you can at least tell who should be exploring the bug further. The overhead is minimal, since you’re not unwinding and a handful of RAII objects isn’t going to cost much. Granted, this doesn’t do much for the folks who need to dig deeper, but perhaps we can have other tools for that.
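To make the entry-point idea concrete, here is a minimal sketch of what such an RAII annotation could look like. The names (`AutoLabel`, `LabelStack`, `CurrentLabel`) are made up for illustration and are not Chrome's or Mozilla's actual API; the sketch also assumes a fixed-depth, per-thread label stack that a sampler would read from.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

// Hypothetical cap on nesting depth; deeper labels are counted but not stored.
constexpr std::size_t kMaxDepth = 64;

// Per-thread stack of module labels ("layout", "graphics", ...).
struct LabelStack {
    const char* labels[kMaxDepth] = {};
    std::size_t depth = 0;
};

inline LabelStack& CurrentStack() {
    thread_local LabelStack stack;
    return stack;
}

// RAII annotation: pushes a label on construction, pops it on destruction.
// The cost is a store and an increment, with no unwinding involved.
class AutoLabel {
public:
    explicit AutoLabel(const char* label) {
        LabelStack& s = CurrentStack();
        if (s.depth < kMaxDepth) {
            s.labels[s.depth] = label;
        }
        ++s.depth;
    }
    ~AutoLabel() {
        LabelStack& s = CurrentStack();
        assert(s.depth > 0);
        --s.depth;
    }
    AutoLabel(const AutoLabel&) = delete;
    AutoLabel& operator=(const AutoLabel&) = delete;
};

// What a sampler would record for this thread: the innermost stored label,
// or nullptr if no annotated entry point is active.
inline const char* CurrentLabel() {
    const LabelStack& s = CurrentStack();
    if (s.depth == 0) {
        return nullptr;
    }
    const std::size_t top = s.depth < kMaxDepth ? s.depth : kMaxDepth;
    return s.labels[top - 1];
}

}  // namespace sketch
```

A module entry point would then start with something like `sketch::AutoLabel label("layout");`, and every sample taken while that object is alive gets attributed to layout.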
There was some discussion of unwinding instead of annotations. Unwinding is reasonably cheap when using libunwind and caching the decoded unwind information; it’s even cheaper when you can just walk frame pointers. The only wrinkle is you aren’t guaranteed to have frame pointers or unwind information on x86/Linux, so unwinding is not generally doable there. Sometimes assembly coders also forget they need to insert unwind annotations, though most if not all of the problematic code in glibc, at least, has been so annotated. Taras Glek suggested that we could insert RAII objects before calling out to code that we don’t control and make those objects record something about frame/stack pointers so that we could unwind around third-party code if necessary. I don’t believe we came to a consensus on using unwinding instead of or in addition to annotations.
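For a sense of why walking frame pointers is so cheap, here is a rough sketch of the loop involved. It assumes the conventional layout where each frame begins with the caller's saved frame pointer followed by the return address, and a stack that grows downward; `FrameRecord` and `WalkFramePointers` are names invented for this example. A real profiler would get the starting frame pointer and the stack bounds from the sampled thread's registers and the platform's thread APIs, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

// Assumed frame layout: saved caller frame pointer, then return address.
struct FrameRecord {
    const FrameRecord* caller;
    const void* return_address;
};

// Follows the chain of saved frame pointers starting at `fp` and writes each
// return address to `out`. Returns the number of frames recorded.
//
// The walk stops at a null frame pointer, after `max_frames` entries, or at
// any frame pointer that is misaligned, outside [stack_lo, stack_hi), or not
// strictly above the previous one. Those checks are what keep the walker from
// chasing garbage when some code on the stack was built without frame pointers.
inline std::size_t WalkFramePointers(const FrameRecord* fp,
                                     std::uintptr_t stack_lo,
                                     std::uintptr_t stack_hi,
                                     const void** out,
                                     std::size_t max_frames) {
    assert(out != nullptr || max_frames == 0);
    std::size_t count = 0;
    std::uintptr_t previous = 0;
    while (fp != nullptr && count < max_frames) {
        const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(fp);
        const bool in_bounds = addr >= stack_lo && addr < stack_hi &&
                               stack_hi - addr >= sizeof(FrameRecord);
        const bool aligned = addr % alignof(FrameRecord) == 0;
        const bool progresses = count == 0 || addr > previous;
        if (!in_bounds || !aligned || !progresses) {
            break;
        }
        out[count++] = fp->return_address;
        previous = addr;
        fp = fp->caller;
    }
    return count;
}

}  // namespace sketch
```

Each step is two loads and a few comparisons, with no unwind tables to look up or decode. The flip side is that the walk simply stops at the first frame that doesn't follow the convention, which is where something like the RAII markers suggested above would have to take over.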
We didn’t talk about displaying the collected data. Drawing pretty, understandable pictures is hard.