PLDI Sampler, part I
I’m back from PLDI in Dublin. The weather was nice and the Guinness was excellent. The research program was also very good: this year there seemed to be a lot of variety, with a different topic selection from previous years. I’m going to blog some notes on a somewhat arbitrary sample of talks, slanted toward things with likely Mozilla applications.
PetaBricks: A Language and Compiler for Algorithmic Choice
Jason Ansel, Cy Chan, Yee Lok Wong, Marek Olszewski, Qin Zhao, Alan Edelman, Saman Amarasinghe
PetaBricks aims to help programmers produce well-tuned, possibly parallel, high-performance algorithms. The initial focus has been on dense array algorithms, and also sorting as a basic example. In PetaBricks, the programmer can express directly a choice of algorithms and tuning parameters. For example, a programmer can write a sort procedure that offers a choice between insertion sort, n-way merge sort (with n as a tuning parameter), and quicksort. The PetaBricks compiler will then test the algorithm over a range of input sizes and determine the best algorithmic choices. On an 8-way Xeon, for instance, PetaBricks found that it’s best to do a 2-way merge sort until the sublists are smaller than 1420 elements, then quicksort until the sublists are smaller than 600, and finish with insertion sort. Programmers can do this sort of thing by hand, but it’s a ton of work, and it has to be redone for every platform.
We at Mozilla have already done a bit of autotuning of TraceMonkey’s “magic number” parameters (such as the number of times to attempt to record a certain trace before “blacklisting” it), and Jesse and I have had discussions about autotuning other parameters in Mozilla. The PetaBricks paper probably has other ideas and insights we could apply in this domain. We could also consider using its ideas more directly to tune Mozilla data structures such as hash tables and string representations.
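A minimal sketch of what tuning a single magic number could look like, assuming you have a representative benchmark to run (the `run_tracemonkey_benchmark` name in the comment is made up; PetaBricks itself uses a much more sophisticated search than this exhaustive sweep):

```python
import time

def autotune(run_workload, candidates):
    """Try each candidate value for a tunable parameter and return
    the one that makes the workload run fastest. run_workload(value)
    is assumed to run a representative benchmark with the parameter
    set to `value`."""
    best_value, best_time = None, float("inf")
    for value in candidates:
        start = time.perf_counter()
        run_workload(value)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_value, best_time = value, elapsed
    return best_value

# Hypothetical use: pick how many recording attempts to allow
# before blacklisting a trace (run_tracemonkey_benchmark is made up):
#   best = autotune(lambda n: run_tracemonkey_benchmark(max_attempts=n),
#                   candidates=range(1, 16))
```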
Staged Information Flow for JavaScript
Ravi Chugh, Jeff Meister, Ranjit Jhala, and Sorin Lerner
The setting is a web page (call it site A) that includes JavaScript from a third party (site B), where site A’s author wants a static analysis to check that the included code can’t do anything bad. The main problem is that such a static analysis would run very slowly on the client machine: there would be a delay of 10 seconds or more, which is clearly unacceptable. Enter the staged analysis: this paper describes how the author of site A can run a static analysis on just the code from site A, which determines a small set of things that have to be checked about site B, the “residual” check. Then, when the user goes to site A and JS is loaded from third parties, the residual check is run on the third-party JS in about 0.1 seconds. This seems potentially useful on the web and is also a great example of how to partition a static analysis into a long-running ahead-of-time analysis and a fast online analysis.
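The paper’s actual analysis tracks information flow through the JavaScript; purely as a toy illustration of the two-stage structure (all names here are hypothetical), imagine a residual that is just a set of identifiers the third-party code must not mention:

```python
import re

def analyze_host_page(host_policy):
    """Ahead-of-time stage, run offline by site A's author: a
    stand-in for the expensive whole-program analysis. Its output
    is the residual: the small set of facts that still need to be
    checked against not-yet-seen third-party code."""
    return set(host_policy["forbidden_identifiers"])

def residual_check(residual, third_party_source):
    """Online stage, run in the browser when the third-party JS
    actually arrives: verify only the residual facts."""
    used = set(re.findall(r"[A-Za-z_$][\w$]*", third_party_source))
    return residual.isdisjoint(used)

residual = analyze_host_page({"forbidden_identifiers": ["cookie", "localStorage"]})
print(residual_check(residual, "var ad = render(banner);"))  # True
print(residual_check(residual, "send(document.cookie);"))    # False
```

The expensive part (here trivially faked) runs once, offline; the per-page-load part touches only the residual, which is what makes the 0.1-second online check plausible.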
Efficiently and Precisely Locating Memory Leaks and Bloat
Gene Novark, Emery D. Berger, Benjamin G. Zorn
This paper is about, uh, detecting memory leaks and bloat. More specifically, the paper describes how to make a particular leak detection technique, staleness metering, run fast and produce no false positive leak reports.
As background, the “staleness” of a heap-allocated object is simply the length of time since that object was last touched (by reading from it or writing to it). Thus, objects with low staleness are unlikely to be leaks, while objects with high staleness are probably either leaks (if they will never be accessed again) or bloat (if they are just accessed rarely and there is a way to compact them or release them and recompute them later). The main problem with measuring staleness is that doing it exactly would slow down the program by 100x or so. The main solution that was explored previously was to sample a fraction (say, 1/1000) of memory accesses and estimate staleness from that. The main further problem is that if only 1/1000 of accesses are measured, then a lot of objects may appear stale when they are really not, just because sampling missed the accesses.
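A tiny worked example of the false-positive problem (the 1-in-1000 sample is made deterministic here purely for illustration):

```python
def staleness(now, last_access):
    """Staleness of an object: time since it was last touched."""
    return now - last_access

now = 10000
accesses = list(range(0, 10000, 500))  # touched every 500 ticks: live
exact = staleness(now, max(accesses))  # 500, so clearly not a leak

# A 1-in-1000 access sample (deterministic here, for illustration)
# happens to catch only the very first access...
sampled = accesses[::1000]
estimate = staleness(now, max(sampled))  # 10000
# ...so the object looks maximally stale: a false positive.
```

Because sampling can only miss accesses, the sampled estimate can only err in the direction of looking *more* stale, which is exactly the wrong direction for a leak detector.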
This paper proposes a new way of doing the sampling, “data-based” sampling, that produces no false positives. The key mechanism is to use the virtual memory system to protect heap pages so that there is a page fault when they are touched. When a page is touched, the current time is recorded as the estimated time of last touch for every object on the page. Thus, this time provides an underestimate of the staleness of every object on the page, and so objects with high estimated staleness must be stale in reality.
Without any additional cleverness, what I just described would be really slow, because every heap access would incur a page fault. The additional cleverness is that the paper uses a heap allocator that groups objects from the same allocation site and of the same approximate allocation time (and thus age) on the same pages. The paper says that objects that have the same allocation site and age tend to have the same access pattern. Given that fact, it makes sense to classify pages adaptively as hot or cold, based on how recently they were touched. Hot pages are left unprotected and not tracked, so most heap accesses don’t page fault. And this shouldn’t cause the detector to miss many leaks, because the allocation strategy means a hot page probably contains mostly hot objects, which aren’t leaks. Conversely, cold pages are protected and so they are tracked closely, but this doesn’t cause many page faults because cold pages aren’t accessed often.
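A toy model may make the underestimate property and the hot/cold classification concrete (all names here are made up; the real system uses page protection and a fault handler rather than explicit calls):

```python
class PageTracker:
    """Toy model of data-based sampling via page protection: one
    last-touch time is kept per page, not per object, so every
    object's estimated staleness is an *underestimate* of its true
    staleness. High estimated staleness therefore implies the
    object really is stale."""
    PAGE = 4096

    def __init__(self, hot_window):
        self.hot_window = hot_window
        self.page_touch = {}  # page number -> last recorded touch

    def fault(self, addr, now):
        # Stands in for the page-fault handler: record the touch
        # time for the whole page (and unprotect it).
        self.page_touch[addr // self.PAGE] = now

    def is_hot(self, addr, now):
        # Pages touched within hot_window are classified hot and
        # left unprotected, so their accesses don't fault at all.
        last = self.page_touch.get(addr // self.PAGE)
        return last is not None and now - last < self.hot_window

    def estimated_staleness(self, addr, now):
        return now - self.page_touch.get(addr // self.PAGE, 0)

t = PageTracker(hot_window=50)
t.fault(0x1000, now=0)   # object A lives on page 1
t.fault(0x1800, now=90)  # object B shares A's page, touched later
# A's true staleness at time 100 is 100, but the page-level
# estimate is only 10: an underestimate, never a false positive.
print(t.estimated_staleness(0x1000, now=100))  # 10
print(t.is_hot(0x1000, now=100))               # True
```

Note how the age-segregated allocation policy earns its keep here: the error in A’s estimate is harmless precisely because B, sharing the page, has a similar access pattern.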
The authors put all this together into a tool called Hound, which they tested on some standard benchmarks. Unfortunately, it doesn’t appear to be distributed online currently. Running a program under Hound is supposed to increase heap usage by only 10% or so and execution time by usually only a few percent (although more in some cases), so it sounds like it would be great to be able to enable Hound for nightly test builds.