PLDI Sampler, part I

I’m back from PLDI in Dublin. The weather was nice and the Guinness was excellent. The research program was also very good; it seemed like this year there was a lot of variety, and the topic selection was different from previous years. I’m going to blog some notes on an arbitrary sample of talks, slanted toward things with likely Mozilla applications.


PetaBricks: A Language and Compiler for Algorithmic Choice

Jason Ansel, Cy Chan, Yee Lok Wong, Marek Olszewski, Qin Zhao, Alan Edelman, Saman Amarasinghe

PetaBricks aims to help programmers produce well-tuned, possibly parallel, high-performance algorithms. The initial focus has been on dense array algorithms, and also sorting as a basic example. In PetaBricks, the programmer can express directly a choice of algorithms and tuning parameters. For example, a programmer can write a sort procedure that offers a choice between insertion sort, n-way merge sort (with n as a tuning parameter), and quicksort. The PetaBricks compiler will then test the algorithm over a range of input sizes and determine the best algorithmic choices. For example, for sorting on an 8-way Xeon, PetaBricks found that it’s best to do a 2-way merge sort until the sublists are smaller than 1420 elements, then quicksort until the sublists are smaller than 600, and finish with insertion sort. Programmers can do this sort of thing by hand, but it’s a ton of work, and has to be redone for every platform.
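
To make the idea concrete, here is a rough C++ sketch (not PetaBricks syntax) of the kind of tunable hybrid sort described above. The two cutoff constants are the tuning parameters; the values are just the ones quoted for the 8-way Xeon, and an autotuner would pick different ones on each platform.

// A sketch of a hybrid sort with tunable cutoffs; kMergeCutoff and
// kInsertionCutoff are the parameters an autotuner would set per platform.
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

static const std::size_t kMergeCutoff = 1420;     // below this, stop merge sorting
static const std::size_t kInsertionCutoff = 600;  // below this, use insertion sort

static void InsertionSort(std::vector<int>& a, std::size_t lo, std::size_t hi) {
  for (std::size_t i = lo + 1; i < hi; ++i) {
    int v = a[i];
    std::size_t j = i;
    while (j > lo && a[j - 1] > v) { a[j] = a[j - 1]; --j; }
    a[j] = v;
  }
}

// Sorts a[lo, hi): 2-way merge sort on large ranges, quicksort on mid-size
// ranges, insertion sort on small ranges.
static void HybridSort(std::vector<int>& a, std::size_t lo, std::size_t hi) {
  std::size_t n = hi - lo;
  if (n <= 1) return;
  if (n < kInsertionCutoff) {
    InsertionSort(a, lo, hi);
  } else if (n < kMergeCutoff) {
    // One quicksort partition step (Lomuto), then recurse.
    int pivot = a[hi - 1];
    std::size_t store = lo;
    for (std::size_t i = lo; i + 1 < hi; ++i)
      if (a[i] < pivot) std::swap(a[i], a[store++]);
    std::swap(a[store], a[hi - 1]);
    HybridSort(a, lo, store);
    HybridSort(a, store + 1, hi);
  } else {
    // 2-way merge sort step.
    std::size_t mid = lo + n / 2;
    HybridSort(a, lo, mid);
    HybridSort(a, mid, hi);
    std::inplace_merge(a.begin() + lo, a.begin() + mid, a.begin() + hi);
  }
}

A PetaBricks-style autotuner would then re-time a routine like this with different cutoffs (and different merge fan-outs) on each target machine and keep the best-performing configuration.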

We at Mozilla have already done a bit of autotuning of TraceMonkey’s “magic number” parameters (such as the number of times to attempt to record a certain trace before “blacklisting” it), and Jesse and I have had discussions about autotuning other parameters in Mozilla. The PetaBricks paper probably has other ideas and insights we could apply in this domain. We could also consider using its ideas more directly to tune Mozilla data structures such as hash tables and string representations.

Staged Information Flow for JavaScript
Ravi Chugh, Jeff Meister, Ranjit Jhala, and Sorin Lerner

This work is designed to protect against attacks made by JavaScript added to web pages by third parties. For example, if site A includes a box with advertising content from site B, the content from B could include JavaScript that mounts some kind of attack or maliciously modifies the main content of A. Site A can in theory prevent this kind of attack by providing an information flow policy and then running a static analysis over the entire web page to verify that the policy is obeyed. For example, site A could specify that JS from other sites is not allowed to navigate away from the page by writing to document.location. Site A would then provide a static analysis (possibly as a Firefox addon) that analyzes the entire page before any JS executes.

The main problem is that such a static analysis would run very slowly on the client machine: there would be a delay of 10 seconds or more, which is clearly unacceptable. Enter the staged analysis: this paper describes how the author of site A can run a static analysis on just the code from site A, which determines a small set of things that still have to be checked about site B, the “residual” check. Then, when the user visits site A and JS is loaded from third parties, the residual check runs on the third-party JS in about 0.1 seconds. This seems potentially useful on the web, and it is also a great example of how to partition a static analysis into a long-running ahead-of-time analysis and a fast online analysis.
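
As a toy illustration of the staging structure only (not the paper's actual analysis), the sketch below treats the residual as a set of names that third-party code must not assign to, and the online stage as a crude textual scan for such assignments; a real residual check would operate on parsed code and track flows much more precisely.

// A toy illustration of the staging structure, not the paper's analysis.
// The "residual" here is just a set of names that third-party code must not
// assign to; both the representation and the textual scan are simplifications.
#include <iostream>
#include <set>
#include <string>

struct Residual {
  std::set<std::string> forbidden_writes;  // produced offline from site A's code
};

// Fast online check run at page-load time on third-party JS. A real check
// would work on parsed code, not raw text.
bool PassesResidualCheck(const Residual& r, const std::string& third_party_js) {
  for (const std::string& name : r.forbidden_writes) {
    if (third_party_js.find(name + " =") != std::string::npos) return false;
  }
  return true;
}

int main() {
  Residual r;
  r.forbidden_writes.insert("document.location");
  std::string ad_js = "document.location = 'http://evil.example/';";
  std::cout << (PassesResidualCheck(r, ad_js) ? "ok" : "policy violation") << "\n";
}

The point is just the division of labor: the expensive reasoning over site A's own code happens once, offline, and only the small residual has to be checked against whatever third-party JS actually shows up at load time.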

Efficiently and Precisely Locating Memory Leaks and Bloat
Gene Novark, Emery D. Berger, Benjamin G. Zorn

This paper is about, uh, detecting memory leaks and bloat. More specifically, the paper describes how to make a particular leak detection technique, staleness metering, run fast and produce no false positive leak reports.

As background, the “staleness” of a heap-allocated object is simply the length of time since that object was last touched (by reading from it or writing to it). Thus, objects with low staleness are unlikely to be leaks, while objects with high staleness are probably either leaks (if they will never be accessed again) or bloat (if they are just accessed rarely and there is a way to compact them or release them and recompute them later). The main problem with measuring staleness is that doing it exactly would slow down the program by 100x or so. The main solution that was explored previously was to sample a fraction (say, 1/1000) of memory accesses and estimate staleness from that. The main further problem is that if only 1/1000 of accesses are measured, then a lot of objects may appear stale when they are really not, just because sampling missed the accesses.

This paper proposes a new way of doing the sampling, “data-based” sampling, that produces no false positives. The key mechanism is to use the virtual memory system to protect heap pages so there will be a page fault when they are touched. When a page is touched, the current time is recorded as the estimated time of last touching for every object on the page. Thus, this time provides an underestimate of the staleness of every object on the page, and so objects with high estimated staleness must be stale in reality.
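
Here is a minimal, Linux-specific C++ sketch of that mechanism, assuming a hypothetical table that maps tracked pages to their metadata; Hound's real implementation is built into its allocator and handles details (async-signal safety, distinguishing genuine crashes, and so on) that this sketch glosses over.

// A minimal sketch of the mechanism: protect a page, and on the resulting
// fault, record "now" as the estimated last-touch time for everything on it.
// g_tracked_pages and PageInfo are hypothetical stand-ins for Hound's internals.
#include <csignal>
#include <cstdint>
#include <ctime>
#include <sys/mman.h>
#include <unistd.h>
#include <unordered_map>

struct PageInfo {
  time_t last_touch;  // an underestimate of the true last access time
};

static std::unordered_map<void*, PageInfo> g_tracked_pages;  // protected pages

static void* PageOf(void* addr) {
  uintptr_t mask = ~(uintptr_t)(sysconf(_SC_PAGESIZE) - 1);
  return (void*)((uintptr_t)addr & mask);
}

// Fault handler: a touched tracked page becomes "hot" again.
static void OnFault(int, siginfo_t* info, void*) {
  void* page = PageOf(info->si_addr);
  auto it = g_tracked_pages.find(page);       // not async-signal-safe; sketch only
  if (it == g_tracked_pages.end()) _exit(1);  // a genuine crash, not our fault
  it->second.last_touch = time(nullptr);      // estimated time of last touch
  mprotect(page, sysconf(_SC_PAGESIZE), PROT_READ | PROT_WRITE);  // stop tracking
}

// Start tracking a page-aligned, mmap'd page as cold: protect it so the
// next access is observed.
static void TrackPage(void* page) {
  g_tracked_pages[page] = PageInfo{time(nullptr)};
  mprotect(page, sysconf(_SC_PAGESIZE), PROT_NONE);
}

static void InstallHandler() {
  struct sigaction sa = {};
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_SIGINFO;
  sa.sa_sigaction = OnFault;
  sigaction(SIGSEGV, &sa, nullptr);
}

int main() {
  InstallHandler();
  long page_size = sysconf(_SC_PAGESIZE);
  char* p = (char*)mmap(nullptr, page_size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  TrackPage(p);  // cold: protected
  p[0] = 1;      // faults, records last_touch, unprotects the page
  return 0;
}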

Without any additional cleverness, what I just described would be really slow, because every heap access would incur a page fault. The additional cleverness is that the paper uses a heap allocator that groups objects from the same allocation site and of the same approximate allocation time (and thus age) on the same pages. The paper says that objects that have the same allocation site and age tend to have the same access pattern. Given that fact, it makes sense to classify pages adaptively as hot or cold, based on how recently they were touched. Hot pages are left unprotected and not tracked, so most heap accesses don’t page fault. And this shouldn’t cause the detector to miss many leaks, because the allocation strategy means a hot page probably contains mostly hot objects, which aren’t leaks. Conversely, cold pages are protected and so they are tracked closely, but this doesn’t cause many page faults because cold pages aren’t accessed often.
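
As a small, self-contained sketch of the adaptive part (with a hypothetical staleness threshold): a periodic pass re-protects pages that have not been touched recently and leaves recently touched pages unprotected.

// A sketch of the adaptive hot/cold classification pass. PageTable and
// kColdThresholdSeconds are hypothetical stand-ins; Hound's actual policy
// is described in the paper.
#include <ctime>
#include <sys/mman.h>
#include <unistd.h>
#include <unordered_map>

struct PageRecord {
  time_t last_touch;
};
using PageTable = std::unordered_map<void*, PageRecord>;

static const time_t kColdThresholdSeconds = 10;  // hypothetical tuning knob

// Run periodically: pages that have not been touched recently are treated as
// cold and re-protected, so their next access (if any) will be observed and
// will update last_touch. Recently touched pages stay unprotected, so the
// hot path takes no page faults.
void ReclassifyPages(PageTable& pages) {
  time_t now = time(nullptr);
  long page_size = sysconf(_SC_PAGESIZE);
  for (auto& entry : pages) {
    if (now - entry.second.last_touch >= kColdThresholdSeconds) {
      mprotect(entry.first, page_size, PROT_NONE);  // cold: watch it again
    }
  }
}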

The authors put all this together into a tool called Hound, which they tested on some standard benchmarks. Unfortunately, it doesn’t appear to be distributed online currently. Running a program under Hound is supposed to increase heap usage by only 10% or so, and execution time usually by only a few percent (although more in some cases), so it sounds like it would be great to be able to enable Hound for nightly test builds.

Comments

Comment from Leo
Time: July 26, 2009, 12:20 am

I like the PetaBricks project. HPC folks have been doing this for FFTs etc., so it’s cool to start to see it being packaged up. It was unclear to me why it needed to be language level instead of library level.

The staged information flow work seems representative of a lot of static analysis papers coming out: the analysis doesn’t work precisely enough (or have all the necessary files) because the languages are dynamic, so let’s offload a bunch to the runtime. I think my favorite form of this was the work a couple years ago on speeding up Python by doing such predictions. There’s an interesting space for doing global load-time optimizations because of this, even in languages like C. Staging (esp. with a range control for precision vs. static guarantees), concurrency support, adaptivity (e.g., for IDE integration), and maybe parallelization seem to be causing a lot of traditional algorithms to be revisited.

The Hound paper sounds fun. Tainting pages or objects is a very old idea — I’m guessing their contribution is an allocation strategy that is an improvement upon similar benefits from generational collectors?

Comment from Leo
Time: July 26, 2009, 12:25 am

Ah, I meant to point out a fun paper by Michael Bond you might have seen: he annotates runtime objects with a single bit to enable probabilistic reasoning about their context (e.g., allocation site, last modification site, whatever you want). I’m wondering if that might help the Hound approach.
