OSQ: Dynamic language optimization
Random bits from the second half of yesterday’s OSQ events:
Bill McCloskey gave a quick talk on his experience doing some optimizations to Python. (From a VM optimizer’s point of view, Python and JS are very similar.) He had previously used whole-program type inference to drive a translation to C, and in this way got performance about a factor of 2 or 3 worse than C, which is excellent for a dynamic language. But he said that the inference process was kind of slow, so he wanted to try some lightweight techniques to see how far they would go.
First, he tried something he called “dictionary hints”, in which he used a static analysis to guess the address of the result of a hash table property lookup (as in ‘o.f’). He modified the interpreter to look there first, and only if that fails do the general hash table lookup. That gave him a nice speedup (20%+). IIUC, dictionary hints are the static analogue of property caching, as done in SpiderMonkey (and Nitro and V8, I believe).
Bill also tried replacing Python’s basic value representation of a pointer to the heap with a tagged value that keeps integers inline, thus avoiding the extra memory load and a lot of extra memory allocations. He got a big speedup (3-4x) from that. SpiderMonkey also already uses this technique, as do Nitro and V8.
In summary: he found these rough levels of run time relative to C in practice:
- 2-3x for Python compiled by Dynamite (his type inference-based compiler)
- 4-10x for Python run with psyco (a dynamic specializer somewhat related to TraceMonkey)
- 20x for normal Python
I haven’t looked at JS vs. C lately, but I believe TM is currently on the order of 4-10x vs. C, like pysco. In theory, dynamic compilers should be generate code that runs at least as fast as static compilers, because they have strictly more information. But in practice, static compilers seem almost always to be faster. In this case, Bill’s static compiler can generate very fast code, but I think it takes a few minutes to do the type inference, so it would be beneficial in a dynamic compiler only for a very long-running program. (If you have 2 cores, the VM could start the compiler at the same time the program starts running, and as long as the program runs for several minutes, it would eventually speed up greatly.) But at least on the Web, programs that need even 1 minute of CPU time are still rare. But this may change.