The Google Chrome team recently extended their V8 benchmark suite with five new benchmarks, and renamed it “Octane”. The descriptions page says the following about the new benchmarks.
I haven’t looked closely at these benchmarks, but the descriptions are very promising. Hennessy and Patterson’s classic Computer Architecture lists the following five categories of benchmarks, from best to worst.
- Real applications.
- Modified applications (e.g. with I/O removed to make it CPU-bound).
- Kernels (key fragments of real applications).
- Toy benchmarks (e.g. sieve of Eratosthenes).
- Synthetic benchmarks (code created artificially to fit a profile of particular operations, e.g. Dhrystone).
I’m not saying these benchmarks are perfect (for example, there’s arguably too much focus on games, and the use of the proprietary Mandreel instead of the open-source Emscripten is unfortunate), but they certainly pass the initial “sniff test”, much more so, in fact, than the existing eight benchmarks that have been carried over from V8. [Update: Alon Zakai has written a more detailed critique of the new benchmarks, particularly Mandreel and Box2DWeb.] I’ve listed the descriptions of the existing benchmarks below; my annotations are in square brackets.
- Richards. OS kernel simulation benchmark, originally written in BCPL by Martin Richards (539 lines). [BCPL is a programming language that predates C. Martin Richards is a former colleague of mine, and I remember him saying in 2004 that its main use these days is running the control systems for some ancient car factories in South America!]
- Deltablue. One-way constraint solver, originally written in Smalltalk by John Maloney and Mario Wolczko (880 lines). [Ported Smalltalk code?!]
- Raytrace. Ray tracer benchmark based on code by Adam Burmister (904 lines).
- Regexp. Regular expression benchmark generated by extracting regular expression operations from 50 of the most popular web pages (1761 lines). [A kernel, but at least it comes from real websites. However, the results of the regexp invocations are not used, which makes it easy to game.]
- NavierStokes. 2D NavierStokes equations solver, heavily manipulates double precision arrays. Based on Oliver Hunt’s code (387 lines). [This was added to V8 only a few months ago.]
- Crypto. Encryption and decryption benchmark based on code by Tom Wu (1698 lines).
- Splay. Data manipulation benchmark that deals with splay trees and exercises the automatic memory management subsystem (394 lines). [The data inserted into this splay tree is completely synthetic, which greatly limits its usefulness as a benchmark.]
These are all much smaller than the new benchmarks. Also, Regexp is the only one that is clearly based on code commonly run in web browsers.
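To make the Regexp annotation above concrete, here is a minimal sketch of why unused results matter. It is not the actual benchmark code: the patterns, inputs and function names are all made up for illustration. The first variant throws away every match, so an engine that can prove the calls have no observable effect is, in principle, free to skip the work. The second folds each match into a checksum that a harness could verify, which forces the matching to actually happen.

```typescript
// Hypothetical patterns and inputs, chosen only for illustration. The real
// Regexp benchmark replays operations extracted from popular web pages.
const patterns: RegExp[] = [/\d+/, /^https?:\/\//, /[A-Z][a-z]+/];
const inputs: string[] = ["order 66", "https://example.com", "Hello world", "none"];

// Variant 1: results are discarded. Nothing outside this function can tell
// whether the matching was really performed.
function runDiscardingResults(): void {
  for (const re of patterns) {
    for (const s of inputs) {
      re.exec(s);
    }
  }
}

// Variant 2: every successful match contributes its position and length to
// a checksum, so the work has an observable result.
function runWithChecksum(): number {
  let checksum = 0;
  for (const re of patterns) {
    for (const s of inputs) {
      const m = re.exec(s);
      if (m !== null) {
        checksum += m.index + m[0].length;
      }
    }
  }
  return checksum;
}
```

A harness would compare the value returned by `runWithChecksum()` against a known-good value after each timed run. Whether any particular engine actually exploits the first variant is a separate question; the sketch only shows why the second is harder to game.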
In fact, these new benchmarks are so much better than the old benchmarks that I wish the Google Chrome team had instead released them as a separate benchmark suite. That would have allowed the old benchmarks to gradually move into a well-earned retirement (along with the equally venerable and flawed SunSpider). I guess there’s nothing stopping people from effectively doing this by running just the new benchmarks. Perhaps this could be called the “Octane minus V8” benchmark suite…