Benchmarks

Octane minus V8

The Google Chrome team recently extended their V8 benchmark suite with five new benchmarks, and renamed it “Octane”. The descriptions page says the following about the new benchmarks.

pdf.js. Mozilla’s PDF Reader implemented in JavaScript. It measures decoding and interpretation time (33,056 lines).
Mandreel. Runs the 3D Bullet Physics Engine ported from C++ to JavaScript via Mandreel (277,377 lines).
GB Emulator. Emulate the portable console’s architecture and runs a demanding 3D simulation, all in JavaScript (11,097 lines).
Code loading. Measures how quickly a JavaScript engine can start executing code after loading a large JavaScript program, social widget being a common example. The source for test is derived from open source libraries (Closure, jQuery) (1,530 lines).
Box2DWeb. Based on Box2DWeb, the popular 2D physics engine originally written by Erin Catto, ported to JavaScript. (560 lines, 9000+ de-minified).

I haven’t looked closely at these benchmarks, but the descriptions are very promising. Hennessy and Patterson’s classic Computer Architecture lists the following five categories of benchmarks, from best to worst.

Real applications.
Modified applications (e.g. with I/O removed to make it CPU-bound).
Kernels (key fragments of real applications).
Toy benchmarks (e.g. sieve of Eratosthenes).
Synthetic benchmarks (code created artificially to fit a profile of particular operations, e.g. Dhrystone).

Four of the five new Octane benchmarks are category 1 or perhaps 2 (some have minor modifications to make them benchmarkable). “Code loading” is the only exception; it sounds like a kernel. Furthermore, most of these benchmarks are large (look at those line counts!) and represent cutting-edge JavaScript code that real websites and browsers are using today (pdf.js! jQuery! Game engines!)

I’m not saying these benchmarks are perfect (for example, there’s arguably too much focus on games, and the use of the proprietary Mandreel instead of the open source Emscripten is unfortunate), but they certainly pass the initial “sniff test”. Much more so, in fact, than the existing eight benchmarks that have been carried over from V8. [Update: Alon Zakai has written a more detailed critique of the new benchmarks, particularly Mandreel and Box2DWeb.] I’ve listed the descriptions of the eight older benchmarks below; my annotations are in square brackets.

Richards. OS kernel simulation benchmark, originally written in BCPL by Martin Richards (539 lines). [BCPL is a programming language that predates C. Martin Richards is a former colleague of mine, and I remember him saying in 2004 that its main use these days is running the control systems for some ancient car factories in South America!]
Deltablue. One-way constraint solver, originally written in Smalltalk by John Maloney and Mario Wolczko (880 lines). [Ported Smalltalk code?!]
Raytrace. Ray tracer benchmark based on code by Adam Burmister (904 lines).
Regexp. Regular expression benchmark generated by extracting regular expression operations from 50 of the most popular web pages (1761 lines). [A kernel, but at least it comes from real websites. However, the results of the regexp invocations are not used, which makes it easy to game.]
NavierStokes. 2D NavierStokes equations solver, heavily manipulates double precision arrays. Based on Oliver Hunt’s code (387 lines). [This was added to V8 only a few months ago.]
Crypto. Encryption and decryption benchmark based on code by Tom Wu (1698 lines).
Splay. Data manipulation benchmark that deals with splay trees and exercises the automatic memory management subsystem (394 lines). [The data inserted into this splay tree is completely synthetic, which greatly limits its usefulness as a benchmark.]
EarleyBoyer. Classic Scheme benchmarks, translated to JavaScript by Florian Loitsch’s Scheme2Js compiler (4684 lines). [“Classic” here means “old”. Also, auto-compiled Scheme code?!]

These are all much smaller. Also, Regexp is the only one that is clearly based on code commonly run in web browsers.
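To make the Regexp annotation concrete, here is a minimal sketch of why discarded results matter. It is not the actual Regexp benchmark: the patterns, inputs, and function names are all made up for illustration. The first loop throws its match results away, so an engine that could prove the calls have no observable effect would be free to skip the work; the second folds every result into a checksum that the harness can verify.

```javascript
// Hypothetical patterns and inputs, invented for this sketch only.
const cases = [
  { re: /^ba/, input: "banana" },
  { re: /(\d+)-(\d+)/, input: "range 10-20" },
  { re: /qux/, input: "foo bar baz" }, // deliberately never matches
];

// The style criticized above: every match result is thrown away,
// so nothing observable depends on the regexp work being done.
function runDiscardingResults(iterations) {
  for (let i = 0; i < iterations; i++) {
    for (const { re, input } of cases) {
      re.exec(input); // result unused
    }
  }
}

// Harder to game: each result contributes to a checksum, so the
// matching has to happen for the final value to come out right.
function runWithChecksum(iterations) {
  let checksum = 0;
  for (let i = 0; i < iterations; i++) {
    for (const { re, input } of cases) {
      const m = re.exec(input);
      // Mix in the match position, matched length and group count;
      // a failed match contributes a fixed 1.
      checksum += m === null ? 1 : m.index + m[0].length + m.length;
    }
  }
  return checksum;
}
```

A harness built this way could compare the returned checksum against a known value after each run, which also catches engines that produce wrong answers quickly.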

In fact, these new benchmarks are so much better than the old benchmarks that I wish the Google Chrome team had instead released them as a separate benchmark suite. That would have allowed the old benchmarks to gradually move into a well-earned retirement (along with the equally venerable and flawed SunSpider). I guess there’s nothing stopping people from effectively doing this by running just the new benchmarks. Perhaps this could be called the “Octane minus V8” benchmark suite…
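As a rough sketch of what an “Octane minus V8” number could look like: as far as I know, Octane (like the V8 suite before it) reports its overall score as the geometric mean of the individual benchmark scores, so the same calculation can be applied to just the five new ones. The benchmark labels below are the ones from the descriptions page, which may not match the names the harness itself prints, and the scores in the usage example are placeholders, not measurements.

```javascript
// Labels taken from the descriptions page; the real harness may use
// different identifiers, so treat these as assumptions.
const NEW_IN_OCTANE = ["pdf.js", "Mandreel", "GB Emulator", "Code loading", "Box2DWeb"];

// Geometric mean, computed via logarithms to avoid overflow when
// multiplying many large scores together.
function geometricMean(values) {
  if (values.length === 0) {
    throw new Error("need at least one score");
  }
  const logSum = values.reduce((sum, v) => sum + Math.log(v), 0);
  return Math.exp(logSum / values.length);
}

// Hypothetical helper: given a map of benchmark name -> score, combine
// only the five new benchmarks and ignore everything carried over from V8.
function octaneMinusV8(scores) {
  const picked = NEW_IN_OCTANE.map((name) => {
    if (!(name in scores)) {
      throw new Error(`missing score for ${name}`);
    }
    return scores[name];
  });
  return geometricMean(picked);
}

// Usage with placeholder numbers (not real results).
const exampleScores = {
  "pdf.js": 5000, "Mandreel": 4000, "GB Emulator": 9000,
  "Code loading": 7000, "Box2DWeb": 6000,
  "Richards": 8000, "Deltablue": 8500, // old benchmarks are ignored
};
console.log(Math.round(octaneMinusV8(exampleScores)));
```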

8 replies on “Octane minus V8”

Actually, I happened to run that crypto code quite extensively a few days ago – and was surprised that Chrome performed so much better than Firefox here. Now I know the reason 🙂

So, can we have Octane minus V8 as a standalone item on AWFY?

This is all very interesting.
But if you’ll allow me to rant a little: IMO a problem with all of these benchmarks is that they tend to measure the “number-grinding” capabilities of the JavaScript engine. When developing “real-world” websites, I have found that what makes an interactive site slow is almost never the JavaScript itself, but the rendering engine reacting to objects moving, styles changing, etc. For instance, jQuery Resizable tends to be sluggish in Firefox and fast in Chrome on a website I develop (sorry, I don’t have a public test case I can share), but I am pretty sure from profiling that the time spent in the JavaScript engine is negligible in both cases; what makes the big difference is how quickly the layout engine can update itself. So while I agree a fast JavaScript engine is important, I think there is too much focus on it: JavaScript engines are already quite fast, and many real-world gains could be obtained by optimising the rendering engine itself.

the DOM?

Sure. There are other benchmark suites designed to test other aspects of browser performance. Anyone who judges browser performance as a whole based on one benchmark suite is going to reach poor conclusions.

If V8 and SunSpider are somewhat flawed, are Kraken or Dromaeo better?

Kraken’s a bit better: the benchmarks are newer, and stress the browser more. I know less about Dromaeo, but it tests more than just JS performance. However, I think it includes SunSpider as part of its JS testing, so that’s not ideal.

@AV: yes, both Kraken and Dromaeo are better!
Dromaeo was praised by the IE team because it exercises large parts of the browser, including the DOM, which matters more (in some benchmarks) than JIT performance alone.
Kraken is a complementary benchmark that focuses on where JIT improvements should pay off, and exercises algorithm kernels that are heavy on JS processing.
In fact, the V8 benchmark was criticized because it played to the strengths of the V8 JIT (for example, giving an unfair advantage to engines with a generational GC) and to how it specializes types.
@Nicholas: maybe this isn’t the right place to ask, but what about Ion? Will it be in FF18? Since Ion would permit generational GC, will it make our browsers snappier? I ask because, as far as I understand, IonMonkey is a “second-level” optimizing compiler, so it will have its own extra memory usage. As for generational GC, I think it is great to have one: it will reduce the pauses and make them more consistent. So I’m just waiting for it.
As for IonMonkey, I’ve noticed Brian’s work to make IonMonkey off-thread. Is there any work to make the entire JS computation off-thread? I ask because running the aforementioned Dromaeo benchmark makes my browser feel sluggish, but maybe it’s just me.

Thank you for the MemShrink project; it is great to have a code quality assurance project inside Firefox. I think there is a huge need for more!
