Firefox OS phones on sale in Australia

Firefox OS phones are now on sale in Australia! You can buy a ZTE Open C with Firefox OS 1.3 installed for $99 (AUD) at JB Hi-Fi. (For non-Australian readers: JB Hi-Fi is probably the biggest electronics and home entertainment retailer in Australia.)

Australia’s not the ideal market for the current versions of Firefox OS, being a  country where a large fraction of people already use high-end phones. But it’s nice that they’re easily available :)

You should use WebRTC for your 1-on-1 video meetings

Did you know that Firefox 33 (currently in Beta) lets you make a Skype-like video call directly from one running Firefox instance to another without requiring an account with a central service (such as Skype or Vidyo)?

This feature is built on top of Firefox’s WebRTC support, and it’s kind of amazing.

It’s pretty easy to use: just click on the toolbar button that looks like a phone handset or a speech bubble (which one you see depends which version of Firefox you have) and you’ll be given a URL with a call.mozilla.com domain name. [Update: depending on which beta version you have, you might need to set the loop.enabled preference in about:config, and possibly customize your toolbar to make the handset/bubble icon visible.] Send that URL to somebody else — via email, or IRC, or some other means — and when they visit that URL in Firefox 33 (or later) it will initiate a video call with you.

I’ve started using it for 1-on-1 meetings with other Mozilla employees and it works well. It’s nice to finally have an open source implementation of video calling. Give it a try!

This is not the security blog

Planet Mozilla’s been a little mixed up for the past few days, claiming that I was the author of posts on the Mozilla Security Blog. The good news is that this problems appears to have been fixed, thanks to Mike Hoye.

However, it’s likely that very few people saw the post I made a few days ago about the new per-class measurements in about:memory. So please take a look if you’re interested in that sort of thing. Thanks.

Per-class JS object and shape measurements in Firefox’s about:memory

A few days ago I landed support for per-class reporting of JavaScript objects and shapes in about:memory. (Shapes are auxiliary, engine-internal data structures that are used to facilitate object property accesses. They can use large amounts of memory.)

Prior to this patch, the JavaScript objects and shapes within a single compartment (which corresponds to a JavaScript window or global object) would be covered by measurements in a small number of fixed categories.

10,179,152 B (02.59%) -- objects
├───6,749,600 B (01.72%) -- gc-heap
│   ├──3,512,640 B (00.89%) ── dense-array
│   ├──2,965,184 B (00.75%) ── ordinary
│   └────271,776 B (00.07%) ── function
├───3,429,552 B (00.87%) -- malloc-heap
│   ├──2,377,600 B (00.61%) ── slots
│   └──1,051,952 B (00.27%) ── elements/non-asm.js
└───────────0 B (00.00%) ── non-heap/code/asm.js
474,144 B (00.12%) -- shapes
├──316,832 B (00.08%) -- gc-heap
│  ├──167,320 B (00.04%) -- tree
│  │  ├──152,400 B (00.04%) ── global-parented
│  │  └───14,920 B (00.00%) ── non-global-parented
│  ├──125,352 B (00.03%) ── base
│  └───24,160 B (00.01%) ── dict
└──157,312 B (00.04%) -- malloc-heap
   ├───99,328 B (00.03%) ── compartment-tables
   ├───35,040 B (00.01%) ── tree-tables
   ├───12,704 B (00.00%) ── dict-tables
   └───10,240 B (00.00%) ── tree-shape-kids

These measurements are only interesting to those who understand the guts of the JavaScript engine.

In contrast, objects and shapes are now grouped by their class. Per-class measurements relate back to the JavaScript code in a more obvious way, making these measurements useful to a wider range of people.

10,515,296 B (02.69%) -- classes
├───4,566,840 B (01.17%) ++ class(Array)
├───3,618,464 B (00.93%) ++ class(Object)
├───1,755,232 B (00.45%) ++ class(HTMLDivElement)
├─────333,624 B (00.09%) ++ class(Function)
├─────165,624 B (00.04%) ++ class(<non-notable classes>)
├──────38,736 B (00.01%) ++ class(Window)
└──────36,776 B (00.01%) ++ class(CSS2PropertiesPrototype)

(The <non-notable classes> entry aggregates all classes that are smaller than a certain threshold. This prevents any long tail of classes from bloating about:memory too much.)

Expanding the sub-tree for the Object class, we see that the fixed categories are still present, for those who are interested in them.

3,618,464 B (00.93%) -- class(Object)
├──3,540,672 B (00.91%) -- objects
│  ├──2,349,632 B (00.60%) -- malloc-heap
│  │  ├──2,348,480 B (00.60%) ── slots
│  │  └──────1,152 B (00.00%) ── elements/non-asm.js
│  └──1,191,040 B (00.30%) ── gc-heap
└─────77,792 B (00.02%) -- shapes
      ├──57,376 B (00.01%) -- gc-heap
      │  ├──47,120 B (00.01%) ── tree
      │  ├───5,360 B (00.00%) ── dict
      │  └───4,896 B (00.00%) ── base
      └──20,416 B (00.01%) -- malloc-heap
         ├──11,552 B (00.00%) ── tree-tables
         ├───6,912 B (00.00%) ── tree-kids
         └───1,952 B (00.00%) ── dict-tables

Although the per-class measurements often aren’t surprising — Object and Array objects and shapes often dominate — sometimes they are. Consider the following examples.

  • The above example has 1.7 MiB of HTMLDivElement objects and shapes, which indicates that the compartment contains many div elements.
  • If you have lots of memory used by Function objects and shapes, it suggests that the code is creating excessive numbers of closures.
  • Just this morning a visitor to the #memshrink IRC channel was wondering why they had 11 MiB of XPC_WN_NoMods_NoCall_Proto_JSClass objects and shapes in one compartment. (This is a question I currently don’t have a good answer for.)

Historically, the data-dependent measurements in about:memory — e.g. those done on a per-tab, or per-compartment, or per-image, or per-script basis — have been more useful and interesting than the ones in fixed categories, because they map obviously to browser and code artifacts. For example, per-tab measurements let you know if a particular web page is using excessive memory, and per-compartment measurements revealed the existence of zombie compartments, a kind of bad memory leak that used to be common in Firefox and its add-ons.

I’m hoping that these per-class measurements will prove similarly useful. Keep an eye on them, and please let me know and/or file bugs if you see any surprising cases.

A final note: Mozilla’s devtools team is currently making great progress on a JavaScript memory profiler, which will give finer-grained measurements of JavaScript memory usage in web content. Although there will be some overlap between that tool and these new measurements in about:memory, it will useful to have both tools, because each one will be appropriate in different circumstances.

The story of a tricky bug

The Bug Report

A few weeks ago I skimmed through /r/firefox and saw a post by a user named DeeDee_Z complaining about high memory usage in Firefox. Somebody helpfully suggested that DeeDee_Z look at about:memory, which revealed thousands of blank windows like this:

  │    │  ├────0.15 MB (00.01%) ++ top(about:blank, id=1001)
  │    │  ├────0.15 MB (00.01%) ++ top(about:blank, id=1003)
  │    │  ├────0.15 MB (00.01%) ++ top(about:blank, id=1005

I filed bug 1041808 and asked DeeDee_Z to sign up to Bugzilla so s/he could join the discussion. What followed was several weeks of back and forth, involving suggestions from no fewer than seven Mozilla employees. DeeDee_Z patiently tried numerous diagnostic steps, such as running in safe mode, pasting info from about:support, getting GC/CC logs, and doing a malware scan. (Though s/he did draw the line at running wireshark to detect if any unusual network activity was happening, which I think is fair enough!)

But still there was no progress. Nobody else was able to reproduce the problem, and even DeeDee_Z had trouble making it happen reliably.

And then on August 12, more than three weeks after the bug report was filed, Peter Van der Beken commented that he had seen similar behaviour on his machine, and by adding some logging to Firefox’s guts he had a strong suspicion that it was related to having the “keep until” setting for cookies set to “ask me every time”. DeeDee_Z had the same setting, and quickly confirmed that changing it fixed the problem. Hooray!

I don’t know how Peter found the bug report — maybe he went to file a new bug report about this problem and Bugzilla’s duplicate detection identified the existing bug report — but it’s great that he did. Two days later he landed a simple patch to fix the problem. In Peter’s words:

The patch makes the dialog for allowing/denying cookies actually show up when a cookie is set through the DOM API. Without the patch the dialog is created, but never shown and so it sticks around forever.

This fix is on track to ship in Firefox 34, which is due to be released in late November.

Takeaway lessons

There are a number of takeaway lessons from this story.

First, a determined bug reporter is enormously helpful. I often see vague complaints about Firefox on websites (or even in Bugzilla) with no responses to follow-up questions. In contrast, DeeDee_Z’s initial complaint was reasonably detailed. More importantly, s/he did all the follow-up steps that people asked her/him to do, both on Reddit and in Bugzilla. The about:memory data made it clear it was some kind of window leak, and although the follow-up diagnostic steps didn’t lead to the fix in this case, they did help rule out a number of possibilities. Also, DeeDee_Z was extremely quick to confirm that Peter’s suggestion about the cookie setting fixed the problem, which was very helpful.

Second, many (most?) problems don’t affect everyone. This was quite a nasty problem, but the “ask me every time” setting is not commonly used because causes lots of dialogs to pop up, which few users have the patience to deal with. It’s very common that people have a problem with Firefox (or any other piece of software), incorrectly assume that it affects everyone else equally, and conclude with “I can’t believe anybody uses this thing”. I call this “your experience is not universal“. This is particular true for web browsers, which unfortunately are enormously complicated and have many combinations of settings get little or no testing.

Third, and relatedly, it’s difficult to fix problems that you can’t reproduce. It’s only because Peter could reproduce the problem that he was able to do the logging that led him to the solution.

Fourth, it’s important to file bug reports in Bugzilla. Bugzilla is effectively the Mozilla project’s memory, and it’s monitored by many contributors. The visibility of a bug report in Bugzilla is vastly higher than a random complaint on some other website. If the bug report hadn’t been in Bugzilla, Peter wouldn’t have stumbled across it. So even if he had fixed it, DeeDee_Z wouldn’t have known and probably would have had been stuck with the problem until Firefox 34 came out. That’s assuming s/he didn’t switch to a different browser in the meantime.

Fifth, Mozilla does care about memory usage, particularly cases where memory usage balloons unreasonably. We’ve had a project called MemShrink running for more than three years now. We’ve fixed hundreds of problems, big and small, and continue to do so. Please use about:memory to start the diagnosis, and add the “[MemShrink]” tag to any bug reports in Bugzilla that relate to memory usage, and we will triage them in our fortnightly MemShrink meetings.

Finally, luck plays a part. I don’t often look at /r/firefox, and I could have easily missed DeeDee_Z’s complaint. Also, it was lucky that Peter found the bug in Bugzilla. Many tricky bugs don’t get resolved this quickly.

Some good reading on sexual harassment and being a decent person

Last week I attended a sexual harassment prevention training seminar. This was the first  of several seminars that Mozilla is holding as part of its commendable Diversity and Inclusion Strategy. The content was basically “how to not get sued for sexual harassment in the workplace”. That’s a low bar, but also a reasonable place to start, and the speaker was both informative and entertaining. I’m looking forward to the next seminar on Unconcious Bias and Inclusion, which sounds like it will cover more subtle issues.

With the topic of sexual harassment in mind, I stumbled across a Metafilter discussion from last year about an essay by Genevieve Valentine in which she describes and discusses a number of incidents of sexual harassment that she has experienced throughout her life. I found the essay interesting, but the Metafilter discussion thread even more so. It’s a long thread (594 comments) but mostly high quality. It focuses initially on one kind of harassment that some men perform on public transport, but quickly broadens to be about (a) the full gamut of harassing behaviours that many women face regularly, (b) the responses women make towards these behaviours, and (c) the reactions, both helpful and unhelpful, that people can and do have towards those responses. Examples abound, ranging from the disconcerting to the horrifying.

There are, of course, many other resources on the web where one can learn about such topics. Nonetheless, the many stories that viscerally punctuate this particular thread (and the responses to those stories) helped my understanding of this topic — in particular, how bystanders can intervene when a woman is being harassed — more so than some dryer, more theoretical presentations have. It was well worth my time.

Poor battery life on the Flame device?

The new Firefox OS reference phone is called the Flame. I have one that I’m using as my day-to-day phone, and as soon as I got it I found that the battery life was terrible — I use my phone very lightly, but the battery was draining in only 24 hours or so.

It turns out this was because a kernel daemon called ksmd (kernel samepage merging daemon) was constantly running and using about 3–7% of CPU. I detected this by running the following command which prints CPU usage stats for all running processes every five seconds.

adb shell top -m 5

ksmd doesn’t seem very useful for a device such as the Flame, and Alexandre Lissy kindly built me a new kernel with it disabled, which improved my battery life by at least 3x. Details are in the relevant bug.

It seems that plenty of other Flame users are not having problems with ksmd, but if your Flame’s battery life is poor it would be worth checking if this is the cause.

Dipping my toes in the Servo waters

I’m very interested in Rust and Servo, and have been following their development closely. I wanted to actually do some coding in Rust, so I decided to start making small contributions to Servo.

At this point I have landed two changes in the tree — one to add very basic memory measurements for Linux, and the other for Mac — and I thought it might be interesting for people to hear about the basics of contributing. So here’s a collection of impressions and thoughts in no particular order.

Getting the code and building Servo was amazingly easy. The instructions actually worked first time on both Ubuntu and Mac! Basically it’s just apt-get install (on Ubuntu) or port install (on Mac), git clone, configure, and make. The configure step takes a while because it downloads an appropriate version of Rust, but that’s fine; I was expecting to have to install the appropriate version of Rust first so I was pleasantly surprised.

Once you build it, Servo is very bare-boned. Here’s a screenshot.

Servo

There is no address bar, or menu bar, or chrome of any kind. You simply choose which page you want to display from the command line when you start Servo. The memory profiling I implemented is enabled by using the -m option, which causes memory measurements to be periodically printed to the console.

Programming in Rust is interesting. I’m not the first person to observe that, compared to C++, it takes longer to get your code past the compiler, but it’s more likely to to work once you do. It reminds me a bit of my years programming in Mercury (imagine Haskell, but strict, and with a Prolog ancestry). Discriminated unions, pattern matching, etc. In particular, you have to provide code to handle all the error cases in place. Good stuff, in my opinion.

One thing I didn’t expect but makes sense in hindsight: Servo has seg faulted for me a few times. Rust is memory-safe, and so shouldn’t crash like this. But Servo relies on numerous libraries written in C and/or C++, and that’s where the crashes originated.

The Rust docs are a mixed bag. Libraries are pretty well-documented, but I haven’t seen a general language guide that really leaves me feeling like I understand a decent fraction of the language. (Most recently I read Rust By Example.) This is meant to be an observation rather than a complaint; I know that it’s a pre-1.0 language, and I’m aware that Steve Klabnik is now being paid by Mozilla to actively improve the docs, and I look forward to those improvements.

The spotty documentation isn’t such a problem, though, because the people in the #rust and #servo IRC channels are fantastic. When I learned Python last year I found that simply typing “how to do X in Python” into Google almost always leads to a Stack Overflow page with a good answer. That’s not the case for Rust, because it’s too new, but the IRC channels are almost as good.

The code is hosted on GitHub, and the project uses a typical pull request model. I’m not a particularly big fan of git — to me it feels like a Swiss Army light-sabre with a thousand buttons on the handle, and I’m confident using about ten of those buttons. And I’m also not a fan of major Mozilla projects being hosted on GitHub… but that’s a discussion for another time. Nonetheless, I’m sufficiently used to the GitHub workflow from working on pdf.js that this side of things has been quite straightforward.

Overall, it’s been quite a pleasant experience, and I look forward to gradually helping build up the memory profiling infrastructure over time.

Measuring memory used by third-party code

Firefox’s memory reporting infrastructure, which underlies about:memory, is great. And when it lacks coverage — causing the “heap-unclassified” number to get large — we can use DMD to identify where the unreported allocations are coming from. Using this information, we can extend existing memory reporters or write new ones to cover the missing heap blocks.

But there is one exception: third-party code. Well… some libraries support custom allocators, which is great, because it lets us provide a counting allocator. And if we have a copy of the third-party code within Firefox, we can even use some pre-processor hacks to forcibly provide custom counting allocators for code that doesn’t support them.

But some heap allocations are done by code we have no control over, like OpenGL drivers. For example, after opening a simple WebGL demo on my Linux box, I have over 50% “heap-unclassified”.

208.11 MB (100.0%) -- explicit
├──107.76 MB (51.78%) ── heap-unclassified

DMD’s output makes it clear that the OpenGL drivers are responsible. The following record is indicative.

Unreported: 1 block in stack trace record 2 of 3,597
 15,486,976 bytes (15,482,896 requested / 4,080 slop)
 6.92% of the heap (20.75% cumulative); 10.56% of unreported (31.67% cumulative)
 Allocated at
 replace_malloc (/home/njn/moz/mi8/co64dmd/memory/replace/dmd/../../../../memory/replace/dmd/DMD.cpp:1245) 0x7bf895f1
 _swrast_CreateContext (??:?) 0x3c907f03
 ??? (/usr/lib/x86_64-linux-gnu/dri/i965_dri.so) 0x3cd84fa8
 ??? (/usr/lib/x86_64-linux-gnu/dri/i965_dri.so) 0x3cd9fa2c
 ??? (/usr/lib/x86_64-linux-gnu/dri/i965_dri.so) 0x3cd8b996
 ??? (/usr/lib/x86_64-linux-gnu/dri/i965_dri.so) 0x3ce1f790
 ??? (/usr/lib/x86_64-linux-gnu/dri/i965_dri.so) 0x3ce1f935
 glXGetDriverConfig (??:?) 0x3dce1827
 glXDestroyGLXPixmap (??:?) 0x3dcbc213
 glXCreateNewContext (??:?) 0x3dcbc48a
 mozilla::gl::GLContextGLX::CreateGLContext(mozilla::gfx::SurfaceCaps const&, mozilla::gl::GLContextGLX*, bool, _XDisplay*, unsigned long, __GLXFBConfigRec*, bool, gfxXlibSurface*) (/home/njn/moz/mi8/co64dmd/gfx/gl/../../../gfx/gl/GLContextProviderGLX.cpp:783) 0x753c99f4

The bottom-most frame is for a function (CreateGLContext) within Firefox’s codebase, and then control passes to the OpenGL driver, which eventually does a heap allocation, which ends up in DMD’s replace_malloc function.

The following DMD report is a similar case that shows up on Firefox OS.

Unreported: 1 block in stack trace record 1 of 463
 1,454,080 bytes (1,454,080 requested / 0 slop)
 9.75% of the heap (9.75% cumulative); 21.20% of unreported (21.20% cumulative)
 Allocated at
 replace_calloc /Volumes/firefoxos/B2G/gecko/memory/replace/dmd/DMD.cpp:1264 (0xb6f90744 libdmd.so+0x5744)
 os_calloc (0xb25aba16 libgsl.so+0xda16) (no addr2line)
 rb_alloc_primitive_lists (0xb1646ebc libGLESv2_adreno.so+0x74ebc) (no addr2line)
 rb_context_create (0xb16446c6 libGLESv2_adreno.so+0x726c6) (no addr2line)
 gl2_context_create (0xb16216f6 libGLESv2_adreno.so+0x4f6f6) (no addr2line)
 eglCreateClientApiContext (0xb25d3048 libEGL_adreno.so+0x1a048) (no addr2line)
 qeglDrvAPI_eglCreateContext (0xb25c931c libEGL_adreno.so+0x1031c) (no addr2line)
 eglCreateContext (0xb25bfb58 libEGL_adreno.so+0x6b58) (no addr2line)
 eglCreateContext /Volumes/firefoxos/B2G/frameworks/native/opengl/libs/EGL/eglApi.cpp:527 (0xb423dda2 libEGL.so+0xeda2)
 mozilla::gl::GLLibraryEGL::fCreateContext(void*, void*, void*, int const*) /Volumes/firefoxos/B2G/gecko/gfx/gl/GLLibraryEGL.h:180 (discriminator 3) (0xb4e88f4c libxul.so+0x789f4c)

We can’t traverse these allocations in the usual manner to measure them, because we have no idea about the layout of the relevant data structures. And we can’t provide a custom counting allocator to code outside of Firefox’s codebase.

However, although we pass control to the driver, control eventually comes back to the heap allocator, and that is something that we do have some power to change. So I had an idea to toggle some kind of mode that records all the allocations that occur within a section of code, as the following code snippet demonstrates.

SetHeapBlockTagForThread("webgl-create-new-context");
context = glx.xCreateNewContext(display, cfg, LOCAL_GLX_RGBA_TYPE, glxContext, True);
ClearHeapBlockTagForThread();

The calls on either side of glx.xCreateNewContext tell the allocator that it should tag all allocations done within that call. And later on, the relevant memory reporter can ask the allocator how many of these allocations remain and how big they are. I’ve implemented a draft version of this, and it basically works, as the following about:memory output shows.

216.97 MB (100.0%) -- explicit
├───78.50 MB (36.18%) ── webgl-contexts
├───32.37 MB (14.92%) ── heap-unclassified

The implementation is fairly simple.

  • There’s a global hash table which records which live heap blocks have a tag associated with them. (Most heap blocks don’t have a tag, so this table stays small.)
  • When SetHeapBlockTagForThread is called, the given tag is stored in thread-local storage. When ClearHeapBlockTagForThread is called, the tag is cleared.
  • When an allocation happens, we (quickly) check if there’s a tag set for the current thread and if so, put a (pointer,tag) pair into the table. Otherwise, we do nothing extra.
  • When a deallocation happens, we check if the deallocated block is in the table, and remove it if so.
  • To find all the live heap blocks with a particular tag, we simply iterate over the table looking for tag matches. This can be used by a memory reporter.

Unfortunately, the implementation isn’t suitable for landing in Firefox’s code, for several reasons.

  • It uses Mike Hommey’s replace_malloc infrastructure to wrap the default allocator (jemalloc). This works well — DMD does the same thing — but using it requires doing a special build and then setting some environment variables at start-up. This is ok for an occasional-use tool that’s only used by Firefox developers, but it’s important that about:memory works in vanilla builds without any additional effort.
  • Alternatively, I could modify jemalloc directly, but we’re hoping to one day move away from our old, heavily-modified version of jemalloc and start using an unmodified jemalloc3.
  • It may have a non-trivial performance hit. Although I haven’t measured performance yet — the above points are a bigger show-stopper at the moment — I’m worried about having to do a hash table lookup on every deallocation. Alternative implementations (store a marker in each block, or store tagged blocks in their own zone) are possible but present their own difficulties.
  • It can miss some allocations. When adding a tag for a particular section of code, you don’t want to mark every allocation that occurs while that section executes, because there could be multiple threads running and you don’t want to mark allocations from other threads. So it restricts the marking to a single thread, but if the section creates a new thread itself, any allocations done on that new thread will be missed. This might sound unlikely, but my implementation appears to miss some allocations and this is my best theory as to why.

This issue of OpenGL drivers and other kinds of third-party code has been a long-term shortcoming with about:memory. For the first time I have a partial solution, though it still has major problems. I’d love to hear if anyone has additional ideas on how to make it better.

A browser benchmarking manifesto

This post represents my own opinion, not the official position of Mozilla.

I’ve been working on Firefox for over five years and I’ve been unhappy with the state of browser benchmarking the entire time.

The current problems

The existing benchmark suites are bad

Most of the benchmark suites are terrible. Consider at Hennessy and Patterson’s categorization of benchmarks, from best to worst.

  1. Real applications.
  2. Modified applications (e.g. with I/O removed to make it CPU-bound).
  3. Kernels (key fragments of real applications).
  4. Toy benchmarks (e.g. sieve of Erastosthenes).
  5. Synthetic benchmarks (code created artificially to fit a profile of particular operations, e.g. Dhrystone).

In my opinion, benchmark suites should only contain benchmarks in categories 1 and 2. There are certainly shades of grey in these categories, but I personally have a high bar, and I’m likely to consider most programs whose line count is in the hundreds as closer to a “toy benchmark” than a “real application”.

Very few browser benchmark suites contain benchmarks from category 1 or 2. I think six or seven of the benchmarks in Octane meet this standard. I’m having trouble thinking of any other existing benchmark suite that contains any that reach this standard. (This includes Mozilla’s own Kraken suite.)

Bad benchmark suites hurt the web, because every hour that browser vendors spend optimizing bad benchmarks is an hour they don’t spend improving browsers in ways that actually help users. I’ve seen first-hand how much time this wastes.

Conflicts of interest

Some of the benchmark suites — including the best-known ones — come from browser vendors. The conflict of interest is so obvious I won’t bother discussing it further.

Opaque scoring systems

Some of them have opaque scoring systems. Octane (derived from V8bench) is my least favourite here. It also suffers from much higher levels of variance between runs than the benchmarks with time-based scoring systems.

A better way

Here’s how I think a new browser benchmark suite should be created.  Some of these ideas are taken from the SPEC benchmark suiteswhich have been used for over 20 years to benchmark the performance of C, C++, Fortran and Java programs, among other things.

A collaborative and ongoing process

The suite shouldn’t be owned by a single browser vendor. Ideally, all the major browser vendors (Apple, Google, Microsoft, and Mozilla) would be involved. Involvement of experts from outside those companies would also be desirable.

This is a major political and organizational challenge, but it would avoid bias (both in appearance and reality), and would likely lead to a higher-quality suite, because low-quality benchmarks are likely to be shot down.

There should be a public submissions process, where people submit web
apps/sites for consideration.  And obviously there needs to be a selection
process to go with that.

The benchmark suite should be open source and access should be free of
charge.  It should be hosted on a public site not owned by a browser vendor.

There should also be an official process for modifying the suite: modifying benchmarks, removing benchmark, and adding new benchmarks.  This process
shouldn’t be too fluid.  Perhaps allowing changes once per year would be reasonable. The current trend of continually augmenting old suites should be resisted.

High-quality

All benchmarks should be challenging, and based on real website/apps;  they should all belong to category 1 or 2 on the above scale.

Each benchmark should have two or three inputs of different sizes.  Only the
largest input would be used in official runs, but the smaller inputs
would be useful for browser developers for testing and profiling purposes.

All benchmarks should have side-effects of some kind that depend on most or
all of the computation.  There should be no large chunks of computation whose
results are unused, because such benchmarks are subject to silly results due
to certain optimizations such as dead code elimination.

Categorization

Any categorization of benchmarks should not be done by browser subsystems. E.g. there shouldn’t be groups like “DOM”, “CSS”, “hardware acceleration”, “code loading” because that encourages artificial benchmarks that stress a single
subsystem. Instead, any categories should be based on application domain:
“games”, “productivity”, “multimedia”, “social”, etc. If a grouping like
that ends up stressing some browser subsystems more than others, well that’s
fine, because it means that’s what real websites/apps are doing (assuming the
benchmark selection is balanced).

There is one possible exception to this rule, which is the JavaScript engine. At least some of the benchmarks will be dominated by JS execution time, and because JS engines can be built standalone, the relevant JS engine
teams will inevitably want to separate JS code from everything else so that
the code can be run in JS shells. It may be worthwhile to pre-emptively do that, where appropriate. If that were done, you wouldn’t run these JS-only sub-applications as part of an official suite run, but they would be present in the repository.

Scoring

Scoring should be as straightforward as possible. Times are best because their meaning is more obvious than anything else, *especially* when multiple
times are combined. Opaque points systems are awful.

Using times for everything makes most sense when benchmarks are all
throughput-oriented.  If some benchmarks are latency-oriented (or FPS, or
something else) this becomes trickier.  Even still, a simple system that humans can intuitively understand is best.

Conclusion

It’s time the browser world paid attention to the performance and benchmarking lessons that the CPU and C/C++/Fortran worlds learned years ago. It’s time to grow up.

If you want to learn more, I highly recommend getting your hands on a copy of Hennessy & Patterson’s Computer Architecture and reading the section on measuring performance. This is section 1.5 (“Measuring and Reporting Performance”) in the 3rd edition, but I expect later editions have similar sections.