Categories
Firefox Performance Uncategorized

Faster JavaScript parsing

Over the past year or so I’ve almost doubled the speed of SpiderMonkey’s JavaScript parser.  I did this by speeding up both the scanner (bug 564369, bug 584595, bug 588648, bug 639420, bug 645598) and the parser (bug 637549).  I used patch stacks in several of those bugs, and so in those six bugs I actually landed 28 changesets.

Notable things about scanning JavaScript code

Before I explain the changes I made, it will help to explain a few notable things about scanning JavaScript code.

  • JavaScript is encoded using UCS-2.  This means that each character is 16 bits.
  • There are several character sequences that indicate the end of a line (EOL): ‘\n’, ‘\r’, ‘\r\n’, \u2028 (a.k.a. LINE_SEPARATOR), and \u2029 (a.k.a. PARA_SEPARATOR).  Note that ‘\r\n’ is treated as a single character.
  • JavaScript code is often minified, and the characteristics of minified and non-minified code are quite different.  The most important difference is that minified code has much less whitespace.

Scanning improvements

Before I made any changes, there were two different modes in which the scanner could operate.  In the first mode, the entire character stream to be scanned was already in memory.  In the second, the scanner read the characters from a file in chunks a few thousand chars long.  Firefox always uses the first mode (except in the rare case where the platform doesn’t support mmap or an equivalent function), but the JavaScript shell used the second.  Supporting the second made made things complicated in two ways.

  • It was possible for an ‘\r\n’ EOL sequence to be split across two chunks, which required some extra checking code.
  • The scanner often needs to unget chars (up to six chars, due to the lookahead required for \uXXXX sequences), and it couldn’t unget chars across a chunk boundary.  This meant that it used a six-char unget buffer.  Every time a char was ungotten, it would be copied into this buffer.  As a consequence, every time it had to get a char, it first had to look in the unget buffer to see if there was one or more chars that had been previously ungotten.  This was an extra check (and a data-dependent and thus unpredictable check).

The first complication was easy to avoid by only reading N-1 chars into the chunk buffer, and only reading the Nth char in the ‘\r\n’ case.  But the second complication was harder to avoid with that design.  Instead, I just got rid of the second mode of operation;  if the JavaScript engine needs to read from file, it now reads the whole file into memory and then scans it via the first mode.  This can result in more memory being used but it only affects the shell, not the browser, so it was an acceptable change.  This allowed the unget buffer to be completely removed;  when a character is ungotten now the scanner just moves back one char in the char buffer being scanned.

Another improvement was that in the old code, there was an explicit EOL normalization step.  As each char was gotten from the memory buffer, the scanner would check if it was an EOL sequence;  if so it would change it to ‘\n’, if not, it would leave it unchanged.  Then it would copy this normalized char into another buffer, and scanning would proceed from this buffer.  (The way this copying worked was strange and confusing, to make things worse.)  I changed it so that getChar() would do the normalization without requiring the copy into the second buffer.

The scanner has to detect EOL sequences in order to know which line it is on.  At first glance, this requires checking every char to see if it’s an EOL, and the scanner uses a small look-up table to make this fast.  However, it turns out that you don’t have to check every char.  For example, once you know that you’re scanning an identifier, you know that if you hit an EOL sequence you’ll immediately unget it, because that marks the end of the identifier.  And when you unget that char you’ll undo the line update that you did when you hit the EOL.  This same logic applies in other situations (eg. parsing a number).  So I added a function getCharIgnoreEOL() that doesn’t do the EOL check.  It has to always be paired with ungetCharIgnoreEOL() and requires some care as to where it’s used, but it avoids the EOL check on more than half the scanned chars.

As well as detecting where each token starts and ends, for a lot of token kinds the scanner has to compute a value.  For example, after scanning the character sequence ” 123 ” it has to convert that to the number 123.  The old scanner would copy the chars into a temporary buffer before calling the function that did the conversion.  This was unnecessary — the conversion function didn’t even require NULL-terminated strings because it gets passed the length of the string being converted!  Also, the old scanner was using js_strtod() to do the number conversion.  js_strtod() can convert both integers and fractional numbers, but its quite slow and overkill for integers.  And when scanning, even before converting the string to a number, we know if the number we just scanned was an integer or not (by remembering if we saw a ‘.’ or exponent).  So now the scanner instead calls GetPrefixInteger() which is much faster.  Several of the tests in Kraken involve huge arrays of integers, and this made a big difference to them.

There’s a similar story with identifiers, but with an added complication.  Identifiers can contain \uXXXX chars, and these need to be normalized before we do more with the string inside SpiderMonkey.  So the scanner now remembers whether a \uXXXX char has occurred in an identifier.  If not, it can work directly (temporarily) with the copy of the string inside the char buffer.  Otherwise, the scanner will rescan the identifier, normalizing and copying it into a new buffer.  I.e. the scanner de-optimizes the (very) rare case in order to avoid the copying in the common case.

JavaScript supports decimal, hexadecimal and octal numbers.  The number-scanning code handled all three kinds in the same loop, which meant that it checked the radix every time it scanned another number char.  So I split this into three parts, which make it both faster and easier to read.

Although JavaScript chars are 16 bits, the vast majority of chars you see are in the first 128 chars.  This is true even for code written in non-Latin scripts, because of all the keywords (e.g. ‘var’) and operators (e.g. ‘+’) and punctuation (e.g. ‘;’).  So it’s worth optimizing for those.  The main scanning loop (in getTokenInternal()) now first checks every char to see if its value is greater than 128.  If so, it handles it in a side-path (the only legitimate such chars are whitespace, EOL or identifier chars, so that side-path is quite small).  The rest of getTokenInternal() can then assume that it’s a sub-128 char.  This meant I could be quite aggressive with look-up tables, because having lots of 128-entry look-up tables is fine, but having lots of 65,536-entry look-up tables would not be.  One particularly important look-up table is called firstCharKinds;  it tells you what kind of token you will have based on the first non-whitespace char in it.  For example, if the first char is a letter, it will be an identifier or keyword;  if the first char is a ‘0’ it will be a number;  and so on.

Another important look-up table is called oneCharTokens.  There are a handful of tokens that are one-char long, cannot form a valid prefix of another token, and don’t require any additional special handling:  ;,?[]{}().  These account for 35–45% of all tokens seen in real code!  The scanner can detect them immediately and use another look-up table to convert the token char to the internal token kind without any further tests.  After that, the rough order of frequency for different token kinds is as follows:  identifiers/keywords, ‘.’, ‘=’, strings, decimal numbers, ‘:’, ‘+’, hex/octal numbers, and then everything else.  The scanner now looks for these token kinds in that order.

That’s just a few of the improvements, there were lots of other little clean-ups.  While writing this post I looked at the old scanning code, as it was before I started changing it.  It was awful, it’s really hard to see what was happening;  getChar() was 150 lines long because it included code for reading the next chunk from file (if necessary) and also normalizing EOLs.

In comparison, as well as being much faster, the new code is much easier to read, and much more DFA-like.  It’s worth taking a look at getTokenInternal() in jsscan.cpp.

Parsing improvements

The parsing improvements were all related to the parsing of expressions.  When the parser parses an expression like “3” it needs to look for any following operators, such as “+”.  And there are roughly a dozen levels of operator precedence.  The way the parser did this was to get the next token, check if it matched any of the operators of a particular precedence, and then unget the token if it didn’t match.  It would then repeat these steps for the next precedence level, and so on.  So if there was no operator after the “3”, the parser would have gotten and ungotten the next token a dozen times!  Ungetting and regetting tokens is fast, because there’s a buffer used (i.e. you don’t rescan the token char by char) but it was still a bottleneck.  I changed it so that the sub-expression parsers were expected to parse one token past the end of the expression, instead of zero tokesn past the end.  This meant that the repeated getting/ungetting could be avoided.

These operator parsers are also very small.  I inlined them more aggressively, which also helped quite a bit.

Results

I had some timing results but now I can’t find them.  But I know that the overall speed-up from my changes was about 1.8x on Chris Leary’s parsemark suite, which takes code from lots of real sites, and the variation in parsing times for different codebases tends not to vary that much.

Many real websites, e.g. gmail, have MB of JS code, and this speed-up will probably save one or two tenths of a second when they load.  Not something you’d notice, but certainly something that’ll add up over time and help make the browser feel snappier.

Tools

I used Cachegrind to drive most of these changes.  It has two features that were crucial.

First, it does event-based profiling, i.e. it counts instructions, memory accesses, etc, rather than time.  When making a lot of very small improvements, noise variations often swamp the effects of the improvements, so being able to see that instruction counts are going down by 0.2% here, 0.3% there, is very helpful.

Second, it gives counts of these events for individual lines of source code.  This was particularly important for getTokenInternal(), which is the main scanning function and has around 700 lines;  function-level stats wouldn’t have been enough.

Categories
Memory consumption MemShrink

MemShrink progress, week 2

Here are some potted highlights from the second week of MemShrink.

  • The beauty of Gregor Wagner’s time-based GC trigger, which I mentioned last week, became ever evident.  It’s been confirmed that nine other bug reports are fixed by this one change.  Great stuff!  See this comment and subsequent comments for an analysis.  Unfortunately, the release drivers decided it wasn’t suitable for back-porting to Firefox 6.  Users who leave their browser open overnight should find Firefox 7 more to their liking, once it’s available.
  • I landed lazy initialization of TraceMonitors and JaegerCompartments, something I wrote about previously.  This gave a 2.89% reduction in the Trace Malloc MaxHeap measurement on CentOS (x86_64) release 5, and probably similar reductions on other platforms.
  • Justin Lebar added hard and soft page fault counts to about:memory.  This makes it clear if Firefox is paging badly.
  • Dão Gottwald fixed a leak in the iQ library used by Panorama.  This bug could cause entire tabs to be leaked in some circumstances.
  • Blake Kaplan avoided the creation of some sandboxes.  This seems to have reduced the number of JavaScript compartments required for techcrunch.com.

There’s less than a week until the development period for Firefox 7 ends.  Hopefully we’ll see some more good MemShrink fixes before that happens.

Categories
Firefox

Firefox 6 will be released in six seven weeks

Firefox’s new rapid release cycle has received a lot of attention since Firefox 5 was released.  I won’t add to the discussion of its pros and cons.

But I will note that we’re now on a 6-week release cycle.  That means Firefox 6 should be released in early August, and Firefox 9 should be released before the end of the year.  I point this out because an awful lot of people think we’re on a 3-month release cycle.  I can think of two reasons for this misconception: (a) early discussion of the rapid release cycle said it would be 3 months long, if I remember correctly, and (b) the gap between Firefox 4 and Firefox 5 was 3 months.

This 6-week cycle is something that we need to publicize well, to avoid further charges of poor communication.  I don’t want to read 1000 comments saying “3 months between FF4 and FF5, 6 weeks between FF5 and FF6, is FF7 coming in 3 weeks!?!! ZOMG hasn’t Mozilla heard of Zeno’s Paradox? LOLWUT”.

Update: Mike Hommey has kindly corrected me;  there will be 8 weeks between FF5 and FF6.  There will then be 6 weeks between all subsequent releases.  The 12 week gap between FF4 and FF5 and the 8-week gap between FF5 and FF6 were artifacts of the timeline used to get the rapid release process up to speed.

Categories
MemShrink

MemShrink progress, week 1

We had our first MemShrink meeting a week ago.  Here’s some progress that’s been made since then.  Please note that this is a list of things that caught my eye, and isn’t meant to be exhaustive in any way.

Two big improvements were made to garbage collection trigger heuristics.

  • Gregor Wagner added a time-based trigger to the garbage collector.  Most notably, this should prevent slow build-ups of garbage when the browser is left idle but an open website is doing a small amount of continual work (eg. updates on a news site).  We’ve had a lot of reports about this problem, so this will hopefully be a big win. Thanks, Gregor!  Note that if the browser is truly idle it won’t trigger, so it shouldn’t cause power problems with Firefox mobile.  See this comment for full details of the new trigger.
  • Luke Wagner (no relation to Gregor!) fixed a gap in the garbage collector’s allocation-based triggers — certain string allocations weren’t being counted by the garbage collector, which meant it allowed them to build up excessively.  This fix particularly helps with complex regular expression operations.

Two improvements were made to the coverage of about:memory, thus reducing the amount of memory falling into the “heap-unclassified” bucket.

  • I added three new memory reporters: explicit/js/scripts, explicit/js/object-slots, explicit/js/string-chars.  When I open Gmail, these three together account for 11% of the explicit allocations.
  • Kyle Huey added a new memory reporter called xpti-working-set, which measures memory used by the XPCOM typelib system.  This is usually a bit over 1MB of memory.

Justin Lebar made progress on measuring page fault counts and using them to respond when memory pressure is high.

Finally, I made progress on reducing unnecessary compartment overhead, which I described previously.

Categories
about:memory JägerMonkey Memory consumption Tracemonkey

You make what you measure

Inspired by Shaver’s patch, I implemented some per-compartment stats in about:memory.  I then visited TechCrunch because I know it stresses the JS engine.  Wow, there are over 70 compartments!  There are 20 stories on the front page.  Every story has a Facebook “like” button, a Facebook “send” button, and a Google “+1” button, and every button gets its own compartment.

That sounds like a bug, but it’s probably not.  Nonetheless, every one of those buttons had an entry in about:memory that looked like this:

Old compartment measurements

(The ‘object-slots’ and ‘scripts’ and ‘string-chars’ measurements are also new, courtesy of bug 571249.)

Ugh, 255,099 bytes for a compartment that has only 971 bytes (i.e. not much) of scripts?  Even worse, this is actually an underestimate because there is another 68KB of tjit-data memory that isn’t being measured for each compartment.  That gives a total of about 323KB per compartment.  And it turns out that no JIT compilation is happening for these compartments, so all that tjit-data and mjit-code space is allocated but unused.

Fortunately, it’s not hard to avoid this wasted space, by lazily initializing each compartment’s TraceMonitor and JaegerCompartment.  With that done, the entry in about:memory looks like this:

New compartment measurements

That’s an easy memory saving of over 20MB for a single webpage.  The per-compartment memory reporters haven’t landed yet, and may not even land in their current form, but they’ve already demonstrated their worth.  You make what you measure.

Categories
Bugzilla Gmail

Gmail and Bugzilla: an update

I recently wrote about Gmail’s threading not handling Bugzilla emails nicely.  Thanks to a patch from Byron Jones, bugzilla.mozilla.org now lets you set an option so that Gmail’s threading will work nicely with Bugzilla.  Just go to “User Preferences”, and in the “General Preferences” tab, enable “Make email subjects compatible with Gmail’s threading.”  Thanks, Byron!

Categories
MemShrink

MemShrink meetings are go: Tuesday 1pm (Pacific time)

Exactly three months ago I wrote about a new project called MemShrink, the aim of which was to reduce Firefox’s memory usage.  Thanks to Johnny Stenback we will now have weekly MemShrink meetings in which bug reports will be discussed, triaged and assigned, and anything else relevant (see this wiki page) will be discussed.  These meetings will take place every week at Tuesday 1pm, Pacific time;  the first one will be on June 14.

This is great news!  MemShrink was never going to be more than a niche effort without weekly meetings.  To give you an idea of how important I think they are, I’ll be attending them even though it’ll be 6am Wednesday in my timezone.  So come along or dial-in.  Johnny will post dial-in instructions to the dev-platform list/newsgroup some time before the meeting.

Categories
about:memory Correctness Firefox SQLite

Asynchronous database connections must be closed with asyncClose()

TL;DR:  if you’re familiar with any Mozilla (C++ or JS) code that opens an async database connection, please go and check that it calls asyncClose() to close the connection;  not doing so can lead to crashes and/or memory leaks.

Specifically, in Firefox 6, when such a connection is destroyed (because it’s explicitly deallocated or its refcount drops to zero in C++, or it’s garbage collected in JS) it’ll currently cause sporadic crashes in about:memory or telemetry code or Test Pilot code.  This is because memory reporters end up pointing to connections that have been freed, and so when they are used they end up (innocently) accessing memory they shouldn’t.

I’ve opened bug 662989 to avoid the crashes, but even once it’s implemented, I think that if you fail to call asyncClose() it will still cause a memory leak, because sqlite3_close() is never called on the connection and so SQLite won’t free up memory used by the connection.  Also, connections that needlessly remain open can have active locks that can starve out other connections or cause extreme growth of the write-ahead-log.

As I write this, thanks to Marco Bonardo, I’m aware of three places in JS code that fail to call asyncClose() when they should:

  • browser/components/preferences/aboutPermissions.js.  Bug 654573 covers this, I’m about to land a patch to fix it.
  • toolkit/components/contentprefs/nsContentPrefService.js.  Bug 662986 covers this.
  • browser/app/profile/extensions/testpilot@labs.mozilla.com/modules/experiment_data_store.js.  Bug 662985 covers this.  Note that Test Pilot seems to fail to use asyncClose() when it should, and it also uses memory reporters.  So it’s unclear which of these two things is responsible for the Test Pilot crashes of this sort seen so far;  in other words, Test Pilot may be a culprit or an innocent bystander, or both.

These were found via a crude MXR search.  I haven’t looked for C++ code that makes this mistake.

If you only do synchronous transactions over your database connection, the connection will be cleaned up properly, even if you don’t call close(), when it is deallocated or garbage collected.  However, it’s better to close the connection as soon as you can so that the resources (memory, locks) are freed immediately.

See this MDC page for more details about database connections and the relevant APIs.  It appears that these APIs are somewhat error-prone.  As it happens, an NS_ENSURE_TRUE macro will fail when an async connection is destroyed without having been asyncClose’d first, and this causes a warning message to be printed to stderr.  Unfortunately, stderr is spammed with all sorts of warnings (something that was discussed in the comments of a previous post of mine), and so this warning gets lost among the noise.  Addressing this problem is something I’ll write about shortly.

Thanks to Andrew Sutherland and Marco Bonardo for helping me greatly with the bugs mentioned above.  Any errors in this post are mine!

UPDATE: I did some investigation and found that about:permissions leaked about 180KB of memory on my machine every time it failed to asyncClose a DB connection.   This showed up under the explicit/storage/sqlite/other reporter in about:memory.

Categories
Bugzilla Gmail

Gmail and Bugzilla

I use Gmail.  I like the way it manages conversations, with one exception.  New bugs reported in Bugzilla cause an email with a title like “New: Fix the bug” to be sent.  All subsequent changes to that bug cause an email with a title like “Fix the bug” to be sent.  So that first email doesn’t get included in the same conversation as the rest of the emails because the subject line has the “New: ” at the start.

This is annoying.  Does anyone know how to fix it, either on the Gmail side or the Bugzilla side?

Categories
Correctness Firefox

What is the point of non-fatal assertions?

When you run a debug build of Firefox you get a lot of stuff printed to stderr.  Some of it looks like this:

 ++DOCSHELL 0x7f53d836b800 == 3
 ++DOMWINDOW == 5 (0x7f53d836c078) [serial = 5] [outer = (nil)]

and I imagine it’s occasionally useful.

You also see a lot of warnings like these, which I just saw (with duplicates removed) when I just started up Firefox, loaded www.mozilla.com, and then exited:

 WARNING: 1 sort operation has occurred for the SQL statement '0x7f53e3d32208'.  See https://developer.mozilla.org/En/Storage/Warnings details.: file /home/njn/moz/mc2/storage/src/mozStoragePrivateHelpers.cpp, line 144
 WARNING: GetDefaultCharsetForLocale: need to add multi locale support: file /home/njn/moz/mc2/intl/locale/src/unix/nsUNIXCharset.cpp, line 146
 WARNING: Ignoring duplicate observer.: file /home/njn/moz/mc2/modules/libpref/src/nsPrefBranch.cpp, line 594
 WARNING: not an nsIRDFRemoteDataSource: 'remote != nsnull', file /home/njn/moz/mc2/rdf/datasource/src/nsLocalStore.cpp, line 312
 WARNING: NS_ENSURE_SUCCESS(rv, 0) failed with result 0x8000FFFF: file /home/njn/moz/mc2/content/base/src/nsContentUtils.cpp, line 2527
 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80040111: file /home/njn/moz/mc2/content/base/src/nsFrameLoader.cpp, line 415
 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x8053000C: file /home/njn/moz/mc2/content/base/src/nsGenericElement.cpp, line 5484
 WARNING: NS_ENSURE_TRUE(aURI) failed: file /home/njn/moz/mc2/content/base/src/nsContentUtils.cpp, line 4706
 WARNING: NS_ENSURE_TRUE(!(mAsyncExecutionThread)) failed: file /home/njn/moz/mc2/storage/src/mozStorageConnection.cpp, line 784
 WARNING: NS_ENSURE_TRUE(pusher.Push(aBoundElement)) failed: file /home/njn/moz/mc2/content/xbl/src/nsXBLProtoImplMethod.cpp, line 327
 WARNING: nsExceptionService ignoring thread destruction after shutdown: file /home/njn/moz/mc2/xpcom/base/nsExceptionService.cpp, line 197
 WARNING: SQLite returned error code 1 , Storage will convert it to NS_ERROR_FAILURE: file /home/njn/moz/mc2/storage/src/mozStoragePrivateHelpers.cpp, line 113
 WARNING: Subdocument container has no content: file /home/njn/moz/mc2/layout/base/nsDocumentViewer.cpp, line 2398
 WARNING: Subdocument container has no frame: file /home/njn/moz/mc2/layout/base/nsDocumentViewer.cpp, line 2418
 WARNING: Subdocument container has no frame: file /home/njn/moz/mc2/layout/base/nsDocumentViewer.cpp, line 2418

The NS_ENSURE_SUCCESS and NS_ENSURE_TRUE ones in particular look like assertion failures.  I’ve heard before that Firefox assertions (outside the JS engine) are non-fatal.  I was wondering about this last week and had planned to ask someone about it in more detail.

But before I did that, I spent several hours yesterday debugging the crash in bug 654573.  Turns out the problem is that this assertion in mozStorageConnection.cpp fails:

 NS_ENSURE_FALSE(mAsyncExecutionThread, NS_ERROR_UNEXPECTED);

This causes this warning to be printed out at run-time:

 WARNING: NS_ENSURE_TRUE(!(mAsyncExecutionThread)) failed: file /home/njn/moz/mc1/storage/src/mozStorageConnection.cpp, line 799

Oh, and just to make things really complicated, here’s the definition of NS_ENSURE_FALSE, from nsDebug.h:

 #define NS_ENSURE_TRUE(x, ret)                                \
   PR_BEGIN_MACRO                                              \
     if (NS_UNLIKELY(!(x))) {                                  \
        NS_WARNING("NS_ENSURE_TRUE(" #x ") failed");           \
        return ret;                                            \
     }                                                         \
   PR_END_MACRO
 #define NS_ENSURE_FALSE(x, ret)
   NS_ENSURE_TRUE(!(x), ret)

Oh look;  if the condition fails it not only prints out a warning message, it also does an early return.  And that’s exactly what was causing the crash, because some code that followed the assertion wasn’t run, which meant that a database connection wasn’t closed properly, which caused a use-after-free error.

If this assertion had been fatal, I wouldn’t have spent several hours yesterday debugging this crash, because there wouldn’t have been a crash;  there would have been an assertion failure, and the problem would have been immediately obvious.  (As it happens, I eventually worked out the problem when I noticed that warning message among the pages of output that Firefox produces.)  If the assertion had been non-fatal but did not have the early return, there also probably wouldn’t have been a crash, but it’s hard to know, because once an assertion fails you’re pretty much in no-man’s land.

So, in summary, here are my thoughts about assertions:

  • Non-fatal assertions are close to useless.  They don’t get fixed.  The warnings just get lost among the noise of all the messages that go to stderr.
  • Non-fatal assertions that return early are worse than useless.  Especially since the early-return behaviour is entirely non-obvious from their names.

One suggestion I heard on IRC yesterday was that there just aren’t enough developers in general to fix all the assertion failures if we were to make them fatal.  But in contrast, the JS engine is able to use fatal assertions because it has a higher developer-to-code ratio than the rest of Firefox.  (This makes me also think of compiler warnings, where there’s a similar disparity between the JS engine and the rest of Firefox.)

Because this is Mozilla, I’m sure there is a ton of history behind this current situation that I’m unaware of.  I’d love to know what that is, and why we let obvious defects — because an assertion failure unambiguously indicates there’s a defect, be it in the code or the assertion itself — remain.  Thanks!