{"id":707,"date":"2011-07-01T16:18:27","date_gmt":"2011-07-01T05:18:27","guid":{"rendered":"http:\/\/blog.mozilla.org\/nnethercote\/?p=707"},"modified":"2011-07-04T10:00:44","modified_gmt":"2011-07-03T23:00:44","slug":"faster-javascript-parsing","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/nnethercote\/2011\/07\/01\/faster-javascript-parsing\/","title":{"rendered":"Faster JavaScript parsing"},"content":{"rendered":"<p>Over the past year or so I&#8217;ve almost doubled the speed of SpiderMonkey&#8217;s JavaScript parser.\u00a0 I did this by speeding up both the scanner (<a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=564369\">bug\u00a0564369<\/a>, <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=584595\">bug\u00a0584595<\/a>, <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=588648\">bug\u00a0588648<\/a>, <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=639420\">bug\u00a0639420<\/a>, <a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=645598\">bug\u00a0645598<\/a>) and the parser (<a href=\"https:\/\/bugzilla.mozilla.org\/show_bug.cgi?id=637549\">bug\u00a0637549<\/a>).\u00a0 I used patch stacks in several of those bugs, and so in those six bugs I actually landed 28 changesets.<\/p>\n<h3>Notable things about scanning JavaScript code<\/h3>\n<p>Before I explain the changes I made, it will help to cover a few notable things about scanning JavaScript code.<\/p>\n<ul>\n<li>JavaScript is encoded using UCS-2.\u00a0 This means that each character is 16 bits.<\/li>\n<li>There are several character sequences that indicate the end of a line (EOL): &#8216;\\n&#8217;, &#8216;\\r&#8217;, &#8216;\\r\\n&#8217;, \\u2028 (a.k.a. LINE_SEPARATOR), and \\u2029 (a.k.a. 
PARA_SEPARATOR).\u00a0 Note that &#8216;\\r\\n&#8217; is treated as a single character.<\/li>\n<li>JavaScript code is often <a href=\"http:\/\/en.wikipedia.org\/wiki\/Minification_%28programming%29\">minified<\/a>, and the characteristics of minified and non-minified code are quite different.\u00a0 The most important difference is that minified code has much less whitespace.<\/li>\n<\/ul>\n<h3>Scanning improvements<\/h3>\n<p>Before I made any changes, there were two different modes in which the scanner could operate.\u00a0 In the first mode, the entire character stream to be scanned was already in memory.\u00a0 In the second, the scanner read the characters from a file in chunks a few thousand chars long.\u00a0 Firefox always uses the first mode (except in the rare case where the platform doesn&#8217;t support mmap or an equivalent function), but the JavaScript shell used the second.\u00a0 Supporting the second made things complicated in two ways.<\/p>\n<ul>\n<li>It was possible for an &#8216;\\r\\n&#8217; EOL sequence to be split across two chunks, which required some extra checking code.<\/li>\n<li>The scanner often needs to unget chars (up to six chars, due to the lookahead required for \\uXXXX sequences), and it couldn&#8217;t unget chars across a chunk boundary.\u00a0 This meant that it used a six-char unget buffer.\u00a0 Every time a char was ungotten, it would be copied into this buffer.\u00a0 As a consequence, every time it had to get a char, it first had to look in the unget buffer to see if there were one or more chars that had been previously ungotten.\u00a0 This was an extra check (and a data-dependent, and thus unpredictable, check).<\/li>\n<\/ul>\n<p>The first complication was easy to avoid by only reading N-1 chars into the chunk buffer, and only reading the Nth char in the &#8216;\\r\\n&#8217; case.\u00a0 But the second complication was harder to avoid with that design.\u00a0 Instead, I just got rid of the second mode of 
operation;\u00a0 if the JavaScript engine needs to read from a file, it now reads the whole file into memory and then scans it via the first mode.\u00a0 This can result in more memory being used, but it only affects the shell, not the browser, so it was an acceptable change.\u00a0 This allowed the unget buffer to be completely removed;\u00a0 now when a character is ungotten, the scanner just moves back one char in the char buffer being scanned.<\/p>\n<p>Another improvement was that in the old code, there was an explicit EOL normalization step.\u00a0 As each char was gotten from the memory buffer, the scanner would check if it was an EOL sequence;\u00a0 if so, it would change it to &#8216;\\n&#8217;;\u00a0 if not, it would leave it unchanged.\u00a0 Then it would copy this normalized char into another buffer, and scanning would proceed from this buffer.\u00a0 (To make things worse, the way this copying worked was strange and confusing.)\u00a0 I changed it so that getChar() would do the normalization without requiring the copy into the second buffer.<\/p>\n<p>The scanner has to detect EOL sequences in order to know which line it is on.\u00a0 At first glance, this requires checking every char to see if it&#8217;s an EOL, and the scanner uses a small look-up table to make this fast.\u00a0 However, it turns out that you don&#8217;t have to check every char.\u00a0 For example, once you know that you&#8217;re scanning an identifier, you know that if you hit an EOL sequence you&#8217;ll immediately unget it, because that marks the end of the identifier.\u00a0 And when you unget that char you&#8217;ll undo the line update that you did when you hit the EOL.\u00a0 This same logic applies in other situations (e.g. 
parsing a number).\u00a0 So I added a function getCharIgnoreEOL() that doesn&#8217;t do the EOL check.\u00a0 It always has to be paired with ungetCharIgnoreEOL() and requires some care as to where it&#8217;s used, but it avoids the EOL check on more than half the scanned chars.<\/p>\n<p>As well as detecting where each token starts and ends, for a lot of token kinds the scanner has to compute a value.\u00a0 For example, after scanning the character sequence &#8220;123&#8221; it has to convert that to the number 123.\u00a0 The old scanner would copy the chars into a temporary buffer before calling the function that did the conversion.\u00a0 This was unnecessary &#8212; the conversion function didn&#8217;t even require NULL-terminated strings, because it is passed the length of the string being converted!\u00a0 Also, the old scanner used js_strtod() to do the number conversion.\u00a0 js_strtod() can convert both integers and fractional numbers, but it&#8217;s quite slow and overkill for integers.\u00a0 And when scanning, even before converting the string to a number, we know whether the number we just scanned was an integer or not (by remembering if we saw a &#8216;.&#8217; or an exponent).\u00a0 So the scanner now calls GetPrefixInteger() instead, which is <em>much<\/em> faster.\u00a0 Several of the tests in Kraken involve huge arrays of integers, and this made a big difference to them.<\/p>\n<p>There&#8217;s a similar story with identifiers, but with an added complication.\u00a0 Identifiers can contain \\uXXXX chars, and these need to be normalized before we do more with the string inside SpiderMonkey.\u00a0 So the scanner now remembers whether a \\uXXXX char has occurred in an identifier.\u00a0 If not, it can work directly (temporarily) with the copy of the string inside the char buffer.\u00a0 Otherwise, the scanner will rescan the identifier, normalizing and copying it into a new buffer.\u00a0 I.e. 
the scanner de-optimizes the (very) rare case in order to avoid the copying in the common case.<\/p>\n<p>JavaScript supports decimal, hexadecimal and octal numbers.\u00a0 The number-scanning code handled all three kinds in the same loop, which meant that it checked the radix every time it scanned another number char.\u00a0 So I split this into three parts, which made it both faster and easier to read.<\/p>\n<p>Although JavaScript chars are 16 bits, the vast majority of chars you see are in the first 128 values.\u00a0 This is true even for code written in non-Latin scripts, because of all the keywords (e.g. &#8216;var&#8217;) and operators (e.g. &#8216;+&#8217;) and punctuation (e.g. &#8216;;&#8217;).\u00a0 So it&#8217;s worth optimizing for those.\u00a0 The main scanning loop (in getTokenInternal()) now first checks every char to see if its value is 128 or greater.\u00a0 If so, it handles it in a side-path (the only legitimate such chars are whitespace, EOL or identifier chars, so that side-path is quite small).\u00a0 The rest of getTokenInternal() can then assume that it&#8217;s a sub-128 char.\u00a0 This meant I could be quite aggressive with look-up tables, because having lots of 128-entry look-up tables is fine, but having lots of 65,536-entry look-up tables would not be.\u00a0 One particularly important look-up table is called firstCharKinds;\u00a0 it tells you what kind of token you will have based on the first non-whitespace char in it.\u00a0 For example, if the first char is a letter, it will be an identifier or keyword;\u00a0 if the first char is a &#8216;0&#8217; it will be a number;\u00a0 and so on.<\/p>\n<p>Another important look-up table is called oneCharTokens.\u00a0 There are a handful of tokens that are one char long, cannot form a valid prefix of another token, and don&#8217;t require any additional special handling:\u00a0 <code>;,?[]{}()<\/code>.\u00a0 These account for 35&#8211;45% of all tokens seen in real code!\u00a0 The scanner can detect 
them immediately and use another look-up table to convert the token char to the internal token kind without any further tests.\u00a0 After that, the rough order of frequency for different token kinds is as follows:\u00a0 identifiers\/keywords, &#8216;.&#8217;, &#8216;=&#8217;, strings, decimal numbers, &#8216;:&#8217;, &#8216;+&#8217;, hex\/octal numbers, and then everything else.\u00a0 The scanner now looks for these token kinds in that order.<\/p>\n<p>Those are just a few of the improvements;\u00a0 there were lots of other little clean-ups.\u00a0 While writing this post I looked at the old scanning code, as it was before I started changing it.\u00a0 It was awful;\u00a0 it was really hard to see what was happening.\u00a0 getChar() was 150 lines long because it included code for reading the next chunk from the file (if necessary) and also for normalizing EOLs.<\/p>\n<p>In comparison, as well as being much faster, the new code is much easier to read, and much more <a href=\"http:\/\/en.wikipedia.org\/wiki\/Deterministic_finite-state_machine\">DFA<\/a>-like.\u00a0 It&#8217;s worth taking a look at getTokenInternal() in jsscan.cpp.<\/p>\n<h3>Parsing improvements<\/h3>\n<p>The parsing improvements were all related to the parsing of expressions.\u00a0 When the parser parses an expression like &#8220;3&#8221; it needs to look for any following operators, such as &#8220;+&#8221;.\u00a0 And there are roughly a dozen levels of operator precedence.\u00a0 The way the parser did this was to get the next token, check if it matched any of the operators of a particular precedence, and then unget the token if it didn&#8217;t match.\u00a0 It would then repeat these steps for the next precedence level, and so on.\u00a0 So if there was no operator after the &#8220;3&#8221;, the parser would have gotten and ungotten the next token a dozen times!\u00a0 Ungetting and regetting tokens is fast, because there&#8217;s a buffer used (i.e. 
you don&#8217;t rescan the token char by char), but it was still a bottleneck.\u00a0 I changed it so that the sub-expression parsers were expected to parse one token past the end of the expression, instead of zero tokens past the end.\u00a0 This meant that the repeated getting\/ungetting could be avoided.<\/p>\n<p>These operator parsers are also very small.\u00a0 I inlined them more aggressively, which also helped quite a bit.<\/p>\n<h3>Results<\/h3>\n<p>I had some timing results but now I can&#8217;t find them.\u00a0 But I know that the overall speed-up from my changes was about 1.8x on Chris Leary&#8217;s parsemark suite, which takes code from lots of real sites, and the speed-up tends not to vary much across different codebases.<\/p>\n<p>Many real websites, e.g. Gmail, have MBs of JS code, and this speed-up will probably save one or two tenths of a second when they load.\u00a0 Not something you&#8217;d notice, but certainly something that&#8217;ll add up over time and help make the browser feel snappier.<\/p>\n<h3>Tools<\/h3>\n<p>I used Cachegrind to drive most of these changes.\u00a0 It has two features that were crucial.<\/p>\n<p>First, it does event-based profiling, i.e. 
it counts instructions, memory accesses, etc., rather than time.\u00a0 When making a lot of very small improvements, timing noise would often swamp the effects of the improvements, so being able to see that instruction counts are going down by 0.2% here, 0.3% there, is very helpful.<\/p>\n<p>Second, it gives counts of these events for individual lines of source code.\u00a0 This was particularly important for getTokenInternal(), which is the main scanning function and has around 700 lines;\u00a0 function-level stats wouldn&#8217;t have been enough.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Over the past year or so I&#8217;ve almost doubled the speed of SpiderMonkey&#8217;s JavaScript parser.\u00a0 I did this by speeding up both the scanner (bug\u00a0564369, bug\u00a0584595, bug\u00a0588648, bug\u00a0639420, bug\u00a0645598) and the parser (bug\u00a0637549).\u00a0 I used patch stacks in several of those bugs, and so in those six bugs I actually landed 28 changesets. Notable things 
[&hellip;]<\/p>\n","protected":false},"author":139,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[30,311,1],"tags":[],"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/posts\/707"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/users\/139"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/comments?post=707"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/posts\/707\/revisions"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/media?parent=707"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/categories?post=707"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/nnethercote\/wp-json\/wp\/v2\/tags?post=707"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}