Firefox’s preferences system uses data files to store information about default preferences within Firefox, and user preferences in a user’s profile (such as prefs.js, which records changes to preference values, and user.js, which allows users to override default preference values).
A new parser
These data files use a custom format, and therefore Firefox has a custom parser for them. I recently rewrote the parser. The new parser has the following benefits over the old parser.
- It is faster (raw parsing speed is close to 2x faster).
- It is safer (because it’s written in Rust rather than C++).
- It is more correct and better tested (the old one got various obscure edge cases wrong).
- It is more readable, and easier to modify.
- It issues no warnings, only errors.
- It is slightly stricter (e.g. doesn’t allow any malformed input, and it catches integer overflow).
- It has error recovery and better error messages (including correct line numbers).
Modifiability was the prime motivation for the change. I wanted to make some adjustments to the preferences file grammar, but this would have been very difficult in the old parser, because it was written in an awkward style.
It was essentially a single loop containing a giant switch statement on a state variable. This switch was executed for every single char in a file. The states held by the state variable had names like PREF_PARSE_QUOTED_STRING, PREF_PARSE_UNTIL_OPEN_PAREN, PREF_PARSE_COMMENT_BLOCK_MAYBE_END. It also had a second state variable, because in some places a single one wasn’t enough; the parser had to return to the previous state after exiting the current state. Furthermore, lexing and parsing were not separate, so code to handle comments and whitespace was spread around in various places.
The new parser is a recursive descent parser — even though the grammar doesn’t actually have any recursion — in which the structure of the code reflects the structure of the grammar. Lexing is distinct from parsing. As a result, the new parser is much easier to read and modify. In particular, after landing it I added error recovery without too much effort; that would have been almost impossible in the old parser.
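As a rough illustration of that shape (not the actual Firefox code), here is a minimal Rust sketch with a separate lexer and one parsing function per grammar rule. The `Token`, `Lexer`, `expect`, and `parse_pref` names are hypothetical, and the grammar is cut down to boolean prefs only, with no comments or escape sequences.

```rust
// Illustrative sketch only: the names and the reduced grammar are assumptions,
// not the real Firefox parser.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Str(String),
    LParen,
    RParen,
    Comma,
    Semi,
    Eof,
}

// Lexing lives in one place, so whitespace handling is not spread around.
struct Lexer<'a> {
    chars: std::iter::Peekable<std::str::Chars<'a>>,
}

impl<'a> Lexer<'a> {
    fn new(src: &'a str) -> Self {
        Lexer { chars: src.chars().peekable() }
    }

    fn next_token(&mut self) -> Result<Token, String> {
        // Skip whitespace between tokens.
        while self.chars.peek().map_or(false, |c| c.is_whitespace()) {
            self.chars.next();
        }
        match self.chars.next() {
            None => Ok(Token::Eof),
            Some('(') => Ok(Token::LParen),
            Some(')') => Ok(Token::RParen),
            Some(',') => Ok(Token::Comma),
            Some(';') => Ok(Token::Semi),
            Some('"') => {
                // String literal without escape handling, to keep this short.
                let mut s = String::new();
                loop {
                    match self.chars.next() {
                        Some('"') => return Ok(Token::Str(s)),
                        Some(c) => s.push(c),
                        None => return Err("unterminated string".to_string()),
                    }
                }
            }
            Some(c) if c.is_ascii_alphabetic() || c == '_' => {
                let mut s = String::from(c);
                while let Some(&c) = self.chars.peek() {
                    if !(c.is_ascii_alphanumeric() || c == '_') {
                        break;
                    }
                    s.push(c);
                    self.chars.next();
                }
                Ok(Token::Ident(s))
            }
            Some(c) => Err(format!("unexpected char '{}'", c)),
        }
    }
}

// Consume one token and check that it is the expected one.
fn expect(lexer: &mut Lexer<'_>, want: Token) -> Result<(), String> {
    let got = lexer.next_token()?;
    if got == want {
        Ok(())
    } else {
        Err(format!("expected {:?}, got {:?}", want, got))
    }
}

// pref := "pref" "(" STRING "," ("true" | "false") ")" ";"
// The function body follows the grammar rule token by token.
fn parse_pref(lexer: &mut Lexer<'_>) -> Result<(String, bool), String> {
    expect(lexer, Token::Ident("pref".to_string()))?;
    expect(lexer, Token::LParen)?;
    let name = match lexer.next_token()? {
        Token::Str(s) => s,
        t => return Err(format!("expected string, got {:?}", t)),
    };
    expect(lexer, Token::Comma)?;
    let value = match lexer.next_token()? {
        Token::Ident(s) if s == "true" => true,
        Token::Ident(s) if s == "false" => false,
        t => return Err(format!("expected bool, got {:?}", t)),
    };
    expect(lexer, Token::RParen)?;
    expect(lexer, Token::Semi)?;
    Ok((name, value))
}
```

Because the parser only ever sees tokens, a grammar change in this style would mostly mean adding a token kind or editing one function.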
Note that the idea of error recovery for preferences parsing was first proposed in bug 107264, filed in 2001! After landing it, I tweeted the following.
I fixed an old bug: https://t.co/llDURdHUN8
Imagine going back in time and telling the reporter “this bug will get fixed 16 years from now, and the code will be written in a systems programming language that doesn’t exist yet”.
— Nicholas Nethercote (@nnethercote) February 20, 2018
Amazingly enough, the original reporter is on Twitter and responded!
I kept getting emails on this bug over the years — dependencies and stuff — and I’d be like, “this bug is still open?!” Great job, @nnethercote! https://t.co/uVLYK8Tn6U
— Kevin Basil Fritts (@kevinbasil) March 1, 2018
The new parser is slightly stricter and rejects some malformed input that the old parser accepted.
Disconcertingly, the old parser allowed arbitrary junk between preferences (including at the start and end of the prefs file) so long as that junk didn’t include any of the following chars: ‘/’, ‘#’, ‘u’, ‘s’, ‘p’. This means that lines like these:
!foo@bar&pref("prefname", true);
ticky_pref("prefname", true); // missing 's' at start
User_pref("prefname", true); // should be 'u' at start
would all be treated the same as this:
pref("prefname", true);
The new parser disallows such junk because it isn’t necessary and seems like an unintentional botch by the old parser. In practice, this caught a couple of prefs that accidentally had an extra ‘;’ at the end.
The old parser allowed the SUB (0x1a) character between tokens and treated it like ‘\n’.
The new parser does not allow this character. SUB was used to indicate end-of-file (not end-of-line) in some old operating systems such as MS-DOS, but this doesn’t seem necessary today.
The old parser tolerated (with a warning) invalid escape sequences within string literals, such as “\q” (not a valid escape) and “\x1” and “\u12” (both of which have insufficient hex digits), accepting them literally.
The new parser does not tolerate invalid escape sequences because it doesn’t seem necessary and would complicate things.
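A strict escape decoder along these lines might look like the following Rust sketch. The `decode_escape` helper is hypothetical, and the set of single-character escapes it accepts is an assumption made for illustration; only the rejection of “\q”, “\x1”, “\u12”, “\x00” and “\u0000” comes from the behaviour described in this post.

```rust
// Hypothetical helper, not the real Firefox code. `rest` is the text right
// after a backslash inside a string literal. On success, returns the decoded
// char and how many bytes of `rest` were consumed.
fn decode_escape(rest: &str) -> Result<(char, usize), String> {
    // The set of single-char escapes here is an assumption for illustration.
    let digits: usize = match rest.chars().next() {
        Some('"') => return Ok(('"', 1)),
        Some('\'') => return Ok(('\'', 1)),
        Some('\\') => return Ok(('\\', 1)),
        Some('n') => return Ok(('\n', 1)),
        Some('r') => return Ok(('\r', 1)),
        Some('x') => 2, // \xNN needs exactly two hex digits
        Some('u') => 4, // \uNNNN needs exactly four hex digits
        Some(c) => return Err(format!("invalid escape '\\{}'", c)),
        None => return Err("unterminated escape".to_string()),
    };
    // `get` returns None if there are too few bytes left.
    let hex = rest
        .get(1..1 + digits)
        .filter(|h| h.chars().all(|c| c.is_ascii_hexdigit()))
        .ok_or_else(|| format!("expected {} hex digits", digits))?;
    let code = u32::from_str_radix(hex, 16).map_err(|e| e.to_string())?;
    if code == 0 {
        return Err("NUL is not allowed in strings".to_string());
    }
    // Surrogate halves (for UTF-16 pairs) are simply rejected in this sketch.
    let c = char::from_u32(code).ok_or_else(|| "invalid code point".to_string())?;
    Ok((c, 1 + digits))
}
```

Every failure is a hard error here. There is no "warn and keep the text literally" path, which is what keeps the function small.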
The old parser tolerated the NUL character (0x00) within string literals; this is dangerous because C++ code that manipulates string values with embedded NULs will almost certainly consider those chars as end-of-string markers.
The new parser treats the NUL character as end-of-file, to avoid this danger. (The escape sequences “\x00” and “\u0000” are also disallowed.)
The old parser allowed integer literals to overflow, silently wrapping them.
The new parser treats integer overflow as a parse error. This seems better, and it caught overflows of several existing prefs.
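The difference between the two behaviours can be sketched in Rust with wrapping versus checked arithmetic. Both functions below are hypothetical illustrations rather than the real code: they assume integer prefs are 32-bit signed values, and they handle only the digits of a non-negative literal.

```rust
// Hypothetical helpers for illustration; pref integers are assumed to be i32,
// and the lexer is assumed to pass in at least one digit.
fn digit(c: char) -> Result<i32, String> {
    c.to_digit(10)
        .map(|d| d as i32)
        .ok_or_else(|| format!("invalid digit '{}'", c))
}

// Models the old behaviour: overflow silently wraps around.
fn parse_int_wrapping(digits: &str) -> Result<i32, String> {
    let mut value: i32 = 0;
    for c in digits.chars() {
        value = value.wrapping_mul(10).wrapping_add(digit(c)?);
    }
    Ok(value)
}

// Models the new behaviour: overflow is reported as a parse error.
fn parse_int_checked(digits: &str) -> Result<i32, String> {
    let mut value: i32 = 0;
    for c in digits.chars() {
        let d = digit(c)?;
        value = value
            .checked_mul(10)
            .and_then(|v| v.checked_add(d))
            .ok_or_else(|| format!("integer literal '{}' overflows", digits))?;
    }
    Ok(value)
}
```

With these definitions, `parse_int_wrapping("2147483648")` yields `i32::MIN`, while `parse_int_checked("2147483648")` returns an error.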
Error recovery minimizes the risk of data loss caused by the increased strictness: malformed pref lines in prefs.js will be removed, but well-formed pref lines after them are preserved.
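One common way to get this kind of recovery is to record the error, skip ahead to the next ‘;’, and carry on parsing from there. The Rust sketch below shows that idea under several assumptions: the `Cursor` type and `parse_all` function are hypothetical, the grammar is reduced to boolean prefs, keyword matching is a simple prefix check, and a ‘;’ inside a string would confuse the resynchronization step. It is not how the Firefox parser is actually written.

```rust
// Illustrative sketch only: names and the reduced grammar are assumptions.
struct Cursor<'a> {
    src: &'a str,
    pos: usize,
    line: u32,
}

impl<'a> Cursor<'a> {
    // Skip whitespace, counting newlines so errors carry a line number.
    fn skip_ws(&mut self) {
        while let Some(c) = self.src[self.pos..].chars().next() {
            if !c.is_whitespace() {
                break;
            }
            if c == '\n' {
                self.line += 1;
            }
            self.pos += c.len_utf8();
        }
    }

    // Consume `lit` (after optional whitespace) or fail with the current line.
    fn eat(&mut self, lit: &str) -> Result<(), String> {
        self.skip_ws();
        if self.src[self.pos..].starts_with(lit) {
            self.pos += lit.len();
            Ok(())
        } else {
            Err(format!("line {}: expected '{}'", self.line, lit))
        }
    }

    // Parse a double-quoted string (no escapes in this sketch).
    fn string(&mut self) -> Result<String, String> {
        self.eat("\"")?;
        let rest = &self.src[self.pos..];
        let end = rest
            .find('"')
            .ok_or_else(|| format!("line {}: unterminated string", self.line))?;
        self.pos += end + 1;
        Ok(rest[..end].to_string())
    }

    // pref := "pref" "(" STRING "," ("true" | "false") ")" ";"
    fn pref(&mut self) -> Result<(String, bool), String> {
        self.eat("pref")?;
        self.eat("(")?;
        let name = self.string()?;
        self.eat(",")?;
        let value = if self.eat("true").is_ok() {
            true
        } else {
            self.eat("false")?;
            false
        };
        self.eat(")")?;
        self.eat(";")?;
        Ok((name, value))
    }

    // Recovery: resynchronize just past the next ';' (or at end of input).
    fn recover(&mut self) {
        let rest = &self.src[self.pos..];
        let skip = rest.find(';').map_or(rest.len(), |i| i + 1);
        self.line += rest[..skip].matches('\n').count() as u32;
        self.pos += skip;
    }
}

// Returns the well-formed prefs plus one message per malformed one.
fn parse_all(src: &str) -> (Vec<(String, bool)>, Vec<String>) {
    let mut c = Cursor { src, pos: 0, line: 1 };
    let (mut prefs, mut errors) = (Vec::new(), Vec::new());
    loop {
        c.skip_ws();
        if c.pos >= src.len() {
            break;
        }
        match c.pref() {
            Ok(p) => prefs.push(p),
            Err(e) => {
                errors.push(e);
                c.recover();
            }
        }
    }
    (prefs, errors)
}
```

Given three pref lines where the second is missing its comma, `parse_all` returns the first and third prefs together with a single error message that starts with `line 2`.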
Nonetheless, please keep an eye out for any other problems that might arise from this change.
I mentioned before that I wanted to make some adjustments to the preferences file grammar. Specifically, I changed the grammar used by default preference files (but not user preference files) to support annotating each preference with one or more boolean attributes. The attributes supported so far are ‘sticky’ and ‘locked’. For example:
pref("sticky.pref", true, sticky);
pref("locked.pref", 123, locked);
pref("sticky-and-locked-pref", "blah", sticky, locked);
Note that the addition of the ‘locked’ attribute fixed a 10-year-old bug.
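To make the attribute rule concrete, here is a small Rust sketch of how the trailing attribute list could be validated once the lexer has produced the identifiers that follow the value. `PrefAttrs`, `parse_attrs`, and the `is_default_file` flag are hypothetical names; whether the real parser rejects repeated attributes is not stated here, so this sketch simply accepts them.

```rust
// Hypothetical types for illustration, not the real Firefox code.
#[derive(Debug, Default, PartialEq)]
struct PrefAttrs {
    sticky: bool,
    locked: bool,
}

// `attrs` holds the identifiers found after the pref value,
// e.g. ["sticky", "locked"]. Attributes are only part of the grammar
// for default preference files, not user preference files.
fn parse_attrs(attrs: &[&str], is_default_file: bool) -> Result<PrefAttrs, String> {
    let mut out = PrefAttrs::default();
    for &attr in attrs {
        if !is_default_file {
            return Err("attributes are not allowed in user pref files".to_string());
        }
        match attr {
            "sticky" => out.sticky = true,
            "locked" => out.locked = true,
            other => return Err(format!("unknown attribute '{}'", other)),
        }
    }
    Ok(out)
}
```

For the third example line above, the input would be `["sticky", "locked"]` and the result would have both flags set.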
When will this ship?
All of these changes are on track to ship in Firefox 60, which is due to release on May 9th.
12 replies on “A New Preferences Parser for Firefox”
So with the new error handling, the malformed first line in the attributes section will be ignored, and the other two lines will still be applied, right?
(check your quotes on the first line ;))
Good catch! I fixed it by adding the missing quote.
As for the error handling: it turns out that failing to put a quote at the end of a string is one case that’s hard to recover from well, because the parser ends up interpreting tokens that are supposed to be after the string as part of the string, and then the first quote of the next string is interpreted as the end of the current string. But the old parser would have handled that badly too, so at least the new parser isn’t any worse on that case.
Why write a whole new parser instead of migrating to a format with a readily available parser like TOML or JSON?
Migrating to another format still requires you to parse the existing configuration files.
Backwards compatibility. Users can write user preference files (user.js in the profile) and I didn’t want to break that. Default preference files could use a different format, but if we have to support user preference files, it’s very little extra work to support default files as well.
Is that use case worth supporting, though? I can’t imagine there are a lot of people handwriting these prefs files.
In Firefox 57+ any changes I’ve made via the about:config screen or to the user.js/prefs.js files seem to be reverted each time I close the browser.
Firefox overwrites prefs.js frequently, so any changes you make to it while Firefox is running are certain to be overwritten. There’s a big comment at the top saying “DO NOT EDIT THIS FILE” for that reason.
Firefox never overwrites user.js, so prefs in that file should remain and be observed.
As for changes to about:config, changes made there should also remain and be observed.
There isn’t enough detail in this comment for me to take any action. If you can file a bug (at https://bugzilla.mozilla.org/enter_bug.cgi?product=Core&component=Preferences%3A%20Backend) with clear steps to reproduce I will investigate. Thanks!
Why did you completely hand-roll the parser instead of using a parser framework like nom or larlpop? I’m playing with Rust in my free time myself and I need to write a parser for my toy program, but I’m not sure yet which way I want to go.
(I of course typoed and meant lalrpop. I suppose you could work that out, but I don’t want to sound like “that dumb guy”, and your blog doesn’t have an edit button :P)
The grammar is small and simple enough that writing from scratch was reasonable. I was also curious how much speed I could wring out of it, and there are several optimizations that are grammar-specific and couldn’t have been done with a generic parser framework.
I’m experiencing a problem with our enterprise antivirus that blocks the update of prefs.js.
Firefox saves prefs-.js and then renames it to prefs.js. For a reason still unknown to me, the rename fails, so I now have 1706 prefs files in my profile since November 17… It seems like a deadlock between the AV analyzing prefs.js and Firefox, which wants to replace it. Neither reports an error…
This happens on Firefox up to v57 and the latest ESR. I didn’t test your new code, but since you know both the old and the new code, you can probably better understand what may be happening and put some checks in the code.