Archive for September, 2012
In my last post, I gave a brief overview of how security checks worked in gecko and how enablePrivilege fit into that model. Clearly, it was not perfect, otherwise we wouldn’t change anything. In particular, the old
model had the following major problems:
- Security checks took place in C++
- Because we had security checks scattered throughout our C++ code, the checks ran even when the method was called from other C++ code. This meant that we had cases where an action performed by C++ code looked like it came from whatever was that last JS code that happened to run. This could result in actions being denied when they shouldn’t be. Furthermore, it meant that the security checking code was unsafe by default in that if it was unable to determine the privileges of the running code, it returned
- There were multiple ways of being privileged
- Security systems should be as simple as possible and it should be as easy as possible to determine privilege levels. With our previous system, there were the privileges granted by where the code came from (its
origin) as well as whether it had happened to call enablePrivilege recently. This combined with the above problem when we wanted to ask,
Has the running code called enablePrivilege?we had to deal with the case where the running code was actually C++, further adding to the confusion.
- The model punished the common case
- Most JS code never touches objects (DOM or not) from another origin. Because our security checks happened for all calls, no matter who was calling (because we didn’t know), even code that we knew would never fail a security check was paying the expensive cost of walking the stack and computing privileges for objects. Our ideal solution would avoid this for common code.
- The security checks were dynamic
- Once JS code is compiled, its privilege levels are built into it. Furthermore, with a few exceptions, every object in existance has its privileges baked in. The model that we had recomputed both of these privilege levels on every use. We could vastly speed up our security checks by computing the relationship between code and the objects it used when the objects were exposed to it (see also: object capabilities).
Around the Firefox 4 timeframe, we decided to tackle these problems head-on by moving our security checks into the JS objects themselves. This work started taking place as part of the compartments work and continues today. It is important to note that, the problems in italics above actually caused real security bugs that we had to fix; this wasn’t simply theory. So, as part of the compartments work, we moved our security checks into the JS layer and computed whether they would pass or fail ahead of time. This turned out to be a massive performance win.
At the same time, we started taking advantage of the reduced reliance of our security checks on stack frames. In particular, we found that simply maintaining the JS stack was cost us in performance, so we both wanted to slim down our stack frames as well as avoid pushing them in cases where they weren’t needed.
Back to enablePrivilege
As explained in my last post, enablePrivilege relied heavily on our use of the JS stack. With the compartments work, we no longer needed to walk the stack for our other security checks. But enablePrivilege still required the use of the stack, leading to a situation where we had to re-add the ability to use the stack into our new security model (if I remember correctly, I think our first attempt to allow enablePrivilege to continue working accidentally disabled every single security check). Even worse, we had to continue to maintain state in the stack solely for this case. As our JITs got more complex, this burden got more expensive; the mere existence of the stack walking code has cost us weeks of work. The problems with the JS stack piled up on us after we’d had to fix several earlier security bugs caused by the fact that there were multiple ways of expressing
privileged code, meaning enablePrivilege was already high on our list of
things to remove from the platform.
The removal of enablePrivilege, therefore, means that we will be able to speed up our JS engine and simplify our security model, while reducing the possibility of us introducing security bugs.
For next time
In the next installment, I’ll dive into some of the less technical reasons that enablePrivilege removal is good. After that, I’ll talk about how to replace enablePrivilege in web applications.1 comment
The imminent death of enablePrivilege has brought a few angry web application developers into Bugzilla, and they’re quite rightly demanding to know why we’re removing a very powerful tool from their toolbox. While Jonas has responded to them, I thought it might be interesting to expound on the performance aspect of enablePrivilege removal. In order to understand the impact, let’s take a walk down the history of how security checks were implemented in Gecko.
The history of JS security checks in Gecko is very tightly coupled with the history of the bindings that allow JS to C++ communication. The first DOM bindings in Gecko were auto generated from the IDL for each element type by a program that only ran on Windows. The generated bindings would then do the proper forwarding to C++ and automatically convert the result to JS. Because there was no single point of entry going from JS to C++, the code generator had to insert security checks (more on that later) at each entry point. As a side note, looking at the old generated code, we actually forgot to security check accesses to unknown properties (called “expandos”), which would be considered a serious security bug these days. In addition to the security check that verified that a given method call was allowed at all, each DOM method had to check that the operation was permitted on all of the arguments passed to it as well.
In 2001, jst, jband, and peterv landed a massive project called “XPCDOM.” This got rid of the megabytes worth of generated code in favor of using XPConnect’s generic JS to C++ bridge. Because XPConnect has a single point of transit between JS and C++, this allowed us to have a single security check guarding all calls from JS to C++. This single check was, as before, supplemented by additional checks in C++ to verify the validity of the arguments passed in as well as additional checks in the JS engine to catch tricky edge cases that didn’t go directly through C++, but instead stayed in JS (and the engine).
So, what are these “security checks”
I believe that Gecko’s original security model was designed with Java 1’s security model in mind. The main idea of this model is that, at any point in time, it must be possible to inspect the currently running code to ask it what permissions it has and to assign a permission level for every object in the system. In order to perform the former operation, the JS engine exposed a “stack walking” API, which allowed us to write code that walked up the JS stack, asking each stack frame what permissions it had. For the latter operation, we had another (expensive) method of asking every object what permissions it had. Comparing the two permissions gave us our result.
What does this have to do with enablePrivilege?
The semantics of enablePrivilege seen without this context are odd: given a JS stack frame, a call to enablePrivilege elevates the privileges of that stack frame (and any functions it calls) and then returns to normal privileges once that stack frame returns. Knowing how security checks were implemented, however, it makes complete sense. Because the ability to assign privileges to running code depended on the ability to walk the JS stack, annotating stack frames with the information “has called enablePrivilege” and then later (if a security check was about to fail) asking “by the way, do any stack frames have this additional privilege?” was a natural and easy implementation.
Up next: the problems
We’re currently in the process of changing how all of this works, so clearly the solutions presented here were found lacking. So, up next: what’s happening now, and why.