Security Checks and enablePrivilege in Gecko, part 2
The problems
In my last post, I gave a brief overview of how security checks worked in Gecko and how enablePrivilege fit into that model. Clearly, that model was not perfect; otherwise we wouldn’t be changing anything. In particular, the old model had the following major problems:
- Security checks took place in C++
- Because we had security checks scattered throughout our C++ code, the checks ran even when the method was called from other C++ code. This meant that we had cases where an action performed by C++ code looked like it came from whatever JS code happened to have run last. This could result in actions being denied when they shouldn’t have been. Furthermore, it meant that the security checking code was unsafe by default: if it was unable to determine the privileges of the running code, it returned maximum permissions.
- There were multiple ways of being privileged
- Security systems should be as simple as possible, and it should be as easy as possible to determine privilege levels. With our previous system, there were the privileges granted by where the code came from (its origin) as well as whether it had happened to call enablePrivilege recently. This combined badly with the problem above: when we wanted to ask, “Has the running code called enablePrivilege?”, we had to deal with the case where the running code was actually C++, further adding to the confusion.
- The model punished the common case
- Most JS code never touches objects (DOM or not) from another origin. Because our security checks happened for all calls, no matter who was calling (because we didn’t know), even code that could never fail a security check was paying the high cost of walking the stack and computing privileges for objects. Our ideal solution would avoid this cost for common code.
- The security checks were dynamic
- Once JS code is compiled, its privilege levels are built into it. Furthermore, with a few exceptions, every object in existence has its privileges baked in. Our old model nevertheless recomputed both of these privilege levels on every use. We could vastly speed up our security checks by computing the relationship between code and the objects it used once, when the objects were exposed to it (see also: object capabilities).
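To make the first problem concrete, here is a toy Python model of a stack-walking check. It is only a sketch of the idea, not Gecko code: the names Frame, SYSTEM, subject_origin, and can_access are invented for illustration, and the example origins are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional

SYSTEM = "system"  # hypothetical label meaning "maximum permissions"


@dataclass
class Frame:
    kind: str                     # "js" or "native" (C++)
    origin: Optional[str] = None  # only JS frames carry an origin


def subject_origin(stack: List[Frame]) -> str:
    """Return the origin that the check attributes the current action to."""
    # Walk from the newest frame to the oldest, skipping native frames.
    for frame in reversed(stack):
        if frame.kind == "js" and frame.origin is not None:
            return frame.origin
    # No JS frame at all: fail open, as described in the list above.
    return SYSTEM


def can_access(stack: List[Frame], target_origin: str) -> bool:
    """Dynamic check, recomputed from the stack on every call."""
    subject = subject_origin(stack)
    return subject == SYSTEM or subject == target_origin


# C++ doing internal work while a page's JS frame is still on the stack.
stack = [Frame("js", "https://a.example"), Frame("native")]
internal_work_allowed = can_access(stack, "https://b.example")

# C++ running with no JS on the stack at all.
no_js_allowed = can_access([Frame("native")], "https://b.example")
```

In this model the first call is denied because the native work is blamed on the last JS frame, and the second is allowed only because the check falls back to maximum permissions. Every call also pays for the walk, whether or not it could ever fail.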
The solution
Around the Firefox 4 timeframe, we decided to tackle these problems head-on by moving our security checks into the JS objects themselves. This work began as part of the compartments work and continues today. It is important to note that the problems listed above caused real security bugs that we had to fix; this wasn’t simply theory. So, as part of the compartments work, we moved our security checks into the JS layer and computed whether they would pass or fail ahead of time. This turned out to be a massive performance win.
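As a rough illustration of deciding ahead of time, the sketch below makes the access decision once, at the moment an object is handed to code in another compartment, instead of on every use. It is a deliberately simplified toy, only loosely inspired by the compartments design: Compartment, PageObject, OpaqueWrapper, and expose are hypothetical names, and the real wrappers are considerably more nuanced than allow-all or deny-all.

```python
class Compartment:
    """Hypothetical stand-in for a group of objects sharing one origin."""

    def __init__(self, origin: str):
        self.origin = origin


class PageObject:
    """An object that lives in exactly one compartment."""

    def __init__(self, compartment: Compartment, **props):
        self.compartment = compartment
        self._props = dict(props)

    def get(self, name: str):
        # No security check here: holding a direct reference means the
        # decision was already made when the object was exposed.
        return self._props[name]


class OpaqueWrapper:
    """What cross-origin code receives instead of the object itself."""

    def __init__(self, target: PageObject):
        self._target = target

    def get(self, name: str):
        raise PermissionError(f"cross-origin access to {name!r} denied")


def expose(obj: PageObject, to: Compartment):
    """Decide once, at exposure time, what the receiving code gets to hold."""
    if obj.compartment.origin == to.origin:
        return obj  # common case: direct reference, no per-use cost
    return OpaqueWrapper(obj)


a = Compartment("https://a.example")
b = Compartment("https://b.example")
doc = PageObject(a, title="hello")

same_origin_view = expose(doc, a)   # the object itself
cross_origin_view = expose(doc, b)  # a wrapper that refuses access
```

The point of the design is that same-origin code, the common case, never runs a check at all, and nothing in the decision depends on what happens to be on the stack.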
At the same time, we started taking advantage of the reduced reliance of our security checks on stack frames. In particular, we found that simply maintaining the JS stack was costing us performance, so we wanted both to slim down our stack frames and to avoid pushing them in cases where they weren’t needed.
Back to enablePrivilege
As explained in my last post, enablePrivilege relied heavily on our use of the JS stack. With the compartments work, we no longer needed to walk the stack for our other security checks. But enablePrivilege still required the stack, so we had to re-add the ability to use the stack to our new security model (if I remember correctly, our first attempt to keep enablePrivilege working accidentally disabled every single security check). Even worse, we had to continue to maintain state in the stack solely for this case. As our JITs got more complex, this burden got more expensive; the mere existence of the stack walking code has cost us weeks of work. These problems with the JS stack piled up after we had already had to fix several security bugs caused by there being multiple ways of expressing privileged code, so enablePrivilege was already high on our list of things to remove from the platform.
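The dependence on the stack can be sketched in a few lines. This is a toy model under stated assumptions, not the engine's actual data structures: JSFrame and JSStack are invented names, and the rule that a privilege lasts until the enabling frame is popped is a simplification of the behavior described in the last post.

```python
class JSFrame:
    """Hypothetical stack frame that must carry per-frame privilege state."""

    def __init__(self, function_name: str):
        self.function_name = function_name
        self.enabled = set()  # privileges enabled by this frame


class JSStack:
    def __init__(self):
        self.frames = []

    def push(self, function_name: str) -> JSFrame:
        frame = JSFrame(function_name)
        self.frames.append(frame)
        return frame

    def pop(self) -> None:
        # Privileges enabled by this frame disappear with it.
        self.frames.pop()

    def enable_privilege(self, privilege: str) -> None:
        # The flag is attached to the currently running frame.
        self.frames[-1].enabled.add(privilege)

    def is_privilege_enabled(self, privilege: str) -> bool:
        # Answering the question requires walking the frames.
        return any(privilege in f.enabled for f in reversed(self.frames))


stack = JSStack()
stack.push("outer")
stack.push("inner")
stack.enable_privilege("UniversalXPConnect")
enabled_inside = stack.is_privilege_enabled("UniversalXPConnect")
stack.pop()  # "inner" returns
enabled_after = stack.is_privilege_enabled("UniversalXPConnect")
```

Because the answer depends on which frames exist and what each one has recorded, an engine that wants to skip or shrink frames has to keep them around anyway for this one feature.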
The removal of enablePrivilege, therefore, means that we will be able to speed up our JS engine and simplify our security model, while reducing the chance that we introduce new security bugs.
For next time
In the next installment, I’ll dive into some of the less technical reasons that enablePrivilege removal is good. After that, I’ll talk about how to replace enablePrivilege in web applications.