25
Aug 11

Contexts and Compartments

A while ago (at the Platform offsite just after the last all-hands, actually) I wrote up what I understood about contexts and compartments. I’ve since sent it to a couple of people and put it up on the wiki, but haven’t distributed it more widely because I wasn’t sure it was all correct. I am far from an expert, but mrbkap (who *is* the expert) has now read through this and pointed out only one glaring mistake, which is now fixed. So other than the parts I’ve added since then, it should be more or less correct now and thus is ready for a wider audience.

See also http://www.christianwimmer.at/Publications/Wagner11a/Wagner11a.pdf for the fundamental idea of compartments.

Contexts=Control, Compartments=Data

JSContexts are control, JSCompartments are data.

A JSContext (from here on, just “context”) represents the execution of JS code. A context contains a JS stack and is associated with a thread. A thread may use multiple contexts, but a given context will only execute on a single thread at a time.

A JSCompartment (“compartment”) is a memory space that objects and other garbage-collected things (“GCthings”) are stored within.

A context is associated with a single compartment at all times (not necessarily always the same one, but only ever one at a time). The context is often said to be “running inside” that compartment. Any object created with that context will be physically stored within the context’s current compartment. Just about any GCthing read or touched by that context should also be within that same compartment.

To access data in another compartment, a context must first “enter” that other compartment. This is termed a “cross-compartment call” — remember, contexts are control, so changing a context’s compartment is only meaningful if you’re going to run code. The context will enter another compartment, do some stuff, then return, at which time it’ll exit back to the original compartment. (The APIs allow you to change to a different compartment and never change back, but using that is almost always a bug and will trigger an assertion in a debug build the first time you touch an object in a compartment that differs from your context’s compartment.)

When a context is not running code — as in, its JS stack is empty and it is not in a request — then it isn’t really associated with any compartment at all. In the future, starting a request and entering an initial compartment will become the same action. Also, a context is only ever running on one thread at a time. Update: or perhaps we’ll eliminate contexts altogether and just map from a thread to the relevant data.

In implementation terms, a context has a field (cx->compartment) that gives the current compartment. Contexts also maintain a default scope object (cx->globalObject) that is required to always be within the same compartment, and a “pending exception” object which, if set, will also be in the same compartment. Any object created using a context will be created inside the context’s current compartment, and the object’s scope chain will be initialized to a scope object within that same compartment. (That scope object might be cx->globalObject, but really that’s just the ultimate fallback. Usually the scope object will be found via the stack.)

To make a cross-compartment call, cx->compartment is updated to the new compartment. The scope object must also be updated, and for that reason you must pass in a target object in the destination compartment. The scope object will be set to the target object’s global object. (There’s a hacky special case when you’re using a JSScript for the target object, since they don’t have global objects, but ignore that.) If an exception is pending, it will be set to a wrapper (really, a proxy) inside the new compartment. The wrapper mediates access to the original exception object that lives in the origin compartment.

Finally, a dummy frame that represents the compartment transition is pushed onto the JS stack. This frame is used for setting the scope object of anything created while executing within the new compartment. Also, the security privileges of executing code are determined by the current stack — eg, if your chrome code in a chrome compartment calls a content script in a content compartment, that script will execute with content privileges until it returns, then will revert to chrome privileges.
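The enter-and-return discipline of a cross-compartment call can be modelled in a few lines. What follows is only a toy Python sketch, not SpiderMonkey code: the names `GCThing`, `Compartment`, `Context` and `cross_compartment_call` are invented for illustration, and the model leaves out the scope object and the wrapping of pending exceptions.

```python
from contextlib import contextmanager


class GCThing:
    """Toy stand-in for an object; it remembers which compartment holds it."""

    def __init__(self, label, compartment):
        self.label = label
        self.compartment = compartment


class Compartment:
    """Toy stand-in for a JSCompartment (hypothetical, not the real class)."""

    def __init__(self, name):
        self.name = name
        self.global_object = GCThing("global", self)


class Context:
    """Toy stand-in for a JSContext: a stack plus a current compartment."""

    def __init__(self):
        self.compartment = None   # no compartment until code is running
        self.stack = []           # holds the dummy transition frames

    def new_object(self, label):
        # Objects are always created in the context's current compartment.
        if self.compartment is None:
            raise RuntimeError("context is not running inside any compartment")
        return GCThing(label, self.compartment)

    @contextmanager
    def cross_compartment_call(self, target):
        # Switch to the target object's compartment and push a dummy frame
        # recording the transition.
        origin = self.compartment
        self.stack.append(("dummy frame", origin, target.compartment))
        self.compartment = target.compartment
        try:
            yield
        finally:
            # Returning pops the frame and restores the original compartment.
            self.stack.pop()
            self.compartment = origin


chrome = Compartment("chrome")
content = Compartment("content")
cx = Context()

with cx.cross_compartment_call(chrome.global_object):
    a = cx.new_object("a")
    with cx.cross_compartment_call(content.global_object):
        b = cx.new_object("b")
    c = cx.new_object("c")

print(a.compartment.name, b.compartment.name, c.compartment.name)
# prints: chrome content chrome
```

Using a context manager here makes the "never change back" mistake impossible in the model, which is roughly the property the debug-build assertion is there to protect.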

When debugging, it is helpful to know that a compartment is associated with a “JSPrincipals” object that represents the “security information” for the contents of that compartment. This is used to decide who can access what, and is mostly opaque to the JS engine. But for Gecko, it’ll typically contain a human-understandable URL, which makes it much easier to figure out what’s going on:

(gdb) p obj
$1 = (JSObject *) 0x7fffbeef
(gdb) p obj->compartment()
$2 = (JSCompartment *) 0xbf5450
(gdb) p obj->compartment()->principals()
$3 = (JSPrincipals *) 0xc29860
(gdb) p obj->compartment()->principals->codebase
$4 = 0x7fffd120 "[System Principal]"
...or perhaps...
$4 = 0x7fffd120 "http://angryhippos.com/accounts/"

Anything within a single compartment can freely and directly access anything else in that same compartment. No locking or wrappers are necessary (or possible). The overall model is thus a partitioning of all (garbage collectible) data into separate compartments, with controlled access from one compartment to another but lockless, direct access between objects within a compartment. Cross-compartment access is handled via “wrappers”, which is the subject of the next section.

Wrappers

GCthings may be wrapped in cross-compartment wrappers for a number of reasons. When a context is transitioning from one compartment to another (ie, it’s making a cross-compartment call), its scope object and pending exception (if any) are changed to wrappers pointing back to the objects in the old compartment. But any object can be wrapped in a cross-compartment wrapper if needed. You can clone an object from another compartment, and all of its properties will be wrappers pointing at the “real” properties in the origin compartment.

Cross-compartment wrappers do not compose. When you wrap an object, any existing wrappers will be ripped off first. (Slight oversimplification; there is one exception.) In fact, the type of wrapper used for an object is uniquely determined by the source and destination compartments.

The precise terminology is a little confusing. A cross-compartment wrapper is a JSObject whose class is one of the proxy classes. When you access such an object, it fetches its proxy handler (a subclass of JSProxyHandler) out of a slot to decide how to handle that access. Confusingly, in the code a JSCrossCompartmentWrapper is the subclass of JSProxyHandler that manages cross-compartment access, but usually when we refer to a “cross-compartment wrapper”, we’re really talking about the JSObject. (The JSObject of type js::SomethingProxyClass that has a private JSSLOT_PROXY_HANDLER field containing a JSProxyHandler subclass that knows how to mediate access to the proxied object stored in JSSLOT_PROXY_PRIVATE. Phew.)

A proxy handler mediates access to the proxied objects based on a set of rules embodied by some subclass of JSProxyHandler. A proxy handler might allow all accesses through, conceal certain properties, or check on each access whether the source compartment is allowed to see a particular property. Examples of proxy handler classes are the things listed on https://developer.mozilla.org/en/XPConnect_wrappers : cross-origin wrappers (XOWs), chrome object wrappers (COWs), etc.
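The split between the wrapper object and its handler can be illustrated with a small Python sketch. Everything here is hypothetical (`ProxyHandler`, `ConcealingHandler`, `Wrapper` and `Account` are made-up names, and only property reads are modelled); it is meant to show the shape of the idea, not the real JSProxyHandler interface.

```python
class ProxyHandler:
    """Toy handler: decides what a property read on the wrapper returns."""

    def get(self, target, name):
        # Default policy: let every access through.
        return getattr(target, name)


class ConcealingHandler(ProxyHandler):
    """Toy handler that hides a fixed set of property names."""

    def __init__(self, hidden):
        self.hidden = frozenset(hidden)

    def get(self, target, name):
        if name in self.hidden:
            raise AttributeError(f"access to {name!r} denied by wrapper")
        return super().get(target, name)


class Wrapper:
    """The wrapper object: two 'slots', one for the handler, one for the target."""

    def __init__(self, handler, target):
        self._handler = handler
        self._target = target

    def __getattr__(self, name):
        # Only called for names not found on the wrapper itself, so every
        # read of a wrapped property is routed through the handler.
        return self._handler.get(self._target, name)


class Account:
    def __init__(self):
        self.owner = "hippo"
        self.password = "hunter2"


acct = Account()
open_view = Wrapper(ProxyHandler(), acct)
filtered_view = Wrapper(ConcealingHandler({"password"}), acct)

print(open_view.password)                  # prints: hunter2
print(filtered_view.owner)                 # prints: hippo
print(hasattr(filtered_view, "password"))  # prints: False
```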

Also, the same wrapper will always be used for a given object. This is necessary for equality testing between independently generated wrappings of the same object, and useful for performance and memory usage as well. Internally, every compartment has a wrapperCache that is keyed off of wrapped objects’ identity. You could think of the flavor of wrapper (i.e., the type of proxy handler) being determined by the tuple «destination compartment, source compartment, object», but the object is stored within the source compartment so those last two are redundant with each other.
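Two of the rules above (wrappers do not compose, and the same wrapper is always reused for a given object) fit in one small function. This is a toy Python model with invented names (`wrap`, `wrapper_cache`, `Thing`); it ignores the one exception to the "rip off existing wrappers" rule mentioned earlier and says nothing about how the real wrapperCache is laid out.

```python
class Compartment:
    def __init__(self, name):
        self.name = name
        # id(wrapped object) -> wrapper living in this compartment. The
        # wrapper holds a reference to the object, so the id stays valid
        # for as long as the cache entry exists.
        self.wrapper_cache = {}


class Thing:
    def __init__(self, compartment):
        self.compartment = compartment


class CrossCompartmentWrapper:
    def __init__(self, compartment, target):
        self.compartment = compartment   # where the wrapper itself lives
        self.target = target             # the real object, elsewhere


def wrap(dest, obj):
    """Return something usable from `dest` that refers to `obj`."""
    # Wrappers do not compose: strip any existing wrapper first.
    while isinstance(obj, CrossCompartmentWrapper):
        obj = obj.target
    # Inside its own compartment an object needs no wrapper at all.
    if obj.compartment is dest:
        return obj
    # Reuse the cached wrapper so identity comparisons keep working.
    key = id(obj)
    if key not in dest.wrapper_cache:
        dest.wrapper_cache[key] = CrossCompartmentWrapper(dest, obj)
    return dest.wrapper_cache[key]


a, b, c = Compartment("a"), Compartment("b"), Compartment("c")
obj = Thing(a)

w1 = wrap(b, obj)
w2 = wrap(b, obj)
print(w1 is w2)                 # prints: True
print(wrap(a, w1) is obj)       # prints: True
print(wrap(c, w1).target is obj)  # prints: True
```

The last line shows the non-composition rule: wrapping a wrapper into a third compartment yields a wrapper around the original object, not a wrapper around a wrapper.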

From the JS engine’s point of view, there are a bunch of objects, every object lives in some compartment, and whenever you call something or point to something in another compartment, the engine will interpose a cross-compartment wrapper for you. It’s up to the embedding (the user of the JS engine) to decide how to divide up data into different compartments, and what behavior is triggered when you cross between compartments. You could have a “home” compartment and a “bigger” compartment, and the cross-compartment wrapper could convert any string to Pig Latin when it is retrieved from “bigger” by “home”. More practically, you could conceal certain properties from view when accessing them from an “unprivileged” compartment (whatever that might mean in your embedding), or you could do locking or queuing when accessing one compartment from another compartment in a different thread. Or add a remoting layer.
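For the curious, the Pig Latin embedding is easy to mock up. The sketch below is plain Python with made-up names (`pig_latin`, `PigLatinWrapper`), uses a deliberately crude Pig Latin rule for lowercase words, and has nothing to do with any real wrapper class.

```python
from types import SimpleNamespace


def pig_latin(word):
    """Very rough Pig Latin for a single lowercase word."""
    vowels = "aeiou"
    if not word:
        return word
    if word[0] in vowels:
        return word + "way"
    # Move the leading consonant cluster to the end of the word.
    for i, ch in enumerate(word):
        if ch in vowels:
            return word[i:] + word[:i] + "ay"
    return word + "ay"   # no vowels at all


class PigLatinWrapper:
    """What "home" would hold for an object that lives in "bigger"."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        value = getattr(self._target, name)
        # Only strings are translated; everything else passes through.
        if isinstance(value, str):
            return " ".join(pig_latin(w) for w in value.split())
        return value


page = SimpleNamespace(title="angry hippos", visits=3)   # lives in "bigger"
seen_from_home = PigLatinWrapper(page)

print(seen_from_home.title)    # prints: angryway ipposhay
print(seen_from_home.visits)   # prints: 3
```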

XPConnect (Gecko’s SpiderMonkey embedding code) uses cross-compartment wrappers to implement security policies and access rules. The ‘Introduction’ section at https://developer.mozilla.org/en/XPConnect_security_membranes gives a very good description of what XPConnect is using the wrappers for. Gecko uses (mostly) one compartment for chrome, and one compartment for each content domain. The wrapper is chosen based on whether the two compartments are the same origin, or whether one is privileged to see anything or a subset of the information in the other, etc. See js/src/xpconnect/wrappers/WrapperFactory.cpp for the gruesome details.

Future

(Or, “What Luke Wagner is plotting”.)

There are various plans that will probably change this picture substantially. Our threading story right now is a bit convoluted — compartments can only be touched by one thread at a time but can supposedly switch between threads, or something, and contexts need to be in a request before doing anything and beginning a request binds the context to a thread but requests can be suspended, and a context points to a thread data but you need to rebind the thread data if you switch threads… it’s complicated, ok? I tried to document it once, but just kept confusing myself.

Luke plans to make JSRuntimes be single-thread only, eliminate JSContexts entirely, and make JSCompartments be per-global (right now you can have multiple global objects in a compartment). I don’t really understand all that (are JSRuntimes the new JSContexts?) but the point is that things are a’changin.


24
Aug 11

hg qedit

On his blog, Paul O’Shannessy came up with an ‘hg qedit’ alias that opens up an editor on your .hg/patches/series file for reordering your patch queue. It’s a nice simple solution to a common problem, so obviously I felt compelled to muck it up.

Here’s my version, for insertion into your ~/.hgrc:

[alias]
qedit = !S=$(hg root)/.hg/patches/series; cp $S $S.bak && perl -pale 'BEGIN { chomp(@a = qx(hg qapplied -q)); die if $?; @a{@a}=(); }; s/^/# (applied) / if exists $a{$F[0]}' $S > $S.new && ${EDITOR-vim} $S.new && sed -e 's/^# .applied. //' $S.new > $S
                                                                                                                                                                                                                   # Did you see this by scrolling over?
                                                                                                                                                                                                                   # I want better code snippet support

This fixes the main problem with zpao’s solution, which is that it’s too clean and simple.

No, wait, that’s not a problem.

The problem is that when I edit my series file, I often forget that I have some patches applied and end up reordering applied patches, which makes a complete mess. The above alias opens up an editor on your series file, only it also inserts comments showing which patches are already applied. (If you really, really want to mess yourself up, go ahead and reorder the commented lines. You’ll get what you deserve.)

Here’s what my queue looks like when editing the series file:

# (applied) better-dtrace-probes
# (applied) try-enable-dtrace
# (applied) bug-650078-no-remote
bug-677985-callouts
bug-677949-gc-roots
hack-stackiter
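The annotate-then-strip round trip that the alias performs with perl and sed can be sketched in Python. This is an illustration of the transformation only, with made-up function names (`annotate`, `strip_annotations`); it does not call hg or an editor, and unlike the sed pattern it matches the prefix literally.

```python
APPLIED_PREFIX = "# (applied) "


def annotate(series_lines, applied):
    """Prefix each series line whose first word names an applied patch."""
    applied = set(applied)
    out = []
    for line in series_lines:
        fields = line.split()
        if fields and fields[0] in applied:
            line = APPLIED_PREFIX + line
        out.append(line)
    return out


def strip_annotations(edited_lines):
    """Remove the prefix again once the editor has exited."""
    return [
        line[len(APPLIED_PREFIX):] if line.startswith(APPLIED_PREFIX) else line
        for line in edited_lines
    ]


series = [
    "better-dtrace-probes",
    "try-enable-dtrace",
    "bug-650078-no-remote",
    "bug-677985-callouts",
    "bug-677949-gc-roots",
    "hack-stackiter",
]
applied = series[:3]

shown_in_editor = annotate(series, applied)
print(shown_in_editor[0])   # prints: # (applied) better-dtrace-probes
print(shown_in_editor[3])   # prints: bug-677985-callouts
print(strip_annotations(shown_in_editor) == series)   # prints: True
```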

Come to think of it, mq really shouldn’t let you mess up that way in the first place. It knows the original patch names for your applied patches (unless you are really determined to make your life difficult, and commit things on top without going through mq at all). It could detect when you reordered applied patches, and just undo what you did. And call you names. But maybe that would slow things down.
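The check that mq could make is small. Here is a hedged Python sketch of one possible version; the function name `applied_order_preserved` is invented, it works on bare patch names rather than real series lines, and it only tests relative order (a stricter check might also insist that the applied patches stay at the front of the series).

```python
def applied_order_preserved(applied, edited_series):
    """True if the applied patches keep their original relative order."""
    positions = {name: i for i, name in enumerate(edited_series)}
    if any(name not in positions for name in applied):
        return False   # an applied patch was deleted from the series
    indexes = [positions[name] for name in applied]
    return indexes == sorted(indexes)


applied = ["better-dtrace-probes", "try-enable-dtrace", "bug-650078-no-remote"]

untouched = applied + ["bug-677985-callouts", "hack-stackiter"]
swapped = [applied[1], applied[0], applied[2], "bug-677985-callouts"]

print(applied_order_preserved(applied, untouched))   # prints: True
print(applied_order_preserved(applied, swapped))     # prints: False
```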

Update: it wasn’t working for jlebar, which turned out to be because he had added qapplied=-v to his [defaults] section. The above is now fixed for that scenario by adding a -q flag to hg qapplied.


02
Aug 11

Zombie Hunting

Armless Zombies?

I’ve been looking at bug 669730 where enabling Firebug on a page (http://nytimes.com/ to be precise) results in the page’s compartment living forever. This is easy to see, now that we have the incredibly useful about:memory and its per-compartment breakdown. (What’s a compartment? It’s a memory space to keep related garbage collectable objects in. See the compartment paper, or for some more detail about how they are used in Firefox, try my contexts vs compartments writeup, though it’s more about contexts than compartments.)

I managed to find the object keeping the compartment alive, and I thought I should document what I did to either help other people hunt down zombie compartments, or beg for better tools, or both. (Note that I haven’t actually fixed the bug, so this is a little premature, but the hunting process is far more likely to be reusable and of general interest than the specifics of this leak.) Oh, and I didn’t actually figure everything out; I just kind of stumbled across the right answer.

I have a zombie compartment, so something in the compartment isn’t getting collected. That means there’s at least one GCthing alive in the compartment that shouldn’t be. The “inner” objects aren’t of much interest, so what really matters is that there’s at least one unwanted root. I want to figure out what that root is.

Things I know of that can be roots are pointers gathered by the conservative stack scanner, cross-compartment wrappers, and explicitly added GC roots.

The conservative roots are unlikely to matter here, because this leak survives returning to the event loop. The cross-compartment wrappers are what I initially suspected. I think that whenever XPCOM points to a JS object, or at least a JS object in a content compartment as this one is, it goes through a cross-compartment wrapper and the wrapped object is considered to be a GC root. So I want to see the objects rooted by cross-compartment wrappers.

I guess the wrappers I care about will actually live in a different compartment and point into the nytimes.com compartment. But it doesn’t seem to matter in this case, because the only function I could see to set a breakpoint in is JSCompartment::markCrossCompartmentWrappers() and in my test runs, it never seemed to hit. It looks like maybe it only gets called when doing a compartmental GC, and we probably don’t do many of those on a zombie compartment. Still, how do those roots get marked? I still don’t know, because while wandering around the code trying to figure out what was going on, I stumbled across the right place to watch for the third set of roots — explicitly added roots — and I took a detour to check those out. (Thinking about it, it’s not the cross-compartment wrappers that are the roots, it’s the objects they point to. Maybe those end up in the explicitly-added roots list? Dunno.)

Specifically, MarkRuntime() in jsgc.cpp iterates through a runtime’s gcRootsHash and calls gc_root_traversal on each one. That grabs out the pointer value and name (yay!) of each root and scans it. So all I needed to do was check each of these roots to see which compartment it’s in, and stop when the compartment is the one I care about. Fortunately, gc_root_traversal calls MarkIfGCThingWord and it already computes the compartment. (It’s just a bit of bit masking and pointer chasing to do manually, so it’s not a big deal anyway.)

Conditional breakpoints are great and everything, but from the name of the function it sounded like it might get called a lot, so I just crammed my own debug code into the routine:

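    // Debug hack: left NULL here, then set from gdb once the zombie compartment's address is known.
    static JSCompartment *interesting_compartment = NULL;
    // Report every explicit root whose GC thing lives in that compartment.
    if (aheader->compartment == interesting_compartment)
        printf("root: %p kind %u\n", (void*)addr, thingKind);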

Then I reran under gdb. I still needed the address of the compartment (unfortunately, about:memory only shows compartment pointers for chrome compartments). So I looked for something looping over compartments, found several, and set a breakpoint in one of them. I’m not sure which one. They all loop over rt->compartments. For each compartment, I displayed the principals->codebase value:

(gdb) display (*c)->principals ? *(*c)->principals : 0

Then I ‘n’exted through until I found the nytimes.com one. With that pointer in hand, I set a breakpoint on my added code, above, and set ‘interesting_compartment’ to my magic pointer value. This printed out the address of the root in question, together with its ‘thingKind’, which was zero. A quick look at jsgc.h showed me that zero means FINALIZE_OBJECT0, and the code just after my printf showed that I can cast that to JSObject*. A call to js_DumpObject((JSObject*)0x7fffd0a99d78) told me that this was an ‘Error’ object. Even better, when I walked up the stack one level, I could see that the root was labeled “JSDValue”.

So JSD is hanging onto a content Error object, probably one that it grabbed from hooking exception throwing or catching. Is it JSD not discarding something when you turn it off, or Firebug holding onto the object itself? I don’t know yet.


Learnings:

  • We need an easy way to get the pointer value of all compartments. Maybe in the ?verbose=1 output of about:memory?
  • Enumerating roots is handy. We should expose a function to dump out the roots given a compartment, so we could do this whole analysis via the chrome-privileged Web Console.
  • That means we need a way to refer to compartments from JS. Perhaps a weak map from principals’ codebases to JSCompartment* objects?
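To make the wish list a bit more concrete, here is a toy Python model of the last two items: a weak table from codebase strings to compartments, and a function that lists the named roots in one compartment. All of it is hypothetical (`Runtime`, `roots_in`, `by_codebase` are invented names); no such API exists in the engine as described here.

```python
import weakref


class Compartment:
    def __init__(self, codebase):
        self.codebase = codebase


class Runtime:
    def __init__(self):
        self.roots = {}   # root name -> (object description, compartment)
        # Weak values: the lookup table must not itself keep a
        # compartment alive, or it would create zombies of its own.
        self.by_codebase = weakref.WeakValueDictionary()

    def new_compartment(self, codebase):
        comp = Compartment(codebase)
        self.by_codebase[codebase] = comp
        return comp

    def add_root(self, name, description, compartment):
        self.roots[name] = (description, compartment)

    def roots_in(self, codebase):
        """Names of the explicit roots whose object lives in that compartment."""
        comp = self.by_codebase.get(codebase)
        if comp is None:
            return []
        return sorted(name for name, (_, c) in self.roots.items() if c is comp)


rt = Runtime()
chrome = rt.new_compartment("[System Principal]")
nyt = rt.new_compartment("http://nytimes.com/")
rt.add_root("JSDValue", "Error object", nyt)
rt.add_root("some chrome root", "Window object", chrome)

print(rt.roots_in("http://nytimes.com/"))   # prints: ['JSDValue']
print(rt.roots_in("http://example.invalid/"))   # prints: []
```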

Related: see Jim Blandy’s bug 672736 for adding a findReferences JS call that gives all of the incoming edges to a JS object. I originally misinterpreted that to mean displaying the full path from a GC root to an object, and I started out trying to use findReferences by grabbing any object in the zombie compartment and calling findReferences on it. But I stopped when I realized that, knowing as little about the memory layout as I do, it was probably easier for me to find the roots themselves than figuring out how to look into the chunks/arenas/arena pools/whatever for the compartment to grab out random GCthings. And all I wanted was the root, so findReferences wouldn’t be of interest unless it crossed the XPConnect boundary and told me what was keeping the JS object alive via some sort of wrapper.
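The difference between "incoming edges" and "path from a root" is easy to see on a tiny made-up heap graph. The Python sketch below uses an invented adjacency-list heap and invented function names; it is not how findReferences is implemented, only an illustration of the two questions.

```python
from collections import deque


def incoming_edges(heap, target):
    """Every node that points directly at `target`."""
    return sorted(src for src, dsts in heap.items() if target in dsts)


def path_from_root(heap, roots, target):
    """Shortest chain of references from any root to `target`, or None."""
    queue = deque([r] for r in roots)
    seen = set(roots)
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for nxt in heap.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical heap: each key points at the nodes in its list.
heap = {
    "root:JSDValue": ["Error"],
    "closure": ["Error"],
    "Error": ["proto"],
    "proto": [],
}

print(incoming_edges(heap, "Error"))
# prints: ['closure', 'root:JSDValue']
print(path_from_root(heap, ["root:JSDValue"], "proto"))
# prints: ['root:JSDValue', 'Error', 'proto']
```

Note that `closure` shows up as an incoming edge even though nothing reachable from a root points at it, which is why incoming edges alone do not say what is keeping an object alive.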

Now please, someone comment and tell me how I could have done this much more easily…