07
Oct 11

distcc, ccache, and bacon

This was initially a response to JGriffin’s GoFaster analysis post but grew out of control. Read that first.

Rampant speculation

tl;dr: hey, we could use ccache and distcc on our build system!

Just speculating (as usual), but…

The note about retiring slow slaves, combined with the performance gap between full and incremental builds, suggests something.

Why does additional hardware (the slow slaves) slow things down? Because load is unevenly distributed. Ignoring communication costs, the fastest way to build with a fast machine and a slow one that takes 2x longer would be to compile 2/3 of the files with the fast machine and 1/3 with the slow one. How? Remove all slow slaves from the build pool and convert them to distcc servers.

What about the clobber builds? Well, if you’ve already built a particular file before with the same compiler and options, it would be nice to not have to build it again. That’s what ccache is for. But a ccache per slave means you have to have built the same thing on the same slave. For try builds (which is where most of the clobbers are), that’s not going to happen all the time.

But combine that with the above distcc idea: you could run ccache under distcc on the distcc servers. Now you have a ccache/distcc sandwich: local ccache first, then distcc, then remote ccache, then finally some bacon. Because everything’s better with bacon.

ts;wm: (too short; want more)

You know, in terms of data sources, the above picture is wrong. It’s really local ccache, then remote ccache (via distcc), then remote compile, and only then bacon. But the configuration-centric ccache/distcc/ccache description makes for better visuals. Or would if I put the bacon on the inside, anyway.

Let’s walk through a clobber build. The stuff the local slave has built before gets pulled from its local ccache. Some of the remaining stuff gets built locally. The rest gets sent over to various machines in the distcc pool. We can break those things down into 3 categories: (1) stuff that’s never been built anywhere, (2) stuff that’s been built on a different distcc host, and (3) stuff that’s been built on the same distcc host. #3 is a win. #1 is unavoidable; it’s the basic cost of doing business. (Actually, there’s another dimension, which is whether something has been built before on a non-distcc host. I’ll ignore that for now. Conceptually, you can make it go away by making every slave a distcc server.)

#2 is waste. But it’s less waste than we have now, if the distcc pool is smaller than the whole build pool, because you’re doing one redundant build per distcc host rather than one per builder. And it’s self-limiting: a distcc host that has a build cached returns it immediately, meaning it’s more likely to get stuck with something it needs to build, which sucks but at least it populates its ccache so it won’t have to do it again.

Now, I am assuming here that compile costs are greater than communication + ccache lookup costs, which may well be a flawed assumption at the scale of our build pool. But it’s very very true for my personal builds — I have my own distcc server, and my clobber builds (actually, *all* my builds) feel way way faster when I’m using it. So I don’t think the question is so much “would this work?” as it is “what would we need to do to make this work?”

For starters, do no harm: it would be great if we could partition the network so that distcc servers are separate from the current communication channels. Every build host would sit on two VLANs, say: the regular one and the distcc one. That would reduce chances of infrastructure meltdown through excessive distcc traffic. (I am not a network engineer, nor do I play one on TV, and this may require separate physical networks and possibly Pringles cans.)

On a related note, it might be wise to start out by restricting the slaves to a limited number of distcc jobs at a time, to keep the distcc servers and the network from getting bogged down by congestion. I do this for my own builds through a ~/.distcc/hosts file containing: “localhost/4 192.168.1.99/7”. That means you can use -j666, and it’ll still only do 4 jobs on localhost and 7 jobs on 192.168.1.99 simultaneously. (Actually, that’s my home ~/.distcc/hosts file. My server at work is beefier, and there I allow the remote to do 12 jobs at once. I have a cron job that checks every 5 minutes to see what network I’m on and sets a ~/.distcc/hosts symlink accordingly. But I digress.)

More worrying is the reason behind all that clobbering. If a slave turns to the dark side, runs amok, gets hit by a cosmic ray, or is just having a bad day, do we really want to use its ccached builds? More to the point, when something goes wrong, what do we need to clobber? Right now everything is local to a slave, so it’s straightforward to pull a slave from the pool, take it out behind the garage, and beat the crap out of it with a stick. With distcc and ccache, it’s harder to tell which server to blame.

Still, how often does this happen? (I have no idea. I’m just a troublemaking developer, dammit.) We can always wipe the ccache on the whole distcc pool. It’d be nice to be able to track problems to their source, though. Maybe we could use the distcc pool redundancy to our advantage: have them cross-check the checksums of their builds with each other. Same input, same output. But that’s even more speculative.

It’s not all bad, though — I’m guessing that most clobbers result from the build system not being able to handle various types of change. If the ccache/distcc/ccache sandwich makes clobbers substantially cheaper, we can be a lot freer with them. Someone accidentally cancelled an m-c build partway through? Clobber the world! Let’s make bacon!!

wtf;yai;bdb: (what the f#@; you’re an idiot; been done before)

Reality check
  • We use local ccache already – see bug 488412
  • distcc has been proposed a number of times, but for the life of me I cannot find the bug. There are most likely some very valid reasons not to use it, such as making a complete interdependent hairball out of our build system where one machine can kill everything.
  • Given the results in bug 488412, it’s very plausible that remote ccaches would be of no benefit or a net loss. (Though those numbers were using NFS to retrieve remote ccache results, and I deeply distrust NFS.)

Screw Reality. What has it ever done for me?

Hey, if we really needed to conceal network latency and redundant rebuilds across different hosts, we could stream out ccache results before they were even needed! But that’s crazy talk.


21
Sep 11

JS Probes

Have you ever had your browser mysteriously stall periodically and wondered “what the f#@$! is it doing?!!” Or perhaps you’re working on something, say the garbage collector, and you’d like to see what effect your changes are having. Or maybe even write a little analysis that postprocesses some sort of trace of what is going on, and figures out what the optimal pattern of actions would be. (“If I’d thrown this big chunk of data out of the cache here, then I would’ve had room for all of these little things that got evicted instead, and would have had way fewer misses…”)

The usual way to do things like this is to manually add some instrumentation code (probably just logging a bunch of events) and postprocess the results. This works fine, but it has a few drawbacks: (1) you have to figure out where to insert your instrumentation, often in unfamiliar code; (2) you’ll need to recompile, possibly several times; (3) the logs can get very large very quickly; and (4) you’ll probably end up writing a very special-purpose postprocessor that (5) dumps stuff to a text file that only you know how to interpret, and even you will only remember what it all means for a week or two. The next time you need to do something similar, you’ll find that all of your instrumentation code is severely bitrotted and misses some paths that have been added in the meantime, so you’ll start everything over from scratch.

Well, tough luck. Sometimes those are just facts of life and you’ll need to suck it up. Quit whining, dammit.

But many times, the events in question (or more precisely, “probe points”) are of general interest. If you can manage to slip them into the code and so get other developers to maintain them for you as they make changes, then everyone can rely on those probes being in roughly the right place permanently. That takes care of #1 above, and depending on how they’re implemented there’s a good chance you won’t even need to recompile, so that’s #2.
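
To make that concrete, here’s a rough sketch (my own hypothetical code, not the actual SpiderMonkey probe layer) of what a statically-inserted probe point can look like: a named hook that costs a single predictable branch when nobody is listening, and forwards a few arguments to whichever backend is attached when somebody is.

    // Hypothetical sketch -- not the actual SpiderMonkey probe layer -- of a
    // statically-inserted probe point. When no backend is attached it costs a
    // single predictable branch; when one is attached, it forwards a couple of
    // arguments to whatever consumer registered itself.
    struct ProbeBackend {
        virtual void gcStart(bool compartmental) = 0;
        virtual void functionEnter(const char *name) = 0;
        virtual ~ProbeBackend() {}
    };

    static ProbeBackend *gProbeBackend = NULL;   // NULL => probes are no-ops

    #define PROBE_GC_START(compartmental) \
        do { if (gProbeBackend) gProbeBackend->gcStart(compartmental); } while (0)
    #define PROBE_FUNCTION_ENTER(name) \
        do { if (gProbeBackend) gProbeBackend->functionEnter(name); } while (0)

    // Engine code then just says PROBE_GC_START(true) at the top of the GC,
    // and whoever refactors that code keeps the probe in roughly the right place.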

I’ve done an implementation of these sorts of probes in the SpiderMonkey Javascript engine. There are probe points like “a GC is starting (and it’s local to one compartment)”, “the heap has been resized”, and “javascript function F is being called/is returning.” Some of these are straightforward to place into the code — the start of a GC wasn’t hard to figure out, for example. Some weren’t so straightforward, such as JS function calls (they might seem simple, but what if you’re running JITted? Which JIT? Are you still running JITted by the time you return from the function?). I’ve delivered the probe information to various backends — anything from Windows’ ETW (blog post forthcoming whenever I manage to implement the start/stop functionality), to dtrace/systemtap (another blog post, probably coming sooner since I recently scraped together a demo), to a simple callback mechanism (see JS_SetFunctionCallback on MDN), to other special-purpose ones that only care about a small subset of probes.
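
As a taste of the simple callback backend, installing a callback looks roughly like the sketch below. I’m writing the callback signature from memory, so treat it as an approximation and go by the MDN page for the real thing.

    // Rough sketch of hooking the function call/return probes through
    // JS_SetFunctionCallback. The exact signature is an assumption from memory;
    // check the MDN docs referenced above before copying this anywhere.
    #include "jsapi.h"   // the callback API lived in or near jsdbgapi.h at the time
    #include <stdio.h>

    static void
    CountCalls(const JSFunction *fun, const JSScript *script,
               const JSContext *cx, int entering)
    {
        static unsigned calls = 0;
        if (entering && ++calls % 100000 == 0)   // assuming nonzero == entering
            printf("%u function entries so far\n", calls);
    }

    void
    InstallCallCounter(JSContext *cx)
    {
        JS_SetFunctionCallback(cx, CountCalls);
    }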

#3 (log it all vs online handling) ventures into religious territory. It is easiest to mindlessly log everything of interest and postprocess it. But what if you want realtime updates? Or if you want to track different information depending on what you learn from other probe points? Or what if the volume of your log writing interferes with whatever you’re trying to measure (eg disk I/O)? Or maybe you need to track some sort of state in order to give the probes meaning. (GC when idle => good. Avoidable GC when the user is waiting => bad.)
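
To illustrate that last point with a made-up example: instead of logging anything, an online consumer can keep just enough state to answer one question, like how much GC pause time the user actually noticed. Everything below is hypothetical code, not an existing API.

    // Hypothetical "online" probe consumer: no log, no postprocessing, just the
    // state needed to answer the question we actually care about. All names
    // here are made up for illustration.
    #include <stdint.h>

    class GCAccounting {
      public:
        GCAccounting() : userWaiting(false), gcStartUs(0),
                         idleGCUs(0), blockingGCUs(0) {}

        // Fed from an input/event-loop probe.
        void onUserWaiting(bool waiting) { userWaiting = waiting; }

        // Fed from the GC start/end probes.
        void onGCStart(uint64_t nowUs) { gcStartUs = nowUs; }
        void onGCEnd(uint64_t nowUs) {
            uint64_t pause = nowUs - gcStartUs;
            if (userWaiting)
                blockingGCUs += pause;   // GC while the user is waiting => bad
            else
                idleGCUs += pause;       // GC when idle => good (or at least fine)
        }

        uint64_t blockingGCMicroseconds() const { return blockingGCUs; }
        uint64_t idleGCMicroseconds() const { return idleGCUs; }

      private:
        bool userWaiting;
        uint64_t gcStartUs;
        uint64_t idleGCUs;
        uint64_t blockingGCUs;
    };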

Those arguments are what led to the creation of tools like DTrace and Systemtap. Both give you a scripting environment that can aggregate information from probes as they fire, control exactly what information gets tracked as things are happening, and can be attached/detached at any time. They’re pretty cool, and invaluable once you get familiar with them. They’re also extremely system-dependent and generally require root access or special builds or kernel debuginfo or something, which ends up meaning that you often can’t just hand off analysis scripts to other people and have those people get some use out of them. And even you may not be able to take them to another environment.

Still, they deal pretty well with #4 (avoiding one-use, special-purpose processors), at least for environments matching the one they were written for. And if they can draw from statically-inserted probe points (the type I was talking about above), they can actually be pretty general. #5 is still a killer, though — at least the way I write systemtap scripts, they all end up with idiosyncratic ways of dumping out the results of some particular analysis, and nobody else is going to get much enlightenment without studying the script for a while first.

What if we could do better? What if we could insert these static probes, but rather than feeding the information to some niche tool that is usable by only a handful of people, we make the data available to a plain old Firefox addon? You could collect, aggregate, summarize, mutilate, fold, spindle, or crush the data directly in JS code. Then we could let addon authors go crazy with visualizations and analysis libraries. That’d be cool, right?

Graph GC behavior. Warn the user when slow or suspicious stuff is happening. Figure out what’s going on during long event handlers. Graph the percentage of time spent in different subsystems. Correlate performance/trace data with user-meaningful actions. Make a flight-recording of various metrics and let the user walk through history. Your ideas here.

Ok, so I tricked you. I’m not going to tell you how to do any of that. This blog post is a tease, an advertisement for the work that Brian Burg did this summer during his Mozilla internship. If you’re interested, he’ll be giving his internship final presentation tomorrow (today when you’re reading this, or perhaps yesterday or last month for those of you who have fallen behind on your Planet reading.) That’s 1:30PM PDT on Thursday, September 22 at the Mountain View Mozilla headquarters, and I’m 97.2% sure it will be broadcast over Air Mozilla as well. And taped, I think? (Sadly, I can’t find where those are archived. Somebody please tell me and I’ll update this post.) There will be a demo. With pretty pictures! And he’ll be writing it up on his own blog Real Soon Now. I’m not going to say any more for now — I’d get it wrong anyway.

Update: Argh! I got the date wrong! It’s not Wednesday, September 21 as I originally wrote. It’s today, Thursday, September 22. Sorry for the confusion!


25
Aug 11

Contexts and Compartments

A while ago (at the Platform offsite just after the last all-hands, actually) I wrote up what I understood about contexts and compartments. I’ve since sent it to a couple of people and put it up on the wiki, but haven’t distributed it more widely because I wasn’t sure it was all correct. I am far from an expert, but mrbkap (who *is* the expert) has now read through this and pointed out only one glaring mistake, which is now fixed. So other than the parts I’ve added since then, it should be more or less correct now and thus is ready for a wider audience.

See also http://www.christianwimmer.at/Publications/Wagner11a/Wagner11a.pdf for the fundamental idea of compartments.

Contexts=Control, Compartments=Data

JSContexts are control, JSCompartments are data.

A JSContext (from here on, just “context”) represents the execution of JS code. A context contains a JS stack and is associated with a thread. A thread may use multiple contexts, but a given context will only execute on a single thread at a time.

A JSCompartment (“compartment”) is a memory space that objects and other garbage-collected things (“GCthings”) are stored within.

A context is associated with a single compartment at all times (not necessarily always the same one, but only ever one at a time). The context is often said to be “running inside” that compartment. Any object created with that context will be physically stored within the context’s current compartment. Just about any GCthing read or touched by that context should also be within that same compartment.

To access data in another compartment, a context must first “enter” that other compartment. This is termed a “cross-compartment call” — remember, contexts are control, so changing a context’s compartment is only meaningful if you’re going to run code. The context will enter another compartment, do some stuff, then return, at which time it’ll exit back to the original compartment. (The APIs allow you to change to a different compartment and never change back, but using that is almost always a bug and will trigger an assertion in a debug build the first time you touch an object in a compartment that differs from your context’s compartment.)
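
In JSAPI terms, entering and exiting another compartment is typically done with an RAII helper. Here’s a rough sketch against the API of roughly this era; the helper has been renamed in later releases, so treat the details as approximate.

    // Sketch of a cross-compartment call. JSAutoEnterCompartment switches cx
    // into targetObj's compartment, and switches back when it goes out of scope.
    #include "jsapi.h"

    JSBool
    PokeOtherCompartment(JSContext *cx, JSObject *targetObj)
    {
        JSAutoEnterCompartment ac;
        if (!ac.enter(cx, targetObj))    // cx->compartment is now targetObj's
            return JS_FALSE;

        jsval v = INT_TO_JSVAL(42);
        return JS_SetProperty(cx, targetObj, "answer", &v);
        // ~JSAutoEnterCompartment restores the original compartment on exit.
    }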

When a context is not running code — as in, its JS stack is empty and it is not in a request — then it isn’t really associated with any compartment at all. In the future, starting a request and entering an initial compartment will become the same action. Also, a context is only ever running on one thread at a time. Update: or perhaps we’ll eliminate contexts altogether and just map from a thread to the relevant data.

In implementation terms, a context has a field (cx->compartment) that gives the current compartment. Contexts also maintain a default scope object (cx->globalObject) that is required to always be within the same compartment, and a “pending exception” object which, if set, will also be in the same compartment. Any object created using a context will be created inside the context’s current compartment, and the object’s scope chain will be initialized to a scope object within that same compartment. (That scope object might be cx->globalObject, but really that’s just the ultimate fallback. Usually the scope object will be found via the stack.)

To make a cross-compartment call, cx->compartment is updated to the new compartment. The scope object must also be updated, and for that reason you must pass in a target object in the destination compartment. The scope object will be set to the target object’s global object. (There’s a hacky special case when you’re using a JSScript for the target object, since they don’t have global objects, but ignore that.) If an exception is pending, it will be set to a wrapper (really, a proxy) inside the new compartment. The wrapper mediates access to the original exception object that lives in the origin compartment.

Finally, a dummy frame that represents the compartment transition is pushed onto the JS stack. This frame is used for setting the scope object of anything created while executing within the new compartment. Also, the security privileges of executing code are determined by the current stack — eg, if your chrome code in a chrome compartment calls a content script in a content compartment, that script will execute with content privileges until it returns, then will revert to chrome privileges.

When debugging, it is helpful to know that a compartment is associated with a “JSPrincipals” object that represents the “security information” for the contents of that compartment. This is used to decide who can access what, and is mostly opaque to the JS engine. But for Gecko, it’ll typically contain a human-understandable URL, which makes it much easier to figure out what’s going on:

(gdb) p obj
$1 = (JSObject *) 0x7fffbeef
(gdb) p obj->compartment()
$2 = (JSCompartment *) 0xbf5450
(gdb) p obj->compartment()->principals
$3 = (JSPrincipals *) 0xc29860
(gdb) p obj->compartment()->principals->codebase
$4 = 0x7fffd120 "[System Principal]"
...or perhaps...
$4 = 0x7fffd120 "http://angryhippos.com/accounts/"

Anything within a single compartment can freely and directly access anything else in that same compartment. No locking or wrappers are necessary (or possible). The overall model is thus a partitioning of all (garbage collectible) data into separate compartments, with controlled access from one compartment to another but lockless, direct access between objects within a compartment. Cross-compartment access is handled via “wrappers”, which is the subject of the next section.

Wrappers

GCthings may be wrapped in cross-compartment wrappers for a number of reasons. When a context is transitioning from one compartment to another (ie, it’s making a cross-compartment call), its scope object and pending exception (if any) are changed to wrappers pointing back to the objects in the old compartment. But any object can be wrapped in a cross-compartment wrapper if needed. You can clone an object from another compartment, and all of its properties will be wrappers pointing at the “real” properties in the origin compartment.

Cross-compartment wrappers do not compose. When you wrap an object, any existing wrappers will be ripped off first. (Slight oversimplification; there is one exception.) In fact, the type of wrapper used for an object is uniquely determined by the source and destination compartments.

The precise terminology is a little confusing. A cross-compartment wrapper is a JSObject whose class is one of the proxy classes. When you access such an object, it fetches its proxy handler (a subclass of JSProxyHandler) out of a slot to decide how to handle that access. Confusingly, in the code a JSCrossCompartmentWrapper is the subclass of JSProxyHandler that manages cross-compartment access, but usually when we refer to a “cross-compartment wrapper”, we’re really talking about the JSObject. (The JSObject of type js::SomethingProxyClass that has a private JSSLOT_PROXY_HANDLER field containing a JSProxyHandler subclass that knows how to mediate access to the proxied object stored in JSSLOT_PROXY_PRIVATE. Phew.)

A proxy handler mediates access to the proxied objects based on a set of rules embodied by some subclass of JSProxyHandler. A proxy handler might allow all accesses through, conceal certain properties, or check on each access whether the source compartment is allowed to see a particular property. Examples of proxy handler classes are the things listed on https://developer.mozilla.org/en/XPConnect_wrappers : cross-origin wrappers (XOWs), chrome object wrappers (COWs), etc.

Also, the same wrapper will always be used for a given object. This is necessary for equality testing between independently generated wrappings of the same object, and useful for performance and memory usage as well. Internally, every compartment has a wrapperCache that is keyed off of wrapped objects’ identity. You could think of the flavor of wrapper (i.e., the type of proxy handler) being determined by the tuple «destination compartment, source compartment, object», but the object is stored within the source compartment so those last two are redundant with each other.
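
As a toy illustration (made-up types, not the real JSCompartment code), the cache is conceptually just an identity-keyed map that gets consulted before any new wrapper is created:

    // Toy illustration of an identity-keyed wrapper cache. Asking twice for a
    // wrapper to the same foreign object must hand back the same wrapper, so
    // equality tests on wrappers behave like equality tests on the originals.
    #include <map>

    struct FakeGCThing {};   // stands in for a GCthing in some other compartment

    class FakeCompartment {
      public:
        FakeGCThing *wrap(FakeGCThing *obj) {
            std::map<FakeGCThing *, FakeGCThing *>::iterator it =
                wrapperCache.find(obj);
            if (it != wrapperCache.end())
                return it->second;                 // reuse the existing wrapper
            FakeGCThing *wrapper = makeWrapperFor(obj);
            wrapperCache[obj] = wrapper;
            return wrapper;
        }

      private:
        FakeGCThing *makeWrapperFor(FakeGCThing *obj) {
            // The real code picks the wrapper flavor based on the source and
            // destination compartments, then builds a proxy object.
            return new FakeGCThing();
        }
        std::map<FakeGCThing *, FakeGCThing *> wrapperCache;
    };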

From the JS engine’s point of view, there are a bunch of objects, every object lives in a different compartment, and whenever you call something or point to something in another compartment, the engine will interpose a cross-compartment wrapper for you. It’s up to the embedding — the user of the JS engine — to decide how to divide up data into different compartments, and what the behavior is triggered when you cross between compartments. You could have a “home” compartment and a “bigger” compartment, and the cross-compartment wrapper could convert any string to Pig Latin when it is retrieved from “bigger” by “home”. More practically, you could conceal certain properties from view when accessing them from an “unprivileged” compartment (whatever that might mean in your embedding), or you could do locking or queuing when accessing one compartment from another compartment in a different thread. Or add a remoting layer.
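
To make the mediation idea concrete, here is a deliberately simplified sketch in the spirit of those examples. It is not the real JSProxyHandler interface, just the shape of it: every access from one compartment to another funnels through a handler that can rewrite, conceal, or deny.

    // Hypothetical, simplified access policy. NOT the real JSProxyHandler API.
    #include <map>
    #include <string>

    struct Target {
        std::map<std::string, std::string> props;
    };

    class AccessPolicy {
      public:
        virtual ~AccessPolicy() {}
        // Return false to behave as if the property doesn't exist.
        virtual bool get(const Target &t, const std::string &name,
                         std::string *out) const = 0;
    };

    // An "unprivileged" compartment doesn't get to see underscore-prefixed
    // properties of the wrapped object.
    class ConcealingPolicy : public AccessPolicy {
      public:
        virtual bool get(const Target &t, const std::string &name,
                         std::string *out) const {
            if (!name.empty() && name[0] == '_')
                return false;                       // concealed
            std::map<std::string, std::string>::const_iterator it =
                t.props.find(name);
            if (it == t.props.end())
                return false;
            *out = it->second;
            return true;
        }
    };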

XPConnect (Gecko’s SpiderMonkey embedding code) uses cross-compartment wrappers to implement security policies and access rules. The ‘Introduction’ section at https://developer.mozilla.org/en/XPConnect_security_membranes gives a very good description of what XPConnect is using the wrappers for. Gecko uses (mostly) one compartment for chrome, and one compartment for each content domain. The wrapper is chosen based on whether the two compartments are the same origin, or whether one is privileged to see anything or a subset of the information in the other, etc. See js/src/xpconnect/wrappers/WrapperFactory.cpp for the gruesome details.

Future

(Or, “What Luke Wagner is plotting”.)

There are various plans that will probably change this picture substantially. Our threading story right now is a bit convoluted — compartments can only be touched by one thread at a time but can supposedly switch between threads, or something, and contexts need to be in a request before doing anything and beginning a request binds the context to a thread but requests can be suspended, and a context points to a thread data but you need to rebind the thread data if you switch threads… it’s complicated, ok? I tried to document it once, but just kept confusing myself.

Luke plans to make JSRuntimes single-threaded, eliminate JSContexts entirely, and make JSCompartments per-global (right now you can have multiple global objects in a compartment). I don’t really understand all that (are JSRuntimes the new JSContexts?), but the point is that things are a-changin’.


02
Aug 11

Zombie Hunting

Armless Zombies?

I’ve been looking at bug 669730 where enabling Firebug on a page (http://nytimes.com/ to be precise) results in the page’s compartment living forever. This is easy to see, now that we have the incredibly useful about:memory and its per-compartment breakdown. (What’s a compartment? It’s a memory space to keep related garbage collectable objects in. See the compartment paper, or for some more detail about how they are used in Firefox, try my contexts vs compartments writeup, though it’s more about contexts than compartments.)

I managed to find the object keeping the compartment alive, and I thought I should document what I did to either help other people hunt down zombie compartments, or beg for better tools, or both. (Note that I haven’t actually fixed the bug, so this is a little premature, but the hunting process is far more likely to be reusable and of general interest than the specifics of this leak.) Oh, and I didn’t actually figure everything out; I just kind of stumbled across the right answer.

I have a zombie compartment, so something in the compartment isn’t getting collected. That means there’s at least one GCthing alive in the compartment that shouldn’t be. The “inner” objects aren’t of much interest, so what really matters is that there’s at least one unwanted root. I want to figure out what that root is.

Things I know of that can be roots are pointers gathered by the conservative stack scanner, cross-compartment wrappers, and explicitly added GC roots.

The conservative roots are unlikely to matter here, because this leak survives returning to the event loop. The cross-compartment wrappers are what I initially suspected. I think that whenever XPCOM points to a JS object, or at least a JS object in a content compartment as this one is, it goes through a cross-compartment wrapper and the wrapped object is considered to be a GC root. So I want to see the objects rooted by cross-compartment wrappers.

I guess the wrappers I care about will actually live in a different compartment and point into the nytimes.com compartment. But it doesn’t seem to matter in this case, because the only function I could see to set a breakpoint in is JSCompartment::markCrossCompartmentWrappers() and in my test runs, it never seemed to hit. It looks like maybe it only gets called when doing a compartmental GC, and we probably don’t do many of those on a zombie compartment. Still, how do those roots get marked? I still don’t know, because while wandering around the code trying to figure out what was going on, I stumbled across the right place to watch for the third set of roots — explicitly added roots — and I took a detour to check those out. (Thinking about it, it’s not the cross-compartment wrappers that are the roots, it’s the objects they point to. Maybe those end up in the explicitly-added roots list? Dunno.)

Specifically, MarkRuntime() in jsgc.cpp iterates through a runtime’s gcRootsHash and calls gc_root_traversal on each one. That grabs out the pointer value and name (yay!) of each root and scans it. So all I needed to do was check each of these roots to see which compartment it’s in, and stop when the compartment is the one I care about. Fortunately, gc_root_traversal calls MarkIfGCThingWord and it already computes the compartment. (It’s just a bit of bit masking and pointer chasing to do manually, so it’s not a big deal anyway.)

Conditional breakpoints are great and everything, but from the name of the function it sounded like it might get called a lot, so I just crammed my own debug code into the routine:

    // This goes inside MarkIfGCThingWord() in jsgc.cpp, which already knows the
    // compartment of each candidate root; set interesting_compartment from the
    // debugger to the compartment being hunted.
    static JSCompartment *interesting_compartment = NULL;
    if (aheader->compartment == interesting_compartment)
        printf("root: %p kind %u\n", (void*)addr, thingKind);

Then I reran under gdb. I still needed the address of the compartment (unfortunately, about:memory only shows compartment pointers for chrome compartments). So I looked for something looping over compartments, found several, and set a breakpoint in one of them. I’m not sure which one. They all loop over rt->compartments. For each compartment, I displayed the principals->codebase value:

(gdb) display (*c)->principals ? *(*c)->principals : 0

Then I ‘n’exted through until I found the nytimes.com one. With that pointer in hand, I set a breakpoint on my added code, above, and set ‘interesting_compartment’ to my magic pointer value. This printed out the address of the root in question, together with its ‘thingKind’, which was zero. A quick look at jsgc.h showed me that zero means FINALIZE_OBJECT0, and the code just after my printf showed that I can cast that to JSObject*. A call to js_DumpObject((JSObject*)0x7fffd0a99d78) told me that this was an ‘Error’ object. Even better, when I walked up the stack one level, I could see that the root was labeled “JSDValue”.

So JSD is hanging onto a content Error object, probably one that it grabbed from hooking exception throwing or catching. Is it JSD not discarding something when you turn it off, or Firebug holding onto the object itself? I don’t know yet.


Learnings:

  • We need an easy way to get the pointer value of all compartments. Maybe in the ?verbose=1 output of about:memory?
  • Enumerating roots is handy. We should expose a function to dump out the roots given a compartment, so we could do this whole analysis via the chrome-privileged Web Console.
  • That means we need a way to refer to compartments from JS. Perhaps a weak map from principals’ codebases to JSCompartment* objects?

Related: see Jim Blandy’s bug 672736 for adding a findReferences JS call that gives all of the incoming edges to a JS object. I originally misinterpreted that to mean displaying the full path from a GC root to an object, and I started out trying to use findReferences by grabbing any object in the zombie compartment and calling findReferences on it. But I stopped when I realized that, knowing as little about the memory layout as I do, it was probably easier for me to find the roots themselves than figuring out how to look into the chunks/arenas/arena pools/whatever for the compartment to grab out random GCthings. And all I wanted was the root, so findReferences wouldn’t be of interest unless it crossed the XPConnect boundary and told me what was keeping the JS object alive via some sort of wrapper.

Now please, someone comment and tell me how I could have done this much more easily…