21
Jan 12

bzexport --new: crash test dummies wanted

Scenario 1: you have a patch to some bug sitting in your Mercurial queue. You want to attach it to a bug, but the Bugzilla interface is painful and annoying. What do you do?

Use bzexport. It’s great! You can even request review at the same time.

What I really like about bzexport is that while writing and testing a patch, I’m in an editor and the command line. I may not even have a browser running, if I’m constantly re-starting it to test something out. Needing to go to the Bugzilla web UI interrupts my flow. With bzexport, I can stay in the shell and move on to something else immediately.

Scenario 2: You have a patch, but haven’t filed a bug yet. Neither has anybody else. But your patch has a pretty good description of what the bug is. (This is common, especially for small things.) Do you really have to go through the obnoxious bug-filing procedure? It sure is tempting just to roll this fix up into some other vaguely related bug, isn’t it? Surely there’s a simple way to do things the right way without bouncing between interfaces?

Well, you’re screwed. Unless you’re willing to test something out for me. If not, please stop reading.


03
Nov 11

Patch reordering

I have a patch queue that looks roughly like:

  initial-API
  consumer-1
  consumer-2
  unrelated
  consumer-3-plus-API-changes-and-consumer-1-and-2-updates-for-new-API

(So my base repo has a patch ‘initial-API’ applied to it, followed by a patch ‘consumer-1’, etc.)

The idea is that I am working on a new API of some sort, and have a couple of independent consumers of that API. The first two are “done”, but when working on the 3rd, I realize that I need to make changes to or clean up the API that they’re all using. So I hack away, and end up with a patch that contains both consumer 3 plus some API changes, and to get it to compile I also update consumers 1 and 2 to accommodate the new changes. All of that is rolled up into a big hairball of a patch.

Now, what I want is:

  final-API
  consumer-1 (new API)
  consumer-2 (new API)
  unrelated
  consumer-3 (new API)

But how do I do that (using mq patches)? I can use qcrefresh+qnew to fairly easily get to:

  initial-API
  consumer-1 (old API)
  consumer-2 (old API)
  unrelated
  consumer-3 (new API)
  API-changes-plus-API-changes-for-consumers-1-and-2

or I could split out the consumer 1 & 2 API changes:

  initial-API
  consumer-1 (old API)
  consumer-2 (old API)
  unrelated
  consumer-3 (new API)
  API-changes
  consumer-2-API-changes
  consumer-1-API-changes

after which I could theoretically qfold the consumer 1 and consumer 2 changes into their original patches:

  initial-API
  consumer-1 (new API)
  consumer-2 (new API)
  unrelated
  consumer-3 (new API)
  API-changes

Unfortunately, consumer-1-API-changes collides with API-changes, so the fold will fail. It shouldn’t collide, really, but it does because part of the code to “register” consumer-1 with the new API happens to sit right alongside the API itself. Even worse, how do I “sink” the ‘API-changes’ patch down so I can fold it into initial-API to produce final-API? (Apologies for displaying my stacks upside-down from my terminology!) A naive qfold will only work if the API-changes stuff is separate from all the consumer-* patches.

My manual solution is to start with the initial queue:

  initial-API
  consumer-1 (old API)
  consumer-2 (old API)
  unrelated
  consumer-3-plus-API-changes-and-consumer-1-and-2-updates-for-new-API

and then use qcrefresh to rip the API changes and their effects on consumers 1 & 2 back out, leaving:

  initial-API
  consumer-1 (old API)
  consumer-2 (old API)
  unrelated
  API-changes-and-consumer-1-and-2-updates-for-new-API
  (in working directory) consumer-3 (new API)

I qrename/qmv the current patch to ‘api-change’ and qnew ‘consumer-3’ (its original name), cursing about how my commit messages are now on the wrong patch. Now I have:

  initial-API
  consumer-1 (old API)
  consumer-2 (old API)
  unrelated
  api-change (API changes and consumer 1 and 2 updates for new API)
  consumer-3 (new API)

Now I know that ‘unrelated’ doesn’t touch any of the same files, so I can qgoto consumer-2 and qfold api-change safely, producing:

  initial-API
  consumer-1 (old API)
  consumer-2 (new API, but also with API change and consumer 1 updates)
  unrelated
  consumer-3 (new API)

I again qcrefresh, qmv, and qnew to pull out a reduced version of the api-change patch, giving:

  initial-API
  consumer-1 (old API)
  api-change (with API change and consumer 1 updates)
  consumer-2 (new API)
  unrelated
  consumer-3 (new API)

Repeat. I’m basically taking a combined patch and sinking it down towards its destination, carving off pieces to incorporate into patches as I pass them by. Now I have:

  initial-API
  api-change (with *only* the API change!)
  consumer-1 (new API)
  consumer-2 (new API)
  unrelated
  consumer-3 (new API)

and finally I can qfold api-change into initial-API, rename it to final-API, and have my desired result.
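
For the record, here is that sinking loop as a sketch. It only prints the hg commands rather than running them, since qcrefresh is interactive (it comes from the crecord extension) and picking the hunks is the part that needs a human. The helper name sink_step is made up for illustration; the patch names are the ones from the queues above.

```shell
# Sketch only: print the commands for sinking api-change past one patch.
# sink_step is a hypothetical helper, not part of mq or crecord.
sink_step() {
    patch_name=$1
    echo "hg qgoto $patch_name"     # pop down to the patch being passed
    echo "hg qfold api-change"      # fold the sinking patch into it
    echo "hg qcrefresh"             # interactively keep only the API hunks in the patch
    echo "hg qrename api-change"    # the current patch is now the reduced api-change
    echo "hg qnew $patch_name"      # leftover working-dir changes become the updated consumer
}

# Sink past consumer-2, then consumer-1, then fold into initial-API.
for p in consumer-2 consumer-1; do
    sink_step "$p"
done
echo "hg qgoto initial-API"
echo "hg qfold api-change"
echo "hg qrename final-API"
```

The very first step (carving consumer-3 off the hairball) is the same thing minus the qgoto and qfold, since the combined patch is already on top.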

What a pain in the ass! Though the qcrefresh/qmv/qnew step is a lot better than what I’ve been doing up until now. Without qcrefresh, it would be

 % hg qrefresh -X .
 % hg qcrecord api-change
 % hg qnew consumer-n
 % hg qpop
 % hg qpop
 % hg qpop
 % hg qpush --move api-change
 % hg qpush --move consumer-n
 % hg qfold old-consumer-n

which admittedly preserves the change message from old-consumer-n, which is an advantage over my qcrefresh version.
Or alternatively: fold all of the patches together, and qcrecord until you have your desired final result. In this particular case, the ‘unrelated’ patch was a whole series of patches, and they weren’t unrelated enough to just trivially reorder them out of the way.

Without qcrecord, this is intensely painful, and probably involves hand-editing patch files.

My dream workflow would be to have qfold do the legwork: first scan through all intervening patches and grab out the portions of the folded patch that only modify nonconflicting files. Then try to get clever and do the same thing for the portions of the conflicted files that are independent. (The cleverness isn’t strictly necessary, but I’ve found that I end up selecting the same portions of my sinking patch over and over again, which gets old.) Then sink the patch as far as it will go before hitting a still-conflicting file, and open up the crecord UI to pull out just the parts that belong to the patch being folded (aka sunk). Repeat this for every intervening conflicting patch until the patch has sunk to its destination, then fold it in. If things get too hairy, then at any point abort the operation, leaving behind a half-sunk patch sitting next to the unmodified patch it conflicted with. (Alternatively, undo the entire operation, but since I keep my mq repo revision-controlled, I don’t care all that much.)

I originally wanted something that would do 3-way merges instead of the crecord UI invocations, but merges really want to move you “forward” to the final result of merging separate patches/lines of development. Here, I want to go backwards to a patch that, if merged, would produce the result I already have. So merge(base,base+A,base+B) -> base+AB which is the same as base+BA. From that, I could infer a B’ such that base+A+B’ is my merged base+AB, but that doesn’t do me any good.

In my case, I have base+A+B and want B” and A” such that base+B”+A” == base+A+B.

To anyone who made it this far: is there already an easy way to go about this? Is there something wrong with my development style that I get into these sorts of situations? In my case, I had already landed ‘initial-API’; please don’t tell me that the answer is that I always have to get the API right in the first place. Does anyone else get into this mess? (I can’t say I’ve run into this all that often, but it’s happened more than once or twice.)

I suppose if I had landed consumers 1 and 2, I would’ve just had to modify their uses of the API afterwards. So I could do that here, too. But reviews could tangle things up pretty easily — if a reviewer of consumer 1 or 2 notices the API uglinesses that I fixed for consumer 3, then landing the earlier consumers becomes dependent on landing consumer 3, which sucks. But also, none of this is really ready to land, and I’d like to iterate the API in my queue for a while with all the different consumers as test users, *without* lumping everything together into one massive patch.


07
Oct 11

distcc, ccache, and bacon

This was initially a response to JGriffin’s GoFaster analysis post but grew out of control. Read that first.

Rampant speculation

tl;dr: hey, we could use ccache and distcc on our build system!

Just speculating (as usual), but…

The note about retiring slow slaves, combined with the performance gap between full and incremental builds, suggests something.

Why does additional hardware (the slow slaves) slow things down? Because load is unevenly distributed. Ignoring communication costs, the fastest way to build with a fast machine and a slow one that takes 2x longer would be to compile 2/3 of the files with the fast machine and 1/3 with the slow one. How? Remove all slow slaves from the build pool and convert them to distcc servers.
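
To make the 2/3 and 1/3 arithmetic concrete, here is a small sketch that splits a pile of files in proportion to machine speed. The function name and the 300-file count are made up; the speeds are relative (fast = 2, slow = 1), and communication costs are ignored just as in the paragraph above.

```shell
# Split TOTAL files among machines in proportion to their relative speeds.
# Usage: split_files TOTAL SPEED...   (integer arithmetic; remainders are dropped)
split_files() {
    total=$1
    shift
    speed_sum=0
    for s in "$@"; do
        speed_sum=$((speed_sum + s))
    done
    shares=""
    for s in "$@"; do
        # append each share, separated by single spaces
        shares="${shares:+$shares }$((total * s / speed_sum))"
    done
    echo "$shares"
}

split_files 300 2 1   # fast machine first, then the slow one
```

With 300 files this prints `200 100`: the fast machine takes 200 files and finishes in 100 time units, exactly when the slow one finishes its 100, versus 150 units if the fast machine did everything alone.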

What about the clobber builds? Well, if you’ve already built a particular file before with the same compiler and options, it would be nice to not have to build it again. That’s what ccache is for. But a ccache per slave means you have to have built the same thing on the same slave. For try builds (which is where most of the clobbers are), that’s not going to happen all the time.

But combine that with the above distcc idea: you could run ccache under distcc on the distcc servers. Now you have a ccache/distcc sandwich: local ccache first, then distcc, then remote ccache, then finally some bacon. Because everything’s better with bacon.

ts;wm: (too short; want more)

You know, in terms of data sources, the above picture is wrong. It’s really local ccache, then remote ccache (via distcc), then remote compile, and only then bacon. But the configuration-centric ccache/distcc/ccache description makes for better visuals. Or would if I put the bacon on the inside, anyway.
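
In configuration terms, the sandwich might look roughly like the sketch below. CCACHE_PREFIX is a real ccache setting that prefixes the actual compile command on a cache miss; everything else (the gcc toolchain, the /usr/lib/ccache directory of compiler wrappers, the subnet) is an assumption for illustration. The server-side line is left as a comment because it depends on how distccd gets launched on your hosts.

```shell
# Client side (a build slave): ccache looks in the local cache first and,
# on a miss, hands the real compile to distcc.
export CCACHE_PREFIX=distcc
export CC="ccache gcc"      # assumption: a gcc/g++ toolchain
export CXX="ccache g++"

# Server side (a distcc host), shown as comments only:
# start distccd with a PATH whose first entry holds ccache wrappers named
# gcc/g++, so incoming jobs hit the remote ccache before really compiling.
#   PATH=/usr/lib/ccache:$PATH distccd --daemon --allow 192.168.1.0/24
```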

Let’s walk through a clobber build. The stuff the local slave has built before gets pulled from local ccache. Some of the remaining stuff gets built locally. The rest gets sent over to various machines in the distcc pool. We can break those things down into 3 categories: (1) stuff that’s never been built anywhere, (2) stuff that’s been built on a different distcc host, and (3) stuff that’s been built on the same distcc host. #3 is a win. #1 is unavoidable; it’s the basic cost of doing business. (Actually, there’s another dimension, which is whether something has been built before on a non-distcc host. I’ll ignore that for now. Conceptually, you can make it go away by making every slave a distcc server.)

#2 is waste. But it’s less waste than we have now, if the distcc pool is smaller than the whole build pool, because you’re doing one redundant build per distcc host rather than one per builder. And it’s self-limiting: a distcc host that has a build cached returns it immediately, meaning it’s more likely to get stuck with something it needs to build, which sucks but at least it populates its ccache so it won’t have to do it again.

Now, I am assuming here that compile costs are greater than communication + ccache lookup costs, which is an insanely flawed assumption. But it’s very very true for my personal builds — I have my own distcc server, and my clobber builds (actually, *all* my builds) feel way way faster when I’m using it. So I don’t think the question is so much “would this work?” as it is “what would we need to do to make this work?”

For starters, do no harm: it would be great if we could partition the network so that distcc servers are separate from the current communication channels. Every build host would sit on two VLANs, say: the regular one and the distcc one. That would reduce chances of infrastructure meltdown through excessive distcc traffic. (I am not a network engineer, nor do I play one on TV, and this may require separate physical networks and possibly Pringles cans.)

On a related note, it might be wise to start out by restricting the slaves from doing too many distcc jobs at a time, to prevent the distcc jobs from getting bogged down through congestion. I do this for my own builds through a ~/.distcc/hosts file containing: “localhost/4 192.168.1.99/7”. That means you can use -j666, and it’ll still only do 4 jobs on localhost and 7 jobs on 192.168.1.99 simultaneously. (Actually, that’s my home ~/.distcc/hosts file. My server at work is beefier, and there I allow the remote to do 12 jobs at once. I have a cron job that checks every 5 minutes to see what network I’m on and sets a ~/.distcc/hosts symlink accordingly. But I digress.)
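
The hosts-file switching can be sketched like this. The host/limit syntax is the one quoted above; the file names hosts.home and hosts.work, the function name, and the idea of keying off the 192.168.1.* address range are all assumptions, since how you detect the current network depends on your setup.

```shell
# Point DIR/hosts at hosts.home or hosts.work depending on the local IP.
# Usage: switch_distcc_hosts DIR IP
switch_distcc_hosts() {
    dir=$1
    ip=$2
    case "$ip" in
        192.168.1.*) target=hosts.home ;;   # assumption: the home LAN
        *)           target=hosts.work ;;
    esac
    ln -sf "$target" "$dir/hosts"   # relative symlink inside DIR
    echo "$target"
}

# Example hosts.home content (per-host job limits):
#   localhost/4 192.168.1.99/7
# Example crontab entry, re-checking every 5 minutes; switch-hosts.sh is a
# hypothetical script that works out the IP and calls the function above:
#   */5 * * * * /path/to/switch-hosts.sh
```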

More worrying is the reason behind all that clobbering. If a slave turns to the dark side, runs amok, gets hit by a cosmic ray, or is just having a bad day, do we really want to use its ccached builds? More to the point, when something goes wrong, what do we need to clobber? Right now everything is local to a slave, so it’s straightforward to pull a slave from the pool, take it out behind the garage, and beat the crap out of it with a stick. With distcc and ccache, it’s harder to tell which server to blame.

Still, how often does this happen? (I have no idea. I’m just a troublemaking developer, dammit.) We can always wipe the ccache on the whole distcc pool. It’d be nice to be able to track problems to their source, though. Maybe we could use the distcc pool redundancy to our advantage: have them cross-check the checksums of their builds with each other. Same input, same output. But that’s even more speculative.
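
The cross-check could be as simple as comparing checksums of the object files that two hosts produce for the same input. A sketch, with a made-up function name, using cksum only because it is available everywhere (a real setup would presumably want a stronger hash, and builds that embed timestamps would need extra care):

```shell
# Succeed if two build outputs have identical checksum and size.
# Usage: same_build FILE_A FILE_B
same_build() {
    sum_a=$(cksum < "$1")   # reading from stdin keeps the file name out of the output
    sum_b=$(cksum < "$2")
    [ "$sum_a" = "$sum_b" ]
}

# Example: flag a host whose output disagrees (the .o paths are hypothetical).
#   same_build hostA/foo.o hostB/foo.o || echo "mismatch: foo.o"
```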

It’s not all bad, though — I’m guessing that most clobbers result from the build system not being able to handle various types of change. If the ccache/distcc/ccache sandwich makes clobbers substantially cheaper, we can be a lot freer with them. Someone accidentally cancelled an m-c build partway through? Clobber the world! Let’s make bacon!!

wtf;yai;bdb: (what the f#@; you’re an idiot; been done before)

Reality check
  • We use local ccache already – see bug 488412
  • distcc has been proposed a number of times, but for the life of me I cannot find the bug. There are most likely some very valid reasons not to use it. Such as making a complete interdependent hairball out of our build system where one machine can kill everything.
  • Given the results in bug 488412, it’s very plausible that remote ccaches would be of no benefit or a net loss. (Though those numbers were using NFS to retrieve remote ccache results, and I deeply distrust NFS.)

Screw Reality. What has it ever done for me?

Hey, if we really needed to conceal network latency and redundant rebuilds across different hosts, we could stream out ccache results before they were even needed! But that’s crazy talk.


21
Sep 11

JS Probes

Have you ever had your browser mysteriously stall periodically and wondered “what the f#@$! is it doing?!!” Or perhaps you’re working on something, say the garbage collector, and you’d like to see what effect your changes are having. Or maybe even write a little analysis that postprocesses some sort of trace of what is going on, and figures out what the optimal pattern of actions would be. (“If I’d thrown this big chunk of data out of the cache here, then I would’ve had room for all of these little things that got evicted instead, and would have had way fewer misses…”)

The usual way to do things like this is to manually add some instrumentation code (probably just logging a bunch of events) and postprocess the results. This works fine, but it has a few drawbacks: (1) you have to figure out where to insert your instrumentation, often in unfamiliar code; (2) you’ll need to recompile, possibly several times; (3) the logs can get very large very quickly; and (4) you’ll probably end up writing a very special-purpose postprocessor that (5) dumps stuff to a text file that only you know how to interpret, and even you will only remember what it all means for a week or two. The next time you need to do something similar, you’ll find that all of your instrumentation code is severely bitrotted and misses some paths that have been added in the meantime, so you’ll start everything over from scratch.

Well, tough luck. Sometimes those are just facts of life and you’ll need to suck it up. Quit whining, dammit.

But many times, the events of interest (or more precisely, “probe points”) are of general interest. If you can manage to slip them into the code and so get other developers to maintain them for you as they make changes, then everyone can rely on those probes being in roughly the right place permanently. That’s #1 above, and depending on how they’re implemented there’s a good chance you won’t even need to recompile, so that’s #2.

I’ve done an implementation of these sorts of probes in the SpiderMonkey Javascript engine. There are probe points like “a GC is starting (and it’s local to one compartment)”, “the heap has been resized”, and “javascript function F is being called/is returning.” Some of these are straightforward to place into the code — the start of a GC wasn’t hard to figure out, for example. Some weren’t so straightforward, such as JS function calls (they might seem simple, but what if you’re running JITted? Which JIT? Are you still running JITted by the time you return from the function?) I’ve delivered the probe information to various backends — anything from Windows’ ETW (blog post forthcoming whenever I manage to implement the start/stop functionality), to dtrace/systemtap (another blog post, probably coming sooner since I recently scraped together a demo), to a simple callback mechanism (see JS_SetFunctionCallback on MDN) and other special-purpose ones that only care about a small subset of probes.

#3 (log it all vs online handling) ventures into religious territory. It is easiest to mindlessly log everything of interest and postprocess it. But what if you want realtime updates? Or if you want to track different information depending on what you learn from other probe points? Or what if the volume of your log writing interferes with whatever you’re trying to measure (eg disk I/O)? Or maybe you need to track some sort of state in order to give the probes meaning. (GC when idle => good. Avoidable GC when the user is waiting => bad.)

Those arguments are what led to the creation of tools like DTrace and Systemtap. Both give you a scripting environment that can aggregate information from probes as they fire, control exactly what information gets tracked as things are happening, and can be attached/detached at any time. They’re pretty cool, and invaluable once you get familiar with them. They’re also extremely system-dependent and generally require root access or special builds or kernel debuginfo or something, which ends up meaning that you often can’t just hand off analysis scripts to other people and have those people get some use out of them. And even you may not be able to take them to another environment.

Still, they deal pretty well with #4 (avoiding one-use, special-purpose processors), at least for environments matching the one they were written for. And if they can draw from statically-inserted probe points (the type I was talking about above), they can actually be pretty general. #5 is still a killer, though — at least the way I write systemtap scripts, they all end up with idiosyncratic ways of dumping out the results of some particular analysis, and nobody else is going to get much enlightenment without studying the script for a while first.

What if we could do better? What if we could insert these static probes, but rather than feeding the information to some niche tool that is usable by only a handful of people, we make the data available to a plain old Firefox addon? You could collect, aggregate, summarize, mutilate, fold, spindle, or crush the data directly in JS code. Then we could let addon authors go crazy with visualizations and analysis libraries. That’d be cool, right?

Graph GC behavior. Warn the user when slow or suspicious stuff is happening. Figure out what’s going on during long event handlers. Graph the percentage of time spent in different subsystems. Correlate performance/trace data with user-meaningful actions. Make a flight-recording of various metrics and let the user walk through history. Your ideas here.

Ok, so I tricked you. I’m not going to tell you how to do any of that. This blog post is a tease, an advertisement for the work that Brian Burg did this summer during his Mozilla internship. If you’re interested, he’ll be giving his internship final presentation tomorrow (today when you’re reading this, or perhaps yesterday or last month for those of you who have fallen behind on your Planet reading.) That’s 1:30PM PDT on Thursday, September 22 at the Mountain View Mozilla headquarters, and I’m 97.2% sure it will be broadcast over Air Mozilla as well. And taped, I think? (Sadly, I can’t find where those are archived. Somebody please tell me and I’ll update this post.) There will be a demo. With pretty pictures! And he’ll be writing it up on his own blog Real Soon Now. I’m not going to say any more for now — I’d get it wrong anyway.

Update: Argh! I got the date wrong! It’s not Wednesday, September 21 as I originally wrote. It’s today, Thursday, September 22. Sorry for the confusion!