scratchpad made me happy

March 5th, 2014

I love the Firefox devtools command line. I cut & paste all kinds of crazy code into it. On the other hand, I never did really quite “get” the Scratchpad, though to be honest I also haven’t tried using it much. I’ve been happy just to edit code in an Emacs buffer on the side and cut & paste.

The Problem

But last night, I ran into a problem that Scratchpad turned out to be perfect for. And now I loveses it. My precioussssss….

I am the proud survivor — er, I mean “father” — of two kids, one of whom goes to a school with lots of required “volunteer” time. Since it is required, you have to record your hours via a web-based tool. It’s a fairly primitive interface on some ancient backend ASP monstrosity. It’s tolerable to use to record one entry at a time — you just need to enter a date, a start hour, a start minute, an end hour, an end minute, three options selected from dropdown lists of several dozen items (enough to require scrolling), etc.

Ok, it’s pretty awful even for entering one record.

But entering 80 of the things, most of them differing only by the date, is intolerable. Especially since the d#@n form resets itself completely every time you submit it. And so automation rage kicked in.

Existing Solution

Last year, I did it by capturing the POST request and writing a script to resubmit with different values. It worked, kinda, though I could only get a couple of fields working and it kept timing out my authentication cookie. Or something. I just remember it being a major pain. Even capturing the full request was a little difficult since it’s HTTPS only and I seem to remember some limitation in the Firefox devtools of the time when trying to see the POST body data. (Again, “or something”.)

The Latest Hotness

This year, I was overjoyed to see the option to edit and resubmit a query. I’ve wanted that so many times. And yet… the data was still x-www-form-urlencoded, which means I had to cut & paste from the devtools pane, which is already a challenge due to Linux/xorg/xfce/emacs’s mishandling of cut buffers or clipboards or whatever the heck they are. And then find the field I cared about and update it, and then discover that it overwrote my previous entry because there was some embedded token in one of the other fields that referred to the entry it was creating. Ugh. (Dim memories resurfaced at about this point from when I needed to get around this last year. I still don’t remember the details.)

So then I thought, “hey, I’ll just update the page in place and then click submit! I’ll do it at the HTML level instead of the HTTP level!” So I wrote up some JS code to find and set the various form fields, and clicked submit. Success!

Only it’s still a painful flow. I have to edit the relevant field in emacs (or eventually, I’d probably generate the JS scripts with a shell or Perl or Python script), cut & paste into the little tiny console line (it’s a console prompt, it’s supposed to be small, I have no issue with that.) (Though maybe pasting into the console itself should send it to the prompt? I dunno.), press enter, then click on submit. Not too bad, and definitely well within the “tolerable” zone.

But it’d be easier if I could just define a JS function that finds and fills the fields, and pass in the one value I need to change. Then I can just enter that at the console. Only where can I stash the function? If I put it on the page, I assume it’ll get nuked whenever I submit the page. Hey, I wonder if that Scratchpad thing might help here…

Enter the Scratchpad. Oh yeah.

So I pasted my little script into the Scratchpad. It defines a function to fill out the fields. Final flow: edit one line of JS to change a date, press Ctrl-R to run it. The fields magically update, I click submit.

Obviously, I could’ve done the submit from the script while I was at it, but I like to set things up and fire them off in separate steps. Call it paranoia. I do the same thing with shell scripts — I’ll write a script to echo out a series of commands to perform, run it once to verify that it’s what I want, then run it again piped through bash. I’m just too clumsy to get it right the first time.

But anyway, Scratchpad was the awesome for this task. I’ll be considering it whenever thinking about how to do other things now.

Doers of Good

Thank you robcee and #devtools team. You made my life gooder. More goodish. I am now living goodlier.

My Example

The script I used, if you’re curious:

function enter(date) {
    // Named lookup below works because the form controls have matching name attributes.
    var inputs = document.forms[0].getElementsByTagName("input");
    var selects = document.forms[0].getElementsByTagName("select");
    var texts = document.forms[0].getElementsByTagName("textarea");
    inputs["VolunteerDate"].value = date;
    inputs["FromHour"].value = 8;
    inputs["FromMin"].value = 45;
    inputs["ToHour"].value = 9;
    inputs["ToMin"].value = 45;
    inputs["nohour"].value = 1;
    selects["MinCombo"].value = 0;
    selects["classcombo"].value = 40;
    selects["activitycombo"].value = 88;
    selects["VolunteerCombo"].value = "Steve Fink";
    texts[0].innerHTML = "Unit study center";
}

enter("2/27/2014"); // I edit this line and bounce on the Ctrl-R key, then click submit

Browser Wars, the game

February 14th, 2013

A monoculture is usually better in the short term. It’s a better allocation of resources (everyone working on the same thing!). If you want to write a rich web app that works today (ie, on the browsers of today), it’s much better.

But the web is a platform. Platforms are different beasts.

Imagine it’s an all-WebKit mobile web. Just follow the incentives to figure out what will happen.

Backwards bug compatibility: There’s a bug — background SVG images with a prime-numbered width disable transparency. A year later, 7328 web sites have popped up that inadvertently depend on the bug. Somebody fixes it. The websites break with dev builds. The fix is backed out, and a warning is logged instead. Nothing breaks, the world’s webkit, nobody cares. The bug is now part of the Web Platform.

Preventing innovation: a gang of hackers makes a new browser that utilizes the 100 cores in 2018-era laptops perfectly evenly, unlike existing browsers that mostly burn one CPU per tab. It’s a ground-up rewrite, and they do heroic work to support 99% of the websites out there. Make that 98%; webkit just shipped a new feature and everybody immediately started using it in production websites (why not?). Whoops, down to 90%; there was a webkit bug that was too gross to work around and would break the threading model. Wtf? 80%? What just happened? Ship it, quick, before it drops more!

The group of hackers gives up and starts a job board/social network site for pet birds, specializing in security exploit developers. They call it “Polly Want a Cracker?”

Inappropriate control: Someone comes up with a synchronization API that allows writing DJ apps that mix multiple remote streams. Apple’s music studio partners freak out, prevent it from landing, and send bogus threatening letters to anyone who adds it into their fork.

Complexity: the standards bodies wither and die from lack of purpose. New features are fine as long as they add a useful new capability. A thousand flowers bloom, some of them right on top of each other. Different web sites use different ones. Some of them are hard to maintain, so only survive if they are depended upon by a company with deep enough pockets. Web developers start playing a guessing game of which feature can be depended upon in the future based on the market cap of the current users.

Confusion: There’s a little quirk in how you have to write your CSS selectors. It’s documented in a ton of tutorials, though, and it’s in the regression test suite. Oh, and if you use it together with the ‘~’ operator, the first clause only applies to elements with classes assigned. You could look it up in the spec, but it hasn’t been updated for a few years because everybody just tries things out to see what works anyway, and the guys who update the spec are working on CSS5 right now. Anyway, documentation is for people who can’t watch tutorials on youtube.

End game: the web is now far more capable than it was way back in 2013. It perfectly supports the features of the Apple hardware released just yesterday! (Better upgrade those ancient ‘pads from last year, though.) There are a dozen ways to do anything you can think of. Some of them even work. On some webkit-based browsers. For now. It’s a little hard to tell what, because even if something doesn’t behave like you expect, the spec doesn’t really go into that much detail and the implementation isn’t guaranteed to match it anyway. You know, the native APIs are fairly well documented and forward-compatible, and it’s not really that hard to rewrite your app a few times, once for each native platform…

Does this have to happen just because everybody standardizes on WebKit? No, no more than it has to happen because we all use silicon or TCP. If something is stable, a monoculture is fine. Well, for the most part — even TCP is showing some cracks. The above concerns only apply to a layer that has multiple viable alternatives, is rapidly advancing, needs to cover unexpected new ground and get used for unpredicted applications, requires multiple disconnected agents to coordinate, and things like that.

What’s your random seed?

April 18th, 2012

Greg Egan is awesome

I’m going back and re-reading Luminous, one of his collections of short stories. I just read the story Transition Dreams, which kinda creeped me out. Partly because I buy into the whole notion that our brains are digitizable — as in, there’s nothing fundamentally unrepresentable about our minds. There’s probably a fancy philosophy term for this, with some dead white guy’s name attached to it (because only a dozen people had thought of it before him and he talked the loudest).

Once you’re willing to accept accurate-enough digitization, the ramifications get pretty crazy. And spooky. I can come up with some, but Egan takes it way farther, and Transition Dreams is a good illustration. But I won’t spoil the story. (By the way, most of Egan’s books are out of print or rare enough to be expensive, but Terrence tells me that they’re all easily available on Kindle. Oddly, although I would be happy to transition my mental workings from meat to bits, I’m still dragging my heels on transitioning my reading from dead trees to bits.)

Transition and Free Will

Now, let’s assume that you’ve converted your brain to live inside a computer (or network of computers, or encoded into the flickers of light on a precisely muddy puddle of water, it really doesn’t matter.) So your thinking is being simulated by all these crazy cascades of computation (only it’s not simulated; it’s the real thing, but that’s irrelevant here.) Your mind is getting a stream of external sensor input, it’s chewing on that and modifying its state, and you’re just… well, being you.

Now, where is free will in this picture? Assuming free will exists in the first place, I mean, and that it existing and not existing are distinguishable. If you start in a particular, fully-described state, and you receive the exact same inputs, will you always behave in exactly the same way? You could build the mind hosting computer either way, you know, and the hosted minds wouldn’t normally be able to tell the difference. But they could tell the difference if they recorded all of their sensory inputs (which is fairly plausible, actually), because they could make a clone of themselves back at the previous state and replay all their sensory input and see if they made the same decisions. (Actually, it’s easier than that; if the reproduction was accurate, they should end up bit-for-bit identical.)

I don’t know about you, but I’d rather not be fully predictable. I don’t want somebody to copy me and my sensor logs, and then when I’m off hanging out in the Gigahertz Ghetto (read: my brain is being hosted on a slow computer), they could try out various different inputs on faster computers to see how “I” reacted and know for 100% certainty how to achieve some particular reaction.

Well, ok, my time in the GHzGhetto might change me enough to make the predictions wrong, so you’d really have to do this while I was fully suspended. Maybe the shipping company that suspends my brain while they shoot me off to a faster hosting facility in a tight orbit around the Sun (those faster computers need the additional solar energy, y’know) is also selling copies on the side to advertisers who want to figure out exactly what ads they can expose me to upon reawakening to achieve a 100% clickthrough rate. Truly, truly targeted advertising.

So, anyway, I’m going to insist on always having access to a strong source of random numbers, and I’ll call that my free will. You can record the output of that random number generator, but that’ll only enable you to accurately reproduce my past, not my future.

The Pain and Joy of Determinism

Or will I? What if that hosting facility gets knocked out by a solar flare? Do I really want to start over from a backup? If it streams out the log of sensor data to a safer location, then it’d be pretty cool to be able to replay as much of the log as still exists, and recover almost all of myself. I’d rather mourn a lost day than a lost decade. But that requires not using an unpredictable random number generator as an input.

So what about a pseudo-random number generator? If it’s a high quality one, then as long as nobody else can access the seed, it’s just as good. But that gives the seed incredible importance. It’s not “you”, it’s just a simple number, but in a way it allows substantial control over you, so it’s private in a more fundamental way than anything we’ve seen before. Who would you trust it to? Not yourself, certainly, since you’ll be copied from computer to computer all the time and each transfer is an opportunity for identity theft. What about your spouse? Or maybe just a secure service that will only release it for authorized replays of your brain?

Without that seed (or those timestamped seeds?), you can never go back. Well, you can go back to your snapshots, but you can’t accurately go forward from there to arbitrary points in time. Admittedly, that’s not necessary for some uses — if you want to know why you did something, you can go back to a snapshot and replay with a different seed. If you do something different, it was a choice made of your own free will. You could use it in court cases, even. If you get the same result, well, it’s trickier, because you might make the same choice for 90% of the possible random seeds or something. “Proof beyond a reasonable confidence interval?” Heh.

bzexport changes released

April 13th, 2012

bzexport --new and hg newbug have landed

My bzexport changes adding a --new flag and an hg newbug command have landed. Ok, they landed months ago. See my previous blog post for details; all of the commands and options described there are still valid in the current version. But please pull from the official repo instead of my testing repo given in the earlier blog post.

Installing bzexport

mkdir -p ~/hg-extensions
cd ~/hg-extensions
hg clone http://hg.mozilla.org/users/tmielczarek_mozilla.com/bzexport

in the [extensions] section of your ~/.hgrc, add:
bzexport = ~/hg-extensions/bzexport/bzexport.py

Note to Windows users: unfortunately, I think the python packaged with MozillaBuild is missing the json.py package that bzexport needs. I think it still works if you use a system Python with json.py installed, but I’m not sure.
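With the extension loaded, the two additions this post is about are the --new flag and the newbug command. Roughly, and leaving out all the optional arguments (hg help bzexport and hg help newbug have the full lists):

  hg bzexport --new   # attach the current patch to a freshly filed bug
  hg newbug           # just file a new bug, no attachment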

Trying it out

For the (understandably) nervous users out there, I’d like you to give it a try and I’ve made it safe to do so. Here are the levels of paranoia available: Read the rest of this entry »

Only pay for the entropy you use

February 22nd, 2012

Log Files Are Boring

Just an idea, based on hearing that build log transfers seem to consume large amounts of bandwidth. (Note that for all I know, this is already being done.)

Logs are pretty dull. In particular, two consecutive log files are usually quite similar. It’d be nice if we could take advantage of this redundancy to reduce the bandwidth/time consumed by log transfers.

rsync likes boring data

The natural thing that springs to mind is rsync. I grabbed two log files that are probably more similar to each other than is really fair, but they shouldn’t be horribly unrepresentative. rsyncing one to the other found them to share 32% of their data, based on the |rsync --stats| output lines labeled “Matched data” and “Literal data”, for a speedup of 1.46x.

I suspected that rsync’s default block size is too large, and so most of the commonalities are not found. So I tried setting the block size ridiculously low, to 8 bytes, and it found them to be 98% similar. Which is silly, because it has to retrieve more block hashes at that block size than it saves. The total “speedup” is reported as 0.72x.

But the sweet spot in the middle, with a block size of 192, gives 84% similarity for a speedup of 4.73x.
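If you want to reproduce that sort of number yourself, note that rsync skips its delta algorithm entirely for local copies unless you force it. A sketch, using the same made-up filenames as the proposal below:

  cp log_compare.txt /tmp/dest.txt
  rsync --no-whole-file --stats -B 192 log123.txt /tmp/dest.txt

The --stats output includes the “Matched data” and “Literal data” counts plus a final “speedup is …” line like the ones quoted above.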

compression likes boring data too

Take a step back: this only applies to uncompressed files. Simply gzipping the log file before transmitting it gives us a speedup of 14.5x. Oops!

Well, rsync can compress the stuff it sends around too. Adding a -z flag with block size 192 gives a speedup of 16.2x. Hey, we beat basic gzip!

But compression needs decent chunks to work with, so the sweet spot may be different. I tried various block sizes, and managed a speedup of 24.3x with -B 960. An additional 1.7x speedup over simple compression is pretty decent!

To summarize our story so far, let’s say you want to copy over a log file named log123.txt. The proposal is:

  1. Have a vaguely recent benchmark log file, call it log_compare.txt, available on all senders and receivers. (Actually, it’d probably be a different one per build configuration, but whatever.)
  2. On the server, hard link log123.txt to log_compare.txt.
  3. From the client, rsync -z -B 960 log123.txt server:log123.txt

stop repeating what I say!

But it still feels like there ought to be something better. The benchmark log file is re-hashed every time you do this and the hashes are sent back over the wire, costing bandwidth. So let’s eliminate that part. Note that we’ll drop the -z flag, because we may as well compress the data during the transfer instead:

 ssh server 'ln log_compare.txt log123.txt'
 rsync -B 960 log123.txt log_compare.txt --only-write-batch=batch.dat
 ssh -C server 'rsync --read-batch=- argleblargle log123.txt' < batch.dat

Note that “argleblargle” is ignored, since the source file isn’t needed.

So what’s the speedup now? Let’s only consider the bytes transmitted over the network. Assuming the compression from ssh -C has the same effect as gzipping the file locally, I get a speedup of 28.9x, about 2x the speedup of simply compressing the log file in the first place.

But wait. The block size of 960 was based on the cost of retrieving all those hashes from the remote side. We’re not doing that anymore, so a smaller block size should again be more effective. Let’s see… -B 192 gets a total speedup of 139x, which is almost exactly one order of magnitude faster than plain gzipped log files. Now we’re talking!

loose ends

Two things still bug me. One is a minor detail — the above is writing out batch.dat, then reading it back in to send over to the server. This uselessly consumes disk bandwidth. It would be better if rsync could directly read/write compressed batch files to stdin/stdout. (It can read uncompressed batches from stdin, but not write to stdout. You could probably hack it somehow, perhaps with /proc/pidN/fd/…, but it’s not a big deal. And you can just use /dev/shm/batch.dat for your temporary filename, and remove it right after. It’d still be better if it never had to exist uncompressed anywhere, but whatever.)
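Spelled out, that tmpfs workaround is just the same three commands with the batch file pointed at /dev/shm (and block size 192, as above):

 ssh server 'ln log_compare.txt log123.txt'
 rsync -B 192 log123.txt log_compare.txt --only-write-batch=/dev/shm/batch.dat
 ssh -C server 'rsync --read-batch=- argleblargle log123.txt' < /dev/shm/batch.dat
 rm /dev/shm/batch.dat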

The other is that we’re still checksumming that benchmark file locally for every log file we transfer. It doesn’t change the number of bytes spewed over the network, but it slows down the overall procedure. I wonder if librsync would allow avoiding that somehow…? (I think rsync uses two checksums, a fast rolling checksum and a slower precise one, so you’d need to compute both for all offsets. And reading those in would probably cost more than recomputing from the original file. But I haven’t thought too hard about this part.)

not just emacs and debuggers

I sent this writeup to Jim Blandy, who in a typically insightful fashion noticed that (1) this requires some fiddly bookkeeping to ensure that you have a comparison file, and (2) revision control systems already handle all of this. If you have one version of a file checked in and then you check in a modified version of it, the VCS can compute a delta to save storage costs. Then when you transmit the new revision to a remote repository, the VCS will know if the remote already has the baseline revision so it can just send the delta.

Or in other words, you could accomplish all of this by simply checking your log files into a suitable VCS and pushing them to the server. That’s not to say that you’re guaranteed that your VCS will be able to fully optimize this case, just that it’s possible for it to do the “right” thing.

I attempted to try this out with git, but I don’t know enough about how git does things. I checked in my baseline log file, then updated it with the new log file’s contents, then ran git repack to make a pack file containing both. I was hoping to use the increase in size from the original object file to the pack file as an estimate of the incremental cost of the new log file, but the pack file was *smaller* than either original object file. If I make a pack with just the baseline, then I end up with two pack files, but the new one is still smaller.
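If anyone knows git better than I do: I suspect the number I was looking for is available from git verify-pack -v, which reports each object’s compressed size inside a pack and whether it was stored as a delta against another object. An untested sketch, assuming the log was checked in as log123.txt:

  git repack -adf     # everything into a single pack, recomputing deltas
  git verify-pack -v .git/objects/pack/pack-*.idx | grep "$(git rev-parse HEAD:log123.txt)"

The size-in-pack column for the new blob would then be the incremental cost of the new log file.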

clients could play too

As a final thought, this idea is not fundamentally restricted to the server. You could do the same thing inside eg tbpl: keep the baseline log(s) in localStorage or IndexedDB. When requesting a log, add a parameter ?I_have_baseline_36fe137a1192. Then, at the server’s discretion, it could compute a delta from that baseline and send it over as a series of “insert this literal data, then copy bytes 3871..17313 from your baseline, then…”. tbpl would reconstruct the resulting log file, the unicorns would do their lewd tap dance, and everyone would profit.

Disagree

February 6th, 2012

I’ve read Paul Graham’s “How To Disagree” essay, and I have to say, I disagree. There are some good ideas in there, but it’s clearly the work of a pretentious has-been.

Read the rest of this entry »

Scenario 1: you have a patch to some bug sitting in your mercurial queue. You want to attach it to a bug, but the bugzilla interface is painful and annoying. What do you do?

Use bzexport. It’s great! You can even request review at the same time.

What I really like about bzexport is that while writing and testing a patch, I’m in an editor and the command line. I may not even have a browser running, if I’m constantly re-starting it to test something out. Needing to go to the bugzilla web UI interrupts my flow. With bzexport, I can stay in the shell and move onto something else immediately.

Scenario 2: You have a patch, but haven’t filed a bug yet. Neither has anybody else. But your patch has a pretty good description of what the bug is. (This is common, especially for small things.) Do you really have to go through the obnoxious bug-filing procedure? It sure is tempting just to roll this fix up into some other vaguely related bug, isn’t it? Surely there’s a simple way to do things the right way without bouncing between interfaces?

Well, you’re screwed. Unless you’re willing to test something out for me. If not, please stop reading.
Read the rest of this entry »

patch queue dependencies

January 5th, 2012

A little while back, I was again contemplating a tangled patch queue, considering how to rework it for landing. I thought it’d be nice to see at a very basic level which patches in the queue were going to be problematic, and which I could freely reorder at whim.

So I whipped together a silly little script to do that at a file level only. Example output:

% patchdeps
Note: This is based on filename collisions only, so may overreport conflicts
if patches touch different parts of the same file. (TODO)
                                                                          
A bug-663281-deque                   X   *       *     *   * *     *      
A bug-663281-deque-test              |   :       :     :   : *     :      
A bug-642054-func-setline          X |   *       :     :   : :     :      
A bug-642054-js_MapPCToLineNumber--' |   *       :     :   : :     :      
A bug-642054-rwreentrant             |   : X     :     :   : :     :      
A algorithm--------------------------'   X |     *     *   * *     *      
A system-libunwind                     X | |     :   * : * : *   * :      
A try-libunwind------------------------' | |     :   X : * : *   * :      
A backtrace------------------------------' | X * * * | * : * * * : * * * *
U shell-backtrace                          | | : * : | : : : : : : : : : :
U M-reentr---------------------------------' | : : : | : : : : : : : : : :
U M-backtrace--------------------------------' X : : | : : : : : : : * : :
U activities-----------------------------------' X : | : : : : * * : X * *
U profiler---------------------------------------' X | * : * * X * * | * *
U bug-675096-valgrind-jit--------------------------' | * : * : | : : | : :
U bug-599499-opagent-config--------------------------' X * : * | * : | : :
U bug-599499-opagent-----------------------------------' X X * | : * | : :
U bug-642320-gdb-jit-config------------------------------' | * | * : | : :
U bug-642320-gdb-jit---------------------------------------' X | : * | : :
U import-libunwind                                           | | : : | : :
U libunwind-config-------------------------------------------' | X X | : :
U warnings-fixes-----------------------------------------------' | | | : *
U bug-696965-cfi-autocheck---------------------------------------' | | X :
U mystery-librt-stuff----------------------------------------------' | | :
U bug-637393-eval-lifetime                                           | | :
U register-dwarf-----------------------------------------------------' | :
U bug-652535-JM__JIT_code_performance_counters-------------------------' X
U JSOP_RUNMODE-----------------------------------------------------------'

How to read it: patches that have no conflicts earlier in the stack are shown without a line next to them. They’re free spirits; you can “sink” them anywhere earlier in your queue without getting conflicts. (The script removes their lines to make the grid take up less horizontal space.)

Any other patch gets a horizontal line that then bends up to show the interference pattern with earlier patches. All in all, you have a complete interference matrix showing whether the set of files touched by any patch intersects the set of files for any other patch.

‘X’ marks the first conflict. After that, the marker turns to ‘*’ and the vertical lines get broken. (That’s just because it’s mostly the first one that matters when you’re munging your queue.)

So the patch named “backtrace” conflicts with the earlier “algorithm” patch, as well as the even earlier “bug-642054-js_MapPCToLineNumber” and others. The “M-reentr” patch only touches the same stuff as “bug-642054-rwreentrant” (not surprising, since “M-…” is my notation for a patch that needs to be folded into an earlier patch.) “system-libunwind” doesn’t conflict with anything earlier in the queue, and so can be freely reordered in the series file to anywhere earlier than where it is now — but note that several later patches touch the same stuff as it does. (It happens to be a patch to js/src/configure.in.)
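For the record, the check is exactly what the header says: intersect the sets of filenames each patch touches. You can approximate it for any two patches straight from the shell (patch names from the listing above; .hg/patches/ is the default mq patch directory):

  comm -12 <(grep '^+++ ' .hg/patches/backtrace | awk '{print $2}' | sort -u) \
           <(grep '^+++ ' .hg/patches/algorithm | awk '{print $2}' | sort -u)

Any output is a file touched by both patches.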

Useful? Not very. But it was kinda fun to write and I find myself running it occasionally just to see what it shows, so I feel the entertainment value was worth the small investment of time. Though now I’m tempted to enhance it by checking for collisions in line ranges, not just in the files…

I suppose I could make a mercurial extension out of it, but that’d require porting it from Perl to Python, which is more trouble than it’s worth. (Yes, I still use Perl as my preferred language for whipping things together. Even though I dislike the syntax for nested data structures, I very much like the feature set, and it’s still the best language I’ve found for these sorts of things. So phbbbttt!)

hg adventure

December 16th, 2011

Inspired by some silliness on #developers:

<jgilbert>	well that was an hg adventure
<dholbert>	$ hg adventure
You are in a twisty maze of passageways, all alike...
<cpeterson>	$ hg look
It is pitch black. You are likely to be eaten by a grue.
<hub>		$ hg doctor
How can I help you?

I thought I’d stick to actual hg commands, and came up with:

You see a small hole leading to a dark passageway.
820:21d40b86ae37$ echo "enter passageway" > action
820:21d40b86ae37$ hg commit
It is pitch black. You are likely to be eaten by a grue.
821:0121fb347e18$ echo "look" > action
821:0121fb347e18$ hg commit
** You have been eaten by a grue **
822:b09217a7bbc1$ hg backout 822
It is pitch black. You are likely to be eaten by a grue.
821:0121fb347e18$ hg backout 821
You see a small hole leading to a dark passageway.
820:21d40b86ae37$ echo "turn on flashlight" > action
820:21d40b86ae37$ hg commit
Your flashlight is now on.
824:44a4e4bf5f0e$ hg merge 821
Your light reveals a forking passageway leading north and south.

Kinda makes you think, huh? Time reversal games became popular semi-recently (eg Braid). Maybe the fad is over now; I’m *way* out of date.

But did any of them allow you to branch and merge? Push and pull from your friends’ distributed repos? Bisect to find the point where you unknowingly did something that prevented ever winning the game and either continue from there, merge a backout of that action, or create a new branch by splicing that action out?

It’s a whole new genre! It’ll be… um… fun.

(I’ll go back to work now)

Patch reordering

November 3rd, 2011

I have a patch queue that looks roughly like:

  initial-API
  consumer-1
  consumer-2
  unrelated
  consumer-3-plus-API-changes-and-consumer-1-and-2-updates-for-new-API

(So my base repo has a patch ‘initial-API’ applied to it, followed by a patch ‘consumer-1’, etc.)

The idea is that I am working on a new API of some sort, and have a couple of independent consumers of that API. The first two are “done”, but when working on the 3rd, I realize that I need to make changes to or clean up the API that they’re all using. So I hack away, and end up with a patch that contains both consumer 3 plus some API changes, and to get it to compile I also update consumers 1 and 2 to accommodate the new changes. All of that is rolled up into a big hairball of a patch.

Now, what I want is:

  final-API
  consumer-1 (new API)
  consumer-2 (new API)
  unrelated
  consumer-3 (new API)

But how do I do that (using mq patches)? I can use qcrefresh+qnew to fairly easily get to:

  initial-API
  consumer-1 (old API)
  consumer-2 (old API)
  unrelated
  consumer-3 (new API)
  API-changes-plus-API-changes-for-consumers-1-and-2

or I could split out the consumer 1 & 2 API changes:

  initial-API
  consumer-1 (old API)
  consumer-2 (old API)
  unrelated
  consumer-3 (new API)
  API-changes
  consumer-2-API-changes
  consumer-1-API-changes

which theoretically I could qfold the consumer 1 and consumer 2 patches:

  initial-API
  consumer-1 (new API)
  consumer-2 (new API)
  unrelated
  consumer-3 (new API)
  API-changes

Unfortunately, consumer-1-API-changes collides with API-changes, so the fold will fail. It shouldn’t collide, really, but it does because part of the code to “register” consumer-1 with the new API happens to sit right alongside the API itself. Even worse, how do I “sink” the ‘API-changes’ patch down so I can fold it into initial-API to produce final-API? (Apologies for displaying my stacks upside-down from my terminology!) A naive qfold will only work if the API-changes stuff is separate from all the consumer-* patches.

My manual solution is to start with the initial queue:

  initial-API
  consumer-1 (old API)
  consumer-2 (old API)
  unrelated
  consumer-3-plus-API-changes-and-consumer-1-and-2-updates-for-new-API

and then use qcrefresh to rip the API changes and their effects on consumers 1 & 2 back out, leaving:

  initial-API
  consumer-1 (old API)
  consumer-2 (old API)
  unrelated
  API-changes-and-consumer-1-and-2-updates-for-new-API
  (in working directory) consumer-3 (new API)

I qrename/qmv the current patch to ‘api-change’ and qnew ‘consumer-3’ (its original name), cursing about how my commit messages are now on the wrong patch. Now I have

  initial-API
  consumer-1 (old API)
  consumer-2 (old API)
  unrelated
  api-change (API changes and consumer 1 and 2 updates for new API)
  consumer-3 (new API)
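In command form, that qcrefresh/qmv/qnew shuffle is just the following (qcrefresh comes from the crecord extension):

 % hg qcrefresh       # interactively keep only the API changes (plus the consumer 1/2 fallout) in the top patch
 % hg qmv api-change  # rename that patch; my commit message is now on the wrong one
 % hg qnew consumer-3 # the leftover working-directory changes become consumer-3 again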

Now I know that ‘unrelated’ doesn’t touch any of the same files, so I can qgoto consumer-2 and qfold api-change safely, producing:

  initial-API
  consumer-1 (old API)
  consumer-2 (new API, but also with API change and consumer 1 updates)
  unrelated
  consumer-3 (new API)

I again qcrefresh,qmv,qnew to pull a reduced version of the api-change patch, giving:

  initial-API
  consumer-1 (old API)
  api-change (with API change and consumer 1 updates)
  consumer-2 (new API)
  unrelated
  consumer-3 (new API)

Repeat. I’m basically taking a combined patch and sinking it down towards its destination, carving off pieces to incorporate into patches as I pass them by. Now I have:

  initial-API
  api-change (with *only* the API change!)
  consumer-1 (new API)
  consumer-2 (new API)
  unrelated
  consumer-3 (new API)

and finally I can qfold api-change into initial-API, rename it to final-API, and have my desired result.

What a pain in the ass! Though the qcrefresh/qmv/qnew step is a lot better than what I’ve been doing up until now. Without qcrefresh, it would be

 % hg qrefresh -X .
 % hg qcrecord api-change
 % hg qnew consumer-n
 % hg qpop
 % hg qpop
 % hg qpop
 % hg qpush --move api-change
 % hg qpush --move consumer-n
 % hg qfold old-consumer-n

which admittedly preserves the change message from old-consumer-n, which is an advantage over my qcrefresh version.
Or alternatively: fold all of the patches together, and qcrecord until you have your desired final result. In this particular case, the ‘unrelated’ patch was a whole series of patches, and they weren’t unrelated enough to just trivially reorder them out of the way.

Without qcrecord, this is intensely painful, and probably involves hand-editing patch files.

My dream workflow would be to have qfold do the legwork: first scan through all intervening patches and grab out the portions of the folded patch that only modify nonconflicting files. Then try to get clever and do the same thing for the portions of the conflicted files that are independent. (The cleverness isn’t strictly necessary, but I’ve found that I end up selecting the same portions of my sinking patch over and over again, which gets old.) Then sink the patch as far as it will go before hitting a still-conflicting file, and open up the crecord UI to pull out just the parts that belong to the patch being folded (aka sunk). Repeat this for every intervening conflicting patch until the patch has sunk to its destination, then fold it in. If things get too hairy, then at any point abort the operation, leaving behind a half-sunk patch sitting next to the unmodified patch it conflicted with. (Alternatively, undo the entire operation, but since I keep my mq repo revision-controlled, I don’t care all that much.)

I originally wanted something that would do 3-way merges instead of the crecord UI invocations, but merges really want to move you “forward” to the final result of merging separate patches/lines of development. Here, I want to go backwards to a patch that, if merged, would produce the result I already have. So merge(base,base+A,base+B) -> base+AB which is the same as base+BA. From that, I could infer a B’ such that base+A+B’ is my merged base+AB, but that doesn’t do me any good.

In my case, I have base+A+B and want B” and A” such that base+B”+A” == base+A+B.

To anyone who made it this far: is there already an easy way to go about this? Is there something wrong with my development style that I get into these sorts of situations? In my case, I had already landed ‘initial-API'; please don’t tell me that the answer is that I always have to get the API right in the first place. Does anyone else get into this mess? (I can’t say I’ve run into this all that often, but it’s happened more than once or twice.)

I suppose if I had landed consumers 1 and 2, I would’ve just had to modify their uses of the API afterwards. So I could do that here, too. But reviews could tangle things up pretty easily — if a reviewer of consumer 1 or 2 notices the API uglinesses that I fixed for consumer 3, then landing the earlier consumers becomes dependent on landing consumer 3, which sucks. But also, none of this is really ready to land, and I’d like to iterate the API in my queue for a while with all the different consumers as test users, *without* lumping everything together into one massive patch.