patch queue dependencies

January 5th, 2012

A little while back, I was again contemplating a tangled patch queue, considering how to rework it for landing. I thought it’d be nice to see at a very basic level which patches in the queue were going to be problematic, and which I could freely reorder at whim.

So I whipped together a silly little script to do that at a file level only. Example output:

% patchdeps
Note: This is based on filename collisions only, so may overreport conflicts
if patches touch different parts of the same file. (TODO)
                                                                          
A bug-663281-deque                   X   *       *     *   * *     *      
A bug-663281-deque-test              |   :       :     :   : *     :      
A bug-642054-func-setline          X |   *       :     :   : :     :      
A bug-642054-js_MapPCToLineNumber--' |   *       :     :   : :     :      
A bug-642054-rwreentrant             |   : X     :     :   : :     :      
A algorithm--------------------------'   X |     *     *   * *     *      
A system-libunwind                     X | |     :   * : * : *   * :      
A try-libunwind------------------------' | |     :   X : * : *   * :      
A backtrace------------------------------' | X * * * | * : * * * : * * * *
U shell-backtrace                          | | : * : | : : : : : : : : : :
U M-reentr---------------------------------' | : : : | : : : : : : : : : :
U M-backtrace--------------------------------' X : : | : : : : : : : * : :
U activities-----------------------------------' X : | : : : : * * : X * *
U profiler---------------------------------------' X | * : * * X * * | * *
U bug-675096-valgrind-jit--------------------------' | * : * : | : : | : :
U bug-599499-opagent-config--------------------------' X * : * | * : | : :
U bug-599499-opagent-----------------------------------' X X * | : * | : :
U bug-642320-gdb-jit-config------------------------------' | * | * : | : :
U bug-642320-gdb-jit---------------------------------------' X | : * | : :
U import-libunwind                                           | | : : | : :
U libunwind-config-------------------------------------------' | X X | : :
U warnings-fixes-----------------------------------------------' | | | : *
U bug-696965-cfi-autocheck---------------------------------------' | | X :
U mystery-librt-stuff----------------------------------------------' | | :
U bug-637393-eval-lifetime                                           | | :
U register-dwarf-----------------------------------------------------' | :
U bug-652535-JM__JIT_code_performance_counters-------------------------' X
U JSOP_RUNMODE-----------------------------------------------------------'

How to read it: patches that have no conflicts earlier in the stack are shown without a line next to them. They’re free spirits; you can “sink” them anywhere earlier in your queue without getting conflicts. (The script removes their lines to make the grid take up less horizontal space.)

Any other patch gets a horizontal line that then bends up to show the interference pattern with earlier patches. All in all, you have a complete interference matrix showing whether the set of files touched by any patch intersects the set of files for any other patch.

‘X’ marks the first conflict. After that, the marker turns to ‘*’ and the vertical lines get broken. (That’s just because it’s mostly the first one that matters when you’re munging your queue.)

So the patch named “backtrace” conflicts with the earlier “algorithm” patch, as well as the even earlier “bug-642054-js_MapPCToLineNumber” and others. The “M-reentr” patch only touches the same stuff as “bug-642054-rwreentrant” (not surprising, since “M-…” is my notation for a patch that needs to be folded into an earlier patch.) “system-libunwind” doesn’t conflict with anything earlier in the queue, and so can be freely reordered in the series file to anywhere earlier than where it is now — but note that several later patches touch the same stuff as it does. (It happens to be a patch to js/src/configure.in.)
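For the curious, the file-level check behind a grid like this can be sketched in a few lines. This is Python rather than the original Perl, and it is only a guess at the approach: the actual script is not shown here, and the names `touched_files` and `conflicts` are made up for illustration. It assumes unified diffs whose targets appear on `+++ b/path` lines and file names without spaces. Deleted files (whose target is `/dev/null`) are simply ignored.

```python
import re

# Matches the "+++ b/path" line that unified diffs use to name the file
# a set of hunks applies to.
_TARGET = re.compile(r"^\+\+\+ (?:b/)?(\S+)", re.MULTILINE)


def touched_files(patch_text):
    """Return the set of file paths a unified diff modifies."""
    return {m.group(1) for m in _TARGET.finditer(patch_text)
            if m.group(1) != "/dev/null"}


def conflicts(queue):
    """queue: list of (name, set_of_files) in series order.

    Returns {name: [earlier patch names sharing a file]}, nearest first,
    so the first entry plays the role of the 'X' in the grid and the
    rest play the role of the '*' markers.
    """
    result = {}
    for i, (name, files) in enumerate(queue):
        earlier = queue[:i]
        result[name] = [other for other, other_files in reversed(earlier)
                        if files & other_files]
    return result
```

A patch whose list comes back empty is one of the free spirits: nothing earlier in the queue touches the same files.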

Useful? Not very. But it was kinda fun to write and I find myself running it occasionally just to see what it shows, so I feel the entertainment value was worth the small investment of time. Though now I’m tempted to enhance it by checking for collisions in line ranges, not just in the files…
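A line-range version of the check might look roughly like the sketch below (Python, hypothetical helper names, not part of the actual script). It compares the lines an earlier patch leaves behind (the `+` side of its hunk headers) with the lines a later patch expects to find (the `-` side). That comparison is only exact for adjacent patches; with other patches in between, line numbers shift, so treat the result as an approximation.

```python
import re

_FILE = re.compile(r"^\+\+\+ (?:b/)?(\S+)")
_HUNK = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def hunk_ranges(patch_text, side):
    """Map each file in a unified diff to its [(first_line, last_line)] hunks.

    side is "old" for the '-' range of each hunk header, "new" for the '+'.
    """
    ranges = {}
    current = None
    for line in patch_text.splitlines():
        m = _FILE.match(line)
        if m:
            current = m.group(1)
            ranges.setdefault(current, [])
            continue
        m = _HUNK.match(line)
        if m and current is not None:
            start_group, len_group = (1, 2) if side == "old" else (3, 4)
            start = int(m.group(start_group))
            # A missing length means a one-line hunk.
            length = int(m.group(len_group)) if m.group(len_group) else 1
            ranges[current].append((start, start + max(length, 1) - 1))
    return ranges


def patches_collide(earlier_text, later_text):
    """True if a hunk of the later patch overlaps one left by the earlier."""
    after_earlier = hunk_ranges(earlier_text, "new")
    before_later = hunk_ranges(later_text, "old")
    for path in after_earlier.keys() & before_later.keys():
        for s1, e1 in after_earlier[path]:
            for s2, e2 in before_later[path]:
                if s1 <= e2 and s2 <= e1:
                    return True
    return False
```

Because hunk ranges include their context lines, this errs on the side of reporting a collision, which matches how patch application actually fails.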

I suppose I could make a mercurial extension out of it, but that’d require porting it from Perl to Python, which is more trouble than it’s worth. (Yes, I still use Perl as my preferred language for whipping things together. Even though I dislike the syntax for nested data structures, I very much like the feature set, and it’s still the best language I’ve found for these sorts of things. So phbbbttt!)

hg adventure

December 16th, 2011

Inspired by some silliness on #developers:

<jgilbert>	well that was an hg adventure
<dholbert>	$ hg adventure
You are in a twisty maze of passageways, all alike...
<cpeterson>	$ hg look
It is pitch black. You are likely to be eaten by a grue.
<hub>		$ hg doctor
How can I help you?

I thought I’d stick to actual hg commands, and came up with:

You see a small hole leading to a dark passageway.
820:21d40b86ae37$ echo "enter passageway" > action
820:21d40b86ae37$ hg commit
It is pitch black. You are likely to be eaten by a grue.
821:0121fb347e18$ echo "look" > action
821:0121fb347e18$ hg commit
** You have been eaten by a grue **
822:b09217a7bbc1$ hg backout 822
It is pitch black. You are likely to be eaten by a grue.
821:0121fb347e18$ hg backout 821
You see a small hole leading to a dark passageway.
820:21d40b86ae37$ echo "turn on flashlight" > action
820:21d40b86ae37$ hg commit
Your flashlight is now on.
824:44a4e4bf5f0e$ hg merge 821
Your light reveals a forking passageway leading north and south.

Kinda makes you think, huh? Time reversal games became popular semi-recently (eg Braid). Maybe the fad is over now; I’m *way* out of date.

But did any of them allow you to branch and merge? Push and pull from your friends’ distributed repos? Bisect to find the point where you unknowingly did something that prevented ever winning the game and either continue from there, merge a backout of that action, or create a new branch by splicing that action out?

It’s a whole new genre! It’ll be… um… fun.

(I’ll go back to work now)

Patch reordering

November 3rd, 2011

I have a patch queue that looks roughly like:

  initial-API
  consumer-1
  consumer-2
  unrelated
  consumer-3-plus-API-changes-and-consumer-1-and-2-updates-for-new-API

(So my base repo has a patch ‘initial-API’ applied to it, followed by a patch ‘consumer-1’, etc.)

The idea is that I am working on a new API of some sort, and have a couple of independent consumers of that API. The first two are “done”, but when working on the 3rd, I realize that I need to make changes to or clean up the API that they’re all using. So I hack away, and end up with a patch that contains both consumer 3 plus some API changes, and to get it to compile I also update consumers 1 and 2 to accommodate the new changes. All of that is rolled up into a big hairball of a patch.

Now, what I want is:

  final-API
  consumer-1 (new API)
  consumer-2 (new API)
  unrelated
  consumer-3 (new API)

But how do I do that (using mq patches)? I can use qcrefresh+qnew to fairly easily get to:

  initial-API
  consumer-1 (old API)
  consumer-2 (old API)
  unrelated
  consumer-3 (new API)
  API-changes-plus-API-changes-for-consumers-1-and-2

or I could split out the consumer 1 & 2 API changes:

  initial-API
  consumer-1 (old API)
  consumer-2 (old API)
  unrelated
  consumer-3 (new API)
  API-changes
  consumer-2-API-changes
  consumer-1-API-changes

from which I could theoretically qfold those changes into the consumer 1 and consumer 2 patches:

  initial-API
  consumer-1 (new API)
  consumer-2 (new API)
  unrelated
  consumer-3 (new API)
  API-changes

Unfortunately, consumer-1-API-changes collides with API-changes, so the fold will fail. It shouldn’t collide, really, but it does because part of the code to “register” consumer-1 with the new API happens to sit right alongside the API itself. Even worse, how do I “sink” the ‘API-changes’ patch down so I can fold it into initial-API to produce final-API? (Apologies for displaying my stacks upside-down from my terminology!) A naive qfold will only work if the API-changes stuff is separate from all the consumer-* patches.

My manual solution is to start with the initial queue:

  initial-API
  consumer-1 (old API)
  consumer-2 (old API)
  unrelated
  consumer-3-plus-API-changes-and-consumer-1-and-2-updates-for-new-API

and then use qcrefresh to rip the API changes and their effects on consumers 1 & 2 back out, leaving:

  initial-API
  consumer-1 (old API)
  consumer-2 (old API)
  unrelated
  API-changes-and-consumer-1-and-2-updates-for-new-API
  (in working directory) consumer-3 (new API)

I qrename/qmv the current patch to ‘api-change’ and qnew ‘consumer-3’ (its original name), cursing about how my commit messages are now on the wrong patch. Now I have

  initial-API
  consumer-1 (old API)
  consumer-2 (old API)
  unrelated
  api-change (API changes and consumer 1 and 2 updates for new API)
  consumer-3 (new API)

Now I know that ‘unrelated’ doesn’t touch any of the same files, so I can qgoto consumer-2 and qfold api-change safely, producing:

  initial-API
  consumer-1 (old API)
  consumer-2 (new API, but also with API change and consumer 1 updates)
  unrelated
  consumer-3 (new API)

I again qcrefresh, qmv, and qnew to pull out a reduced version of the api-change patch, giving:

  initial-API
  consumer-1 (old API)
  api-change (with API change and consumer 1 updates)
  consumer-2 (new API)
  unrelated
  consumer-3 (new API)

Repeat. I’m basically taking a combined patch and sinking it down towards its destination, carving off pieces to incorporate into patches as I pass them by. Now I have:

  initial-API
  api-change (with *only* the API change!)
  consumer-1 (new API)
  consumer-2 (new API)
  unrelated
  consumer-3 (new API)

and finally I can qfold api-change into initial-API, rename it to final-API, and have my desired result.

What a pain in the ass! Though the qcrefresh/qmv/qnew step is a lot better than what I’ve been doing up until now. Without qcrefresh, it would be

 % hg qrefresh -X .
 % hg qcrecord api-change
 % hg qnew consumer-n
 % hg qpop
 % hg qpop
 % hg qpop
 % hg qpush --move api-change
 % hg qpush --move consumer-n
 % hg qfold old-consumer-n

which admittedly preserves the change message from old-consumer-n, which is an advantage over my qcrefresh version.

Or alternatively: fold all of the patches together, and qcrecord until you have your desired final result. In this particular case, the ‘unrelated’ patch was a whole series of patches, and they weren’t unrelated enough to just trivially reorder them out of the way.

Without qcrecord, this is intensely painful, and probably involves hand-editing patch files.

My dream workflow would be to have qfold do the legwork: first scan through all intervening patches and grab out the portions of the folded patch that only modify nonconflicting files. Then try to get clever and do the same thing for the portions of the conflicted files that are independent. (The cleverness isn’t strictly necessary, but I’ve found that I end up selecting the same portions of my sinking patch over and over again, which gets old.) Then sink the patch as far as it will go before hitting a still-conflicting file, and open up the crecord UI to pull out just the parts that belong to the patch being folded (aka sunk). Repeat this for every intervening conflicting patch until the patch has sunk to its destination, then fold it in. If things get too hairy, then at any point abort the operation, leaving behind a half-sunk patch sitting next to the unmodified patch it conflicted with. (Alternatively, undo the entire operation, but since I keep my mq repo revision-controlled, I don’t care all that much.)
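The first step of that dream workflow (working out which pieces of the sinking patch can drop straight to the destination and which get stopped on the way) is easy to sketch at the file level. This is a hypothetical helper in Python, not anything qfold or crecord provides, and the file names in the usage example are invented:

```python
def sink_plan(sinking_files, intervening):
    """Split a sinking patch's files by how far they can sink unaided.

    sinking_files: set of paths touched by the patch being sunk.
    intervening: list of (name, set_of_paths) for the patches it must pass,
        ordered from the nearest one downward to the destination.

    Returns (free, blocked): free is the set of files no intervening patch
    touches; blocked maps each remaining file to the name of the first
    patch that stops it.
    """
    free = set(sinking_files)
    blocked = {}
    for name, files in intervening:
        for path in sorted(free & files):
            blocked[path] = name
            free.discard(path)
    return free, blocked
```

For a queue shaped like the one above, with made-up file names:

```python
free, blocked = sink_plan(
    {"api.h", "api.c", "c1.c", "c2.c"},
    [("unrelated", {"misc.c"}),
     ("consumer-2", {"c2.c"}),
     ("consumer-1", {"c1.c", "api.c"})],
)
# free holds the files that can go straight to initial-API;
# blocked says where the crecord UI would need to be opened for the rest.
```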

I originally wanted something that would do 3-way merges instead of the crecord UI invocations, but merges really want to move you “forward” to the final result of merging separate patches/lines of development. Here, I want to go backwards to a patch that, if merged, would produce the result I already have. So merge(base,base+A,base+B) -> base+AB which is the same as base+BA. From that, I could infer a B’ such that base+A+B’ is my merged base+AB, but that doesn’t do me any good.

In my case, I have base+A+B and want B” and A” such that base+B”+A” == base+A+B.

To anyone who made it this far: is there already an easy way to go about this? Is there something wrong with my development style that I get into these sorts of situations? In my case, I had already landed ‘initial-API’; please don’t tell me that the answer is that I always have to get the API right in the first place. Does anyone else get into this mess? (I can’t say I’ve run into this all that often, but it’s happened more than once or twice.)

I suppose if I had landed consumers 1 and 2, I would’ve just had to modify their uses of the API afterwards. So I could do that here, too. But reviews could tangle things up pretty easily — if a reviewer of consumer 1 or 2 notices the API uglinesses that I fixed for consumer 3, then landing the earlier consumers becomes dependent on landing consumer 3, which sucks. But also, none of this is really ready to land, and I’d like to iterate the API in my queue for a while with all the different consumers as test users, *without* lumping everything together into one massive patch.

I think I’m missing something. How do people get those changeset URLs to paste into bugs? Ok, if I’m landing on mozilla-central or a project branch, I just get it from tbpl since I’ll be staring at it anyway. But what about some other repo? Like, say, ssh://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog?

As usual, I coded my way around the problem before asking the question, which is stupid and backwards. But just in case there really isn’t a good way, here’s my silly hackaround. Put this in the [alias] section of your ~/.hgrc then, after landing a change, do ‘hg urls -l 3’ or similar. (That’ll give you the latest 3 changesets):

  urls = !$HG log --template='{node|short} {desc|firstline}\n' ${HG_ARGS/urls /} | perl -lpe 'BEGIN { ($url = shift) =~ s/^\w+/http/ }; s!^(?=\w+)!$url/rev/!' `hg path default`

Picking that apart, it removes the misfeature that $HG_ARGS contains the command you’re running, then passes the remaining command line to hg log with a template set to just print out the changeset shorthash and the first line of the commit message. It sends that and the URL of the default upstream repo through a perl command that prefixes each line of the hg log output with the repo URL followed by “/rev/”, so every line starts with a full changeset URL. Oh, and it changes the first part of the repo URL to http because my test case is actually an SSH URL and there just happens to be an HTTP server at the same URL.
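If the one-liner is hard to read, here is the same transformation as a small Python sketch. The function name is invented for illustration, and the hash and commit message in the usage example are placeholders, not real output:

```python
import re


def changeset_urls(repo_url, log_lines):
    """Prefix 'HASH first line of message' lines with a browsable URL.

    Mirrors the Perl one-liner: swap the URL scheme for http, then put
    '<repo>/rev/' in front of every line that starts with a word character.
    Other lines (blank ones, say) pass through untouched.
    """
    base = re.sub(r"^\w+", "http", repo_url, count=1)
    return [f"{base}/rev/{line}" if re.match(r"\w", line) else line
            for line in log_lines]
```

```python
for line in changeset_urls(
        "ssh://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog",
        ["0123456789ab Placeholder commit message"]):
    print(line)
```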

A mess, but it works for me.

And yes, I should switch to a blog that isn’t hostile to code. Sorry about that line up there.

Record your freshness

May 26th, 2011

I often like to split patches up into independent pieces, for ease of reviewing by both reviewers and myself. You can split off preparatory refactorings, low-level mechanism from high-level users, features from tests, etc., making it much easier to evaluate the sanity of each piece.

But it’s something of a pain to do. If I’ve been hacking along and accumulated a monster patch, with stock hg and mq I’d do:

  hg qref -X '*'                  # get all the changes in the working directory; only
                                  # needed if you've been qref'ing along the way
  hg qref -I '...pattern...'      # put in any touched files
  hg qnew temp                    # stash away the rest so you can edit the patch
  hg qpop
  hg qpop                         # go back to unpatched version
  emacs $(hg root --mq)/patchname # hack out the pieces you don't want,
                                  # put them in /tmp/p or somewhere...
  hg qpush                        # reapply just the parts you want
  patch -p1 < /tmp/p
  ...                             # you get the point. There'll be a qfold somewhere in here...

and on and on. It’s a major pain. I even started working on a web-based patch munging tool because I was doing it so often.

Then I discovered qcrecord, part of the crecord extension. It is teh awesome with a capital T (and A, but this is a family blog). It gives you a mostly-spiffy-but-slightly-clunky curses (textual) interface to select which files to include, and within those files which patch chunks to include, and within those chunks which individual lines to include. That last part, especially, is way cool — it lets you do things that you’d have to be crazy to attempt working with the raw patches, and are a major nuisance with the raw files.

Assuming you are again starting with a huge patch that you’ve been qreffing, the workflow goes something like:

  hg qref -X '*'
  hg qcrecord my-patch-part1
  hg qcrecord my-patch-part2
  hg qcrecord my-patch-part3
  hg qpop -a
  hg qrm original-patchname
  hg qpush -a

Way, way nicer. No more dangerous direct edits of patch files. But what’s that messy business about nuking the original patch? Hold that thought.

Now that you have a nicely split-up patch series, you’ll be wanting to edit various parts of it. As usual with mq, you qpop or qgoto to the patch you want to hack on, then edit it, and finally qref (qrefresh). But many times you’ll end up putting in some bits and pieces that really belong in the other patches. So if you were working on my-patch-part2 and made some changes that really belong in my-patch-part3, you do something like:

  hg qcrecord piece-meant-for-part3             # only select the part intended for part3
  hg qnew remaining-updates-for-part2           # make a patch with the rest of the updates, to go into part2
  hg qgoto my-patch-part2
  hg qpush --move remaining-updates-for-part2   # now we have part2 and its updates adjacent
  hg qpop
  hg qfold remaining-updates-for-part2          # fold them together, producing a final part2
  hg qpush
  hg qfold my-patch-part3                       # fold in part3 with its updates from the beginning
  hg qmv my-patch-part3                         # and rename, mangling the comment

or at least, that’s what I generally do. If I were smarter, I would use qcrecord to pick out the remaining updates for part2, making it just:

  hg qcrecord more-part2    # select everything intended for part2
  hg qnew update-part3      # make a patch with the rest, intended for part3
  hg qfold my-patch-part3   # fold to make a final part3
  hg qmv my-patch-part3     # ...with the wrong name, so fix and mess up the comment
  hg qgoto my-patch-part2
  hg qfold more-part2       # and make a final part2

but that’s still a mess. The fundamental problem is that, as great as qcrecord is, it always wants to create a new patch. And you don’t.

Enter qcrefresh. It doesn’t exist, but you can get it by replacing your stock crecord with

  hg clone https://sfink@bitbucket.org/sfink/crecord # Obsolete!

Update: it has been merged into the main crecord repo! Use

  hg clone https://bitbucket.org/edgimar/crecord

It does the obvious thing — it does the equivalent of a qrefresh, except it uses the crecord interface to select what parts should end up in the current patch. So now the above is:

  hg qcref                 # Keep everything you want for the current patch
  hg qnew update-part3
  hg qfold my-patch-part3
  hg qmv my-patch-part3

Still a little bit of juggling (though you could alias the latter 3 commands in your ~/.hgrc, I guess.) It would be nice if qfold had a “reverse fold” option.

Finally, when splitting up a large patch you often want to keep the original patch’s name and comment, so you’d really do:

  hg qcref                 # keep just the parts you want in the main patch
  hg qcrec my-patch-part2  # make a final part2
  hg qcrec my-patch-part3  # make a final part3

And life is good.

Wading through history

April 20th, 2011

Recently — well, actually, by now it wasn’t recently at all — I received a review request for a patch to JSD. It fixed an intermittent crash when using Firebug on a page that went into an endless stack-eating loop. A couple of people had worked on reproducing it, and the exact conditions were a little flaky, so I first tried it out myself. Kaboom! Yay!

So I imported the patch just to verify that it fixed the problem. Before compiling with it, I updated my tree to the latest version. Why? I don’t know. Just because it’s what I usually do. It seemed like a good idea at the time.

Only it wasn’t. It was a really, really dumb idea. I was changing two variables while trying to test one of them, and I got what I deserved: it stopped crashing after the patch, but when digging in to verify that it really was behaving as intended, I discovered that it didn’t crash without the patch either.

This was just before the All Hands, and although I poked at it every few days, I didn’t make any headway: the patch seemed good, but I really wanted to confirm that it fixed the crash. (There were reasons why I was a little skeptical, but it’s not really relevant here.)

Eventually, when I had some time to think about it properly, I realized the best thing to do would be to revert to the older version that crashed for me. But how to find it?

One way would be to binary search nightlies. But I happened to be on a poor network connection, and downloading nightlies was insanely slow.

Also, I thought I should be able to do better. I run with an mq extension (mq = Mercurial Queues) that commits my patch queue on any change. Get it at git://github.com/hotsphink/mqext.git (I really should switch to bitbucket, rather than pointlessly restricting my audience to people who are minimally comfortable with both git and hg.) So all I had to do was to go back to the point where I imported the patch from bugzilla.

Finding the right moment was easy: ‘hg log --mq’ showed me all the changes made to my patch queue, one of which was commented “IMPORT: bz://643360” (an autogenerated comment courtesy of mqext). That was changeset 026ac43e9114. Yay!

But that changeset is for my patch queue, not my source repo. Fortunately, mq stores ‘parent’ fields in patch files that give the source repo changeset id that a patch was applied on top of. I’ll skip a number of failed attempts to track through this, and just give my final recipe:

  1. (already described) hg log --mq to find the appropriate changeset in the patch queue repo.
  2. cd to .hg/patches and run hg cat -r changeset series. This is because you need to know the names of the patch files in order to look at them — or specifically, the name of the first patch file, because it’s the only one whose parent will still be in the source repo. All other patches’ parents will be the source repo with mq patches applied to them, and will have been stripped out of the repo due to intervening actions. Because hg (or rather, mq) is not interested in preserving history.
  3. hg cat -r changeset firstpatchname and look for the “# Parent changeset” line.
  4. cd back to your source repo and fetch that revision however you want — update to it, or clone a repo with it, or whatever.
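Steps 2 and 3 boil down to two bits of text parsing, sketched here in Python on the output of the two `hg cat` commands. The helper names are hypothetical, and the exact header layout (a `# Parent` line followed by a hex changeset id) is an assumption based on the description above rather than something checked against mq's source:

```python
import re


def first_patch(series_text):
    """Return the first patch name listed in an mq series file."""
    for line in series_text.splitlines():
        # Drop comments and guard annotations such as '#+guard'.
        name = line.split("#", 1)[0].strip()
        if name:
            return name
    return None


def parent_changeset(patch_text):
    """Return the hash from the '# Parent' header line, or None."""
    m = re.search(r"^# Parent\s+([0-9a-fA-F]+)\s*$", patch_text, re.MULTILINE)
    return m.group(1) if m else None
```

Feed `first_patch` the output of step 2, then `parent_changeset` the output of step 3, and you have the revision to fetch in step 4.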

I’m guessing this little recipe isn’t going to be useful to very many people, but I wanted to write it out for myself. So phbbbtt!!!

 

Work Configuration

March 8th, 2011

Inspired by Nicholas Nethercote’s description of how he sets up his tracemonkey work environment, I thought I’d describe my work configuration and how it differs from njn’s.

Like Nick, I work almost entirely off of the tracemonkey tree these days, and mostly within js/src. I don’t use the js shell all that much compared to the full browser, though, so I tend to do things with the whole tree.

working repositories

Similar to Nick, I have a ~/src/ directory populated with clones of the tracemonkey repo. I have one, “TM-upstream/”, that follows the upstream tracemonkey repository. In fact, I use cron to pull updates hourly. The rest are created as clones of TM-upstream, or sometimes of each other. I vary in how I create these. Some are created via ‘hg clone TM-upstream TM-whatever’, although for whatever reason I usually do ‘cp -rlp TM-upstream TM-whatever’ and then edit TM-whatever/.hg/hgrc to change the ‘default’ path to TM-upstream. The ‘cp’ method is faster, but the end result is pretty much the same. Sometimes I copy the mq subdirectory (.hg/patches) from the repo I’m cloning, sometimes I create a new one from scratch. And sometimes I don’t use one at all.

Oh, and with emacs I had to do

  (setq vc-make-backup-files t)

to make it break hardlinks when modifying files. Breaking hardlinks is normally the default, but it seems like vc mode has a different default that is really really bad if you’re using ‘cp -rlp’ to clone your repos.

All of my (tracemonkey-based) repos start with “TM-”, probably because I use my src/ subdirectory for checkouts of various other projects (bugzilla-tweaks, archer-mozilla, archer, firebug, addon-sdk, etc.). Not all of those are hg-based; I have several git repos and even an svn checkout or two. For the Mozilla tree, I tend to only actively use one or two repos at a time; the rest are for dormant unfinished work.

I made a shell function ‘pullup’ that does ‘(cd $(hg path default) && hg pull)’, which goes to the default upstream repo (probably TM-upstream, unless this is a clone of a clone) and updates its objects. (Note the lack of a -u; I don’t want to update the working directory for the upstream repo without a good reason.) To update my working repo, I’ll ‘hg qpush -a’ to apply as many patches as I can, then probably ‘hg qpop’ to pop off the last one because it failed. (I tend to have a small pile of heavily bitrotted patches lurking around at the end of my series file.) Then I’ll do ‘pullup’ to update the upstream repo and ‘hg pull --rebase’ to merge the changes into my patch queue. My ~/.hgrc sets my merge tool to kdiff3, so any conflicts will pop up the visual merge editor.

I push changes directly from my working repo by using

  hg qpop
  hg show | head
  hg qref -e # if needed

to fix up the commit messages, then qpush everything back on that I’m committing. (I tend to break up my commits into at least 2 pieces, so I usually push more than one change at a time.) Then I do ‘hg qfinish -a’, do my last round of testing, and ‘hg push tracemonkey’ (tracemonkey is set in the [paths] section of my ~/.hgrc).

I don’t bother to run ‘hg outgoing’, because I only commit patches that I’m about to push. I suppose if I were collaborating with someone else, I might get some extra crud that I’d need to worry about, but so far I’ve always done that through patches imported into my patch queue.

object directories

I place my object directories underneath the source directory, so that I can use hg commands while my working directory is underneath the object directory. I mostly use plain ‘~/src/TM-whatever/obj’, which is almost always a debug build. If I need an opt build, it’ll be ‘obj-opt’ in place of ‘obj’. Rarely, I’ll make ‘obj-somethingelse’ for special purposes.

Prefixing things with ‘obj’ helps when moving stuff between machines, because I can do

  rsync -av --exclude='/obj*' TM-whatever desthost:/some/where

building

When underneath obj/js/src, I’ll just run ‘make’ or ‘make -j16’ or whatever to rebuild (even when testing with the browser, because my mozconfig always has ‘ac_add_options --enable-shared-js’ so rebuilding here is enough. In fact, I tend to forget to remove it when making opt builds for performance testing.)

I also tend to modify things in js/jsd and js/src/xpconnect/src, so I have a special makefile that does a minimal rebuild for those:

ROOT := $(shell hg root)

# Recipe lines must start with a tab character.
all:
	$(MAKE) -C $(ROOT)/obj/js/src
	$(MAKE) -C $(ROOT)/obj/js/jsd
	$(MAKE) -C $(ROOT)/obj/js/src/xpconnect/src
	$(MAKE) -C $(ROOT)/obj/layout/build
	$(MAKE) -C $(ROOT)/obj/toolkit/library

I have that saved as ~/mf, and I have a shell alias ‘mk’ that does ‘make -f ~/mf’. So I’ll make my changes, then run ‘mk -k -j12’ or whatever. (I don’t know why I bother to give numbers to my -j options, since I use distcc’s hosts syntax for limiting concurrent jobs anyway.)

Even lazier, I have my emacs set up to pick the right make command depending on what directory I’m in (please excuse my weak elisp-fu):

; Customizations based on the current buffer's path

(defun get-hg-dir (path)
  (if (equal path "/")
      nil
    (if (file-exists-p (expand-file-name ".hg" path))
        (expand-file-name ".hg" path)
      (get-hg-dir (directory-file-name (file-name-directory path))))))

; For Mozilla source:
;  - if within an hg-controlled directory, set the compile-command to
;      make -f ~/mf...
;    which will do a fairly minimal rebuild of the whole tree
;  - unless we're also underneath js/src, in which case, just do a make
;    within the JS area
(defun custom-compile-hook ()
  (let ((path (buffer-file-name))
        (dir (directory-file-name (file-name-directory (buffer-file-name)))))
    (if (not (null (get-hg-dir path)))
        (if (string-match "js/src" dir)
            (set (make-local-variable 'compile-command)
                 (concat "make -C " (expand-file-name (concat dir "/../../obj/js/src")) " -k"))
          (set (make-local-variable 'compile-command)
               (concat "make -f ~/mf -k -j12"))))))

(add-hook 'find-file-hook 'custom-compile-hook)

I have my F12 key bound to 'compile, so I just hit F12, check that the command is right, then press enter to build. One problem I have is that our build output is much too verbose, so I don’t notice warnings very well. I keep meaning to shut it up (probably by only printing the file being compiled unless there are errors/warnings), but I haven’t gotten around to it.

compiling: distcc and ccache

I rely heavily on distcc for my builds. I do almost all of my Mozilla work on a single laptop machine, though occasionally I’ll reboot it into Windows to suffer through something there, or use one of my two desktops (one home, one work). My work desktop is quite beefy. My home desktop is less so, but still good enough to speed up builds dramatically. I run a cron job on my laptop to autodetect where I am and switch my ~/.distcc/hosts symlink to the appropriate hosts file, which contains “localhost finkdesk/12” at work and “localhost 192.168.1.99/7” at home. The /12 and /7 are the max number of concurrent jobs distcc will trigger; I set it lower on my home machine to keep from bogging it down with contending jobs, though honestly I haven’t benchmarked to see what the right numbers are.
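The symlink-switching half of such a cron job could look something like the Python sketch below. Everything specific in it is an assumption: the per-location files are imagined to be named `hosts.work` and `hosts.home`, and how the location gets detected is left out entirely, since that isn’t described here.

```python
import os


def switch_hosts(distcc_dir, location):
    """Point <distcc_dir>/hosts at hosts.<location>, replacing it in one step."""
    target = f"hosts.{location}"
    if not os.path.exists(os.path.join(distcc_dir, target)):
        raise FileNotFoundError(target)
    link = os.path.join(distcc_dir, "hosts")
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)   # relative link, resolved inside distcc_dir
    os.replace(tmp, link)     # rename over the old symlink
```

Building the new link under a temporary name and renaming it over the old one means a compile that starts mid-switch never sees a missing hosts file.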

About half the time, I’ll have distccmon-gnome running to monitor where the jobs are going to. It’s a quick way to spot when I’m sending things to the wrong place (eg when I’m VPNed into the work network and finkdesk is reachable; if I accidentally send things there, distcc will slow everything down because the network time way outweighs the compilation speedups.) Or, more often, that something’s messed up and all builds are going to localhost. Or that I’m only getting a single job at a time because I forgot to use -j again.

I also use ccache at all times, but I don’t do anything nonstandard with it. Just be sure to set CCACHE_PREFIX=distcc and allow it to get big with ‘ccache -M’.

linking: gold

When I’m working outside of js/src proper, I also like to use the gold linker in place of the default binutils bfd linker. I’m on Fedora 14, so to switch to gold I do

  cd /etc/alternatives
  rm ld
  ln -s /usr/bin/ld.gold ld

(and to switch back, link to ld.bfd). gold takes my minimal links from 30 seconds to about 10 seconds, which is really nice. Unfortunately, I frequently have to switch back to ld.bfd due to incompatibilities. elfhack and valgrind are the usual offenders. Update: According to jseward, valgrind >= 3.6.0 should work fine. Yay! (I currently have 3.5.0).

patch queue

While they’re in my mq, all of my patches are labeled with the bug number and a brief description. When I’m reshuffling changes between my various patches, I create temporary patches whose names are prefixed with “M-” (for Merge) to indicate that I’m planning on qfolding them into some other existing patch. I also use “T-” for temporary patches (debugging printouts or whatnot). It helps to see the state of everything with a glance at my ‘hg qseries -v’ output (which, due to aliases and defaults, I actually spell ‘hg series’).

Very recently, I’ve started using ‘hg qcrecord’ to split up and reorganize patches, and I’m loving it. It’s the same basic story, though — I use it to create temporary to-be-merged patches that I qfold later. I tend to do

  hg qref -X '*'
  hg qcrecord

quite a bit to move stuff out of the current patch (well, the current patch + the current changes on top of it).

disk space

Finally, I also try to occasionally go through all my TM-* directories and run ‘hg relink’ to rediscover what can be hardlinked. It takes a while, so I really ought to cron it. It tends to recover surprisingly large amounts of disk space.
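
A sketch of what cronning it might look like, assuming the relink extension is enabled and each clone's default path points at the repository it was cloned from. The function name, the base directory, and the schedule are made up for illustration:

```shell
# Hypothetical sketch: run `hg relink` in every TM-* clone under a base dir.
relink_all() {
    base=$1
    hg_cmd=${2:-hg}                      # overridable so the loop can be dry-run
    for repo in "$base"/TM-*/; do
        [ -d "$repo/.hg" ] || continue   # skip anything that is not a repo
        echo "relinking $repo"
        (cd "$repo" && $hg_cmd relink)   # relinks against the default path
    done
}

# Example crontab entry (assumed script path and schedule):
#   0 3 * * 0  . $HOME/bin/relink-all.sh && relink_all "$HOME/src"
```

Passing `echo` as the second argument prints the commands instead of running them, which is a cheap way to check which directories would be touched.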

Complete and total tangent:

My underinformed, overopinionated take on this is that hg’s disk structures are wrong. As I understand it, the wasted space comes from: (1) you clone a repo, which creates a bunch of hardlinks, using very little space; (2) you periodically update the base repo, breaking many of the hardlinks; then (3) you update the derived repo with those changes. hg doesn’t figure out that it can re-link the object files, which is understandable, since it would need to know for a given file that not only are the latest versions identical, but also that the complete set of revisions between the two repos is identical.

It doesn’t seem that hard for it to figure this out. But even if it did, any local change in the derived repository is going to prevent sharing anyway. That’s what bugs me. Conceptually, hg’s object store is a big pile of byte strings, one for every revision of every file, and each tagged with (and looked up by) its checksum. There’s an optimization that all the revs of a single file can be stored compactly as a set of deltas rather than storing a full (compressed) copy of every rev, but that really ought to be an optimization, not a fundamental data structure. If you ditched the optimization entirely and kept a full copy of every rev, you could trivially share a repo across all of your checkouts. (You could even share a repo with completely unrelated projects, though that’d be more likely to hurt than help.) I would find this much nicer.
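
To make the "big pile of byte strings" idea concrete, here is a toy content-addressed store. It is only an illustration of the concept, not how hg or git actually lay out their data; the function names and the one-file-per-blob layout are invented for the example:

```shell
# Toy content-addressed store: every blob is a file named by its SHA-1.

# store_put STORE FILE  -> copies FILE into STORE, prints the blob's checksum
store_put() {
    sum=$(sha1sum < "$2" | cut -d' ' -f1)
    [ -e "$1/$sum" ] || cp "$2" "$1/$sum"   # identical content is stored once
    echo "$sum"
}

# store_get STORE CHECKSUM  -> writes the blob to stdout
store_get() {
    cat "$1/$2"
}
```

Because a blob's name depends only on its bytes, any number of checkouts can point at the same store directory, and extra blobs from other checkouts are harmless to everyone else.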

Actually, it’s not just that all the versions of a file need to be stored within one filesystem file. hg seems to want the set of versions within a filesystem file to mean something. I would rather have that information (the set of known revisions) stored within a checkout, so that extra revs would be harmless. Then you don’t need to lose the optimization; you can still stuff all revisions into one file, even revisions from completely unrelated branches. You’d even have flexibility to use multiple filesystem files for a single source file, if it has a bunch of revisions that you want rapid access to. (So file1 contains revA + a few deltas, file2 has revB only, file3 has revC + a few deltas, etc. Think images.)

I think I’m probably describing git’s data structures here. If so, it seems like git has it right. Checkouts should have their own state, history, etc., but feed off of a chaotic assortment of checksummed data wads that are optimized for whatever you want to optimize for. It gives much more flexibility.

You shouldn’t even really need to have all revisions stored locally, if you know of a place on the network where you can find old/unrelated revisions when you want them. If you ever ask to jump back 3 years, then sure, it’d take a while to pull down the needed data, but most of the time you’d save lots of disk space for stuff you’re never going to ask for anyway. (And if it bothers you, you can always pull it all down.)

Or maybe I’m wrong about how hg does things.

Whew

Ok, that was long. Thanks for making it this far.  Let me know what I got wrong or what I’m doing stupidly. Preferably with a description of your vastly better way of doing it!

I love using Mercurial’s MQ extension for managing patch queues, even though I have a strong suspicion that it’s fundamentally the wrong idea. I’m only going to discuss one part of that wrongness now, though: it forgets things. Lots of things.

Much of the point of using a revision control system is to not forget anything. I should be able to freely try various lines of development, and get back any of my earlier work. Normally, that would just mean being able to revert to earlier revisions of my source tree, although even there I should really be able to revert portions of changesets. But when using additional tools like mq that manage how I got to a particular source tree, I should be able to back up to any previous state with the tool’s assistance. Fundamentally, it’s not about moving back and forth through a history of artifact versions. It’s that I should never lose any work, even if I do something that in retrospect turns out to be dumb. Or especially when I do something dumb, I should say — that’s why I’m using a revision control system instead of a dumb backup system. It’s supposed to understand source code and what perverse things developers do when writing and modifying it.

Here’s a concrete example:

  • Developer edits code
  • hg qnew my-amazing-patch
  • Developer edits code some more
  • hg qrefresh
  • Developer says “oh f#@@#!!!!”

The problem is that when the developer refreshed the patch, Mercurial forgot the original patch. It also forgot the source tree that existed when the original patch was applied. So if those further edits turned out to be a Bad Idea, well, oops!

Yes, there is a way out: mq patch queues can themselves be revision controlled. Then as long as the developer remembers to hg commit --mq after every change to the patch queue, everything is golden.
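
Spelled out, that manual workflow looks roughly like the sketch below. The function name and the default commit message are invented for illustration, and the HG variable exists only so the function can be dry-run with echo:

```shell
# One-time setup, run by hand: make the patch directory its own repository.
#   hg init --mq

# Refresh the current patch, then immediately snapshot the patch queue.
qref_and_snapshot() {
    ${HG:-hg} qrefresh &&
    ${HG:-hg} commit --mq -m "${1:-qrefresh snapshot}"
}
```

The catch is that this only helps if the second command really does run after every single change to the queue.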

You could even argue that this is the Right Way to work. After all, you don’t expect — or even want — your revision control system to remember every character you type. You’d never be able to identify the right point in time to back up to amid the mass of older revisions. Leaving the decision to the developer as to when a state is important enough to remember just makes sense.

Except it doesn’t. The developer already decided that the state was important by running qnew or qrefresh. Why burden the poor sap with yet another decision? Especially when making that decision requires typing in another command, which means that the mental threshold for interestingness is higher, which means it’ll pretty much never happen.

See https://bitbucket.org/sfink/mqext/ for the obvious solution. That’s actually a grab bag of mq extensions, all of which should really be submitted upstream. But I haven’t bothered.

The part that’s relevant to what I’m talking about here is that I added -Q options to all of the patch queue-modifying functions I could think of. Specifying -Q will commit the change to the patch queue repository, with a commit message describing the basic change (or you can set the message with -M).

Or you can go a step further, as I did, and use the [defaults] section in your ~/.hgrc to set the -Q flag automatically for whichever commands you like. See the help message (or the README) for details on installation and usage. Update: and now it’s easier, because you can set qcommit = auto in your [mqext] section and it’ll add the -Q option to the relevant commands. Which is good, since there are more of them than you think.

If you install this, you may want to try out the ‘qshow’ command, too. It’s my favorite of the other things implemented in that extension (I alias it to just ‘show’ because my left pinky is slow.) I use it constantly to review the various patches in the queue. hg show <n> is the way I usually use it; it prints out patch #n in your queue (the numbers come from hg qseries -v, though you really ought to just put -v in your [defaults] section too. Or alias series=qseries -v as I did.)
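
Pulling the settings mentioned above together, the relevant ~/.hgrc pieces might look like this. It is a sketch only: the helper function is hypothetical, and the lines that actually load the extension are deliberately left out because they depend on the README's installation instructions:

```shell
# Append the settings discussed above to the hgrc file given as $1.
# (Hypothetical helper; loading mqext itself is covered by its README.)
add_mqext_config() {
    cat >> "$1" <<'EOF'
[mqext]
qcommit = auto

[alias]
series = qseries -v
show = qshow
EOF
}

# Example (not run here): add_mqext_config "$HOME/.hgrc"
```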

Feel free to use it, fork it, complain about it, or whatever. I’m still trying to figure out whether I really like it or not. It slows down qref operations, which kinda sucks. But I guess if I really cared I would turn off the default -Q for that one command, and just specify it manually. And I haven’t done that yet.

Oh right. One crucial thing I should mention: actually using any of this saved state is a dangerous affair. Why? Well, because you probably have a couple of patches in your queue applied at the time you decide to back up to an older state, and modifying applied patches is not very healthy. Especially if you reordered your series file. In fact, I would probably recommend doing these steps before (or just after, it doesn’t matter) reverting to an older revision of your patch queue:

  1. hg update -r qparent -C
  2. rm $(hg root --mq)/status

That will “unapply” all patches, forcefully. You can then qpush (or better, qgoto) the place you want in your queue. Note that shell $(…) is the modern version of backticks, in case you’re unfamiliar.

Finally, here’s a sampler of the sorts of log messages the extension produces:

    UPDATE: multipage-test
     js/jsd/jsd_xpc.cpp               |    1 +
     js/jsd/test/Makefile.in          |    3 +-
     js/jsd/test/browser_multipage.js |  428 +++++++++++++++++++++++++++++++++++++++
     js/src/jsapi.cpp                 |    4 +
     js/src/jscntxt.cpp               |    4 +-
     js/src/jscompartment.cpp         |    1 +
     js/src/jswrapper.cpp             |    9 +
     7 files changed, 447 insertions(+), 3 deletions(-)
    
    NEW: rename-multipage
    
    RENAME: bug615277-JM-execHook-3 -> bug615277-JM-execHook
    
    DELETE: bug-612717.diff
    
    UPDATE: better-note-dump

Or as the output of hg log --mq (which only shows the 1st line of each commit message):

    changeset:   92:e2ed45b4a8bf
    user:        Steve Fink 
    date:        Tue Dec 07 14:54:46 2010 -0800
    summary:     UPDATE: bug615277-JM-execHook
    
    changeset:   91:6e36813b7291
    user:        Steve Fink 
    date:        Tue Dec 07 14:51:45 2010 -0800
    summary:     NEW: rename-multipage
    
    changeset:   90:b66861e98c29
    user:        Steve Fink 
    date:        Tue Dec 07 14:38:15 2010 -0800
    summary:     RENAME: bug615277-JM-execHook-3 -> bug615277-JM-execHook
    
    changeset:   89:c02111e0d18d
    user:        Steve Fink 
    date:        Tue Dec 07 14:37:27 2010 -0800
    summary:     NEW: bug615277-JM-execHook-3