08
Mar 11

Work Configuration

Inspired by Nicholas Nethercote’s description of how he sets up his tracemonkey work environment, I thought I’d describe my work configuration and how it differs from njn’s.

Like Nick, I work almost entirely off of the tracemonkey tree these days, and mostly within js/src. I don’t use the js shell all that much compared to the full browser, though, so I tend to do things with the whole tree.

working repositories

Similar to Nick, I have a ~/src/ directory populated with clones of the tracemonkey repo. I have one, “TM-upstream/”, that follows the upstream tracemonkey repository. In fact, I use cron to pull updates hourly. The rest are created as clones of TM-upstream, or sometimes of each other. I vary in how I create these. Some are created via ‘hg clone TM-upstream TM-whatever’, although for whatever reason I usually do ‘cp -rlp TM-upstream TM-whatever’ and then edit TM-whatever/.hg/hgrc to change the ‘default’ path to TM-upstream. The ‘cp’ method is faster, but the end result is pretty much the same. Sometimes I copy the mq subdirectory (.hg/patches) from the repo I’m cloning, sometimes I create a new one from scratch. And sometimes I don’t use one at all.

Oh, and with emacs I had to do

  (setq vc-make-backup-files t)

to make it break hardlinks when modifying files. Breaking hardlinks is normally the default, but it seems like vc mode has a different default that is really really bad if you’re using ‘cp -rlp’ to clone your repos.

All of my (tracemonkey-based) repos start with “TM-“, probably because I use my src/ subdirectory for checkouts of various other projects (bugzilla-tweaks, archer-mozilla, archer, firebug, addon-sdk, etc.). Not all of those are hg-based; I have several git repos and even an svn checkout or two. For the Mozilla tree, I tend to only actively use one or two repos at a time; the rest are for dormant unfinished work.

I made a shell function ‘pullup’ that does ‘(cd $(hg path default) && hg pull)’, which goes to the default upstream repo (probably TM-upstream, unless this is a clone of a clone) and updates its objects. (Note the lack of a -u; I don’t want to update the working directory for the upstream repo without a good reason.) To update my working repo, I’ll ‘hg qpush -a’ to apply as many patches as I can, then probably ‘hg qpop’ to pop off the last one because it failed. (I tend to have a small pile of heavily bitrotted patches lurking around at the end of my series file.) Then I’ll do ‘pullup’ to update the upstream repo and ‘hg pull –rebase’ to merge the changes into my patch queue. My ~/.hgrc sets my merge tool to kdiff3, so any conflicts will pop up the visual merge editor.

I push changes directly from my working repo by using

  hg qpop
  hg show | head
  hg qref -e # if needed

to fix up the commit messages, then qpush everything back on that I’m committing. (I tend to break up my commits into at least 2 pieces, so I usually push more than one change at a time.) Then I do ‘hg qfinish -a’, do my last round of testing, and ‘hg push tracemonkey’ (tracemonkey is set in the [paths] section of my ~/.hgrc).

I don’t bother to run ‘hg outgoing’, because I only commit patches that I’m about to push. I suppose if I were collaborating with someone else, I might get some extra crud that I’d need to worry about, but so far I’ve always done that through patches imported into my patch queue.

object directories

I place my object directories underneath the source directory, so that I can use hg commands while my working directory is underneath the object directory. I mostly use plain ‘~/src/TM-whatever/obj’, which is almost always a debug build. If I need an opt build, it’ll be ‘obj-opt’ in place of ‘obj’. Rarely, I’ll make ‘obj-somethingelse’ for special purposes.

Prefixing things with ‘obj’ helps when moving stuff between machines, because I can do

  rsync -av --exclude='/obj*' TM-whatever desthost:/some/where

building

When underneath obj/js/src, I’ll just run ‘make’ or ‘make -j16’ or whatever to rebuild (even when testing with the browser, because my mozconfig always has ‘ac_add_options –enable-shared-js’ so rebuilding here is enough. In fact, I tend to forget to remove it when making opt builds for performance testing.)

I also tend to modify things in js/jsd and js/src/xpconnect/src, so I have a special makefile that does a minimal rebuild for those:

ROOT := $(shell hg root)

all:
 $(MAKE) -C $(ROOT)/obj/js/src
 $(MAKE) -C $(ROOT)/obj/js/jsd
 $(MAKE) -C $(ROOT)/obj/js/src/xpconnect/src
 $(MAKE) -C $(ROOT)/obj/layout/build
 $(MAKE) -C $(ROOT)/obj/toolkit/library

I have that saved as ~/mf, and I have a shell alias ‘mk’ that does ‘make -f ~/mf’. So I’ll make my changes, then run ‘mk -k -j12’ or whatever. (I don’t know why I bother to give numbers to my -j options, since I use distcc’s hosts syntax for limiting concurrent jobs anyway.)

Even lazier, I have my emacs set up to pick the right make command depending on what directory I’m in (please excuse my weak elisp-fu):

; Customizations based on the current buffer's path

(defun get-hg-dir (path)
 (if (equal path "/")
 nil
 (if (file-exists-p (expand-file-name ".hg" path))
 (expand-file-name ".hg" path)
 (get-hg-dir (directory-file-name (file-name-directory path))))))

; For Mozilla source:
;  - if within an hg-controlled directory, set the compile-command to
;      make -f ~/mf...
;    which will do a fairly minimal rebuild of the whole tree
;  - unless we're also underneath js/src, in which case, just do a make
;    within the JS area
(defun custom-compile-hook ()
 (let ((path (buffer-file-name))
 (dir (directory-file-name (file-name-directory (buffer-file-name)))))
 (if (not (null (get-hg-dir path)))
 (if (string-match "js/src" dir)
 (set (make-local-variable 'compile-command)
 (concat "make -C " (expand-file-name (concat dir "/../../obj/js/src")) " -k"))
 (set (make-local-variable 'compile-command)
 (concat "make -f ~/mf -k -j12"))))))

(add-hook 'find-file-hook 'custom-compile-hook)

I have my F12 key bound to ‘compile, so I just hit F12, check that the command is right, then press enter to build. One problem I have is that our build output is much too verbose, so I don’t notice warnings very well. I keep meaning to shut it up (probably by only printing the file being compiled unless there are errors/warnings), but I haven’t gotten around to it.

compiling: distcc and ccache

I rely heavily on distcc for my builds. I do almost all of my Mozilla work on a single laptop machine, though occasionally I’ll reboot it into Windows to suffer through something there, or use one of my two desktops (one home, one work). My work desktop is quite beefy. My home desktop is less so, but still good enough to speed up builds dramatically. I run a cron job on my laptop to autodetect where I am and switch my ~/.distcc/hosts symlink to the appropriate hosts file, which contains “localhost finkdesk/12” at work and “localhost 192.168.1.99/7” at home. The /12 and /7 are the max number of concurrent jobs distcc will trigger; I set it lower on my home machine to keep from bogging it down with contending jobs, though honestly I haven’t benchmarked to see what the right numbers are.

About half the time, I’ll have distccmon-gnome running to monitor where the jobs are going to. It’s a quick way to spot when I’m sending things to the wrong place (eg when I’m VPNed into the work network and finkdesk is reachable; if I accidentally send things there, distcc will slow everything down because the network time way outweighs the compilation speedups.) Or, more often, that something’s messed up and all builds are going to localhost. Or that I’m only getting a single job at a time because I forgot to use -j again.

I also use ccache at all times, but I don’t do anything nonstandard with it. Just be sure to set CCACHE_PREFIX=distcc and allow it to get big with ‘ccache -M’.

linking: gold

When I’m working outside of js/src proper, I also like to use the gold linker in place of the default binutils bfd linker. I’m on Fedora 14, so to switch to gold I do

  cd /etc/alternatives
  rm ld
  ln -s /usr/bin/ld.gold ld

(and to switch back, link to ld.bfd). gold takes my minimal links from 30 seconds to about 10 seconds, which is really nice. Unfortunately, I frequently have to switch back to ld.bfd due to incompatibilities. elfhack and valgrind are the usual offenders. Update: According to jseward, valgrind >= 3.6.0 should work fine. Yay! (I currently have 3.5.0).

patch queue

While they’re in my mq, all of my patches are labeled with the bug number and a brief description. When I’m reshuffling changes between my various patches, I create temporary patches whose names are prefixed with “M-” (for Merge) to indicate that I’m planning on qfolding them into some other existing patch. I also use “T-” for temporary patches (debugging printouts or whatnot). It helps to see the state of everything with a glance at my ‘hg qseries -v’ output (which, due to aliases and defaults, I actually spell ‘hg series’).

Very recently, I’ve started using ‘hg qcrecord’ to split up and reorganize patches, and I’m loving it. It’s the same basic story, though — I use it to create temporary to-be-merged patches that I qfold later. I tend to do

  hg qref -X '*'
  hg qcrecord

quite a bit to move stuff out of the current patch (well, the current patch + the current changes on top of it).

disk space

Finally, I also try to occasionally go through all my TM-* directories and run ‘hg relink’ to rediscover what can be hardlinked. It takes a while, so I really ought to cron it. It tends to recover surprisingly large amounts of disk space.

Complete and total tangent:

My underinformed, overopininated take on this is that hg’s disk structures are wrong. As I understand it, the wasted space comes from: (1) you clone a repo, which creates a bunch of hardlinks, using very little space; (2) you periodically update the base repo, breaking many of the hardlinks; then (3) you update the derived repo with those changes. hg doesn’t figure out that it can re-link the object files — which is understandable, since it would need to know for a given file that not only are the latest versions identical, but also that the complete set of revisions between the two repos is identical.

It doesn’t seem that hard for it to figure this out. But even if it did, any local change in the derived repository is going to prevent sharing anyway. That’s what bugs me. Conceptually, hg’s object store is a big pile of byte strings, one for every revision of every file, and each tagged with (and looked up by) its checksum. There’s an optimization that all the revs of a single file can be stored compactly as a set of deltas rather than storing a full (compressed) copy of every rev, but that really ought to be an optimization, not a fundamental data structure. If you ditched the optimization entirely and kept a full copy of every rev, you could trivially share a repo across all of your checkouts. (You could even share a repo with completely unrelated projects, though that’d be more likely to hurt than help.) I would find this much nicer.

Actually, it’s not just that all the versions of a file need to be stored within one filesystem file. hg seems to want the set of versions within a filesystem file to mean something. I would rather have that information (the set of known revisions) stored within a checkout, so that extra revs would be harmless. Then you don’t need to lose the optimization; you can still stuff all revisions into one file, even revisions from completely unrelated branches. You’d even have flexibility to use multiple filesystem files for a single source file, if it has a bunch of revisions that you want rapid access to. (So file1 contains revA + a few deltas, file2 has revB only, file3 has revC + a few deltas, etc. Think images.)

I think I’m probably describing git’s data structures here. If so, it seems like git has it right. Checkouts should have their own state, history, etc., but feed off of a chaotic assortment of checksummed data wads that are optimized for whatever you want to optimize for. It gives much more flexibility.

You shouldn’t even really need to have all revisions stored locally, if you know of a place on the network where you can find old/unrelated revisions when you want them. If you ever ask to jump back 3 years, then sure, it’d take a while to pull down the needed data, but most of the time you’d save lots of disk space for stuff you’re never going to ask for anyway. (And if it bothers you, you can always pull it all down.)

Or maybe I’m wrong about how hg does things.

Whew

Ok, that was long. Thanks for making it this far.  Let me know what I got wrong or what I’m doing stupidly. Preferably with a description of your vastly better way of doing it!


02
Mar 11

Updating UUIDs (the wrong way)

When you change an IDL interface, you have to update its uuid. Simple enough, just grab a new uuid and stick it in the .idl file. Easy enough, right?

I’ve been working on JSD (the JavaScript debugger interface) recently, and its IDL file contains 18 different interfaces. So say I add a method to jsdIScript. What interfaces do I need to update? From empirical observations (as in, when I forget to do one I get yelled at by a reviewer), you have to update the uuid on the interface containing any added methods, any interfaces that use that interface within their definitions, and any interface that… inherits? extends? that interface. Recursively.

Update: No you don’t, says #developers, though currently the conversation is not finished and it seems like it may come down to something of a judgement call (something like “if you modify the vtable, update the uuid. If you change an interface used in a method in such a way that it might break a user, update the uuid.”) I’ve updated my script to use the more common rule (inheritance only), though if you want the stricter behavior you can get it with –mode=params.

Leaving the rest of this post here for now:


Solution 1: Manually go through and trace the dependencies, updating uuids as you go.

Bleckth. Maybe someone else can manage to do that properly, but I’m an airhead, and I’d never get that exactly right.

Solution 2: Nuke ’em all and let God sort ’em out. (As in, update every single uuid in the file.)

Easy, but inelegant and I sometimes wonder if knowing whether an interface changed might actually matter to someone. Besides, this could result in me getting yelled at by a reviewer, and that’s what I depend on for figuring out the rules (see “empirical observations”, above.)

Solution 3: automate solution 1.

…so I did.

Give it a .idl file to chew on and one or more interfaces that you know you’ve changed, and it’ll chase through the dependencies for you. Assuming I got the rules right. It spits out a new file with brand-new uuids on all affected interfaces, and even spews to stderr the set of interfaces it’s updating and why:

% update-uuids jsdIDebuggerService.idl jsdIContext >/dev/null
uuids to update:
 jsdIContext because it was given on command line
   3e5c934d-6863-4d81-96f5-76a3b962fc2b -> 24ad10b2-8b4f-49f6-9236-f0ecaed0e19a
 jsdIStackFrame because its body contains jsdIContext
   7c95422c-7579-4a6f-8ef7-e5b391552ee5 -> 4c8c5902-77e8-4a9d-99f8-0ae6f0c58eec
 jsdIContextEnumerator because its body contains jsdIContext
   57d18286-550c-4ca9-ac33-56f12ebba91e -> d102ff63-59ea-4ed7-86d5-490c9e9b6b5a
 jsdICallHook because its body contains jsdIStackFrame
   3eff1314-7ae3-4cf8-833b-c33c24a55633 -> 150610f5-89cd-4f76-b960-06447471eb00
 jsdIExecutionHook because its body contains jsdIStackFrame
   3a722496-9d78-4f0a-a797-293d9e8cb8d2 -> cd3bfe98-c8a3-4c98-91d1-c7e2e79c396c
 jsdIDebuggerService because its body contains jsdIContextEnumerator
   aa232c7f-855f-4488-a92c-6f89adc668cc -> 75ab47da-2400-4efe-bb5e-745dceba4e06

(Sorry, no examples of inheritance-triggered changes there.)

It’s a quickie parse-with-regexes Perl hack.

One major flaw — it only considers a single .idl file. If some other .idl file depends on an interface modified within your file, then this won’t tell you it needs to be updated. I’m pretty sure that no other IDLs depend on the JSD IDL, so I don’t care yet. If this would be useful to you and that’s a necessary feature, let me know and I’ll throw it in. It’s easy enough to implement as long as you provide the list of IDL files to consider.

The other major flaw is that this doesn’t update the uuids in header files, once again because JSD didn’t need it. That would be some more work, and I don’t even know if I have the rules right so I’m not going to bother unless someone asks me to and tells me this is the right thing in the first place.

If nobody comments here within a week or so telling me I’m completely wrong about how this stuff works, I’ll add a link to this script on MDC. Maybe. If I remember. (And it turns out, someone did tell me I’m completely wrong. Yay!)