Mar 14

scratchpad made me happy

I love the Firefox devtools command line. I cut & paste all kinds of crazy code into it. On the other hand, I never did really quite “get” the Scratchpad, though to be honest I also haven’t tried using it much. I’ve been happy just to edit code in an Emacs buffer on the side and cut & paste.

The Problem

But last night, I ran into a problem that Scratchpad turned out to be perfect for. And now I loveses it. My precioussssss….

I am the proud survivor — er, I mean “father” — of two kids, one of whom goes to a school with lots of required “volunteer” time. Since it is required, you have to record your hours via a web-based tool. It’s a fairly primitive interface on some ancient backend ASP monstrosity. It’s tolerable to use to record one entry at a time — you just need to enter a date, a start hour, a start minute, an end hour, an end minute, three options selected from dropdown lists of several dozen items (enough to require scrolling), etc.

Ok, it’s pretty awful even for entering one record.

But entering 80 of the things, most of them differing only by the date, is intolerable. Especially since the d#@n form resets itself completely every time you submit it. And so automation rage kicked in.

Existing Solution

Last year, I did it by capturing the POST request and writing a script to resubmit with different values. It worked, kinda, though I could only get a couple of fields working and it kept timing out my authentication cookie. Or something. I just remember it being a major pain. Even capturing the full request was a little difficult since it’s HTTPS only and I seem to remember some limitation in the Firefox devtools of the time when trying to see the POST body data. (Again, “or something”.)

The Latest Hotness

This year, I was overjoyed to see the option to edit and resubmit a query. I’ve wanted that so many times. And yet… the data was still x-www-form-urlencoded, which means I had to cut & paste from the devtools pane, which is already a challenge due to Linux/xorg/xfce/emacs’s mishandling of cut buffers or clipboards or whatever the heck they are. And then find the field I cared about and update it, and then discover that it overwrote my previous entry because there was some embedded token in one of the other fields that referred to the entry it was creating. Ugh. (Dim memories resurfaced at about this point from when I needed to get around this last year. I still don’t remember the details.)

So then I thought, “hey, I’ll just update the page in place and then click submit! I’ll do it at the HTML level instead of the HTTP level!” So I wrote up some JS code to find and set the various form fields, and clicked submit. Success!

Only it’s still a painful flow. I have to edit the relevant field in emacs (or eventually, I’d probably generate the JS scripts with a shell or Perl or Python script), cut & paste into the little tiny console line (it’s a console prompt, it’s supposed to be small, I have no issue with that.) (Though maybe pasting into the console itself should send it to the prompt? I dunno.), press enter, then click on submit. Not too bad, and definitely well within the “tolerable” zone.

But it’d be easier if I could just define a JS function that finds and fills the fields, and pass in the one value I need to change. Then I can just enter that at the console. Only where can I stash the function? If I put it on the page, I assume it’ll get nuked whenever I submit the page. Hey, I wonder if that Scratchpad thing might help here…

Enter the Scratchpad. Oh yeah.

So I pasted my little script into the Scratchpad. It defines a function to fill out the fields. Final flow: edit one line of JS to change a date, press Ctrl-R to run it. The fields magically update, I click submit.

Obviously, I could’ve done the submit from the script while I was at it, but I like to set things up and fire them off in separate steps. Call it paranoia. I do the same thing with shell scripts — I’ll write a script to echo out a series of commands to perform, run it once to verify that it’s what I want, then run it again piped through bash. I’m just too clumsy to get it right the first time.

But anyway, Scratchpad was the awesome for this task. I’ll be considering it whenever thinking about how to do other things now.

Doers of Good

Thank you robcee and #devtools team. You made my life gooder. More goodish. I am now living goodlier.

My Example

The script I used, if you’re curious:

function enter(date) {
    inputs = document.forms[0].getElementsByTagName("input");
    selects = document.forms[0].getElementsByTagName("select");
    texts = document.forms[0].getElementsByTagName("textarea");
    inputs["VolunteerDate"].value = date;
    inputs["FromHour"].value = 8;
    inputs["FromMin"].value = 45;
    inputs["ToHour"].value = 9;
    inputs["ToMin"].value = 45;
    inputs["nohour"].value = 1;
    selects["MinCombo"].value = 0;
    selects["classcombo"].value = 40;
    selects["activitycombo"].value = 88;
    selects["VolunteerCombo"].value = "Steve Fink";
    texts[0].innerHTML = "Unit study center";

enter("2/27/2014") # I edit this line and bounce on the Ctrl-R key, then click submit

Feb 13

Browser Wars, the game

A monoculture is usually better in the short term. It’s a better allocation of resources (everyone working on the same thing!) If you want to write a rich web app that works today (ie, on the browsers of today), it’s much better.

But the web is a platform. Platforms are different beasts.

Imagine it’s an all-WebKit mobile web. Just follow the incentives to figure out what will happen.

Backwards bug compatibility: There’s a bug — background SVG images with a prime-numbered width disable transparency. A year later, 7328 web sites have popped up that inadvertently depend on the bug. Somebody fixes it. The websites break with dev builds. The fix is backed out, and a warning is logged instead. Nothing breaks, the world’s webkit, nobody cares. The bug is now part of the Web Platform.

Preventing innovation: a gang of hackers makes a new browser that utilizes the 100 cores in 2018-era laptops perfectly evenly, unlike existing browsers that mostly burn one CPU per tab. It’s a ground-up rewrite, and they do heroic work to support 99% of the websites out there. Make that 98%; webkit just shipped a new feature and everybody immediately started using it in production websites (why not?). Whoops, down to 90%; there was a webkit bug that was too gross to work around and would break the threading model. Wtf? 80%? What just happened? Ship it, quick, before it drops more!

The group of hackers gives up and starts a job board/social network site for pet birds, specializing in security exploit developers. They call it “Polly Want a Cracker?”

Inappropriate control: Someone comes up with a synchronization API that allows writing DJ apps that mix multiple remote streams. Apple’s music studio partners freak out, prevent it from landing, and send bogus threatening letters to anyone who adds it into their fork.

Complexity: the standards bodies wither and die from lack of purpose. New features are fine as long as they add a useful new capability. A thousand flowers bloom, some of them right on top of each other. Different web sites use different ones. Some of them are hard to maintain, so only survive if they are depended upon by a company with deep enough pockets. Web developers start playing a guessing game of which feature can be depended upon in the future based on the market cap of the current users.

Confusion: There’s a little quirk in how you have to write your CSS selectors. It’s documented in a ton of tutorials, though, and it’s in the regression test suite. Oh, and if you use it together with the ‘~’ operator, the first clause only applies to elements with classes assigned. You could look it up in the spec, but it hasn’t been updated for a few years because everybody just tries things out to see what works anyway, and the guys who update the spec are working on CSS5 right now. Anyway, documentation is for people who can’t watch tutorials on youtube.

End game: the web is now far more capable than it was way back in 2013. It perfectly supports the features of the Apple hardware released just yesterday! (Better upgrade those ancient ‘pads from last year, though.) There are a dozen ways to do anything you can think of. Some of them even work. On some webkit-based browsers. For now. It’s a little hard to tell what, because even if something doesn’t behave like you expect, the spec doesn’t really go into that much detail and the implementation isn’t guaranteed to match it anyway. You know, the native APIs are fairly well documented and forward-compatible, and it’s not really that hard to rewrite your app a few times, once for each native platform…

Does this have to happen just because everybody standardizes on WebKit? No, no more than it has to happen because we all use silicon or TCP. If something is stable, a monoculture is fine. Well, for the most part — even TCP is showing some cracks. The above concerns only apply to a layer that has multiple viable alternatives, is rapidly advancing, needs to cover unexpected new ground and get used for unpredicted applications, requires multiple disconnected agents to coordinate, and things like that.

Apr 12

bzexport changes released

bzexport –new and hg newbug have landed

My bzexport changes adding a --new flag and an hg newbug command have landed. Ok, they landed months ago. See my previous blog post for details; all of the commands and options described there are still valid in the current version. But please pull from the official repo instead of my testing repo given in the earlier blog post.

Installing bzexport

mkdir -p ~/hg-extensions
cd ~/hg-extensions
hg clone http://hg.mozilla.org/users/tmielczarek_mozilla.com/bzexport

in the [extensions] section of your ~/.hgrc, add:
bzexport = ~/hg-extensions/bzexport/bzexport.py

Note to Windows users: unfortunately, I think the python packaged with MozillaBuild is missing the json.py package that bzexport needs. I think it still works if you use a system Python with json.py installed, but I’m not sure.

Trying it out

For the (understandably) nervous users out there, I’d like you to give it a try and I’ve made it safe to do so. Here are the levels of paranoia available: Continue reading →

Feb 12

Only pay for the entropy you use

Log Files Are Boring

Just an idea, based on hearing that build log transfers seem to consume large amounts of bandwidth. (Note that for all I know, this is already being done.)

Logs are pretty dull. In particular, two consecutive log files are usually quite similar. It’d be nice if we could take advantage of this redundancy to reduce the bandwidth/time consumed by log transfers.

rsync likes boring data

The natural thing that springs to mind is rsync. I grabbed two log files that are probably more similar to each than is really fair, but they shouldn’t be horribly unrepresentative. rsyncing one to the other found them to share 32% of their data, based on the |rsync –stat| output lines labeled “Matched data” and “Literal data”, for a speedup of 1.46x.

I suspected that rsync’s default block size is too large, and so most of the commonalities are not found. So I tried setting the block size ridiculously low, to 8 bytes, and it found them to be 98% similar. Which is silly, because it has to retrieve more block hashes at that block size than it saves. The total “speedup” is reported as 0.72x.

But the sweet spot in the middle, with a block size of 192, gives 84% similarity for a speedup of 4.73x.

compression likes boring data too

Take a step back: this only applies to uncompressed files. Simply gzipping the log file before transmitting it gives us a speedup of 14.5x. Oops!

Well, rsync can compress the stuff it sends around too. Adding a -z flag with block size 192 gives a speedup of 16.2x. Hey, we beat basic gzip!

But compression needs decent chunks to work with, so the sweet spot may be different. I tried various block sizes, and managed a speedup of 24.3x with -B 960. An additional 1.7x speedup over simple compression is pretty decent!

To summarize our story so far, let’s say you want to copy over a log file named log123.txt. The proposal is:

  1. Have a vaguely recent benchmark log file, call it log_compare.txt, available on all senders and receivers. (Actually, it’d probably be a different one per build configuration, but whatever.)
  2. On the server, hard link log123.txt to log_compare.txt.
  3. From the client, rsync -z -B 960 log123.txt server:log123.txt

stop repeating what I say!

But it still feels like there ought to be something better. The benchmark log file is re-hashed every time you do this and the hashes are sent back over the wire, costing bandwidth. So let’s eliminate that part. Note that we’ll drop the -z from flag because we may as well compress the data during the transfer instead:

 ssh server 'ln log_compare.txt log123.txt'
 rsync -B 960 log123.txt log_compare.txt --only-write-batch=batch.dat
 ssh -C server 'rsync --read-batch=- argleblargle log132.txt' < batch.dat

Note that “argleblargle” is ignored, since the source file isn’t needed.

So what’s the speedup now? Let’s only consider the bytes transmitted over the network. Assuming the compression from ssh -C has the same effect as gzipping the file locally, I get a speedup of 28.9x, about 2x the speedup of simply compressing the log file in the first place.

But wait. The block size of 960 was based on the cost of retrieving all those hashes from the remote side. We’re not doing that anymore, so a smaller block size should again be more effective. Let’s see… -B 192 gets a total speedup of 139x, which is almost exactly one order of magnitude faster than plain gzipped log files. Now we’re talking!

loose ends

Two things still bug me. One is a minor detail — the above is writing out batch.dat, then reading it back in to send over to the server. This uselessly consumes disk bandwidth. It would be better if rsync could directly read/write compressed batch files to stdin/stdout. (It can read uncompressed batches from stdin, but not write to stdout. You could probably hack it somehow, perhaps with /proc/pidN/fd/…, but it’s not a big deal. And you can just use use /dev/shm/batch.dat for your temporary filename, and remove it right after. It’d still be better if it never had to exist uncompressed anywhere, but whatever.)

The other is that we’re still checksumming that benchmark file locally for every log file we transfer. It doesn’t change the number of bytes spewed over the network, but it slows down the overall procedure. I wonder if librsync would allow avoiding that somehow…? (I think rsync uses two checksums, a fast rolling checksum and a slower precise one, so you’d need to compute both for all offsets. And reading those in would probably cost more than recomputing from the original file. But I haven’t thought too hard about this part.)

not just emacs and debuggers

I sent this writeup to Jim Blandy, who in a typically insightful fashion noticed that (1) this requires some fiddly bookkeeping to ensure that you have a comparison file, and (2) revision control systems already handle all of this. If you have one version of a file checked in and then you check in a modified version of it, the VCS can compute a delta to save storage costs. Then when you transmit the new revision to a remote repository, the VCS will know if the remote already has the baseline revision so it can just send the delta.

Or in other words, you could accomplish all of this by simply checking your log files into a suitable VCS and pushing them to the server. That’s not to say that you’re guaranteed that your VCS will be able to fully optimize this case, just that it’s possible for it to do the “right” thing.

I attempted to try this out with git, but I don’t know enough about how git does things. I checked in my baseline log file, then updated it with the new log file’s contents, then ran git repack to make a pack file containing both. I was hoping to use the increase in size from the original object file to the pack file as an estimate of the incremental cost of the new log file, but the pack file was *smaller* than either original object file. If I make a pack with just the baseline, then I end up with two pack files, but the new one is still smaller.

clients could play too

As a final thought, this idea is not fundamentally restricted to the server. You could do the same thing inside eg tbpl: keep the baseline log(s) in localStorage or IndexedDB. When requesting a log, add a parameter ?I_have_baseline_36fe137a1192. Then, at the server’s discretion, it could compute a delta from that baseline and send it over as a series of “insert this literal data, then copy bytes 3871..17313 from your baseline, then…”. tbpl would reconstruct the resulting log file, the unicorns would do their lewd tap dance, and everyone would profit.