compressing strings in JS

As we keep increasing the amount of information we send in via Telemetry, we need to start thinking about how to keep the ping packets containing that information as small as possible. The packets are just JSON, so the first thing to try is compressing the data with gzip before sending it.

This is how you compress a string in a language like Python:

import zlib

# data holds the string to compress
compressed = zlib.compress(data)

(Yes, yes, this produces a zlib stream rather than a gzip one. Close enough for pedagogical purposes.)
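To make that parenthetical concrete: zlib and gzip wrap the same DEFLATE data in different headers and trailers. The sketch below assumes a Node.js environment and its built-in zlib module, neither of which appears in this post or is available to Firefox chrome code, and the ping object is a made-up stand-in. It is only meant to illustrate the difference between the two formats.

```javascript
const zlib = require("zlib");

// Hypothetical stand-in for a Telemetry ping; the field names are made up.
const ping = JSON.stringify({ reason: "example", values: new Array(200).fill(0) });

// zlib format: the container that Python's zlib.compress emits.
const zlibBytes = zlib.deflateSync(ping);
// gzip format: same DEFLATE algorithm, different header and trailer.
const gzipBytes = zlib.gzipSync(ping);

// The leading bytes identify the container format.
const zlibHeader = zlibBytes[0];                  // 0x78 with default settings
const gzipMagic = [gzipBytes[0], gzipBytes[1]];   // gzip magic number: 0x1f 0x8b

// Each format needs its matching decompressor to get the JSON back.
const fromZlib = zlib.inflateSync(zlibBytes).toString();
const fromGzip = zlib.gunzipSync(gzipBytes).toString();
```

Both round trips return the original JSON, and for repetitive data like this both outputs are much smaller than the input; gzip just carries a slightly larger wrapper.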

Short and simple. Boy, I hope it’s that easy in JS. Hm, let’s see, there’s this nsIStreamConverter interface, that looks promising:

let converter = Cc["@mozilla.org/streamconv;1?from=uncompressed&to=gzip"].createInstance(Ci.nsIStreamConverter);
let stream = Cc["@mozilla.org/stringinputstream;1"].createInstance(Ci.nsIStringInputStream);
stream.data = string;
// Hm, having to respecify input/output types is a bit weird.
let gzipStream = converter.convert(stream, "uncompressed", "gzip", null);

OK, we wound up with a stream, rather than a string, but that’s OK, because nsIXMLHttpRequest.send will happily accept a stream. So, nothing to worry about. (This is a little white lie; please hold your comments until the end.)

Hm, that doesn’t seem to work. I get NS_ERROR_NOT_IMPLEMENTED. Oh, look, nsDeflateConverter doesn’t implement nsIStreamConverter.convert. In fact, none of the stream converters in the tree seem to implement convert. What a bummer.

Hey, here’s nsIStreamConverterService! Maybe he can help. His convert method just punts to nsIStreamConverter.convert, so that won’t work, though. Ah, nsIStreamConverter has an asyncConvertData method, let’s try that:

function Accumulator() {
  this.buffer = "";
}
Accumulator.prototype = {
  buffer: null,
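  // nsIRequestObserver names these onStartRequest/onStopRequest.
  onStartRequest(request, context) {},
  onStopRequest(request, context, statusCode) {},
  onDataAvailable(request, context, inputStream, offset, count) {
    // Wrap the raw stream so its bytes can be read as a JS array.
    let stream = Cc["@mozilla.org/binaryinputstream;1"].createInstance(Ci.nsIBinaryInputStream);
    stream.setInputStream(inputStream);
    let input = stream.readByteArray(count);
    // apply() takes the `this` value first; the bytes go in the second argument.
    this.buffer += String.fromCharCode.apply(null, input);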
  }
};

let accumulator = new Accumulator();
let converter = Cc["@mozilla.org/streamconv;1?from=uncompressed&to=gzip"].createInstance(Ci.nsIStreamConverter);
// More respecifying input/output types.
converter.asyncConvertData("uncompressed", "gzip", accumulator, null);
// Oh, that method doesn't actually convert anything, it just prepares
// the instance for doing conversion.
let stream = Cc["@mozilla.org/stringinputstream;1"].createInstance(Ci.nsIStringInputStream);
stream.data = string;
converter.onStartRequest(null, null);
converter.onDataAvailable(null, null, stream, 0, string.length);
converter.onStopRequest(null, null, 201 /* 417 */);
compressed = accumulator.buffer;

Well, it’s not as simple as I hoped for, but I guess it works.
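The Accumulator builds its string by handing a byte array to String.fromCharCode. Here is a minimal sketch of that step in isolation, runnable in any JS engine. The names bytesToBinaryString and chunkSize are hypothetical and introduced only for illustration, and the chunking is an added precaution rather than something the code above does: passing a very large array through apply can exceed the engine's argument limit.

```javascript
// Hypothetical helper: turn an array of byte values (0-255) into a
// "binary string" with one character per byte.
function bytesToBinaryString(bytes, chunkSize = 8192) {
  let result = "";
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // Convert a bounded slice at a time so apply() never receives
    // more arguments than the engine can handle.
    const chunk = bytes.slice(i, i + chunkSize);
    // The first argument to apply() is the `this` value; the array of
    // character codes goes second.
    result += String.fromCharCode.apply(null, chunk);
  }
  return result;
}

// The first three bytes of a typical gzip stream, as a string.
const magic = bytesToBinaryString([0x1f, 0x8b, 0x08]);
```

The result is a string whose character codes are the byte values, which is convenient for accumulating but is not text in any meaningful encoding.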

FWIW, I do understand why the input/output types have to be respecified.  But I think this is about the best way to do things currently; that’s kind of frightening. The above is one of those instances where you start to understand why people complain about things being so crufty.

6 comments

  1. Well, once again this shows that as good as async methods are in terms of “snappiness”, as ugly and complicated they are in actual usage.
    I had to learn that the hard way with indexedDB and XHR as well. 🙁

  2. You’re using APIs designed for asynchronous streams coming off the network. Yes, they’re crufty for compressing a memory buffer.

    File a bug on a different API (one that you will only be able to use from off the main thread, probably)?

  3. It looks like Accumulator is your attempt to duplicate the effect of a stream loader.

    • Nathan Froyd

      Indeed it is, thanks for pointing that out. That removes a good bit of the cruft.

  4. Sooo, isn’t compression over the network normally handled by the webserver and mod_deflate or equivalent, usually using gzip? And automatically unpacked on the browser?

    Even on dynamic pages?

  5. ehm. n/m – that’s from server to browser, not browser to server…

    Man. That’s a shame… it should be both 🙁