Ephemeron Tables aka JavaScript WeakMaps and How They Work

Introduction

I read Ephemerons explained today after finding it on Hacker News, and it was good but lengthy. It was also described in terms of the Squeak language and included post-mortem finalization, which is unavailable in JavaScript (and frankly sounds terrifying from an implementation point of view!) I thought I’d try my hand at writing up a shorter and hopefully simpler explanation covering only what is available in JS.

Ephemerons—Effin’ Ron What?

Ephemeron tables are the underlying data structure for JavaScript WeakMaps. WeakMaps are very similar to plain Maps where if you have a Map and a key, you can look up a value. The differences are (1) you can only use objects (and soon symbols) as WeakMap keys, (2) the API is limited to prevent retrieving any entries without having the corresponding key in hand, and (3) WeakMaps are hooked into the garbage collector (GC) so that they don’t keep as much stuff alive.

If a regular Map is alive, then so are all of its keys. And values. And anything they might contain, recursively.

If a WeakMap is alive on the other hand, then it won’t keep any key or value alive unless something else is keeping a particular key alive. Then it will keep the corresponding value alive.

That’s it. Everything else falls out of that.

In terms of usage, WeakMaps are good for annotating an object with data that isn’t useful if the object is no longer needed. You could map an object to some expensive-to-compute cached information, for example. Or maybe you want to track whether an Error object has been logged, or associate a DOM object with some information about it. It’s like adding an invisible property to an object that you can only look at if you look it up in some invisibleProperty WeakMap.

WeakMaps and Garbage Collection

Let me expand on that last point a little. Say you have your invisibleProperty WeakMap filled up with a bunch of entries mapping various objects to their properties. If any of the objects dies, then the corresponding value is no longer kept alive simply by being in that WeakMap. (It may not actually die, because something else may refer to it.) Moreover, if you discard the invisibleProperty WeakMap itself (by setting invisibleProperty=null and not having anything else that refers to it), then none of those objects will keep the corresponding value alive anymore, or at least not just because they’re in the now-dead WeakMap.

Mathematically, a WeakMap entry is an edge from a WeakMap WM and a key K to some value V:

    AND(WM,K)→V

In order for the entry to keep V alive, both WM and K have to be alive. If either is dead, then the entry has no effect.

WeakMap GC Implementation

An important consequence of the above rule is that something might be alive for reasons that require looking through a chain of WeakMap entries. You might have a WeakMap entry value that is itself a WeakMap. Or the value is used as a key in the same or another WeakMap. This complicates the simple marking GC implementation, which is: start from a set of objects known to be live, and mark everything that they contain (or can directly reach in any way) as live, recursively.

With WeakMaps, when you mark some object as being live, you don’t know whether it might be a key in some random WeakMap out there. Or perhaps you do, because you’ve cleverly set a bit on everything used as a WeakMap key—but this does you no good, because you also need to know whether any of the WeakMaps the key is in are themselves alive, and you may not have figured that out yet.

The simple solution: mark everything normally, but collect a list of all of the WeakMaps you discover to be live. Then loop through all of their entries, check each key to see if it’s alive, and if so mark its corresponding value as live.

But one loop may not be enough—if you mark any values, then you may have discovered either a new WeakMap or an object that is used in a known or not-yet-known WeakMap. So you’ll need to keep repeating until no new values are marked. In terms of computational complexity, you’ll visit up to n objects each time through the loop, and you’ll loop up to n times, for a total of O(n²) operations. Perhaps you’re not familiar with that expression? It is written in the language of computer science, where it is pronounced “oh, crap!”.

Now, normally it would be really hard to hit the worst case here. But on the Web, anything—no matter how stupid—can somehow make somebody money. Or amuse them, or whatever. Therefore somebody will do it.

Linear-time Implementation

There’s a straightforward fix, though—don’t do the above. Instead, every time you discover a live WeakMap, add all of its entries to a big hashtable. Additionally, every time you mark an object, look it up in the hashtable and mark the values of any entries you find. If n is the number of live objects, you’ll visit each one once and do a constant amount of work, for a running time of O(n) (pronounced “oops I forgot about the constants”).

You wouldn’t actually implement it quite that way, but you can use a little optimism to skip the slow part almost all of the time. And when you can’t—well, O(n²) means for small n, it will take a few extra milliseconds. For large n, it will take until Thursday. So at least you won’t be collecting garbage until Thursday.

Tags: , ,

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.