09
Jun 22

Ephemeron Tables aka JavaScript WeakMaps and How They Work

Introduction

I read Ephemerons explained today after finding it on Hacker News, and it was good but lengthy. It was also described in terms of the Squeak language and included post-mortem finalization, which is unavailable in JavaScript (and frankly sounds terrifying from an implementation point of view!) I thought I’d try my hand at writing up a shorter and hopefully simpler explanation covering only what is available in JS.

Ephemerons—Effin’ Ron What?

Ephemeron tables are the underlying data structure for JavaScript WeakMaps. WeakMaps are very similar to plain Maps where if you have a Map and a key, you can look up a value. The differences are (1) you can only use objects (and soon symbols) as WeakMap keys, (2) the API is limited to prevent retrieving any entries without having the corresponding key in hand, and (3) WeakMaps are hooked into the garbage collector (GC) so that they don’t keep as much stuff alive.

If a regular Map is alive, then so are all of its keys. And values. And anything they might contain, recursively.

If a WeakMap is alive on the other hand, then it won’t keep any key or value alive unless something else is keeping a particular key alive. Then it will keep the corresponding value alive.

That’s it. Everything else falls out of that.

In terms of usage, WeakMaps are good for annotating an object with data that isn’t useful if the object is no longer needed. You could map an object to some expensive-to-compute cached information, for example. Or maybe you want to track whether an Error object has been logged, or associate a DOM object with some information about it. It’s like adding an invisible property to an object that you can only look at if you look it up in some invisibleProperty WeakMap.

WeakMaps and Garbage Collection

Let me expand on that last point a little. Say you have your invisibleProperty WeakMap filled up with a bunch of entries mapping various objects to their properties. If any of the objects dies, then the corresponding value is no longer kept alive simply by being in that WeakMap. (It may not actually die, because something else may refer to it.) Moreover, if you discard the invisibleProperty WeakMap itself (by setting invisibleProperty=null and not having anything else that refers to it), then none of those key/value entries will keep the corresponding value alive anymore.

Mathematically, a WeakMap entry is an edge from a WeakMap WM and a key K to some value V:

    BOTH(WM,K)→V

In order for the entry to keep V alive, both WM and K have to be alive. If either is dead, then the entry has no effect on V’s liveness, and in fact is unobservable. (You can’t look something up in a weakmap that you don’t have. And you can’t look up a key that you don’t have either. So whether the entry exists or not makes no difference to you.)

WeakMap GC Implementation

An important consequence of the above rule is that something might be alive for reasons that require looking through a chain of WeakMap entries. You might have a WeakMap entry value that is itself a WeakMap. Or the value is used as a key in the same or another WeakMap. This complicates the simple marking GC implementation, which is: start from a set of objects known to be live, and mark everything that they contain (or can directly reach in any way) as live, recursively.

With WeakMaps, when you mark some object as being live, you don’t know whether it might be a key in some random WeakMap out there. Or perhaps you do, because you’ve cleverly set a bit on everything used as a WeakMap key—but this does you little good, because you also need to know whether any of the WeakMaps the key is in are themselves alive, and you may not have figured that out yet.

The simple solution: mark everything normally, but collect a list of all of the WeakMaps you discover to be live. Then loop through all of their entries, check each key to see if it’s alive, and if so mark its corresponding value as live.

But one loop may not be enough—if you mark any values, then you may have discovered either a new WeakMap or an object that is used in a known or not-yet-known WeakMap. So you’ll need to keep repeating until no new values are marked. In terms of computational complexity, you’ll visit up to n objects each time through the loop, and you’ll loop up to n times, for a total of O(n²) operations. Perhaps you’re not familiar with that expression? It is written in the language of computer science, where it is pronounced “oh, crap!”.

Now, normally it would be really hard to hit the worst case here. But on the Web, anything—no matter how stupid—can somehow make somebody money. Or amuse them, or whatever. Therefore somebody will do it.

Linear-time Implementation

There’s a straightforward fix, though—don’t do the above. Instead, every time you discover a live WeakMap, add all of its entries to a big hashtable. Additionally, every time you mark an object, look it up in the hashtable and if you find it, mark the values of any entries you find. If n is the number of live objects, you’ll visit each one once and do a constant amount of work, for a running time of O(n) (pronounced “oops I forgot about the constants”).

You wouldn’t actually implement it quite that way, but you can use a little optimism to skip the slow part almost all of the time. And when you can’t—well, O(n²) means for small n, it will take a few extra milliseconds. For large n, it will take until Thursday. So at least you won’t be collecting garbage until Thursday.

Sorry, what?

Do your tracing in two phases:

  • Start from a set of roots. Mark everything reachable from the roots, recursively. Whenever you encounter an ephemeron table (and therefore know it is live, since it is reachable from a root), iterate over all of its entries. If an entry’s key is already known to be live, trace its value. Otherwise, add it to a table of pending entries, keyed by the key. After this phase is done, you will probably have visited the vast majority of the object graph. (Everything remaining is only reachable by going through one or more ephemeron table entries.)
  • At this point, you have a mostly-marked graph, and a table of ephemeron keys, some of which are now marked (but weren’t when you added them to the table). Scan through the table and find all of the now-marked keys, and trace their values. However, now whenever you visit any object (the value or anything reachable from the value), immediately look up the value in the table and trace through any entries you find. (If you encounter a not-yet-seen ephemeron table during this process, do the same thing as before.)

Every object in the graph will be visited at most twice, and every operation on an object is O(1)—constant time. So the overall scan is O(fast n + slow n) = O(n).