mozilla | sfink @ Mozilla

Mozilla — No Comments
03
Jul 25

Effectful Logging

These recent blog posts are veering in the “here’s a horrible thing I just did!” direction. No apologies.

Recently, I was working on a weird problem where I wanted to snapshot /proc/$pid/maps before and after a couple of mmap and madvise calls. But I didn’t particularly want to write C++ code to do it. So:

JS_LOG_FMT(debug, Info, "About to mmap at {:x}", ptr);
// mkdir -p: the parent /tmp/js-<pid> directory does not exist yet.
JS_LOG_FMT(debug, Info, "SKYNET: mkdir -p /tmp/js-{0}/before /tmp/js-{0}/after", getpid());
JS_LOG_FMT(debug, Info, "SKYNET: cp /proc/{0}/maps /tmp/js-{0}/before", getpid());
sleep(3);
mmap(...);
JS_LOG_FMT(debug, Info, "SKYNET: cp /proc/{0}/maps /tmp/js-{0}/after", getpid());
sleep(3);

That produces “SKYNET: …” log messages, with a pause. If only someone were reading those log messages and quickly cutting & pasting the commands… let’s give them 3 seconds to do each.

Then I run in my terminal:
MOZ_LOG=debug:5 $JS blahblah.js |& perl -lpe 'system($1) if /SKYNET: (.*)/'

Whenever one of these messages is produced, the Perl script grabs it and runs it in a new shell. Victory!
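If Perl isn't your thing, the same filter can be sketched in Python. This is only an illustration of the idea, not what I actually ran: the names `extract_command` and `run_filter` are made up for the example, and the marker is assumed to be exactly the "SKYNET: " prefix used in the log messages.

```python
import re
import subprocess

# Marker assumed to match the "SKYNET: ..." log lines.
COMMAND_RE = re.compile(r"SKYNET: (.*)")


def extract_command(line):
    """Return the shell command embedded in a log line, or None."""
    m = COMMAND_RE.search(line)
    return m.group(1) if m else None


def run_filter(lines, run=lambda cmd: subprocess.run(cmd, shell=True)):
    """Echo every line, and run the embedded command for marked lines."""
    for line in lines:
        line = line.rstrip("\n")
        cmd = extract_command(line)
        if cmd is not None:
            run(cmd)  # runs in a new shell, like Perl's system()
        print(line, flush=True)


# To use it as a pipe filter, a script would call: run_filter(sys.stdin)
```

The `run` parameter is only there so the command runner can be swapped out (for a dry run, say). Like the Perl version, this executes whatever the log tells it to, so point it only at output you trust.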

Note: the Skynet reference is from way back when we all watched Terminator and thought that it would be possible to prevent the AIs from taking over the world just by keeping one company from giving it so much power, in the form of tools and permission to do arbitrary things. We didn’t predict that in only a few decades, thousands of people would be doing exactly that on a daily basis.

Mozilla — No Comments
07
May 25

Sinful Debugging

Recently, I was debugging my SpiderMonkey changes when running a JS test script, and got annoyed at the length of the feedback cycle: I’d make a change to the test script or the C++ code, rerun (under rr), go into the debugger, stop execution at a point where I knew what variable was what, set convenience variables to their pointer values, then run to where the interesting stuff was happening.

One part I had already solved: if I have some JS variables, say base and str, and I want to capture their pointer values in the debugger, I’ll call Math.sin(0, base, str) and set a breakpoint on math_sin. Why the leading 0? In the past, I’d run into problems when Math.sin converted its first argument to a number, which disrupted what I was trying to look at. So now I feed it a number first and put my “real” stuff after it where it’s ignored, even when it doesn’t matter (and it usually doesn’t).

But it’s painful to get set up again. It looks something like:

(rr) b math_sin
(rr) c # ontinue
(rr) p vp[3].toString()
$1 = (JSString *) 0x33315dc003c0
(rr) set $base=$
(rr) p vp[4].toString()
$2 = (JSString *) 0x33315dc01828
(rr) set $str=$

Yay, now I can do things like p $str->dump() and it will work!

But even worse, sometimes I’d want to add or remove variables. So I hacked around it:

First, instead of just passing in the variables, I pass in names along with them. An actual example:

Math.sin(0, "ND2", ND2, "TD3", TD3, "NB4", NB4, "NB5", NB5, "TD6", TD6);

(yes, those names mean something to me). Then the setup becomes more like:

(rr) b math_sin
(rr) c # ontinue
(rr) p vp[3]
$7 = $JS::Value("ND2")
(rr) set $ND2=vp[4].toString()
(rr) p vp[5]
$8 = $JS::Value("TD3")
(rr) set $TD3=vp[6].toString()

Ok, that’s longer, and requires cutting & pasting. I know, shut up.

The next step is to automate with a gdbinit script. Here’s a slightly modified version of mine:

define mlabel
  set $_VP=vp
  python
import re
argc = int(gdb.parse_and_eval("argc"))
for i in range(3, argc + 2, 2):
  namer = f"$_VP[{i}]"
  m = re.search(r'::Value\("(.*?)"',
                str(gdb.parse_and_eval(namer)))
  if not m:
    print(f"Failed to match: {namer}")
    continue
  name = m.group(1)
  setter = f"set ${name}=$_VP[{i+1}].toGCThing()"
  gdb.execute(setter)
end
end
document mlabel
Special-purpose tool for grabbing out things passed to Math.sin(0, "name1", val1, "name2", ...) and converting them to labels.
end

“mlabel” stands for “multi-label”, because it… well, it doesn’t label anything, but in my real version, it runs my own command

label name=value

that does some other magic besides setting a gdb convenience variable (yes, they’re actually called that).

I don’t remember why I went through the extra step of setting a $_VP variable rather than using vp directly. But it’s probably specific to my scenario, so you’ll have to adapt this anyway. This post is meant more to give you an idea.
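To make the index arithmetic easier to see, here is the same pairing logic pulled out into a plain Python function that runs without gdb. It is a sketch for illustration: `label_commands` and its `render` callback are hypothetical names, and the regex assumes the pretty-printer output looks like the `$JS::Value("ND2")` lines above.

```python
import re

# Pattern assumed from the pretty-printed form, e.g. '$JS::Value("ND2")'.
NAME_RE = re.compile(r'::Value\("(.*?)"')


def label_commands(argc, render):
    """Build gdb 'set' commands for Math.sin(0, "n1", v1, "n2", v2, ...).

    argc is the call's argument count and render(i) returns the printed
    form of vp[i]. vp[0] is the callee, vp[1] is `this`, and vp[2] is the
    leading 0, so the names sit at vp[3], vp[5], ... with each value in
    the slot right after its name.
    """
    commands = []
    for i in range(3, argc + 2, 2):
        m = NAME_RE.search(render(i))
        if not m:
            continue  # slot does not hold a string name; skip the pair
        commands.append(f"set ${m.group(1)}=$_VP[{i + 1}].toGCThing()")
    return commands
```

For the five-pair call above (argc=11), the loop visits slots 3, 5, 7, 9, and 11 and produces one `set` command per name.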

The result is my JS test code talking to my gdb session and spilling its secrets. Now when I’m debugging the interesting stuff, I can do (rr) print $TD3->dump() and it will do something useful.

Here’s a log of an actual session:

--------------------------------------------------
 ---> Reached target process 3902272 at event 14.
--------------------------------------------------
(rr) Working directory /home/sfink/src/mozilla-ff/js/src/jit-test/tests/gc.
(rr) pretty
Loading JavaScript value pretty-printers; see js/src/gdb/README.
If they cause trouble, type: disable pretty-printer .* SpiderMonkey
SpiderMonkey unwinder is disabled by default, to enable it type:
	enable unwinder .* SpiderMonkey
(rr) b math_sin
Breakpoint 1 at 0x564d91b3942b: file /home/sfink/src/mozilla-ff/js/src/jsmath.cpp, line 649.
(rr) c
Continuing.

Thread 1 hit Breakpoint 1, math_sin (cx=cx@entry=0x7f20e5d3a200, argc=11, vp=0x7f20d5ba8168)
    at /home/sfink/src/mozilla-ff/js/src/jsmath.cpp:649
stopped at breakpoint 1: (N/A) -> (N/A)
(rr) mlabel
all occurrences of 0x33315dc003c0 will be replaced with $ND2 of type js::gc::Cell *
all occurrences of 0x38deaf678670 will be replaced with $TD3 of type js::gc::Cell *
all occurrences of 0x33315dc00340 will be replaced with $NB4 of type js::gc::Cell *
all occurrences of 0x33315dc00188 will be replaced with $NB5 of type js::gc::Cell *
all occurrences of 0x38deaf678658 will be replaced with $TD6 of type js::gc::Cell *
(rr) b promoteString
Breakpoint 2 at 0x564d92737698: file /home/sfink/src/mozilla-ff/js/src/gc/Tenuring.cpp, line 882.
(rr) c
Continuing.

Thread 1 hit Breakpoint 2, js::gc::TenuringTracer::promoteString (this=this@entry=0x7ffcf5f9cb40, 
    src="MY YOUNGEST MEMORY IS OF A TOE, A GIANT BLUE TOE, IT MADE FUN OF ME INCESSANTLY BUT THAT DID NOT BOTHER ME IN THE LEAST. MY MOTHER WOULD HAVE BEEN HORRIFIED, BUT SHE WAS A GOOSE AND HAD ALREADY LAID T"...)
    at /home/sfink/src/mozilla-ff/js/src/gc/Tenuring.cpp:882
stopped at breakpoint 2: (N/A) -> (N/A)
(rr) p (void*)src
$1 = (void *) $NB5
(rr)

Mozilla — No Comments
09
Jun 22

Ephemeron Tables aka JavaScript WeakMaps and How They Work

Introduction

I read Ephemerons explained today after finding it on Hacker News, and it was good but lengthy. It was also described in terms of the Squeak language and included post-mortem finalization, which is unavailable in JavaScript (and frankly sounds terrifying from an implementation point of view!). I thought I’d try my hand at writing up a shorter and hopefully simpler explanation covering only what is available in JS.

Ephemerons—Effin’ Ron What?

Ephemeron tables are the underlying data structure for JavaScript WeakMaps. WeakMaps are very similar to plain Maps: if you have a map and a key, you can look up a value. The differences are (1) you can only use objects (and soon symbols) as WeakMap keys, (2) the API is limited to prevent retrieving any entries without having the corresponding key in hand, and (3) WeakMaps are hooked into the garbage collector (GC) so that they don’t keep as much stuff alive.

If a regular Map is alive, then so are all of its keys. And values. And anything they might contain, recursively.

If a WeakMap is alive, on the other hand, it won’t keep any key or value alive by itself. Only if something else is keeping a particular key alive will the WeakMap keep the corresponding value alive.

That’s it. Everything else falls out of that.

In terms of usage, WeakMaps are good for annotating an object with data that isn’t useful if the object is no longer needed. You could map an object to some expensive-to-compute cached information, for example. Or maybe you want to track whether an Error object has been logged, or associate a DOM object with some information about it. It’s like adding an invisible property to an object that you can only look at if you look it up in some invisibleProperty WeakMap.

WeakMaps and Garbage Collection

Let me expand on that last point a little. Say you have your invisibleProperty WeakMap filled up with a bunch of entries mapping various objects to their properties. If any of the objects dies, then the corresponding value is no longer kept alive simply by being in that WeakMap. (It may not actually die, because something else may refer to it.) Moreover, if you discard the invisibleProperty WeakMap itself (by setting invisibleProperty=null and not having anything else that refers to it), then none of those key/value entries will keep the corresponding value alive anymore.

Mathematically, a WeakMap entry is an edge from a WeakMap WM and a key K to some value V:

BOTH(WM,K)→V

In order for the entry to keep V alive, both WM and K have to be alive. If either is dead, then the entry has no effect on V’s liveness, and in fact is unobservable. (You can’t look something up in a weakmap that you don’t have. And you can’t look up a key that you don’t have either. So whether the entry exists or not makes no difference to you.)

WeakMap GC Implementation

An important consequence of the above rule is that something might be alive for reasons that require looking through a chain of WeakMap entries. You might have a WeakMap entry value that is itself a WeakMap. Or the value is used as a key in the same or another WeakMap. This complicates the simple marking GC implementation, which is: start from a set of objects known to be live, and mark everything that they contain (or can directly reach in any way) as live, recursively.

With WeakMaps, when you mark some object as being live, you don’t know whether it might be a key in some random WeakMap out there. Or perhaps you do, because you’ve cleverly set a bit on everything used as a WeakMap key—but this does you little good, because you also need to know whether any of the WeakMaps the key is in are themselves alive, and you may not have figured that out yet.

The simple solution: mark everything normally, but collect a list of all of the WeakMaps you discover to be live. Then loop through all of their entries, check each key to see if it’s alive, and if so mark its corresponding value as live.

But one loop may not be enough—if you mark any values, then you may have discovered either a new WeakMap or an object that is used in a known or not-yet-known WeakMap. So you’ll need to keep repeating until no new values are marked. In terms of computational complexity, you’ll visit up to n objects each time through the loop, and you’ll loop up to n times, for a total of O(n²) operations. Perhaps you’re not familiar with that expression? It is written in the language of computer science, where it is pronounced “oh, crap!”.
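Here is a toy version of that simple, repeat-until-nothing-changes approach. It is a sketch on a made-up heap model, not any real engine's code: `Obj`, `WeakMapObj`, `mark_from`, and `mark_naive` are all names invented for the example, with `edges` standing in for ordinary strong references and `entries` for the WeakMap's key/value pairs.

```python
class Obj:
    """A heap object with strong outgoing edges."""
    def __init__(self, name):
        self.name = name
        self.edges = []


class WeakMapObj(Obj):
    """A heap object that also owns ephemeron entries (key -> value)."""
    def __init__(self, name):
        super().__init__(name)
        self.entries = {}


def mark_from(start, marked, live_maps):
    """Ordinary marking: follow strong edges only, noting live WeakMaps."""
    stack = [start]
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        if isinstance(obj, WeakMapObj):
            live_maps.append(obj)
        stack.extend(obj.edges)


def mark_naive(roots):
    """Mark normally, then rescan every live WeakMap until nothing changes."""
    marked, live_maps = set(), []
    for root in roots:
        mark_from(root, marked, live_maps)
    changed = True
    while changed:
        changed = False
        for wm in list(live_maps):  # copy: marking may discover more maps
            for key, value in wm.entries.items():
                if key in marked and value not in marked:
                    mark_from(value, marked, live_maps)
                    changed = True
    return marked
```

Each pass over the live maps can uncover just one more link of a chain, which is where the repeated rescanning (and the quadratic worst case) comes from.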

Now, normally it would be really hard to hit the worst case here. But on the Web, anything—no matter how stupid—can somehow make somebody money. Or amuse them, or whatever. Therefore somebody will do it.

Linear-time Implementation

There’s a straightforward fix, though—don’t do the above. Instead, every time you discover a live WeakMap, add all of its entries to a big hashtable. Additionally, every time you mark an object, look it up in the hashtable and if you find it, mark the values of any entries you find. If n is the number of live objects, you’ll visit each one once and do a constant amount of work, for a running time of O(n) (pronounced “oops I forgot about the constants”).

You wouldn’t actually implement it quite that way, but you can use a little optimism to skip the slow part almost all of the time. And when you can’t—well, O(n²) means for small n, it will take a few extra milliseconds. For large n, it will take until Thursday. So at least you won’t be collecting garbage until Thursday.

Sorry, what?

Do your tracing in two phases:

Start from a set of roots. Mark everything reachable from the roots, recursively. Whenever you encounter an ephemeron table (and therefore know it is live, since it is reachable from a root), iterate over all of its entries. If an entry’s key is already known to be live, trace its value. Otherwise, add it to a table of pending entries, keyed by the key. After this phase is done, you will probably have visited the vast majority of the object graph. (Everything remaining is only reachable by going through one or more ephemeron table entries.)
At this point, you have a mostly-marked graph, and a table of ephemeron keys, some of which are now marked (but weren’t when you added them to the table). Scan through the table and find all of the now-marked keys, and trace their values. However, now whenever you visit any object (the value or anything reachable from the value), immediately look up that object in the table and trace through any entries you find. (If you encounter a not-yet-seen ephemeron table during this process, do the same thing as before.)

Every object in the graph will be visited at most twice, and every operation on an object is O(1)—constant time. So the overall scan is O(fast n + slow n) = O(n).
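The two phases can be sketched on a toy heap model like this. Again, this is an illustration under assumptions rather than how any engine actually does it: `Obj`, `WeakMapObj`, `mark_linear`, and the `pending` table are names invented here, and a real collector would not use Python sets and dicts.

```python
from collections import defaultdict


class Obj:
    """A heap object with strong outgoing edges."""
    def __init__(self, name):
        self.name = name
        self.edges = []


class WeakMapObj(Obj):
    """A heap object that also owns ephemeron entries (key -> value)."""
    def __init__(self, name):
        super().__init__(name)
        self.entries = {}


def mark_linear(roots):
    marked = set()
    pending = defaultdict(list)  # unmarked key -> values waiting on it
    stack = list(roots)

    def drain(eager):
        while stack:
            obj = stack.pop()
            if obj in marked:
                continue
            marked.add(obj)
            stack.extend(obj.edges)
            if isinstance(obj, WeakMapObj):
                for key, value in obj.entries.items():
                    if key in marked:
                        stack.append(value)         # key live: trace value
                    else:
                        pending[key].append(value)  # defer until key is marked
            if eager:
                # Phase 2 only: a newly marked object may be a pending key.
                stack.extend(pending.pop(obj, ()))

    drain(eager=False)  # phase 1: plain marking, no table lookups
    # Start of phase 2: release values whose keys got marked during phase 1.
    for key in [k for k in pending if k in marked]:
        stack.extend(pending.pop(key))
    drain(eager=True)   # rest of phase 2: look up every newly marked object
    return marked
```

Phase 1 skips the per-object table lookup entirely, which is the "optimism" part. Only the objects first reached in phase 2 pay for it.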

Mozilla — No Comments
20
Dec 19

Running taskcluster tasks locally

Work right from your own home!

It can be difficult to debug failures in Taskcluster that don’t happen locally. Interactive tasks are very useful for this, but interactive tasks broke during the last migration. (A relevant bug is bug 1596632, which is duped to a just-fixed bug, so maybe it works now?) I recently encountered a situation where I really needed to interactively debug something, so I decided to take the plunge and discover the answer to the question: how can I run tasks locally?

Local tasks provide not only the advantages of interactive tasks, but also allow running against your local checkout. That makes for a much faster edit-run-curse-debug cycle, and opens up possibilities for using this in a lot more situations than the last-ditch efforts that interactive try server tasks are usually used for. (Or at least, that’s how I use them. And mostly don’t use them.)

I’m going to walk through the process of setting up and running a taskcluster job in a local container. Note that I have no idea how generally applicable this is. I will give the steps necessary to run the SM(gdb) job, which builds the JS shell and runs some gdb prettyprinter tests against it. I don’t know how far it will get you toward running something like mochitests.

Getting the image

Taskcluster normally runs Docker images. So the first step is to get your very own copy of the appropriate docker image. There’s a handy blog post by someone who actually knows what he’s talking about that I found well after the fact (of course). But I’m going to give the exact steps that I used:

Click on the task you’re trying to replicate in treeherder.
Open the full log file.
Search for a line that says something like “Downloading artifact “public/image.tar.zst” from task ID: VuFo68PeQjCH7k15tSN2Dg.” near the beginning of the file. Call that ID $IMAGEID.
Run ./mach taskcluster-load-image --task-id $IMAGEID from your Gecko checkout.

and then optionally,

Curse and flail around when something goes wrong with the docker import process, as it always seems to.
Maybe install docker in the first place. Whoops, forgot to mention that.
You probably want it to be running as well.

Getting the image up and running

mach will helpfully give you a command to run a shell in the image, something like:

/usr/bin/docker-current run -ti --rm debian7-amd64-build:e2e821aea119e4a264340c22b79324ac804955b605577dd225df5f4f8e98e0cc bash

Don’t do that. It’s a great command, but it’s a little overzealous about cleaning up after itself. But grab out that image name: IMAGE=debian7-amd64-build:e2e821aea119e4a264340c22b79324ac804955b605577dd225df5f4f8e98e0cc

Although for now, I guess it’s really not bad. Just remove the --rm option and give it a try.

If you get a shell to pop up, congratulations! Be happy! If not, try asking someone with a clue or, failing that, ask me. I’m sfink in the #developers channel on IRC, or if you’re reading this after we’ve spun up our new Matrix overlord, I’ll probably be moving there. Oh, and if you’re in the Mozilla secret club, I suppose I won’t ignore you if you hit me (@sfink) up on Slack either.

Anyway, we’re going to need to download some stuff into this image, which means we need a network. Mine didn’t start with a network. I don’t know much about Docker, but this got me a network:

ifconfig to figure out your local IP address, or do it some other way. My IP was 10.0.0.14.
```
docker network create -o "com.docker.network.bridge.host_binding_ipv4"="10.0.0.14" my-network
```
In that command, replace the “10.0.0.14” with your own IP and, if you wish, “my-network” with something cooler-sounding. Running it will spit out some monstrous ID like 1793d9caad6d5973922b7a78ae11a2bce6005781ca18c0e253d1c2c5317f5c93 that you have to read out in Pig Latin in under 5 seconds. Or you can just ignore it.
docker ps to get the ID of your running container. (Or add -a if you’re going to be running a container you’ve created already.) Call that $CONTAINER_ID.
docker network connect my-network $CONTAINER_ID

Come to think of it, I only did that once with an old container I’m no longer using, and all of the new containers I’ve created come up with a functioning network from the get-go. So you can probably ignore all of the above.

Grafting your source into your container

Now that you have a container with a network running and everything, it’s time to throw it out and start over. I did say “don’t do that”, remember?

The next goal is to start up a container with your local source tree bind-mounted. Let’s call the absolute path to your checkout $SRCDIR.

Let’s expand your container-creating command to something like:
```
docker run -ti -v $SRCDIR:/builds/worker/source:z $IMAGE bash
```
[Note 2]
But don’t run that either. Or at least, don’t run it if you actually are trying to run the gdb task, because it requires some extra privileges in order to do the right ptrace magic.

Here’s the actual command I use:

docker run -ti -v $SRCDIR:/builds/worker/source:z --cap-add=SYS_PTRACE --security-opt seccomp=unconfined $IMAGE bash

Ignoring the gdb ptrace goop, what that’s doing is bind-mounting $SRCDIR on your host so that it shows up at /builds/worker/source within your container, and additionally does the fixup necessary for selinux to allow you to then access the data from within the container. If you’re worried about stuff running within the container messing up your source checkout, you could add ,ro to the volume portion of that command:

docker run -ti -v $SRCDIR:/builds/worker/source:z,ro --cap-add=SYS_PTRACE --security-opt seccomp=unconfined $IMAGE bash

But honestly, I haven’t tried doing that yet.

Snarfing taskcluster initialization

Hopefully, you now have a shell open in a container that is basically identical to what runs in taskcluster. You’re home free, right?

Not so fast. Taskcluster does some magic setup, I’m not entirely sure how, to provide an environment with a bunch of important settings that don’t come with a default shell. I figured out a bunch of stuff you could do manually to replicate this environment. Here’s a list of steps that I recommend you do not take:

Go back to your push on treeherder.
Click on the Task link in the bottom left pane.
Expand the “payload” section.
Somehow convert the whole “env” section to environment variable setting commands. I used to save the whole payload as a JSON file /tmp/task.json, then run
```
perl -lne 'if (/"env"/ .. /^\s*\}/) { print "export $1='\''$2'\''" if /"(.*?)": "(.*)"/ }' /tmp/task.json
```
Cut & paste that into the shell running on your container.
Also cut & paste
```
export TASKCLUSTER_ROOT_URL=https://firefox-ci-tc.services.mozilla.com
```
to prevent it from attempting to access stuff via internal URLs that won’t work from your desktop.
Now grab the “command” key from that payload and stitch it together into a shell command to paste…
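For what it's worth, the env-conversion step in that list could also be done by parsing the JSON properly instead of pattern-matching it. This is a sketch under assumptions: it takes the saved payload to be a JSON object with a top-level "env" key (as in /tmp/task.json above), and `env_exports` and `exports_from_file` are names made up for the example.

```python
import json
import shlex


def env_exports(payload):
    """Turn a task payload's "env" mapping into shell export lines."""
    env = payload.get("env", {})
    # shlex.quote handles values containing spaces or quote characters.
    return [f"export {name}={shlex.quote(str(value))}"
            for name, value in sorted(env.items())]


def exports_from_file(path):
    """Load a saved payload (assumed JSON) and return its export lines."""
    with open(path) as f:
        return env_exports(json.load(f))
```

Printing `"\n".join(exports_from_file("/tmp/task.json"))` would give something to paste into the container shell, with quoting that survives values containing single quotes.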
