06
Aug 25

Brief love note to jj and rr: leveraging history

Last week, I went on vacation.

That is not the point.

Closer to the point is that I returned to work this week, and resumed working on tracking down a bug that I had been looking at just before going on vacation. I knew I had captured the bug in my latest rr recording, but I had moved off to other things since making that recording and so my source checkout no longer matched what my debugged binary was built from.

rr doesn’t really care, so it let me replay the recording. I had all my old log messages, tied to the exact point in time they corresponded to. (I get that from my .gdbinit setup, but that’s not all that relevant here.) So I could jump to the points in time that I cared about. But gdb showed me the wrong source, so I found myself staring at a blank line just after a comment, and I’m pretty sure my problem does not originate from a blank line after a comment.

So what to do? I wanted to find what version of the source code I had checked out. If I were working on Firefox, it would have burned a build ID into the binary. But I wasn’t.

Long story short, jj’s operation log came to my rescue here. First, I looked at my recording’s timestamp: `ls -ltr ~/.rr/latest-trace/`. (Yours might be in `~/.local/share/rr` or somewhere. Actually, mine are too, I just have a symlink.) Each entry is associated with a time range, and presumably I had the correct version checked out when I made the recording.

I scanned through the operations log to find the operation ID from that moment in time: `jj op log -T 'id.short(8) ++ " " ++ time ++ " " ++ description ++ "\n"'`. That gave me operation ID `55731d73`.

I have two workspaces, and I can’t even remember which one I was using at the time, so I displayed my working copy for both: `jj log -r 'at_operation(55731d73, working_copies())'`. That gave me two git commit hashes, `3c79c068` and `a1a9d12b`.

With my rr replay session running, I tried `jj edit 3c79c068` and checked where I was. Nope, still floating in whitespace. (Note that I am talking about code here, not my racial background. Leave me alone.) But `jj edit a1a9d12b` did the trick! My checkpoints were in actual code! That I wrote! Incorrectly!

Anyway, nothing earth-shattering here. But it made me happy. And perhaps it will be useful to other people with a need for resurrecting their forgotten past.

03
Jul 25

Effectful Logging

These recent blog posts are veering in the “here’s a horrible thing I just did!” direction. No apologies.

Recently, I was working on a weird problem where I wanted to snapshot /proc/$pid/maps before and after a couple of mmap and madvise calls. But I didn’t particularly want to write C++ code to do it. So:


JS_LOG_FMT(debug, Info, "About to mmap at {:x}", ptr);
JS_LOG_FMT(debug, Info, "SKYNET: mkdir /tmp/js-{0}/before /tmp/js-{0}/after", getpid());
JS_LOG_FMT(debug, Info, "SKYNET: cp /proc/{0}/maps /tmp/js-{0}/before", getpid());
sleep(3);
mmap(...);
JS_LOG_FMT(debug, Info, "SKYNET: cp /proc/{0}/maps /tmp/js-{0}/after", getpid());
sleep(3);

That produces “SKYNET: …” log messages, with a pause. If only someone were reading those log messages and quickly cutting & pasting the commands… let’s give them 3 seconds to do each.

Then I run in my terminal:

MOZ_LOG=debug:5 $JS blahblah.js |& perl -lpe 'system($1) if /SKYNET: (.*)/'

Whenever one of these messages is produced, the Perl script grabs it and runs it in a new shell. Victory!

Note: the Skynet reference is from way back when we all watched Terminator and thought that it would be possible to prevent the AIs from taking over the world just by keeping one company from giving it so much power, in the form of tools and permission to do arbitrary things. We didn’t predict that in only a few decades, thousands of people would be doing exactly that on a daily basis.


16
Jun 25

small Speedometer3 performance investigation

I recently looked into a slight performance regression after landing a string-related patch. I’ve never been sure what the best process is to look into these things, so I thought I’d at least write down what I came up this time in case it’s useful to future-me or some helpful reader can tell me I’m being an idiot and there’s a Much Better Way. (Other than padenot, who has already implemented a Much Better Way for part of this. I’ve only just begun to try using it, and will perhaps do more in a later post.)

Note that the formatting in the post is really awful, and is motivating me to get around to migrating my blog to someplace I can control even though it probably means I’ll lose my Planet Mozilla syndication in the process (because I’m not sure anyone maintains it anymore?) Bye bye in advance! I plan to cross-post for a while, but given that I had 0 posts for 3 years, that has little meaning. If you can, you’re probably better off reading this post via a feed reader or something rather than going to the blog.m.o site.

The basic issue: I landed a patch, I got an alert saying I had improved 3 subtests and regressed another 3. I was curious, so I dug into the first (and largest) regression.

I first opened the “before” and “after” links in separate tabs, and added “after – ” and “before – ” prefixes to their names so I could tell them apart. (Note: my actual process was way clumsier than I’ll be describing, involving me repeatedly confusing myself over which profile was which, until I finally did what I say here.)

That done, I went to the marker chart and flipped back and forth between them to sort of eyeball-animate the change from before to after. That was confusing because the “after” seemed slightly faster. The marker chart is clogged with lots of “CSS transition” markers, so I filtered those out with -name:CSS in the filter field. (Actually, I flailed around for a while to figure out what the syntax of that field was first, and ended up creating a PR to add my findings to the documentation.)

That showed me that the profile contains 10 iterations of sp3. I tried looking at just one in each profile, but that was pointless, since each iteration goes through all of the sp3 tests and my regressing test gets lost in the noise. Instead, I used the devtools console and grabbed out the markers I wanted by name:
filteredMarkers.filter(m => m.data.name === "Charts-chartjs.Draw scatter-async")
Later on, I instead filtered by name in the UI:
-name:CSS,Draw scatter-async which later turned into Draw scatter-async,-name:-end,-name:-start
to only look at the interval markers rather than the older `-start` and `-end` markers that now just serve as noise.

But anyway, back to the console, I looked at the total duration of just that subtest:
filteredMarkers.filter(m => m.data.name === "Charts-chartjs.Draw scatter-async").map(m => m.end - m.start).reduce((acc, v) => acc + v, 0) / 10
That showed the “after” being 8.5% slower, which is close enough to 5% that I felt like I was looking at the right thing.

Staring at the data (by running the above without the reduce), I noticed that all of the “after” runs were about 350µs except for one that took 657µs, and did the math to see that the single long run was responsible for all of the difference (plus a tiny bit more). I did some further investigation (which you can see in the bug) to diagnose this further, but that’s not relevant to this post which is really about profiler gymnastics.

I guess for sake of completeness, “doing the math” was something like:

  1. Run the expression to extract my durations from filteredMarkers
  2. Left-click the result to select it, right-click and choose “Store as Global Variable”.
  3. Drop out the outlier (element 9) and recompute the mean: tmp = [...temp0]; tmp.splice(9); tmp.reduce((s,v) => s + v, 0) / tmp.length

And just because it seems I can’t bear to submit this post, here’s my first experience with Fouineur:

  • Install the add-on
  • With the profile open, click on the add-on puzzle piece to dropdown the menu that is there to hide useful stuff from me
  • Click on Fouineur to open up its edit window
  • Minimize the edit window because it prevents me from interacting with the profile
  • Filter down to just the markers I want (I did this in the UI because it seemed like it might be harder to exclude the “-start” and “-end” nuisance markers with Fouineur, now that I had learned how the profiler’s filter syntax works
  • Struggle with the Fouineur editor because it prevents adding a blank line, which is problematic when the syntax requires a blank line between match expressions and actions. Eventually discovered that shift-Enter forces the issue.
  • Used the match UserTiming and the action plot(UserTiming) to get a cool little graph! Yay!

But I was already done with the analysis, I just felt guilty about not trying out Fouineur when I knew it was meant for pretty much what I was doing. Now I know enough to try it for real next time. It looks pretty awesome so far.


07
May 25

Sinful Debugging

Recently, I was debugging my SpiderMonkey changes when running a JS test script, and got annoyed at the length of the feedback cycle: I’d make a change to the test script or the C++ code, rerun (under rr), go into the debugger, stop execution at a point where I knew what variable was what, set convenience variables to their pointer values, then run to where the interesting stuff was happening.

One part I had already solved: if I have some JS variables, say base and str, and I want to capture their pointer values in the debugger, I’ll call Math.sin(0, base, str) and set a breakpoint on math_sin. Why the leading 0? In the past, I’d run into problems when Math.sin converted its first argument to a number, which disrupted what I was trying to look at. So now I feed it a number first and put my “real” stuff after it where it’s ignored, even when it doesn’t matter (and it usually doesn’t).

But it’s painful to get set up again. It looks something like:

(rr) b math_sin
(rr) c # ontinue
(rr) p vp[3].toString()
$1 = (JSString *) 0x33315dc003c0
(rr) set $base=$
(rr) p vp[4].toString()
$2 = (JSString *) 0x33315dc01828
(rr) set $str=$

Yay, now I can do things like p $str->dump() and it will work!

But even worse, sometimes I’d want to add or remove variables. So I hacked around it:

First, instead of just passing in the variables, I pass in names along with them. An actual example:

Math.sin(0, "ND2", ND2, "TD3", TD3, "NB4", NB4, "NB5", NB5, "TD6", TD6);

(yes, those names mean something to me). Then the setup becomes more like:

(rr) b math_sin
(rr) c # ontinue
(rr) p vp[3]
$7 = $JS::Value("ND2")
(rr) set $ND2=vp[4].toString()
(rr) p vp[5]
$7 = $JS::Value("TD3")
(rr) set $ND2=vp[6].toString()

Ok, that’s longer, and requires cutting & pasting. I know, shut up.

The next set is to automate with a gdbinit script. Here’s a slightly modified version of mine:

define mlabel
  set $_VP=vp
  python
import re
argc = int(gdb.parse_and_eval("argc"))
for i in range(3, argc + 2, 2):
  namer = f"$_VP[{i}]"
  m = re.search(r'::Value\("(.*?)"',
                str(gdb.parse_and_eval(namer)))
  if not m:
    print(f"Failed to match: {namer}")
    continue
  name = m.group(1)
  setter = f"set ${name}=$_VP[{i+1}].toGCThing()"
  gdb.execute(setter)
end
end
document mlabel
Special-purpose tool for grabbing out things passed to Math.sin(0, "name1", val1, "name2", ...) and converting them to labels.
end

“mlabel” stands for “multi-label”, because it… well, it doesn’t label anything, but in my real version, it runs my own command

label name=value

that does some other magic besides setting a gdb convenience variable (yes, they’re actually called that).

I don’t remember why I went through the extra step of setting a $_VP variable rather than using vp directly. But it’s probably specific to my scenario, so you’ll have to adapt this anyway. This post is meant more to give you an idea.

The result is my JS test code talking to my gdb session and spilling its secrets. Now when I’m debugging the interesting stuff, I can do (rr) print $TD3->dump() and it will do something useful.

Here’s a log of an actual session:

--------------------------------------------------
 ---> Reached target process 3902272 at event 14.
--------------------------------------------------
(rr) Working directory /home/sfink/src/mozilla-ff/js/src/jit-test/tests/gc.
(rr) pretty
Loading JavaScript value pretty-printers; see js/src/gdb/README.
If they cause trouble, type: disable pretty-printer .* SpiderMonkey
SpiderMonkey unwinder is disabled by default, to enable it type:
	enable unwinder .* SpiderMonkey
(rr) b math_sin
Breakpoint 1 at 0x564d91b3942b: file /home/sfink/src/mozilla-ff/js/src/jsmath.cpp, line 649.
(rr) c
Continuing.

Thread 1 hit Breakpoint 1, math_sin (cx=cx@entry=0x7f20e5d3a200, argc=11, vp=0x7f20d5ba8168)
    at /home/sfink/src/mozilla-ff/js/src/jsmath.cpp:649
stopped at breakpoint 1: (N/A) -> (N/A)
(rr) mlabel
all occurrences of 0x33315dc003c0 will be replaced with $ND2 of type js::gc::Cell *
all occurrences of 0x38deaf678670 will be replaced with $TD3 of type js::gc::Cell *
all occurrences of 0x33315dc00340 will be replaced with $NB4 of type js::gc::Cell *
all occurrences of 0x33315dc00188 will be replaced with $NB5 of type js::gc::Cell *
all occurrences of 0x38deaf678658 will be replaced with $TD6 of type js::gc::Cell *
(rr) b promoteString
Breakpoint 2 at 0x564d92737698: file /home/sfink/src/mozilla-ff/js/src/gc/Tenuring.cpp, line 882.
(rr) c
Continuing.

Thread 1 hit Breakpoint 2, js::gc::TenuringTracer::promoteString (this=this@entry=0x7ffcf5f9cb40, 
    src="MY YOUNGEST MEMORY IS OF A TOE, A GIANT BLUE TOE, IT MADE FUN OF ME INCESSANTLY BUT THAT DID NOT BOTHER ME IN THE LEAST. MY MOTHER WOULD HAVE BEEN HORRIFIED, BUT SHE WAS A GOOSE AND HAD ALREADY LAID T"...)
    at /home/sfink/src/mozilla-ff/js/src/gc/Tenuring.cpp:882
stopped at breakpoint 2: (N/A) -> (N/A)
(rr) p (void*)src
$1 = (void *) $NB5
(rr)