Tamarin Tracing Internals V: Running Compiled Traces
Whew. Reading all this TT code is fascinating, but also tiring, hard work. Anyway, I’ve hit almost all the high points by now, and I’ve traced out the JITting process all the way from ABC bytecode to native compiled traces. The questions I have left are about how traces actually get run, plus some related questions I’ve avoided about what side exits really are and how they work.
Running Traces. The initial entry point into compiled code is back in Interpreter::loopedge, the same method that initiates tracing (see Part III). loopedge always checks to see if there is a compiled trace for this loop header. If so, it executes the compiled trace. (Look for the label callfrag.) Here’s the call:
lr = (*u.func)(&state, 0);
The first argument is a pointer to the interpreter state. I think the second is something used only in debug modes. The result is a pointer to GuardRecord, which is defined in Assembler.h. The comment reads: “These objects lie directly in the native code pages and are used to house state information across the edge of side exits from a fragment.”
The key member of GuardRecord is Fragment* target, which gives the destination fragment (loop header) of the exit. If the destination is not a loop header (target == 0), the destination will be made into a fragment so that it can be traced if it becomes hot. The destination fragment will then get its count incremented, and if it is now hot, tracing starts immediately.
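The loopedge logic just described can be sketched roughly as follows. This is a minimal stand-in, not the real TT types: the `hits` counter, the threshold, and the helper name are all my inventions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Minimal sketch of the post-exit handling described above: inspect the
// GuardRecord returned by a compiled trace, make the destination into a
// fragment if necessary, and bump its hotness counter.
struct Fragment {
    int hits = 0;          // hotness counter (hypothetical)
    bool compiled = false; // has native code?
};

struct GuardRecord {
    Fragment* target; // destination fragment of the exit, or null
};

const int HOT_THRESHOLD = 2; // hypothetical hotness threshold

// Returns true if the exit destination just became hot, i.e. tracing
// should start at it immediately.
bool handleTraceExit(GuardRecord* lr, Fragment* freshFragment) {
    Fragment* dest = lr->target;
    if (dest == nullptr) {
        // Exit does not land on a known loop header: make the destination
        // into a fragment so it can be traced if it becomes hot.
        dest = freshFragment;
        lr->target = dest;
    }
    dest->hits++;                       // bump the hotness counter
    return dest->hits >= HOT_THRESHOLD; // hot now? start tracing
}
```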
Trace Exits: LIR. I need to back up a bit in order to fully understand how trace exits work.
During trace recording, branch instructions (e.g., IL LBRT) require special handling. The trace is linear, so we just generate straight-line LIR according to the branch that was actually taken. This is fundamental: we are guessing that since we took a certain branch now, on a hot trace, we’ll probably take the same branch many times more, so the program will run fast if we generate straight-line code for this case. But of course, on any future execution, we’re not guaranteed to take the same branch again, so when we pass this point, we have to do the check again and exit the trace if we get the opposite result. The check is called a guard, and the exit is called a side exit. Here is an example from IL->LIR trace generation debug output:
T 11D6BE BRF -8:0 -3:10AF520 -3:10AD150 -3:10AD240 -2:0 -2:0 -3:10AD240 d:10
35 imm #0
36 eq 33,#0
GG: ip 11D6C2 sp 100E0B4 rp 100616C
45 xf 36 -> 11D6C2
The IL instruction is BRF, “branch if top-of-stack is false”. In this case, the top of the stack is d:10, i.e., 10.0, so the interpreter doesn’t take the branch. But we’re more interested in tracing. Tracing of branch instructions is implemented by Interpreter::jump_if. First, jump_if emits LIR for the test, specifically to test if the top of the stack is zero. This is the “imm #0” and “eq 33,#0”.
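The shape of what jump_if records can be mimicked with a toy recorder. Everything here is invented for illustration (the real jump_if works on actual LIR instructions, and its ids won’t start at 0); only the three-step pattern — immediate, compare, guard — comes from the debug output above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mini-recorder showing the pattern jump_if emits for BRF:
// an immediate 0, a compare of the stack top against it, then a guard
// on the result of the compare.
struct MiniRecorder {
    std::vector<std::string> lir;

    int emit(const std::string& ins) {
        lir.push_back(ins);
        return (int)lir.size() - 1; // instruction id
    }

    void recordBRF(int tosId, bool branchTaken) {
        emit("imm #0");                                         // constant for the test
        int cmp = emit("eq " + std::to_string(tosId) + ",#0");  // top-of-stack == 0?
        // The branch was not taken here (stack top was true), so future
        // runs must exit if the test comes out the other way: xf exits
        // when the condition is false, xt when it is true.
        emit((branchTaken ? "xt " : "xf ") + std::to_string(cmp));
    }
};
```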
Now comes the scary part: calling Interpreter::guard. I would describe the effect of this function as generating a trace exit, but it’s called guard, probably because it generates the branch instruction for side exits. It is also used for LIR_loop instructions, though, which don’t really have guards at all.
Naming questions aside, for side exits, as in our example, the first thing guard does is print out the “GG” line (if in debug mode). The rest of the line shows some interpreter state, which is probably helpful when debugging TT. Next, guard generates a SideExit structure (Fragmento.h) inline with the LIR to describe the exit. The SideExit records:
- The interpreter state (frame, stack, return, and instruction pointers) as offsets from the interpreter state at the start of the trace.
- The trace.
- The target of the exit as a fragment, i.e., the (potential) start of another trace.
- The current ActionScript call depth.
This records interpreter state that is not otherwise encoded. When I went over LIR generation and optimization, I realized that the LIR contains all the store instructions needed to maintain the current interpreter stack data. (Some are optimized away in the dead store elimination pass.) But the LIR doesn’t update the interpreter state’s fp, sp, rp, or ip. At every exit we might be going back to the interpreter, so we need to recreate the full interpreter state. The SideExit contains the necessary information.
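A paraphrase of the SideExit layout implied by the list above might look like this. The real definition is in Fragmento.h; the field names here are my guesses, not necessarily the actual ones.

```cpp
#include <cassert>
#include <cstdint>

struct Fragment; // a trace/loop header

// Guessed paraphrase of SideExit: everything the exit code needs to
// reconstruct the interpreter state that the LIR itself doesn't update.
struct SideExit {
    // Interpreter pointers at the exit, stored as deltas from the
    // interpreter state captured at the start of the trace:
    int32_t f_adj;     // frame pointer offset
    int32_t sp_adj;    // stack pointer offset
    int32_t rp_adj;    // return stack pointer offset
    int32_t ip_adj;    // instruction pointer offset
    Fragment* from;    // the trace this exit leaves
    Fragment* target;  // (potential) start of another trace
    int32_t calldepth; // current ActionScript call depth
};
```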
After writing the SideExit, guard generates an LIR branch instruction. In our example, we should exit if the test is false, so we generate an LIR_xf. Note the gap in instruction sequence numbers: this is because of the space taken up by the SideExit.
guard handles LIR_loop exits (jumps to the trace header) a little differently. Instead of writing a SideExit, guard emits LIR instructions that directly update the interpreter state. I’m not entirely sure why this is. I also think that in most cases, no adjustments are required, because the interpreter stack size and types should be the same every time control passes a given point. It may have something to do with recursion.
Trace Exits: Native Code. A trace exit in LIR is a LIR_xf, LIR_xt, LIR_x, or LIR_loop. These all have cases inside Assembler::gen. For xf, xt, and x, the assembler calls asm_exit to generate exit target code, then generates native JMP/JE/JNE/Jx instructions that branch to the target. For loop, the assembler just generates a JMP instruction.
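The dispatch just described can be summarized in a small switch. The opcode names mirror the text; the particular jump flavors (JE/JNE) are illustrative placeholders, since the real choice depends on how the condition ends up in the flags.

```cpp
#include <cassert>

// Sketch of how the four exit-related LIR opcodes map to native branch
// generation. For xf/xt/x the assembler first builds the exit target
// (asm_exit) and then branches to it; LIR_loop just jumps back to the
// loop header.
enum LOpcode { LIR_xf, LIR_xt, LIR_x, LIR_loop };
enum Branch  { JE, JNE, JMP };

Branch branchFor(LOpcode op) {
    switch (op) {
        case LIR_xf:   return JE;  // exit when the condition was false (placeholder flavor)
        case LIR_xt:   return JNE; // exit when the condition was true (placeholder flavor)
        case LIR_x:    return JMP; // unconditional exit
        case LIR_loop: return JMP; // back edge to the trace header
    }
    return JMP;
}
```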
asm_exit is hard to understand, but I think I have the gist of it. The key action is calling nFragExit, which generates the exit target code. This code is generated on a separate page that is allocated for trace exits at the beginning of assembly (_nExitIns is the current position). nFragExit takes the SideExit struct as its argument. The SideExit gives the target of the exit as a Fragment, which is a loop/trace header that may or may not have a compiled trace. Reading backwards, nFragExit generates code to:
- Update the interpreter state using the offsets recorded in the SideExit.
- Ensure that param 0 of the trace is stored in the standard param 0 argument-passing register. This is needed if the exit code is ever set up to jump directly to another trace: that trace will expect param 0 in the usual place. (Param 0 is a pointer to the interpreter state.)
- Return a newly created GuardRecord (Assembler.h). The GuardRecord is the native code equivalent of a SideExit. Like SideExit, it is stored inline with the code (the native exit code). The GuardRecord is created by placeGuardRecord and holds the current fragment, target fragment, and call depth.
- Restore the ISA stack pointer (x86 esp).
- Jump to the trace epilog.
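Since the assembler emits code backwards, the bullets above come out in execution order when the stub actually runs. As a restatement (not the real emitter), the exit stub performs:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Execution-order restatement of the exit stub generated by nFragExit,
// as described in the list above. Purely descriptive; not real emitter code.
std::vector<std::string> exitStubSteps() {
    return {
        "update interpreter state using the SideExit offsets",
        "load param 0 (interpreter state pointer) into the argument register",
        "put the new GuardRecord pointer in the return-value register",
        "restore the ISA stack pointer (esp)",
        "jump to the trace epilog (or directly to a compiled target trace)",
    };
}
```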
The trace epilog, by the way, is the same for every trace, and on x86 it pops the ISA frame pointer (ebp; twice, because it is pushed twice for some alignment reason) and returns. This is just the “second half” of the standard C return-from-function sequence.
The exit code can be summarized as updating the interpreter state and then both doing the “first half” of return-from-function and preparing a function call to another trace. That way, the ending JMP can be pointed at either the main exit to the interpreter, or made to jump directly to another trace, and either works fine.
Another detail is that if the target of the exit has already been compiled to native code, instead of generating a jump to the trace epilog, nFragExit generates a jump directly to the target trace. (It also skips creating the GuardRecord). This is nice because then the code doesn’t have to return to the interpreter at all, it just keeps executing native code.
asm_exit wraps the call to nFragExit with a pair of calls to swapptrs. This is a macro defined in Native*.h that swaps the pointer to the current position in the native trace code buffer (_nIns) and the current position in the native exit code buffer (_nExitIns). This is just so the macros that generate code can always refer to _nIns as the place to store native code.
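A plausible stand-in for swapptrs is a one-line pointer swap; the real macro lives in Native*.h, so the exact spelling here is a guess, but the effect described above is just an exchange of the two cursors.

```cpp
#include <cassert>

typedef unsigned char NIns; // one unit of native code (stand-in type)

// Swap the trace-code cursor (_nIns) with the exit-code cursor
// (_nExitIns) so the emit macros can always write through _nIns.
#define swapptrs(a, b) do { NIns* _tmp = (a); (a) = (b); (b) = _tmp; } while (0)
```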
Finally, asm_exit does a bunch of fancy register allocation stuff. I don’t completely understand it, but I think it’s just needed because the register allocation algorithm is a greedy algorithm for straight-line code, and it needs a little tweak when there is a branch. It looks like asm_exit first saves a copy of the allocations and then clears them out so the exit code area has a clean set to work with, as it should (the only data passed out of the exit via registers are the return value and param 0, which the exit code does set up). Once nFragExit returns, the register tracker has allocations for any values that are needed in the exit code. At this point, mergeRegisterState is called with the current register tracker and the saved tracking data to fix everything up. The fixing is basically that if the exit code expects, say, ecx to contain a certain value, and the main trace has a different value in ecx, a move needs to be generated at the start of the exit code to get the exit code’s value into ecx.
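The fixup idea can be shown with a toy version. This is not the real mergeRegisterState (which works on register-allocator state, handles spills, and emits actual machine code); it only demonstrates the move-generation logic described above, with made-up register and value names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// reg name -> name of the value currently allocated to it
using RegState = std::map<std::string, std::string>;

// Toy fixup: for each register the exit code expects to hold a value,
// if the main trace left a different value there, emit a move from
// wherever the trace actually keeps the wanted value. (The real code
// must also handle values that live on the stack, not just in registers.)
std::vector<std::string> mergeRegisterState(const RegState& traceRegs,
                                            const RegState& exitRegs) {
    std::vector<std::string> moves;
    for (const auto& [reg, wanted] : exitRegs) {
        auto it = traceRegs.find(reg);
        if (it != traceRegs.end() && it->second == wanted)
            continue; // already in the right place
        for (const auto& [src, val] : traceRegs) {
            if (val == wanted) {
                moves.push_back("mov " + reg + ", " + src);
                break;
            }
        }
    }
    return moves;
}
```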
Reentrancy. One last thing I want to think about is the issue of reentrancy. We’ve been told that TT isn’t reentrant. Specifically, a native method (implemented in C++) can’t call back into ActionScript. But I never clearly understood why this is. I’ll probably be wrong about half of this: experts, please jump in and correct.
The problem could exist at multiple levels, but I think the simplest issue is that the interpreter isn’t reentrant, for the usual reasons of having interpreter-global data structures. For example, a reentrant interpreter would need to have a mechanism for recording the reentry on some sort of stack. Also, if the native method interacts with the Forth stack, the system would need to be very careful about managing that. None of this seems fundamental, just tricky and not done yet.
The other question is what happens to tracing with reentry. One possibility is to stop tracing when entering a possibly reentrant native method, and then possibly start tracing when a native method calls back into ActionScript (i.e., consider a reentry to be a fragment header). This seems like it would work. Another possibility is to allow some declarations on native methods to describe their effects on the interpreter state, so that tracing could actually continue through the reentrant calls. Such a mechanism sounds hard to use, though, and would probably be used only on really important methods in a few places, if at all.