
Tamarin Tracing Internals V: Running Compiled Traces

Whew. Reading all this TT code is fascinating, but also tiring, hard work. Anyway, I’ve hit almost all the high points by now, and I’ve traced out the JITting process all the way from ABC bytecode to native compiled traces. The questions I have left are about how traces actually get run, plus some related questions I’ve avoided about what side exits really are and how they work.

Running Traces. The initial entry point into compiled code is back in Interpreter::loopedge, the same method that initiates tracing (see Part III). loopedge always checks to see if there is a compiled trace for this loop header. If so, it executes the compiled trace. (Look for the label callfrag.) Here’s the call:

lr = (*u.func)(&state, 0);

The first argument is a pointer to the interpreter state. I think the second is something used only in debug modes. The result is a pointer to a GuardRecord, which is defined in Assembler.h. The comment reads: “These objects lie directly in the native code pages and are used to house state information across the edge of side exits from a fragment.”

The key member of GuardRecord is Fragment* target, which gives the destination fragment (loop header) of the exit. If the destination is not a loop header (target == 0), the destination will be made into a fragment so that it can be traced if it becomes hot. The destination fragment will then get its count incremented, and if it is now hot, tracing starts immediately.
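Putting that together, here is a rough C++ sketch of the control flow around callfrag as I understand it. Everything in it (the InterpState, Fragment, and GuardRecord layouts, the exitIp field, the fragment map, getOrCreateFragment, HOT_THRESHOLD) is a hypothetical simplification of mine, not TT’s actual declarations, and the real loopedge starts tracing itself rather than returning an Action.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical, simplified stand-ins for TT's types (not the real declarations).
struct InterpState { const void* ip = nullptr; };
struct GuardRecord;
using CompiledTrace = GuardRecord* (*)(InterpState*, int);

struct Fragment {
    const void* ip = nullptr;      // interpreter pc where this fragment starts
    CompiledTrace code = nullptr;  // native code, if the fragment is compiled
    int hits = 0;                  // hotness counter
};

struct GuardRecord {
    Fragment* target = nullptr;    // null: exit destination isn't a fragment yet
    const void* exitIp = nullptr;  // hypothetical: pc where the exit resumes
};

const int HOT_THRESHOLD = 2;       // hypothetical hotness threshold

enum class Action { Interpret, StartTracing };

// Hypothetical fragment table keyed by interpreter pc.
std::map<const void*, std::unique_ptr<Fragment>> fragments;

Fragment* getOrCreateFragment(const void* ip) {
    std::unique_ptr<Fragment>& slot = fragments[ip];
    if (!slot) {
        slot = std::make_unique<Fragment>();
        slot->ip = ip;
    }
    return slot.get();
}

// Run a compiled trace, then decide what to do where it exited.
Action runCompiledTrace(Fragment* f, InterpState* state) {
    assert(f->code != nullptr);
    GuardRecord* lr = f->code(state, 0);   // cf. lr = (*u.func)(&state, 0);
    Fragment* target = lr->target;
    if (target == nullptr)                 // not a loop header yet:
        target = getOrCreateFragment(lr->exitIp);  // make it a fragment
    if (++target->hits >= HOT_THRESHOLD)   // now hot? start tracing right away
        return Action::StartTracing;
    return Action::Interpret;              // otherwise resume interpreting
}
```

The point of the sketch is just the ordering: run native code, look at the returned GuardRecord, turn the exit destination into a fragment if needed, then bump its counter.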

Trace Exits: LIR. I need to back up a bit in order to fully understand how trace exits work.

During trace recording, branch instructions (e.g., IL LBRT) require special handling. The trace is linear, so we just generate straight-line LIR according to the branch that was actually taken. This is fundamental: we are guessing that since we took a certain branch now on a hot trace, we’ll probably take the same branch many times more, so the program will run fast if we generate straight-line code for this case. But of course, on any future execution, we’re not guaranteed to take the same branch again, so when we pass this point, we have to do the check again and exit the trace if we get the opposite result. The check is called a guard, and the exit is called a side exit. Here is an example from the IL->LIR trace generation debug output:

T 11D6BE  BRF   -8:0 -3:10AF520 -3:10AD150 -3:10AD240 -2:0 -2:0 -3:10AD240 d:10
35 imm   #0
36 eq    33,#0
GG: ip 11D6C2 sp 100E0B4 rp 100616C
45 xf    36 -> 11D6C2

The IL instruction is BRF, “branch if top-of-stack is false”. In this case, the top of the stack is d:10, i.e., 10.0, so the interpreter doesn’t take the branch. But we’re more interested in tracing. Tracing of branch instructions is implemented by Interpreter::jump_if. First, jump_if emits LIR for the test, specifically to test if the top of the stack is zero. This is the “imm #0” and “eq 33,#0”.
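To check my understanding of jump_if, here is a toy recorder in C++ that turns a branch-if-false into the same three pieces: an immediate zero, an eq test, and an exit. The Ins and Recorder types and the jumpIfFalse and guard signatures are invented for illustration; the real guard also writes a SideExit inline in the LIR buffer, which this sketch leaves out. The rule it encodes is the one described above: the trace keeps going only while the test gives the same answer it gave during recording.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mini-LIR; opcode names echo TT's debug output.
enum class Op { Imm, Eq, Xf, Xt };

struct Ins {
    Op op;
    int a;                  // first operand (LIR index), or -1
    int b;                  // second operand (LIR index), or -1
    std::int64_t imm;       // immediate value (Imm only)
    std::uintptr_t exitIp;  // resume pc for side exits (Xf/Xt only)
};

struct Recorder {
    std::vector<Ins> lir;

    int emit(Op op, int a = -1, int b = -1, std::int64_t imm = 0,
             std::uintptr_t exitIp = 0) {
        lir.push_back(Ins{op, a, b, imm, exitIp});
        return static_cast<int>(lir.size()) - 1;
    }

    // Keep running the trace only if `cond` has the value it had while
    // recording; otherwise leave through a side exit to `exitIp`.
    void guard(int cond, bool recordedValue, std::uintptr_t exitIp) {
        emit(recordedValue ? Op::Xf : Op::Xt, cond, -1, 0, exitIp);
    }

    // Record a "branch if false" whose operand is the LIR value `tos`.
    // `tosWasZero` is what the interpreter actually saw this time, and
    // `exitIp` is the pc of the path the trace did NOT follow.
    void jumpIfFalse(int tos, bool tosWasZero, std::uintptr_t exitIp) {
        int zero = emit(Op::Imm, -1, -1, 0);   // imm #0
        int isZero = emit(Op::Eq, tos, zero);  // eq tos,#0
        guard(isZero, tosWasZero, exitIp);     // xf/xt isZero -> exitIp
    }
};
```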

Now comes the scary part, calling Interpreter::guard. To me, what this function does is closer to generating a trace exit, but it’s called guard, probably because it generates the branch instruction for side exits. It is also used for LIR_loop instructions, though, which don’t really have guards at all.

Naming questions aside, for side exits, as in our example, the first thing guard does is print out the “GG” line (if in debug mode). The rest of the line shows some interpreter states, and is probably helpful if debugging TT. Next, guard generates a SideExit structure (Fragmento.h) inline with the LIR to describe the exit. The SideExit records:

  • The interpreter state (frame, stack, return, and instruction pointers) as offsets from the interpreter state at the start of the trace.
  • The trace.
  • The target of the exit as a fragment, i.e., the (potential) start of another trace.
  • The current ActionScript call depth.

This records interpreter state that is not otherwise encoded. When I went over LIR generation and optimization, I realized that the LIR contains all the store instructions needed to maintain the current interpreter stack data. (Some are optimized away in the dead store elimination pass.) But the LIR doesn’t update the interpreter state’s fp, sp, rp, or ip. At every exit we might be going back to the interpreter, so we need to recreate the full interpreter state. The SideExit contains the necessary information.
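Here is a minimal sketch of the offset idea, assuming a simplified InterpState with just fp, sp, rp, and ip. All field and function names are my own, and I have left out the trace and target fields of SideExit. At record time the exit stores deltas from the trace-entry state; at exit time those deltas are added back to rebuild the full state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical interpreter state: frame, stack, return-stack, instruction pointers.
struct InterpState {
    std::intptr_t* fp;
    std::intptr_t* sp;
    std::intptr_t* rp;
    const std::uint8_t* ip;
};

// Hypothetical SideExit: deltas relative to the state at trace entry.
struct SideExit {
    std::ptrdiff_t fpAdj, spAdj, rpAdj, ipAdj;
    int callDepth;
};

// Record time: compare the state at the guard with the state at trace entry.
SideExit makeSideExit(const InterpState& entry, const InterpState& atGuard,
                      int depth) {
    return SideExit{atGuard.fp - entry.fp, atGuard.sp - entry.sp,
                    atGuard.rp - entry.rp, atGuard.ip - entry.ip, depth};
}

// Exit time: apply the deltas to the (entry) state the trace was given.
void applySideExit(InterpState& s, const SideExit& e) {
    s.fp += e.fpAdj;
    s.sp += e.spAdj;
    s.rp += e.rpAdj;
    s.ip += e.ipAdj;
}
```

Presumably deltas are used rather than absolute pointers because the same compiled trace can be entered with the stacks at different addresses each time.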

After writing the SideExit, guard generates an LIR branch instruction. In our example, we should exit if the test is false, so we generate an LIR_xf. Note the gap in instruction sequence numbers (36 to 45): it is there because of the space taken up by the SideExit.

guard handles LIR_loop exits (jumps to the trace header) a little differently. Instead of writing a SideExit, guard emits LIR instructions that directly update the interpreter state. I’m not entirely sure why this is. I also think that in most cases no adjustments are required, because the interpreter stack size and types should be the same every time control passes a given point. It may have something to do with recursion.

Trace Exits: Native Code. A trace exit in LIR is a LIR_xf, LIR_xt, LIR_x, or LIR_loop. These all have cases inside Assembler::gen. For xf, xt, and x, the assembler calls asm_exit to generate exit target code, then generates native JMP/JE/JNE/Jx instructions that branch to the target. For loop, the assembler just generates a JMP instruction.
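A toy version of that dispatch might look like the following. It records mnemonics as strings instead of emitting bytes, assumes the condition is a 0/1 value compared against zero (so xf becomes JE and xt becomes JNE), and ignores the order in which the real assembler emits code. ToyAssembler, LOpcode, and asmExit are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class LOpcode { Xf, Xt, X, Loop };

// Hypothetical assembler that records the instructions it would emit.
struct ToyAssembler {
    std::vector<std::string> code;      // main trace
    std::vector<std::string> exitCode;  // separate page for exit stubs
    int nextStub = 0;

    // Stand-in for asm_exit/nFragExit: emit a stub, return its label.
    std::string asmExit() {
        std::string label = "exit" + std::to_string(nextStub++);
        exitCode.push_back(label + ": <restore state; return GuardRecord>");
        return label;
    }

    void gen(LOpcode op) {
        switch (op) {
        case LOpcode::Xf:   code.push_back("JE " + asmExit()); break;   // exit if cond == 0
        case LOpcode::Xt:   code.push_back("JNE " + asmExit()); break;  // exit if cond != 0
        case LOpcode::X:    code.push_back("JMP " + asmExit()); break;  // unconditional exit
        case LOpcode::Loop: code.push_back("JMP loop_header"); break;   // no stub needed
        }
    }
};
```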

asm_exit is hard to understand, but I think I have the gist of it. The key action is calling nFragExit, which generates the exit target code. This code is generated on a separate page that is allocated for trace exits at the beginning of assembly (_nExitIns is the current position). nFragExit takes the SideExit struct as its argument. The SideExit gives the target of the exit as a Fragment, which is a loop/trace header that may or may not have a compiled trace. Reading backwards, nFragExit generates code to:

  • Update the interpreter state using the offsets recorded in the SideExit.
  • Ensure that param 0 of the trace is stored in the standard param 0 argument-passing register. This is needed if the exit code is ever set up to jump directly to another trace, since that trace will expect param 0 in the usual place. (Param 0 is a pointer to the interpreter state.)
  • Return a newly created GuardRecord (Assembler.h). The GuardRecord is the native code equivalent of a SideExit. Like SideExit, it is stored inline with the code (the native exit code). The GuardRecord is created by placeGuardRecord and holds the current fragment, target fragment, and call depth.
  • Restore the ISA stack pointer (x86 esp).
  • Jump to the trace epilog.
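Written as C++ instead of machine code, and in execution order rather than the order nFragExit emits it, the exit stub does roughly the following. The structs are simplified guesses (only sp, rp, and ip are tracked), and ExitStub is a hypothetical name for the stub plus the GuardRecord placed next to it. The last two steps, restoring esp and jumping to the epilog, collapse into an ordinary C++ return.

```cpp
#include <cassert>
#include <cstddef>

struct Fragment;  // opaque in this sketch

// Hypothetical simplified interpreter state.
struct InterpState {
    long* sp;
    long* rp;
    const unsigned char* ip;
};

// Hypothetical SideExit: state deltas plus bookkeeping.
struct SideExit {
    std::ptrdiff_t spAdj, rpAdj, ipAdj;
    Fragment* from;
    Fragment* target;
    int calldepth;
};

// Hypothetical GuardRecord: the native-code twin of SideExit.
struct GuardRecord {
    Fragment* from;
    Fragment* target;
    int calldepth;
};

struct ExitStub {
    SideExit exit;       // deltas baked in at assembly time
    GuardRecord record;  // placed next to the stub, like placeGuardRecord

    explicit ExitStub(const SideExit& e)
        : exit(e), record{e.from, e.target, e.calldepth} {}

    // Runtime behavior of the stub, in execution order.
    GuardRecord* run(InterpState* state /* param 0 */) {
        state->sp += exit.spAdj;  // 1. update the interpreter state
        state->rp += exit.rpAdj;
        state->ip += exit.ipAdj;
        // 2. param 0 (state) is already where a callee expects it here.
        return &record;           // 3-5. hand back the GuardRecord and return
    }
};
```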

The trace epilog, by the way, is the same for every trace, and on x86 it pops the ISA frame pointer (ebp; twice, because it is pushed twice for some alignment reason) and returns. This is just the “second half” of the standard C return-from-function sequence.

The exit code can be summarized as updating the interpreter state and then both doing the “first half” of return-from-function and preparing a function call to another trace. That way, the ending JMP can be pointed at either the main exit to the interpreter, or made to jump directly to another trace, and either works fine.

Another detail is that if the target of the exit has already been compiled to native code, instead of generating a jump to the trace epilog, nFragExit generates a jump directly to the target trace. (It also skips creating the GuardRecord). This is nice because then the code doesn’t have to return to the interpreter at all, it just keeps executing native code.
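C++ can’t express the patched JMP directly, but a trampoline gives the same observable behavior, so here is a sketch under that substitution. Fragment, GuardRecord, TraceFn, and runChained are hypothetical. Also note that TT makes this decision at assembly time (in nFragExit), so it never builds a GuardRecord on that path, whereas this loop decides at run time.

```cpp
#include <cassert>

struct InterpState { int counter; };  // hypothetical stand-in
struct GuardRecord;
using TraceFn = GuardRecord* (*)(InterpState*);

struct Fragment { TraceFn code; };         // code == nullptr: not compiled
struct GuardRecord { Fragment* target; };  // where the exit wants to go

// Keep running native traces until an exit lands somewhere uncompiled.
GuardRecord* runChained(Fragment* f, InterpState* state, int* tracesRun) {
    for (;;) {
        GuardRecord* lr = f->code(state);  // same state pointer: "param 0"
        ++*tracesRun;
        if (lr->target == nullptr || lr->target->code == nullptr)
            return lr;                     // nothing compiled: back to interpreter
        f = lr->target;                    // compiled target: run it directly
    }
}
```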

asm_exit wraps the call to nFragExit with a pair of calls to swapptrs. This is a macro defined in Native*.h that swaps the pointer to the current position in the native trace code buffer (_nIns) and the current position in the native exit code buffer (_nExitIns). This is just so the macros that generate code can always refer to _nIns as the place to store native code.
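The swapping trick is easy to show in isolation. This sketch assumes, as the “reading backwards” above suggests, that code is written from the end of each buffer toward the start. ToyAssembler, emitByte, and asmExitSketch are made-up names, and the bytes are placeholders (0xC3 is the x86 RET opcode, 0x74 the JE rel8 opcode).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

using NIns = std::uint8_t;

struct ToyAssembler {
    NIns* _nIns;      // current position in the main trace buffer
    NIns* _nExitIns;  // current position in the exit-code buffer

    // Every emitter writes at _nIns, moving toward the buffer start.
    void emitByte(NIns b) { *--_nIns = b; }

    void swapptrs() { std::swap(_nIns, _nExitIns); }

    void asmExitSketch() {
        swapptrs();       // emitters now write into the exit page
        emitByte(0xC3);   // placeholder for the exit stub's code
        swapptrs();       // back to the main trace buffer
        emitByte(0x74);   // placeholder for the branch to the stub
    }
};
```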

Finally, asm_exit does a bunch of fancy register allocation stuff. I don’t completely understand it, but I think it’s just needed because the register allocation algorithm is a greedy algorithm for straight-line code, and it needs a little tweak when there is a branch. It looks like asm_exit first saves a copy of the allocations and then clears them out so the exit code area has a clear set to work with, as it should (the only data passed out of the exit via registers are the return value and param 0, which the exit code does set up). Once nFragExit returns, the register tracker holds allocations for whatever values, if any, the exit code needs. At this point, mergeRegisterState is called with the current register tracker and the saved tracking data to fix everything up. The fixing is basically that if the exit code expects, say, ecx, to contain a certain value, and the main trace has a different value in ecx, a move needs to be generated at the start of the exit code to get the exit code’s value into ecx.
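Here is my reading of the fix-up step as a small C++ function. The register-to-value maps, the string mnemonics, and the “spill slot” fallback for values that aren’t in any register are assumptions for illustration, not TT’s actual data structures. It also emits the moves in map order, so it would get a register swap (exit wants ecx and edx exchanged) wrong; a real implementation would need to order the moves or use a temporary.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Register name -> name of the value it holds (hypothetical representation).
using RegState = std::map<std::string, std::string>;

// For each register the exit code expects to hold some value, emit a move
// at the start of the exit code unless the main trace already has it there.
std::vector<std::string> mergeRegisterStateSketch(const RegState& mainTrace,
                                                  const RegState& exitCode) {
    std::map<std::string, std::string> whereInMain;  // value -> register
    for (const auto& [reg, val] : mainTrace) whereInMain[val] = reg;

    std::vector<std::string> moves;
    for (const auto& [reg, val] : exitCode) {
        auto cur = mainTrace.find(reg);
        if (cur != mainTrace.end() && cur->second == val)
            continue;                                // already in place
        auto src = whereInMain.find(val);
        if (src != whereInMain.end())
            moves.push_back("mov " + reg + ", " + src->second);
        else                                         // hypothetical: reload from memory
            moves.push_back("load " + reg + ", [spill slot of " + val + "]");
    }
    return moves;
}
```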

Reentrancy. One last thing I want to think about is the issue of reentrancy. We’ve been told that TT isn’t reentrant. Specifically, a native method (implemented in C++) can’t call back into ActionScript. But I never clearly understood why this is. I’ll probably be wrong about half of this: experts, please jump in and correct.

The problem could exist at multiple levels, but I think the simplest issue is that the interpreter isn’t reentrant, for the usual reasons of having interpreter-global data structures. For example, a reentrant interpreter would need to have a mechanism for recording the reentry on some sort of stack. Also, if the native method interacts with the Forth stack, the system would need to be very careful about managing that. None of this seems fundamental, just tricky and not done yet.

The other question is what happens to tracing with reentry. One possibility is to stop tracing when entering a possibly reentrant native method, and then possibly start tracing when a native method calls back into ActionScript (i.e., consider a reentry to be a fragment header). This seems like it would work. Another possibility is to allow some declarations on native methods to describe their effects on the interpreter state, so that tracing could actually continue through the reentrant calls. Such a mechanism sounds hard to use, though, and would probably be used only on really important methods in a few places, if at all.
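The first option can be sketched as follows: stop any in-progress recording when calling a native method that might reenter, and give the reentrant ActionScript call its own interpreter state so it doesn’t trample the outer one. Interp, Activation, callNative, and reenter are all hypothetical names; this only models the bookkeeping, not real native calls or tracing.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical per-activation state, standing in for the interpreter-global
// data that makes a non-reentrant interpreter unsafe to re-enter.
struct Activation {
    bool recording = false;  // is this activation recording a trace?
};

struct Interp {
    std::vector<Activation> activations{Activation{}};  // outermost activation
    int abortedRecordings = 0;
    int reentries = 0;

    Activation& current() { return activations.back(); }

    // Calling a native method that might reenter: stop recording first,
    // because the native code in between cannot be traced.
    void callNative(const std::function<void(Interp&)>& native) {
        if (current().recording) {
            current().recording = false;
            ++abortedRecordings;
        }
        native(*this);
    }

    // Native code calling back into ActionScript: push a fresh activation
    // so the outer one is untouched; a new trace could start from here,
    // much like at a fragment header.
    void reenter(const std::function<void(Interp&)>& body) {
        activations.push_back(Activation{});
        ++reentries;
        body(*this);
        activations.pop_back();
    }
};
```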

Comments

Comment from Boris
Time: May 28, 2008, 7:16 pm

Hmm. Most of the commonly-used DOM methods can potentially reenter JS (e.g. via mutation events), though they usually do not. What does that mean in terms of tracing the calls into those methods?

Comment from Tom Reilly
Time: May 29, 2008, 6:42 am

Great posting! TT isn’t re-entrant because we haven’t implemented it. I.e., interpMethodEnv assumes it’s using the whole stack, and bad things happen if interpMethodEnv is re-entered on the same thread. Initially we’ll probably just exit in-process traces on re-entrance, but traces recorded in the re-entrant call will work; we could do more, but that would be a good start. We never plan to trace across re-entrant calls: with unknown, arbitrary C code between the exit of the VM and the next interpMethodEnv, we can’t guarantee a coherent trace.
Philosophically re-entrance should be avoided for that reason but in practice it will be necessary for certain situations.

Comment from Rick Reitmaier
Time: May 30, 2008, 8:01 am

LIR_loop exits update the interpreter state because we don’t
have logic that hoists instructions out into a loop header.
So when the code jumps back to the top of the trace it’s
re-loading sp/rp etc, out of InterpState and thus we need
to keep this structure current.

On the issue of reentrancy and tracing. I can see how we
could pause/resume or halt a trace upon detecting reentrancy,
but it’s unclear to me how we can guarantee correctness when
tracing into a re-entrant call, if I understood you correctly.

Comment from dmandelin
Time: May 30, 2008, 10:26 am

LIR_loop: What I didn’t understand is that if the jump to the header is a local jump, then I think the sp/rp have to have the same values as they did on entry, so no adjustment would be needed. And I haven’t seen adjustments yet in the debugging output for my trivial examples. Under what circumstances do they occur? Recursive calls? That’s another thing that I noticed in there but didn’t totally figure out.

Reentry: I think I get the gist of what you’re saying, but it feels like there is still a big gap in my understanding. Maybe I don’t know enough about what effect native methods can have on the state–I didn’t look into them too much. Could you give an example of an effect native methods can have that break tracing?

I think at the Tamarin summit you mentioned something along the lines of splitting the native method into 2 parts, an ES or Forth part and a native part, and then putting the callback in the ES/Forth part. In principle this seems like a good solution; I guess the only question is what it actually looks like with the callbacks in question.

Comment from Edwin Smith
Time: June 3, 2008, 12:02 pm

As you suspected, the reason LIR_loop updates the interp state is for recursion. If we detect recursion then you’ll be adding some amount to sp, rp, and f each time through the loop. Recursion also forms a loop when unwinding, subtracting from the pointers.

Recursion is detected by isRecursive() in the trace code for ENTERABC and EXITABC; if we’re calling into or returning into a method that is already on the call stack, then we treat the ENTERABC or EXITABC as a backwards jump.

You might find it odd that SideExit has data to update the interpreter state, but the stores to the stacks are on the main trace. This is an arbitrary decision, which is worth revisiting at some point. We could store the interpState updates on the main trace too, and then SideExit would be just the target and other sundry items. Then, store removal and store sinking could enable those stores to move to the side exits the same way stack stores could be moved.

Comment from Edwin Smith
Time: June 10, 2008, 11:21 am

> Could you give an example of an effect native methods can have that break tracing?

Say the call stack is Forth -> Native C++ -> Forth

We don’t have a way of inlining the native C++ code, which would be required if we want to continue tracing down into the callee Forth code.

If we could inline the native C++ code, the inlined code would construct the C++ stack frame, such that if a Forth side exit were taken by the callee, you’d return to the interpreter, which could “return” to the native C++, and eventually return back to the outer interpreter.

We haven’t attempted anything like that. The native C++ call is traced as a call rather than being inlined, which means the inner Forth code would be interpreted even if the outer Forth code was traced (although the inner Forth code could start looping and be independently traced). For simultaneous tracing to work, the Interpreter instance in AS3 would need to not be global but rather be stack allocated, or at least one per C++ interpreter stack frame.

Comment from dmandelin
Time: June 10, 2008, 12:17 pm

So it sounds like even the limitations on tracing aren’t fundamental, but require increasingly hard engineering. Good to hear.
