Safer refactoring in Nanojit

Recently I’ve been making a lot of refactoring changes to Nanojit that I think of as “code hygiene” — changes that clean up the code, but don’t change its behaviour at all.  (Example 1, example 2.)  When making changes like these I like to do the changes carefully and gradually to make sure that I don’t affect behaviour unintentionally.  In Nanojit’s case, this means that the generated code shouldn’t change.  But it’s always possible to make mistakes, and sometimes a particular change is disruptive enough that you can’t evolve the code from one state to another without breaking it temporarily.

In such cases, if the regression tests pass it tells me that the new code works, but it doesn’t tell me if I’ve managed to avoid changing the behaviour as I intended.  So I’ve started doing comparisons of the native code generated by Nanojit for parts of SunSpider using TraceMonkey’s verbose output, which is controlled with the TMFLAGS environment variable.  For such changes, the generated code should be identical to an unchanged version.  The way I organise my workspaces is such that I always have an unchanged repository available, which makes such comparisons easy.

Except, the generated code is never quite identical because it contains addresses and the slightest change in the executable can affect these.  So I run the verbose output through this script:

perl -p -e 's/[0-9a-fA-F]{4,}/....../g'

It just replaces numbers with four or more digits with “……”, which is enough to filter out these address differences.

I tried this technique on all of SunSpider, but it turns out it doesn’t work reliably, because crypto-aes.js is non-deterministic — it contains a call to Date.getTime() which introduces enough non-determinism that TraceMonkey sometimes traces different code.  A couple of the V8 benchmarks have similar non-deterministic behaviours.  Non-determinism is a rather undesirable feature of a benchmark suite so I filed a SunSpider bug which is yet to be acted on.

Fortunately, experience has shown me that I don’t need to do code diffs on all of SunSpider to be confident that I haven’t changed Nanojit’s behaviour — doing code diffs on two or three of the bigger benchmarks suffices, as enough code is generated that even small behavioural differences are quickly found.

In fact, this code diff technique is useful enough that I’ve started using it sometimes even when I do want to change behaviour — I split such changes into two parts, one which doesn’t change the behaviour and one which does.  For example, when I added an opcode to distinguish between 64-bit integer loads and 64-bit integer floats I landed a non-behaviour-changing patch first, then did a follow-up change that improved x86-64 code generation.  This turned out to be a good idea because I introduced a regression with the follow-up change, and the fact that it was separated from the non-behavioural changes made it easy to determine what had gone wrong, because the regressing patch was much smaller than a combined patch would have been.

I also recently wrote a script to automate the diff process, which makes it that much easier to perform, which in turn makes it that much more likely that I’ll actually do it.

In summary, it’s always worth thinking about new ways to test your code, and when you find a good one, it’s worth making it as easy as possible to run those tests.

4 Responses to Safer refactoring in Nanojit

  1. Nice to hear how you verify your NJ changes. For interpreter/tracer changes that should not change behavior, I’ve been more and more trying to do a similar two-patch strategy where the first patch computes using both the new and old way and asserts they are the same and the second patch drops the old way. Running that through trace, ref, and mochi tests brings subtle issues to light early and in a particularly easy-to-debug setting.

  2. Nick: yes, using a single clone with MQ and two patches I maintain until I am ready to commit; then I qfold, qfinish, and push. With vertically-split windows in vim, laying out the original and .rej side by side, and copious visual-mode x/y/p, I find the .rej files produced by MQ conflicts to be almost as fast to deal with as the <<>> produced by 3-way merge. I find pull –rebase with multiple patches applied to be frustrating.