table-driven register reading in rr

Many of the changes done for porting rr to x86-64 were straightforward, mechanical changes: change this type or format directive, add a template parameter to this function, search-and-replace all SYS_* occurrences and so forth. But the way register reading for rr’s GDB stub fell out was actually a marked improvement on what went before and is worth describing in a little more detail.

At first, if rr’s GDB stub wanted to know register values, it pulled them directly out of the Registers class:

enum DbgRegisterName {
  DREG_EAX,
  DREG_ECX,
  // ...and so on.
};

// A description of a register for the GDB stub.
struct DbgRegister {
  DbgRegisterName name;
  long value;
  bool defined;
};

static DbgRegister get_reg(const Registers* regs, DbgRegisterName which) {
  DbgRegister reg;
  reg.name = which;
  reg.defined = true;
  switch (which) {
  case DREG_EAX:
    reg.value = regs->eax;
    return reg;
  case DREG_ECX:
    reg.value = regs->ecx;
    return reg;
  // ...and so on.
  default:
    reg.defined = false;
    return reg;
  }
}

That wasn’t going to work well if Registers supported several different architectures at once, so we needed to push that reading into the Registers class itself. Not all registers on a target fit into a long, either, so we needed to indicate how many bytes wide each register was. Finally, we also needed to indicate whether the target knew about the register or not: GDB is perfectly capable of asking for SSE (or even AVX) registers, but the chip running the debuggee might not support those registers, for instance. So we wound up with this:

struct DbgRegister {
  unsigned int name;
  uint8_t value[DBG_MAX_REG_SIZE];
  size_t size;
  bool defined;
};

static DbgRegster get_reg(const Registers* regs, unsigned int which) {
  DbgRegister reg;
  memset(&reg, 0, sizeof(reg));
  reg.size = regs->read_register(&reg.value[0], which, &reg.defined);
  return reg;
}
...
template<typename T>
static size_t copy_register_value(uint8_t* buf, T src) {
  memcpy(buf, &src, sizeof(src));
  return sizeof(src);
}

// GDBRegister is an enum defined in Registers.
size_t Registers::read_register(uint8_t* buf, GDBRegister regno, bool* defined) const {
  *defined = true;
  switch (regno) {
  case DREG_EAX:
    return copy_register_value(buf, eax);
  case DREG_ECX:
    return copy_register_value(buf, ecx);
  // ...many more integer and segment registers...
  case DREG_XMM0:
    // XMM registers aren't supplied by Registers, but they are supplied someplace else.
    *defined = false;
    return 16;
  // ...and so on
  }
}

The above changes were serviceable, if a little repetitive. When it came time to add x86-64 support to Registers, however, writing out that same basic switch statement for x86-64 registers sounded unappealing.

And there were other things that would require the same duplicated code treatment, too: when replaying traces, rr compares the replayed registers to the recorded registers at syscalls and halts the replay if the registers don’t match (rr calls this “divergence” and it indicates some kind of bug in rr). The x86-only comparison routine looked like:

// |name1| and |name2| are textual descriptions of |reg1| and |reg2|, respectively,
// for error messages.
bool Registers::compare_register_files(const char* name1, const Registers* reg1,
                                       const char* name2, const Registers* reg2,
                                       int mismatch_behavior) {
  bool match = true;

#define REGCMP(_reg)                                     \
  do {                                                   \
    if (reg1-> _reg != reg2-> _reg) {              \
      maybe_print_reg_mismatch(mismatch_behavior, #_reg, \
                               name1, reg1-> _reg,	   \
                               name2, reg2-> _reg);	   \
      match = false;					   \
    }							   \
  } while (0)

  REGCMP(eax);
  REGCMP(ebx);
  // ...and so forth.

  return match;
}

Duplicating the comparison logic for twice as many registers on x86-64 didn’t sound exciting, either.

I didn’t have to solve both problems at once, so I focused solely on getting the registers for GDB. A switch statement with almost-identical cases indicates that you’re doing too much in code, when you should be doing things with data instead. In this case, what we really want is this: given a GDB register number, we want to know whether we know about this register, and if so, how many bytes to copy into the outparam buffer and where to copy those bytes from. A lookup table is really the data structure we should be using here:

// A data structure describing a register.
struct RegisterValue {
  // The offsetof the register in user_regs_struct.
  size_t offset;

  // The size of the register.  0 means we cannot read it.
  size_t nbytes;

  // Returns a pointer to the register in |regs| represented by |offset|.
  // |regs| is assumed to be a pointer to |struct user_regs_struct| of the appropriate
  // architecture.
  void* pointer_into(void* regs) {
    return static_cast<char*>(regs) + offset;
  }

  const void* pointer_into(const char* regs) const {
    return static_cast<const char*>(regs) + offset;
  }
};

template<typename Arch>
struct RegisterInfo;

template<>
struct RegisterInfo<rr::X86Arch> {
  static struct RegisterValue registers[DREG_NUM_LINUX_I386];
};
struct RegisterValue RegisterInfo<rr::X86Arch>::registers[DREG_NUM_LINUX_I386];

static void initialize_register_tables() {
  static bool initialized = false;

  if (initialized) {
    return;
  }

  initialized = true;
#define RV_X86(gdb_suffix, name) \
  RegisterInfo<rr::X86Arch>::registers[DREG_##gdb_suffix] = { \
    offsetof(rr::X86Arch::user_regs_struct, name), \
    sizeof(((rr::X86Arch::user_regs_struct*)0)->name) \
  }
  RV_X86(EAX, eax);
  RV_X86(ECX, ecx);
  // ...and so forth.  Registers not explicitly initialized here, e.g. DREG_XMM0, will
  // be zero-initialized and therefore have a size of 0.
}

template<typename Arch>
size_t Registers::read_register_arch(uint8_t buf, GDBRegister regno, bool* defined) const {
  initialize_register_tables();
  RegisterValue& rv = RegisterInfo::registers[regno];
  if (rv.nbytes == 0) {
    *defined = false;
  } else {
    *defined = true;
    memcpy(buf, rv.pointer_into(ptrace_registers()), rv.nbytes);
  }

  return rv.nbytes;
}

size_t Registers::read_register(uint8_t buf, GDBRegister regno, bool* defined) const {
  // RR_ARCH_FUNCTION performs the return for us.
  RR_ARCH_FUNCTION(read_register_arch, arch(), buf, regno, defined);
}

Much better, and templating enables us to (re)use the RR_ARCH_FUNCTION macro we use elsewhere in the rr codebase.

Having to initialize the tables before using them is annoying, though. And after debugging a failure or two because I didn’t ensure the tables were initialized prior to using them, I wanted something more foolproof. Carefully writing out the array definition with zero-initialized members in the appropriate places sounded long-winded, difficult to read, and prone to failures that were tedious to debug. Ideally, I’d just specify the array indices to initialize along with the corresponding values, and let zero-initialization take care of the rest.

Initially, I wanted to use designated initializers, which are a great C99 feature that provided exactly what I was looking for, readable initialization of specific array indices:

#define RV_X86(gdb_suffix, name) \
  [DREG_##gdb_suffix] = { \
    offsetof(rr::X86Arch::user_regs_struct, name), \
    sizeof(((rr::X86Arch::user_regs_struct*)0)->name) \
  }

struct RegisterValue RegisterInfo<rr::X86Arch>::registers[DREG_NUM_LINUX_I386] = {
  RV_X86(EAX, eax),
  RV_X86(ECX, ecx),
};

I assumed that C++11 (which is the dialect of C++ used in rr) would support something like this, and indeed, my first attempts worked just fine. Except when I added more initializers:

struct RegisterValue RegisterInfo<rr::X86Arch>::registers[DREG_NUM_LINUX_I386] = {
  RV_X86(EAX, eax),
  RV_X86(ECX, ecx),
  // ...other integer registers, eflags, segment registers...
  RV_X86(ORIG_EAX, orig_eax);
};

I got this curious error message from g++: “Sorry, designated initializers not supported.”

But that’s a lie! You clearly do support designated initializers, because I’ve been able to compile code with them!

Well, yes, g++ supports designated initializers for arrays. Except that the indices have to be in consecutive order from the beginning of the array. And all the integer registers, EIP, and the segment registers appeared in consecutive order in the enumeration, starting with DREG_EAX = 0. But DREG_ORIG_EAX came some 20 members after DREG_GS (the last segment register), and so there we had initialization of non-consecutive array indices, and g++ complained. In other words, designated initializers in g++ are useful for documentation and/or readability purposes, but they’re not useful for sparse arrays. Hence the explicit initialization approach described above.

Turns out there’s a nicer, C++11-ier way to handle these kind of sparse arrays, at the cost of some extra infrastructure. Instead of an explicit [] variable, you can have a class with an operator[] overload and a constructor that takes std::initializer_list<std::pair<size_t, YourActualClass>>. The first member of the pair is the index into your sparse array, and the second member is the value you want placed there. The constructor takes care of putting everything in its place; the actual code looks like this:

typedef std::pair<size_t, RegisterValue> RegisterInit;

// Templated over N because different architectures have differently-sized register files.
template<size_t N>
struct RegisterTable : std::array<RegisterValue, N> {
  RegisterTable(std::initializer_list<RegisterInit> list) {
    for (auto& ri : list) {
      (*this)[ri.first] = ri.second;
    }
  }
};

template<>
struct RegisterInfo<rr::X86Arch> {
  static const size_t num_registers = DREG_NUM_LINUX_I386;
  typedef RegisterTable<num_registers> Table;
  static Table registers;
};

RegisterInfo<rr::X86Arch>::Table RegisterInfo<rr::X86Arch>::registers = {
  // RV_X86 macro defined similarly to the previous version.
  RV_X86(EAX, eax),
  RV_X86(ECX, ecx),
  // ...many more register definitions
};

// x86-64 versions defined similarly...

There might be a better way to do this that’s even more C++11-ier. Comments welcome!

Once this table-driven scheme was in place, using it for comparing registers was straightforward: each RegisterValue structure adds a name field, for error messages, and a comparison_mask field, for applying to register values prior to comparison. Registers that we don’t care about comparing can have a register mask of zero, just like registers that we don’t want to expose to GDB can have a “size” of zero. A mask is more flexible than a bool compare/don’t compare field, because eflags in particular is not always determinstic between record and replay, so we mask off particular bits of eflags prior to comparison. The core comparison routine then looks like:

template<typename Arch>
static bool compare_registers_core(const char* name1, const Registers* reg1,
                                   const char* name2, const Registers* reg2,
                                   int mismatch_behavior) {
  bool match = true;

  for (auto& rv : RegisterInfo<Arch>::registers) {
    if (rv.nbytes == 0) {
      continue;
    }

    // Disregard registers that will trivially compare equal.
    if (rv.comparison_mask == 0) {
      continue;
    }

    // XXX correct but oddly displayed for big-endian processors.
    uint64_t val1 = 0, val2 = 0;
    memcpy(&val1, rv.pointer_into(reg1.ptrace_registers()), rv.nbytes);
    memcpy(&val2, rv.pointer_into(reg2.ptrace_registers()), rv.nbytes);
    val1 &= rv.comparison_mask;
    val2 &= rv.comparison_mask;

    if (val1 != val2) {
      maybe_print_reg_mismatch(mismatch_behavior, rv.name, name1, val1, name2,
                               val2);
      match = false;
    }
  }

  return match;
}

which I thought was a definite improvement on the status quo. Additional, architecture-specific comparisons can be layered on top of this (see, for instance, the x86-specialized versions in rr’s Registers.cc).

Eliminating or reducing code by writing data structures instead is one of those coding tasks that gives me a warm, fuzzy feeling after I’ve done it. In addition to the above, I was also able to use the RegisterValue structures to provide architecture-independent dumping of register files, which was another big win in making Registers multi-arch aware. I was quite pleased that everything worked out so well.

Tags: ,

Comments are closed.