Category Archives: Data structures

N-ary trees in C

If you want to implement an N-ary tree of integers in C, the obvious way to represent each node is something like the following.

struct t {
    int n;
    int numKids;
    struct t **kids;
}

Here kids is an array of kid pointers.  Here’s an example tree.

N-ary tree with kids array

This representation is a bit annoying because the kids array must be allocated separately from the node, and you need to reallocate the kids array every time you add a new kid to a node. Alternatively, you could instead over-allocate in anticipation of more nodes being added, but then you have to record the array capacity within the node.  (The latter representation is exactly what I used for the tree of stack traces in Massif.)

While reading through the code for trace-malloc yesterday, I learnt a better representation.

struct t {
    int n;
    struct t *kids;
    struct t *siblings;
}

This looks odd at first.  Here’s the example tree with this new representation.

N-ary tree with sibling pointers

From any single node you can only directly access its first kid, through the kids pointer. You access the second kid through kids->siblings, the third kid through kids->siblings->siblings, and so on.  In other words, instead of storing a node’s kids in an array, you store them in a list; it just so happens that this list can be embedded elegantly in the nodes themselves. (There’s an argument to be made that the fields should be called firstKid and nextSibling, though I can see the sense in both naming schemes.)

There are several nice things about this representation.

  • Each node is a fixed size;  no auxiliary arrays are needed.
  • Many algorithms can be expressed more easily because it’s just a binary tree.
  • You can treat each sibling list as a move-to-front list.  In the example above, if you accessed node 11 you would move nodes 4 and 11 to the front of their sibling lists. This may make subsequent accesses faster if you have locality in your access patterns. (You could also do this with the first representation, but you’d have to shuffle some of the pointers in the kids array along by one.)

It appears this is a moderately well-known representation — this question on Stack Overflow has two answers, one for each of the representations I described.  But I didn’t know about it, so I thought I’d share it.