N-ary trees in C

If you want to implement an N-ary tree of integers in C, the obvious way to represent each node is something like the following.

struct t {
    int n;
    int numKids;
    struct t **kids;
}

Here kids is an array of kid pointers.  Here’s an example tree.

N-ary tree with kids array

This representation is a bit annoying because the kids array must be allocated separately from the node, and you need to reallocate the kids array every time you add a new kid to a node. Alternatively, you could instead over-allocate in anticipation of more nodes being added, but then you have to record the array capacity within the node.  (The latter representation is exactly what I used for the tree of stack traces in Massif.)

While reading through the code for trace-malloc yesterday, I learnt a better representation.

struct t {
    int n;
    struct t *kids;
    struct t *siblings;
}

This looks odd at first.  Here’s the example tree with this new representation.

N-ary tree with sibling pointers

From any single node you can only directly access its first kid, through the kids pointer. You access the second kid through kids->siblings, the third kid through kids->siblings->siblings, and so on.  In other words, instead of storing a node’s kids in an array, you store them in a list; it just so happens that this list can be embedded elegantly in the nodes themselves. (There’s an argument to be made that the fields should be called firstKid and nextSibling, though I can see the sense in both naming schemes.)

There are several nice things about this representation.

  • Each node is a fixed size;  no auxiliary arrays are needed.
  • Many algorithms can be expressed more easily because it’s just a binary tree.
  • You can treat each sibling list as a move-to-front list.  In the example above, if you accessed node 11 you would move nodes 4 and 11 to the front of their sibling lists. This may make subsequent accesses faster if you have locality in your access patterns. (You could also do this with the first representation, but you’d have to shuffle some of the pointers in the kids array along by one.)

It appears this is a moderately well-known representation — this question on Stack Overflow has two answers, one for each of the representations I described.  But I didn’t know about it, so I thought I’d share it.

8 Responses to N-ary trees in C

  1. Embedding the collection’s pointers in the data struct and making all structs the same size so they can be allocated from a slab is the way the linux kernel would do it.

  2. Isn’t the 2 answers to the Stack Overflow question effectively the same? Unless I’m missing something, I don’t think it shows your 1st data structure, where the parent contains pointers to all children.

    • Nicholas Nethercote

      You’re right that the first answer doesn’t match my first structure, thanks for pointing that out.

      But it’s also not the same as my second structure. It uses a list of node pointers instead of an array, i.e. it lacks the “elegant embedding” I mentioned. So I think it’s conceptually closer to my first structure. It’s also quite ugly!

  3. I first encountered these trees in the research code library for image processing Megawave 2 (from a French university), where they are extensively used. I believed that they were now a bit “old fashioned”, and it took me a bit of time to get used to them. But in the end they’re really handy, after defining a few convenience macros. Thanks for the post!

  4. Jos van den Oever

    The DOM tree in the browser may have a similar hierarchy:
    struct Node {
    struct Node *parent;
    struct Node *firstChild;
    struct Node *nextSibling;
    struct Node *prevSibling;
    // name and attributes
    };
    The ‘n’ count is replaced by using a zero pointer to indicate the end of a list. The availability of the parent node allows moving the structure and then updating the references to it. So nodes could be placed in a number of continuous arrays that can be reorganized.

  5. Here’s how the GLib does it in GNode.

    struct GNode {
    gpointer data;
    GNode *next;
    GNode *prev;
    GNode *parent;
    GNode *children;
    };

    gpointer is a void *, but you can store integers there too with GPOINTER_TO_INT.

  6. This feels familiar to me, like I’ve probably used it in code somewhere, or worked with code that uses it, but I’d have to dig around to find it. But if I did use it I think it was just out of convenience, not necessarily thinking through all the applications. I don’t think I ever really visualized the Z shape of the resulting structure either. Thanks for the nice clean layout like that–I’m a sucker for good data structures and algorithms.

  7. I believe libxml2 also utilizes this type of data structure in their trees.