Constructing binary trees using relative branch encoding

In this third article in what is rapidly becoming a series on binary trees, we’ll have a look at another way of constructing generic binary trees from a serialised format. For this, we’ll build on some of the techniques and insights from the previous articles:

  1. Reconstructing binary trees from traversals: the initial post which dealt with constructing binary trees from pre-order and in-order sequences
  2. Creating a forest from a single seed followed up by creating different trees conforming to the same pre-order sequence.

In the first article we covered how a generic (unordered) binary tree cannot be constructed from a single depth-first search sequentialisation. Lacking structural information, a second sequentialisation is required, in-order with one or pre or post-order. This is all well and good when the values in the tree are small (measured in bytes), but as they get larger, there is a considerable overhead in this duplication of values. Compression will certainly help, but is unlikely to remove it entirely.

The second article introduced the notion of attachment loci of a tree under construction: the possible locations where the next node/value from a pre-order sequence could be placed. We only needed them for constructing tree permutations at the time, but with a few tweaks, we should be able to use them to provide the structural information to supplement the single sequentialisation.

more ...

Creating a forest from a single seed

In the previous post we explored recreating a binary tree from a pair of sequentialisations. We needed a pair of them because any single sequence by itself doesn’t uniquely describe the tree, without some additional bit of information, the sequence itself leaves a certain level of ambiguity.

But exactly how ambiguous is a single traversal result? How many different trees can we make that fit a given sequence in isolation? What sort of structure is there in them? Fun questions we can answer with code!

Seeing the trees

At this point, before we attempt to create our own forest of binary trees, it’s a good time to look into visualising the trees we plan on making. At its core, a binary tree is a specific type of graph, and there are a ton of tools out there to visualise graphs. One of the more popular open source solutions is the excellent GraphViz. There are various Python packges that provide an interface for it, with pros and cons to all of them, a review of which is well outside the scope of this post. So in short, we’ll be using PyDot, which creates graphs in GraphViz’ Dot format, which we can then have rendered to various image formats.

more ...

Reconstructing binary trees from traversals

Between work, play and other side projects, I’ve been playing around with binary trees. They’re a hugely useful data structure in many fields, but more relevantly in this case, they lend themselves to all sorts of noodling and tinkering. Among the many possible questions is the seemingly simple “how do you serialise a tree?” for inter-process communication, a web-based API or just for the hell of it. And given such a serialisation, how do you reconstruct the original tree from it?

One way is to express each node as a list of three elements: the value of the node, its left child and its right child. Each of these children is its own three-element list (or some form of empty value) and on and on it goes. The end of this serialisation will have a particular signature: ]]]]]]]. Something we have seen before.

I have nothing against brackets per se, but as some have said: “flat is better than nested.” [1] And there are wonderfully flat ways of describing binary trees. This article will cover one such way of representing as, and then constructing a binary tree from a pair of lists, without the need for additional structural information.

more ...