binary trees - Variable Scope

Constructing binary trees using relative branch encoding

In this third article in what is rapidly becoming a series on binary trees, we’ll have a look at another way of constructing generic binary trees from a serialised format. For this, we’ll build on some of the techniques and insights from the previous articles:

Reconstructing binary trees from traversals: the initial post, which dealt with constructing binary trees from pre-order and in-order sequences.
Creating a forest from a single seed: the follow-up, which created different trees conforming to the same pre-order sequence.

In the first article we covered how a generic (unordered) binary tree cannot be constructed from a single depth-first search sequentialisation. Lacking structural information, a second sequentialisation is required: an in-order sequence combined with either a pre-order or a post-order one. This is all well and good when the values in the tree are small (measured in bytes), but as they get larger, duplicating every value adds considerable overhead. Compression will certainly help, but is unlikely to remove it entirely.
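To make the ambiguity concrete, here is a minimal sketch using the smallest possible example: two nodes, where the second hangs off either the left or the right of the root. The `Node` dataclass and the traversal helpers are illustrative assumptions, not code from the original articles. Both trees give the same pre-order sequence, and also the same post-order sequence, so only the in-order sequence tells them apart. That is why one half of the pair has to be in-order.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Node:
    value: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def preorder(node: Optional[Node]) -> List[Any]:
    return [] if node is None else [node.value] + preorder(node.left) + preorder(node.right)


def inorder(node: Optional[Node]) -> List[Any]:
    return [] if node is None else inorder(node.left) + [node.value] + inorder(node.right)


def postorder(node: Optional[Node]) -> List[Any]:
    return [] if node is None else postorder(node.left) + postorder(node.right) + [node.value]


leaning_left = Node("a", left=Node("b"))
leaning_right = Node("a", right=Node("b"))

# Pre-order and post-order cannot separate the two trees...
print(preorder(leaning_left), preorder(leaning_right))    # ['a', 'b'] ['a', 'b']
print(postorder(leaning_left), postorder(leaning_right))  # ['b', 'a'] ['b', 'a']
# ...but in-order can.
print(inorder(leaning_left), inorder(leaning_right))      # ['b', 'a'] ['a', 'b']
```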

The second article introduced the notion of attachment loci of a tree under construction: the possible locations where the next node/value from a pre-order sequence could be placed. We only needed them for constructing tree permutations at the time, but with a few tweaks, we should be able to use them to provide the structural information to supplement the single sequentialisation.

Structured depth-first search

Let’s start with that last idea. We take the basic recursive depth-first search algorithm and extend the return value to include the attachment locus of each node value it returns. We’ll number the furthest-right locus 0 and count up from there, which means adding 1 to the branch number every time we recurse into a left branch.

The generator below yields tuples with an absolute branch location and the value at that location, in pre-order sequence:

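from typing import Any, Iterator, Optional, Tuple

def dfs_structural(node: Optional["Node"], branch_id: int = 0) -> Iterator[Tuple[int, Any]]:
    # Node: the tree class assumed from the earlier articles (value, left, right)
    if node is not None:
        yield branch_id, node.value
        # A left branch moves one locus further away from the right edge
        yield from dfs_structural(node.left, branch_id=branch_id + 1)
        # A right branch keeps the parent's locus
        yield from dfs_structural(node.right, branch_id=branch_id)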
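The generator only covers the encoding half. As a sketch of the decoding half, the open attachment loci can be kept in a list indexed by branch id. The `build_from_structural` name and the sentinel-based slot list are our own assumptions here, not necessarily how the full article goes about it. Attaching a node at locus `b` closes every locus deeper than `b`, since pre-order has finished with that part of the tree, and opens two new ones: the node's right slot, which shares its branch id `b`, and its left slot at `b + 1`.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Optional, Tuple


@dataclass
class Node:
    value: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_from_structural(pairs: Iterable[Tuple[int, Any]]) -> Optional[Node]:
    """Rebuild a tree from (branch_id, value) pairs given in pre-order."""
    sentinel = Node(None)  # the root hangs off sentinel.right, which is locus 0
    # slots[i] is the open attachment locus with branch id i, as (parent, side)
    slots: List[Tuple[Node, str]] = [(sentinel, "right")]
    for branch_id, value in pairs:
        if not 0 <= branch_id < len(slots):
            raise ValueError(f"no open attachment locus with branch id {branch_id}")
        parent, side = slots[branch_id]
        node = Node(value)
        setattr(parent, side, node)
        # Loci deeper than branch_id can no longer be reached in pre-order
        del slots[branch_id:]
        slots.append((node, "right"))  # same branch id as the node itself
        slots.append((node, "left"))   # one extra left edge: branch_id + 1
    return sentinel.right


# Pairs as dfs_structural would yield them for a 1..7 search tree rooted at 4
pairs = [(0, 4), (1, 2), (2, 1), (1, 3), (0, 6), (1, 5), (0, 7)]
tree = build_from_structural(pairs)
assert tree == Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
```

Every open locus has a distinct branch id (two loci at the same depth would be joined by a right edge, so the upper one is already filled), which is why the absolute ids from `dfs_structural` are enough to place every value unambiguously.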


Creating a forest from a single seed

In the previous post we explored recreating a binary tree from a pair of sequentialisations. We needed a pair of them because no single sequence uniquely describes the tree by itself: without some additional bit of information, the sequence leaves a certain level of ambiguity.

But exactly how ambiguous is a single traversal result? How many different trees can we make that fit a given sequence in isolation? What sort of structure is there in them? Fun questions we can answer with code!
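One way to start answering them is a brute-force sketch like the one below: place the first value as the root, then try every open attachment locus for each following value, backtracking after each choice. The `forest` generator and its nested-tuple output are illustrative assumptions. For a sequence of n values the count follows the Catalan numbers (1, 1, 2, 5, 14, 42, ...), because a fixed pre-order sequence fills every binary tree shape with n nodes in exactly one way.

```python
from typing import Any, Iterator, List, Optional, Sequence, Tuple


class Node:
    def __init__(self, value: Any) -> None:
        self.value = value
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def as_tuple(node: Optional[Node]) -> Optional[tuple]:
    """Snapshot a (mutable) tree as nested (value, left, right) tuples."""
    if node is None:
        return None
    return (node.value, as_tuple(node.left), as_tuple(node.right))


def forest(values: Sequence[Any]) -> Iterator[Optional[tuple]]:
    """Yield every binary tree whose pre-order traversal is `values`."""
    if not values:
        yield None  # exactly one tree fits an empty sequence: the empty tree
        return
    root = Node(values[0])

    def grow(index: int, loci: List[Tuple[Node, str]]) -> Iterator[Optional[tuple]]:
        if index == len(values):
            yield as_tuple(root)
            return
        for position, (parent, side) in enumerate(loci):
            node = Node(values[index])
            setattr(parent, side, node)
            # Loci past the chosen one are closed; the new node opens two more
            yield from grow(index + 1, loci[:position] + [(node, "right"), (node, "left")])
            setattr(parent, side, None)  # backtrack before trying the next locus

    yield from grow(1, [(root, "right"), (root, "left")])


print([sum(1 for _ in forest(range(n))) for n in range(7)])
# [1, 1, 2, 5, 14, 42, 132]
```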

Seeing the trees

At this point, before we attempt to create our own forest of binary trees, it’s a good time to look into visualising the trees we plan on making. At its core, a binary tree is a specific type of graph, and there are a ton of tools out there to visualise graphs. One of the more popular open source solutions is the excellent GraphViz. Various Python packages provide an interface for it, each with its own pros and cons, and a review of them is well outside the scope of this post. In short, we’ll be using PyDot, which creates graphs in GraphViz’s DOT format that can then be rendered to various image formats.
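For readers who would rather not add a dependency, the DOT text can also be written by hand. The sketch below is a swapped-in, dependency-free stand-in for PyDot, not PyDot itself; `to_dot` is a hypothetical helper, and it assumes node values print cleanly as labels. One wrinkle: GraphViz has no notion of left and right children, so a lone child tends to be drawn directly below its parent. A common workaround, used here, is an invisible placeholder node for the missing sibling.

```python
import itertools
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Node:
    value: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def to_dot(root: Optional[Node]) -> str:
    """Render a binary tree as GraphViz DOT source (undirected graph)."""
    lines: List[str] = ["graph tree {"]
    ids = itertools.count()

    def visit(node: Node) -> str:
        name = f"n{next(ids)}"
        label = str(node.value).replace('"', '\\"')  # escape quotes in labels
        lines.append(f'    {name} [label="{label}"];')
        if node.left is None and node.right is None:
            return name
        for child in (node.left, node.right):
            if child is None:
                # Invisible sibling keeps a lone child on its proper side
                ghost = f"n{next(ids)}"
                lines.append(f"    {ghost} [style=invis];")
                lines.append(f"    {name} -- {ghost} [style=invis];")
            else:
                lines.append(f"    {name} -- {visit(child)};")
        return name

    if root is not None:
        visit(root)
    lines.append("}")
    return "\n".join(lines)


print(to_dot(Node(2, left=Node(1))))
```

Saving the output to a file and running `dot -Tpng tree.dot -o tree.png` renders it as an image.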


Reconstructing binary trees from traversals

Between work, play and other side projects, I’ve been playing around with binary trees. They’re a hugely useful data structure in many fields, but more relevantly in this case, they lend themselves to all sorts of noodling and tinkering. Among the many possible questions is the seemingly simple “how do you serialise a tree?” for inter-process communication, a web-based API or just for the hell of it. And given such a serialisation, how do you reconstruct the original tree from it?

One way is to express each node as a list of three elements: the value of the node, its left child and its right child. Each of these children is its own three-element list (or some form of empty value), and on and on it goes. The end of such a serialisation tends to have a particular signature: ]]]]]]]. Something we have seen before.
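For reference, a minimal sketch of that nested-list scheme, assuming a simple `Node` dataclass and using `None` (JSON `null`) as the empty value; `to_nested` and `from_nested` are our own illustrative names. Serialised as JSON, even a small balanced tree already ends in a run of closing brackets.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Node:
    value: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def to_nested(node: Optional[Node]) -> Optional[List[Any]]:
    """Serialise a tree as [value, left, right], with None for missing children."""
    if node is None:
        return None
    return [node.value, to_nested(node.left), to_nested(node.right)]


def from_nested(data: Optional[List[Any]]) -> Optional[Node]:
    """Rebuild the tree from its nested-list form."""
    if data is None:
        return None
    value, left, right = data
    return Node(value, from_nested(left), from_nested(right))


tree = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
encoded = json.dumps(to_nested(tree))
print(encoded)
# [4, [2, [1, null, null], [3, null, null]], [6, [5, null, null], [7, null, null]]]
```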

I have nothing against brackets per se, but as some have said: “flat is better than nested.” [1] And there are wonderfully flat ways of describing binary trees. This article will cover one such way: representing a binary tree as a pair of lists, and then constructing the tree from that pair without the need for additional structural information.
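The classic way to rebuild a tree from a pre-order and an in-order list is sketched below. It is a standard textbook technique rather than necessarily the article's own code, `build_tree` is an illustrative name, and it assumes all values are distinct. The next pre-order value is always the root of the current subtree, and its position in the in-order list splits the remaining values into the left and right subtrees. A dictionary of in-order positions makes each lookup constant time, so the whole build is linear in the number of nodes.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Sequence


@dataclass
class Node:
    value: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(preorder: Sequence[Any], inorder: Sequence[Any]) -> Optional[Node]:
    """Rebuild a binary tree from its pre-order and in-order sequences."""
    position: Dict[Any, int] = {value: i for i, value in enumerate(inorder)}
    next_root = iter(preorder)

    def build(lo: int, hi: int) -> Optional[Node]:
        # Build the subtree whose values occupy inorder[lo:hi]
        if lo >= hi:
            return None
        value = next(next_root)  # pre-order yields the subtree root first
        split = position[value]
        left = build(lo, split)  # the left subtree comes next in pre-order
        right = build(split + 1, hi)
        return Node(value, left, right)

    return build(0, len(inorder))


tree = build_tree([4, 2, 1, 3, 6, 5, 7], [1, 2, 3, 4, 5, 6, 7])
assert tree == Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
```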

Depth-first search

But before we dive into the construction of trees proper, we have to briefly cover how to serialise one. A favourite among interviewers, and elegant as a recursive algorithm, depth-first search is the easiest way of traversing all the nodes in a tree, and extracting their values along the way.

A simple binary search tree

Traversal starts at the root node (4), and recursively visits the left children before the right. Small differences in the order of yielding current and child node values lead to distinctly different results.

Starting from the root of the tree, the algorithm visits each node by first visiting the left child and recursing there. After that has completed, the right child is visited, again recursing there. This recursion creates a path that travels down along a left edge, methodically jumps back to the closest unexplored right branch, and repeats that process until all nodes have been covered.
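A minimal sketch of the three depth-first orders as Python generators, assuming a `Node` dataclass with `value`, `left` and `right` fields. Since the figure isn't reproduced here, the example tree is an assumption: the values 1 to 7 arranged as a balanced search tree with 4 at the root. The only difference between the three functions is where the current node's value is yielded relative to the two recursive calls.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass
class Node:
    value: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def pre_order(node: Optional[Node]) -> Iterator[Any]:
    if node is not None:
        yield node.value  # current node before its children
        yield from pre_order(node.left)
        yield from pre_order(node.right)


def in_order(node: Optional[Node]) -> Iterator[Any]:
    if node is not None:
        yield from in_order(node.left)
        yield node.value  # current node between its children
        yield from in_order(node.right)


def post_order(node: Optional[Node]) -> Iterator[Any]:
    if node is not None:
        yield from post_order(node.left)
        yield from post_order(node.right)
        yield node.value  # current node after its children


tree = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(list(pre_order(tree)))   # [4, 2, 1, 3, 6, 5, 7]
print(list(in_order(tree)))    # [1, 2, 3, 4, 5, 6, 7]
print(list(post_order(tree)))  # [1, 3, 2, 5, 7, 6, 4]
```

For a binary search tree, the in-order traversal comes out sorted, which is a handy sanity check.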
