Caterpillar tree



In graph theory, a caterpillar or caterpillar tree is a tree in which all the vertices are within distance 1 of a central path.

Caterpillars were first studied in a series of papers by Harary and Schwenk. The name was suggested by Arthur Hobbs. As colorfully write, "A caterpillar is a tree which metamorphoses into a path when its cocoon of endpoints is removed."

Equivalent characterizations
The following characterizations all describe the caterpillar trees:
 * They are the trees for which removing the leaves and incident edges produces a path graph.
 * They are the trees in which there exists a path that contains every vertex of degree two or more.
 * They are the trees in which every vertex of degree at least three has at most two non-leaf neighbors.
 * They are the trees that do not contain as a subgraph the graph formed by replacing every edge in the star graph K1,3 by a path of length two.
 * They are the connected graphs that can be drawn with their vertices on two parallel lines, with edges represented as non-crossing line segments that have one endpoint on each line.
 * They are the trees whose square is a Hamiltonian graph. That is, in a caterpillar, there exists a cyclic sequence of all the vertices in which each adjacent pair of vertices in the sequence is at distance one or two from each other, and trees that are not caterpillars do not have such a sequence. A cycle of this type may be obtained by drawing the caterpillar on two parallel lines and concatenating the sequence of vertices on one line with the reverse of the sequence on the other line.
 * They are the trees whose line graphs contain a Hamiltonian path; such a path may be obtained by the ordering of the edges in a two-line drawing of the tree. More generally the number of edges that need to be added to the line graph of an arbitrary tree so that it contains a Hamiltonian path (the size of its Hamiltonian completion) equals the minimum number of edge-disjoint caterpillars that the edges of the tree can be decomposed into.
 * They are the connected graphs of pathwidth one.
 * They are the connected triangle-free interval graphs.
 * They are n-vertex graphs whose adjacency matrices can be written in such a way that the ones of the upper triangular part form a path of length n-1 beginning at the upper right corner and going down or left.

Generalizations
A k-tree is a chordal graph with exactly n &minus; k maximal cliques, each containing k + 1 vertices; in a k-tree that is not itself a (k + 1)-clique, each maximal clique either separates the graph into two or more components, or it contains a single leaf vertex, a vertex that belongs to only a single maximal clique. A k-path is a k-tree with at most two leaves, and a k-caterpillar is a k-tree that can be partitioned into a k-path and some k-leaves, each adjacent to a separator k-clique of the k-path. In this terminology, a 1-caterpillar is the same thing as a caterpillar tree, and k-caterpillars are the edge-maximal graphs with pathwidth k.

A lobster graph is a tree in which all the vertices are within distance 2 of a central path.

Enumeration
Caterpillars provide one of the rare graph enumeration problems for which a precise formula can be given: when n ≥ 3, the number of caterpillars with n unlabeled vertices is
 * $$2^{n-4}+2^{\lfloor (n-4)/2\rfloor}.$$

For n = 1, 2, 3, ... the numbers of n-vertex caterpillars are
 * 1, 1, 1, 2, 3, 6, 10, 20, 36, 72, 136, 272, 528, 1056, 2080, 4160, ....

Computational complexity
Finding a spanning caterpillar in a graph is NP-complete. A related optimization problem is the Minimum Spanning Caterpillar Problem (MSCP), where a graph has dual costs over its edges and the goal is to find a caterpillar tree that spans the input graph and has the smallest overall cost. Here the cost of the caterpillar is defined as the sum of the costs of its edges, where each edge takes one of the two costs based on its role as a leaf edge or an internal one. There is no f(n)-approximation algorithm for the MSCP unless P = NP. Here f(n) is any polynomial-time computable function of n, the number of vertices of a graph.

There is a parametrized algorithm that finds an optimal solution for the MSCP in bounded treewidth graphs. So both the Spanning Caterpillar Problem and the MSCP have linear time algorithms if a graph is an outerplanar, a series-parallel, or a Halin graph.

Applications
Caterpillar trees have been used in chemical graph theory to represent the structure of benzenoid hydrocarbon molecules. In this representation, one forms a caterpillar in which each edge corresponds to a 6-carbon ring in the molecular structure, and two edges are incident at a vertex whenever the corresponding rings belong to a sequence of rings connected end-to-end in the structure. writes, "It is amazing that nearly all graphs that played an important role in what is now called "chemical graph theory" may be related to caterpillar trees." In this context, caterpillar trees are also known as benzenoid trees and Gutman trees, after the work of Ivan Gutman in this area.