# 5.6: Optimal Spanning Trees - Mathematics

In some applications, a graph (G) is augmented by associating a weight or cost with each edge; such a graph is called a weighted graph. In such cases, instead of being interested in just any spanning tree, we may be interested in a least cost spanning tree, that is, a spanning tree such that the sum of the costs of the edges of the tree is as small as possible. For example, this would be the least expensive way to connect a set of towns by a communication network, burying the cable in such a way as to minimize the total cost of laying the cable.

This problem is one that can be solved by a greedy algorithm. Roughly speaking, a greedy algorithm is one that makes choices that are optimal in the short run. Typically this strategy does not result in an optimal solution in the long run, but in this case this approach works.

Definition: weighted graph

A weighted graph is a graph (G) together with a cost function (ccolon E(G) o R^{>0}). If (H) is a subgraph of (G), the cost of (H) is (c(H)=sum_{ein E(H)} c(e)).

The Jarník Algorithm

Given a weighted connected graph (G), we construct a minimum cost spanning tree (T) as follows. Choose any vertex (v_0) in (G) and include it in (T). If vertices (S={v_0, v_1,ldots,v_k}) have been chosen, choose an edge with one endpoint in (S) and one endpoint not in (S) and with smallest weight among all such edges. Let (v_{k+1}) be the endpoint of this edge not in (S), and add it and the associated edge to (T). Continue until all vertices of (G) are in (T).

This algorithm was discovered by Vojtěch Jarník in 1930, and rediscovered independently by Robert C. Prim in 1957 and Edsger Dijkstra in 1959. It is often called Prim's Algorithm. The algorithm proceeds by constructing a sequence of trees (T_1, T_2,ldots,T_{n-1}), with (T_{n-1}) a spanning tree for (G). At each step, the algorithm adds an edge that will make (c(T_{i+1})) as small as possible among all trees that consist of (T_i) plus one edge. This is the best choice in the short run, but it is not obvious that in the long run, that is, by the time (T_{n-1}) is constructed, that this will turn out to have been the best choice.

Theorem 5.6.2

The Jarník Algorithm produces a minimum cost spanning tree.

Proof

Suppose (G) is connected on (n) vertices. Let (T) be the spanning tree produced by the algorithm, and (T_m) a minimum cost spanning tree. We prove that (c(T)=c(T_m)).

Let (e_1, e_2,ldots,e_{n-1}) be the edges of (T) in the order in which they were added to (T); one endpoint of (e_i) is (v_i), the other is in ({v_0,ldots,v_{i-1}}). We form a sequence of trees (T_m=T_0, T_1,ldots, T_{n-1}=T) such that for each (i), (c(T_i)=c(T_{i+1})), and we conclude that (c(T_m)=c(T)).

If (e_1) is in (T_0), let (T_1=T_0). Otherwise, add edge (e_1) to (T_0). This creates a cycle containing (e_1) and another edge incident at (v_0), say (f_1). Remove (f_1) to form (T_1). Since the algorithm added edge (e_1), (c(e_1)le c(f_1)). If (c(e_1)< c(f_1)), then (c(T_1)< c(T_0)=c(T_m)), a contradiction, so (c(e_1)=c(f_1)) and (c(T_1)=c(T_0)).

Suppose we have constructed tree (T_i). If (e_{i+1}) is in (T_i), let (T_{i+1}=T_i). Otherwise, add edge (e_{i+1}) to (T_i). This creates a cycle, one of whose edges, call it (f_{i+1}), is not in (e_1, e_2,ldots,e_i) and has exactly one endpoint in ({v_0,ldots,v_i}). Remove (f_{i+1}) to create (T_{i+1}). Since the algorithm added (e_{i+1}), (c(e_{i+1})le c(f_{i+1})). If (c(e_{i+1})< c(f_{i+1})), then (c(T_{i+1})< c(T_i)=c(T_m)), a contradiction, so (c(e_{i+1})=c(f_{i+1})) and (c(T_{i+1})=c(T_i)).

(square)

## Minimum spanning tree

A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. That is, it is a spanning tree whose sum of edge weights is as small as possible. More generally, any edge-weighted undirected graph (not necessarily connected) has a minimum spanning forest, which is a union of the minimum spanning trees for its connected components.

There are many use cases for minimum spanning trees. One example is a telecommunications company trying to lay cable in a new neighborhood. If it is constrained to bury the cable only along certain paths (e.g. roads), then there would be a graph containing the points (e.g. houses) connected by those paths. Some of the paths might be more expensive, because they are longer, or require the cable to be buried deeper these paths would be represented by edges with larger weights. Currency is an acceptable unit for edge weight – there is no requirement for edge lengths to obey normal rules of geometry such as the triangle inequality. A spanning tree for that graph would be a subset of those paths that has no cycles but still connects every house there might be several spanning trees possible. A minimum spanning tree would be one with the lowest total cost, representing the least expensive path for laying the cable.

## Contents

Several pathfinding algorithms, including Dijkstra's algorithm and the A* search algorithm, internally build a spanning tree as an intermediate step in solving the problem.

In order to minimize the cost of power networks, wiring connections, piping, automatic speech recognition, etc., people often use algorithms that gradually build a spanning tree (or many such trees) as intermediate steps in the process of finding the minimum spanning tree. 

The Internet and many other telecommunications networks have transmission links that connect nodes together in a mesh topology that includes some loops. In order to avoid bridge loops and routing loops, many routing protocols designed for such networks—including the Spanning Tree Protocol, Open Shortest Path First, Link-state routing protocol, Augmented tree-based routing, etc.—require each router to remember a spanning tree.

A special kind of spanning tree, the Xuong tree, is used in topological graph theory to find graph embeddings with maximum genus. A Xuong tree is a spanning tree such that, in the remaining graph, the number of connected components with an odd number of edges is as small as possible. A Xuong tree and an associated maximum-genus embedding can be found in polynomial time. 

A tree is a connected undirected graph with no cycles. It is a spanning tree of a graph G if it spans G (that is, it includes every vertex of G) and is a subgraph of G (every edge in the tree belongs to G). A spanning tree of a connected graph G can also be defined as a maximal set of edges of G that contains no cycle, or as a minimal set of edges that connect all vertices.

### Fundamental cycles Edit

Adding just one edge to a spanning tree will create a cycle such a cycle is called a fundamental cycle. There is a distinct fundamental cycle for each edge not in the spanning tree thus, there is a one-to-one correspondence between fundamental cycles and edges not in the spanning tree. For a connected graph with V vertices, any spanning tree will have V − 1 edges, and thus, a graph of E edges and one of its spanning trees will have EV + 1 fundamental cycles (The number of edges subtracted by number of edges included in a spanning tree giving the number of edges not included in the spanning tree). For any given spanning tree the set of all EV + 1 fundamental cycles forms a cycle basis, a basis for the cycle space. 

### Fundamental cutsets Edit

Dual to the notion of a fundamental cycle is the notion of a fundamental cutset. By deleting just one edge of the spanning tree, the vertices are partitioned into two disjoint sets. The fundamental cutset is defined as the set of edges that must be removed from the graph G to accomplish the same partition. Thus, each spanning tree defines a set of V − 1 fundamental cutsets, one for each edge of the spanning tree. 

The duality between fundamental cutsets and fundamental cycles is established by noting that cycle edges not in the spanning tree can only appear in the cutsets of the other edges in the cycle and vice versa: edges in a cutset can only appear in those cycles containing the edge corresponding to the cutset. This duality can also be expressed using the theory of matroids, according to which a spanning tree is a base of the graphic matroid, a fundamental cycle is the unique circuit within the set formed by adding one element to the base, and fundamental cutsets are defined in the same way from the dual matroid. 

### Spanning forests Edit

In graphs that are not connected, there can be no spanning tree, and one must consider spanning forests instead. Here there are two competing definitions:

• Some authors consider a spanning forest to be a maximal acyclic subgraph of the given graph, or equivalently a graph consisting of a spanning tree in each connected component of the graph. 
• For other authors, a spanning forest is a forest that spans all of the vertices, meaning only that each vertex of the graph is a vertex in the forest. For this definition, even a connected graph may have a disconnected spanning forest, such as the forest in which each vertex forms a single-vertex tree. 

To avoid confusion between these two definitions, Gross & Yellen (2005) suggest the term "full spanning forest" for a spanning forest with the same connectivity as the given graph, while Bondy & Murty (2008) instead call this kind of forest a "maximal spanning forest". 

The number t(G) of spanning trees of a connected graph is a well-studied invariant.

### In specific graphs Edit

In some cases, it is easy to calculate t(G) directly:

• If G is itself a tree, then t(G) = 1 .
• When G is the cycle graphCn with n vertices, then t(G) = n .
• For a complete graph with n vertices, Cayley's formula gives the number of spanning trees as nn − 2 .
• If G is the complete bipartite graph K p , q > ,  then t ( G ) = p q − 1 q p − 1 q^> .
• For the n-dimensional hypercube graph Q n > ,  the number of spanning trees is t ( G ) = 2 2 n − n − 1 ∏ k = 2 n k ( n k ) -n-1>prod _^k^> .

### In arbitrary graphs Edit

More generally, for any graph G, the number t(G) can be calculated in polynomial time as the determinant of a matrix derived from the graph, using Kirchhoff's matrix-tree theorem. 

Specifically, to compute t(G), one constructs the Laplacian matrix of the graph, a square matrix in which the rows and columns are both indexed by the vertices of G. The entry in row i and column j is one of three values:

• The degree of vertex i, if i = j,
• −1, if vertices i and j are adjacent, or
• 0, if vertices i and j are different from each other but not adjacent.

The resulting matrix is singular, so its determinant is zero. However, deleting the row and column for an arbitrarily chosen vertex leads to a smaller matrix whose determinant is exactly t(G).

### Deletion-contraction Edit

If G is a graph or multigraph and e is an arbitrary edge of G, then the number t(G) of spanning trees of G satisfies the deletion-contraction recurrence t(G) = t(Ge) + t(G/e), where Ge is the multigraph obtained by deleting e and G/e is the contraction of G by e.  The term t(Ge) in this formula counts the spanning trees of G that do not use edge e, and the term t(G/e) counts the spanning trees of G that use e.

In this formula, if the given graph G is a multigraph, or if a contraction causes two vertices to be connected to each other by multiple edges, then the redundant edges should not be removed, as that would lead to the wrong total. For instance a bond graph connecting two vertices by k edges has k different spanning trees, each consisting of a single one of these edges.

### Tutte polynomial Edit

The Tutte polynomial of a graph can be defined as a sum, over the spanning trees of the graph, of terms computed from the "internal activity" and "external activity" of the tree. Its value at the arguments (1,1) is the number of spanning trees or, in a disconnected graph, the number of maximal spanning forests. 

The Tutte polynomial can also be computed using a deletion-contraction recurrence, but its computational complexity is high: for many values of its arguments, computing it exactly is #P-complete, and it is also hard to approximate with a guaranteed approximation ratio. The point (1,1), at which it can be evaluated using Kirchhoff's theorem, is one of the few exceptions. 

### Construction Edit

A single spanning tree of a graph can be found in linear time by either depth-first search or breadth-first search. Both of these algorithms explore the given graph, starting from an arbitrary vertex v, by looping through the neighbors of the vertices they discover and adding each unexplored neighbor to a data structure to be explored later. They differ in whether this data structure is a stack (in the case of depth-first search) or a queue (in the case of breadth-first search). In either case, one can form a spanning tree by connecting each vertex, other than the root vertex v, to the vertex from which it was discovered. This tree is known as a depth-first search tree or a breadth-first search tree according to the graph exploration algorithm used to construct it.  Depth-first search trees are a special case of a class of spanning trees called Trémaux trees, named after the 19th-century discoverer of depth-first search. 

Spanning trees are important in parallel and distributed computing, as a way of maintaining communications between a set of processors see for instance the Spanning Tree Protocol used by OSI link layer devices or the Shout (protocol) for distributed computing. However, the depth-first and breadth-first methods for constructing spanning trees on sequential computers are not well suited for parallel and distributed computers.  Instead, researchers have devised several more specialized algorithms for finding spanning trees in these models of computation. 

### Optimization Edit

In certain fields of graph theory it is often useful to find a minimum spanning tree of a weighted graph. Other optimization problems on spanning trees have also been studied, including the maximum spanning tree, the minimum tree that spans at least k vertices, the spanning tree with the fewest edges per vertex, the spanning tree with the largest number of leaves, the spanning tree with the fewest leaves (closely related to the Hamiltonian path problem), the minimum diameter spanning tree, and the minimum dilation spanning tree.  

Optimal spanning tree problems have also been studied for finite sets of points in a geometric space such as the Euclidean plane. For such an input, a spanning tree is again a tree that has as its vertices the given points. The quality of the tree is measured in the same way as in a graph, using the Euclidean distance between pairs of points as the weight for each edge. Thus, for instance, a Euclidean minimum spanning tree is the same as a graph minimum spanning tree in a complete graph with Euclidean edge weights. However, it is not necessary to construct this graph in order to solve the optimization problem the Euclidean minimum spanning tree problem, for instance, can be solved more efficiently in O(n log n) time by constructing the Delaunay triangulation and then applying a linear time planar graph minimum spanning tree algorithm to the resulting triangulation. 

### Randomization Edit

A spanning tree chosen randomly from among all the spanning trees with equal probability is called a uniform spanning tree. Wilson's algorithm can be used to generate uniform spanning trees in polynomial time by a process of taking a random walk on the given graph and erasing the cycles created by this walk. 

An alternative model for generating spanning trees randomly but not uniformly is the random minimal spanning tree. In this model, the edges of the graph are assigned random weights and then the minimum spanning tree of the weighted graph is constructed. 

### Enumeration Edit

Because a graph may have exponentially many spanning trees, it is not possible to list them all in polynomial time. However, algorithms are known for listing all spanning trees in polynomial time per tree. 

Every finite connected graph has a spanning tree. However, for infinite connected graphs, the existence of spanning trees is equivalent to the axiom of choice. An infinite graph is connected if each pair of its vertices forms the pair of endpoints of a finite path. As with finite graphs, a tree is a connected graph with no finite cycles, and a spanning tree can be defined either as a maximal acyclic set of edges or as a tree that contains every vertex. 

The trees within a graph may be partially ordered by their subgraph relation, and any infinite chain in this partial order has an upper bound (the union of the trees in the chain). Zorn's lemma, one of many equivalent statements to the axiom of choice, requires that a partial order in which all chains are upper bounded have a maximal element in the partial order on the trees of the graph, this maximal element must be a spanning tree. Therefore, if Zorn's lemma is assumed, every infinite connected graph has a spanning tree. 

In the other direction, given a family of sets, it is possible to construct an infinite graph such that every spanning tree of the graph corresponds to a choice function of the family of sets. Therefore, if every infinite connected graph has a spanning tree, then the axiom of choice is true. 

The idea of a spanning tree can be generalized to directed multigraphs.  Given a vertex v on a directed multigraph G, an oriented spanning tree T rooted at v is an acyclic subgraph of G in which every vertex other than v has outdegree 1. This definition is only satisfied when the "branches" of T point towards v.

## Mathematical Statistical Physics

### 2.3 Group structure

Besides its relation to spanning trees , there are some more fascinating properties of the set ℛ. Consider N = 2 for the sake of (extreme) simplicity. Define the operation ⊕ on ℛ by

where the ordinary + means point-wise addition. This gives rise to the following table

We recognize here the Cayley table of an abelian group, i.e., (ℛ, ⊕) is an abelian group with neutral element 22. Remark that we can define ⊕ on the whole of Ω, but (Ω, ⊕) is not a group.

We now introduce still another group (which is isomorphic to the preceding one, as we will see later). Let us introduce the addition operator ai : Ω → Ω

for i ∈ <1, …, N>. In words, aiη is the stable result of an addition at site i. Accept (or verify) for the moment that for all i, j ∈ <1, …, N>,

Later we will prove this so-called abelian property in full detail and generality. By definition of recurrence, if a configuration η is recurrent then there exist integers ni > 0 such that

The product in ( 2.3 ) is well-defined by abelianness. The fact that ni can be chosen strictly positive derives from the fact that in the course of the Markov chain one adds to every site with strictly positive probability. Call e = ∏ i = 1 N a i n i and consider

By definition A is not empty (η ∈ A), and if g = ∏ i = 1 N a i m i for some integers mi ≥ 0, then we have the implication “ζ ∈ A implies gζ ∈ A”. Indeed, by abelianness, for ζ ∈ A,

Therefore, A is a “trapping set” for the Markov chain, i.e., a subset of configurations such that once the Markov chains enters it, it never leaves it. As a consequence A ⊃ ℛ, because the Markov chain has only one recurrent class which contains the maximal configuration. Since by definition we have A ⊆ ℛ, A = ℛ. Therefore, acting onℛ, e is neutral. Since ni > 0, we can define

From ( 2.4 ) we conclude that

acting on ℛ defines an abelian group.

Of course not all the products of addition operators defining G are different. In fact, it is easily seen that the group is finite, and we will show that once again

For that, it is sufficient to show that the group acts transitively and freely on ℛ, i.e., for all η ∈ ℛ the orbit Oη = <gη : g ∈ G> = ℛ and if = g′η for some g, g∈ G, then g = g′, i.e., = g′ζ for all ζ ∈ ℛ. For the first statement, if η ∈ ℛ and g ∈ G, then can be reached from η in the Markov chain, hence gη ∈ ℛ, and Oη is clearly a trapping set for the Markov chain, hence Oη ⊃ ℛ. To prove the second statement, consider for = g′η the set

then A = ℛ with the same kind of reasoning used in the definition of inverses. Therefore, for all η, the map

is a bijection between G and ℛ.

However, there is still another way to see that |G| = N = 1. This way of reasoning will be useful because in the general case we will not so easily be able to count the recurrent configurations. The equality |G| = |ℛ| is however completely general, and that will be useful to obtain |ℛ|. Counting the number of elements of a group can become an easy task if we find a treatable isomorphic group. For this, we have to look for closure relations in G. Here is an easy one. Suppose you add two files to some commissioner. Since he has at least one file (to save his face), he will certainly get crazy and give one file to each of his neighbors (modulo the boundary conditions of course). In symbols this means

Using the toppling matrix Δ introduced in ( 2.1 ), this is summarized as

for all iV. Acting on ℛ we can bring the right hand site to the left, and obtain

for all iV. By abelianness, we infer from ( 2.9 ) that for all n : V → ℤ

for all n : V → ℤ. We will show now that, conversely, if

for some mi ℤ then there exists n : V → ℤ such that

In words, this closure relation means that the only “trivial additions” on ℛ are (integer column) multiples of the matrix Δ.

where m ∈V . Write m = m + − m − where m + and m − are non-negative integer valued. The relation ( 2.12 ) applied to a recurrent configuration η yields

In words, addition of m + or m − to η leads to the same final stable configuration, say ζ. But then there exist k + , k − non-negative integer valued functions on V such that

Arrived at this point, we can invoke a well-known theorem of elementary algebra. If you have a group G and a group H and a homomorphism

then G is isomorphic to the quotient H/Ker(Ψ) where Ker(Ψ) is the set of h ∈ H which are mapped to the neutral element of G. In our case, define

with group operation pointwise addition. Next

Then what we just discussed can be summarized in the equality

and hence we have the isomorphism

To see the last equality, note that ℤ V is the |V| dimensional hypercubic lattice, with a volume one unit cell. Δℤ V is another lattice with the columns of Δ as vectors defining the unit cell. The quotient of these two lattices can geometrically be viewed as the non-equivalent points n ∈ ℤ V of the unit cell of the lattice Δℤ V . Equivalence is here defined as

if there exists k ∈ ℤ V such that

This number of non-equivalent points is precisely the volume of the unit cell of the lattice Δℤ V , which is det(Δ) (Puzzle this out in the case N = 2 to be convinced). In general, the equality |ℤ V /AV | = det(A) (with A a symmetric matrix with integer elements and non-negative determinant) is trivial for a diagonal matrix. Indeed, in that case Aij = aiiδij and

an hence | ℤ V / A ℤ V | = ∏ i = 1 n a i i = d e t ( A ) . Since by row and column operations (i.e., addition and subtraction of columns, or permutation of columns) one can make every integer-valued matrix diagonal, see e.g. [ 22 ], we just have to remark that such operations do not change the determinant of a matrix, and do not change (up to isomorphism) the lattice ℤ V /AV .

Here is still another, geometrical proof. |ℤ V /AV | is the number of non-equivalent points in the unit cell defined by A (i.e., the parallelepiped spanned by the rows of A). We can cover ℝ |V| by disjoint copies of this unit cell. Consider now a large cube Cn = [−n, n] |V| . Let Nn denote the number of integer points (i.e., points of ℤ V ) in the cube, let xn denote the number of unit cells (copies of A) in Cn, and let y denote the number of non-equivalent points in one unit cell. Then we have

Dividing these two relations and taking the limit n → ∞ gives