Program structure tree

A program structure tree (PST) is a hierarchical diagram that displays the nesting relationship of single-entry single-exit (SESE) fragments/regions, showing the organization of a computer program. Nodes in this tree represent SESE regions of the program, while edges represent nesting regions. The PST is defined for all control flow graphs.

Bibliographical notes
These notes list important works which fueled research on parsing of programs and/or (work)flow graphs (adapted from Section 3.5 in ).


 * The connectivity properties are the basic properties of graphs and are useful when testing whether a graph is planar or when determining if two graphs are isomorphic. John Hopcroft and Robert Endre Tarjan (1973) developed an optimal (to within a constant factor) algorithm for dividing a graph into triconnected components. The algorithm is based on the depth-first search of graphs and requires $$O(|V|+|E|)$$ time and space to examine a graph with $$|V|$$ vertices and $$|E|$$ edges.
 * Robert Endre Tarjan and Jacobo Valdes (1980) used triconnected components for structural analysis of biconnected flow graphs. The triconnected components of the undirected version of a flow graph are shown to be useful for discovering structural information of directed flow graphs. The triconnected components can be discovered efficiently and form a hierarchy of SESE fragments of a flow graph.
 * Giuseppe Di Battista and Roberto Tamassia (1990) introduced SPQR-trees - a data structure which represents decomposition of a biconnected graph with respect to its triconnected components. Essentially, SPQR-trees are the parse trees of Tarjan and Valdes. The authors showed the usefulness of SPQR-trees for various on-line graph algorithms, e.g., transitive closure, planarity testing, and minimum spanning tree. In particular, the authors proposed an efficient solution to the problem of on-line maintenance of the triconnected components of a graph.
 * Richard C. Johnson et al. (1994) proposed a program structure tree (PST), a hierarchical representation of program structure based on single edge entry and single edge exit regions. The PST can be computed in $$O(|E|)$$ time for an arbitrary flow graph, where $$E$$ is the set of edges in the graph. The disadvantage of the PST is that it exploits the notion of a SESE fragment based on edge entries and exits only. Thus, the PST does not capture those SESE fragments which are based on vertex entries and exits.
 * Carsten Gutwenger and Petra Mutzel (2001) shared their practical experience on linear time computation of the triconnected components of biconnected graphs. They have identified and corrected the faulty parts of the algorithm in and applied the resulting algorithm to the computation of SPQR-trees. The implementation is publicly available.


 * Chun Ouyang et al. (2006–2009) used parsing to translate BPMN diagrams into BPEL processes. The employed notion of a fragment is similar to the notion of a region in. However, the developed parsing algorithm is non-deterministic, i.e., the parse tree is not unique for a given diagram.
 * Jussi Vanhatalo et al. (2008–2009) introduced the Refined Process Structure Tree (RPST).  Given a workflow graph, the RPST is unique, modular, and is finer grained than any other known parse tree, i.e., it discovers more SESE fragments than any other technique. In fact, the RPST captures all canonical fragments of a workflow graph which, in turn, represent all SESE fragments of the graph. The RPST can be computed for an arbitrary program/workflow graph.
 * Artem Polyvyanyy, Jussi Vanhatalo, and Hagen Voelzer (2010) proposed a simplified algorithm for computation of the RPST. This simplified algorithm can be employed in a straightforward way as a subroutine for computation of the RPST of an arbitrary program/workflow graph. Both algorithms, the original and the simplified one, allow for an efficient computation of the RPST. However, they provide different structural characterizations of canonical SESE fragments.