Regular path query

In databases and specifically in graph databases, a regular path query or RPQ is a query asking for pairs of endpoints in the database that are connected by a path satisfying a certain regular expression. A similar feature exists in the SPARQL query language as "property paths".

Definition
A graph database consists of a directed graph whose edges carry a label. A regular path query is just a regular expression over the set of labels. For instance, in a graph database where vertices represent users and there is an edge label "parent" for edges from a parent to a child, the regular path query $$\text{parent} \text{parent}^*$$ would select pairs of a node x and a descendant y of x, with a path from x to y of "parent" edges having length 1 or more.

Semantics
The answers to RPQs can consist of endpoint pairs, i.e., pairs of nodes x and y that are connected by some path satisfying the regular expression; or it can consist of the list of all paths satisfying the regular expression. However, this set of paths is generally infinite.

To ensure that the number of results is not infinite, the semantics of RPQs is sometimes defined to return only the simple paths, i.e., the paths that do not go twice via the same vertex; or the trails, i.e., the paths that do not go twice through the same edge.

Complexity
The evaluation of regular path queries (RPQ), in the sense of returning all endpoint pairs, can be performed in polynomial time. To do this, for every endpoint pair, we can see the graph database as a finite automaton, also represent the regular path query as a finite automaton, and check if a suitable path exists by checking that the intersection of both languages is nonempty (i.e., solving the emptiness problem), for instance via the product automaton construction.

Other problems
Several classical problems about queries have been studied for regular path queries, such as query containment and query rewriting.

Extensions
Database theory research has investigated more expressive variants of RPQs:


 * Two-way RPQs aka 2RPQs, which can also traverse edges in the reverse direction. More precisely, a 2RPQ is a regular expression that uses the labels of the graph together with labels corresponding to reverse edges. For instance, the RPQ $$\text{parent}^- \text{parent}$$ selects pairs of nodes x and y with a path from x to y going first backward on a parent edge, then forward on a parent edge, i.e., x and y are siblings.
 * Conjunctive regular path queries aka CRPQ, which are conjunctive queries whose atoms are RPQs. Such queries make it possible to test for more complex patterns than just paths, however they are intractable to evaluate.
 * A further extension allowing both disjunctions (like union of conjunctive queries) and two-way expressions are UC2RPQs.