RE2 (software)

RE2 is a software library that implements a regular expression engine using finite-state machines from automata theory, in contrast to most other regular expression libraries, which use backtracking. It provides a C++ interface.

RE2 was written by Russ Cox at Google, which develops and uses it. The library uses an "on-the-fly" deterministic finite-state automaton algorithm based on Ken Thompson's Plan 9 grep.
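The "on-the-fly" idea can be illustrated with a minimal sketch. It is not RE2's code: the hand-built NFA for the pattern (a|b)*abb, the class name LazyDFA and the method names are all assumptions made for this illustration. Each DFA state is a set of NFA states, and a transition is computed only the first time the input requires it, then cached.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical hand-built NFA for the pattern (a|b)*abb.
# Maps (NFA state, character) to the set of possible next NFA states.
NFA: Dict[Tuple[int, str], FrozenSet[int]] = {
    (0, "a"): frozenset({0, 1}),
    (0, "b"): frozenset({0}),
    (1, "b"): frozenset({2}),
    (2, "b"): frozenset({3}),
}
START: FrozenSet[int] = frozenset({0})
ACCEPT = 3


class LazyDFA:
    """Builds DFA transitions on demand instead of ahead of time."""

    def __init__(self) -> None:
        # (DFA state, character) -> DFA state; filled in as input is read.
        self.cache: Dict[Tuple[FrozenSet[int], str], FrozenSet[int]] = {}

    def step(self, state: FrozenSet[int], ch: str) -> FrozenSet[int]:
        key = (state, ch)
        if key not in self.cache:
            # Subset construction for this one transition only.
            targets = (NFA.get((s, ch), frozenset()) for s in state)
            self.cache[key] = frozenset().union(*targets)
        return self.cache[key]

    def full_match(self, text: str) -> bool:
        state = START
        for ch in text:
            state = self.step(state, ch)
            if not state:  # dead state: no NFA state survives
                return False
        return ACCEPT in state
```

Matching "abb" with a fresh LazyDFA fills only the three transitions that this input touches; later inputs reuse them and add new ones as needed. Each input character costs one cache lookup, or one set computation the first time a transition is seen.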

Comparison to PCRE
RE2 is generally comparable to Perl Compatible Regular Expressions (PCRE) in performance. For certain regular expression operators, such as | (the operator for alternation or logical disjunction), it is faster than PCRE. On the other hand, unlike PCRE, which supports features such as backreferences, RE2 can recognize only regular languages, because it is built on the Thompson DFA algorithm. It is also slightly slower than PCRE for parenthetic capturing operations.

PCRE can use a large recursive stack, with correspondingly high memory usage, and can take exponential time on certain patterns. In contrast, RE2 uses a fixed stack size and guarantees that its runtime increases linearly (not exponentially) with the size of the input. The maximum memory allocated with RE2 is configurable.
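The difference in growth can be made concrete with a toy comparison. The sketch below is not PCRE's or RE2's implementation; the token format and the function names are hypothetical. It counts the work done by a naive backtracking matcher and by a state-set simulation on the pattern a?…a?a…a (n optional letters followed by n required ones) matched against a string of n letters "a".

```python
from typing import List, Set, Tuple

Token = Tuple[str, bool]  # (character, is_optional); hypothetical toy format


def make_pattern(n: int) -> List[Token]:
    """n optional 'a' tokens followed by n mandatory 'a' tokens."""
    return [("a", True)] * n + [("a", False)] * n


def backtrack_match(pattern: List[Token], text: str) -> Tuple[bool, int]:
    """Naive greedy backtracking; returns (matched, number of calls)."""
    steps = 0

    def go(i: int, j: int) -> bool:
        nonlocal steps
        steps += 1
        if i == len(pattern):
            return j == len(text)
        ch, optional = pattern[i]
        # Try to consume a character first, then fall back to skipping.
        if j < len(text) and text[j] == ch and go(i + 1, j + 1):
            return True
        return optional and go(i + 1, j)

    return go(0, 0), steps


def simulate_match(pattern: List[Token], text: str) -> Tuple[bool, int]:
    """Tracks all pattern positions at once; returns (matched, steps)."""

    def closure(states: Set[int]) -> Set[int]:
        # Add every position reachable by skipping optional tokens.
        result: Set[int] = set()
        stack = list(states)
        while stack:
            i = stack.pop()
            if i not in result:
                result.add(i)
                if i < len(pattern) and pattern[i][1]:
                    stack.append(i + 1)
        return result

    steps = 0
    current = closure({0})
    for ch in text:
        advanced: Set[int] = set()
        for i in current:
            steps += 1
            if i < len(pattern) and pattern[i][0] == ch:
                advanced.add(i + 1)
        current = closure(advanced)
    return len(pattern) in current, steps
```

For this input the backtracking matcher tries every take-or-skip choice for the optional tokens before reaching the one that succeeds, so it makes at least 2^n calls. The simulation does at most len(text) × (len(pattern) + 1) steps, because it visits each pattern position at most once per input character.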

RE2 has a slightly smaller set of features than PCRE, but has very predictable run-time and a maximum memory allotment. This can make it more suitable for use in server applications, which require bounds on memory usage and computational time. PCRE, on the other hand, supports more features, such as lookarounds, backreferences and recursion, but has unpredictable runtime and memory usage, which can grow without bound.

Use in Google products
RE2 is used by Google products such as Gmail, Google Docs and Google Sheets. The RE2 syntax is documented in the project's GitHub repository.

In Google Sheets, it is used in the functions RegexMatch, RegexReplace and RegexExtract, and in the find and replace feature. RegexExtract does not use grouping.

Related libraries
The RE2 algorithm has been rewritten in Rust as the package "regex". Cloudflare's web application firewall uses this package because the RE2 algorithm is immune to ReDoS.

Russ Cox also wrote RE1, an earlier regular expression engine based on a bytecode interpreter. OpenResty uses an RE1 fork called "sregex".