Greibach's theorem

In theoretical computer science, in particular in formal language theory, Greibach's theorem states that certain properties of formal language classes are undecidable. It is named after the computer scientist Sheila Greibach, who first proved it in 1963.

Definitions
Given a set Σ, often called an "alphabet", the (infinite) set of all strings built from members of Σ is denoted by Σ*. A formal language is a subset of Σ*. If L1 and L2 are formal languages, their product L1L2 is defined as the set { w1w2 : w1 ∈ L1, w2 ∈ L2 } of all concatenations of a string w1 from L1 with a string w2 from L2. If L is a formal language and a is a symbol from Σ, their quotient L/a is defined as the set { w : wa ∈ L } of all strings that become members of L when an a is appended. Formal language theory offers various ways to denote a formal language by a finite description, such as a formal grammar or a finite-state machine.
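The product and quotient operations can be tried out directly on finite languages. The following is a minimal Python sketch, assuming a language is represented as a Python set of strings and the empty string ε is written as ""; the function names product and quotient are illustrative choices, not part of any library, and the sample inputs are decimal digit strings.

```python
def product(l1, l2):
    """Return L1L2: every string of l1 concatenated with every string of l2."""
    return {w1 + w2 for w1 in l1 for w2 in l2}


def quotient(lang, a):
    """Return L/a = { w : wa in L }, assuming a is a single symbol."""
    return {w[:-1] for w in lang if w.endswith(a)}


# Product of two finite languages of digit strings (ε is "").
evens = product({"", "1", "2"}, {"0", "2", "4", "6", "8"})

# Decimal representations of the primes up to 100 (naive trial division).
primes = {str(n) for n in range(2, 101) if all(n % d for d in range(2, n))}

print(sorted(evens, key=int))
print(sorted(quotient(primes, "7")))
```

This representation only works for finite languages; an infinite language such as Σ* needs a finite description (a grammar or an automaton) instead of an explicit set.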

For example, using an alphabet Σ = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }, the set Σ* consists of all (decimal representations of) natural numbers, with leading zeroes allowed, and the empty string, denoted as ε. The set Ldiv3 of all naturals divisible by 3 is an infinite formal language over Σ; it can be finitely described by the following regular grammar with start symbol S0:

 * S0 → ε | 0 S0 | 1 S2 | 2 S1 | 3 S0 | 4 S2 | 5 S1 | 6 S0 | 7 S2 | 8 S1 | 9 S0
 * S1 → 0 S1 | 1 S0 | 2 S2 | 3 S1 | 4 S0 | 5 S2 | 6 S1 | 7 S0 | 8 S2 | 9 S1
 * S2 → 0 S2 | 1 S1 | 2 S0 | 3 S2 | 4 S1 | 5 S0 | 6 S2 | 7 S1 | 8 S0 | 9 S2

Examples of finite languages are {ε,1,2} and {0,2,4,6,8}; their product {ε,1,2}{0,2,4,6,8} yields the even numbers up to 28. The quotients of the set of prime numbers up to 100 by the symbols 7, 4, and 2 yield the languages {ε,1,3,4,6,9}, {}, and {ε}, respectively.
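The regular grammar for Ldiv3 can be read as a three-state machine whose state is the index of the current nonterminal. The sketch below is a hedged Python illustration under that reading: it assumes that every production Si → d Sj in the grammar satisfies j = (i − d) mod 3, which is a compact restatement of the listed productions rather than part of the original presentation, and the function name derivable is hypothetical.

```python
def derivable(word: str) -> bool:
    """Check whether the start symbol S0 derives `word` in the grammar for Ldiv3."""
    state = 0  # index i of the current nonterminal Si; S0 is the start symbol
    for ch in word:
        if ch not in "0123456789":
            return False  # not a string over the decimal alphabet
        # Apply the production Si -> d Sj, where j = (i - d) mod 3.
        state = (state - int(ch)) % 3
    # Only S0 has an ε-production, so the derivation can stop only in S0.
    return state == 0


print(derivable("123"), derivable("124"), derivable("003"), derivable(""))
```

Since leading zeroes and the empty string are allowed in Σ*, the function accepts "003" and "" as well as the usual decimal representations of multiples of 3.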

Formal statement of the theorem
Greibach's theorem does not depend on any particular way of describing a formal language. It only considers a set C of formal languages over an alphabet Σ∪{#} such that
 * each language in C has a finite description,
 * each regular language over Σ∪{#} is in C,
 * given descriptions of languages L1, L2 ∈ C and of a regular language R ∈ C, descriptions of the products L1R and RL1 and of the union L1∪L2 can be effectively computed, and
 * it is undecidable, given a description of a language L ∈ C with L ⊆ Σ*, whether L = Σ*.

Let P be any nontrivial subset of C that contains all regular sets over Σ∪{#} and is closed under quotient by each single symbol in Σ∪{#}. Then the question whether L ∈ P for a given description of a language L ∈ C is undecidable.

Proof
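Let M ⊆ Σ* be a language such that M ∈ C, but M ∉ P. For any L ∈ C with L ⊆ Σ*, define φ(L) = (M#Σ*) ∪ (Σ*#L). From a description of L, a description of φ(L) can be effectively computed, since #Σ* and Σ*# are regular and C is effectively closed under product with regular languages and under union.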

Then L = Σ* if and only if φ(L) ∈ P:
 * If L = Σ*, then φ(L) = Σ*#Σ* is a regular language, and hence in P.
 * Else, some w ∈ Σ* \ L exists, and the quotient φ(L)/(#w) equals M. Therefore, by repeated application of the quotient-closure property, φ(L) ∈ P would imply M = φ(L)/(#w) ∈ P, contradicting the definition of M.

Hence, if membership in P were decidable for φ(L) from its description, then the question whether L = Σ* would be decidable from a description of L, which contradicts the undecidability assumption in the definition of C.
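The key step of the proof, that φ(L)/(#w) equals M whenever w ∉ L, can be checked on small strings. The following Python sketch is only an illustration under stated assumptions: languages are represented by membership predicates, Σ = {a, b} is a toy alphabet, M = { aⁿbⁿ } and L = { strings without "bb" } are hypothetical stand-ins, and all function names are invented for this example. A check on strings of bounded length is a sanity test, not a proof.

```python
from itertools import product as cartesian

SIGMA = "ab"   # toy alphabet Σ (assumption)
HASH = "#"     # separator symbol, not a member of Σ


def over_sigma(s):
    """Membership test for Σ*."""
    return all(c in SIGMA for c in s)


def phi(in_M, in_L):
    """Membership test for φ(L) = (M#Σ*) ∪ (Σ*#L), assuming M, L ⊆ Σ*."""
    def in_phi(s):
        if s.count(HASH) != 1:
            return False
        left, right = s.split(HASH)
        if not (over_sigma(left) and over_sigma(right)):
            return False
        return in_M(left) or in_L(right)
    return in_phi


def quotient(in_lang, suffix):
    """Membership test for { x : x + suffix in lang }."""
    return lambda x: in_lang(x + suffix)


def strings(alphabet, max_len):
    """Yield all strings over `alphabet` of length at most max_len."""
    for n in range(max_len + 1):
        for tup in cartesian(alphabet, repeat=n):
            yield "".join(tup)


def in_M(s):
    """Toy M = { a^n b^n : n >= 0 }."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def in_L(s):
    """Toy L = strings over Σ that do not contain "bb"."""
    return over_sigma(s) and "bb" not in s


w = "bb"  # a witness string in Σ* that is not in L
q = quotient(phi(in_M, in_L), HASH + w)
agrees_with_M = all(q(x) == in_M(x) for x in strings(SIGMA + HASH, 6))
print(agrees_with_M)
```

Replacing in_L by over_sigma (that is, taking L = Σ*) makes phi accept exactly the strings over Σ∪{#} that contain a single #, which matches the first case of the proof, φ(L) = Σ*#Σ*.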

Applications
Using Greibach's theorem, it can be shown that the following problems are undecidable:
 * Given a context-free grammar, does it describe a regular language? Proof: The class of context-free languages and the set of regular languages satisfy the above properties of C and P, respectively.
 * Given a context-free language, is it inherently ambiguous? Proof: The class of context-free languages and the set of context-free languages that are not inherently ambiguous satisfy the above properties of C and P, respectively.
 * Given a context-sensitive grammar, does it describe a context-free language?

See also Context-free grammar.