Linear grammar

In computer science, a linear grammar is a context-free grammar that has at most one nonterminal in the right-hand side of each of its productions.

A linear language is a language generated by some linear grammar.

Example
An example of a linear grammar is G with N = {S}, Σ = {a, b}, P with start symbol S and rules
 * S &rarr; aSb
 * S &rarr; &epsilon;

It generates the language $$\{ a^ib^i \mid i \geq 0\}$$.

Relationship with regular grammars
Two special types of linear grammars are the following: Each of these can describe exactly the regular languages. A regular grammar is a grammar that is left-linear or right-linear.
 * the left-linear or left-regular grammars, in which all rules are of the form A &rarr; &alpha;w where &alpha; is either empty or a single nonterminal and w is a string of terminals;
 * the right-linear or right-regular grammars, in which all rules are of the form A &rarr; w&alpha; where w is a string of terminals and &alpha; is either empty or a single nonterminal.

Observe that by inserting new nonterminals, any linear grammar can be replaced by an equivalent one where some of the rules are left-linear and some are right-linear. For instance, the rules of G above can be replaced with
 * S &rarr; aA
 * A &rarr; Sb
 * S &rarr; &epsilon;

However, the requirement that all rules be left-linear (or all rules be right-linear) leads to a strict decrease in the expressive power of linear grammars.

Expressive power
All regular languages are linear; conversely, an example of a linear, non-regular language is { $a^{n}b^{n}$ }. as explained above. All linear languages are context-free; conversely, an example of a context-free, non-linear language is the Dyck language of well-balanced bracket pairs. Hence, the regular languages are a proper subset of the linear languages, which in turn are a proper subset of the context-free languages.

While regular languages are deterministic, there exist linear languages that are nondeterministic. For example, the language of even-length palindromes on the alphabet of 0 and 1 has the linear grammar S → 0S0 | 1S1 | ε. An arbitrary string of this language cannot be parsed without reading all its letters first which means that a pushdown automaton has to try alternative state transitions to accommodate for the different possible lengths of a semi-parsed string. This language is nondeterministic. Since nondeterministic context-free languages cannot be accepted in linear time, linear languages cannot be accepted in linear time in the general case. Furthermore, it is undecidable whether a given context-free language is a linear context-free language.

A language is linear iff it can be generated by a one-turn pushdown automaton – a pushdown automaton that, once it starts popping, never pushes again.

Positive cases
Linear languages are closed under union. Construction is the same as the construction for the union of context-free languages. Let $$L_1, L_2$$ be two linear languages, then $$L_1\cup L_2$$ is constructed by a linear grammar with $$S \to S_1|S_2$$, and $$S_1, S_2$$ playing the role of the linear grammars for $$L_1, L_2$$.

If L is a linear language and M is a regular language, then the intersection $$L \cap M$$ is again a linear language; in other words, the linear languages are closed under intersection with regular sets.

Linear languages are closed under homomorphism and inverse homomorphism.

As a corollary, linear languages form a full trio. Full trios in general are language families that enjoy a couple of other desirable mathematical properties.

Negative cases
Linear languages are not closed under intersection. For example, let $$L_1 = \{a^nb^nc^m \mid n, m \geq 0\}, L_2 = \{a^nb^mc^m \mid n, m \geq 0\}$$, then their intersection is not only not linear, but also not context-free. See pumping lemma for context-free languages.

As a corollary, linear languages are not closed under complement (as intersection can be constructed by de Morgan's laws out of union and complement).