Pumping lemma for context-free languages

In computer science, in particular in formal language theory, the pumping lemma for context-free languages, also known as the Bar-Hillel lemma, is a lemma that gives a property shared by all context-free languages and generalizes the pumping lemma for regular languages.

The pumping lemma can be used to construct a proof by contradiction that a specific language is not context-free. Conversely, the pumping lemma does not suffice to guarantee that a language is context-free; there are other necessary conditions, such as Ogden's lemma, or the Interchange lemma.

Formal statement
If a language $$L$$ is context-free, then there exists some integer $$p \geq 1$$ (called a "pumping length") such that every string $$s$$ in $$L$$ that has a length of $$p$$ or more symbols (i.e. with $$|s| \geq p$$) can be written as


 * $$s = uvwxy$$

with substrings $$u, v, w, x$$ and $$y$$, such that


 * 1. $$|vx| \geq 1$$,
 * 2. $$|vwx| \leq p$$, and
 * 3. $$uv^n wx^n y \in L$$ for all $$n \geq 0$$.

Below is a formal expression of the Pumping Lemma.

$$ \begin{array}{l} (\forall L\subseteq \Sigma^*)  \\ \quad     (\mbox{context free}(L) \Rightarrow \\                                                                                                           \quad      ((\exists p\geq 1) ((\forall s\in L) ((|s|\geq p) \Rightarrow \\ \quad ((\exists u,v,w,x,y \in \Sigma^*) (s=uvwxy \land |vx|\geq 1 \land |vwx|\leq p \land (\forall n \geq 0) (uv^nwx^ny\in L))))))) \end{array} $$

Informal statement and explanation
The pumping lemma for context-free languages (called just "the pumping lemma" for the rest of this article) describes a property that all context-free languages are guaranteed to have.

The property is a property of all strings in the language that are of length at least $$p$$, where $$p$$ is a constant—called the pumping length—that varies between context-free languages.

Say $$s$$ is a string of length at least $$p$$ that is in the language.

The pumping lemma states that $$s$$ can be split into five substrings, $$s = uvwxy$$, where $$vx$$ is non-empty and the length of $$vwx$$ is at most $$p$$, such that repeating $$v$$ and $$x$$ the same number of times ($$n$$) in $$s$$ produces a string that is still in the language. It is often useful to repeat zero times, which removes $$v$$ and $$x$$ from the string. This process of "pumping up" $$s$$ with additional copies of $$v$$ and $$x$$ is what gives the pumping lemma its name.

Finite languages (which are regular and hence context-free) obey the pumping lemma trivially by having $$p$$ equal to the maximum string length in $$L$$ plus one. As there are no strings of this length the pumping lemma is not violated.

Usage of the lemma
The pumping lemma is often used to prove that a given language $L$ is non-context-free, by showing that arbitrarily long strings $s$ are in $L$ that cannot be "pumped" without producing strings outside $L$.

For example, if $$S\subset \N$$ is infinite but does not contain an (infinite) arithmetic progression, then $$L = \{a^{n} : n\in S\}$$ is not context-free. In particular, neither the prime numbers nor the square numbers are context-free.

For example, the language $$L = \{ a^n b^n c^n | n > 0 \}$$ can be shown to be non-context-free by using the pumping lemma in a proof by contradiction. First, assume that $L$ is context free. By the pumping lemma, there exists an integer $p$ which is the pumping length of language $L$. Consider the string $$s = a^p b^p c^p$$ in $L$. The pumping lemma tells us that $s$ can be written in the form $$s = uvwxy$$, where $u, v, w, x$, and $y$ are substrings, such that $$|vx| \geq 1$$, $$|vwx| \leq p$$, and $$uv^i wx^i y \in L$$ for every integer $$i \geq 0$$. By the choice of $s$ and the fact that $$|vwx| \leq p$$, it is easily seen that the substring $vwx$ can contain no more than two distinct symbols. That is, we have one of five possibilities for $vwx$:


 * 1) $$vwx = a^j$$ for some $$j \leq p$$.
 * 2) $$vwx = a^j b^k$$ for some $j$ and $k$ with $$j+k \leq p$$
 * 3) $$vwx = b^j$$ for some $$j \leq p$$.
 * 4) $$vwx = b^j c^k$$ for some $j$ and $k$ with $$j+k \leq p$$.
 * 5) $$vwx = c^j$$ for some $$j \leq p$$.

For each case, it is easily verified that $$uv^i wx^i y$$ does not contain equal numbers of each letter for any $$i \neq 1$$. Thus, $$uv^2 wx^2 y$$ does not have the form $$a^i b^i c^i$$. This contradicts the definition of $L$. Therefore, our initial assumption that $L$ is context free must be false.

In 1960, Scheinberg proved that $$L = \{ a^n b^n a^n | n > 0 \}$$ is not context-free using a precursor of the pumping lemma.

While the pumping lemma is often a useful tool to prove that a given language is not context-free, it does not give a complete characterization of the context-free languages. If a language does not satisfy the condition given by the pumping lemma, we have established that it is not context-free. On the other hand, there are languages that are not context-free, but still satisfy the condition given by the pumping lemma, for example
 * $$L = \{ b^j c^k d^l | j, k, l \in \mathbb{N} \} \cup \{ a^i b^j c^j d^j | i, j \in \mathbb{N}, i \ge 1\}$$

for $s=b^{j}c^{k}d^{l}$ with e.g. j≥1 choose $vwx$ to consist only of bs, for $s=a^{i}b^{j}c^{j}d^{j}$ choose $vwx$ to consist only of as; in both cases all pumped strings are still in L.