Square-free word

In combinatorics, a square-free word is a word (a sequence of symbols) that does not contain any squares. A square is a word of the form $XX$, where $X$ is not empty. Thus, a square-free word can also be defined as a word that avoids the pattern $XX$.

Binary alphabet
Over a binary alphabet $$\{0,1\}$$, the only square-free words are the empty word $$\epsilon,0,1,01,10,010$$, and $$101$$.

Ternary alphabet
Over a ternary alphabet $$\{0,1,2\}$$, there are infinitely many square-free words. It is possible to count the number $$c(n)$$ of ternary square-free words of length $n$. This number is bounded by $$c(n) = \Theta(\alpha^n) $$, where $1.3017597 < \alpha < 1.3017619 $. The upper bound on $$\alpha$$ can be found via Fekete's Lemma and approximation by automata. The lower bound can be found by finding a substitution that preserves square-freeness.

Alphabet with more than three letters
Since there are infinitely many square-free words over three-letter alphabets, this implies there are also infinitely many square-free words over an alphabet with more than three letters.

The following table shows the exact growth rate of the $n$-ary square-free words, rounded off to 7 digits after the decimal point, for $c(n)$ in the range from 4 to 15:

2-dimensional words
Consider a map $$\textbf{w}$$ from $$\mathbb{N}^2$$ to $k$, where $k$ is an alphabet and $$\textbf{w}$$ is called a 2-dimensional word. Let $$w_{m,n}$$ be the entry $$\textbf{w}(m,n)$$. A word $$\textbf{x}$$ is a line of $$\textbf{w}$$ if there exists $$i_1,i_2,j_1, j_2$$such that $$\text{gcd}(j_1, j_2) = 1$$, and for $$t \ge 0, x_t = w_{{i_1}+{j_1t},{i_2}+{j_2t}}$$.

Carpi proves that there exists a 2-dimensional word $$\textbf{w}$$ over a 16-letter alphabet such that every line of $$\textbf{w}$$ is square-free. A computer search shows that there are no 2-dimensional words $$\textbf{w}$$over a 7-letter alphabet, such that every line of $$\textbf{w}$$ is square-free.

Generating finite square-free words
Shur proposes an algorithm called R2F (random-t(w)o-free) that can generate a square-free word of length $k$ over any alphabet with three or more letters. This algorithm is based on a modification of entropy compression: it randomly selects letters from a k-letter alphabet to generate a $k$-ary square-free word. algorithm R2F is input: alphabet size $$k \ge 2$$, word length $$n > 1$$ output: a $k$-ary square-free word $A$of length $A$. choose $$w[1]$$ in $\Sigma_{k+1}$ uniformly at random set $$\chi_w$$ to $$w[1]$$ followed by all other letters of $\Sigma_{k+1}$ in increasing order set the number $n$ of iterations to 0 while $$|w| < n$$ do choose $(k+1)$ in $\Sigma_{k}$ uniformly at random append $$a = \chi_w[j+1]$$ to the end of $(k+1)$ update $$\chi_w$$ shifting the first $w$ elements to the right and setting $$\chi_w[1] = a$$ increment $n$ by $a$ if $b$ ends with a square of rank $a$ then delete the last $w$ letters of $b$ return $w$ Every (k+1)-ary square-free word can be the output of Algorithm R2F, because on each iteration it can append any letter except for the last letter of $N$.

The expected number of random k-ary letters used by Algorithm R2F to construct a $j$-ary square-free word of length $w$ is$$N=n\left(1 + \frac 2 {k^2} + \frac 1 {k^3} + \frac 4 {k^4} + O\left(\frac 1 {k^5}\right)\right)+O(1).$$Note that there exists an algorithm that can verify the square-freeness of a word of length $j$ in $$O(n \log n)$$ time. Apostolico and Preparata give an algorithm using suffix trees. Crochemore uses partitioning in his algorithm. Main and Lorentz provide an algorithm based on the divide-and-conquer method. A naive implementation may require $$O(n^2)$$ time to verify the square-freeness of a word of length $N$.

Infinite square-free words
There exist infinitely long square-free words in any alphabet with three or more letters, as proved by Axel Thue.

First difference of the Thue–Morse sequence
One example of an infinite square-free word over an alphabet of size 3 is the word over the alphabet $$\{-1,0,+1\}$$ obtained by taking the first difference of the Thue–Morse sequence. That is, from the Thue–Morse sequence


 * $$0, 1 ,1, 0 ,1 ,0 ,0 ,1, 1 ,0 ,0 ,1, 0 ,1 ,1, 0 ...$$

one forms a new sequence in which each term is the difference of two consecutive terms of the Thue–Morse sequence. The resulting square-free word is


 * $$1,0,-1,1,-1,0,1,0,-1,0,1,-1,1,0,-1,...$$.

Leech's morphism
Another example found by John Leech is defined recursively over the alphabet $$\{0,1,2\}$$. Let $1$ be any square-free word starting with the letter $w$. Define the words $$ \{w_i \mid i \in \mathbb{N} \}$$ recursively as follows: the word $$w_{i+1}$$ is obtained from $r$ by replacing each $r$ in $w$ with $w$, each $w$ with $(k+1)$, and each $n$ with $n$. It is possible to prove that the sequence converges to the infinite square-free word

Generating infinite square-free words
Infinite square-free words can be generated by square-free morphism. A morphism is called square-free if the image of every square-free word is square-free. A morphism is called k–square-free if the image of every square-free word of length k is square-free.

Crochemore proves that a uniform morphism $n$ is square-free if and only if it is 3-square-free. In other words, $w_1$ is square-free if and only if $$h(w)$$ is square-free for all square-free $0$ of length 3. It is possible to find a square-free morphism by brute-force search. algorithm square-free_morphism is output: a square-free morphism with the lowest possible rank $w_i$. set $$k = 3$$ while True do set k_sf_words to the list of all square-free words of length $0$ over a ternary alphabet for each $$h(0)$$ in k_sf_words do for each $$h(1)$$ in k_sf_words do for each $$h(2)$$ in k_sf_words do if $$h(1) = h(2)$$ then break from the current loop (advance to next $$h(1)$$) if $$h(0) \ne h(1)$$ and $$h(2) \ne h(0)$$ then if $$h(w)$$ is square-free for all square-free $w_i$ of length $121,021,201,210$ then return $$h(0), h(1), h(2)$$ increment $1$ by $1,202,102,012,021$ Over a ternary alphabet, there are exactly 144 uniform square-free morphisms of rank 11 and no uniform square-free morphisms with a lower rank than 11.

To obtain an infinite square-free words, start with any square-free word such as $2$, and successively apply a square-free morphism $2,010,210,120,102$ to it. The resulting words preserve the property of square-freeness. For example, let $h$ be a square-free morphism, then as $$w \to \infty$$, $$h^{w}(0)$$ is an infinite square-free word.

Note that, if a morphism over a ternary alphabet is not uniform, then this morphism is square-free if and only if it is 5-square-free.

Avoid two-letter combinations
Over a ternary alphabet, a square-free word of length more than 13 contains all the square-free two-letter combinations.

This can be proved by constructing a square-free word without the two-letter combination $h$. As a result, $w$$k$$k$ is the longest square-free word without the combination $w$ and its length is equal to 13.

Note that over a more than three-letter alphabet there are square-free words of any length without an arbitrary two-letter combination.

Avoid three-letter combinations
Over a ternary alphabet, a square-free word of length more than 36 contains all the square-free three-letter combinations.

However, there are square-free words of any length without the three-letter combination $3$.

Note that over a more than three-letter alphabet there are square-free words of any length without an arbitrary three-letter combination.

Density of a letter
The density of a letter $k$ in a finite word $1$ is defined as $$\frac{|w|_a}{|w|}$$ where $$|w|_a $$ is the number of occurrences of $0$ in $$w $$ and $$|w| $$ is the length of the word. The density of a letter $h$ in an infinite word is $$\liminf_{l\to \infty}\frac{|w_l|_a}{|w_l|}$$ where $$w_l $$ is the prefix of the word $h$ of length $ab$.

The minimal density of a letter $ab$ in an infinite ternary square-free word is equal to $$\frac{883}{3215} \approx 0.2747$$.

The maximum density of a letter $bcba$ in an infinite ternary square-free word is equal to $$\frac{255}{653} \approx 0.3905$$.