User:Songyunc/square-free word

In combinatorics, a squarefree word is a word (a sequence of symbols) that does not contain any squares. A square is a word of the form $$XX$$, where $$X$$ is not empty. Thus, a squarefree word can also be defined as a word that avoids the pattern $$XX$$.

Binary alphabet
Over a binary alphabet $$\{0,1\}$$, the only squarefree words are the empty word $$\epsilon$$ and the words $$0, 1, 01, 10, 010$$ and $$101$$.
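This short list can be checked by exhaustive extension: every prefix of a squarefree word is squarefree, so it is enough to grow words one letter at a time and discard any that end in a square. The following is a minimal Python sketch of that idea; the function names are illustrative and do not come from any cited source.

```python
def ends_with_square(w: str) -> bool:
    """Return True if some nonempty X makes XX a suffix of w."""
    n = len(w)
    return any(w[n - 2 * r:n - r] == w[n - r:] for r in range(1, n // 2 + 1))


def squarefree_words(alphabet: str, max_len: int) -> list[str]:
    """List all squarefree words of length <= max_len, shortest first."""
    found, frontier = [""], [""]
    for _ in range(max_len):
        # Appending a letter to a squarefree word can only create a square
        # at the very end, so a suffix check is sufficient.
        frontier = [w + a for w in frontier for a in alphabet
                    if not ends_with_square(w + a)]
        found.extend(frontier)
    return found


print(squarefree_words("01", 10))
# ['', '0', '1', '01', '10', '010', '101']
```

The search dies out at length 4, since each of the four possible extensions of $$010$$ and $$101$$ ends in a square.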

Ternary alphabet
Over a ternary alphabet $$\{0,1,2\}$$, there are infinitely many squarefree words. It is possible to count the number $$c(n)$$ of ternary squarefree words of length $$n$$. This number grows exponentially: $$c(n) = \Theta(\alpha^n)$$, where $$1.3017597 < \alpha < 1.3017619$$. The upper bound on $$\alpha$$ can be found via Fekete's Lemma and approximation by automata. The lower bound can be found by finding a substitution that preserves squarefreeness.
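For small $$n$$ the values $$c(n)$$ can be obtained directly by backtracking. The sketch below is only a brute-force count, not the automaton-based method used to bound $$\alpha$$; the helper names are assumptions made for illustration.

```python
def count_squarefree(alphabet: str, n: int) -> int:
    """Count squarefree words of length n by depth-first extension."""

    def ends_with_square(w: list[str]) -> bool:
        m = len(w)
        return any(w[m - 2 * r:m - r] == w[m - r:] for r in range(1, m // 2 + 1))

    def extend(w: list[str]) -> int:
        if len(w) == n:
            return 1
        total = 0
        for a in alphabet:
            w.append(a)
            # Only extensions that stay squarefree are explored further.
            if not ends_with_square(w):
                total += extend(w)
            w.pop()
        return total

    return extend([])


counts = [count_squarefree("012", n) for n in range(11)]
print(counts)
# [1, 3, 6, 12, 18, 30, 42, 60, 78, 108, 144]
```

The running time grows with $$c(n)$$ itself, so this approach is practical only for short lengths.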

Alphabet with more than three letters
Since there are infinitely many squarefree words over a three-letter alphabet, there are also infinitely many squarefree words over any alphabet with more than three letters.

The following table shows the exact growth rate of the k-ary squarefree words:

2-dimensional words
Consider a map $$\textbf{w}$$ from $$\mathbb{N}^2$$ to $$A$$, where $$A$$ is an alphabet; such a map is called a 2-dimensional word. Let $$w_{m,n}$$ be the entry $$\textbf{w}(m,n)$$. A word $$\textbf{x}$$ is a line of $$\textbf{w}$$ if there exist $$i_1, i_2, j_1, j_2$$ such that $$\text{gcd}(j_1, j_2) = 1$$ and $$x_t = w_{i_1 + j_1 t,\, i_2 + j_2 t}$$ for all $$t \ge 0$$.
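To make the indexing in this definition concrete, the following Python sketch reads off a prefix of a line from a 2-dimensional word given as a function. The word `toy` is a hypothetical example chosen only to show the indexing; it is not claimed to have squarefree lines.

```python
from math import gcd
from typing import Callable


def line_prefix(w: Callable[[int, int], int], i1: int, i2: int,
                j1: int, j2: int, length: int) -> list[int]:
    """First `length` entries x_t = w(i1 + j1*t, i2 + j2*t) of a line of w."""
    if gcd(j1, j2) != 1:
        raise ValueError("direction (j1, j2) must satisfy gcd(j1, j2) = 1")
    return [w(i1 + j1 * t, i2 + j2 * t) for t in range(length)]


def toy(m: int, n: int) -> int:
    """Hypothetical 2-dimensional word over the alphabet {0, ..., 4}."""
    return (m + 2 * n) % 5


print(line_prefix(toy, 0, 0, 1, 1, 5))  # diagonal line: [0, 3, 1, 4, 2]
print(line_prefix(toy, 2, 0, 0, 1, 5))  # line with j1 = 0: [2, 4, 1, 3, 0]
```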

Carpi proves that there exists a 2-dimensional word $$\textbf{w}$$ over a 16-letter alphabet such that every line of $$\textbf{w}$$ is squarefree. A computer search shows that there is no 2-dimensional word $$\textbf{w}$$ over a 7-letter alphabet such that every line of $$\textbf{w}$$ is squarefree.

Generating finite squarefree words
Shur proposes an algorithm called R2F (random-t(w)o-free) that can generate a squarefree word of length $$n$$ over any alphabet with three or more letters. This algorithm is based on a modification of entropy compression: it randomly selects letters from a k-letter alphabet to generate a (k+1)-ary squarefree word.

    algorithm R2F is
        input: alphabet size $$k \ge 2$$,
               word length $$n > 1$$
        output: a (k+1)-ary squarefree word $$w$$ of length $$n$$.

        (Note that $$\Sigma_{k+1}$$ is the alphabet with letters $$\{1,...,k+1\}$$.)
        (For a word $$w$$ over $$\Sigma_k$$, $$\chi_w$$ is the permutation of $$\Sigma_k$$ such that $$a$$ precedes $$b$$ in $$\chi_w$$ if the
        rightmost position of $$a$$ in $$w$$ is to the right of the rightmost position of $$b$$ in $$w$$.
        For example, the word $$w=136263163$$ over $$\Sigma_6$$ has $$\chi_w=361245$$.)

        choose $$w[1]$$ in $$\Sigma_{k+1}$$ uniformly at random
        set $$\chi_w$$ to $$w[1]$$ followed by all other letters of $$\Sigma_{k+1}$$ in increasing order
        set the number $$N$$ of iterations to 0

        while $$|w| < n$$ do
            choose $$j$$ in $$\Sigma_{k}$$ uniformly at random
            append $$a = \chi_w[j+1]$$ to the end of $$w$$
            update $$\chi_w$$ shifting the first $$j$$ elements to the right and setting $$\chi_w[1] = a$$
            increment $$N$$ by $$1$$
            if $$w$$ ends with a square of rank $$r$$ then
                delete the last $$r$$ letters of $$w$$
        return $$w$$
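The pseudocode above can be turned into a short Python sketch. Two points are assumptions of this sketch rather than statements from the source: the rank of a square $$XX$$ is taken to be $$|X|$$, and $$\chi_w$$ is simply recomputed from scratch after each deletion, which is easy to read but not efficient. The function names are illustrative.

```python
import random


def suffix_square_rank(w: list[int]) -> int:
    """Return r if w ends with a square XX with |X| = r (smallest such r), else 0."""
    n = len(w)
    for r in range(1, n // 2 + 1):
        if w[n - 2 * r:n - r] == w[n - r:]:
            return r
    return 0


def chi(w: list[int], size: int) -> list[int]:
    """Letters 1..size ordered by rightmost occurrence in w, most recent first;
    letters absent from w follow in increasing order."""
    seen = list(dict.fromkeys(reversed(w)))
    return seen + [a for a in range(1, size + 1) if a not in seen]


def r2f(k: int, n: int, rng: random.Random) -> tuple[list[int], int]:
    """Return a (k+1)-ary squarefree word of length n and the iteration count N."""
    w = [rng.randint(1, k + 1)]
    perm = chi(w, k + 1)
    iterations = 0
    while len(w) < n:
        j = rng.randint(1, k)      # j in {1, ..., k}
        a = perm.pop(j)            # chi_w[j+1] in the 1-based notation above
        perm.insert(0, a)          # shift the first j entries right, put a first
        w.append(a)
        iterations += 1
        r = suffix_square_rank(w)
        if r:
            del w[-r:]             # drop the second half of the square
            perm = chi(w, k + 1)   # assumption: recompute chi_w from scratch
    return w, iterations


word, used = r2f(k=2, n=30, rng=random.Random(0))
print("".join(map(str, word)), used)
```

Because the appended letter is never $$\chi_w[1]$$, it always differs from the last letter of $$w$$, so any square that appears has rank at least 2 and the word never becomes empty.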

Every (k+1)-ary squarefree word can be the output of Algorithm R2F, because on each iteration it can append any letter except for the last letter of $$w$$.

The expected number of random k-ary letters used by Algorithm R2F to construct a (k+1)-ary squarefree word of length $$n$$ is $$N=n(1+2/k^2+1/k^3+4/k^4+O(1/k^5))+O(1).$$

There exist algorithms that can verify the squarefreeness of a word of length $$n$$ in $$O(n \log n)$$ time. Apostolico and Preparata give an algorithm using suffix trees. Crochemore uses partitioning in his algorithm. Main and Lorentz provide an algorithm based on the divide-and-conquer method. A naive implementation may require $$O(n^2)$$ time to verify the squarefreeness of a word of length $$n$$.
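For reference, a direct check can be written in a few lines of Python. This sketch is a plain brute-force test over all $$O(n^2)$$ pairs of start position and block length; it is not one of the $$O(n \log n)$$ algorithms named above, and the function name is illustrative.

```python
from typing import Optional


def find_square(w: str) -> Optional[tuple[int, int]]:
    """Return (start, r) for the first square w[start:start + 2r], or None."""
    n = len(w)
    for start in range(n):
        for r in range(1, (n - start) // 2 + 1):
            # Compare two adjacent blocks of length r.
            if w[start:start + r] == w[start + r:start + 2 * r]:
                return start, r
    return None


print(find_square("0102012"))  # None: the word is squarefree
print(find_square("0120120"))  # (0, 3): the square 012012
```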

Infinite squarefree words
There exist arbitrarily long squarefree words over any alphabet with three or more letters, as proved by Axel Thue.

First difference of the Thue–Morse sequence
One example of an infinite squarefree word over an alphabet of size 3 is the word over the alphabet $$\{-1,0,+1\}$$ obtained by taking the first difference of the Thue–Morse sequence. That is, from the Thue–Morse sequence


 * $$0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, \ldots$$

one forms a new sequence in which each term is the difference of two consecutive terms of the Thue–Morse sequence. The resulting squarefree word is


 * $$1,0,-1,1,-1,0,1,0,-1,0,1,-1,1,0,-1,...$$.
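The construction is easy to reproduce. The sketch below generates the Thue–Morse sequence from the parity of the binary digit sum (a standard characterization), takes first differences, and checks a finite prefix for squares by brute force; a finite check illustrates the statement but does not prove it.

```python
def thue_morse(n: int) -> list[int]:
    """First n terms: t_i is the parity of the number of 1 bits in i."""
    return [bin(i).count("1") % 2 for i in range(n)]


def first_difference(seq: list[int]) -> list[int]:
    """Differences of consecutive terms, over the alphabet {-1, 0, +1}."""
    return [b - a for a, b in zip(seq, seq[1:])]


def is_squarefree(w: list[int]) -> bool:
    n = len(w)
    return not any(w[s:s + r] == w[s + r:s + 2 * r]
                   for s in range(n) for r in range(1, (n - s) // 2 + 1))


d = first_difference(thue_morse(16))
print(d)
# [1, 0, -1, 1, -1, 0, 1, 0, -1, 0, 1, -1, 1, 0, -1]
print(is_squarefree(first_difference(thue_morse(257))))  # True
```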

Leech's morphism
Another example, found by John Leech, is defined recursively over the alphabet $$\{0,1,2\}$$. Let $$w_1$$ be any squarefree word starting with the letter $$0$$. Define the words $$ \{w_i \mid i \in \mathbb{N} \}$$ recursively as follows: the word $$w_{i+1}$$ is obtained from $$w_i$$ by replacing each $$0$$ in $$w_i$$ with $$0121021201210$$, each $$1$$ with $$1202102012021$$, and each $$2$$ with $$2010210120102$$. It is possible to prove that the sequence converges to the infinite squarefree word $$0121021201210120210201202120102101201021202102012021...$$
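The first steps of this recursion can be reproduced with the Python sketch below. It takes $$w_1 = 0$$ and uses the 13-letter images $$0121021201210$$, $$1202102012021$$ and $$2010210120102$$ (each is the previous one with every letter increased by 1 modulo 3); the names `LEECH` and `apply_morphism` are illustrative.

```python
LEECH = {
    "0": "0121021201210",
    "1": "1202102012021",
    "2": "2010210120102",
}


def apply_morphism(h: dict[str, str], w: str) -> str:
    """Replace every letter of w by its image under h."""
    return "".join(h[a] for a in w)


def is_squarefree(w: str) -> bool:
    n = len(w)
    return not any(w[s:s + r] == w[s + r:s + 2 * r]
                   for s in range(n) for r in range(1, (n - s) // 2 + 1))


w1 = "0"
w2 = apply_morphism(LEECH, w1)
w3 = apply_morphism(LEECH, w2)
print(w2)        # 0121021201210
print(w3[:52])   # 0121021201210120210201202120102101201021202102012021
print(w3.startswith(w2), is_squarefree(w3))  # True True
```

Each word is a prefix of the next because the image of $$0$$ begins with $$0$$, which is what makes the limit word well defined.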

Generating infinite squarefree words
Infinite squarefree words can be generated by a squarefree morphism. A morphism is called squarefree if the image of every squarefree word is squarefree. A morphism is called k-squarefree if the image of every squarefree word of length k is squarefree.

Crochemore proves that a uniform morphism $$h$$ is squarefree if and only if it is 3-squarefree. In other words, $$h$$ is squarefree if and only if $$h(w)$$ is squarefree for all squarefree $$w$$ of length 3. It is possible to find a squarefree morphism by brute-force search.

    algorithm squarefree_morphism is
        output: a squarefree morphism with the lowest possible rank $$k$$.

        set $$k = 3$$
        while True do
            set $$k\_sf\_words$$ to the list of all squarefree words of length $$k$$ over a ternary alphabet
            for each $$h(0)$$ in $$k\_sf\_words$$ do
                for each $$h(1)$$ in $$k\_sf\_words$$ do
                    for each $$h(2)$$ in $$k\_sf\_words$$ do
                        if $$h(1) = h(2)$$ then
                            break from the current loop (advance to next $$h(1)$$)
                        if $$h(0) \ne h(1)$$ and $$h(2) \ne h(0)$$ then
                            if $$h(w)$$ is squarefree for all squarefree $$w$$ of length $$3$$ then
                                return $$h(0), h(1), h(2)$$
            increment $$k$$ by $$1$$

Over a ternary alphabet, there are exactly 144 uniform squarefree morphisms of rank 11 and no uniform squarefree morphisms with a lower rank than 11.
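The innermost test of the search, Crochemore's criterion for a uniform morphism, can be sketched in Python as follows. The sketch checks a given morphism rather than running the full search. The two sample morphisms are assumptions chosen for illustration: `LEECH` is Leech's morphism in its 13-uniform form, and `BAD` is a made-up 3-uniform morphism that fails the test.

```python
from itertools import product


def is_squarefree(w: str) -> bool:
    n = len(w)
    return not any(w[s:s + r] == w[s + r:s + 2 * r]
                   for s in range(n) for r in range(1, (n - s) // 2 + 1))


def is_uniform(h: dict[str, str]) -> bool:
    """A morphism is uniform if all letter images have the same length."""
    return len({len(image) for image in h.values()}) == 1


def is_3_squarefree(h: dict[str, str]) -> bool:
    """Check that h(w) is squarefree for every squarefree w of length 3."""
    for w in map("".join, product(sorted(h), repeat=3)):
        if is_squarefree(w) and not is_squarefree("".join(h[a] for a in w)):
            return False
    return True


LEECH = {"0": "0121021201210", "1": "1202102012021", "2": "2010210120102"}
BAD = {"0": "010", "1": "121", "2": "202"}  # h(010) = 010121010 contains 0101

print(is_uniform(LEECH), is_3_squarefree(LEECH))  # True True
print(is_uniform(BAD), is_3_squarefree(BAD))      # True False
```

Over a ternary alphabet there are only 12 squarefree words of length 3, so the criterion reduces squarefreeness of a uniform morphism to 12 finite checks.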

To obtain an infinite squarefree word, start with any squarefree word such as $$0$$, and successively apply a squarefree morphism $$h$$ to it. The resulting words preserve the property of squarefreeness. For example, if $$h$$ is a squarefree morphism such that $$h(0)$$ begins with $$0$$, then each $$h^{n}(0)$$ is a prefix of $$h^{n+1}(0)$$, and as $$n \to \infty$$ the words $$h^{n}(0)$$ converge to an infinite squarefree word.

Note that, if a morphism over a ternary alphabet is not uniform, then this morphism is squarefree if and only if it is 5-squarefree.

Avoid two-letter combinations
Over a ternary alphabet, a squarefree word of length more than 13 contains all the squarefree two-letter combinations.

This can be proved by constructing the squarefree words that avoid the two-letter combination $$ab$$. The longest such word is $$bcbacbcacbaca$$, whose length is equal to 13.
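Because only finitely many ternary squarefree words avoid $$ab$$, the bound can be confirmed by an exhaustive depth-first search. The Python sketch below is one way to do this; the function name is illustrative, and the search would not terminate for a forbidden combination that can be avoided by arbitrarily long words.

```python
def longest_avoiding(alphabet: str, forbidden: str) -> str:
    """Longest squarefree word over `alphabet` with no factor `forbidden`.

    Terminates only if finitely many such words exist.
    """
    best = ""

    def ends_with_square(w: str) -> bool:
        n = len(w)
        return any(w[n - 2 * r:n - r] == w[n - r:] for r in range(1, n // 2 + 1))

    def extend(w: str) -> None:
        nonlocal best
        if len(w) > len(best):
            best = w
        for a in alphabet:
            v = w + a
            # A new occurrence of the forbidden factor or of a square
            # can only appear at the end of the extended word.
            if not v.endswith(forbidden) and not ends_with_square(v):
                extend(v)

    extend("")
    return best


w = longest_avoiding("abc", "ab")
print(w, len(w))
```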

Note that over an alphabet with more than three letters, there are squarefree words of any length that avoid an arbitrary two-letter combination.

Avoid three-letter combinations
Over a ternary alphabet, a squarefree word of length more than 36 contains all the squarefree three-letter combinations.

However, there are squarefree words of any length without the three-letter combination $$aba$$.

Note that over an alphabet with more than three letters, there are squarefree words of any length that avoid an arbitrary three-letter combination.

Density of a letter
The density of a letter $$a$$ in a finite word $$w$$ is defined as $$\frac{|w|_a}{|w|}$$ where $$|w|_a $$ is the number of occurrences of $$a$$ in $$w $$ and $$|w| $$ is the length of the word. The density of a letter $$a$$ in an infinite word is $$\liminf_{l\to \infty}\frac{|w_l|_a}{|w_l|}$$ where $$w_l $$ is the prefix of the word $$w$$ of length $$l$$.
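As a small illustration of the finite definition, the Python sketch below computes $$\frac{|w|_a}{|w|}$$ exactly and lists the densities of successive prefixes. The density of an infinite word is a limit inferior and cannot be read off a finite prefix; the prefix values only show how the quantity behaves. The sample word and function names are assumptions chosen for the example.

```python
from fractions import Fraction


def density(w: str, a: str) -> Fraction:
    """|w|_a / |w| for a nonempty finite word w."""
    return Fraction(w.count(a), len(w))


def prefix_densities(w: str, a: str) -> list[Fraction]:
    """Densities of a in the prefixes of w of length 1, 2, ..., |w|."""
    out, count = [], 0
    for length, letter in enumerate(w, start=1):
        count += letter == a
        out.append(Fraction(count, length))
    return out


sample = "0121021201210"  # a squarefree word of length 13
print(density(sample, "0"))   # 4/13
print(density(sample, "1"))   # 5/13
print(min(prefix_densities(sample, "0")[2:]))  # smallest value from length 3 on
```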

The minimal density of a letter $$a$$ in an infinite ternary squarefree word is equal to $$\frac{883}{3215}$$.

The maximum density of a letter $$a$$ in an infinite ternary squarefree word is equal to $$\frac{255}{653}$$.

Related concepts
A cube-free word is one with no occurrence of www for a nonempty factor w. The Thue–Morse sequence is an example of a cube-free word over a binary alphabet. This sequence is not squarefree but is "almost" so: the critical exponent is 2. The Thue–Morse sequence has no overlap (overlapping square), that is, no factor of the form 0X0X0 or 1X1X1; it is essentially the only infinite binary word with this property. Dejean's theorem characterizes the minimum possible critical exponents for each alphabet size.
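These properties can be observed on a finite prefix of the Thue–Morse sequence. The Python sketch below tests for squares, cubes and overlaps by brute force; it illustrates the statements on 128 terms and is not a proof. The function names are illustrative.

```python
def thue_morse(n: int) -> str:
    """First n terms, using the parity of the number of 1 bits in the index."""
    return "".join(str(bin(i).count("1") % 2) for i in range(n))


def has_power(w: str, p: int) -> bool:
    """True if w contains X repeated p times for some nonempty X."""
    n = len(w)
    return any(w[s:s + r] * p == w[s:s + p * r]
               for s in range(n) for r in range(1, (n - s) // p + 1))


def has_overlap(w: str) -> bool:
    """True if w contains a factor cXcXc, where c is a letter and X may be empty."""
    n = len(w)
    for s in range(n):
        for r in range(1, (n - s - 1) // 2 + 1):
            # A factor of length 2r + 1 with period r is exactly an overlap.
            if w[s:s + r + 1] == w[s + r:s + 2 * r + 1]:
                return True
    return False


t = thue_morse(128)
print(has_power(t, 2))  # True: for example the square 11
print(has_power(t, 3))  # False
print(has_overlap(t))   # False
```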

The Thue number of a graph G is the smallest number k such that G has a k-coloring for which the sequence of colors along every simple path is squarefree.

The Kolakoski sequence is an example of a cube-free sequence.

An abelian p-th power is a factor of the form $$w_1 \cdots w_p$$ where each $$w_i$$ is a permutation of $$w_1$$. There is no abelian-squarefree infinite word over an alphabet of size three: indeed, every word of length eight over such an alphabet contains an abelian square. There is an infinite abelian-squarefree word over an alphabet of size five.
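The length-eight claim for three letters involves only $$3^8 = 6561$$ words, so it can be checked exhaustively. The Python sketch below does this by comparing sorted adjacent blocks; the function names are illustrative.

```python
from itertools import product


def has_abelian_square(w: str) -> bool:
    """True if w has two adjacent blocks that are permutations of each other."""
    n = len(w)
    return any(sorted(w[s:s + r]) == sorted(w[s + r:s + 2 * r])
               for s in range(n) for r in range(1, (n - s) // 2 + 1))


def abelian_squarefree_words(alphabet: str, length: int) -> list[str]:
    """All words of the given length that contain no abelian square."""
    words = ("".join(t) for t in product(alphabet, repeat=length))
    return [w for w in words if not has_abelian_square(w)]


print(len(abelian_squarefree_words("012", 8)))        # 0
print("0121012" in abelian_squarefree_words("012", 7))  # True
```

The second line shows that the bound is tight: $$0121012$$ is a ternary word of length seven with no abelian square.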