Epsilon-induction

In set theory, $$\in$$-induction, also called epsilon-induction or set-induction, is a principle that can be used to prove that all sets satisfy a given property. Considered as an axiomatic principle, it is called the axiom schema of set induction.

The principle implies transfinite induction and recursion. It may also be studied in a general context of induction on well-founded relations.

Statement
The schema is for any given property $$\psi$$ of sets and states that, if for every set $$x$$, the truth of $$\psi(x)$$ follows from the truth of $$\psi$$ for all elements of $$x$$, then this property $$\psi$$ holds for all sets. In symbols:
 * $$\forall x. \Big(\big(\forall (y \in x). \psi(y)\big)\,\to\,\psi(x)\Big)\,\to\,\forall z. \psi(z)$$

Note that for the "bottom case" where $$x$$ denotes the empty set $$\{\}$$, the subexpression $$\forall(y\in x).\psi(y)$$ is vacuously true for all propositions and so that implication is proven by just proving $$\psi(\{\})$$.

In words, if a property is persistent when collecting any sets with that property into a new set (and this also requires establishing the property for the empty set), then the property is simply true for all sets. Said differently, persistence of a property with respect to set formation suffices to reach each set in the domain of discourse.

In terms of classes
One may use the language of classes to express schemata. Denote the universal class $$\{x\mid x=x\}$$ by $${\mathbb U}$$. Let $$\Psi$$ be $$\{x\mid \psi(x)\}$$ and use the informal $$\Psi={\mathbb U}$$ as abbreviation for $$\forall z. z\in\Psi$$. The principle then says that for any $$\Psi$$,
 * $$\forall (x\subseteq \Psi). x\in \Psi\,\,\leftrightarrow\,\,\Psi={\mathbb U}$$

Here the quantifier ranges over all sets. In words this says that any class that contains all of its subsets is simply just the class of all sets.

Assuming bounded separation, $${\mathbb U}$$ is a proper class. So the property $$\forall (x\subseteq \Psi). x\in \Psi$$ is exhibited only by the proper class $${\mathbb U}$$, and in particular by no set. Indeed, note that any set is a subset of itself and under some more assumptions, already the self-membership will be ruled out.

For comparison to another property, note that for a class $$\Sigma$$ to be $$\in$$-transitive means
 * $$\forall (x\in \Sigma). x\subseteq\Sigma$$

There are many transitive sets - in particular the set theoretical ordinals.

Related notions of induction
Exportation proves $$(A\to(B\to C))\leftrightarrow(B\to(A\to C))$$. If $$\psi(x)$$ is $$(x\in\Sigma)\to P(x)$$ for some predicate $$P$$, it thus follows that
 * $$\forall (x\in\Sigma). \Big(\big(\forall (y \in (x\cap\Sigma)). P(y)\big)\,\to\,P(x)\Big)\,\to\,\forall (z\in\Sigma). P(z)$$

where $$y\in x\cap\Sigma$$ is defined as $$y\in x\land y\in\Sigma$$. If $$\Sigma$$ is the universal class, then this is again just an instance of the schema. But indeed if $$\Sigma$$ is any $$\in$$-transitive class, then still $$\forall(x\in\Sigma).(x\cap\Sigma=x)$$ and a version of set induction for $$P$$ holds inside of $$\Sigma$$.

Ordinals
Ordinals may be defined as transitive sets of transitive sets. The induction situation in the first infinite ordinal $$\omega$$, the set of natural numbers, is discussed in more detail below. As set induction allows for induction in transitive sets containing $$\omega$$, this gives what is called transfinite induction and definition by transfinite recursion using, indeed, the whole proper class of ordinals. With ordinals, induction proves that all sets have ordinal rank and the rank of an ordinal is itself.

The theory of Von Neumann ordinals describes such sets and, there, $$y\in x$$ models the order relation $$y < x$$, which classically is provably trichotomous and total. Of interest there is the successor operation $$x\mapsto x\cup\{x\}$$ that maps ordinals to ordinals. In the classical case, the induction step for successor ordinals can be simplified so that a property must merely be preserved between successive ordinals (this is the formulation that is typically understood as transfinite induction.) The sets are $$\in$$-well-founded.

Well-founded relations
For a binary relation $$R_D$$ on a set $$D$$, well-foundedness can be defined by requiring a tailored induction property: $$y\in x$$ in the condition is abstracted to $$R_D(y, x)$$, i.e. one always assumes $$R_D(y, x)\land y\in D$$ in place of the intersection $$y\in(x\cap D)$$ used in the statement above. It can be shown that for a well-founded relation $$R_D$$, there are no infinite descending $$R_D$$-sequences and so also $$\forall y.\neg R_D(y, y)$$. Further, function definition by recursion with $$R_D$$ can be defined on their domains, and so on.

Classically, well-foundedness of a relation on a set can also be characterized by the strong property of minimal element existence for every subset. With dependent choice, it can also be characterized by the weak property of non-existence of infinite descending chains.

For negative predicates
This section concerns the case of set induction and its consequences for predicates which are of a negated form, $$\psi(x) := \neg S(x)$$. Constructively, the resulting statements are generally weaker than set induction for general predicates. To establish equivalences, valid principles such as
 * $$\forall x.\big(A(x)\to \neg B(x)\big)\,\leftrightarrow\,\neg \exists x.\big(A(x)\land B(x)\big)$$,

is commonly made use of, both sides saying that two predicates $$A$$ and $$B$$ can not, for any value, be validated simultaneously. The situation when double-negation elimination is permitted is discussed in the section right after.

Denoting the class $$\{x\mid S(x)\}$$ by $$\Sigma$$, this amounts to the special case of above with, for any $$x$$, $$P(x)$$ equal to the false statement $$x\neq x$$. One has $$x\cap \Sigma=\{\}$$ denoting $$\neg\exists(y\in\Sigma).y\in x$$. Writing $$\Sigma=\{\}$$ for the statement that all sets are not members of the class $$\Sigma$$, the induction schema reduces to
 * $$\neg\exists(x\in\Sigma). x\cap \Sigma=\{\}\,\,\leftrightarrow\,\,\Sigma=\{\}$$

In words, a property (a class) such that there is no $$\in$$-minimal set for it is simply the false property (the empty set). (A minimal $$x$$ for a relation $$R$$ is one for which there does not exist another $$y$$ with $$R(y, x)$$. Here the membership relation restricted to $$\Sigma$$ is considered, i.e. a minimal element with respect to $$\Sigma$$ is one without a $$y\in x\cap \Sigma$$.)

Infinite descending chains
The antecedent in the above implication may be expressed as $$\forall(x\in\Sigma). \neg\neg\exists (y\in\Sigma). y\in x$$. It holds for empty set vacuously. In the presence of any descending membership chain as a function on $$\omega$$, the axiom of replacement proves existence of a set $$\Sigma$$ that also fulfills this. So assuming the induction principle makes the existence of such a chain contradictory.

In this paragraph, assume the axiom of dependent choice in place of the induction principle. Any consequences of the above antecedent is also implied by the $$\forall\exists$$-statement obtained by removing the double-negation, which constructively is a stronger condition. Consider a set $$\Sigma$$ with this $$\forall\exists$$-property. Assuming the set is inhabited, dependent choice implies the existence of an infinite descending membership chain as sequence, i.e. a function $$\omega\to\Sigma$$ on the naturals. So establishing (or even postulating) the non-existence of such a chain for a set with the $$\forall\exists$$-property implies the assumption was wrong, i.e. also $$\Sigma=\{\}$$.

So set induction relates to the postulate of non-existence of infinite descending chains. But given the extra assumptions required in the latter case, the mere non-existence postulate is relatively weak in comparison.

Self-membership
For a contradiction, assume there exists an inhabited set $$s$$ with the particular property that it is equal to its own singleton set, $$s=\{s\}$$. Formally, $$\forall y. (y\in s\leftrightarrow y=s)$$, from which it follows that $$s\in s$$, and also that all members of $$s$$ share all its properties, e.g. $$\forall (y\in s). s\in y$$. From the previous form of the principle it follow that $$s=\{\}$$, a contradiction.

Discussed using the other auxiliary terminologies above, one studies set induction for the class $$\Psi$$ of sets that are not equal to such an $$s$$. So in terms of the negated predicate, $$S(x)$$ is the predicate $$x=s$$, meaning a set that exhibits $$S$$ has the defining properties of $$s$$. Using the set builder notation, one is concerned with $$\Sigma=\{s\}$$. Assuming the special property of $$s$$, any empty intersection statement $$x\cap s=\{\}$$ simplifies to just $$s\notin x$$. The principle in the formulation in terms of $$\Sigma$$ reduces to $$s\notin s$$, again a contradiction. Back to the very original formulation, it is concluded that $$\forall z.z\neq s$$ and $$\Psi$$ is simply the domain of all sets. In a theory with set induction, a $$s$$ with the described recursive property is not actually a set in the first place.

A similar analysis may be applied also to more intricate scenarios. For example, if $$u=\{0,v\}$$ and $$v=\{1,u\}$$ were both sets, then the inhabited $$\{v,u\}$$ would exists by pairing, but this also has the $$\forall\exists$$-property.

Contrapositive
The contrapositive of the form with negation is constructively even weaker but it is only one double negation elimination away from the regularity claim for $$\Sigma$$,
 * $$\Sigma\neq \{\}\,\to\,\neg\neg\exists(x\in\Sigma). x\cap \Sigma=\{\}$$

With double-negations in antecedent and conclusion, the antecedent may equivalently be replaced with $$\exists z.(z\in \Sigma)$$.

Disjunctive form
The excluded middle statement for a universally quantified predicate can classically be expressed as follows: Either it holds for all terms, or there exist a term for which the predicate fails
 * $$\forall z. P(z) \,\lor\, \exists x. \neg P(x)$$

With this, using the disjunctive syllogism, ruling out the possibility of counter-examples classically proves a property for all terms. This purely logical principle is unrelated to other relations between terms, such elementhood (or succession, see below). Using that $$(B\lor\neg A)\to(A\to B)$$ is classically an equivalence, and also using double-negation elimination, the induction principle can be translated to the following statement:
 * $$\forall z. P(z) \,\lor\, \exists x. \Big(\neg P(x)\,\land\,\forall(y\in x). P(y)\Big)$$

This expresses that, for any predicate $$P$$, either it holds for all sets, or there is some set $$x$$ for which $$P$$ does not hold while $$P$$ is at the same time true for all elements of $$x$$. Relating it back to the original formulation: If one can, for any set $$x$$, prove that $$\forall(y\in x).P(y)$$ implies $$P(x)$$, which includes a proof of the bottom case $$P(\{\})$$, then the failure case is ruled out and, then, by the disjunctive syllogism the disjunct $$\forall z. P(z)$$ holds.

For the task of proving $$P$$ by ruling out the existence of counter-examples, the induction principle thus plays a similar role as the excluded middle disjunction, but the former is commonly also adopted in constructive frameworks.

Relation to regularity
The derivation in a previous section shows that set induction classically implies
 * $$\Sigma\neq\{\}\,\to\,\exists(x\in \Sigma). x\cap \Sigma=\{\}$$

In words, any property that is exhibited by some set is also exhibited by a "minimal set" $$x$$, as defined above. In terms of classes, this states that every non-empty class $$\Sigma$$ has a member $$x$$ that is disjoint from it.

In first-order set theories, the common framework, the set induction principle is an axiom schema, granting an axiom for any predicate (i.e. class). In contrast, the axiom of regularity is a single axiom, formulated with a universal quantifier only over elements of the domain of discourse, i.e. over sets. If $$\Sigma$$ is a set and the induction schema is assumed, the above is the instance of the axiom of regularity for $$\Sigma$$. Hence, assuming set induction over a classical logic (i.e. assuming the law of excluded middle), all instances of regularity hold.

In a context with an axiom of separation, regularity also implies excluded middle (for the predicates allowed in ones separation axiom). Meanwhile, the set induction schema does not imply excluded middle, while still being strong enough to imply strong induction principles, as discussed above. In turn, the schema is, for example, adopted in the constructive set theory CZF, which has type theoretic models. So within such a set theory framework, set induction is a strong principle strictly weaker than regularity. When adopting the axiom of regularity and full Separation, CZF equals standard ZF.

History
Because of its use in the set theoretical treatment of ordinals, the axiom of regularity was formulated by von Neumann in 1925. Its motivation goes back to Skolem's 1922 discussion of infinite descending chains in Zermelo set theory $${\mathsf Z}$$, a theory without regularity or replacement.

The theory $${\mathsf Z}$$ does not prove all set induction instances. Regularity is classically equivalent to the contrapositive of set induction for negated statements, as demonstrated. The bridge from sets back to classes is demonstrated below.

Set induction from regularity and transitive sets
Assuming regularity, one may use classical principles, like the reversal of a contrapositive. Moreover, an induction schema stated in terms of a negated predicate $$\neg S$$ is then just as strong as one in terms of a predicate variable $$P$$, as the latter simply equals $$\neg(\neg P)$$. As the equivalences with the contrapositive of set induction have been discussed, the task is to translate regularity back to a statement about a general class $$\Sigma$$. It works in the end because the axiom of separation allows for intersection between sets and classes. Regularity only concerns intersection inside a set and this can be flattened using transitive sets.

The proof is by manipulation of the regularity axiom instance
 * $$s\neq\{\}\,\to\,\exists(x\in s). x\cap s=\{\}$$

for a particular subset $$s\subseteq\Sigma$$ of the class $$\Sigma$$. Observe that given a class $$\Sigma$$ and any transitive set $$t$$, one may define $$s=t\cap\Sigma$$ which has $$x\in s\to (x\in\Sigma\land x\subseteq t)$$ and also $$(x\subseteq t)\to (x\cap s = x\cap \Sigma)$$. With this, the set $$s$$ may always be replaced with the class $$\Sigma$$ in the conclusion of the regularity instance.

The remaining aim is to obtain a statement that also has it replaced in the antecedent, that is, establish the principle holds when assuming the more general $$\Sigma\neq\{\}$$. So assume there is some $$z\in\Sigma$$, together with the existence of some transitive set $$t$$ that has $$z$$ as subset. An intersection $$s_z$$ may be constructed as described and it also has $$(z\cap\Sigma)\subseteq s_z$$. Consider excluded middle for whether or not $$t$$ is disjoint from $$\Sigma$$, i.e. $$s_z=\{\}$$. If $$s_z$$ is empty, then also $$z\cap\Sigma=\{\}$$ and $$x=z$$ itself always fulfills the principle. Otherwise, $$\exists(x\in s_z)$$ by regularity and one can proceed to manipulate the statement by replacing $$s_z$$ with $$\Sigma$$ as discussed. In this case, one even obtains a slightly stronger statement than the one in the previous section, since it carries the sharper information that $$x\in s_z$$ and not just $$x\in\Sigma$$.

Transitive set existence
The proof above assumes the existence of some transitive set containing any given set. This may be postulated, the transitive containment axiom.

Existence of the stronger transitive closure with respect to membership, for any set, can also be derived from some stronger standard axioms. This needs the axiom of infinity for $$\omega$$ as a set, recursive functions on $$\omega$$, the axiom of replacement on $$\omega$$ and finally the axiom of union. I.e. it needs many standard axioms, just sparing the axiom of powerset. In a context without strong separation, suitable function space principles may have to be adopted to enable recursive function definition. $${\mathsf {ZF}}$$ minus infinity also only proves the existence of transitive closures when Regularity is promoted to Set induction.

Comparison of epsilon and natural number induction
The transitive von Neumann model $$\omega$$ of the standard natural numbers is the first infinite ordinal. There, the binary membership relation "$$\in$$" of set theory exactly models the strict ordering of natural numbers "$$<$$". Then, the principle derived from set induction is complete induction.

In this section, quantifiers are understood to range over the domain of first-order Peano arithmetic $${\mathsf{PA}}$$ (or Heyting arithmetic $${\mathsf{HA}}$$). The signature includes the constant symbol "$$0$$", the successor function symbol "$$S$$" and the addition and multiplication function symbols "$$+$$" resp "$$*$$". With it, the naturals form a semiring, which always come with a canonical non-strict preorder "$$\le$$", and the irreflexive $$<$$ may be defined in terms of that. Similarly, the binary ordering relation $$k < n$$ is also definable as $$\exists m. k + Sm = n$$.

For any predicate $$Q$$, the complete induction principle reads
 * $$\forall n. \Big(\big(\forall (k < n). Q(k)\big)\,\to\,Q(n)\Big)\,\to\,\forall m. Q(m)$$

Making use of $$\big(\forall (k < Sn). Q(k)\big)\,\,\leftrightarrow\,\,\big(\forall (k < n). Q(k)\big)\land Q(n)$$, the principle is already implied by standard form of the mathematical induction schema. The latter is expressed not in terms of the decidable order relation "$$<$$" but the primitive symbols,
 * $$\Big(\phi(0)\,\land\,\forall n.\big(\phi(n)\,\to\,\phi(Sn)\big)\Big)\,\to\,\forall m. \phi(m)$$

Lastly, a statement may be proven that merely makes use of the successor symbol and still mirrors set induction: Define a new predicate $$Q_\mathrm{-1}(n)$$ as $$(n = 0) \lor \exists p.\big(Sp = n\land Q(p)\big)$$. It holds for zero by design and so, akin to the bottom case in set induction, the implication $$Q_\mathrm{-1}(0)\,\to\, Q(0)$$ is equivalent to just $$Q(0)$$. Using induction, $${\mathsf{PA}}$$ proves that every $$n$$ is either zero or has a computable unique predecessor, a $$q$$ with $$Sq = n$$. Hence $$Q_\mathrm{-1}(Sq)\leftrightarrow Q(q)$$. When $$n$$ is the successor of $$n-1$$, then $$Q_\mathrm{-1}(n)$$ expresses $$Q(n-1)$$. By case analysis, one obtains
 * $$\forall n. \big(Q_\mathrm{-1}(n) \,\to\, Q(n)\big)\,\to\,\forall m. Q(m)$$

Classical equivalents
Using the classical principles mentioned above, the above may be expressed as
 * $$\forall m. Q(m)\,\lor\,\exists n. \big(\neg Q(n)\,\land\,\forall (k < n). Q(k)\big)$$

It expresses that, for any predicate $$Q$$, either $$Q$$ hold for all numbers, or there is some natural number $$n$$ for which $$Q$$ does not hold despite $$Q$$ holding for all predecessors.

Instead of $$\forall (k < n). Q(k)$$, one may also use $$Q_\mathrm{-1}(n)$$ and obtain a related statement. It constrains the task of ruling out counter-examples for a property of natural numbers: If the bottom case $$Q(0)$$ is validated and one can prove, for any number $$n$$, that the property $$Q$$ is always passed on to $$Sn$$, then this already rules out a failure case. Moreover, if a failure case exists, one can use the least number principle to even prove the existence of a minimal such failure case.

Least number principle
As in the set theory case, one may consider induction for negated predicates and take the contrapositive. After use of a few classical logical equivalences, one obtains a conditional existence claim.

Let $$\Theta$$ denote the set of natural numbers $$\{n\in\omega\mid T(n)\}$$ validating a property $$T$$. In the Neumann model, a natural number $$n$$ is extensionally equal to $$\{k\mid k< n\}$$, the set of numbers smaller than $$n$$. The least number principle, obtained from complete induction, here expressed in terms of sets, reads
 * $$\Theta\neq\{\}\,\to\,\neg\neg\exists (n\in \Theta). n\cap \Theta=\{\}$$

In words, if it cannot be ruled out that some number has the property $$T$$, then it can also not be consistently ruled out that a least such number $$n$$ exists. In classical terms, if there is any number validating $$T$$, then there also exists a least such number validating $$T$$. Least here means that no other number $$k<n$$ is validating $$T$$. This principle should be compared with regularity.

For decidable $$T$$ and any given $$m$$ with $$T(m)$$, all $$k<m$$ can be tested. Moreover, adopting Markov's principle in arithmetic allows removal of double-negation for decidable $$T$$ in general.