Endgame tablebase



In chess, the endgame tablebase, or simply tablebase, is a computerised database containing precalculated evaluations of endgame positions. Tablebases are used to analyse finished games, as well as by chess engines to evaluate positions during play. Tablebases are typically exhaustive, covering every legal arrangement of a specific selection of pieces on the board, with both White and Black to move. For each position, the tablebase records the ultimate result of the game (i.e. a win for White, a win for Black, or a draw) and the number of moves required to achieve that result, both assuming perfect play. Because every legal move in a covered position results in another covered position, the tablebase acts as an oracle that always provides the optimal move.

Tablebases are generated by retrograde analysis, working backward from checkmated or drawn positions. By 2005, tablebases for all positions having up to six pieces, including the two kings, had been created. By August 2012, tablebases had solved chess for almost every position with up to seven pieces, with certain subclasses omitted due to their assumed triviality; these omitted positions were included by August 2018. Work is still underway to solve all eight-piece positions.

Tablebases have profoundly advanced the chess community's understanding of endgame theory. Some positions which humans had analysed as draws were proven to be winnable; in some cases, tablebase analysis found a mate in more than five hundred moves, far beyond the ability of humans, and beyond the capability of a computer during play. This caused the fifty-move rule to be called into question, since many positions were discovered that were winning for one side but drawn during play because of this rule. Initially, some exceptions to the fifty-move rule were introduced, but when more extreme cases were later discovered, these exceptions were removed. Tablebases also facilitate the composition of endgame studies.

While endgame tablebases exist for other board games, such as checkers, nine men's morris, and some chess variants, the term endgame tablebase is usually assumed to refer to chess tablebases.

Background
Physical limitations of computer hardware aside, in principle it is possible to solve any game under the condition that the complete state is known and there is no random chance. Strong solutions, i.e. algorithms that can produce perfect play from any position, are known for some simple games such as Tic Tac Toe/Noughts and crosses (draw with perfect play) and Connect Four (first player wins). Weak solutions exist for somewhat more complex games, such as checkers (with perfect play on both sides the game is known to be a draw, but it is not known for every position created by less-than-perfect play what the perfect next move would be). Other games, such as chess and Go, have not been solved because their game complexity is far too vast for computers to evaluate all possible positions. To reduce the game complexity, researchers have modified these complex games by reducing the size of the board, or the number of pieces, or both.

Computer chess is one of the oldest domains of artificial intelligence, having begun in the early 1930s. Claude Shannon proposed formal criteria for evaluating chess moves in 1949. In 1951, Alan Turing designed a primitive chess-playing program, which assigned values for material and mobility; the program "played" chess based on Turing's manual calculations. However, even as competent chess programs began to develop, they exhibited a glaring weakness in playing the endgame. Programmers added specific heuristics for the endgame – for example, the king should move to the center of the board. However, a more comprehensive solution was needed.

In 1965, Richard Bellman proposed the creation of a database to solve chess and checkers endgames using retrograde analysis. Instead of analyzing forward from the position currently on the board, the database would analyze backward from positions where one player was checkmated or stalemated. Thus, a chess computer would no longer need to analyze endgame positions during the game because they were solved beforehand. It would no longer make mistakes because the tablebase always played the best possible move.

In 1970, Thomas Ströhlein published a doctoral thesis with analysis of the following classes of endgame: KQK, KRK, KPK, KQKR, KRKB, and KRKN. In 1977, Ken Thompson's KQKR tablebase was used in a match against Grandmaster Walter Browne.

Thompson and others helped extend tablebases to cover all four- and five-piece endgames, including KBBKN, KQPKQ, and KRPKR. Lewis Stiller published a thesis with research on some six-piece tablebase endgames in 1991.

More recent contributors include:
 * John Nunn, foremost data miner of chess endgames and prolific endgame author.
 * Eugene Nalimov, after whom the popular Nalimov tablebases are named. Their total size is about 1.2 TB.
 * Eiko Bleicher, who has adapted the tablebase concept to a program called "Freezer"
 * Guy Haworth, an academic at the University of Reading, who has published extensively in the ICGA Journal and elsewhere;
 * Marc Bourzutschky and Yakov Konoval, who have collaborated to analyze endgames with seven pieces on the board;
 * Peter Karrer, who constructed a specialized seven-piece tablebase (KQPPKQP) for the endgame of the Kasparov versus The World online match;
 * Vladimir Makhnychev and Victor Zakharov from Moscow State University, who completed the 4+3 DTM tablebases (525 endings including KPPPKPP) in July 2012 and the 5+2 DTM tablebases (350 endings including KPPPPKP) in August 2012. The tablebases were generated on a supercomputer named Lomonosov, and their total size is about 140 TB. They were hit by a ransomware attack in 2021 and have been offline since then.
 * Ronald de Man and Bojun Guo, who generated the seven-man DTZ tablebase called the Syzygy tablebase in 2018. They were able to reduce the size of seven-man tablebases from 140 TB to 18.4 TB.

The tablebases of all endgames with up to seven pieces are available for free download, and may also be queried using web interfaces. Research on creating an eight-piece tablebase started in 2021. During an interview with Google in 2010, Garry Kasparov said that "maybe" the limit will be 8 pieces. Because the starting position of chess is the ultimate endgame, with 32 pieces, he claimed that chess cannot be solved by computers.

Metrics
Before creating a tablebase, a programmer must choose a metric of optimality, which means defining the point at which a player has "won" the game. Every position solved by the tablebase will either have a distance (i.e. the number of moves or plies) from this specific point or will be classified as a draw. To date, three different metrics have been used:
 * Depth to mate (DTM) - The game can only be won by checkmate.
 * Depth to conversion (DTC) - The game can be won by checkmate, capturing material or promoting a pawn. For example, in KQKR, conversion occurs when White captures the Black rook.
 * Depth to zeroing (DTZ) - The game can be won by checkmate, capturing material or moving a pawn. For example, in KRPKR, zeroing occurs when White moves his pawn closer to the eighth rank.

DTZ is the only metric which supports the fifty-move rule, as it determines the distance to a "zeroing move" (i.e. a move which resets the move count to zero under the fifty-move rule). By definition, all "won" positions will always have DTZ $$\leq$$ DTC $$\leq$$ DTM. In pawnless positions or positions with only blocked pawns, DTZ is identical to DTC.

The difference between DTC and DTM can be understood by analyzing the diagram at the right. The optimal play depends on which metric is used.

According to the DTC metric, White should capture the rook because that leads immediately to a position which will certainly win (DTC = 1), but it will take two more moves to actually deliver checkmate (DTM = 3). In contrast, according to the DTM metric, White mates in two moves, so DTM = DTC = 2.

This difference is typical of many endgames. DTC is always smaller than or equal to DTM, but the DTM metric always leads to the quickest checkmate. Incidentally, DTC = DTM in the unusual endgame of two knights versus one pawn because capturing the pawn (the only material Black has) results in a draw, unless the capture is also checkmate.
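The choice between the two lines can be sketched in a few lines of Python. The DTC and DTM values are the ones quoted above for the diagram position; the move labels and the helper name `best_move` are invented for illustration.

```python
# Values (in moves) quoted in the text for the two candidate lines.
# The labels are descriptive placeholders, not real move notation.
candidates = {
    "capture the rook": {"DTC": 1, "DTM": 3},
    "play for mate": {"DTC": 2, "DTM": 2},
}


def best_move(metric):
    """Return the candidate line that minimises the chosen metric."""
    return min(candidates, key=lambda move: candidates[move][metric])


print(best_move("DTC"))  # capture the rook
print(best_move("DTM"))  # play for mate
```

Both lines win; the metric only decides which of them the tablebase reports as optimal.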

Step 1: Generating all possible positions
Once a metric is chosen, the first step is to generate all the positions with a given material. For example, to generate a DTM tablebase for the endgame of king and queen versus king (KQK), the computer must describe approximately 40,000 unique legal positions.

Levy and Newborn explain that the number 40,000 derives from a symmetry argument. The Black king can be placed on any of ten squares: a1, b1, c1, d1, b2, c2, d2, c3, d3, and d4 (see diagram). On any other square, its position can be considered equivalent by symmetry of rotation or reflection. Thus, there is no difference whether a Black king in a corner resides on a1, a8, h8, or h1. Multiply this number of 10 by at most 60 (legal remaining) squares for placing the White king and then by at most 62 squares for the White queen. The product 10×60×62 = 37,200. Several hundred of these positions are illegal, impossible, or symmetrical reflections of each other, so the actual number is somewhat smaller.
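The symmetry argument can be checked with a short script. It groups the 64 squares into classes under the eight rotations and reflections of the board, then picks the representative of each class that lies in the a1-d1-d4 triangle. The function and variable names are illustrative assumptions, and the result is only the upper bound from the text, not the exact count of legal positions.

```python
FILES = "abcdefgh"


def symmetric_images(file, rank):
    """All images of a square (0-based file, rank) under the 8 board symmetries."""
    images = set()
    for f, r in ((file, rank), (rank, file)):   # identity and diagonal reflection
        for ff in (f, 7 - f):                   # left-right mirror
            for rr in (r, 7 - r):               # top-bottom mirror
                images.add((ff, rr))
    return frozenset(images)


# Each distinct set of images is one symmetry class of squares.
classes = {symmetric_images(f, r) for f in range(8) for r in range(8)}

# The representative of each class inside the a1-d1-d4 triangle.
representatives = sorted(
    FILES[f] + str(r + 1)
    for cls in classes
    for (f, r) in cls
    if r <= f <= 3
)

# Upper bound from the text: 10 king squares x 60 x 62.
upper_bound = len(representatives) * 60 * 62

print(representatives)
print(upper_bound)  # 37200
```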

For each position, the tablebase evaluates the situation separately for White-to-move and Black-to-move. Assuming that White has the queen, almost all the positions are White wins, with checkmate forced in no more than ten moves. Some positions are draws because of stalemate or the unavoidable loss of the queen.

Each additional piece added to a pawnless endgame multiplies the number of unique positions by a factor of about sixty, which is the approximate number of squares not already occupied by other pieces.

Endgames with one or more pawns increase the complexity because the symmetry argument is reduced. Since pawns can move forward but not sideways, rotation and vertical reflection of the board produce a fundamental change in the nature of the position. The best calculation of symmetry is achieved by limiting one pawn to 24 squares in the rectangle a2-a7-d7-d2. All other pieces and pawns may be located in any of the 64 squares with respect to the pawn. Thus, an endgame with pawns has a complexity of 24/10 = 2.4 times a pawnless endgame with the same number of pieces.
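As a quick check of the figures above, the snippet below enumerates the squares of the a2-a7-d7-d2 rectangle and compares their number with the ten king squares of the pawnless case. Variable names are illustrative only.

```python
FILES = "abcdefgh"

# With pawns on the board only the left-right mirror symmetry survives,
# so one pawn can be restricted to files a-d on ranks 2-7.
pawn_squares = [f + str(r) for f in FILES[:4] for r in range(2, 8)]

# Ten king squares remain after the full symmetry reduction without pawns.
pawnless_king_squares = 10

ratio = len(pawn_squares) / pawnless_king_squares
print(len(pawn_squares), ratio)  # 24 2.4
```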

Step 2: Evaluating positions using retrograde analysis
Tim Krabbé explains the process of generating a tablebase as follows:

"The idea is that a database is made with all possible positions with a given material [note: as in the preceding section]. Then a subdatabase is made of all positions where Black is mated. Then one where White can give mate. Then one where Black cannot stop White giving mate next move. Then one where White can always reach a position where Black cannot stop him from giving mate next move. And so on, always a ply further away from mate until all positions that are thus connected to mate have been found. Then all of these positions are linked back to mate by the shortest path through the database. That means that, apart from 'equi-optimal' moves, all the moves in such a path are perfect: White's move always leads to the quickest mate, Black's move always leads to the slowest mate."

The retrograde analysis is only necessary from the checkmated positions, because every position that cannot be reached by moving backward from a checkmated position must be a draw.

Figure 1 illustrates the idea of retrograde analysis. White can force mate in two moves by playing 1. Kc6, leading to the position in Figure 2. There are only two legal moves for Black from this position, both of which lead to checkmate: if 1...Kb8 2. Qb7#, and if 1...Kd8 2. Qd7# (Figure 3).

Figure 3, before White's second move, is defined as "mate in one ply." Figure 2, after White's first move, is "mate in two ply," regardless of how Black plays. Finally, the initial position in Figure 1 is "mate in three ply" (i.e., two moves) because it leads directly to Figure 2, which is already defined as "mate in two ply." This process, which links a current position to another position that could have existed one ply earlier, can continue indefinitely.

Each position is evaluated as a win or loss in a certain number of moves. At the end of the retrograde analysis, positions which are not designated as wins or losses are necessarily draws.
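A minimal sketch of this backward pass, run on a small hand-written graph rather than on real chess positions. The node names loosely mirror Figures 1 to 3; the `stalemate` branch, the function name `retrograde` and the data layout are assumptions made for illustration, and a real generator would also need move generation, captures and promotions.

```python
from collections import deque


def retrograde(moves, mated):
    """Label positions by working backward from checkmates.

    moves: dict mapping every position to the list of its successors.
    mated: positions in which the side to move is checkmated.
    Returns {position: (result, plies)} with result "win" or "loss" for
    the side to move; positions missing from the result are draws.
    """
    # Predecessor lists play the role of "un-moves".
    preds = {p: [] for p in moves}
    for p, succs in moves.items():
        for s in succs:
            preds[s].append(p)

    # Successors not yet shown to be wins for the opponent.
    remaining = {p: len(succs) for p, succs in moves.items()}

    value = {p: ("loss", 0) for p in mated}
    queue = deque(sorted(mated))
    while queue:
        pos = queue.popleft()
        result, plies = value[pos]
        for prev in preds[pos]:
            if prev in value:
                continue
            if result == "loss":
                # The mover at prev can reach a position lost for the opponent.
                value[prev] = ("win", plies + 1)
                queue.append(prev)
            else:
                remaining[prev] -= 1
                if remaining[prev] == 0:
                    # Every move from prev hands the opponent a win.
                    value[prev] = ("loss", plies + 1)
                    queue.append(prev)
    return value


moves = {
    "fig1": ["fig2", "stalemate"],       # White to move
    "fig2": ["fig3_Kb8", "fig3_Kd8"],    # Black to move
    "fig3_Kb8": ["mate_Qb7"],            # White to move
    "fig3_Kd8": ["mate_Qd7"],            # White to move
    "mate_Qb7": [],
    "mate_Qd7": [],
    "stalemate": [],                     # hypothetical drawn branch
}
table = retrograde(moves, {"mate_Qb7", "mate_Qd7"})
print(table["fig1"])         # ('win', 3)
print(table["fig2"])         # ('loss', 2)
print("stalemate" in table)  # False
```

Because the queue is processed in order of increasing depth, a winning position receives the depth of its quickest mate, and a losing position the depth of its slowest defence.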

Step 3: Verification
After the tablebase has been generated, and every position has been evaluated, the result must be verified independently. The purpose is to check the self-consistency of the tablebase results.

For example, in Figure 1 above, the verification program sees the evaluation "mate in three ply (Kc6)." It then looks at the position in Figure 2, after Kc6, and sees the evaluation "mate in two ply." These two evaluations are consistent with each other. If the evaluation of Figure 2 were anything else, it would be inconsistent with Figure 1, so the tablebase would need to be corrected.
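The consistency check can be sketched as follows: each stored evaluation is recomputed from the stored evaluations of its successors and compared. The toy graph, the node names and the helper names `expected_value` and `verify` are illustrative assumptions, not the procedure of any particular generator.

```python
def expected_value(pos, moves, mated, table):
    """Recompute a position's value from the stored values of its successors."""
    if pos in mated:
        return ("loss", 0)
    succ_values = [table.get(s) for s in moves[pos]]   # None stands for a draw
    losses = [v[1] for v in succ_values if v and v[0] == "loss"]
    if losses:
        # The mover chooses the successor that the opponent loses fastest.
        return ("win", min(losses) + 1)
    if succ_values and all(v and v[0] == "win" for v in succ_values):
        # Every move loses; the mover holds out as long as possible.
        return ("loss", max(v[1] for v in succ_values) + 1)
    return None


def verify(moves, mated, table):
    """Return the positions whose stored value disagrees with the recomputation."""
    return [p for p in moves
            if expected_value(p, moves, mated, table) != table.get(p)]


moves = {
    "fig1": ["fig2", "stalemate"],
    "fig2": ["fig3_Kb8", "fig3_Kd8"],
    "fig3_Kb8": ["mate_Qb7"],
    "fig3_Kd8": ["mate_Qd7"],
    "mate_Qb7": [],
    "mate_Qd7": [],
    "stalemate": [],
}
mated = {"mate_Qb7", "mate_Qd7"}
table = {
    "fig1": ("win", 3), "fig2": ("loss", 2),
    "fig3_Kb8": ("win", 1), "fig3_Kd8": ("win", 1),
    "mate_Qb7": ("loss", 0), "mate_Qd7": ("loss", 0),
}
print(verify(moves, mated, table))    # []

# Corrupt one entry: both Figure 2 and Figure 1 now fail the check.
broken = dict(table, fig2=("loss", 4))
print(verify(moves, mated, broken))   # ['fig1', 'fig2']
```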

Captures, pawn promotion, and special moves
A four-piece tablebase must rely on three-piece tablebases that could result if one piece is captured. Similarly, a tablebase containing a pawn must be able to rely on other tablebases that deal with the new set of material after pawn promotion to a queen or other piece. The retrograde analysis program must account for the possibility of a capture or pawn promotion on the previous move.

Tablebases assume that castling is not possible for two reasons. First, in practical endgames, this assumption is almost always correct. (However, castling is allowed by convention in composed problems and studies.) Second, if the king and rook are on their original squares, castling may or may not be allowed. Because of this ambiguity, it would be necessary to make separate evaluations for states in which castling is or is not possible.

The same ambiguity exists for the en passant capture, since the possibility of en passant depends on the opponent's previous move. However, practical applications of en passant occur frequently in pawn endgames, so tablebases account for the possibility of en passant for positions where both sides have at least one pawn.

Using a priori information
According to the method described above, the tablebase must allow the possibility that a given piece might occupy any of the 64 squares. In some positions, it is possible to restrict the search space without affecting the result. This saves computational resources and enables searches which would otherwise be impossible.

An early analysis of this type was published in 1987, in the endgame KRP(a2)KBP(a3), where the Black bishop moves on the dark squares (see example position at right). In this position, we can make the following a priori assumptions:
 * 1) If a piece is captured, we can look up the resulting position in the corresponding tablebase with five pieces. For example, if the Black pawn is captured, look up the newly created position in KRPKB.
 * 2) The White pawn stays on a2; capture moves are handled by the 1st rule.
 * 3) The Black pawn stays on a3; capture moves are handled by the 1st rule.

The result of this simplification is that, instead of searching through 48 × 47 = 2,256 permutations for the pawns' locations, there is only one permutation. Reducing the search space by a factor of 2,256 facilitates a much quicker calculation.
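The size of the reduction can be reproduced by counting ordered placements of the two pawns. This follows the simple count used in the text (48 pawn squares, two distinct pawns) and does not filter out placements that could never arise in a game; the names are illustrative.

```python
from itertools import permutations

# Squares a pawn can stand on: ranks 2-7 on all eight files.
pawn_squares = [(file, rank) for file in range(8) for rank in range(2, 8)]

# Ordered placements of one White and one Black pawn on distinct squares.
unrestricted = sum(1 for _ in permutations(pawn_squares, 2))

# With the pawns frozen on a2 and a3 there is a single placement.
fixed = 1

print(len(pawn_squares), unrestricted, unrestricted // fixed)  # 48 2256 2256
```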

Bleicher has designed a commercial program called "Freezer," which allows users to build new tablebases from existing Nalimov tablebases with a priori information. The program could produce a tablebase for positions with seven or more pieces with blocked pawns, even before tablebases for seven pieces became available.

Correspondence chess
In correspondence chess, a player may consult a chess computer for assistance, provided that the etiquette of the competition allows this. Some correspondence organizations draw a distinction in their rules between utilizing chess engines which calculate a position in real time and the use of a precomputed database stored on a computer. Use of an endgame tablebase might be permitted in a live game even if engine use is forbidden. Players have also used tablebases to analyze endgames from over-the-board play after the game is over. A six-piece tablebase (KQQKQQ) was used to analyze the endgame that occurred in the correspondence game Kasparov versus The World.

Competitive players must know that some tablebases ignore the fifty-move rule. According to that rule, if fifty moves have passed without a capture or a pawn move, either player may claim a draw. FIDE changed the rules several times, starting in 1974, to allow one hundred moves for endgames where fifty moves were insufficient to win. In 1988, FIDE allowed seventy-five moves for KBBKN, KNNKP, KQKBB, KQKNN, KRBKR, and KQPKQ with the pawn on the seventh rank, because tablebases had uncovered positions in these endgames requiring more than fifty moves to win. In 1992, FIDE canceled these exceptions and restored the fifty-move rule to its original standing. Thus a tablebase may identify a position as won or lost, when it is in fact drawn by the fifty-move rule. Such a position is sometimes termed a "cursed win" (where mate can be forced, but it runs afoul of the 50-move rule), or a "blessed loss" from the perspective of the other player.
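The distinction between an ordinary win and a "cursed win" can be sketched as a check on the stretches of play between zeroing moves. The sketch assumes a winning line is described by the number of plies from each capture or pawn move to the next (the last stretch ending in mate), and that a stretch of at most 100 plies is safe; the function name and the input format are invented for illustration.

```python
FIFTY_MOVE_LIMIT_PLIES = 100   # fifty moves by each side


def classify_win(phase_lengths_plies):
    """Label a theoretically won line under the fifty-move rule.

    phase_lengths_plies: plies between consecutive zeroing moves
    (captures or pawn moves), the final entry ending in checkmate.
    """
    if all(n <= FIFTY_MOVE_LIMIT_PLIES for n in phase_lengths_plies):
        return "win"
    return "cursed win"


print(classify_win([40, 12]))    # win
print(classify_win([130, 20]))   # cursed win
```

From the defender's side, the second line would be a "blessed loss".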

In 2013, ICCF changed the rules for correspondence chess tournaments starting from 2014; a player may claim a win or draw based on six-man tablebases. In this case the fifty-move rule is not applied, and the number of moves to mate is not taken into consideration. In 2020, this was increased to seven-man tablebases.

Computer chess
The knowledge contained in tablebases gives the computer a tremendous advantage in the endgame. Not only can computers play perfectly within an endgame, but they can simplify to a winning tablebase position from a more complicated endgame. For the latter purpose, some programs use "bitbases" which give the game-theoretical value of positions without the number of moves until conversion or mate – that is, they only reveal whether the position is won, lost or drawn. Sometimes even this data is compressed and the bitbase reveals only whether a position is won or not, making no distinction between a lost and a drawn game. Shredderbases, for example, used by the Shredder program, are a type of bitbase, which fits all 3-, 4- and 5-piece bitbases in 157 MB. This is a mere fraction of the 7.05 GB that the Nalimov tablebases require.
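To illustrate why a bitbase is so much smaller than a full tablebase, the sketch below stores one win/draw/loss value per position in two bits, four positions per byte. This is a generic packing scheme chosen for illustration, not the actual Shredderbase format; the numeric codes and function names are assumptions.

```python
# Hypothetical 2-bit codes; a real bitbase may use a different encoding.
WDL_CODES = {"loss": 0, "draw": 1, "win": 2}
WDL_NAMES = {code: name for name, code in WDL_CODES.items()}


def pack(results):
    """Pack a sequence of 'win'/'draw'/'loss' strings, four per byte."""
    data = bytearray((len(results) + 3) // 4)
    for index, result in enumerate(results):
        data[index // 4] |= WDL_CODES[result] << (2 * (index % 4))
    return bytes(data)


def lookup(data, index):
    """Read back the result stored for position number `index`."""
    return WDL_NAMES[(data[index // 4] >> (2 * (index % 4))) & 0b11]


results = ["win", "win", "draw", "loss", "win"]
packed = pack(results)
print(len(packed))         # 2
print(lookup(packed, 2))   # draw
```

A table that only distinguishes "won" from "not won" needs a single bit per position, halving the size again before any further compression.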

Some computer chess experts have observed practical drawbacks to the use of tablebases. In addition to ignoring the fifty-move rule, a computer in a difficult position might avoid the losing side of a tablebase ending even if the opponent cannot practically win without knowing the tablebase themselves. The adverse effect could be a premature resignation, or an inferior line of play that loses with less resistance than play without a tablebase might offer. Another drawback is that tablebases require a lot of memory to store trillions of positions. The Nalimov tablebases, which use advanced compression techniques, require 7.05 GB of hard disk space for all 5-piece endings and 1.2 TB for 6-piece endings. The 7-piece Lomonosov tablebase requires 140 TB of storage space. Some computers play better overall if their memory is devoted instead to the ordinary search and evaluation function. Modern engines play endgames significantly better than earlier ones, and using tablebases results in only a very minor improvement to their performance.

Syzygy tablebases were developed by Ronald de Man and released in April 2013 in a form optimized for use by a chess program during search. This variety consists of two tables per endgame: a smaller WDL (win/draw/loss) table which contains knowledge of the 50-move rule, and a larger DTZ table (distance to zero ply, i.e., pawn move or capture). The WDL tables were designed to be small enough to fit on a solid-state drive for quick access during search, whereas the DTZ form is for use at the root position to choose the game-theoretically quickest distance to resetting the 50-move rule while retaining a winning position, instead of performing a search. Syzygy tablebases are available for all 6-piece endings, and are now supported by many top engines, including Stockfish, Leela, Dragon, and Torch. Since August 2018, all 7-piece Syzygy tables are also available.

In 2020, Ronald de Man estimated that 8-man tablebases would be economically feasible within 5 to 10 years, as just 2 PB of disk space would store them in Syzygy format, and they could be generated using existing code on a conventional server with 64 TB of RAM.

Endgame theory
In contexts where the fifty-move rule may be ignored, tablebases have answered longstanding questions about whether certain combinations of material are wins or draws. The following interesting results have emerged:
 * KBBKN — Bernhard Horwitz and Josef Kling (1851) proposed that Black can draw by entering a defensive fortress, but tablebases demonstrated a general win, with maximum DTC = 66 and maximum DTM = 78. (Also see pawnless chess endgame.)
 * KNNKP – Maximum DTC = DTM = 115 moves.
 * KNNNNKQ – The knights win in 62.5 percent of positions, with maximum DTM = 85 moves.
 * KQRKQR – Despite the equality of material, the player to move wins in 67.74% of positions. The maximum DTC is 92, and the maximum DTM is 117. In both this endgame and KQQKQQ, the first player to check usually wins.
 * KRNKNN and KRBKNN — Friedrich Amelung had analyzed these two endgames in the 1900s. KRNKNN and KRBKNN are won for the stronger side in 78% and 95% of the cases, respectively. Stiller's DTC tablebase revealed several lengthy wins in these endgames. The longest win in KRBKNN has a DTC of 223 and a DTM of 238 moves (not shown). Even more interesting is the position at right, where White wins starting with 1. Ke6! Stiller reported the DTC as 243 moves, and the DTM was later found to be 262 moves.

For some years, a "mate-in-200" position (first diagram below) held the record for the longest computer-generated forced mate. (Otto Blathy had composed a "mate in 292 moves" problem in 1889, albeit from an illegal starting position.) In May 2006, Bourzutschky and Konoval discovered a KQNKRBN position with a DTC of 517 moves, whose DTM was later found to be 545 moves. In 2012, when the Lomonosov 7-piece tablebase was being completed, a position was found with a record DTM of 549 moves (third diagram below). It was initially assumed that a 1000-move mate in one of the 8-man endgames would be found. However, cursory targeted research has so far found only a position with DTC 584, which was discovered in 2021 by Bourzutschky. Assuming this projection holds true, Haworth’s Law (which states that the number of moves roughly doubles for each piece added) breaks down at this point.

Many positions are winnable despite seeming to be non-winnable by force at first glance. For example, the position in the middle diagram is a win for Black in 154 moves (the white pawn is captured after around 80 moves).

Endgame studies
Since many composed endgame studies deal with positions that exist in tablebases, their soundness can be checked using the tablebases. Some studies have been proved unsound by the tablebases. That can be either because the composer's solution does not work, or else because there is an equally effective alternative that the composer did not consider. Another way tablebases cook studies is a change in the evaluation of an endgame. For instance, the endgame with a queen and bishop versus two rooks was thought to be a draw, but tablebases proved it to be a win for the queen and bishop, so almost all studies based on this endgame are unsound.

For example, Erik Pogosyants composed the study at right, with White to play and win. His intended main line was 1. Ne3 Rxh2 2. 0-0-0#! A tablebase discovered that 1. h4 also wins for White in 33 moves, even though Black can capture the pawn (which is not the best move: capturing the pawn loses in 21 moves, while Kh1-g2 loses in 32 moves). Incidentally, the tablebase does not recognize the composer's solution because it includes castling.

While tablebases have cooked some studies, they have assisted in the creation of other studies. Composers can search tablebases for interesting positions, such as zugzwang, using a method called data mining. For all three- to five-piece endgames and pawnless six-piece endgames, a complete list of mutual zugzwangs has been tabulated and published.

There has been some controversy whether to allow endgame studies composed with tablebase assistance into composing tournaments. In 2003, the endgame composer and expert John Roycroft summarized the debate:

"[N]ot only do opinions diverge widely, but they are frequently adhered to strongly, even vehemently: at one extreme is the view that since we can never be certain that a computer has been used it is pointless to attempt a distinction, so we should simply evaluate a 'study' on its content, without reference to its origins; at the other extreme is the view that using a 'mouse' to lift an interesting position from a ready-made computer-generated list is in no sense composing, so we should outlaw every such position."

Roycroft himself agrees with the latter approach. He continues, "One thing alone is clear to us: the distinction between classical composing and computer composing should be preserved for as long as possible: if there is a name associated with a study diagram that name is a claim of authorship."

Mark Dvoretsky, an International Master, chess trainer, and author, took a more permissive stance. He was commenting in 2006 on a study by Harold van der Heijden, published in 2001, which reached the position at right after three introductory moves. The drawing move for White is 4. Kb4!! (and not 4. Kb5), based on a mutual zugzwang that may occur three moves later.

Dvoretsky comments: "Here, we should touch on one delicate question. I am sure that this unique endgame position was discovered with the help of Thompson’s famous computer database. Is this a 'flaw,' diminishing the composer's achievement? Yes, the computer database is an instrument, available to anyone nowadays. Out of it, no doubt, we could probably extract yet more unique positions – there are some chess composers who do so regularly. The standard for evaluation here should be the result achieved. Thus: miracles, based upon complex computer analysis rather than on their content of sharp ideas, are probably of interest only to certain aesthetes."

"Play chess with God"
On the Bell Labs website, Ken Thompson once maintained a link to some of his tablebase data. The headline read, "Play chess with God."

Regarding Stiller's long wins, Tim Krabbé struck a similar note: "Playing over these moves is an eerie experience. They are not human; a grandmaster does not understand them any better than someone who has learned chess yesterday. The knights jump, the kings orbit, the sun goes down, and every move is the truth. It's like being revealed the Meaning of Life, but it's in Estonian."

Nomenclature
Originally, an endgame tablebase was called an "endgame data base" or "endgame database". This name appeared in both EG and the ICCA Journal starting in the 1970s, and is sometimes used today. According to Haworth, the ICCA Journal first used the word "tablebase" in connection with chess endgames in 1995. According to that source, a tablebase contains a complete set of information, but a database might lack some information.

Haworth prefers the term "Endgame Table", and has used it in the articles he has authored. Roycroft has used the term "oracle database" throughout his magazine, EG. Nonetheless, the mainstream chess community has adopted "endgame tablebase" as the most common name.

Books
John Nunn has written three books based on detailed analysis of endgame tablebases: