Transposition-driven scheduling

Transposition-driven scheduling (TDS) is a load-balancing algorithm for parallel computing. It was developed at the Vrije Universiteit in Amsterdam, the Netherlands, as an algorithm for solving puzzles. The algorithm provides near-linear speedup on some problems and scales extremely well. It was published by John Romein, Aske Plaat, Henri Bal and Jonathan Schaeffer.

Transposition-based puzzle solving
In a puzzle, all possible plays can be represented as a tree in which board positions correspond to the nodes, moves correspond to the edges, the initial position is the root and the solutions are leaves. Cycles in a path, i.e. moves that yield a position already encountered higher up in the tree, are left out of the tree because they can never lead to an optimal solution.

In most puzzles, different orderings of actions can lead to the same position. In puzzles where previous actions do not influence the solution, such a position needs to be evaluated only once to obtain a solution for both paths. To avoid evaluating the same position more than once (and thus wasting computation cycles), programs written to solve these kinds of puzzles use transposition tables. A transposition is a puzzle state that can be reached by different paths but has the same solution. Every time such a program starts evaluating a position, it first checks a table to see whether the position has already been evaluated. If it has, the solution is taken from the table instead of being recalculated, saving large amounts of time.
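
The table lookup described above can be sketched as follows. This is a minimal illustration, not code from the TDS authors: the toy puzzle (walking right or down across a grid), the function name solve and the stats counters are all assumptions made for the example. The "solution" of a position is taken here to be the number of move sequences that reach the goal corner.

```python
from typing import Dict, Tuple

# Hypothetical toy state: (steps taken right, steps taken down).
Position = Tuple[int, int]


def solve(pos: Position, size: int,
          table: Dict[Position, int], stats: Dict[str, int]) -> int:
    """Count the move sequences leading from pos to the corner (size, size)."""
    # Look in the transposition table before doing any work.
    if pos in table:
        stats["hits"] += 1
        return table[pos]

    stats["evaluated"] += 1
    x, y = pos
    if x == size and y == size:
        result = 1  # the goal position itself
    else:
        result = 0
        if x < size:  # move right
            result += solve((x + 1, y), size, table, stats)
        if y < size:  # move down
            result += solve((x, y + 1), size, table, stats)

    # Store the result so that other move orders reaching pos can reuse it.
    table[pos] = result
    return result
```

On a grid with size 3, the position (1, 1) can be reached by moving right then down or down then right. With the table, it is evaluated once and the second arrival is answered by a lookup.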

However, in parallel computing, this approach has a serious drawback. To make full use of transposition lookups, all computers in the network have to communicate their solutions to the other computers in one way or another, or they run the risk of redundantly solving some positions. This incurs a severe communication overhead: each computer spends much of its time communicating with the others instead of solving the problem.

The traditional setup
To address this drawback, an approach has been taken that integrates problem solving and load balancing. To begin, a function is defined that assigns a unique value to every board position. Every computer in the network is assigned a range of board positions over which it holds authority. Every computer has its own transposition table and a job queue. Whenever a computer is done with its current job, it fetches a new job from the queue. It then computes all distinct positions that can be reached from the current position in one action. Up to this point, this is traditional transposition-based problem solving. However, in the traditional method, the computer would now, for every position just computed, ask the computer that holds authority over that position whether it has a solution for it. If not, the asking computer computes the solution recursively and forwards it to the computer under whose authority the position falls. This is what causes a lot of communication overhead.
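
The assignment of positions to computers can be sketched as follows. This is an illustration under stated assumptions: positions are encoded as strings, zlib.crc32 stands in for the value function (a checksum is deterministic but not strictly unique, unlike the function described above), and the names position_value and owner are hypothetical.

```python
import zlib

VALUE_SPACE = 2 ** 32  # crc32 yields values in the range 0 .. 2**32 - 1


def position_value(position: str) -> int:
    """Map a board position (assumed to be encoded as a string) to an integer."""
    return zlib.crc32(position.encode("utf-8"))


def owner(position: str, num_computers: int) -> int:
    """Return the index of the computer whose range contains the position."""
    # Split the value space into num_computers contiguous ranges of equal
    # size (rounded up so that every value falls into some range).
    range_size = (VALUE_SPACE + num_computers - 1) // num_computers
    return position_value(position) // range_size
```

Because every computer evaluates the same function, any of them can determine locally which computer holds authority over a position, without asking anyone.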

The TDS-step
Instead of asking another computer whether it has the solution, TDS appends the problem to that computer's job queue. In other words, every time a computer has a board position for which it wants a solution, it simply sends the position over the network to the responsible computer and does not concern itself with it any further. Only if a problem falls within its own authority range will a computer check whether a solution is stored in its own table. If there is none, it appends the problem to its own queue. If there is one, nothing more has to be computed and the computer fetches a new job from the queue.
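
The forwarding step can be simulated in a single process as follows. This is a simplified sketch rather than the published implementation: the toy grid puzzle, the owner function and all names are assumptions, the network is replaced by in-memory queues, the table lookup is performed when a job is taken from the queue, and the loop simply stops once all queues are empty instead of using a distributed termination check.

```python
from collections import deque
from typing import Deque, List, Set, Tuple

Position = Tuple[int, int]  # hypothetical toy state: steps right, steps down


def successors(pos: Position, size: int) -> List[Position]:
    """All positions reachable from pos in one move on a size x size grid."""
    x, y = pos
    result = []
    if x < size:
        result.append((x + 1, y))
    if y < size:
        result.append((x, y + 1))
    return result


def owner(pos: Position, num_workers: int) -> int:
    """Assumed authority function: which worker is responsible for pos."""
    x, y = pos
    return (x * 31 + y) % num_workers


def run_tds(start: Position, size: int, num_workers: int):
    queues: List[Deque[Position]] = [deque() for _ in range(num_workers)]
    tables: List[Set[Position]] = [set() for _ in range(num_workers)]
    expanded = [0] * num_workers

    queues[owner(start, num_workers)].append(start)
    while any(queues):  # some worker still has a job waiting
        for w in range(num_workers):
            if not queues[w]:
                continue
            pos = queues[w].popleft()
            if pos in tables[w]:
                continue  # transposition: already handled by its owner
            tables[w].add(pos)
            expanded[w] += 1
            for child in successors(pos, size):
                # Fire and forget: hand the job to the owner, never wait.
                queues[owner(child, num_workers)].append(child)
    return expanded, tables
```

Since each position is only ever looked up by the worker that owns it, every position is expanded exactly once across all workers, and no worker ever queries another worker's table.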

The difference
The key difference between traditional transposition-based problem solving and TDS is that asking another computer whether it has solved a problem follows a request-response approach, in which the asking computer has to wait for a response. In TDS, forwarding a job to another computer does not involve any waiting, because it is known (by design) that the other computer will accept the job and try to solve it. This means that latency (the main cause of delays in request-response models) is not an issue, because a computer never waits for an answer to come back. The hardware or operating system can guarantee that the message arrives at its destination, so the program does not have to do anything further after it forwards the job.
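
The effect of waiting can be illustrated with a deliberately crude cost model. The model is an assumption made for this example, not a measurement or a formula from the TDS publication: every job takes the same compute time, a request-response lookup blocks for one full round trip per job, and a forwarding worker always finds a job in its queue after the first one arrives. All times are in microseconds.

```python
def blocking_time_us(num_jobs: int, compute_us: int, round_trip_us: int) -> int:
    """Request-response: the worker idles for a round trip on every job."""
    return num_jobs * (compute_us + round_trip_us)


def forwarding_time_us(num_jobs: int, compute_us: int, one_way_us: int) -> int:
    """Forwarding: latency only delays the arrival of the first job."""
    return one_way_us + num_jobs * compute_us


# Example with assumed numbers: 1000 jobs of 100 us each,
# 50 us round trip, 25 us one-way delay.
blocking = blocking_time_us(1000, 100, 50)      # 150000 us
forwarding = forwarding_time_us(1000, 100, 25)  # 100025 us
```

In this model the latency term is multiplied by the number of jobs in the request-response case but appears only once in the forwarding case, which is why higher latency hurts the former far more than the latter.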

Speedup
TDS yields spectacular results compared to traditional algorithms, even attaining superlinear speedup (although only in one sense of the word). This arises because computers have a limited amount of memory, and for large problems not all transpositions can be stored, so some transpositions are calculated more than once. Because 16 computers have 16 times as much memory as one (assuming all computers are identical), 16 computers running TDS can store more transpositions than a single computer and therefore have less to recompute. When the single computer is instead given 16 times as much memory as each of the 16, the speedup is just below linear.

Scalability
Because the communication scheme in TDS uses only point-to-point communication and no broadcasting or other group communication, the total amount of communication is proportional to the number of computers participating in the computation. Because of this, TDS scales well to parallel systems with more computers. Also, because latency is not an issue, TDS is scalable in a geographical sense as well.