User:ByteHamster/sandbox-k-way-merge

k-way merge
The k-way merge problem consists of merging k sorted arrays to produce a single sorted array with the same elements. Denote by n the total number of elements. n is equal to the size of the output array and the sum of the sizes of the k input arrays. For simplicity, we assume that none of the input arrays is empty. As a consequence k < n, which simplifies the reported running times. The problem can be solved in O(n log k) running time with O(n) space. Several algorithms that achieve this running time exist.

Iterative 2-Way merge
The problem can be merged by iteratively merging two of the k arrays using a 2-way merge until only a single array is left. If the arrays are merged in arbitrary order, then the resulting running time is only O(kn). This is suboptimal.

The running time can be improved by iteratively merging the first with the second, the third with the fourth, and so on. As the number of arrays is halved in each iteration, there are only Θ(log k) iterations. In each iteration every element is moved exactly once. The running time per iteration is therefore in Θ(n) as n is the number of elements. The total running time is therefore in Θ(n log k).

We can further improve upon this algorithm, by iteratively merging the two shortest arrays. It is clear that this minimizes the running time and can therefore not be worse than the strategy described in the previous paragraph. The running time is therefore in O(n log k). Fortunately, in border cases the running time can be better. Consider for example the degenerate case, where all but one array contain only one element. The strategy explained in the previous paragraph needs Θ(n log k) running time, while the improved one only needs Θ(n) running time.

Direct k-way merge
The basic idea of a direct k-way merge consists of efficiently computing the minimum element of all k arrays and then transferring it to the output array.

A straightforward implementation would scan all k arrays to determine the minimum. This straightforward implementation results in a running time of Θ(kn). Note that this is mentioned only as a possibility, for the sake of discussion. Although it would work, it is not efficient.

We can improve upon this by computing the smallest element faster. By using either heaps, tournament trees, or splay trees, the smallest element can be determined in O(log k) time. The resulting running times are therefore in O(n log k).

The heap is more commonly used, allthough a tournament tree is faster in practice. A heap uses approximately 2*log k comparisons in each step because it works from the root down to the bottom and needs to compare both children of each node. Meanwhile, a tournament tree only needs log k comparisons because it starts on the bottom of the tree and works up to the root, only making a single comparison in each layer. The tournament tree should therefore be preferred.

Heap
The heap algorithm allocates a min-heap of pointers into the input arrays. Initially these pointers point to the smallest elements of the input array. The pointers are sorted by the value that they point to. In an O(k) preprocessing step the heap is created using the standard heapify procedure. Afterwards, the algorithm iteratively transfers the element that the root pointer points to, increases this pointer and executes the standard decrease key procedure upon the root element. The running time of the increase key procedure is bounded by O(log k). As there are n elements, the total running time is O(n log k).

Note that the operation of replacing the key and iteratively doing decrease-key or sift-down are not supported by many Priority Queue libraries such as C++ stl and Java. Doing an extract-min and insert function is less efficient.

Tournament Tree
The Tournament Tree is based on an elimination tournament, like it can be found in sports. In each game, two of the input elements compete. The winner is promoted to the next round. Therefore, we get a binary tree of games. The list is sorted in ascescending order, so the winner of a game is the smaller one of both elements.



For k-way merging, it is more efficient to only store the loser of each game (see image). The data structure is therefore called a loser tree. When building the tree or replacing an element with the next one from its list, we still promote the winner of the game to the top. The tree is filled like in a sports match but the nodes only store the loser. Usually, an additional node above the root is added that represents the overall winner. Every leaf stores a pointer to one of the input arrays. Every inner node stores a value and an index. The index of an inner node indicates which input array the value comes from. The value contains a copy of the first element of the corresponding input array.

The algorithm iteratively appends the minimum element to the result and then removes the element from the corresponding input list. It updates the nodes on the path from the updated leaf to the root (replacement selection). The removed element is the overall winner. Therefore, it has won each game on the path from the input array to the root. When selecting a new element from the input array, the element needs to compete against the previous losers on the path to the root. When using a loser tree, the partner for replaying the games is already stored in the nodes. The loser of each replayed game is written to the node and the winner is iteratively promoted to the top. When the root is reached, the new overall winner was found and can be used in the next round of merging.

Algorithm
A tournament tree can be represented as a balanced binary tree by adding sentinels to the input lists and by adding lists until the number of lists is a power of two. The balanced tree can be stored in a single array. The parent element can be reached by dividing the current index by two.

When one of the leaves is updated, all games from the leaf to the root are replayed. In the following pseudocode, an object oriented tree is used instead of an array because it is easier to understand. Additionally, the number of lists to merge is assumed to be a power of two.

function merge(L1, …, Ln) buildTree(heads of L1,…,Ln) while tree has elements winner := tree.winner output winner.value new := winner.index.next replayGames(winner, new) // Replacement selection

function replayGames(node, new) loser, winner := playGame(node, new) node.value := loser.value node.index := loser.index if node != root replayGames(node.parent, winner)

function buildTree(elements) nextLayer := new Array while elements not empty el1 := elements.take el2 := elements.take loser, winner := playGame(el1, el2) parent := new Node(el1, el2, loser) nextLayer.add(parent) if nextLayer.size == 1 return nextLayer // only root else return buildTree(nextLayer)

Running time
In the beginning, the tree is first created in time Θ(k). In each step of merging, only the games on the path from the new element to the root need to be replayed. In each layer, only one comparison is needed. As the tree is balanced, the path from one of the input arrays to the root contains only Θ(log k) elements. In total, there are n elements that need to be transferred. The resulting total running time is therefore in Θ(n log k).

Example
The following section contains a detailed example for the replacement selection step and one example for a complete merge containing multiple replacement selections.

Replacement selection
Games are replayed from the bottom to the top. In each layer of the tree, the currently stored element of the node and the element that was provided from the layer below compete. The winner is promoted to the top until we found the new overall winner. The loser is stored in the node of the tree.



Merge
To execute the merge itself, the overall smallest element is repeatedly replaced with the next input element. After that, the games to the top are replayed.

This example uses four sorted arrays as input.

{2, 7, 16} {5, 10, 20} {3, 6, 21} {4, 8, 9}

The algorithm is initiated with the heads of each input list. Using these elements, a binary tree of losers is built. For merging, the lowest list element 2 is determined by looking at the overall minimum element at the top of the tree. That value is then popped off, and its leaf is refilled with 7, the next value in the input list. The games on the way to the top are replayed like in the previous section about replacement selection. The next element that is removed is 3. Starting from the next value in the list, 6, the games are replayed up until the root. This is being repeated until the minimum of the tree equals infinity.



Lower Running Time Bound
One can show that no comparison-based k-way merge algorithm exists with a running time in O(n f(k)) where f grows asymptotically slower than a logarithm. (Excluding data with desirable distributions such as disjoint ranges.) The proof is a straightforward reduction from comparison-based sorting. Suppose that such an algorithm existed, then we could construct a comparison-based sorting algorithm with running time O(n f(n)) as follows: Chop the input array into n arrays of size 1. Merge these n arrays with the k-way merge algorithm. The resulting array is sorted and the algorithm has a running time in O(n f(n)). This is a contradiction to the well-known result that no comparison-based sorting algorithm with a worst case running time below O(n log n) exists.