
Parallel merge
A parallel version of the binary merge algorithm can serve as a building block of a parallel merge sort. The following pseudocode demonstrates this algorithm in a parallel divide-and-conquer style (adapted from Cormen et al.). It operates on two sorted arrays $A$ and $B$ and writes the sorted output to array $C$. The notation $A[i...j]$ denotes the part of $A$ from index $i$ through $j$, exclusive.

algorithm merge(A[i...j], B[k...ℓ], C[p...q]) is
    inputs A, B, C : array
           i, j, k, ℓ, p, q : indices
    let m_A = j - i
    let m_B = ℓ - k
    if m_A < m_B then
        swap A and B  // ensure that A is the larger array: i, j still belong to A; k, ℓ to B
        swap m_A and m_B
    if m_A ≤ 0 then
        return  // base case, nothing to merge
    let r = ⌊(i + j)/2⌋
    let s = binary-search(A[r], B[k...ℓ])
    let t = p + (r - i) + (s - k)
    C[t] = A[r]
    in parallel do
        merge(A[i...r], B[k...s], C[p...t])
        merge(A[r+1...j], B[s...ℓ], C[t+1...q])
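A serial rendition of this pseudocode can be sketched in Python. The half-open slice convention `A[i:j]` matches the exclusive upper bound used above, and the standard library's `bisect_left` stands in for the binary-search subroutine. The two recursive calls are written sequentially here; since they are independent, they could instead be dispatched to separate threads or tasks.

```python
from bisect import bisect_left

def merge(A, i, j, B, k, l, C, p):
    """Merge the sorted slices A[i:j] and B[k:l] into C, starting at index p.

    Serial sketch of the divide-and-conquer parallel merge; the two
    recursive calls are independent and could run in parallel.
    """
    m_A, m_B = j - i, l - k
    if m_A < m_B:                          # ensure A[i:j] is the larger slice
        A, i, j, B, k, l = B, k, l, A, i, j
        m_A, m_B = m_B, m_A
    if m_A <= 0:
        return                             # base case: nothing to merge
    r = (i + j) // 2                       # midpoint of the larger slice
    s = bisect_left(B, A[r], k, l)         # where A[r] would land in B[k:l]
    t = p + (r - i) + (s - k)              # output position of A[r]
    C[t] = A[r]
    merge(A, i, r, B, k, s, C, p)          # in parallel do: left halves
    merge(A, r + 1, j, B, s, l, C, t + 1)  #                 right halves
```

For example, merging `A = [1, 3, 5, 7]` and `B = [2, 4, 6]` into a preallocated `C` of length 7 via `merge(A, 0, 4, B, 0, 3, C, 0)` fills `C` with `[1, 2, 3, 4, 5, 6, 7]`.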

The algorithm operates by splitting either $A$ or $B$, whichever is larger, into (nearly) equal halves. It then splits the other array into a part with values smaller than the midpoint of the first, and a part with values larger than or equal to it. (The binary search subroutine returns the index in $B$ where $A[r]$ would be if it were in $B$; this is always a number between $k$ and $ℓ$.) Finally, each pair of halves is merged recursively, and since the recursive calls are independent of each other, they can be done in parallel. A hybrid approach, in which a serial merge is used for the recursion base case, has been shown to perform well in practice.
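The split produced by the binary-search step can be illustrated with Python's `bisect_left`, which returns exactly the index described above: every element before it is smaller than the pivot, and every element from it onward is larger or equal (the array and pivot values here are illustrative).

```python
from bisect import bisect_left

B = [2, 4, 4, 8]           # a sorted array (illustrative values)
pivot = 5                  # plays the role of A[r]
s = bisect_left(B, pivot)  # index where the pivot would be inserted
# s == 3: B[:3] == [2, 4, 4] are all < 5, and B[3:] == [8] are all >= 5
```

Because the pivot is the median of the larger array, neither side of this split can receive more than half of that array's elements, which is what bounds the subproblem size in the analysis below.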

The work performed by the algorithm for two arrays holding a total of $n$ elements, i.e., the running time of a serial version of it, is $O(n)$. This is optimal since $n$ elements need to be copied into $C$. To calculate the span of the algorithm, it is necessary to derive a recurrence relation. Since the two recursive calls of merge run in parallel, only the costlier of the two calls needs to be considered. In the worst case, the number of elements in one of the recursive calls is at most $\frac{3}{4} n$, since the array with more elements is perfectly split in half. Adding the $\Theta(\log n)$ cost of the binary search, we obtain this recurrence as an upper bound:

$$T_{\infty}^\text{merge}(n) = T_{\infty}^\text{merge}\left(\frac {3} {4} n\right) + \Theta\left( \log(n)\right)$$

The solution is $$T_{\infty}^\text{merge}(n) = \Theta\left((\log n)^2\right)$$, meaning that it takes that much time on an ideal machine with an unbounded number of processors.
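The solution can be checked by unrolling the recurrence: after $i$ levels the subproblem size is $\left(\frac{3}{4}\right)^i n$, so the recursion bottoms out after $\log_{4/3} n = \Theta(\log n)$ levels, each contributing at most $\Theta(\log n)$ for its binary search:

```latex
T_{\infty}^\text{merge}(n)
  = \sum_{i=0}^{\log_{4/3} n} \Theta\!\left(\log\!\left(\left(\tfrac{3}{4}\right)^{i} n\right)\right)
  \le \log_{4/3} n \cdot \Theta(\log n)
  = \Theta\left((\log n)^2\right)
```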

Note: The routine is not stable: if equal items are separated by splitting $A$ and $B$, they become interleaved in $C$; moreover, swapping $A$ and $B$ destroys the relative order of equal items that are spread among both input arrays. As a result, when used for sorting, this algorithm produces a sort that is not stable.