Largest differencing method

In computer science, the largest differencing method is an algorithm for solving the partition problem and the multiway number partitioning problem. It is also called the Karmarkar–Karp algorithm after its inventors, Narendra Karmarkar and Richard M. Karp. It is often abbreviated as LDM.

The algorithm
The input to the algorithm is a set S of numbers, and a parameter k. The required output is a partition of S into k subsets, such that the sums in the subsets are as nearly equal as possible. The main steps of the algorithm are:


 * 1) Order the numbers from large to small.
 * 2) Replace the largest and second-largest numbers by their difference.
 * 3) If two or more numbers remain, return to step 1.
 * 4) Using backtracking, compute the partition.

Two-way partitioning
For k=2, the main step (2) works as follows.


 * Take the two largest numbers in S, remove them from S, and insert their difference (this represents a decision to put each of these numbers in a different subset).
 * Proceed in this way until a single number remains. This single number is the difference in sums between the two subsets.

For example, if S = {8,7,6,5,4}, then taking out the two largest numbers {8,7} and inserting their difference 8-7=1 gives {6,5,4,1}. Repeating this step gives {4,1,1}, then {3,1}, then {2}.

Step 4 constructs the subsets of the partition by backtracking. The last step corresponds to {2},{}. Then 2 is replaced by 3 in one set and 1 in the other set: {3},{1}; then {4},{1,1}; then {4,5},{1,6}; and finally {4,7,5},{8,6}, where the sum-difference is indeed 2.

The runtime complexity of this algorithm is dominated by step 1 (sorting), which takes O(n log n).
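A minimal Python sketch of two-way LDM, assuming non-negative numbers. A max-heap (the standard `heapq` module on negated values) performs step 2 repeatedly, and the recorded (largest, second-largest, difference) triples are replayed newest-first as the backtracking step. The name `ldm_two_way` is illustrative and not taken from any library. Locating each difference with `in`/`remove` keeps the code short but makes the reconstruction quadratic; a faster version would keep explicit links between each difference and the two numbers it came from.

```python
import heapq
from typing import List, Tuple


def ldm_two_way(numbers: List[int]) -> Tuple[List[int], List[int]]:
    """Two-way largest differencing method with backtracking (illustrative sketch)."""
    if not numbers:
        return [], []
    # heapq is a min-heap, so store negated values to pop the largest first.
    heap = [-x for x in numbers]
    heapq.heapify(heap)
    steps = []  # (largest, second_largest, difference), in creation order
    while len(heap) > 1:
        a = -heapq.heappop(heap)
        b = -heapq.heappop(heap)
        steps.append((a, b, a - b))
        heapq.heappush(heap, -(a - b))
    # The single remaining number is the final sum-difference: {d}, {}.
    subsets = ([-heap[0]], [])
    # Backtracking: undo each differencing step, newest first.
    for a, b, d in reversed(steps):
        side = 0 if d in subsets[0] else 1
        subsets[side].remove(d)       # d is replaced by a in the same subset,
        subsets[side].append(a)
        subsets[1 - side].append(b)   # and by b in the other subset
    return subsets


print(ldm_two_way([8, 7, 6, 5, 4]))  # → ([4, 5, 7], [6, 8])
```

Each undo step adds b to both sides' sums, so the final sum-difference is preserved all the way back to the original numbers.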

Note that the LDM partition {4,7,5}, {8,6} is not optimal: in the partition {8,7}, {6,5,4} the sum-difference is 0. However, there is evidence that it provides a "good" partition:


 * If the numbers are uniformly distributed in [0,1], then the expected difference between the two sums is $$n^{-\Theta(\log n)}$$. This also implies that the expected ratio between the maximum sum and the optimal maximum sum is $$1+n^{-\Theta(\log n)}$$.
 * When there are at most 4 items, LDM returns the optimal partition.
 * LDM always returns a partition in which the largest sum is at most 7/6 times the optimum. This is tight when there are 5 or more items. 
 * On random instances, this approximate algorithm performs much better than greedy number partitioning. However, it is still bad for instances where the numbers are exponential in the size of the set.
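The claim that LDM is optimal for at most 4 items, and the non-optimal result on {8,7,6,5,4}, can be checked empirically by comparing LDM against exhaustive search. The sketch below assumes small non-negative integer inputs. `kk_difference` and `optimal_difference` are hypothetical helper names, and the randomized loop is a sanity check, not a proof.

```python
import itertools
import random


def kk_difference(numbers):
    """Final number left by two-way LDM (re-sorts every step; fine for tiny inputs)."""
    items = sorted(numbers, reverse=True)
    while len(items) > 1:
        a, b = items[0], items[1]
        items = sorted(items[2:] + [a - b], reverse=True)
    return items[0] if items else 0


def optimal_difference(numbers):
    """Smallest achievable |sum(A) - sum(B)|, found by trying every sign assignment."""
    best = sum(numbers)
    for signs in itertools.product((1, -1), repeat=len(numbers)):
        best = min(best, abs(sum(s * x for s, x in zip(signs, numbers))))
    return best


rng = random.Random(0)
for _ in range(2000):
    nums = [rng.randint(0, 100) for _ in range(rng.randint(1, 4))]
    assert kk_difference(nums) == optimal_difference(nums), nums

print(kk_difference([8, 7, 6, 5, 4]), optimal_difference([8, 7, 6, 5, 4]))  # → 2 0
```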

Multi-way partitioning
For any k ≥ 2, the algorithm can be generalized in the following way. 


 * Initially, for each number i in S, construct a k-tuple of subsets, in which one subset is {i} and the other k-1 subsets are empty.
 * In each iteration, select the two k-tuples A and B with the largest differences between their maximum and minimum subset sums, and combine them in reverse order of sums: the subset of A with the smallest sum is merged with the subset of B with the largest sum, the second-smallest with the second-largest, and so on.
 * Proceed in this way until a single partition remains.

Examples:


 * If S = {8,7,6,5,4} and k=2, then the initial partitions are ({8},{}), ({7},{}), ({6},{}), ({5},{}), ({4},{}). After the first step we get ({6},{}), ({5},{}), ({4},{}), ({8},{7}). Then ({4},{}), ({8},{7}), ({6},{5}). Then ({4,7},{8}), ({6},{5}), and finally ({4,7,5},{8,6}), where the sum-difference is 2; this is the same partition as described above.
 * If S = {8,7,6,5,4} and k=3, then the initial partitions are ({8},{},{}), ({7},{},{}), ({6},{},{}), ({5},{},{}), ({4},{},{}). After the first step we get ({8},{7},{}), ({6},{},{}), ({5},{},{}), ({4},{},{}). Then ({5},{},{}), ({4},{},{}), ({8},{7},{6}). Then ({5},{4},{}), ({8},{7},{6}), and finally ({5,6},{4,7},{8}), where the sum-difference is 3.
 * If S = {5,5,5,4,4,3,3,1} and k=3, then after 7 iterations we get the partition ({4,5},{1,4,5},{3,3,5}).  This solution is not optimal; a better partitioning is provided by the grouping ({5,5},{3,3,4},{1,4,5}).
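The k-tuple formulation can be sketched in Python as follows, assuming non-negative numbers and k ≥ 2. A heap keyed on each tuple's max-min difference selects the two tuples to combine. Ties are broken by insertion order, which is an assumption of this sketch; other tie-breaking rules can produce different partitions. With this rule the sketch reproduces the three examples above. `ldm` and `spread` are illustrative names.

```python
import heapq
import itertools
from typing import List


def spread(subsets: List[List[int]]) -> int:
    """Difference between the largest and smallest subset sum."""
    sums = [sum(s) for s in subsets]
    return max(sums) - min(sums)


def ldm(numbers: List[int], k: int) -> List[List[int]]:
    """k-way largest differencing method on k-tuples of subsets (illustrative sketch)."""
    if not numbers:
        return [[] for _ in range(k)]
    tie = itertools.count()  # insertion-order tie-break; also keeps lists from being compared
    heap = []
    for x in numbers:
        # One subset holds {x}; the other k-1 subsets are empty.
        subsets = [[x]] + [[] for _ in range(k - 1)]
        heapq.heappush(heap, (-spread(subsets), next(tie), subsets))
    while len(heap) > 1:
        # Pop the two k-tuples with the largest max-min difference.
        _, _, a = heapq.heappop(heap)
        _, _, b = heapq.heappop(heap)
        a_up = sorted(a, key=sum)                   # smallest sum first
        b_down = sorted(b, key=sum, reverse=True)   # largest sum first
        merged = [sa + sb for sa, sb in zip(a_up, b_down)]
        heapq.heappush(heap, (-spread(merged), next(tie), merged))
    return heap[0][2]


print([sorted(s) for s in ldm([8, 7, 6, 5, 4], 3)])  # → [[8], [4, 7], [5, 6]]
```

Python's `sorted` is stable even with `reverse=True`, so subsets with equal sums keep their relative order; this is what makes the tie-breaking deterministic.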

There is evidence for the good performance of LDM: 


 * Simulation experiments show that, when the numbers are uniformly random in [0,1], LDM always performs better (i.e., produces a partition with a smaller largest sum) than greedy number partitioning. It performs better than the multifit algorithm when the number of items n is sufficiently large. When the numbers are uniformly random in [o, o+1] for some o>0, the performance of LDM remains stable, while the performance of multifit becomes worse as o increases. For o>0.2, LDM performs better.
 * Let f* be the optimal largest sum. If all numbers are larger than f*/3, then LDM returns the optimal solution. Otherwise, LDM returns a solution in which the difference between the largest and smallest sum is at most the largest input number that is at most f*/3.
 * When there are at most k+2 items, LDM is optimal.
 * When the number of items n is between k+2 and 2k, the largest sum in the LDM partition is at most $$\frac{4}{3}-\frac{1}{3 (n-k-1)}$$ times the optimum.
 * In all cases, the largest sum in the LDM partition is at most $$\frac{4}{3}-\frac{1}{3 k}$$ times the optimum, and there are instances in which it is at least $$\frac{4}{3}-\frac{1}{3 (k-1)}$$ times the optimum.
 * For two-way partitioning, when inputs are uniformly-distributed random variables, the expected difference between largest and smallest sum is $$n^{-\Theta(\log n)}$$.

Balanced two-way partitioning
Several variants of LDM were developed for the balanced number partitioning problem, in which all subsets must have the same cardinality (up to 1).

PDM (Paired Differencing Method) works as follows.


 * 1) Order the numbers from large to small.
 * 2) Replace numbers #1 and #2 by their difference; #3 and #4 by their difference; etc.
 * 3) If two or more numbers remain, return to step 1.
 * 4) Using backtracking, compute the partition.

PDM has average properties worse than LDM. For two-way partitioning, when inputs are uniformly-distributed random variables, the expected difference between largest and smallest sum is $$\Theta(1/n)$$.
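A Python sketch of two-way PDM, assuming non-negative numbers. Instead of a separate backtracking pass, each node carries the items on its heavier and lighter side, which is equivalent bookkeeping: the final node directly yields the partition. The description above does not say what happens to an unpaired number when a round has an odd count; this sketch assumes it is carried unchanged into the next round. `pdm_two_way` is a hypothetical name.

```python
from typing import List, Tuple

# A node is (value, heavy_side, light_side) with sum(heavy) - sum(light) == value.
Node = Tuple[int, List[int], List[int]]


def pdm_two_way(numbers: List[int]) -> Tuple[List[int], List[int]]:
    """Paired Differencing Method for two subsets (illustrative sketch)."""
    nodes: List[Node] = [(x, [x], []) for x in numbers]
    if not nodes:
        return [], []
    while len(nodes) > 1:
        # Step 1: order from large to small.
        nodes.sort(key=lambda node: node[0], reverse=True)
        next_round: List[Node] = []
        # Step 2: difference #1 with #2, #3 with #4, and so on.
        for i in range(0, len(nodes) - 1, 2):
            va, a_heavy, a_light = nodes[i]
            vb, b_heavy, b_light = nodes[i + 1]
            # Taking a difference places the two nodes on opposite sides.
            next_round.append((va - vb, a_heavy + b_light, a_light + b_heavy))
        if len(nodes) % 2 == 1:
            next_round.append(nodes[-1])  # assumption: an unpaired node waits for the next round
        nodes = next_round  # Step 3: repeat while two or more nodes remain
    _, heavy, light = nodes[0]
    return heavy, light


print(pdm_two_way([8, 7, 6, 5, 4, 3]))  # → ([4, 7, 6], [3, 8, 5])
```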

RLDM (Restricted Largest Differencing Method) works as follows.


 * 1) Order the numbers from large to small.
 * 2) Replace numbers #1 and #2 by their difference; #3 and #4 by their difference; etc.
 * 3) Sort the list of n/2 differences from large to small.
 * 4) Assign each pair in turn to different sets: the largest in the pair to the set with the smallest sum, and the smallest in the pair to the set with the largest sum.

For two-way partitioning, when inputs are uniformly-distributed random variables, the expected difference between largest and smallest sum is $$O(\log{n}/n^2)$$.
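The four RLDM steps translate almost line for line into Python. This sketch assumes an even number of non-negative items (the description does not cover an odd count, so the sketch rejects it) and breaks ties between equal subset sums in favor of the first subset; both are assumptions. `rldm_two_way` is a hypothetical name.

```python
from typing import List, Tuple


def rldm_two_way(numbers: List[int]) -> Tuple[List[int], List[int]]:
    """Restricted Largest Differencing Method for two subsets (illustrative sketch)."""
    if len(numbers) % 2:
        raise ValueError("this sketch assumes an even number of items")
    items = sorted(numbers, reverse=True)                                # step 1
    pairs = [(items[i], items[i + 1]) for i in range(0, len(items), 2)]  # step 2
    pairs.sort(key=lambda p: p[0] - p[1], reverse=True)                  # step 3
    sets: Tuple[List[int], List[int]] = ([], [])
    sums = [0, 0]
    for big, small in pairs:                                             # step 4
        lighter = 0 if sums[0] <= sums[1] else 1  # ties go to the first subset
        heavier = 1 - lighter
        sets[lighter].append(big)
        sums[lighter] += big
        sets[heavier].append(small)
        sums[heavier] += small
    return sets


print(rldm_two_way([8, 7, 6, 5, 4, 3]))  # → ([8, 5, 4], [7, 6, 3])
```

Because every pair is split across the two subsets, the cardinalities always stay equal.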

BLDM (Balanced Largest Differencing Method) works as follows.


 * 1) Order the numbers from large to small.
 * 2) Replace numbers #1 and #2 by their difference; #3 and #4 by their difference; etc.
 * 3) Run LDM on the set of differences.

BLDM has average properties similar to LDM. For two-way partitioning, when inputs are uniformly-distributed random variables, the expected difference between largest and smallest sum is $$n^{-\Theta(\log n)}$$.
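BLDM can be sketched by seeding an ordinary two-way LDM heap with the pair differences from step 2. Each heap entry keeps the items on its heavier and lighter side, so the final entry yields the partition without a separate backtracking pass. Assumptions of this sketch: an even number of non-negative items and ties broken by insertion order. `bldm_two_way` is a hypothetical name.

```python
import heapq
import itertools
from typing import List, Tuple


def bldm_two_way(numbers: List[int]) -> Tuple[List[int], List[int]]:
    """Balanced Largest Differencing Method for two subsets (illustrative sketch)."""
    if len(numbers) % 2:
        raise ValueError("this sketch assumes an even number of items")
    items = sorted(numbers, reverse=True)                   # step 1
    tie = itertools.count()                                 # insertion-order tie-break
    heap = []
    for i in range(0, len(items), 2):                       # step 2
        big, small = items[i], items[i + 1]
        # Each pair starts split across the sides: heavy={big}, light={small}.
        heapq.heappush(heap, (small - big, next(tie), [big], [small]))
    if not heap:
        return [], []
    while len(heap) > 1:                                    # step 3: LDM on the differences
        neg_a, _, a_heavy, a_light = heapq.heappop(heap)
        neg_b, _, b_heavy, b_light = heapq.heappop(heap)
        diff = neg_b - neg_a  # = value_a - value_b >= 0
        heapq.heappush(heap, (-diff, next(tie), a_heavy + b_light, a_light + b_heavy))
    _, _, heavy, light = heap[0]
    return heavy, light


print(bldm_two_way([8, 7, 6, 5, 4, 3]))  # → ([4, 7, 6], [3, 8, 5])
```

Every node starts with one item on each side, and merging two balanced nodes keeps them balanced, so the subsets end up with equal cardinality.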

For multi-way partitioning, when c=ceiling(n/k) and each of the k subsets must contain either ceiling(n/k) or floor(n/k) items, the approximation ratio of BLDM for the minimum largest sum is exactly 4/3 for c=3, 19/12 for c=4, 103/60 for c=5, 643/360 for c=6, and 4603/2520 for c=7. The ratios were found by solving a mixed integer linear program. In general (for any c), the approximation ratio is at least $$2-\sum_{j=0}^{c-1}\frac{j!}{c!}$$ and at most $$2-\frac{1}{c-1}$$. The MILP results for 3,4,5,6,7 correspond to the lower bound. When the parameter is the number of subsets (k), the approximation ratio is exactly $$2-\frac{1}{k}$$.
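The closed-form lower bound can be checked against the MILP ratios listed above using exact rational arithmetic. This small sketch uses only Python's standard `fractions` and `math` modules; the function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def bldm_ratio_lower(c: int) -> Fraction:
    """Lower bound 2 - sum_{j=0}^{c-1} j!/c! on BLDM's approximation ratio."""
    return 2 - sum(Fraction(factorial(j), factorial(c)) for j in range(c))


def bldm_ratio_upper(c: int) -> Fraction:
    """Upper bound 2 - 1/(c-1) on BLDM's approximation ratio (requires c >= 2)."""
    return 2 - Fraction(1, c - 1)


for c in range(3, 8):
    print(c, bldm_ratio_lower(c), bldm_ratio_upper(c))
# 3 4/3 3/2
# 4 19/12 5/3
# 5 103/60 7/4
# 6 643/360 9/5
# 7 4603/2520 11/6
```

The last two terms of the sum (j = c-1 and j = c-2) already add up to 1/(c-1), so for c ≥ 3 the lower bound is strictly below the upper bound.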

Min-max subsequence problem
In the min-max subsequence problem, the input is a multiset of n numbers and an integer parameter k, and the goal is to order the numbers such that the largest sum of each block of adjacent k numbers is as small as possible. The problem occurs in the design of video servers. This problem can be solved in polynomial time for k=2, but it is strongly NP-hard for k≥3. A variant of the differencing method can be applied to this problem.

An exact algorithm
The complete Karmarkar–Karp algorithm (CKK) finds an optimal solution by constructing a tree of degree $$k!$$.


 * In the case k=2, each level corresponds to a pair of numbers, and the two branches correspond to taking their difference (i.e. putting them in different sets), or taking their sum (i.e. putting them in the same set).
 * For general k, each level corresponds to a pair of k-tuples, and each of the $$k!$$ branches corresponds to a different way of combining the subsets in these tuples.

For k=2, CKK runs substantially faster than the Complete Greedy Algorithm (CGA) on random instances. This is due to two reasons: when an equal partition does not exist, CKK often allows more trimming than CGA; and when an equal partition does exist, CKK often finds it much faster and thus allows earlier termination. Korf reports that CKK can optimally partition 40 double-precision numbers with 15 digits each in about 3 hours, while CGA requires about 9 hours. In practice, with k=2, problems of arbitrary size can be solved by CKK if the numbers have at most 12 significant digits; with k=3, at most 6 significant digits.

CKK can also run as an anytime algorithm: it finds the KK solution first, and then finds progressively better solutions as time allows (possibly requiring exponential time to reach optimality, for the worst instances).
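A compact Python sketch of CKK for k=2, assuming non-negative integers. At each node the two largest numbers are either differenced (different subsets) or added (same subset), and the difference branch is explored first. As a result, the first solution reached is the ordinary KK solution, and every later entry in the returned trace is an improvement, which illustrates the anytime behavior. Pruning is limited to two rules assumed here: if the largest remaining number is at least the sum of the others, that node is finished; and the search stops once the difference equals total mod 2, which no partition of integers can beat. `ckk_two_way` is a hypothetical name and does not reproduce the prtpy API.

```python
from typing import List, Tuple


def ckk_two_way(numbers: List[int]) -> Tuple[int, List[int]]:
    """Complete Karmarkar-Karp for k=2 (illustrative sketch, non-negative integers).

    Returns (optimal sum-difference, trace of improving differences in discovery order).
    """
    total = sum(numbers)
    best = total            # trivial partition: everything in one subset
    perfect = total % 2     # no partition of integers can do better than this
    found: List[int] = []   # anytime trace

    def search(items: List[int]) -> None:  # items sorted from large to small
        nonlocal best
        if best == perfect:
            return  # a perfect partition is known; terminate early
        rest = sum(items[1:])
        if items[0] >= rest:
            # Best completion: the largest number alone against all the others.
            if items[0] - rest < best:
                best = items[0] - rest
                found.append(best)
            return
        a, b, tail = items[0], items[1], items[2:]
        # First branch, the KK choice: a and b in different subsets (difference).
        search(sorted(tail + [a - b], reverse=True))
        # Second branch: a and b in the same subset (sum).
        search(sorted(tail + [a + b], reverse=True))

    if numbers:
        search(sorted(numbers, reverse=True))
    return best, found


print(ckk_two_way([8, 7, 6, 5, 4]))  # → (0, [2, 0])
```

On {8,7,6,5,4} the first value found is 2, the LDM result described earlier, and the search then improves it to the optimal 0.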

Combining CKK with the balanced-LDM algorithm (BLDM) yields a complete anytime algorithm for solving the balanced partition problem.

Previous mentions
An algorithm equivalent to the Karmarkar-Karp differencing heuristic is mentioned in ancient Jewish legal texts by Nachmanides and Joseph ibn Habib. The algorithm is used to combine different testimonies about the same loan.

Implementations

 * Python: The prtpy library contains implementations of the Karmarkar-Karp algorithm and complete Karmarkar-Karp algorithm.