Balanced number partitioning

Balanced number partitioning is a variant of multiway number partitioning in which there are constraints on the number of items allocated to each set. The input to the problem is a set of n items of different sizes, and two integers m, k. The output is a partition of the items into m subsets, such that the number of items in each subset is at most k. Subject to this, it is required that the sums of sizes in the m subsets are as similar as possible.

An example application is identical-machines scheduling where each machine has a job-queue that can hold at most k jobs. The problem also has applications in the manufacturing of VLSI chips, and in assigning tools to machines in flexible manufacturing systems.

In the standard three-field notation for optimal job scheduling problems, the problem of minimizing the largest sum is sometimes denoted by $$P\mid \#\leq k\mid C_\max$$. The middle field "# ≤ k" denotes that the number of jobs on each machine should be at most k. This is in contrast to the unconstrained version, which is denoted by $$P\parallel C_\max$$.

Two-way balanced partitioning
A common special case, called two-way balanced partitioning, is when there should be exactly two subsets (m = 2). The two subsets should contain floor(n/2) and ceiling(n/2) items. It is a variant of the partition problem. It is NP-hard to decide whether there exists a partition in which the sums in the two subsets are equal; see problem [SP12]. There are many algorithms that aim to find a balanced partition in which the sums are as nearly equal as possible.
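Since the subset sizes are fixed at floor(n/2) and ceiling(n/2), the objective can be made concrete by exhaustive search over all subsets of size floor(n/2). The following sketch is illustrative only (it takes exponential time, consistent with the NP-hardness noted above) and is not any of the algorithms cited below:

```python
from itertools import combinations

def balanced_two_way(items):
    """Exact balanced 2-way partition by brute force: one subset gets
    floor(n/2) items, the other ceiling(n/2), minimizing the difference
    of subset sums.  Exponential time -- small instances only."""
    n = len(items)
    total = sum(items)
    best, best_diff = None, None
    # It suffices to enumerate index-subsets of size floor(n/2).
    for subset in combinations(range(n), n // 2):
        s = sum(items[i] for i in subset)
        diff = abs(total - 2 * s)  # |sum(part1) - sum(part2)|
        if best_diff is None or diff < best_diff:
            best_diff, best = diff, set(subset)
    part1 = [items[i] for i in sorted(best)]
    part2 = [items[i] for i in range(n) if i not in best]
    return part1, part2, best_diff
```

For example, `balanced_two_way([8, 7, 6, 5, 4])` splits the items into subsets of sizes 2 and 3 with equal sums (15 each).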


 * Coffman, Frederickson and Lueker present a restricted version of the LPT algorithm (called RLPT), in which inputs are assigned in pairs.  When inputs are uniformly-distributed random variables, the expected largest sum of RLPT is exactly $$\frac{n}{4}+\frac{1}{2n+2}$$. The expected work-difference (difference between largest and smallest sum) is $$\Theta(1/n)$$.
 * Lueker presents a variant of the LDM algorithm (called the pairwise differencing method (PDM)). Its expected work-difference is $$\Theta(1/n)$$.
 * Tsai presents an algorithm called Restricted Largest Difference (RLD). Its work-difference is $$O(\log n/n^2)$$ almost surely.
 * Yakir presents a balanced variant of the LDM algorithm for m = 2, called BLDM. Its expected work-difference is $$n^{-\Theta(\log n)}$$.
 * Mertens presents a complete anytime algorithm for balanced two-way partitioning. It combines the BLDM algorithm with the complete-Karmarkar–Karp algorithm.
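The pairing idea behind RLPT can be sketched as follows. Note that the precise pairing and tie-breaking rules of the published algorithm are not specified above, so this is one plausible reading, not a faithful reproduction:

```python
def rlpt(items):
    """Sketch of a restricted-LPT-style rule for two machines (an
    assumption about the pairing rule, not the published algorithm):
    sort descending, take the items two at a time, and send the two
    members of each pair to different machines, with the larger item
    going to the machine whose current sum is smaller."""
    xs = sorted(items, reverse=True)
    sums = [0, 0]
    parts = [[], []]
    for i in range(0, len(xs), 2):
        pair = xs[i:i + 2]
        # the machine with the smaller current sum gets the larger item
        lo = 0 if sums[0] <= sums[1] else 1
        parts[lo].append(pair[0]); sums[lo] += pair[0]
        if len(pair) == 2:
            parts[1 - lo].append(pair[1]); sums[1 - lo] += pair[1]
    return parts, sums
```

By construction each machine receives floor(n/2) or ceiling(n/2) items, so the output is always a balanced partition.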

Balanced triplet partitioning
Another special case, called 3-partitioning, is when the number of items in each subset should be at most 3 (k = 3). Deciding whether there exists such a partition with equal sums is exactly the 3-partition problem, which is known to be strongly NP-hard. There are approximation algorithms that aim to find a partition in which the sums are as nearly equal as possible.


 * Kellerer and Woeginger adapt the LPT algorithm to triplet partitioning (where there are at most 3*m items, and each subset should contain at most 3 items). Their algorithm is called modified-LPT or MLPT. It orders the items from large to small, and puts each item in turn into the bin with the smallest sum among those bins that contain fewer than 3 items. They show that the MLPT algorithm attains at most $$\frac{4 m-1}{3 m}$$ of the minimum largest sum, which is the same approximation ratio that LPT attains for the unconstrained problem. The bound is tight for MLPT.
 * Chen, He and Lin show that, for the same problem, MLPT attains at least $$\frac{3 m-1}{4 m-2}$$ of the maximum smallest sum, which is again the same ratio that LPT attains for the unconstrained problem.
 * Kellerer and Kotov present a different algorithm (for the case with exactly 3*m items), that attains at most $$7/6$$ of the minimum largest sum.
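The MLPT rule described above admits a direct implementation. The sketch below follows the description (largest-first, smallest eligible sum); tie-breaking by lowest index is an assumption:

```python
def mlpt(items, m, k=3):
    """Modified LPT (MLPT) as described above: sort the items in
    non-increasing order and place each item into the subset with the
    smallest current sum among those holding fewer than k items.
    Assumes n <= k*m, so an eligible subset always exists."""
    bins = [[] for _ in range(m)]
    sums = [0] * m
    for x in sorted(items, reverse=True):
        # eligible subsets: those with fewer than k items
        j = min((i for i in range(m) if len(bins[i]) < k),
                key=lambda i: sums[i])
        bins[j].append(x)
        sums[j] += x
    return bins, sums
```

For example, `mlpt([5, 5, 4, 4, 3, 3], 2)` yields two triplets with equal sums of 12.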

Balanced partitioning with larger cardinalities
A more general case, called k-partitioning, is when the number of items in each subset should be at most k, where k can be any positive integer.
 * Babel, Kellerer and Kotov study a variant in which there are k×m items (for some integer k), and each of the m sets must contain exactly k items. They present several heuristic algorithms for approximating the minimum largest sum:
   * Folding algorithm: optimal for m = 2, and in general has tight approximation ratio $$2-\frac{1}{m}$$.
   * Exchange algorithm: tight approximation ratio $$2-\frac{2}{m+1}$$. It is not known whether it runs in polynomial time.
   * Primal-dual algorithm (a combination of LPT and MultiFit): approximation ratio at most $$4/3$$. This bound is tight for k = 4 when m is sufficiently large; the precise lower bound is $$\frac{4 m}{3 m+1}$$.
   * They also conjecture that Modified-LPT has approximation ratio $$\frac{4 m-1}{3 m}$$. Currently, this conjecture is known to be true only for k = 3. For k > 3, it is known that its approximation ratio is at most 2.
 * Michiels, Korst, Aarts, van Leeuwen and Spieksma study a variant in which each of the m sets must contain either ceiling(n/m) or floor(n/m) items (so k = ceiling(n/m)). They extend the Balanced-LDM (BLDM) algorithm from m = 2 to general m. The generalized algorithm runs in time $$O(n\log n)$$. They prove that its approximation ratio for the minimum largest sum is exactly 4/3 for k = 3, 19/12 for k = 4, 103/60 for k = 5, 643/360 for k = 6, and 4603/2520 for k = 7. The ratios were found by solving a mixed integer linear program. In general (for any k), the approximation ratio is at least $$2-\sum_{j=0}^{k-1}\frac{j!}{k!}$$ and at most $$2-\frac{1}{k-1}$$. The exact MILP results for k = 3, 4, 5, 6, 7 coincide with the lower bound. For k > 7, no exact results are known, but the difference between the lower and upper bound is less than 0.3%. When the approximation ratio is expressed in terms of the number of subsets m, it is exactly $$2-\frac{1}{m}$$.
 * Zhang, Mouratidis and Pang show that BLDM might yield partitions with a high work-difference (difference between highest and lowest sum), both when the inputs are distributed uniformly, and when their distribution is skewed. They propose two alternative heuristics: LRM reduces the work-difference to 1/3 the work-difference of BLDM when the distribution is uniform; Meld reduces the work-difference when the distribution is skewed. A hybrid algorithm combines BLDM, LRM and Meld and adapts dynamically to different data distributions.
 * When k is fixed, a PTAS of Hochbaum and Shmoys can be used for balanced partitioning. When k is part of the input, no PTAS is currently known.
 * Dell'Amico and Martello study the problem of minimizing the largest sum when the number of items in each set is at most k. They show that the linear-program relaxation of this variant has the same optimal value as the LP relaxation of the unconstrained variant. The expression $$\max((\sum x_i)/m, x_1)$$, where the xi are the inputs ordered from large to small, is a lower bound on the optimal largest sum, and its worst-case ratio is 1/2 in both variants. The improved expression $$\max((\sum x_i)/m, x_1, x_k+x_{m+1})$$ has worst-case ratio 2/3 in the unconstrained variant and 1/2 in the constrained variant. The approximation ratio of modified list scheduling is 1/2 for the unconstrained variant, but 0 for the constrained variant (it can be arbitrarily bad). The approximation ratio of the modified LPT algorithm is at most 2. They also show that a previously known lower bound has a tight worst-case performance ratio of 3/4, and that their PD algorithm has a tight performance ratio of 4/3 (when m is sufficiently large).
 * He, Tan, Zhu and Yao consider the problem of maximizing the smallest sum. They show that the FOLDING algorithm has tight approximation ratio $$\max\left(\frac{2}{k}, \frac{1}{m}\right)$$. They present a new algorithm, HARMONIC1, with worst-case ratio at least $$\max\left(\frac{1}{k}, \frac{1}{\lceil \sum_{i=1}^m \frac{1}{i}\rceil+1}\right)$$. Both these algorithms are ordinal – they partition the items based only on the order between them rather than their exact values. They prove that any ordinal algorithm has ratio at most $$O(1/\ln{m})$$ for maximizing the smallest sum. This indicates that HARMONIC1 is asymptotically optimal. For any fixed k, any ordinal algorithm has ratio at most the smallest root of the equation $$\sum_{i=1}^m \left\lfloor\left\lfloor \frac{k+i-1}{i}\right\rfloor x \right\rfloor = k$$. When k tends to infinity, this upper bound approaches 0.
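The Dell'Amico–Martello lower bounds mentioned above can be computed directly. The following sketch follows the formulas as stated; the guard for short inputs is an assumption, since the case n < m + 1 is not discussed above:

```python
def dm_lower_bound(items, m, k):
    """Lower bounds on the optimal largest sum: the basic bound
    max(sum/m, x1) and the improved bound max(sum/m, x1, x_k + x_{m+1}),
    where x1 >= x2 >= ... are the items in non-increasing order
    (1-indexed, as in the text)."""
    xs = sorted(items, reverse=True)
    basic = max(sum(xs) / m, xs[0])
    improved = basic
    if len(xs) >= m + 1 and len(xs) >= k:   # x_k and x_{m+1} exist
        improved = max(basic, xs[k - 1] + xs[m])  # 0-indexed access
    return basic, improved
```

For instance, with items [9, 8, 7, 6, 5, 4], m = 2 and k = 3, both bounds equal 19.5 (the average-load term dominates); with items [10, 1, 1, 1], m = 2 and k = 2, both equal 10 (the largest item dominates).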

Relations between balanced and unconstrained problems
There are some general relations between approximations to the balanced partition problem and the standard (unconstrained) partition problem.


 * Babel, Kellerer and Kotov prove that the ratio between the unconstrained optimum and the constrained optimum is at most $$2-\frac{2}{m}$$, and it is tight.
 * Kellerer and Kotov prove that every heuristic for balanced partitioning with capacity k and approximation-ratio r (for the minimum largest sum) can be employed to obtain a heuristic for unconstrained partitioning with approximation-ratio $$\max\left(r, \frac{k+2}{k+1}-\frac{1}{m(k+1)}\right)$$. In particular, their $$7/6$$-approximation algorithm for triplet partitioning (k = 3) can be used to obtain a heuristic for unconstrained partitioning with approximation-ratio $$\max\left(\frac{7}{6}, \frac{5}{4}-\frac{1}{4 m}\right)$$.

Different cardinality constraints
The cardinality constraints can be generalized by allowing a different constraint on each subset. This variant was introduced in the "open problems" section of an earlier work, whose authors call it the $$k_i$$-partitioning problem. He, Tan, Zhu and Yao present an algorithm called HARMONIC2 for maximizing the smallest sum with different cardinality constraints. They prove that its worst-case ratio is at least $$\max\left(\frac{1}{k_m}, \frac{k_1}{k_m}\cdot\frac{1}{\left\lceil \sum_{i=1}^m \frac{1}{i}\right\rceil+1}\right)$$.

Categorized cardinality constraints
Another generalization of cardinality constraints is as follows. The input items are partitioned into k categories. For each category h, there is a capacity constraint $$k_h$$. Each of the m subsets may contain at most $$k_h$$ items from category h. In other words, all m subsets should be independent sets of a particular partition matroid. Two special cases of this problem have been studied.

Kernel partitioning
In the kernel balanced-partitioning problem, some m pre-specified items are kernels, and each of the m subsets must contain a single kernel (and an unlimited number of non-kernels). Here, there are two categories: the kernel category with capacity 1, and the non-kernel category with unlimited capacity.
 * Chen, He and Yao prove that the problem is NP-hard even for k = 3 (for k = 2 it can be solved efficiently by finding a maximum-weight matching). They then present an algorithm called Kernel-LPT (KLPT): it assigns a kernel to each subset, and then runs the modified LPT algorithm (which puts each item into the subset with the smallest sum among those that have fewer than k items). They prove that, with k = 3, KLPT has an approximation ratio of $$\frac{4 m-1}{3 m}$$ for the minimum largest sum. However, Chen, He and Lin claim that its tight approximation ratio is $$\frac{3 m-1}{2 m}$$ for the minimum largest sum, and $$\frac{2 m-1}{3 m-2}$$ for the maximum smallest sum.
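The KLPT rule described above can be sketched directly: seed each subset with one kernel, then run capacity-constrained LPT on the remaining items. Tie-breaking by lowest index is an assumption:

```python
def klpt(kernels, items, k):
    """Kernel-LPT (KLPT) sketch, following the description above:
    each of the m subsets receives one kernel, and the remaining items
    are placed by modified LPT -- each item (largest first) goes to the
    subset with the smallest sum among those with fewer than k items.
    Assumes len(kernels) + len(items) <= k * len(kernels)."""
    m = len(kernels)
    subsets = [[ker] for ker in kernels]
    sums = list(kernels)
    for x in sorted(items, reverse=True):
        j = min((i for i in range(m) if len(subsets[i]) < k),
                key=lambda i: sums[i])
        subsets[j].append(x)
        sums[j] += x
    return subsets, sums
```

For example, `klpt([6, 2], [5, 4, 3], 3)` produces two subsets, each containing one kernel, with equal sums of 10.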

One-per-category partitioning
In another variant of this problem, there are k categories, each containing m items, and each subset should contain exactly one item from each category. That is, $$k_h = 1$$ for each category h.


 * Wu and Yao presented the layered LPT algorithm – a variant of the LPT algorithm. They prove that its approximation ratio is $$2-\frac{1}{m}$$ for minimizing the largest sum, and $$\frac{1}{m}$$ for maximizing the smallest sum in the general case; the latter can be improved in some special cases to $$\frac{m}{2 m-1}$$ for general k and $$\frac{m-1}{2 m-3}$$ for k = 3.
 * Li and Li presented different algorithms for the same problem. For minimizing the largest sum, they present an EPTAS for constant k, and an FPTAS for constant m. For maximizing the smallest sum, they present a 1/(k − 1)-approximation algorithm for the general case, and an EPTAS for constant k. They also study a more general objective: minimizing the $$\ell_p$$-norm of the vector of sums. They prove that the layered-LPT algorithm is a 2-approximation algorithm for all norms.
 * Dell'Olmo, Hansen, Pallottino and Storchi study 32 different objectives for this problem. For each of the four operators max, min, sum, diff, one operator can be applied to the k items inside each subset, and then one operator can be applied to the m results for the different subsets. Each of these 16 objectives can be either maximized or minimized, for a total of 32. They show that 21 of these problems can be solved in linear time; 7 require more complex, but still polynomial-time, algorithms; 3 are NP-hard: maximizing (min, sum), minimizing (max, sum) and minimizing (diff, sum). They left open the status of minimizing (diff, diff).
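The layered-LPT idea mentioned above can be sketched as follows. The exact layer ordering and tie-breaking in Wu and Yao's algorithm are not specified here, so this is one plausible reading: treat each category as a layer, and within each layer assign the items (largest first) to the not-yet-served subset of smallest current sum.

```python
def layered_lpt(categories):
    """Layered-LPT-style sketch (an assumption about the published
    algorithm): categories is a list of k layers, each with m items.
    Within each layer, items are sorted in non-increasing order and
    assigned one per subset, each item going to the subset of smallest
    current sum among those not yet served in this layer."""
    m = len(categories[0])
    subsets = [[] for _ in range(m)]
    sums = [0] * m
    for layer in categories:
        served = set()
        for x in sorted(layer, reverse=True):
            j = min((i for i in range(m) if i not in served),
                    key=lambda i: sums[i])
            subsets[j].append(x)
            sums[j] += x
            served.add(j)
    return subsets, sums
```

By construction every subset ends up with exactly one item from each category, i.e. it is an independent set of the partition matroid with all capacities equal to 1.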