Quadratic unconstrained binary optimization

Quadratic unconstrained binary optimization (QUBO), also known as unconstrained binary quadratic programming (UBQP), is a combinatorial optimization problem with a wide range of applications from finance and economics to machine learning. QUBO is an NP-hard problem, and for many classical problems from theoretical computer science, like maximum cut, graph coloring and the partition problem, embeddings into QUBO have been formulated. Embeddings for machine learning models include support-vector machines, clustering and probabilistic graphical models. Moreover, due to its close connection to Ising models, QUBO constitutes a central problem class for adiabatic quantum computation, where it is solved through a physical process called quantum annealing.

Definition
The set of binary vectors of a fixed length $$n>0$$ is denoted by $$\mathbb{B}^n$$, where $$\mathbb{B}=\lbrace 0,1\rbrace$$ is the set of binary values (or bits). We are given a real-valued upper triangular matrix $$Q\in\mathbb{R}^{n\times n}$$, whose entries $$Q_{ij}$$ define a weight for each pair of indices $$i,j\in\lbrace 1,\dots,n\rbrace$$ within the binary vector. We can define a function $$f_Q: \mathbb{B}^n\rightarrow\mathbb{R}$$ that assigns a value to each binary vector through
 * $$f_Q(x) = x^\top Qx = \sum_{i=1}^n \sum_{j=i}^n Q_{ij} x_i x_j$$

Intuitively, the weight $$Q_{ij}$$ is added if both $$x_i$$ and $$x_j$$ have value 1. When $$i=j$$, the values $$Q_{ii}$$ are added if $$x_i=1$$, as $$x_ix_i=x_i$$ for all $$x_i\in\mathbb{B}$$.

The QUBO problem consists of finding a binary vector $$x^*$$ that is minimal with respect to $$f_Q$$, namely
 * $$\forall x\in\mathbb{B}^n: ~f_Q(x^*)\leq f_Q(x)$$

In general, $$x^*$$ is not unique, meaning there may be a set of minimizing vectors with equal value w.r.t. $$f_Q$$. The complexity of QUBO arises from the number of candidate binary vectors to be evaluated, as $$|\mathbb{B}^n|=2^n$$ grows exponentially in $$n$$.

Sometimes, QUBO is defined as the problem of maximizing $$f_Q$$, which is equivalent to minimizing $$f_{-Q}=-f_Q$$.
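For small $$n$$, the definition can be explored directly by exhaustive search. The following sketch (illustrative function names, pure Python, not from any particular library) evaluates $$f_Q$$ and enumerates all $$2^n$$ candidates:

```python
from itertools import product

def qubo_value(Q, x):
    """Evaluate f_Q(x) = sum over i <= j of Q[i][j] * x[i] * x[j]
    for an upper-triangular matrix Q given as a list of rows."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))

def qubo_brute_force(Q):
    """Return a minimizing binary vector by enumerating all 2^n candidates.
    Only feasible for small n, which is exactly why QUBO is hard in general."""
    n = len(Q)
    return min(product((0, 1), repeat=n), key=lambda x: qubo_value(Q, x))

# Small example: the negative diagonal entry rewards setting x_1 = 1,
# while the positive coupling Q[0][1] penalizes setting both bits.
Q = [[1,  2],
     [0, -3]]
best = qubo_brute_force(Q)  # (0, 1) with f_Q = -3
```

Enumeration illustrates the exponential growth of the search space noted above; practical solvers rely on heuristics or the special cases discussed below.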

Properties
QUBO is scale invariant for positive factors $$\alpha>0$$, which leave the optimum $$x^*$$ unchanged:
 * $$f_{\alpha Q}(x) = \sum_{i\leq j}(\alpha Q_{ij})x_ix_j = \alpha\sum_{i\leq j}Q_{ij}x_ix_j = \alpha f_Q(x)$$

In its general form, QUBO is NP-hard: no polynomial-time algorithm is known that solves every instance, and none exists unless P = NP. However, there are polynomial-time solvable special cases, where $$Q$$ has certain properties, for example:

 * If all coefficients are positive, the optimum is trivially $$x^*=(0,\dots,0)$$. Similarly, if all coefficients are negative, the optimum is $$x^*=(1,\dots,1)$$.
 * If $$Q$$ is diagonal, the bits can be optimized independently, and the problem is solvable in $$\mathcal{O}(n)$$. The optimal variable assignments are simply $$x^*_i=1$$ if $$Q_{ii}<0$$, and $$x^*_i=0$$ otherwise.
 * If all off-diagonal elements of $$Q$$ are non-positive, the corresponding QUBO problem is solvable in polynomial time, for example via a reduction to the minimum-cut problem in a flow network.
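The diagonal case from the list above can be sketched directly (illustrative helper, not from a library):

```python
def solve_diagonal_qubo(diag):
    """For a diagonal Q, the bits decouple: each x_i is set independently
    to x_i = 1 iff Q_ii < 0, giving an O(n) algorithm.
    `diag` holds the diagonal entries Q_11, ..., Q_nn."""
    return [1 if q < 0 else 0 for q in diag]

x_star = solve_diagonal_qubo([2.0, -1.5, 0.0, -0.5])  # [0, 1, 0, 1]
```

A zero diagonal entry contributes nothing either way; the sketch arbitrarily sets such bits to 0, which is also optimal.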

QUBO can be solved using integer linear programming solvers like CPLEX or Gurobi Optimizer. This is possible since QUBO can be reformulated as a linear constrained binary optimization problem. To achieve this, substitute the product $$x_ix_j$$ by an additional binary variable $$z_{ij}\in\{0,1\}$$ and add the constraints $$x_i\ge z_{ij}$$, $$x_j\ge z_{ij}$$ and $$x_i+x_j-1\le z_{ij}$$. Note that $$z_{ij}$$ can also be relaxed to continuous variables within the bounds zero and one.
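The linearization can be checked on small instances by brute-force enumeration of the feasible points of the linearized program. The sketch below is a stand-in for a real ILP solver such as CPLEX or Gurobi, used here only to show that the constraints force $$z_{ij}=x_ix_j$$:

```python
from itertools import product

def linearized_minimum(Q):
    """Minimize the linearized objective sum_i Q[i][i]*x_i + sum_{i<j} Q[i][j]*z_ij
    subject to z_ij <= x_i, z_ij <= x_j and z_ij >= x_i + x_j - 1,
    by brute-force enumeration (for illustration; a real solver handles this)."""
    n = len(Q)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    best = None
    for x in product((0, 1), repeat=n):
        for z in product((0, 1), repeat=len(pairs)):
            # keep only (x, z) satisfying the linearization constraints,
            # which force z_ij = x_i * x_j for binary variables
            if not all(z[k] <= x[i] and z[k] <= x[j] and z[k] >= x[i] + x[j] - 1
                       for k, (i, j) in enumerate(pairs)):
                continue
            val = (sum(Q[i][i] * x[i] for i in range(n))
                   + sum(Q[i][j] * z[k] for k, (i, j) in enumerate(pairs)))
            if best is None or val < best:
                best = val
    return best

Q = [[1,  2, -1],
     [0, -3,  1],
     [0,  0, -1]]
```

For this $$Q$$, the linearized minimum coincides with the QUBO minimum of $$-3$$, attained e.g. at $$x=(0,1,0)$$.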

Applications
QUBO is a structurally simple, yet computationally hard optimization problem. It can be used to encode a wide range of optimization problems from various scientific areas.

Cluster Analysis
As an illustrative example of how QUBO can be used to encode an optimization problem, we consider the problem of cluster analysis. Here, we are given a set of 20 points in 2D space, described by a matrix $$D\in\mathbb{R}^{20\times 2}$$, where each row contains two Cartesian coordinates. We want to assign each point to one of two classes or clusters, such that points in the same cluster are similar to each other. For two clusters, we can assign a binary variable $$x_i\in\mathbb{B}$$ to the point corresponding to the $$i$$-th row in $$D$$, indicating whether it belongs to the first ($$x_i=0$$) or second cluster ($$x_i=1$$). Consequently, we have 20 binary variables, which form a binary vector $$x\in\mathbb{B}^{20}$$ that corresponds to a cluster assignment of all points (see figure).

One way to derive a clustering is to consider the pairwise distances between points. Given a cluster assignment $$x$$, one of $$x_ix_j$$ or $$(1-x_i)(1-x_j)$$ evaluates to 1 if points $$i$$ and $$j$$ are in the same cluster. Similarly, one of $$x_i(1-x_j)$$ or $$(1-x_i)x_j$$ evaluates to 1 if they are in different clusters. Let $$d_{ij}\geq 0$$ denote the Euclidean distance between points $$i$$ and $$j$$. To define a cost function to minimize, we add the distance $$d_{ij}$$ when points $$i$$ and $$j$$ are in the same cluster, and subtract it when they are in different clusters. This way, an optimal solution tends to place points which are far apart into different clusters, and points that are close into the same cluster. The cost function thus comes down to
 * $$\begin{align} f(x) &= \sum_{i<j}d_{ij}\left(x_ix_j + (1-x_i)(1-x_j)\right)-d_{ij}\left(x_i(1-x_j)+(1-x_i)x_j\right) \\ &= \sum_{i<j}\left[4d_{ij}x_ix_j-2d_{ij}x_i-2d_{ij}x_j+d_{ij}\right] \end{align}$$

From the second line, the constant term $$\sum_{i<j}d_{ij}$$ can be dropped and the remaining coefficients halved (using scale invariance with $$\alpha=\tfrac{1}{2}$$), which yields the QUBO parameters
 * $$Q_{ij} = \begin{cases} 2d_{ij} &\text{if } i\neq j \\ -\left(\sum\limits_{k=1}^{i-1} d_{ki} +\sum\limits_{\ell=i+1}^n d_{i\ell}\right)&\text{if } i=j \end{cases}$$

Using these parameters, the optimal QUBO solution corresponds to an optimal cluster assignment with respect to the above cost function.
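The construction above can be sketched in pure Python for a handful of points (illustrative helpers; the article's instance uses 20 points, which is already too large for this brute-force check):

```python
from itertools import product
from math import dist  # Euclidean distance (Python >= 3.8)

def clustering_qubo(points):
    """Build the upper-triangular QUBO matrix with Q[i][j] = 2*d_ij for i < j
    and Q[i][i] = -(sum of distances from point i to all other points)."""
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = -sum(d[i][k] for k in range(n) if k != i)
        for j in range(i + 1, n):
            Q[i][j] = 2.0 * d[i][j]
    return Q

def best_assignment(points):
    """Brute-force the optimal two-cluster assignment (small inputs only)."""
    Q = clustering_qubo(points)
    n = len(points)
    def f(x):
        return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))
    return min(product((0, 1), repeat=n), key=f)

# Two well-separated groups of three points each
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = best_assignment(pts)
```

As expected, the minimizer assigns the two well-separated groups to different clusters; since swapping the cluster labels leaves the cost unchanged, the solution is only unique up to complementation.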

Connection to Ising models
QUBO is very closely related and computationally equivalent to the Ising model, whose Hamiltonian function is defined as
 * $$H(\sigma) = -\sum_{\langle i~j\rangle} J_{ij} \sigma_i \sigma_j - \mu \sum_j h_j \sigma_j$$

with real-valued parameters $$h_j, J_{ij}, \mu$$ for all $$i,j$$. The spin variables $$\sigma_j$$ are binary with values from $$\lbrace -1,+1\rbrace$$ instead of $$\mathbb{B}$$. Moreover, in the Ising model the variables are typically arranged in a lattice where only neighboring pairs of variables $$\langle i~j\rangle$$ can have non-zero coefficients. Applying the substitution $$\sigma_j = 1-2x_j$$, which maps spin $$+1$$ to bit $$0$$ and spin $$-1$$ to bit $$1$$, yields an equivalent QUBO problem:
 * $$\begin{align}f(x) &= \sum_{\langle i~j\rangle} -J_{ij}(2x_i-1)(2x_j-1) +\sum_{j}\mu h_j(2x_j-1) \\ &= \sum_{\langle i~j\rangle} (-4J_{ij}x_ix_j +2J_{ij}x_i +2J_{ij}x_j -J_{ij}) +\sum_{j}(2\mu h_jx_j-\mu h_j)\\ &= \sum_{\langle i~j\rangle} (-4J_{ij}x_ix_j) + \sum_{\langle i~j\rangle}2J_{ij}x_i + \sum_{\langle i~j\rangle}2J_{ij}x_j +\sum_{j}2\mu h_jx_j -\sum_{\langle i~j\rangle}J_{ij} -\sum_{j}\mu h_j\\ &= \sum_{\langle i~j\rangle} (-4J_{ij}x_ix_j) + \sum_{\langle j~i\rangle}2J_{ji}x_j + \sum_{\langle i~j\rangle}2J_{ij}x_j +\sum_{j}2\mu h_jx_j -\sum_{\langle i~j\rangle}J_{ij} -\sum_{j}\mu h_j &&\text{using } \sum_{\langle i~j\rangle}=\sum_{\langle j~i\rangle}\\ &= \sum_{\langle i~j\rangle} (-4J_{ij}x_ix_j) + \sum_j\sum_{\langle k=j~i\rangle}2J_{ki}x_j + \sum_j\sum_{\langle i~k=j\rangle}2J_{ik}x_j +\sum_{j}2\mu h_jx_j -\sum_{\langle i~j\rangle}J_{ij} -\sum_{j}\mu h_j\\ &= \sum_{\langle i~j\rangle} (-4J_{ij}x_ix_j) + \sum_j \left(\sum_{\langle i~k=j\rangle}(2J_{ki} + 2J_{ik}) + 2\mu h_j \right)x_j -\sum_{\langle i~j\rangle}J_{ij} -\sum_{j}\mu h_j &&\text{using } \sum_{\langle k=j~i\rangle}=\sum_{\langle i~k=j\rangle}\\ &= \sum_{i=1}^n\sum_{j=1}^i Q_{ij}x_ix_j + C\end{align}$$ where
 * $$\begin{align}Q_{ij} &= \begin{cases}-4J_{ij} &\text{if } i\neq j \\ \sum_{\langle i~k=j\rangle}(2J_{ki} + 2J_{ik}) + 2\mu h_j &\text{if } i=j\end{cases} \\ C &= -\sum_{\langle i~j\rangle}J_{ij} -\sum_{j}\mu h_j\end{align}$$ using the fact that $$x_j = x_jx_j$$ for binary $$x_j$$ to fold the linear terms into the diagonal of $$Q$$.

As the constant $$C$$ does not change the position of the optimum $$x^*$$, it can be neglected during optimization and is only important for recovering the original Hamiltonian function value.
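The conversion can be sketched and verified exhaustively for small instances. The code below follows the formulas for $$Q$$ and $$C$$ derived above (sign conventions for the spin-to-bit substitution vary in the literature; this sketch uses $$\sigma_j = 1-2x_j$$, consistent with the derivation):

```python
from itertools import product

def ising_to_qubo(J, h, mu):
    """Convert an Ising instance H(s) = -sum_{(i,j)} J[(i,j)]*s_i*s_j - mu*sum_j h[j]*s_j
    into QUBO parameters (Q, C) via the substitution s_j = 1 - 2*x_j.
    J maps index pairs (i, j) with i < j to coupling strengths."""
    n = len(h)
    Q = [[0.0] * n for _ in range(n)]
    C = 0.0
    for (i, j), Jij in J.items():
        Q[i][j] += -4.0 * Jij      # quadratic coefficient -4*J_ij
        Q[i][i] += 2.0 * Jij       # linear terms folded into the diagonal,
        Q[j][j] += 2.0 * Jij       # using x = x*x for binary x
        C += -Jij
    for j in range(n):
        Q[j][j] += 2.0 * mu * h[j]
        C += -mu * h[j]
    return Q, C

def ising_energy(J, h, mu, s):
    return (-sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())
            - mu * sum(hj * sj for hj, sj in zip(h, s)))

def qubo_energy(Q, x):
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))

# Sanity check on a 3-spin chain: H(s) must equal f_Q(x) + C for s = 1 - 2x
J, h, mu = {(0, 1): 1.0, (1, 2): -0.5}, [0.3, -0.2, 0.1], 2.0
Q, C = ising_to_qubo(J, h, mu)
for x in product((0, 1), repeat=3):
    s = [1 - 2 * xi for xi in x]
    assert abs(ising_energy(J, h, mu, s) - (qubo_energy(Q, x) + C)) < 1e-9
```

The check confirms that the two energy functions agree on every configuration, so minimizing one minimizes the other, with $$C$$ recovering the original Hamiltonian value.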