Correlation gap

In stochastic programming, the correlation gap is the worst-case ratio between the cost when the random variables are correlated and the cost when they are independent.

As an example, consider the following optimization problem. A teacher wants to know whether to come to class or not. There are n potential students, each of whom attends the class with probability $$1/n$$. If at least one student attends, the teacher must come and his cost is 1. If no student attends, the teacher can stay at home and his cost is 0. The teacher's goal is to minimize his cost. This is a stochastic-programming problem, because the constraints are not known in advance; only their probabilities are known. There are two cases regarding the correlation between the students:
 * Case #1: the students are uncorrelated: each student decides whether to come to class by tossing a biased coin that comes up heads with probability $$1/n$$, independently of the others. The expected cost in this case is $$1-(1-1/n)^n \approx 1-1/e$$ for large n.
 * Case #2: the students are correlated: one student is selected at random and comes to class, while the others stay at home. The probability that any given student comes is still $$1/n$$, but now some student always attends, so the cost is 1.

The correlation gap is the cost in case #2 divided by the cost in case #1, which tends to $$e/(e-1)$$ as n grows.
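
The two cases can be checked numerically. The following is a minimal sketch rather than part of any referenced work; the function names `independent_cost`, `correlated_cost` and `gap` are chosen here for illustration only. It evaluates the closed-form expected costs from the example and shows the ratio approaching $$e/(e-1)$$ as n grows.

```python
import math


def independent_cost(n: int) -> float:
    """Expected cost in case #1: each of n students attends independently w.p. 1/n."""
    # The teacher pays 1 unless all n students stay at home.
    return 1.0 - (1.0 - 1.0 / n) ** n


def correlated_cost(n: int) -> float:
    """Expected cost in case #2: exactly one uniformly chosen student attends."""
    # Some student always attends, so the teacher always pays 1.
    return 1.0


def gap(n: int) -> float:
    """Ratio of the correlated cost to the independent cost for n students."""
    return correlated_cost(n) / independent_cost(n)


if __name__ == "__main__":
    for n in (1, 2, 10, 100, 10_000):
        print(n, round(independent_cost(n), 6), round(gap(n), 6))
    print("limit e/(e-1) =", round(math.e / (math.e - 1), 6))
```

The ratio equals 1 for n = 1 and increases with n, so in this example the gap is only approached in the limit of many students.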

The correlation gap has been proved to be bounded in several cases. For example, when the cost function is a submodular set function (as in the above example), the correlation gap is at most $$e/(e-1)$$, so the above example is a worst case.

An upper bound on the correlation gap implies an upper bound on the loss that results from ignoring the correlation. For example, suppose we have a stochastic programming problem with a submodular cost function. We know the marginal probabilities of the variables, but we do not know whether they are correlated. If we simply ignore the correlation and solve the problem as if the variables were independent, the worst-case expected cost of the resulting solution is at most $$e/(e-1)$$ times that of the optimal solution; that is, the solution is a $$(e-1)/e$$-approximation to the optimal solution.
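
As a rough illustration of this bound, the sketch below compares the expected cost under independence with the expected cost under one particular correlated distribution that has the same marginals. Everything in it is an assumption made for illustration: the coverage instance `COVERS`, the marginals `MARGINALS`, and the choice of correlated distribution (exactly one element is realized, element i with probability `MARGINALS[i]`). The cost of a set is the number of distinct items it covers, which is a monotone submodular function. The chosen correlated distribution is not necessarily the worst one, so the ratio computed here is only a lower bound on the correlation gap of this instance.

```python
import math
from itertools import product

# Hypothetical instance: element i covers the items in COVERS[i].
COVERS = [{0, 1}, {1, 2}, {2, 3, 4}, {0, 4}]
# Marginal probability of each element; they sum to 1, which makes the
# "exactly one element is realized" distribution below well defined.
MARGINALS = [0.3, 0.2, 0.4, 0.1]


def cost(subset):
    """Number of distinct items covered by the elements in subset."""
    covered = set()
    for i in subset:
        covered |= COVERS[i]
    return len(covered)


def expected_cost_independent(marginals):
    """Exact expectation when each element is realized independently."""
    total = 0.0
    for bits in product([0, 1], repeat=len(marginals)):
        prob = 1.0
        for b, p in zip(bits, marginals):
            prob *= p if b else 1.0 - p
        total += prob * cost([i for i, b in enumerate(bits) if b])
    return total


def expected_cost_disjoint(marginals):
    """Exact expectation when exactly one element i is realized w.p. marginals[i]."""
    return sum(p * cost([i]) for i, p in enumerate(marginals))


if __name__ == "__main__":
    ind = expected_cost_independent(MARGINALS)
    cor = expected_cost_disjoint(MARGINALS)
    print("independent:", ind)
    print("correlated: ", cor)
    print("ratio:", cor / ind, "bound e/(e-1):", math.e / (math.e - 1))
```

The enumeration over all subsets is exponential in the number of elements, so this only works for very small instances; it is meant to make the comparison concrete, not to be efficient.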

Applications
The correlation gap has been used to bound the loss of revenue from using Bayesian-optimal pricing instead of a Bayesian-optimal auction.