User:SonnaGab



In mathematics, Monte Carlo integration is numerical integration using random numbers. That is, Monte Carlo integration methods are algorithms for the approximate evaluation of definite integrals, usually multidimensional ones. Deterministic quadrature algorithms typically evaluate the integrand on a regular grid. Monte Carlo methods, however, choose the points at which the integrand is evaluated at random.

Informally, to estimate the area of a domain D, first pick a simple domain E whose area is easily calculated and which contains D. Now pick a sequence of random points distributed uniformly within E. Some fraction of these points will also fall within D. The area of D is then estimated as this fraction multiplied by the area of E.
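As an illustration, here is a minimal Python sketch of this hit-or-miss estimate. It assumes D is the unit disk and E the enclosing square [-1, 1] × [-1, 1]; the function name hit_or_miss_area and the fixed seed are illustrative choices, not part of the method.

```python
import math
import random

def hit_or_miss_area(inside_d, e_bounds, n, seed=0):
    """Estimate area(D) as (fraction of uniform points of E that land in D) * area(E)."""
    rng = random.Random(seed)
    (x_lo, x_hi), (y_lo, y_hi) = e_bounds
    area_e = (x_hi - x_lo) * (y_hi - y_lo)
    hits = 0
    for _ in range(n):
        # A uniformly distributed random point in the simple domain E.
        x = rng.uniform(x_lo, x_hi)
        y = rng.uniform(y_lo, y_hi)
        if inside_d(x, y):
            hits += 1
    return hits / n * area_e

# D is the unit disk (area pi); E is the square [-1, 1] x [-1, 1] (area 4).
in_disk = lambda x, y: x * x + y * y <= 1.0
estimate = hit_or_miss_area(in_disk, ((-1.0, 1.0), (-1.0, 1.0)), 100_000)
print(estimate, math.pi)
```

The estimate approaches pi as the number of points grows, with an error that shrinks like the inverse square root of the number of points.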

The traditional Monte Carlo algorithm distributes the evaluation points uniformly over the integration region. Adaptive algorithms such as VEGAS and MISER use importance sampling and stratified sampling techniques to reduce the variance of the estimate, and hence its error, for a given number of function evaluations.

Plain Monte Carlo integration algorithm
The plain Monte Carlo integration algorithm computes an estimate of a multidimensional definite integral of the form,


 * $$I = \int_{a_1}^{b_1}dx_1\int_{a_2}^{b_2}dx_2\dots\int_{a_n}^{b_n}dx_n \, f(x_1, x_2, \dots, x_n) \equiv \int_{V}f(\bar{\mathbf{x}}) \, d\bar{\mathbf{x}}$$

where $$\bar{\mathbf{x}}=\{x_1,\dots,x_n\}$$ and the hypercube $$V$$ is the region of integration,
 * $$\{ \bar{\mathbf{x}} ; a_1\le x_1\le b_1, a_2\le x_2\le b_2,\dots, a_n\le x_n\le b_n\}$$.

Below, $$V$$ also denotes the measure (volume) of this region.

The algorithm samples points uniformly from the integration region to estimate the integral and its error. Suppose that the sample has size $$N$$ and denote the points in the sample by $$\bar{\mathbf{x}}_1, \dots, \bar{\mathbf{x}}_N$$. Then the estimate for the integral is given by


 * $$ I \approx Q_N \equiv V\frac{1}{N} \sum_{i=1}^N f(\bar{\mathbf{x}}_i) = V \langle f \rangle $$,

where $$\langle f \rangle$$ denotes the sample mean of the integrand. The estimate converges to the integral, $$ I = \lim_{N \to \infty} Q_N $$, because the sample points $$ \{ \bar{\mathbf{x}}_1, \bar{\mathbf{x}}_2, \bar{\mathbf{x}}_3, \ldots \} $$ form an equidistributed sequence in the integration region (ignoring implementation issues such as pseudorandom number generators and limited floating point precision).
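The estimator $$Q_N$$ can be sketched in Python for a box-shaped region as follows. This is a minimal illustration assuming f accepts a list of n coordinates; plain_mc is a hypothetical name, and Python's random module stands in for the pseudorandom number generator.

```python
import random

def plain_mc(f, lower, upper, n, seed=0):
    """Plain Monte Carlo estimate Q_N = V * <f> over the box [lower, upper]."""
    rng = random.Random(seed)
    volume = 1.0
    for a, b in zip(lower, upper):
        volume *= b - a  # V, the measure of the box
    total = 0.0
    for _ in range(n):
        # Each coordinate is drawn uniformly from [a_k, b_k].
        x = [rng.uniform(a, b) for a, b in zip(lower, upper)]
        total += f(x)
    return volume * total / n  # V times the sample mean <f>

# Example: the integral of x1 * x2 * x3 over [0, 1]^3 is exactly 1/8.
q = plain_mc(lambda x: x[0] * x[1] * x[2], [0.0] * 3, [1.0] * 3, 200_000)
print(q)
```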

The sample variance of the integrand can be estimated using
 * $$ \mathrm{Var}(f)\equiv\sigma_N^2 = \frac{1}{N-1} \sum_{i=1}^N (f(\bar{\mathbf{x}}_i) - \langle f \rangle)^2, $$

where $$N-1$$ is used instead of $$N$$ in order to get an unbiased estimate of the variance.

Because, for independent random variables $$Y_i$$,
 * $$\mathrm{Var}\left(\sum_{i=1}^N Y_i\right)=\sum_{i=1}^N \mathrm{Var}(Y_i)$$,

and because, for a constant $$ a $$, the variance has the property
 * $$\mathrm{Var}(a f) = a^2 \mathrm{Var}(f)$$,

the variance of the estimate of the integral is
 * $$ \mathrm{Var}(Q_N) = \frac{V^2}{N^2} \sum_{i=1}^N \mathrm{Var}(f)  =V^2 \frac{\mathrm{Var}(f)}{N} = V^2\frac{\sigma_N^2}{N}$$.

As long as the sequence $$ \{ \sigma_1^2, \sigma_2^2, \sigma_3^2, \ldots \} $$ is bounded, this variance decreases asymptotically to zero as $$\frac{1}{N}$$. The error estimate,
 * $$\delta Q_N\approx\sqrt{\mathrm{Var}(Q_N)}=V\frac{\sigma_N}{\sqrt{N}},$$

decreases as $$1/\sqrt{N}$$. The familiar law of random walk applies: to reduce the error by a factor of 10 requires a 100-fold increase in the number of sample points.

The above expression provides a statistical estimate of the error on the result. This error estimate is not a strict error bound: random sampling of the region may not uncover all the important features of the function, resulting in an underestimate of the error.
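A Python sketch that returns both $$Q_N$$ and the error estimate $$\delta Q_N = V\sigma_N/\sqrt{N}$$, using the unbiased N - 1 normalisation for the sample variance. The function name plain_mc_with_error and the test integrand exp(-(x1² + x2²)) on the unit square are assumptions chosen for illustration.

```python
import math
import random

def plain_mc_with_error(f, lower, upper, n, seed=0):
    """Return (Q_N, delta Q_N), where delta Q_N = V * sigma_N / sqrt(N)."""
    rng = random.Random(seed)
    volume = math.prod(b - a for a, b in zip(lower, upper))
    values = [f([rng.uniform(a, b) for a, b in zip(lower, upper)]) for _ in range(n)]
    mean = sum(values) / n
    # Unbiased sample variance sigma_N^2: divide by N - 1, not N.
    sigma2 = sum((v - mean) ** 2 for v in values) / (n - 1)
    return volume * mean, volume * math.sqrt(sigma2 / n)

# Integrand exp(-(x1^2 + x2^2)) on the unit square.
f = lambda x: math.exp(-(x[0] ** 2 + x[1] ** 2))
results = {n: plain_mc_with_error(f, [0.0, 0.0], [1.0, 1.0], n) for n in (1_000, 100_000)}
for n, (q, err) in results.items():
    print(f"N = {n:>7}: Q_N = {q:.5f} +/- {err:.5f}")
```

With 100 times as many points, the reported error should shrink by roughly a factor of 10, in line with the $$1/\sqrt{N}$$ law.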

Recursive stratified sampling
Recursive stratified sampling is a generalization of one-dimensional adaptive quadratures to multi-dimensional integrals. On each recursion step the integral and the error are estimated using a plain Monte Carlo algorithm. If the error estimate is larger than the required accuracy, the integration volume is divided into sub-volumes and the procedure is applied recursively to each sub-volume.

The ordinary "dividing by two" strategy, bisecting along every dimension at once, does not work in many dimensions: each step creates $$2^d$$ sub-volumes in $$d$$ dimensions, far too many to keep track of. Instead, one estimates along which dimension a subdivision should bring the most benefit and subdivides the volume only along this dimension.

Here is a typical algorithm for recursive stratified sampling:

    Sample N random points;
    Estimate the average and the error;
    If the error is acceptable :
       Return the average and the error;
    Else :
       For each dimension :
          Subdivide the volume in two along the dimension;
          Estimate the sub-averages in the two sub-volumes;
       Pick the dimension along which the two sub-averages differ the most;
       Subdivide the volume in two along this dimension;
       Dispatch a recursive call to each of the two sub-volumes;
       Estimate the grand average and grand variance;
       Return the grand average and grand variance;
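A minimal Python sketch of this recursion, under stated assumptions: each call draws a fixed number n of fresh points (children do not reuse them), the dimension chosen is the one along which the two sub-averages differ the most (our reading of the selection step), each child receives the accuracy target acc/sqrt(2), and a hypothetical max_depth guard bounds the recursion. It is not the JavaScript implementation mentioned in the text; strata is a hypothetical name.

```python
import math
import random

def strata(f, a, b, acc, n=64, rng=None, depth=0, max_depth=20):
    """Sketch of recursive stratified sampling; returns (integral, error).

    f takes a list of coordinates; a and b are the lower and upper corners
    of the box. max_depth is an added safeguard so the recursion terminates.
    """
    if rng is None:
        rng = random.Random(1)
    volume = math.prod(hi - lo for lo, hi in zip(a, b))
    # Sample n random points; estimate the average and the error.
    xs = [[rng.uniform(lo, hi) for lo, hi in zip(a, b)] for _ in range(n)]
    ys = [f(x) for x in xs]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / (n - 1)
    integral, error = volume * mean, volume * math.sqrt(var / n)
    if error < acc or depth >= max_depth:
        return integral, error
    # For each dimension, compare the sub-averages of the two halves and
    # keep the dimension where they differ the most.
    best_k, best_diff = 0, -1.0
    for k in range(len(a)):
        mid = 0.5 * (a[k] + b[k])
        left = [y for x, y in zip(xs, ys) if x[k] < mid]
        right = [y for x, y in zip(xs, ys) if x[k] >= mid]
        if left and right:
            diff = abs(sum(left) / len(left) - sum(right) / len(right))
            if diff > best_diff:
                best_k, best_diff = k, diff
    # Bisect along the chosen dimension.
    mid = 0.5 * (a[best_k] + b[best_k])
    b_left, a_right = list(b), list(a)
    b_left[best_k] = mid
    a_right[best_k] = mid
    # Each child gets acc / sqrt(2); if every leaf meets its target,
    # the combined error stays below acc.
    sub = acc / math.sqrt(2)
    i_l, e_l = strata(f, a, b_left, sub, n, rng, depth + 1, max_depth)
    i_r, e_r = strata(f, a_right, b, sub, n, rng, depth + 1, max_depth)
    # Grand average and grand variance from the two independent halves.
    return i_l + i_r, math.sqrt(e_l ** 2 + e_r ** 2)

integral, error = strata(lambda x: x[0] ** 2 + x[1] ** 2, [0.0, 0.0], [1.0, 1.0], acc=0.01)
print(integral, error)  # the exact integral is 2/3
```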

The stratified sampling algorithm concentrates the sampling points in the regions where the variance of the function is largest, thus reducing the grand variance and making the sampling more effective, as shown in the illustration.

The points for the illustration have been generated by the following JavaScript-1.8 implementation of the above algorithm,

The popular MISER routine implements a similar algorithm.

MISER Monte Carlo
The MISER algorithm of Press and Farrar is based on recursive stratified sampling. This technique aims to reduce the overall integration error by concentrating integration points in the regions of highest variance.

The idea of stratified sampling begins with the observation that for two disjoint regions a and b with Monte Carlo estimates of the integral $$E_a(f)$$ and $$E_b(f)$$ and variances $$\sigma_a^2(f)$$ and $$\sigma_b^2(f)$$, the variance $$\mathrm{Var}(f)$$ of the combined estimate $$E(f) = \tfrac{1}{2} (E_a(f) + E_b(f))$$ is given by,


 * $$\mathrm{Var}(f) = \frac{\sigma_a^2(f)}{4 N_a} + \frac{\sigma_b^2(f)}{4 N_b}$$

It can be shown that this variance is minimized by distributing the points such that,


 * $$\frac{N_a}{N_a + N_b} = \frac{\sigma_a}{\sigma_a + \sigma_b}$$

Hence the smallest error estimate is obtained by allocating sample points in proportion to the standard deviation of the function in each sub-region.
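One way to see this is a Lagrange-multiplier argument: minimize the variance of the combined estimate at a fixed total $$N = N_a + N_b$$ (the argument $$(f)$$ is dropped for brevity).
 * $$\mathcal{L} = \frac{\sigma_a^2}{4N_a} + \frac{\sigma_b^2}{4N_b} + \lambda\,(N_a + N_b - N)$$
 * $$\frac{\partial\mathcal{L}}{\partial N_a} = -\frac{\sigma_a^2}{4N_a^2} + \lambda = 0 \;\Rightarrow\; N_a = \frac{\sigma_a}{2\sqrt{\lambda}}, \qquad N_b = \frac{\sigma_b}{2\sqrt{\lambda}}$$
 * $$\frac{N_a}{N_a+N_b} = \frac{\sigma_a}{\sigma_a+\sigma_b}, \qquad \mathrm{Var}_{\min}(f) = \frac{(\sigma_a+\sigma_b)^2}{4N}$$

For a given $$N$$, comparing candidate bisections therefore amounts to comparing $$\sigma_a + \sigma_b$$.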

The MISER algorithm proceeds by bisecting the integration region along one coordinate axis to give two sub-regions at each step. The direction is chosen by examining all d possible bisections and selecting the one which will minimize the combined variance of the two sub-regions. The variance in the sub-regions is estimated by sampling with a fraction of the total number of points available to the current step. The same procedure is then repeated recursively for each of the two half-spaces from the best bisection. The remaining sample points are allocated to the sub-regions using the formula for N_a and N_b. This recursive allocation of integration points continues down to a user-specified depth where each sub-region is integrated using a plain Monte Carlo estimate. These individual values and their error estimates are then combined upwards to give an overall result and an estimate of its error.

The GSL routine gsl_monte_miser_integrate uses the MISER Monte Carlo algorithm to integrate the function f over the dim-dimensional hypercubic region defined by the lower and upper limits in the arrays xl and xu, each of size dim. The integration uses a fixed number of function calls, and obtains random sampling points using the random number generator r. A previously allocated workspace s must be supplied. The result of the integration is returned in result, with an estimated absolute error abserr.
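The MISER bisection procedure can be sketched in Python as follows. This is a simplified illustration, not the GSL implementation. Parameter names and defaults mirror the GSL ones described below (estimate_frac = 0.1, min_calls = 16 * dim, min_calls_per_bisection = 32 * min_calls). The exploration points are used only to choose the bisection. The remaining calls are split with the simple rule $$N_a/(N_a+N_b) = \sigma_a/(\sigma_a+\sigma_b)$$ rather than the alpha-dependent rule. The recursion stops when fewer than min_calls_per_bisection calls remain, with no separate depth parameter or dither. The names miser_sketch, _plain and _std are hypothetical.

```python
import math
import random

def _plain(f, lo, hi, n, rng):
    """Plain Monte Carlo on a box: (integral estimate, variance of that estimate)."""
    vol = math.prod(b - a for a, b in zip(lo, hi))
    ys = [f([rng.uniform(a, b) for a, b in zip(lo, hi)]) for _ in range(n)]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / (n - 1)
    return vol * mean, vol * vol * var / n

def _std(values):
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def miser_sketch(f, lo, hi, calls, rng, estimate_frac=0.1,
                 min_calls=None, min_calls_per_bisection=None):
    """Returns (integral, variance) using about `calls` evaluations of f."""
    dim = len(lo)
    if min_calls is None:
        min_calls = 16 * dim
    if min_calls_per_bisection is None:
        min_calls_per_bisection = 32 * min_calls
    if calls < min_calls_per_bisection:
        # Too few calls left: a plain Monte Carlo estimate ends this branch.
        return _plain(f, lo, hi, max(calls, 2), rng)
    # Spend a fraction of the budget estimating the variance in each half.
    n_est = max(int(estimate_frac * calls), min_calls)
    pts = [[rng.uniform(a, b) for a, b in zip(lo, hi)] for _ in range(n_est)]
    vals = [f(p) for p in pts]
    best = None
    for k in range(dim):
        mid = 0.5 * (lo[k] + hi[k])
        left = [v for p, v in zip(pts, vals) if p[k] < mid]
        right = [v for p, v in zip(pts, vals) if p[k] >= mid]
        if len(left) < 2 or len(right) < 2:
            continue
        s_l, s_r = _std(left), _std(right)
        # With optimal allocation the combined variance grows with
        # (sigma_l + sigma_r)^2, so keep the bisection minimizing that sum.
        if best is None or s_l + s_r < best[0]:
            best = (s_l + s_r, k, s_l, s_r)
    remaining = calls - n_est
    if best is None:
        return _plain(f, lo, hi, remaining, rng)
    _, k, s_l, s_r = best
    # Allocate the remaining calls with N_a / (N_a + N_b) = sigma_a / (sigma_a + sigma_b).
    frac_l = 0.5 if s_l + s_r == 0 else s_l / (s_l + s_r)
    n_l = min(max(int(frac_l * remaining), min_calls), remaining - min_calls)
    mid = 0.5 * (lo[k] + hi[k])
    hi_l, lo_r = list(hi), list(lo)
    hi_l[k] = mid
    lo_r[k] = mid
    i_l, v_l = miser_sketch(f, lo, hi_l, n_l, rng, estimate_frac,
                            min_calls, min_calls_per_bisection)
    i_r, v_r = miser_sketch(f, lo_r, hi, remaining - n_l, rng, estimate_frac,
                            min_calls, min_calls_per_bisection)
    # Combine upwards: integrals and variances of independent halves add.
    return i_l + i_r, v_l + v_r

rng = random.Random(2)
result, variance = miser_sketch(lambda x: x[0] ** 2 + x[1] ** 2,
                                [0.0, 0.0], [1.0, 1.0], 20_000, rng)
print(result, math.sqrt(variance))  # the exact integral is 2/3
```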

Configurable Parameters
The MISER algorithm has several configurable parameters.

estimate_frac
This parameter specifies the fraction of the currently available number of function calls which are allocated to estimating the variance at each recursive step. In the GNU Scientific Library's (GSL) implementation, the default value is 0.1.

min_calls
This parameter specifies the minimum number of function calls required for each estimate of the variance. If the number of function calls allocated to the estimate using estimate_frac falls below min_calls then min_calls are used instead. This ensures that each estimate maintains a reasonable level of accuracy. In the GNU Scientific Library's implementation, the default value of min_calls is 16 * dim.

min_calls_per_bisection
This parameter specifies the minimum number of function calls required to proceed with a bisection step. When a recursive step has fewer calls available than min_calls_per_bisection it performs a plain Monte Carlo estimate of the current sub-region and terminates its branch of the recursion. In the GNU Scientific Library's implementation, the default value of this parameter is 32 * min_calls.

alpha
This parameter controls how the estimated variances for the two sub-regions of a bisection are combined when allocating points. With recursive sampling the overall variance should scale better than $$1/N$$, since the values from the sub-regions will be obtained using a procedure which explicitly minimizes their variance. To accommodate this behavior the MISER algorithm allows the total variance to depend on a scaling parameter $$\alpha$$,


 * $$\mathrm{Var}(f) = {\sigma_a \over N_a^\alpha} + {\sigma_b \over N_b^\alpha}$$

The authors of the original paper describing MISER recommend the value $$\alpha = 2$$ as a good choice, obtained from numerical experiments, and this is used as the default value in the GNU Scientific Library's implementation.
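Minimizing this expression at a fixed total $$N = N_a + N_b$$ (so $$N_b = N - N_a$$) shows how $$\alpha$$ enters the allocation:
 * $$\frac{\partial}{\partial N_a}\left[\frac{\sigma_a}{N_a^\alpha} + \frac{\sigma_b}{(N - N_a)^\alpha}\right] = -\frac{\alpha\,\sigma_a}{N_a^{\alpha+1}} + \frac{\alpha\,\sigma_b}{N_b^{\alpha+1}} = 0$$
 * $$\Rightarrow\quad \frac{N_a}{N_a + N_b} = \frac{\sigma_a^{1/(1+\alpha)}}{\sigma_a^{1/(1+\alpha)} + \sigma_b^{1/(1+\alpha)}}$$

A larger $$\alpha$$ makes the exponent $$1/(1+\alpha)$$ smaller, so the points are split more evenly between the two sub-regions.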

dither
This parameter introduces a random fractional variation of size dither into each bisection, which can be used to break the symmetry of integrands which are concentrated near the exact center of the hypercubic integration region. In the GNU Scientific Library's implementation, the default value of dither is zero, so no variation is introduced. If needed, a typical value of dither is around 0.1.

VEGAS Monte Carlo
The VEGAS algorithm of G.P.Lepage is based on importance sampling. It samples points from the probability distribution described by the function $$|f|$$, so that the points are concentrated in the regions that make the largest contribution to the integral.

In general, if the Monte Carlo integral of $$f$$ is sampled with points distributed according to a probability distribution described by the function $$g$$, we obtain an estimate $$E_g(f; N)$$,

$$E_g(f; N) = E(f/g; N)$$

with a corresponding variance,

$$\mathrm{Var}_g(f; N) = \mathrm{Var}(f/g; N)$$

If the probability distribution is chosen as $$g = |f|/I(|f|)$$, then $$f/g = I(|f|)\,\mathrm{sgn}(f)$$. When $$f$$ does not change sign this ratio is constant, so the variance $$\mathrm{Var}_g(f; N)$$ vanishes and the error in the estimate is zero; for an integrand that changes sign, this choice minimizes the variance without eliminating it. In practice it is not possible to sample from the exact distribution $$g$$ for an arbitrary function, so importance sampling algorithms aim to produce efficient approximations to the desired distribution.
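These relations, including the zero-variance choice of g, can be checked numerically with a small one-dimensional Python sketch. It assumes f(x) = 3x² on [0, 1] (so I = 1), samples each density g by inverting its cumulative distribution, and compares the sample variance of f/g for uniform sampling, a rough approximation g(x) = 2x, and the exact choice g = |f|/I(|f|) = 3x². The helper importance_estimate is a hypothetical name.

```python
import math
import random
import statistics

def importance_estimate(f, g_pdf, g_sample, n, seed=0):
    """E_g(f; N) as the mean of f(x)/g(x) for x drawn from g; also returns Var(f/g)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n):
        x = g_sample(rng)
        ratios.append(f(x) / g_pdf(x))
    return statistics.fmean(ratios), statistics.variance(ratios)

f = lambda x: 3.0 * x * x  # its integral over [0, 1] is exactly 1

# u = 1 - random() lies in (0, 1], which avoids x = 0 where g_pdf could vanish.
uniform = importance_estimate(f, lambda x: 1.0, lambda r: 1.0 - r.random(), 10_000)
# A rough approximation to |f|: g(x) = 2x, sampled by inversion x = sqrt(u).
linear = importance_estimate(f, lambda x: 2.0 * x, lambda r: math.sqrt(1.0 - r.random()), 10_000)
# The exact choice g = |f| / I(|f|) = 3x^2, sampled by inversion x = u^(1/3).
ideal = importance_estimate(f, f, lambda r: (1.0 - r.random()) ** (1.0 / 3.0), 10_000)
for name, (est, var) in [("uniform", uniform), ("linear", linear), ("ideal", ideal)]:
    print(f"{name:8s} estimate {est:.4f}  Var(f/g) {var:.4f}")
```

The closer g follows |f|, the smaller the variance of f/g; with the exact choice every sample of f/g equals the integral itself.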

The VEGAS algorithm approximates the exact distribution by making a number of passes over the integration region while histogramming the function $$f$$. Each histogram is used to define a sampling distribution for the next pass. Asymptotically this procedure converges to the desired distribution. In order to avoid the number of histogram bins growing like $$K^d$$ (for $$K$$ bins per axis in $$d$$ dimensions), the probability distribution is approximated by a separable function, $$g(x_1, x_2, \ldots) = g_1(x_1) g_2(x_2) \ldots$$, so that the number of bins required is only $$Kd$$. This is equivalent to locating the peaks of the function from the projections of the integrand onto the coordinate axes. The efficiency of VEGAS depends on the validity of this assumption: it is most efficient when the peaks of the integrand are well localized. If an integrand can be rewritten in a form which is approximately separable, this will increase the efficiency of integration with VEGAS.

VEGAS incorporates a number of additional features, and combines both stratified sampling and importance sampling. The integration region is divided into a number of "boxes", with each box getting a fixed number of points (the goal is 2). Each box can then have a fractional number of bins, but if the ratio of bins per box is less than two, VEGAS switches to a kind of variance reduction (rather than importance sampling).

The GSL routine gsl_monte_vegas_integrate uses the VEGAS Monte Carlo algorithm to integrate the function $$f$$ over the dim-dimensional hypercubic region defined by the lower and upper limits in the arrays xl and xu, each of size dim. The integration uses a fixed number of function calls, and obtains random sampling points using the random number generator r. A previously allocated workspace s must be supplied. The result of the integration is returned in result, with an estimated absolute error abserr. The result and its error estimate are based on a weighted average of independent samples. The chi-squared per degree of freedom for the weighted average is returned via the state struct component s->chisq, and must be consistent with 1 for the weighted average to be reliable.

The VEGAS algorithm computes a number of independent estimates of the integral internally, according to the iterations parameter described below, and returns their weighted average. Random sampling of the integrand can occasionally produce an estimate where the error is zero, particularly if the function is constant in some regions. An estimate with zero error causes the weighted average to break down and must be handled separately. In the original Fortran implementations of VEGAS, the error estimate is made non-zero by substituting a small value (typically 1e-30). The implementation in GSL differs from this and avoids the use of an arbitrary constant: it either assigns the value a weight which is the average weight of the preceding estimates, or discards it according to the following procedure:


 * Current estimate has zero error, weighted average has finite error: the current estimate is assigned a weight which is the average weight of the preceding estimates.
 * Current estimate has finite error, previous estimates had zero error: the previous estimates are discarded and the weighted averaging procedure begins with the current estimate.
 * Current estimate has zero error, previous estimates had zero error: the estimates are averaged using the arithmetic mean, but no error is computed.
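The weighted averaging over iterations, including the three zero-error rules listed above, can be sketched in Python as follows. It assumes each iteration yields a pair (value, error) and uses standard inverse-variance weights 1/error² together with the usual chi-squared per degree of freedom; how GSL accumulates these quantities internally may differ, and combine is a hypothetical name.

```python
import math

def combine(estimates):
    """Weighted average of (value, error) pairs with the zero-error rules.

    Returns (average, error or None, chi-squared per degree of freedom or None).
    """
    kept = []       # (value, weight) pairs entering the weighted average
    zero_only = []  # values seen while every estimate so far had zero error
    for value, err in estimates:
        if err > 0:
            # Finite error: if all previous estimates had zero error they are
            # discarded and the weighted average starts with this estimate.
            zero_only.clear()
            kept.append((value, 1.0 / err ** 2))
        elif kept:
            # Zero error, finite weighted average: use the average weight so far.
            kept.append((value, sum(w for _, w in kept) / len(kept)))
        else:
            # Zero error and no finite-error estimate yet.
            zero_only.append(value)
    if not kept:
        # Only zero-error estimates: arithmetic mean, no error computed.
        return sum(zero_only) / len(zero_only), None, None
    wsum = sum(w for _, w in kept)
    avg = sum(v * w for v, w in kept) / wsum
    chisq = None
    if len(kept) > 1:
        chisq = sum(w * (v - avg) ** 2 for v, w in kept) / (len(kept) - 1)
    return avg, 1.0 / math.sqrt(wsum), chisq

print(combine([(0.52, 0.02), (0.49, 0.02), (0.50, 0.01)]))
```

A chi-squared per degree of freedom far from 1 signals that the iterations disagree more than their errors allow, as discussed under chisq below.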

Configurable Parameters
The VEGAS algorithm has several configurable parameters.

chisq
This parameter gives the chi-squared per degree of freedom for the weighted estimate of the integral. The value of chisq should be close to 1. A value of chisq which differs significantly from 1 indicates that the values from different iterations are inconsistent. In this case the weighted error will be under-estimated, and further iterations of the algorithm are needed to obtain reliable results.

alpha
The parameter alpha controls the stiffness of the rebinning algorithm. It is typically set between one and two. A value of zero prevents rebinning of the grid. In the GNU Scientific Library's implementation, the default value is 1.5.

iterations
The number of iterations to perform for each call to the routine. In the GNU Scientific Library's implementation, the default value is 5 iterations.

stage
Setting this determines the stage of the calculation. Normally, stage = 0 which begins with a new uniform grid and empty weighted average. Calling vegas with stage = 1 retains the grid from the previous run but discards the weighted average, so that one can "tune" the grid using a relatively small number of points and then do a large run with stage = 1 on the optimized grid. Setting stage = 2 keeps the grid and the weighted average from the previous run, but may increase (or decrease) the number of histogram bins in the grid depending on the number of calls available. Choosing stage = 3 enters at the main loop, so that nothing is changed, and is equivalent to performing additional iterations in a previous call.

mode
The possible choices are GSL_VEGAS_MODE_IMPORTANCE, GSL_VEGAS_MODE_STRATIFIED, GSL_VEGAS_MODE_IMPORTANCE_ONLY. This determines whether VEGAS will use importance sampling or stratified sampling, or whether it can pick on its own. In low dimensions VEGAS uses strict stratified sampling (more precisely, stratified sampling is chosen if there are fewer than 2 bins per box).

References and further reading
A good reference to start with on Monte Carlo and quasi-Monte Carlo methods in general, including a description of variance reduction techniques:
 * R. E. Caflisch, Monte Carlo and quasi-Monte Carlo methods, Acta Numerica 7, 1-49 (1998), Cambridge University Press.

A survey on arXiv, based on lectures for graduate students in high energy physics:
 * S. Weinzierl, Introduction to Monte Carlo methods,

The MISER algorithm is described in:
 * W. H. Press, G. R. Farrar, Recursive Stratified Sampling for Multidimensional Monte Carlo Integration, Computers in Physics 4, 190-195 (1990).

The VEGAS algorithm is described in:
 * G. P. Lepage, A New Algorithm for Adaptive Multidimensional Integration, Journal of Computational Physics 27, 192-203 (1978).
 * G. P. Lepage, VEGAS: An Adaptive Multi-dimensional Integration Program, Cornell preprint CLNS 80-447, March 1980.

Early works:
 * J. M. Hammersley, D. C. Handscomb, Monte Carlo Methods, Methuen (1964). ISBN 0-416-52340-4.

A general discussion, including both MISER and VEGAS and comparing them, is at

Based on the GNU Scientific Library's manual, which is published under the GFDL (and hence free to use for Wikipedia).