Talk:Importance sampling

On January 21st, 2006, this article was expanded by Izar (199.111.224.202; I was logged out during editing). I am a fourth-year undergraduate majoring in electrical engineering, with a concentration in communications and signal processing. As an undergraduate researcher, I am interested in information theory and coding theory.

Before I expanded this article, it was a mathematics stub, so I added some sections. However, it can still be expanded in many respects. For example, there are many other biasing methods, such as 'exponential twisting', which could be explained briefly or in some detail. Applications of the importance sampling technique could also be discussed in a separate section. Izar 05:06, 21 January 2006 (UTC)

Introduction
I don't see why importance sampling is a variance reduction technique in MC estimation. It is a technique for estimating E[X|A] when all you have is E[X|B]. If I remember correctly, it is the 'weighted importance sampling' techniques that have reduced variance compared to the standard 'importance sampling' technique, at the expense of becoming biased. --Olethros 16:15, 9 February 2006 (UTC)

Mathematical approach
Why is this talking about the binomial distribution and event (X>t)? Just talking about the expectation of a general random variable would have been much simpler and much more general. The X>t treatment is confusing and overlong.--Olethros 16:15, 9 February 2006 (UTC)


 * I totally agree! The edits by 199.111.224.202 made the article very difficult to understand. Someone should really perform some cleanup here. --Fredrik Orderud 17:51, 9 February 2006 (UTC)

What's $$q$$?
This has been unfixed for more than a year, so I have simply commented out the offending paragraph which mentions q. Better that it not be there than that it be gibberish.

--Rpgoldman (talk) 19:56, 2 April 2010 (UTC)

Rewrite
In my opinion the article here is fundamentally flawed. If X is the outcome of some experiment, I don't have f(x) or F(x), so I can't possibly calculate W(x). I only have f(x) for the input parameters of the simulation. So a distinction has to be made between the distribution of the input parameters and the distribution of the simulation result. Furthermore, as was already pointed out, the restriction to E[X>t] is unnecessary; that this event follows some binomial distribution is true, but completely off topic. What I would propose for a rewrite is along the following lines:

Let $$S(Z)$$ be some simulation depending on input parameters $$Z$$, where $$Z$$ is itself a random variable with some known (and preferably analytic) distribution $$f_Z(z)$$. The problem is to estimate the expected value of a function of the solution, e.g. $$E[\phi(S)]$$, where $$\phi$$ could be something like $$\phi(S)=[\alpha\leq S]$$ (which would correspond to the current version of the article, i.e. $$E[S>t]$$). Now we have


 * $$E[\phi(S)]=\int \phi(s) dF_S(s)=\int \phi(s) f_S(s) ds$$


 * $$=\int \phi(S(z)) f_Z(z) dz=E[\phi(S(Z))]$$


 * $$\approx \frac{1}{N}\sum_{i=1}^N \phi(S(z_i)) $$

where the $$z_i$$ are i.i.d. according to $$f_Z$$.

Now we rewrite this to


 * $$E[\phi(S)]=\int \phi(S(z^*)) \frac{f_Z(z^*)}{f_{Z^*}(z^*)} f_{Z^*}(z^*)dz^*=E[\phi(S(Z^*))W(Z^*)]$$


 * $$\approx \frac{1}{N} \sum_{i=1}^N \phi(S(z^*_i)) W(z^*_i)$$

with $$W(z^*)=\frac{f_Z(z^*)}{f_{Z^*}(z^*)}$$ and the $$z_i^*$$ i.i.d. according to $$f_{Z^*}$$.

Example: Let $$S(Z)=Z^2$$ be the simulation outcome (ok, a bit simple for a simulation, but it's just an example, right?), the distribution of input values uniform on $$[0,1]$$ (i.e. $$Z \sim U[0,1]$$), and the function of interest $$\phi_\alpha(S)=\chi_{[\alpha,\infty)}(S)$$. Then the usual estimator for $$E[\phi_\alpha(S)]$$ would be


 * $$\hat E[\phi_\alpha(S)]= \frac{1}{N}\sum_{i=1}^N [z_i^2\geq \alpha] $$

where the $$z_i$$ are $$U[0,1]$$ distributed and $$[b]$$ is 1 if b is true and 0 otherwise (see Knuth: Two Notes on Notation). Taking for $$Z^*$$ a normal distribution $$N(\mu,\sigma^2)$$ (with $$\mu=(1+\sqrt{\alpha})/2$$ and $$\sigma=0.3\sqrt{1-\sqrt{\alpha}}$$ giving quite good results) we get


 * $$\hat E[\phi_\alpha(S)]= \frac{1}{N}\sum_{i=1}^N [n_i^2\geq \alpha] \frac{[0\leq n_i \leq 1]}{\frac1{\sqrt{2\pi}\sigma}e^{-(n_i-\mu)^2/2\sigma^2}} $$

where the $$n_i$$ are $$N(\mu,\sigma^2)$$ distributed.
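
A minimal Python sketch of the example above, offered as an independent illustration rather than the poster's Matlab code. It uses only the standard library; the function names `plain_estimate` and `importance_estimate`, the value $$\alpha=0.99$$, the seed, and the sample count are all assumptions chosen for illustration. The exact value $$1-\sqrt{\alpha}$$ follows from $$Z \sim U[0,1]$$.

```python
import math
import random


def plain_estimate(alpha, n, rng):
    """Plain Monte Carlo: average of [z_i^2 >= alpha] with z_i ~ U[0,1]."""
    hits = sum(1 for _ in range(n) if rng.random() ** 2 >= alpha)
    return hits / n


def importance_estimate(alpha, n, rng):
    """Importance sampling with the N(mu, sigma^2) proposal from the example."""
    mu = (1.0 + math.sqrt(alpha)) / 2.0
    sigma = 0.3 * math.sqrt(1.0 - math.sqrt(alpha))
    norm = math.sqrt(2.0 * math.pi) * sigma
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        # [x^2 >= alpha] * f_Z(x), where f_Z is the U[0,1] density [0 <= x <= 1]
        if 0.0 <= x <= 1.0 and x * x >= alpha:
            proposal_pdf = math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / norm
            total += 1.0 / proposal_pdf  # weight W(x) = f_Z(x) / f_{Z*}(x)
    return total / n


alpha = 0.99                       # illustrative threshold (assumption)
exact = 1.0 - math.sqrt(alpha)     # P(Z^2 >= alpha) for Z ~ U[0,1]
rng = random.Random(0)             # fixed seed so the run is repeatable
plain = plain_estimate(alpha, 100_000, rng)
weighted = importance_estimate(alpha, 100_000, rng)
print("exact:", exact)
print("plain Monte Carlo:", plain)
print("importance sampling:", weighted)
```

Both estimators target the same quantity; for thresholds close to 1 the weighted one should scatter less around the exact value, because far more of its samples land in $$[\sqrt{\alpha},1]$$.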

Some Matlab code illustrating the example (should run with Octave too)

Prints:

134.169.77.186 14:46, 8 March 2007 (UTC) (ezander)

OK, I have re-written it, basically adding a short introduction plus the basic theory on top. The material I added was verified from the IS lecture notes which I linked to, and the 'Monte Carlo Methods In Practice Book'.

The rest of the stuff that was there before can be thought of as an application, so I left it in. In particular it was an application to simulation. I guess the binomial thing made sense in the context of an 'event simulator', but it does not really help anywhere else. Someone more knowledgeable with the application of MC and IS methods to simulation should try and fix those sections, or perhaps they could be moved to another article.

I currently don't have time to update the probabilistic inference section, but all that'd be required would be some links to other sections. I might do that eventually if nobody else does it. --Olethros 17:37, 21 May 2007 (UTC)

Why unweighted average for importance sampling estimator?
In the "Mathematical approach" section, why is a straight arithmetic average used for $$\hat p_t$$? This is supposed to be an estimate of an expected value, correct? Normally, wouldn't this be, for instance:
 * $$\hat{\operatorname{E}} [x] = \sum_{x} x \operatorname{p}(x) $$

So I would expect the estimator to be:
 * $$ \hat p_t = \sum_{i=1}^K 1(X_i \ge t) W(X_i) f_*(X_i),\,\quad \quad X_i \sim f_*$$

 B. Mearns * , KSC 19:10, 5 December 2012 (UTC)

So this makes me think that $$f_*$$ is a uniform distribution, but that's not the case, is it?
 * Ok, so this is just Monte Carlo estimation: the expected value of $$1(X_i \ge t) W(X_i)$$ is equal to $$p_t$$, so we estimate that expected value simply by averaging many values of the random variable (the random variable being $$1(X_i \ge t) W(X_i)$$). Since we're drawing the values $$X_i$$ according to $$f_*$$, values that are more likely under $$f_*$$ show up more often, and so they are naturally weighted more heavily in the average.


 * In other words, the expected value is really defined as:
 * $$\operatorname{E} [x] = \sum_{x \in X} x \operatorname{p}(x)$$
 * But instead of using the probability of each item in X, we're using the sampled frequency of each item in X which, for a sufficient number of samples, closely approximates the probability:
 * $$\hat{\operatorname{E}} [x] = \frac{1}{N} \sum_{i=1}^N x_i = \frac{1}{N} \sum_{x \in X} x \operatorname{C}(x) = \sum_{x \in X} x \frac{\operatorname{C}(x)}{N}$$
 * where $$\operatorname{C}(x)$$ is the number of times the specific value $$x$$ shows up in the sampled values, which we expect to be proportional to the probability $$\operatorname{p}(x)$$ for a large number of samples N.
 *  B. Mearns * , KSC 19:10, 5 December 2012 (UTC)
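
The counting argument above can be checked numerically. This is a small Python sketch under assumed inputs: the four values, their probabilities, the seed, and the sample size are made up for illustration. It shows that the plain average of the samples and the frequency-weighted sum $$\sum_x x\,\operatorname{C}(x)/N$$ are the same number, and that both approach $$\sum_x x \operatorname{p}(x)$$.

```python
import random
from collections import Counter

rng = random.Random(1)            # fixed seed so the run is repeatable
values = [1, 2, 3, 4]             # hypothetical discrete outcomes
probs = [0.1, 0.2, 0.3, 0.4]      # hypothetical probabilities p(x)
n = 200_000

# Draw samples according to p; likelier values simply appear more often.
samples = rng.choices(values, weights=probs, k=n)

# Unweighted arithmetic average of the samples.
sample_mean = sum(samples) / n

# The same sum regrouped by value: sum over x of x * C(x) / N.
counts = Counter(samples)
frequency_mean = sum(x * counts[x] / n for x in values)

# The definition of the expectation: sum over x of x * p(x).
exact_mean = sum(x * p for x, p in zip(values, probs))

print("sample mean:", sample_mean)
print("frequency-weighted mean:", frequency_mean)
print("exact mean:", exact_mean)
```

Multiplying each sample by its probability again, as in the estimator proposed in the question, would count the probability twice, since it already enters through how often each value is drawn.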

Basic theory section
I think the symbols need to be better defined. For example, at the introduction of L, L(omega) was not at all defined. What is omega? I found the article difficult to read beyond this point. I think this section generally needs better clarification of the notation, and better explanation of the concepts. Srfahmy1 (talk) 18:27, 23 January 2015 (UTC)

In the first section (Basic Theory)
In:


 * If we have random samples $$x_1, \ldots, x_n$$, generated according to P, then an empirical estimate of E[X;P] is



$$\widehat{\mathbf{E}}_{n}[X;P] = \frac{1}{n} \sum_{i=1}^n x_i. $$

I think this is wrong. If $$x_1, \ldots, x_n$$ are generated according to P, then I think the correct expression would be: $$ \widehat{\mathbf{E}}_{n}[X;P] = \frac{1}{n} \sum_{i=1}^n X(x_i). $$

189.103.24.199 (talk) 15:38, 21 November 2015 (UTC)