User:Orimosenzon/notes

Expectation under the joint distribution equals expectation under the marginal distribution
$$ E_{p(x_1,x_2)}(X_1) = \sum_{x_1,x_2}{p(x_1,x_2)\, x_1} = \sum_{x_1} \sum_{x_2} p(x_1)p(x_2|x_1)x_1 = $$

$$ \sum_{x_1}p(x_1)x_1 \sum_{x_2}p(x_2|x_1) = \sum_{x_1}p(x_1)x_1 \cdot 1 = E_{p(x_1)}(X_1) $$

Hence:

$$E_{p(x_1,x_2)}(X_1) = E_{p(x_1)}(X_1)$$

Also:

$$E_{p(x_1,x_2)}(f(X_1)) = E_{p(x_1)}(f(X_1))$$
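A quick numeric check of this identity, using a small made-up joint distribution (the probabilities are arbitrary):

```python
# Hypothetical joint distribution p(x1, x2); the numbers are illustrative only.
joint = {
    (0, 0): 0.1, (0, 1): 0.3,
    (1, 0): 0.2, (1, 1): 0.4,
}

# Expectation of X1 under the joint distribution.
e_joint = sum(p * x1 for (x1, x2), p in joint.items())

# Marginal p(x1), then expectation of X1 under it.
marginal = {}
for (x1, x2), p in joint.items():
    marginal[x1] = marginal.get(x1, 0.0) + p
e_marginal = sum(p * x1 for x1, p in marginal.items())

assert abs(e_joint - e_marginal) < 1e-12  # both are 0.6 here
```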

linearity
$$ E(X_1+X_2) = \sum_{x_1,x_2}{p(x_1,x_2)(x_1+x_2)} = $$

$$ \sum_{x_1,x_2}{p(x_1,x_2)x_1}+\sum_{x_1,x_2}{p(x_1,x_2)x_2} = $$

$$ \sum_{x_1}{p(x_1)x_1}+\sum_{x_2}{p(x_2)x_2} = E(X_1)+E(X_2) $$

hence:

$$E(X_1+X_2)=E(X_1)+E(X_2)$$

$$E(\lambda X) = \sum_x p(x)\lambda x = \lambda \sum_x p(x) x = \lambda E(X)$$

hence:

$$E(\lambda X) = \lambda E(X)$$
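Both linearity results can be checked by brute-force enumeration over two small made-up distributions (`p1`, `p2` and the choice λ = 3 are arbitrary):

```python
import itertools

# Two hypothetical independent variables with explicit finite distributions.
p1 = {1: 0.5, 2: 0.5}
p2 = {0: 0.25, 4: 0.75}

def expectation(dist):
    return sum(p * x for x, p in dist.items())

# E(X1 + X2) computed over the product (joint) distribution.
e_sum = sum(q1 * q2 * (x1 + x2)
            for (x1, q1), (x2, q2) in itertools.product(p1.items(), p2.items()))
assert abs(e_sum - (expectation(p1) + expectation(p2))) < 1e-12  # 1.5 + 3.0

# E(lambda * X) = lambda * E(X), with lambda = 3.
lam = 3
e_scaled = sum(p * lam * x for x, p in p1.items())
assert abs(e_scaled - lam * expectation(p1)) < 1e-12
```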

definitions
$$ V(X) =def= E([X-E(X)]^2) $$

$$ \sigma(X) =def= \sqrt{V(X)} $$

The meaning of standard deviation
One way to look at standard deviation is as an approximation of the "expected drift" from the expectation. The "expected drift" could be defined as:

$$ ED(X) =def?= E(|X-E(X)|) $$

This quantity is not as easy to manipulate algebraically: the absolute value does not expand into moments the way the square in the variance does.

Suppose that X can have only the two values $$k$$ and $$-k$$ and that $$E(X)=0$$. Then:

$$ V(X) =def= E([X-E(X)]^2) = E(X^2) = k^2 $$

and

$$ \sigma(X) = \sqrt{V(X)} = k $$

and

$$ ED(X) = E(|X-E(X)|) = E(|X|) = E(k) = k = \sigma(X) $$

$$ V $$, $$ \sigma $$ and $$ ED $$ don't change when a constant is added, so any random variable $$ X $$ whose drifts all have the same absolute value $$ k $$ satisfies $$ \sigma(X) = ED(X) $$.

Whenever the drift values are not all the same, $$ \sigma $$ averages with larger weight on the larger values, while $$ ED $$ keeps a plain unweighted average. *todo*: show why

Example: suppose you perform the following experiment: you flip a coin; if it comes up heads you go 5 meters to the left, if tails, 5 meters to the right. The variance in this case is 25 and the standard deviation is 5. The expected drift is also 5 (all the drift values are equal). This example is revisited below.
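A sketch that computes $$ \sigma $$ and $$ ED $$ for finite distributions. The first case is the ±5 coin example, where all drifts are equal and the two measures coincide; the second is a made-up distribution with unequal drifts, where σ exceeds ED:

```python
import math

def moments(dist):
    """Return (sigma, expected_drift) for a finite distribution {value: prob}."""
    mean = sum(p * x for x, p in dist.items())
    var = sum(p * (x - mean) ** 2 for x, p in dist.items())
    ed = sum(p * abs(x - mean) for x, p in dist.items())
    return math.sqrt(var), ed

# The coin example: +-5 meters with equal probability -> sigma == ED == 5.
sigma, ed = moments({-5: 0.5, 5: 0.5})
assert sigma == 5.0 and ed == 5.0

# Unequal drifts (hypothetical distribution, mean 0): drifts are 0, 2, 2.
# sigma weights the larger deviations more heavily, so sigma > ED.
sigma2, ed2 = moments({0: 0.5, -2: 0.25, 2: 0.25})
assert sigma2 > ed2  # sqrt(2) vs 1.0
```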

Alternative definition of variance
$$ V(X) =def= E((X-E(X))^2) = E(X^2+E^2(X)-2X E(X)) = $$

$$ E(X^2)+E^2(X) - 2E^2(X) = E(X^2)-E^2(X) $$

hence:
 * $$ V(X)=E(X^2)-E^2(X) $$

variance (and sd) doesn't change by adding a constant
$$ V(X+c) = E([X+c-E(X+c)]^2) = E([X+c-E(X)-E(c)]^2) = E([X-E(X)]^2) = V(X) $$

variance of multiplication
$$ V(\lambda X) = E((\lambda X)^2) - E^2(\lambda X) = \lambda^2E(X^2)-\lambda^2E^2(X) = \lambda^2(E(X^2)-E^2(X)) = \lambda^2V(X)$$

hence:
 * $$V(\lambda X) = \lambda^2V(X) $$

SD of multiplication
$$ \sigma(\lambda X) = \sqrt{V(\lambda X)} = \sqrt{\lambda^2V(X)} = |\lambda| \sqrt{V(X)} = |\lambda| \sigma(X)$$

hence:
 * $$\sigma(\lambda X) = |\lambda| \sigma(X)$$

Variance of sum of random variables
$$ V(X_1+X_2) = E((X_1+X_2)^2)-E^2(X_1+X_2) = $$

$$ E(X_1^2)+E(X_2^2)+2E(X_1 X_2) - (E^2(X_1) + E^2(X_2) + 2E(X_1)E(X_2)) = $$

$$ V(X_1)+V(X_2)+2Cov(X_1,X_2) $$

hence:

$$V(X_1+X_2) = V(X_1)+V(X_2)+2Cov(X_1,X_2)$$
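This identity can be verified numerically over a small made-up correlated joint distribution (the probabilities below are arbitrary):

```python
# Hypothetical correlated joint distribution over (x1, x2).
joint = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}

def e(f):
    """Expectation of f(X1, X2) under the joint distribution."""
    return sum(p * f(x1, x2) for (x1, x2), p in joint.items())

v1 = e(lambda a, b: a * a) - e(lambda a, b: a) ** 2
v2 = e(lambda a, b: b * b) - e(lambda a, b: b) ** 2
cov = e(lambda a, b: a * b) - e(lambda a, b: a) * e(lambda a, b: b)
v_sum = e(lambda a, b: (a + b) ** 2) - e(lambda a, b: a + b) ** 2

assert abs(v_sum - (v1 + v2 + 2 * cov)) < 1e-12  # 0.8 on both sides here
```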

When $$X_1$$ and $$X_2$$ are independent, $$Cov(X_1,X_2)=0$$ and hence:

$$ X_1, X_2 $$ Independent $$ \Rightarrow V(X_1+X_2) = V(X_1)+V(X_2)$$

When $$X_1$$ and $$X_2$$ are i.i.d (independent and identically distributed) then:

$$ X_1, X_2 $$ i.i.d $$ \Rightarrow V(X_1+X_2) = V(X_1)+V(X_2) = 2V(X_1) $$

Or more generally:

$$ X_1, X_2, \ldots, X_n $$ i.i.d $$ \Rightarrow V(\sum_{i=1}^n{X_i}) = \sum_{i=1}^n V(X_i) = n V(X_1)$$

hence:

$$ X_1, X_2, \ldots, X_n $$ i.i.d $$ \Rightarrow \sigma (\sum_{i=1}^n{X_i}) = \sqrt{V(\sum_{i=1}^n{X_i})} = \sqrt{ n V(X_1) } = \sqrt{n} \cdot \sigma (X_1) $$

Note the difference from summing the variable with itself (identically distributed but not independent): $$ V(X_1 + X_1) = V(2 X_1) = 4 V(X_1) $$

and

$$ \sigma(X_1 + X_1) = \sigma (2 X_1) = 2 \sigma(X_1) $$

more on the last result
We've shown that:

$$ X_1, X_2, \ldots, X_n $$ i.i.d $$ \Rightarrow \sigma (\sum_{i=1}^n{X_i}) = \sqrt{n} \cdot \sigma (X_1) $$

Why is this important?

$$ \sigma $$ is a measure of the expected drift. The last result shows that the expected drift grows as the square root of the number of experiments (less than linearly), which means that the mean drift per experiment tends to zero:

$$ \lim_{n\to\infty}\frac{\sigma(\sum_{i=1}^n X_i)}{n} = \lim_{n\to\infty}\frac{\sqrt{n} \cdot \sigma (X_1)}{n} = 0 $$

Recall the example of the random walk of +-5 meters. Now suppose you repeat the process $$ n $$ times. What is the expected drift?

The standard deviation, which can be considered a measure of that drift, is: $$ \sqrt{n} \cdot 5 $$

The mean drift is: $$ 5 \frac{\sqrt{n}}{n} $$

For example, for 10000 iterations, the mean drift is: $$ 5 \frac{\sqrt{10000}}{10000} = 0.05 $$ meters. Instead of 5 meters per step it is 5 centimeters. The total drift is only 500 instead of 50,000.
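A Monte Carlo sketch of the ±5 walk over 10000 steps; the trial count is an arbitrary choice, and the empirical standard deviation of the final position should land near $$ \sqrt{10000} \cdot 5 = 500 $$:

```python
import math
import random

random.seed(0)
n_steps, n_trials = 10_000, 400  # trial count is an arbitrary choice

# Final position of each simulated walk of n_steps coin flips of +-5.
finals = [sum(random.choice((-5, 5)) for _ in range(n_steps))
          for _ in range(n_trials)]

mean = sum(finals) / n_trials
emp_sigma = math.sqrt(sum((f - mean) ** 2 for f in finals) / n_trials)

print(emp_sigma)            # should be close to sqrt(10000) * 5 = 500
print(emp_sigma / n_steps)  # mean drift per step, close to 0.05
```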


 * *todo*: example of random walk +-5 gnuplot picture; the relation to the law of large numbers; is the fact that the frequency ratio converges an assumption of probability theory or a result?

misc
$$ V(X) = 0 \Leftrightarrow E([X-E(X)]^2) = 0 \Leftrightarrow x-E(X) = 0 $$ for every $$ x $$ with $$ p(x) > 0 \Leftrightarrow X $$ is constant.

hence:

$$ V(X) = 0 \Leftrightarrow X $$ is constant.

Alternative definition of covariance
$$ Cov(X_1,X_2) = E( (X_1 - E(X_1)) (X_2-E(X_2)) ) = E(X_1 X_2)+E(X_1)E(X_2)-2E(X_1)E(X_2) = E(X_1 X_2) - E(X_1)E(X_2) $$

hence:

$$ Cov(X_1,X_2) = E(X_1 X_2) - E(X_1)E(X_2) $$

A special case is a covariance of two of the same random variable: $$ Cov(X,X) = E(X X) - E(X)E(X) = V(X)$$
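A quick check that the two definitions of covariance agree, and that $$ Cov(X,X) = V(X) $$, on a made-up joint distribution (the values are arbitrary):

```python
# Hypothetical joint distribution over (x1, x2); numbers are illustrative only.
joint = {(1, 2): 0.3, (2, 5): 0.5, (3, 1): 0.2}

e1 = sum(p * a for (a, b), p in joint.items())
e2 = sum(p * b for (a, b), p in joint.items())

# Definition: E((X1 - E(X1)) (X2 - E(X2))).
cov_def = sum(p * (a - e1) * (b - e2) for (a, b), p in joint.items())
# Alternative form: E(X1 X2) - E(X1) E(X2).
cov_alt = sum(p * a * b for (a, b), p in joint.items()) - e1 * e2
assert abs(cov_def - cov_alt) < 1e-12

# Special case Cov(X, X) = V(X).
var_def = sum(p * (a - e1) ** 2 for (a, b), p in joint.items())
cov_xx = sum(p * a * a for (a, b), p in joint.items()) - e1 * e1
assert abs(cov_xx - var_def) < 1e-12
```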

Covariance of independent variables
Assume that $$X_1$$ and $$X_2$$ are independent:

$$ E(X_1 X_2) = \sum_{x_1,x_2}{p(x_1,x_2)x_1 x_2} = \sum_{x_1,x_2}{p(x_1) p(x_2)x_1 x_2}$$ $$ \sum_{x_1} p(x_1) x_1 \sum_{x_2} p(x_2)x_2 = E(X_1) E(X_2) $$

And hence:

$$ X_1, X_2 $$ independent $$ \implies Cov(X_1,X_2) = 0 $$

The converse is not true, however. For example, let $$ X $$ be uniform on $$ \{-1,0,1\} $$ and let $$ Y = X^2 $$. Then $$ E(XY) = E(X^3) = 0 $$ and $$ E(X) = 0 $$, so

$$ Cov(X,Y) = E(XY) - E(X)E(Y) = 0 $$

But $$ Y $$ is a function of $$ X $$, so $$ X $$ and $$ Y $$ are very much dependent.
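A zero-covariance-but-dependent pair can be verified directly: take $$ X $$ uniform on $$ \{-1,0,1\} $$ and $$ Y = X^2 $$:

```python
# X uniform on {-1, 0, 1}, Y = X^2: Cov(X, Y) = 0 although Y is a function of X.
px = {-1: 1/3, 0: 1/3, 1: 1/3}

ex = sum(p * x for x, p in px.items())             # E(X)   = 0
ey = sum(p * x ** 2 for x, p in px.items())        # E(Y)   = 2/3
exy = sum(p * x * x ** 2 for x, p in px.items())   # E(XY)  = E(X^3) = 0

cov = exy - ex * ey
assert abs(cov) < 1e-12

# Yet X and Y are dependent: p(X=0, Y=0) != p(X=0) * p(Y=0).
p_joint_00 = 1/3       # Y = 0 exactly when X = 0
p_y0 = 1/3
assert p_joint_00 != px[0] * p_y0
```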

Wiener processes
(also known as "Brownian motion")

Let Z be a stochastic process with the following properties:

1. The change $$ \delta Z $$ in a small period of time $$ \delta t $$ is

$$ \delta Z = \epsilon \cdot \sqrt{\delta t} $$

where:

$$ \epsilon \sim \phi(0,1) $$

and $$ \phi(0,1) $$ denotes the normal distribution with mean 0 and variance 1.
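A minimal sketch of a discretized Wiener process under this property; the step size and horizon below are arbitrary choices:

```python
import math
import random

random.seed(1)
dt, horizon = 0.01, 1.0   # arbitrary discretization choices

# Each increment is eps * sqrt(dt) with eps drawn from N(0, 1).
z = 0.0
path = [z]
for _ in range(int(horizon / dt)):
    z += random.gauss(0.0, 1.0) * math.sqrt(dt)
    path.append(z)

# Over the whole horizon, Z(T) - Z(0) is normal with variance T,
# so its standard deviation here is sqrt(1.0) = 1.
print(path[-1])
```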

Expectation

 * $$E_{p(x_1,x_2)}(X_1) = E_{p(x_1)}(X_1)$$
 * $$E(X_1+X_2)=E(X_1)+E(X_2)$$
 * $$E(\lambda X) = \lambda E(X)$$

Variance and standard deviation

 * $$ V(X)=E(X^2)-E^2(X) $$
 * $$V(\lambda X) = \lambda^2V(X) $$
 * $$\sigma(\lambda X) = |\lambda| \sigma(X)$$
 * $$V(X_1+X_2) = V(X_1)+V(X_2)+2Cov(X_1,X_2)$$
 * $$ X_1, X_2 $$ Independent $$ \Rightarrow V(X_1+X_2) = V(X_1)+V(X_2)$$
 * $$ X_1, X_2, \ldots, X_n $$ i.i.d $$ \Rightarrow V(\sum_{i=1}^n{X_i}) = n V(X_1)$$
 * $$ X_1, X_2, \ldots, X_n $$ i.i.d $$ \Rightarrow \sigma (\sum_{i=1}^n{X_i}) = \sqrt{n} \cdot \sigma (X_1) $$

Covariance

 * $$ Cov(X_1,X_2) = E(X_1 X_2) - E(X_1)E(X_2) $$

Determinant is the area of the parallelogram
Let $$\mathbf{x_1}=(x_1,y_1)$$ and $$\mathbf{x_2}=(x_2,y_2)$$ be two vectors in $$ \mathbb{R}^2 $$. We will show that the determinant $$ x_1 \cdot y_2 - x_2 \cdot y_1$$ equals, up to sign, the area of the parallelogram they span.

short way
let $$ \mathbf{a} $$ be a vector orthogonal to $$\mathbf{x_1}$$ and of norm equal to 1:

$$ \mathbf{a} = \frac{(-y_1,x_1)}{\|\mathbf{x_1}\|} $$

(a word about left/right systems? why we didn't choose $$(y_1,-x_1)$$ ?)

Let $$S$$ be the area of the parallelogram:

$$ S = \langle \mathbf{a}, \mathbf{x_2} \rangle \cdot \|\mathbf{x_1}\| = \langle \|\mathbf{x_1}\| \mathbf{a}, \mathbf{x_2} \rangle = \langle (-y_1,x_1),(x_2,y_2) \rangle = -y_1x_2 + x_1y_2 $$

longer way


Let $$\mathbf{N}$$ be the vector that represents the height of the parallelogram:

$$ \mathbf{N} = \mathbf{x_2} - \frac{\langle \mathbf{x_1},\mathbf{x_2} \rangle }{ \|\mathbf{x_1}\|} \frac{\mathbf{x_1}}{\|\mathbf{x_1}\|} = \mathbf{x_2} - \frac{\langle \mathbf{x_1},\mathbf{x_2} \rangle }{ \|\mathbf{x_1}\|^2} \mathbf{x_1} $$

Let $$ S $$ be the area of the parallelogram:

$$ S = \|\mathbf{N}\| \cdot \|\mathbf{x_1}\| = \| \mathbf{x_2} - \frac{\langle \mathbf{x_1},\mathbf{x_2} \rangle }{ \|\mathbf{x_1}\|^2} \mathbf{x_1} \| \cdot \|\mathbf{x_1}\| = \frac{\left \| \|\mathbf{x_1}\|^2 \mathbf{x_2} - \langle \mathbf{x_1},\mathbf{x_2} \rangle  \mathbf{x_1} \right \|} { \|\mathbf{x_1}\|} = \frac{\left \| \langle \mathbf{x_1}, \mathbf{x_1} \rangle \mathbf{x_2} - \langle \mathbf{x_1},\mathbf{x_2} \rangle  \mathbf{x_1} \right \|} { \|\mathbf{x_1}\|} $$

$$ S^2 = \frac{ \| \langle \mathbf{x_1}, \mathbf{x_1} \rangle \mathbf{x_2} - \langle \mathbf{x_1},\mathbf{x_2} \rangle \mathbf{x_1} \|^2} { \|\mathbf{x_1}\|^2} = \frac{ \left \langle \langle \mathbf{x_1}, \mathbf{x_1} \rangle \mathbf{x_2} - \langle \mathbf{x_1},\mathbf{x_2} \rangle \mathbf{x_1}, \langle \mathbf{x_1}, \mathbf{x_1} \rangle \mathbf{x_2} - \langle \mathbf{x_1},\mathbf{x_2} \rangle  \mathbf{x_1} \right \rangle} { \langle \mathbf{x_1},\mathbf{x_1} \rangle } $$

$$ = \frac {\langle \mathbf{x_1}, \mathbf{x_1} \rangle ^2 \langle \mathbf{x_2},\mathbf{x_2} \rangle + \langle \mathbf{x_1},\mathbf{x_2} \rangle^2 \langle \mathbf{x_1}, \mathbf{x_1} \rangle - 2 \cdot \langle \mathbf{x_1}, \mathbf{x_1} \rangle \langle \mathbf{x_1}, \mathbf{x_2} \rangle ^2} { \langle \mathbf{x_1},\mathbf{x_1} \rangle } $$

$$ = \langle \mathbf{x_1}, \mathbf{x_1} \rangle \langle \mathbf{x_2},\mathbf{x_2} \rangle + \langle \mathbf{x_1},\mathbf{x_2} \rangle^2 - 2 \cdot \langle \mathbf{x_1}, \mathbf{x_2} \rangle ^2 = \langle \mathbf{x_1}, \mathbf{x_1} \rangle \langle \mathbf{x_2},\mathbf{x_2} \rangle - \langle \mathbf{x_1},\mathbf{x_2} \rangle ^ 2 $$ $$ = (x_1^2+y_1^2) (x_2^2+y_2^2) - (x_1x_2+y_1y_2) ^ 2 = x_1^2 x_2^2 + y_1^2 y_2^2 + x_1^2 y_2^2 + y_1^2 x_2^2 - \left( x_1^2 x_2^2 + y_1^2 y_2^2 + 2 \cdot x_1 x_2 y_1 y_2 \right) $$

$$ = x_1^2 y_2^2 + y_1^2 x_2^2 - 2 \cdot x_1 x_2 y_1 y_2 = (x_1 y_2 - x_2 y_1)^2 $$

$$ S = |x_1 y_2 - x_2 y_1| $$

(taking the positive root; the sign of the determinant records the orientation of the pair of vectors)
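A numeric check that $$ |x_1 y_2 - x_2 y_1| $$ equals base times height, for one made-up pair of vectors:

```python
import math

# Hypothetical example vectors.
x1, y1 = 3.0, 1.0
x2, y2 = 1.0, 2.0

det = x1 * y2 - x2 * y1  # 3*2 - 1*1 = 5

# Height: the component of (x2, y2) orthogonal to (x1, y1),
# i.e. the vector N from the derivation above.
norm1 = math.hypot(x1, y1)
proj = (x1 * x2 + y1 * y2) / norm1 ** 2
hx, hy = x2 - proj * x1, y2 - proj * y1

area = math.hypot(hx, hy) * norm1  # height * base
assert abs(abs(det) - area) < 1e-9
```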

(-1)*(-1) =?= 1
$$ 0 = 0 \cdot (-1) = (1 + (-1)) \cdot (-1) = $$

$$ = 1 \cdot (-1) + (-1) \cdot (-1) = $$

$$ = (-1) + (-1) \cdot (-1) $$

Hence:

$$ (-1) \cdot (-1) = 1 $$

Step 1: 0 times any number is 0.

Step 2: 0 is a sum of additive inverses.

Step 3: distributivity.

Step 4: 1 is the multiplicative identity.

Conclusion: $$ (-1) \cdot (-1) $$ is the additive inverse of $$ -1 $$, which is 1.