User:JeffAEdmonds/Sandbox

I am hoping that this page will give some extra understanding of Gradient Descent. As such, it is talking about the same topic. Hence the critique "The first is whether this is a duplicate of the topic at Gradient descent." I do prefer explanations to a list of facts. Given that, I would love to better understand the critique "The second is that it appeared to be written more like a textbook than an encyclopedia article. I could use this textbook approach but it's not necessarily our "house style"."

---

Gradient descent is based on the observation that a multi-variable function $$f(x_1,...,x_n)$$ decreases fastest from a point $$\mathbf{x}_0$$ by heading in the direction of the negative of the gradient $$\nabla f(\mathbf{x}_0) = \left( {\partial f}/{\partial x_1}, ..., {\partial f}/{\partial x_n}\right).$$ Though this is true, it is obvious neither what it means nor why it is true.
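To make the observation concrete, here is a minimal sketch of the iteration it leads to: repeatedly take a small step against the gradient. The bowl-shaped function, the starting point, the step size, and the names `f`, `grad_f`, and `gradient_descent` are all assumptions chosen for illustration, not taken from any particular library.

```python
def f(x, y):
    # Hypothetical bowl-shaped terrain whose lowest point is the origin.
    return x**2 + 3 * y**2

def grad_f(x, y):
    # Partial derivatives (df/dx, df/dy) of the bowl above.
    return (2 * x, 6 * y)

def gradient_descent(start, step=0.1, iterations=50):
    """Repeatedly step against the gradient, recording the height after each step."""
    x, y = start
    heights = [f(x, y)]
    for _ in range(iterations):
        gx, gy = grad_f(x, y)
        x, y = x - step * gx, y - step * gy  # move in the direction of -grad f
        heights.append(f(x, y))
    return (x, y), heights

final_point, heights = gradient_descent(start=(4.0, 7.0))
print(final_point, heights[-1])
```

With this step size the recorded heights shrink at every step and the point approaches the bottom of the bowl. A step that is too large can overshoot, which is why the step size is listed among the assumptions.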



Suppose you are standing on some hilly terrain. Denote your east/north coordinates as $$(x_0,y_0)$$ and your height as $$f_0 = f(x_0,y_0)$$. Traversing a path across the hill is not as steep as going straight down because, for the same horizontal distance traveled, you do not go down as far vertically. The direction of steepest descent is the direction in which water flows, and the scariest direction if you are skiing. On a contour map, your location lies on the contour curve of points whose height is exactly the same as your own, and the next contour curve is the one whose height is one unit less. If instead you are on a curved staircase in which each step has height one, then the contour lines are the edges of the stairs. The goal of steepest descent is to choose the direction that crosses as many of these contour curves as possible while traveling a fixed horizontal distance or, conversely, to reach the first of these contour curves while traveling a minimum horizontal distance. It is reasonable that this direction is the one perpendicular to the contour line.

Now suppose you are in a deep fog with a compass, a ruler, and an altimeter. You measure that moving one unit east increases your height by 4, i.e. $${\partial f}/{\partial x} = 4$$, and that moving one unit north increases your height by 7, i.e. $${\partial f}/{\partial y} = 7$$. From this, we can determine that the direction of steepest ascent is given by the vector $$\nabla f(\mathbf{x}_0) = \left( {\partial f}/{\partial x},{\partial f}/{\partial y}\right) = \left( 4,7 \right). $$ This means that your height increases fastest by moving from your current location $$(x_0,y_0)$$ to $$(x_0 + 4h,y_0+7h)$$ for some small value $$h > 0$$, and decreases fastest by moving to $$(x_0 - 4h,y_0-7h)$$. We will prove this in three ways.
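Before the proofs, the claim can be checked numerically. The following is a minimal sketch, assuming the terrain is exactly the plane with slopes 4 and 7; the function name `slope_in_direction` and the scan resolution of 3600 headings are illustrative choices. It scans unit directions and keeps the one with the greatest rise per unit of horizontal distance.

```python
import math

# Measured partial derivatives from the example: 4 per unit east, 7 per unit north.
GRAD = (4.0, 7.0)

def slope_in_direction(theta):
    """Rise per unit horizontal distance when heading at angle theta (radians from east)."""
    u, v = math.cos(theta), math.sin(theta)  # unit direction vector
    return GRAD[0] * u + GRAD[1] * v

# Scan 3600 evenly spaced headings and keep the steepest one.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best_theta = max(angles, key=slope_in_direction)
best_direction = (math.cos(best_theta), math.sin(best_theta))

# Unit vector along the gradient, for comparison.
norm = math.hypot(*GRAD)
grad_direction = (GRAD[0] / norm, GRAD[1] / norm)

print(best_direction, grad_direction)
```

Up to the resolution of the scan, the steepest heading found agrees with the normalized gradient, and the slope in that direction agrees with the length of the gradient vector.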

Proof 1: Perpendicular to the contour lines
We have argued that the direction of steepest ascent/descent is the one perpendicular to the contour curve. Let us calculate this. Knowing $${\partial f}/{\partial x} = 4$$ and $${\partial f}/{\partial y} = 7$$ gives us that the plane approximating the terrain is defined by the equation $$f(x,y) = 4(x-x_0) + 7(y-y_0) + f_0$$. The contour line that you are standing on has the equation $$4(x-x_0) + 7(y-y_0) + f_0 = f_0$$. Another point on this contour line is $$(x_b,y_b) = (x_0 - 7,y_0+4)$$. Hence, the direction of the line is the vector $$(x_b-x_0,y_b-y_0) = (-7,4)$$. Drawing a picture of triangles can convince you that the vector perpendicular to $$(-7,4)$$ is $$(4,7)$$; the dot product $$(-7)(4) + (4)(7) = 0$$ confirms it. This gives our result.

If $$f(x_1,...,x_n)$$ depends on your location within an $$n$$-dimensional space, then the contour "curve" will be an $$(n-1)$$-dimensional subspace. The direction of steepest ascent is the vector that is perpendicular to this. This can be formalized as follows. Consider $$f$$'s linear approximation $$f(x_1,...,x_n) = c + \sum_i {\partial f}/{\partial x_i} \times (x_i-\overline{x}_i) $$ around the point $$\left( \overline{x}_1, ..., \overline{x}_n \right)$$. The equation of the contour plane, shifted to go through the origin, is $$\sum_i {\partial f}/{\partial x_i} \times x_i = 0 $$. I claim that the vector $$ \left( {\partial f}/{\partial x_1}, ..., {\partial f}/{\partial x_n} \right) $$ is perpendicular to this plane, because the dot product between it and any vector $$ \left( x_1, ..., x_n \right) $$ in the contour plane is $$\sum_i {\partial f}/{\partial x_i} \times x_i $$, which by the definition of the contour plane is always zero.
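The argument can be sketched numerically in a few lines. This is only an illustration under stated assumptions: the five partial derivatives in `grad` are made-up values, and `contour_vector` is a hypothetical helper that builds a vector in the shifted contour plane by choosing all but one coordinate freely and solving the plane equation for the last.

```python
import random

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def contour_vector(grad, rng):
    """Build a vector satisfying sum_i grad_i * x_i = 0.

    The first n-1 coordinates are chosen freely; the last is solved for.
    Assumes grad[-1] is nonzero.
    """
    free = [rng.uniform(-1.0, 1.0) for _ in grad[:-1]]
    last = -dot(grad[:-1], free) / grad[-1]
    return free + [last]

rng = random.Random(0)
grad = [4.0, 7.0, -2.0, 1.5, 3.0]  # hypothetical partial derivatives, n = 5

# Dot product of the gradient with many random vectors from the contour plane.
dots = [dot(grad, contour_vector(grad, rng)) for _ in range(1000)]
largest = max(abs(d) for d in dots)
print(largest)
```

Every sampled vector in the contour plane has a dot product with the gradient that is zero up to floating-point rounding, which is what perpendicularity means here.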

Proof 2: Calculus
Let us again assume the terrain is the plane $$f(x,y) = c(x-x_0) + d(y-y_0) + f_0$$. Let us move in the direction $$(u,v)$$ from $$(x_0,y_0)$$ to $$(x_0+u,y_0+v)$$. This increases our height by $$\partial f = cu+dv$$, from $$f(x_0,y_0) = f_0$$ to $$f(x_0+u,y_0+v) = cu + dv + f_0$$. The distance traveled horizontally is $$\partial t = \sqrt{u^2+v^2}$$. Hence, the slope traversed is

$$\text{slope} = \frac{\partial f}{\partial t} = \frac{cu+dv}{\sqrt{u^2+v^2}} $$ Steepest ascent chooses the direction $$(u,v)$$ in order to maximize this slope. At the maximum, both partial derivatives of the slope are zero:

$$\frac{\partial\, \text{slope}}{\partial u} = \frac{c}{\sqrt{u^2+v^2}} - \frac{1}{2} \frac{cu+dv}{(u^2+v^2)^{3/2}} (2u) = 0 \quad \text{or} \quad \frac{u}{c} = \frac{u^2+v^2}{cu+dv} $$

$$\frac{\partial\, \text{slope}}{\partial v} = \frac{d}{\sqrt{u^2+v^2}} - \frac{1}{2} \frac{cu+dv}{(u^2+v^2)^{3/2}} (2v) = 0 \quad \text{or} \quad \frac{v}{d} = \frac{u^2+v^2}{cu+dv} \quad \text{or} \quad \frac{u}{c} = \frac{v}{d} \quad \text{or} \quad (u,v) \propto (c,d) $$ This again gives us the result.
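The stationarity conditions can be sanity-checked with finite differences. This is a minimal sketch, assuming the example values $$c=4$$ and $$d=7$$; the helper `partials` and the step `eps` are illustrative choices. It estimates both partial derivatives of the slope at a direction proportional to $$(c,d)$$ and, for contrast, at the due-east direction $$(1,0)$$.

```python
import math

def slope(u, v, c=4.0, d=7.0):
    """Rise over horizontal distance when moving by (u, v) on the plane."""
    return (c * u + d * v) / math.hypot(u, v)

def partials(u, v, eps=1e-6):
    """Central-difference estimates of d(slope)/du and d(slope)/dv."""
    ds_du = (slope(u + eps, v) - slope(u - eps, v)) / (2 * eps)
    ds_dv = (slope(u, v + eps) - slope(u, v - eps)) / (2 * eps)
    return ds_du, ds_dv

at_gradient = partials(4.0, 7.0)   # (u, v) = (c, d)
at_scaled = partials(2.0, 3.5)     # (u, v) = (c, d) / 2, same direction
at_east = partials(1.0, 0.0)       # a direction not proportional to (c, d)

print(at_gradient, at_scaled, at_east)
```

Both estimates vanish, up to numerical error, for any positive multiple of $$(c,d)$$, while heading due east leaves a clearly nonzero derivative with respect to $$v$$. The same conditions also hold at $$(-c,-d)$$, which is the direction of steepest descent.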

Proof 3: Lagrange
Suppose we wish to maximize $$f(x,y)=x+y$$ subject to the constraint $$x^2+y^2=1$$. The feasible set is the unit circle, and the level sets of $$f$$ are diagonal lines (with slope −1), so we can see graphically that the maximum occurs at $$\left(\tfrac{\sqrt{2}}{2},\tfrac{\sqrt{2}}{2}\right)$$, and that the minimum occurs at $$\left(-\tfrac{\sqrt{2}}{2},-\tfrac{\sqrt{2}}{2}\right)$$.

For the method of Lagrange multipliers, the constraint is


 * $$g(x,y)=x^2+y^2-1=0,$$

hence


 * $$\begin{align} \mathcal{L}(x, y, \lambda) &= f(x,y) + \lambda \cdot g(x,y) \\[4pt] &= x+y + \lambda (x^2 + y^2 - 1). \end{align}$$

Now we can calculate the gradient:


 * $$\begin{align} \nabla_{x,y,\lambda} \mathcal{L}(x, y, \lambda) &= \left( \frac{\partial \mathcal{L}}{\partial x}, \frac{\partial \mathcal{L}}{\partial y}, \frac{\partial \mathcal{L}}{\partial \lambda} \right ) \\[4pt] &= \left ( 1 + 2 \lambda x, 1 + 2 \lambda y, x^2 + y^2 -1 \right) \end{align}$$

and therefore:


 * $$\nabla_{x,y,\lambda} \mathcal{L}(x, y, \lambda)=0 \quad \Leftrightarrow \quad \begin{cases} 1 + 2 \lambda x = 0 \\ 1 + 2 \lambda y = 0 \\ x^2 + y^2 -1 = 0 \end{cases}$$

Notice that the last equation is the original constraint.

The first two equations yield


 * $$x= y = - \frac{1}{2\lambda}, \qquad \lambda \neq 0.$$

By substituting into the last equation we have:


 * $$\frac{1}{4\lambda^2}+\frac{1}{4\lambda^2} - 1=0, $$

so


 * $$\lambda = \pm \frac{1}{\sqrt{2}},$$

which implies that the stationary points of $$\mathcal{L}$$ are


 * $$\left(\tfrac{\sqrt{2}}{2},\tfrac{\sqrt{2}}{2}, -\tfrac{1}{\sqrt{2}}\right), \qquad \left(-\tfrac{\sqrt{2}}{2}, -\tfrac{\sqrt{2}}{2}, \tfrac{1}{\sqrt{2}}\right).$$

Evaluating the objective function $$f$$ at these points yields


 * $$f\left(\tfrac{\sqrt{2}}{2},\tfrac{\sqrt{2}}{2}\right)=\sqrt{2}, \qquad f\left(-\tfrac{\sqrt{2}}{2}, -\tfrac{\sqrt{2}}{2}\right)=-\sqrt{2}.$$

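Thus the constrained maximum is $$\sqrt{2}$$ and the constrained minimum is $$-\sqrt{2}$$. Here the constraint $$x^2+y^2=1$$ fixes the horizontal distance traveled at one unit, and the gradient of $$f(x,y)=x+y$$ is $$(1,1)$$. The maximizing step $$\left(\tfrac{\sqrt{2}}{2},\tfrac{\sqrt{2}}{2}\right)$$ is the unit vector in the direction of the gradient, and the minimizing step is the unit vector in the opposite direction.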
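The worked example can be cross-checked without any calculus. The sketch below scans the unit circle for the largest and smallest value of $$f(x,y)=x+y$$ and evaluates the gradient of the Lagrangian at the two stationary points found above. The function names and the resolution of 3600 sample points are assumptions made for illustration.

```python
import math

def f(x, y):
    """Objective function from the example."""
    return x + y

def lagrangian_gradient(x, y, lam):
    """Gradient of L(x, y, lam) = x + y + lam * (x^2 + y^2 - 1)."""
    return (1 + 2 * lam * x, 1 + 2 * lam * y, x**2 + y**2 - 1)

# Sample the feasible set (the unit circle) at 3600 evenly spaced angles.
n = 3600
circle = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]
best = max(circle, key=lambda p: f(*p))
worst = min(circle, key=lambda p: f(*p))

# Stationary points of the Lagrangian derived in the text.
r = math.sqrt(2) / 2
grad_at_max = lagrangian_gradient(r, r, -1 / math.sqrt(2))
grad_at_min = lagrangian_gradient(-r, -r, 1 / math.sqrt(2))

print(best, f(*best))
print(worst, f(*worst))
print(grad_at_max, grad_at_min)
```

The scan locates the extremes at the same two points as the Lagrange calculation, and all three components of the Lagrangian's gradient vanish there up to rounding.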