Wikipedia:Reference desk/Archives/Mathematics/2020 May 24

= May 24 =

Optimization: Necessary and Sufficient Conditions for Superlinear Convergence of Quasi-Newton Method
Hello,

I am reading *Numerical Optimization* by Nocedal & Wright, and I am having trouble understanding some aspects of the proof of Theorem $$3.7$$. I have written the theorem, its proof, and my questions in LaTeX. I'm sorry I couldn't figure out how to make it show up nicely on Wikipedia.

I have also asked this question on Math Stackexchange, if you would prefer to see it there: https://math.stackexchange.com/questions/3686947/proof-of-superlinear-convergence-of-quasi-newton-methods-in-nocedal-wright

I have been stuck on this theorem for many hours, so any help is greatly appreciated.

There are two things I don't understand:

1) The theorem is an iff statement. The author proves one direction, but I don't see how to prove the reverse.

2) The author seems to use the assumption that the Hessian is Lipschitz continuous, but this is not an explicit hypothesis of the theorem. Is this a mistake on the author's part? (I checked the errata and it wasn't listed there.)

The following are several numbered lines that the author references in the proof; the theorem and its proof follow.


 * $$||x_k + p_k^N - x^*|| \le L||x_k - x^*||^2 ~(3.33)$$

[The above is where my point #2 comes from. This inequality was derived in the proof of an earlier theorem (on the quadratic convergence of Newton's method), where the Lipschitz continuity of the Hessian was a hypothesis, and that hypothesis was used to prove the inequality.]
 * $$p_k = -B_k^{-1} \nabla f_k ~ \mathrm{where}~B_k~\mathrm{is~symmetric~and~pos.\,def.}~(3.34)$$


 * $$\lim_{k \to \infty} \dfrac {||(B_k - H_f(x^*))p_k||}{||p_k||} = 0~(3.36)$$


 * $$\mathbf{Theorem~3.7:}$$ Suppose that $$f:\mathbb{R}^n \to \mathbb{R}$$ is twice continuously differentiable. Consider the iteration $$x_{k+1} = x_k + p_k$$ (that is, the step length $$\alpha_k$$ is uniformly $$1$$), where $$p_k$$ is given by $$(3.34)$$. Assume also that $$(x_k)$$ converges to a point $$x^*$$ such that $$\nabla f(x^*) = 0$$ and $$H_f(x^*)$$ is positive definite. Then $$(x_k)$$ converges superlinearly if and only if $$(3.36)$$ holds.


 * $$\textit{Proof:}~$$ We first show that $$(3.36)$$ is equivalent to
 * $$p_k - p_k^N = o(||p_k||) ~(3.37)$$
 * where $$p_k^N = - H_f(x_k)^{-1} \nabla f_k$$ is the Newton step. Assuming $$(3.36)$$ holds, we have that
 * $$\begin{array}{lcl} p_k - p_k^N &=& H_f(x_k)^{-1}\left(H_f(x_k)p_k + \nabla f_k\right)\\ &=& H_f(x_k)^{-1}\left(H_f(x_k) - B_k\right)p_k\\ &=& O(||(H_f(x_k) - B_k)p_k||)\\ &=& o(||p_k||) \end{array}$$
 * where we have used the fact that $$||H_f(x_k)^{-1}||$$ is bounded above for $$x_k$$ sufficiently close to $$x^*$$, since the limiting Hessian $$H_f(x^*)$$ is positive definite. The converse follows readily if we multiply both sides of $$(3.37)$$ by $$H_f(x_k)$$ and recall $$(3.34)$$.


 * By combining $$(3.33)$$ and $$(3.37)$$, we obtain that
 * $$||x_k+p_k-x^*|| \le||x_k+p_k^N-x^*||+||p_k-p_k^N||=O(||x_{k}-x^*||^2)+o(||p_k||).$$
 * A simple manipulation of this inequality reveals that $$||p_k|| = O(||x_k - x^*||),$$ so we obtain
 * $$||x_k+p_k-x^*|| \le o(||x_k-x^*||),$$
 * giving the superlinear convergence result.
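(For reference, here is how I unpack the "simple manipulation" step; this expansion is mine, not the book's.)

 * By the triangle inequality, $$||p_k|| \le ||x_k + p_k - x^*|| + ||x_k - x^*|| \le o(||p_k||) + O(||x_k - x^*||^2) + ||x_k - x^*||.$$
 * Hence $$(1 - o(1))\,||p_k|| \le ||x_k - x^*|| + O(||x_k - x^*||^2)$$, so $$||p_k|| = O(||x_k - x^*||)$$ for large $$k$$. Substituting back, $$o(||p_k||) = o(||x_k - x^*||)$$, and $$O(||x_k - x^*||^2) = o(||x_k - x^*||)$$ since $$x_k \to x^*$$, which together give $$||x_k + p_k - x^*|| \le o(||x_k - x^*||)$$.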




 * In an earlier edition of the book (ISBN 978-0-387-22742-9), the statement of Theorem 3.7 starts with: "Suppose that $$f$$ is twice differentiable and that the Hessian $$\nabla^2 f(x)$$ is *Lipschitz continuous* ..." [my emphasis — L.] The statement of Theorem 3.7 in a later edition (ISBN 978-0-387-40065-5) is as above, but the form of (3.36) is subtly different:
 * $$\lim_{k \to \infty} \dfrac {||(B_k - \nabla^2 f(x^*))p_k||}{||p_k||} = 0~(3.36)$$
 * So what edition is the above from? The presentation is not entirely self-contained; I assume that $$f_k$$ is shorthand notation for $$f(x_k)$$. --Lambiam 10:29, 24 May 2020 (UTC)



Thanks for the response. I'm using the second edition; it's good to hear that the Lipschitz hypothesis appears in the first edition. Its omission from the later edition must have been a mistake on the authors' part, but unfortunately it isn't listed in the errata.

As for the differences in (3.36): the author uses $$\nabla^2 f(x)$$ while I use $$H_f(x)$$. But from going through the proof of Theorem 3.7, I am quite confident that (3.36) is slightly wrong; instead of $$H_f(x^*)$$, it should have $$H_f(x_k)$$.

Yes indeed, $$f_k$$ stands for $$f(x_k)$$. I have tried to make the proof as self-contained as possible; did I miss something, or are you referring to the "A simple manipulation of this inequality ..." step? --BlueDream30 15:01, 24 May 2020 (UTC)
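For anyone who wants to see the theorem in action, here is a small numerical sketch (entirely my own; the test function, starting point, iteration cap, and tolerances are arbitrary illustrative choices, not from the book). It runs plain BFGS with unit step length on $$f(x, y) = e^x - x + (y-1)^2$$, whose minimizer is $$x^* = (0, 1)$$ with Hessian $$\mathrm{diag}(1, 2)$$ there, and tracks both the error $$||x_k - x^*||$$ and the ratio in $$(3.36)$$, which shrink toward zero together:

```python
import numpy as np

# Numerical illustration of Theorem 3.7 (my own sketch, not from the book).
# Test function: f(x, y) = exp(x) - x + (y - 1)^2, minimized at x* = (0, 1),
# where the Hessian is diag(1, 2), hence positive definite.
def grad(x):
    return np.array([np.exp(x[0]) - 1.0, 2.0 * (x[1] - 1.0)])

x_star = np.array([0.0, 1.0])
H_star = np.diag([1.0, 2.0])      # exact Hessian at x*

x = np.array([0.3, 0.5])          # start near x* so unit steps converge
B = np.eye(2)                     # initial Hessian approximation
errs, ratios = [], []

for k in range(12):
    g = grad(x)
    if np.linalg.norm(g) < 1e-12:          # converged; avoid a zero step below
        break
    p = -np.linalg.solve(B, g)             # quasi-Newton step, eq. (3.34)
    errs.append(np.linalg.norm(x - x_star))
    ratios.append(np.linalg.norm((B - H_star) @ p) / np.linalg.norm(p))  # (3.36)
    x_new = x + p                          # unit step length, as in the theorem
    s, y = x_new - x, grad(x_new) - g
    if y @ s > 1e-12:                      # standard BFGS update of B_k
        B = B - np.outer(B @ s, B @ s) / (s @ B @ s) + np.outer(y, y) / (y @ s)
    x = x_new

for e, r in zip(errs, ratios):
    print(f"||x_k - x*|| = {e:.3e}   Dennis-More ratio = {r:.3e}")
```

Note that the ratio here uses the Hessian at $$x^*$$, as in the second edition's (3.36); since $$x_k \to x^*$$ and the Hessian is continuous, the same limit holds with $$H_f(x_k)$$ in its place.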