Talk:Newton's method in optimization

Should be merged with Newton's method
There is another wikipedia page called Newton's method (http://en.wikipedia.org/wiki/Newton%27s_method#Nonlinear_systems_of_equations). I suggest both are merged or the content is just too confusing and even seemingly contradictory. Alpheratzstar (talk) 12:25, 9 November 2011 (UTC)alpheratzstar

I agree, this page is very confusing. It could be made better by stating right after

$$x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}, \ n = 0, 1, \dots$$

that this is equivalent to Newton's Method applied to f'(x).173.237.117.134 (talk) 21:11, 9 October 2013 (UTC)


 * I think it is good to seperate like this. The point is there are some differences between: solving systems of non-linear equations (the original purpose of Newton's method) and solving optimization problems (this Wikipedia page). For example, in optimization problems, one wants to find local minima, but Newton's method will converge to a saddle point if the initial point is close to that saddle point. Tuyentruongoslo (talk) 05:46, 19 March 2021 (UTC)

Needs more generalization
Could someone write a piece on Halley's method in multi-dimensional case? I could only find this on the German page, and can't really follow it very well: http://de.wikipedia.org/wiki/Halley-Verfahren I also realize that this is not really the Newton's method, but there are many similarities and more people are likely to visit this page..... — Preceding unsigned comment added by 90.215.230.136 (talk) 15:34, 22 July 2011 (UTC)

Yes! This would be really nice to have! Also,.... most papers don't discuss this method (obviously) but what affect would it have on convergence properties (for example could the addition of the (3rd derv.) tensor actually mess-up iterative schemes that are proven to work (but as part of their proof the truncation MUST be taken at 2nd derv.) Also, in the case of positive definite hessian, can the addition of the 3rd derivative cause the hessian (in the next iteration) to be non-positive definite?  These are good questions indeed.  — Preceding unsigned comment added by 144.32.126.15 (talk) 12:44, 18 October 2011 (UTC)


 * I think you are mixing up quasi-NM with the real thing. 018 (talk) 18:55, 18 October 2011 (UTC)

Call for review/correction)
The first equation appears to contain a significant typo. The derivation from taylors expansion gives f/f' rather than f'/f". I therefore propose a change from f'(x)/f"(x) to f(x)/f'(x) within a reasonable time unless there is objection/discussion or the equation's author agrees and makes the correction. Merlin Pendragon 23:27, 17 August 2006 (UTC)


 * Everything is correct as far as I can tell. The Newton's method is applied to f', as we want to solve f'(x)=0, and that's why you get eventually f'/f". Oleg Alexandrov (talk) 23:50, 17 August 2006 (UTC)


 * Yes, I see now. My error in haste. Thank you. My proposed correction is withdrawn. I had been looking for a cite for Newton_method for finding roots and fell victim to seeing what I expected to see instead of what you actually wrote. :) In review, perhaps a little tweek on your intro two sentences to wake up the lazy or hasty reader, perhaps along the lines that this is an *application* of *Newton's method* which will find a local maximum or minimum value of a function, a task useful in *optimization*... or some such.  The current has multiple (indestinguished, distracting?) blue links in the beginning sentence, the first sentence refers to the root method rather than explaining what this article is about, and the second sentence refers to what "can also be" done suggesting the sentence is precatory rather than making clear that this is the point of the article. Careful reading and reasoning sorts it out, so there is no error, but you may wish to consider a tweek? Thanks for the article, and the quick response. -Merlin

Call for clarification
A beginner will have a hard time understanding how the Taylor expansion is minimized by solving the algebraic equation presented here. I'd bet you can do better at making this important first step more clear. -Feraudyh


 * I support this call of clarification. Why does minimizing the Taylor expansion give us the $\delta x$ step? And isn't $\delta x=0$ a trivial solution for the minimization? -Emre

Root or minimum??
The formula presented is the method applied to find the root of f(x), or to find a minimum, applying the method to f'(x)?? -- NIC1138 17:27, 21 June 2007 (UTC)
 * The second is right, applying the method to f'(x). I think that's mentioned in the intro. Oleg Alexandrov (talk) 01:59, 22 June 2007 (UTC)

Error in the formula?
I think there is an error in the formala: $$\mathbf{p}_{n} = \mathbf{x}_{n+1}-\mathbf{x}_{n} = -[H f(\mathbf{x}_n)]^{-1} \nabla f(\mathbf{x}_n), \ n \ge 0.$$

I think to make it a linear equation it should be $$\mathbf{p}_{n} = [H f(\mathbf{x}_n)] (\mathbf{x}_{n+1}-\mathbf{x}_{n}) = -\nabla f(\mathbf{x}_n), \ n \ge 0.$$

or perhaps $$[H f(\mathbf{x}_n)] \mathbf{p}_{n} = -\nabla f(\mathbf{x}_n), \ n \ge 0.$$

with $$\mathbf{x}_{n+1} = \mathbf{p}_{n} + \mathbf{x}_{n}$$

Any objections? —Preceding unsigned comment added by 91.19.151.5 (talk) 11:32, 5 October 2007 (UTC)


 * Not sure if I understand what you meant. Can you be clearer? For example, I don't see any error with this: $$\mathbf{p}_{n} = \mathbf{x}_{n+1}-\mathbf{x}_{n} = -[H f(\mathbf{x}_n)]^{-1} \nabla f(\mathbf{x}_n), \ n \ge 0.$$ Tuyentruongoslo (talk) 05:50, 19 March 2021 (UTC)

request
Someone should include the formula for Newton's if the Hessian is not invertible. I would, but I don't know how to add mathematical formulas. Some proofs would be nice too. -anon — Preceding unsigned comment added by 128.208.36.170 (talk • contribs) 20:18, 13 March 2006


 * I don't know if Newton's method even works if the hessian is not invertible. Either way, in that case convergence will be slower, so gradient descent may be better. But I don't know much about what happens for non-invertible hessians. Oleg Alexandrov (talk) 01:22, 14 March 2006 (UTC)


 * I don't understand what this business is with inverting the Hessian. Why would you ever invert the Hessian when you can just rearrange the problem to be solved as a traditional linear system?  eg, H(f)*(Xn+1 - Xn) = -Grad(f).  Then you can just solve for (Xn+1 - Xn) using, say, Gaussian elimination, and add that to the previous Xn. Yahastu (talk) 20:44, 15 February 2009 (UTC)


 * Gaussian elimination is equivalent to matrix inversion. —Preceding unsigned comment added by 87.103.91.29 (talk) 02:38, 24 November 2009 (UTC)


 * Not computationally! ;)131.215.143.10 (talk) 19:40, 16 March 2010 (UTC)


 * The Hessian also needs to be Lipschitz continuous near the minimum for quadratic convergence to hold. See http://www.cs.ucf.edu/courses/cap6411/cot6505/spring03/Lecture-20.pdfMattrbunique (talk) 21:53, 23 November 2010 (UTC)


 * Isn't this covered in the last section: "If the Hessian is close to a non-invertible matrix..." It goes on to mention the Levenberg–Marquardt algorithm, which I was going to mention. That stabilizes things by essentially removing zeros in the spectrum of the Hessian, as I understand it.
 * As for adding equations, look at the source. It's pretty easy. Just add the  tags and type LaTeX math in between. —Ben FrantzDale (talk) 14:08, 24 November 2010 (UTC)
 * I can answer this question, but rather wait a little bit more time for the answer to be more creditable. One comment is that Levenberg-Marquardt thing does not guarantee good theoretical results. Tuyentruongoslo (talk) 19:10, 19 March 2021 (UTC)

Comments
The below was added by Tuyentruongoslo to the article. I'm moving them to here, should anyone wish to address them. -- Escape Orbit (Talk) 12:44, 17 March 2021 (UTC)


 * How can I add one remark in the article that the section needs clarified? Tuyentruongoslo (talk) 22:54, 17 March 2021 (UTC)

Some comments: The content of this section is very confusing. The author of this section should explain.

First: What is the meaning of these two sentences "Note that if $$x_0^1 = x_0^2 =\cdots = x_0^n$$ and $$\tau=n$$, SN reduces to Newton's method described above. However, in contrast with Newton's method, in iteration $$k$$, SN needs to compute the gradients and Hessians of functions $$f_i$$ for $$i\in S_k$$ only."? It seems to say that SN is less computational expensive than Newton's method. However, in Newton's method, at step k we need to do computations relevant to the point $$x_k$$ only, while in the description of SN one needs to do so for many points? What is the advantage then?

Second: This sentence also needs explanation "There are various techniques which can to some extent reduce but which can't completely eliminate the effect of the conditioning $$\kappa$$ on the convergence rate of first order methods." What happens if $$ \sum_{i=1}^n f''_i(x_k)$$ is not invertible?

Third: The theorem cited is also very restrictive, since it needs heavily assumptions (strongly convexity + Hessian to be globally Lipschitz continuous) are needed, while Newton's method applies to more general results.

Fourth: Besides the number of iterations needed, one also needs to take running time into account when comparing different methods. In the original paper, there is no such comparisons.

Fifth: The usual Stochastic version would be (as far as I understand): In each epoch, partition randomly the set $$ S\subset \{1,...,n\} $$ into smaller sets $$I_1,...,I_l$$ (mini-batches). Then, for I one of the mini-batches, consider the update

$$x_{k+1}=x_k-\frac{1}{\sharp I}(\sum _{j\in I}f_j(x_k))^{-1}(\frac{1}{\sharp I}\sum _{j\in I}f_j(x_k)).$$

The original author of this section should explain why not following this, but using the more complicated way. What is the gain? Tuyentruongoslo (talk) 22:51, 17 March 2021 (UTC)

Sixth: The section claims superiority of SN to stochastic first order method, without proper citation and experiments. Please do so. — Preceding unsigned comment added by Tuyentruongoslo (talk • contribs) 23:05, 17 March 2021 (UTC)

ask for a clarification for notations
In the introduction : I don't understand why the notation F(x) is switched to f(x) : is it the same function ? Pierre g111 (talk) 04:45, 27 July 2023 (UTC)