Talk:Limited-memory BFGS

It would be nice if the LBFGS article and the BFGS article used the same symbol to represent the approximation to the inverse of the Hessian. LBFGS uses Hk and BFGS uses Bk^{-1} — Preceding unsigned comment added by 76.102.204.220 (talk) 03:52, 26 March 2014 (UTC)

As it stands, this article is basically a set of links to, and information about, library packages. It should have a more detailed description of the algorithm itself and fewer links to external content. Does someone with a more thorough background in optimization than myself want to take this on? --Soultaco 15:55, 15 October 2007 (UTC)

software information back
The intention when creating this page was to give software information. Our hope is that users can contribute links to versions of L-BFGS written in different languages. There are various explanations of the L-BFGS algorithm on the web and there is no need for another one here. Therefore I will try to restore the earlier version. Nocedal 06:53, 2 December 2007 (UTC)Jorge Nocedal
 * Thanks for contributing, but Wikipedia is not a collection of links. It is an online encyclopedia, and while links to software packages for L-BFGS are certainly relevant and worth including, this entry is not here simply to advertise software packages; the Wikipedia entry on L-BFGS should first and foremost be a discussion of the algorithm. If no further objection/elaboration is raised, I'm going to re-add the algorithm information and reformat the article to conform with Wikipedia standards. --Soultaco (talk) 04:58, 6 December 2007 (UTC)

I have rewritten the page so that it conforms to Wikipedia standards and informs the reader about the L-BFGS codes. Please feel free to write a separate page about limited-memory algorithms, but I propose that we do not try to do both on the same page. Nocedal February, 2008

I completely agree with Soultaco; a page entitled L-BFGS should provide a concise description of the algorithm. Readers are more interested in how the method works than the author's software package that implements it. Henkelman (talk) 04:34, 20 June 2008 (UTC)

I recently had to implement this, and was very annoyed that there wasn't a helpful Wikipedia page, and none of my standard sources had enough information (e.g. Numerical Recipes). As such, having finished my implementation, and finding the key to doing so buried at the back of the "Representations" article by Byrd, Nocedal and Schnabel which I've cited, I've enriched the Wikipedia article to the best of my ability. We really need someone who can explain the relevant proofs -- e.g. why we're able to use a limited history for the BFGS update without causing the approximate Hessian to stop being symmetric and positive definite. These proofs are given in the "Representations" paper, but I don't understand them well enough to reproduce the argument without lifting the proofs directly. I imagine they're also in the 1980 paper, but I can't find that anywhere. The 1989 paper is totally useless from an implementation perspective -- it outlines the QN procedure, and skips over how to apply the Hessian without representing the whole thing. Abeppu (talk) 05:00, 19 February 2009 (UTC)

Thanks to the previous contributors (Abeppu?) for adding in the L-BFGS algorithm, which is indeed very useful and nicely formatted. There seem to have been some minor errors about the loop index i, and while referring to some published descriptions of L-BFGS I fixed these. I also changed the definition of s and y to be consistent with i going to the current iteration minus one, which seems to have been the intention of the code that was there (there are two alternative formulations in the literature which differ by an index offset of one, and I've gone with the one you can see there, where i goes up to the current iteration minus one). I also changed scalars to Greek letters, because when we're not using bold for vector quantities, it is otherwise quite hard to distinguish scalar and vector quantities. Danpovey (talk) 00:37, 6 April 2011 (UTC)

Citation needed?
In the sentence "An early, open source implementation of L-BFGS in Fortran exists" I would expect some citation and/or link to this reference that seems important. No? —Preceding unsigned comment added by Orzelf (talk • contribs) 22:38, 20 February 2010 (UTC)
 * Yes, done. --Dikay0 (talk) 18:07, 31 October 2010 (UTC)

Bug in algorithm?
The sign of initial z seems to be wrong here:

$$H^0_k= \gamma_k I$$ $$z = -H^0_k q$$

Nocedal's Numerical Optimization has a positive sign. (Alg 7.4, page 178) I tried implementing the version listed here and it does not converge on Rosenbrock. — Preceding unsigned comment added by 136.152.250.167 (talk) 00:16, 23 January 2019 (UTC)

The assignment $$H^0_k=y^{\rm T}_{k-1} s_{k-1}/y^{\rm T}_{k-1} y_{k-1}$$ can't be right: the left side is a matrix, the right side a scalar.

How to fix this? The paper cited below gives at the top of page 9 (in the paragraph just before equation 3.1) the formula $$H^0_k=\gamma_k I$$ and, on the bottom of page 11, equation 3.12 says $$\gamma_k = y^{\rm T}_{k-1} s_{k-1}/y^{\rm T}_{k-1} y_{k-1}$$.

So it seems a multiplication with $$I$$ should be added on the right side of the assignment.

If someone could verify this and then change the page accordingly that would be great.

Also, note that this calculation of $$H^0_k$$ makes the matrix diagonal (in fact a scalar multiple of the identity), whereas the following text says it is only "commonly" diagonal. There should, IMO, be a short explanation of this apparent contradiction. I feel I don't know enough about this, so won't provide one.

The paper is: Richard H. Byrd, Jorge Nocedal and Robert B. Schnabel: “Representations of Quasi-Newton Matrices and Their Use in Limited Memory Methods,” Technical Report, CU-CS-612-92, University of Colorado at Boulder, 1992. This is probably the same as the one from citation 5, only two years earlier and fetched from another source.

84.143.150.249 (talk) 21:07, 17 November 2015 (UTC)
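Combining the two formulas from the cited report, the corrected initialization would read as follows (a sketch of the fix proposed above, using the same index convention as the article's pseudocode):

```latex
H^0_k = \gamma_k I, \qquad
\gamma_k = \frac{s^{\rm T}_{k-1} y_{k-1}}{y^{\rm T}_{k-1} y_{k-1}},
\qquad\text{so that}\qquad
H^0_k q = \gamma_k q .
```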


 * I was wondering about the same thing. But following your interpretation makes $$H^0_k$$ a **scalar** matrix, so by abuse of notation one might say that the pseudocode is in fact correct? But then surely there should be a better approximation of $$H^0_k$$ that uses a diagonal non-scalar matrix, right? bungalo (talk) 11:00, 15 April 2017 (UTC)

Hi, I know Wikipedia is not intended to be a place of original research, but the sign in front of $$H_k^0$$ is definitely bugged. I coded up limited-memory BFGS myself to check. Using $$z = -H^0_k q$$ gave failure to find a point satisfying the Wolfe conditions. (I used Algorithms 3.5 & 3.6 of Nocedal mentioned above.) Flipping the sign to $$z = H^0_k q$$ gave convergence to a local minimum in 14 steps. I used a strictly positive bivariate quartic polynomial for my objective, similar to the Rosenbrock function. Even though my code works for a benchmark optimization problem, it would be good if someone could find a source or dig through the original papers on limited-memory BFGS. Akrodger (talk) 07:02, 22 March 2019 (UTC)
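To make the sign question concrete, here is a minimal sketch of the two-loop recursion in the form described in Nocedal & Wright (Algorithm 7.4); the function and variable names are my own, not the article's. It computes the positive product $$z = H_k q$$, applying $$H^0_k = \gamma_k I$$ as a scalar multiply, and the negation for the search direction $$d = -z$$ happens after the recursion rather than inside it:

```python
import numpy as np

def two_loop(q, s_list, y_list):
    """Return z = H_k q via the L-BFGS two-loop recursion.

    q       -- current gradient (the search direction is then -z)
    s_list  -- stored steps s_i = x_{i+1} - x_i, oldest first
    y_list  -- stored gradient differences y_i, oldest first
    """
    rhos = [1.0 / y.dot(s) for s, y in zip(s_list, y_list)]
    alphas = []
    q = q.copy()
    # First loop: newest pair to oldest.
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        alpha = rho * s.dot(q)
        alphas.append(alpha)          # stored newest-first
        q -= alpha * y
    # Scaled-identity initialization H0 = gamma * I: a scalar multiply.
    if s_list:
        s, y = s_list[-1], y_list[-1]
        gamma = s.dot(y) / y.dot(y)
    else:
        gamma = 1.0
    z = gamma * q
    # Second loop: oldest pair to newest.
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos),
                                  reversed(alphas)):
        beta = rho * y.dot(z)
        z += (alpha - beta) * s
    return z  # z = H_k q; use d = -z as the descent direction
```

For a single $(s, y)$ pair the result can be checked directly against the dense BFGS inverse-Hessian update $$H_1 = (I - \rho s y^{\rm T})\,\gamma I\,(I - \rho y s^{\rm T}) + \rho s s^{\rm T}$$, which is one way to convince oneself that the positive-product convention is internally consistent.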

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Limited-memory BFGS. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20131101205929/http://acl.ldc.upenn.edu/W/W02/W02-2018.pdf to http://acl.ldc.upenn.edu/W/W02/W02-2018.pdf

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 10:37, 23 December 2017 (UTC)

Algorithm description
I don't understand the point of introducing the matrix $$H^0_k$$ in the algorithm. It is just a scaled identity matrix, so why not simply multiply $$z$$ by $$\gamma_k$$? -- Andrew Myers (talk) 22:37, 25 August 2019 (UTC)
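Agreed that the matrix is redundant in the arithmetic: since $$H^0_k = \gamma_k I$$, applying it is just a scalar multiplication. A small sketch (the value of gamma here is arbitrary, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.standard_normal(5)     # stand-in for the vector the recursion produces
gamma = 0.37                   # hypothetical value of gamma_k

H0 = gamma * np.eye(5)         # the scaled identity the pseudocode writes out
# Applying the scaled identity is identical to a scalar multiply:
assert np.allclose(H0 @ q, gamma * q)
```

Writing $$H^0_k$$ explicitly does keep the pseudocode valid for the more general diagonal initializations mentioned in the discussion above, but for the scaled-identity choice the scalar multiply is all that is needed.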