User:Numerical one/sandbox

In numerical optimization, the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm is an iterative method for solving unconstrained nonlinear optimization problems.

The BFGS method belongs to quasi-Newton methods, a class of hill-climbing optimization techniques that seek a stationary point of a (preferably twice-continuously differentiable) function using only gradient evaluations (or approximate gradient evaluations). For such problems, a necessary condition for optimality at a point $$x^*$$ is that the gradient of $$f$$ at $$x^*$$ is zero, i.e., $$\nabla f(x^*)=0$$. Under certain conditions, the BFGS method exhibits superlinear convergence. (citation needed)

The BFGS method is one of the most popular quasi-Newton methods. For large-scale optimization (i.e., when the number of variables is large, for example greater than 10,000), limited-memory BFGS (L-BFGS) may be used instead. Other variants include L-BFGS-B, which handles simple box constraints. (citation needed)

The algorithm is named after Charles George Broyden, Roger Fletcher, Donald Goldfarb and David Shanno, each of whom derived the method independently in 1970.

Rationale
The BFGS algorithm is a quasi-Newton method for unconstrained optimization in which the Hessian is approximated using the BFGS update formula. It is used to solve problems of the form

minimize $$f(x)$$ subject to $$x\in\Re^n,$$ where $$f:\Re^n\rightarrow\Re.$$ The algorithm is an iterative method in which the search direction $$p_k$$ is computed at the current iterate $$x_k$$ by minimizing a quadratic approximation to $$f,$$ i.e., by solving $$B_kp_k=-g_k,$$ where $$g_k=\nabla f(x_k)$$ and $$B_k$$ is the BFGS matrix that approximates the Hessian $$\nabla^2 f(x_k).$$ Typically, a line search is then used to compute a step length $$\alpha_k$$ so that the next iterate is defined as $$x_{k+1}=x_k+\alpha_kp_k.$$
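The iteration above can be sketched as follows. This is a minimal illustration, not a production implementation: the test function (a convex quadratic), its gradient, and the identity starting matrix are all chosen here for demonstration only.

```python
import numpy as np

def search_direction(B, g):
    """Solve B p = -g for the quasi-Newton search direction p."""
    return np.linalg.solve(B, -g)

# Illustrative quadratic f(x) = 0.5 x^T A x, whose gradient at x is A x.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
x = np.array([1.0, -1.0])
g = A @ x              # gradient g_k at the current iterate
B = np.eye(2)          # initial Hessian approximation B_0

p = search_direction(B, g)   # with B = I this reduces to steepest descent: p = -g
```

With $$B_0=I$$ the first direction coincides with steepest descent; later iterations differ because $$B_k$$ accumulates curvature information.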

Derivation
Like all traditional quasi-Newton matrices, the BFGS matrix $$B_k$$ satisfies the so-called secant or quasi-Newton condition: $$B_{k+1}s_k=y_k,$$ where $$s_k=x_{k+1}-x_k$$ and $$y_k=\nabla f(x_{k+1})-\nabla f(x_k).$$ The BFGS update is a symmetric update derived by considering a rank-two change to the previous approximate Hessian, i.e., $$B_{k+1}=B_k+\alpha uu^T+\beta vv^T,$$ where $$u$$ and $$v$$ are $$(n\times 1)$$ vectors. One can show that choosing $$u=y_k$$ and $$v=B_ks_k$$ with $$\alpha=\frac{1}{y_k^Ts_k}$$ and $$\beta=-\frac{1}{s_k^TB_ks_k}$$ ensures that the quasi-Newton condition holds. Thus, the BFGS update is
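As a quick check that these choices satisfy the secant condition, apply the rank-two update to $$s_k$$ and note that the two scalar fractions each cancel to one:

```latex
B_{k+1}s_k
  = B_k s_k + \frac{y_k^T s_k}{y_k^T s_k}\, y_k
            - \frac{s_k^T B_k s_k}{s_k^T B_k s_k}\, B_k s_k
  = B_k s_k + y_k - B_k s_k
  = y_k .
```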

$$B_{k+1}=B_k+\frac{1}{y_k^Ts_k}y_ky_k^T-\frac{1}{s_k^TB_ks_k}B_ks_ks_k^TB_k.$$
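The update formula translates directly into a few lines of NumPy. The sketch below assumes the curvature condition $$y_k^Ts_k>0$$ holds for the supplied vectors; the example values are arbitrary and chosen only so that this condition is satisfied.

```python
import numpy as np

def bfgs_update(B, s, y):
    """Rank-two BFGS update of the Hessian approximation B.

    s = x_{k+1} - x_k,  y = grad f(x_{k+1}) - grad f(x_k).
    Assumes the curvature condition y^T s > 0 holds.
    """
    Bs = B @ s
    return B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)

# Sanity check: the updated matrix satisfies the secant condition B_{k+1} s = y.
B = np.eye(3)
s = np.array([0.5, -0.2, 0.1])
y = np.array([0.4, -0.1, 0.3])   # y^T s = 0.25 > 0 here
B_new = bfgs_update(B, s, y)
```

By construction, `B_new @ s` reproduces `y`, and the update preserves symmetry since both correction terms are symmetric outer products.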

It can be shown that if the initial $$B_0$$ is positive definite and $$y_k^Ts_k>0$$ for all $$k,$$ then every matrix in the sequence $$\{B_k\}$$ remains positive definite. A line search satisfying the Wolfe conditions is typically used in conjunction with the BFGS algorithm, since it guarantees $$y_k^Ts_k>0.$$ (citation needed)
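The pieces above combine into a complete iteration. The following is a minimal sketch, not a robust solver: it uses simple Armijo backtracking rather than a full Wolfe line search, so it guards the update with an explicit curvature check instead, and the quadratic test problem at the end is chosen only for illustration.

```python
import numpy as np

def bfgs_minimize(f, grad, x0, tol=1e-8, max_iter=200):
    """Minimal BFGS sketch with Armijo backtracking.

    A full Wolfe line search would additionally enforce the curvature
    condition y^T s > 0; here we simply skip the update when it fails.
    """
    x = x0.astype(float)
    B = np.eye(x.size)          # initial positive-definite approximation B_0
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = np.linalg.solve(B, -g)        # search direction: B p = -g
        alpha, c1 = 1.0, 1e-4             # Armijo backtracking line search
        while f(x + alpha * p) > f(x) + c1 * alpha * (g @ p):
            alpha *= 0.5
        x_new = x + alpha * p
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        if y @ s > 1e-12:                 # update only when curvature is positive
            Bs = B @ s
            B = B + np.outer(y, y) / (y @ s) - np.outer(Bs, Bs) / (s @ Bs)
        x, g = x_new, g_new
    return x

# Convex quadratic test: the minimizer of 0.5 x^T A x - b^T x solves A x = b.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x_star = bfgs_minimize(f, grad, np.zeros(2))
```

Skipping the update when $$y_k^Ts_k\le 0$$ is one common safeguard; damped BFGS updates are another.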