Talk:Conjugate gradient method

Discussion of correct of the algorithm
The explanation skips a large important part of the correctness proof, which shows that the mentioned "Gram-Schmidt orthonormalization" only needs to consider the latest direction p_k and need not be performed relative to all previous directions p_j where j < k. This can be deduced from the fact that the r_k and p_k span the same Krylov subspace, but it should be highlighted how this implies that p_j r_k = 0 for j <= k and p_j A r_k = 0 for j < k. — Preceding unsigned comment added by 46.128.186.9 (talk) 16:04, 13 December 2023 (UTC)

Comment
In your algorithm, the formula to calculate pk differs from Jonathan Richard Shewchuk paper. The index of r should be k instead of k-1. Mmmh, sorry, it seams to be correct! ;-) —The preceding unsigned comment was added by 171.66.40.105 (talk • contribs) 01:45, 21 February 2007 (UTC)

additional comment:

basis vector set $P = \{ p_0, ... p_{n-1} \}$.

Conflicting information under "The resulting algorithm" subsection
In the subsection, I can see that riT rj = 0. The following is quoted from the same. However, in the algorithm, in the calculation for the value of {\beta}, the rkT rk comes up in the denominator, which does not look right. In my openion, the denominator should be rkT rk+1. However, I could be incorrect, and hence, I would like to bring this to the notice of the experts. — Preceding unsigned comment added by Zenineasa (talk • contribs) 13:25, 26 November 2021 (UTC)

Missing steps thinking about it as a linear solver
I can't wrap my head around the fact that it's optimizing
 * $$ f(\mathbf{x}) = \tfrac12 \mathbf{x}^\mathsf{T} \mathbf{A}\mathbf{x} - \mathbf{x}^\mathsf{T} \mathbf{b}, \qquad \mathbf{x}\in\mathbf{R}^n \,$$.

Abstractly, I think (?) the idea is that we choose $$f(x)$$ such that

\nabla f(\mathbf{x}) = \mathbf{A} \mathbf{x} - \mathbf{b} \, $$ and so the residual vector is the gradient of the surface. But coming at it from the perspective of "let's minimize the residual", my line of thinking is let
 * $$ g(\mathbf{x}) := \|\mathbf{A} \mathbf{x} - \mathbf{b}\|_2^2 = (\mathbf{Ax - b})^\top (\mathbf{Ax - b}) = \mathbf{x^\top A^\top A x} - 2 \mathbf{b^\top Ax + b^\top b}$$

which we could drop the constant $$\mathbf{b^\top b}$$ from. That's similar but decidedly different from $$f(\mathbf{x})$$. What gives? Is $$f(\mathbf{x})$$ minimizing the residual? Can the norm of the residual be massaged to look like $$f(\mathbf{x})$$? —Ben FrantzDale (talk) 21:30, 18 January 2023 (UTC)

Pathological example not PSD
The method is supposed to be used for PSD matrices, but the example given in "A pathological example" is not PSD. Thomasda (talk) 17:47, 23 January 2024 (UTC)


 * A method is not supposed to be used only for specific entries. It is supposed to be used on any input for which it converges to a solution. In, it is shown that it may be applied to any linear system. When used for unconstrained optimization, it can be used for finding a local minimum of a non-convex function, in which case, if it converges, one cannot predict which local minimum is obtained. Personally, around 1970, I used this method for finding a solution of an overconstrained system of 81 quadratic equations in 54 unknowns, by minimizing the sum of the squares of the equations (see ). More than 50 years later, I do not know any other method that works on this problem. D.Lazard (talk) 20:49, 23 January 2024 (UTC)
 * That's fair, but in the beginning of the article it is assumed A is PSD: "is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is positive-definite." and "the known $$n \times n$$ matrix $$\mathbf{A}$$ is symmetric (i.e., AT = A), positive-definite (i.e. xTAx > 0 for all non-zero vectors $$\mathbf{x}$$ in Rn), and real".
 * So either it should be more clarified that the method is " supposed to be used on any input" (including in the analysis section, which also assumes PSD, as far as I can tell), or the Pathological example should note that it is not PSD, and hence doesn't conflict with the rest of the analysis.Thomasda (talk) 01:52, 24 January 2024 (UTC)