Talk:Conjugate gradient method/Archive 1

Needs example
This article needs an example usage case, preferably with picture. —Preceding unsigned comment added by 18.243.0.128 (talk) 17:09, 30 March 2008 (UTC)

Stretching, preconditioning
From reading Shewchuk's PDF (linked from this article), I'm beginning to understand this. I'm still a bit unclear about what the conjugacy condition does compared with what preconditioning does. As I understand it, the problem with steepest descent is that long valleys lead to zigzagging. It seems like picking conjugate orthogonal directions essentially stretches the energy space so that all valleys are circular rather than long and thin (see Shewchuk's figure 22 on p. 23). Is that accurate? (If so, it seems like a much simpler way of explaining the solution.) The thing is, this sounds a lot like what preconditioning is supposed to accomplish. That is, we want to use M so that $$M^{-1}A$$ has similar eigenvalues. Could someone clarify? Thanks. —Ben FrantzDale 19:11, 17 January 2007 (UTC)
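For what it's worth, the eigenvalue-clustering effect of preconditioning is easy to check numerically. Below is a minimal Python sketch (the 2×2 matrix and helper names are my own toy choices, not from the article or Shewchuk): Jacobi (diagonal) preconditioning collapses the condition number of a "stretched" system, which is exactly the valley-circularizing effect described above.

```python
import math

# Toy illustration (my own 2x2 data): Jacobi (diagonal) preconditioning
# clusters the eigenvalues, shrinking the condition number that governs
# how "long and thin" the energy valleys are.
A = [[100.0, 1.0], [1.0, 1.0]]

def eig2_sym(M):
    """Eigenvalues of a symmetric 2x2 matrix, via the quadratic formula."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    gap = math.sqrt(tr * tr - 4.0 * det)
    return (tr - gap) / 2.0, (tr + gap) / 2.0

lo, hi = eig2_sym(A)
kappa_A = hi / lo                      # about 101: steepest descent zigzags

# Symmetric form of Jacobi preconditioning: D^{-1/2} A D^{-1/2} has the
# same eigenvalues as M^{-1} A with M = diag(A).
d = [1.0 / math.sqrt(A[0][0]), 1.0 / math.sqrt(A[1][1])]
P = [[d[i] * A[i][j] * d[j] for j in range(2)] for i in range(2)]

lo_p, hi_p = eig2_sym(P)
kappa_P = hi_p / lo_p                  # about 1.22: nearly circular valleys

print(kappa_A, kappa_P)
```

So conjugacy and preconditioning do attack the same zigzagging problem, but preconditioning does it by changing the spectrum up front, while conjugacy copes with whatever spectrum is there.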

Removal of "Nonlinear conjugate gradient" section
I would like to explain this edit I made. While it is true that Wikipedia material evolves over time, starting with something is better than nothing, and people will eventually fix and expand on things, I nevertheless find it a bad idea to insert content which you deliberately know needs a lot more work.

I would still understand creating a poor quality new article. However, I would be opposed to taking a reasonably well written article as this one, and adding to it material you do not consider of the same quality. Such material will (in my experience) linger for ages without anybody touching it, resulting in a decreased overall quality of the article.

I suggest that BenFrantzDale either fork it off to a new article, or actually make the effort of ensuring the material is already good enough. Comments? Oleg Alexandrov (talk) 03:27, 18 January 2007 (UTC)
 * Fair enough. I started nonlinear conjugate gradient method. If it gets good enough it might be worth merging with this one or it might not. —Ben FrantzDale 03:36, 18 January 2007 (UTC)
 * OK. Actually I think Nonlinear conjugate gradient method has enough info to stand on its own. Although it would be nice to be expanded I think. Oleg Alexandrov (talk) 04:17, 18 January 2007 (UTC)

Another version of algorithm
I'm using the following algorithm. IMHO it's clearer and easier to understand.
 * $$r_1 := b - A x_1 \,$$
 * $$p_1 := r_1 \,$$
 * $$k := 1 \, $$
 * repeat until $$r_k \, $$ is "sufficiently small":
 * $$\alpha_k := \frac{r_k^\top r_k}{p_k^\top A p_k} \, $$
 * $$r_{k+1} := r_k - \alpha_k A p_k \, $$
 * $$\beta_k := \frac{r_{k+1}^\top r_{k+1}}{r_k^\top r_k} \, $$
 * $$p_{k+1} := r_{k+1} + \beta_k p_k \, $$
 * $$x_{k+1} := x_k + \alpha_k p_k \, $$
 * $$k := k + 1 \, $$
 * end repeat
 * The result is $$x_k \, $$

—Preceding unsigned comment added by 149.156.83.157 (talk • contribs) 10:11, 30 March 2007
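The loop above is easy to check numerically. Here is a pure-Python sketch (the `cg`, `matvec`, and `dot` names are mine, not from the article); note that with the residual update $$r_{k+1} = r_k - \alpha_k A p_k$$, the iterate update takes a plus sign, $$x_{k+1} = x_k + \alpha_k p_k$$.

```python
import math

# Pure-Python sketch of the loop above (helper names are mine). Note the
# x update takes a plus sign when the residual update is r - alpha*A*p.
def cg(A, b, x0, tol=1e-10, max_iter=100):
    """Conjugate gradient for a symmetric positive-definite A."""
    n = len(b)
    matvec = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

    x = list(x0)
    r = [bi - yi for bi, yi in zip(b, matvec(x))]   # r_1 = b - A x_1
    p = list(r)
    rr = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rr) < tol:      # "repeat until r_k is sufficiently small"
            break
        Ap = matvec(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]  # beta_k = rr_new/rr
        rr = rr_new
    return x

# Tiny SPD check: A x = b has the exact solution x = [1, 2].
print(cg([[4.0, 1.0], [1.0, 3.0]], [6.0, 7.0], [0.0, 0.0]))
```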


 * I agree, that's a bit clearer so I changed it in the article. I changed it to check for the remainder immediately after you computed it, which I think is more logical. Thanks for your comment! -- Jitse Niesen (talk) 13:50, 30 March 2007 (UTC)


 * I don't know; to me it is more logical in the while expression. When implementing CG, you can save an additional vector (4 instead of 5) when it is written in the reversed order, but that is only a technical issue, not so important.


 * Yes, the algorithm is not optimized and it doesn't need to be. It seems we disagree about the logical place for the test. Do change it if you feel strongly about it, because I don't really care. Incidentally, profuse thanks (if it was you) for fixing my sign error. -- Jitse Niesen (talk) 04:07, 31 March 2007 (UTC)

Thanks
I had a conversation two days ago with the CTO of a small high-tech company in Silicon Valley (the specific details are not relevant here). At some point we needed to see how exactly the conjugate gradient works. He said (literally) "Wikipedia must have an article about that." And sure enough, the article was there. It felt gratifying to have a personal experience of how much Wikipedia is used. In particular, Jitse, thanks for this very nice article. The time you spent on it was well-worth it. :) Oleg Alexandrov (talk) 19:44, 31 March 2007 (UTC)


 * That's very good to hear. Fortunately it wasn't an expert; that would be embarrassing as I think that the presentation can be much improved. I suppose the check with my fee is in the mail? ;) -- Jitse Niesen (talk) 04:51, 1 April 2007 (UTC)

Good information, but could be clearer
I came across the conjugate gradient method when programming it for a class in numerics for solving elliptic PDEs. I used the information here in conjunction with our own notes, and found it useful for coding the algorithm. I think the information could be made a bit clearer and the formatting and presentation of this article improved a little. While everyone will probably already know this, maybe AT could be spelled out as A transpose (just to make it quite clear) at the beginning. The diagram's information could be linked to the text and vice-versa. Currently, the diagram seems to stand on its own. (Starscreaming Crackerjack 09:57, 18 August 2007 (UTC))

Bold for vectors
I found this page useful, but it would help immensely if the vectors and matrices were in bold font. I sat staring for a while at a value that was treated like a vector but looked like a scalar summation. —Preceding unsigned comment added by 138.64.2.77 (talk) 20:16, 11 September 2007 (UTC)


 * A good point. This article uses the convention used in numerical mathematics: A (upper case letter) is a matrix, x (lower case letter) is a vector and α (lower case Greek letter) is a real number. I will add a note to the beginning of this article. TomyDuby (talk) 02:57, 18 June 2008 (UTC)

Philefou's Octave Implementation
In the past I was using Octave and needed the lsqr function, which is currently unavailable in Octave (http://www.gnu.org/software/octave/projects.html). I was planning to implement it with the conjugate gradient method, but I see Philefou added Octave code on January 14th, 2008. Since he/she has already written the code, I highly recommend he/she submit it to the Octave team. It should probably conform to this specification: http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/ref/lsqr.html If Philefou reads this and decides not to contribute the code, please leave an appropriate comment and I (or maybe someone else) will write an implementation of lsqr. 168.7.230.201 (talk) 18:36, 21 January 2008 (UTC)

(Philefou) Hi everyone, I don't have much time to look at the specification and submit it. I don't mind if anyone wants to use this code and submit it to Octave; if it can help someone else, I'm alright with the idea!

description of iterative method is confusing
The description of the iterative method is confusing, particularly the formula defining p_{k+1}. I changed it to what it sounds like it needs to be in order to make the previous paragraphs have a lick of sense, but I'm still not sure it's right. Can someone fix this? (I guess I will eventually if no one else does, but I don't have my books on hand...)--141.211.61.254 (talk) 23:54, 4 May 2008 (UTC)

Re-render equations
Some equation images are not rendered correctly. You can see this on horizontal bars in some equations, it is blurry whereas it should be sharp. Or was that a joke, because it's about gradients? (see here, the second equation has the top symbols rendered blurry, whereas the first eq has a distinct line)
 * $$\alpha_k := \frac{r_k^\top r_k}{p_k^\top A p_k} \, $$
 * $$\beta_k := \frac{r_{k+1}^\top r_{k+1}}{r_k^\top r_k} \, $$

Visitor.iQ (talk) 15:16, 21 May 2008 (UTC)
 * Such is the TeX->png conversion. No font hinting is used, so you get that when it tries to draw a one-pixel-thick line aligned to the half pixel. —Ben FrantzDale (talk) 23:28, 21 May 2008 (UTC)

Typo in optimal step equation
Very good article, but it seems the equation should be :
 * $$\alpha_k := \frac{r_k^\top p_k}{p_k^\top A p_k} \, $$

And actually, this is correctly implemented in the octave procedure. —Wikisgoodforyou (talk) 10:07, 12 August 2008 (UTC)

Agreed. I have that form of the algorithm from class notes from a numerical methods class. I originally assumed that this had something to do with the mentioned simplifications, but I think the form that exists on the page is actually from the steepest descent method, as CG reduces to SD if p_k = r_k. 152.3.159.173 (talk)
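In fact both expressions give the same number, because $$p_k^\top r_k = r_k^\top r_k$$ at every CG step (each new residual is orthogonal to the previous search direction). A quick numeric check in Python (the 3×3 SPD system and helper names are my own toy choices):

```python
# Small check (toy SPD system of my own) that p_k . r_k == r_k . r_k holds
# at every CG step, which is why the two alpha formulas agree.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
n = len(b)
matvec = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

x = [0.0] * n
r = list(b)              # r_0 = b - A*0
p = list(r)
for _ in range(n):
    assert abs(dot(p, r) - dot(r, r)) < 1e-10   # the identity in question
    Ap = matvec(p)
    alpha = dot(r, r) / dot(p, Ap)              # equals dot(p, r)/dot(p, Ap)
    x = [xi + alpha * pi for xi, pi in zip(x, p)]
    r_new = [ri - alpha * api for ri, api in zip(r, Ap)]
    beta = dot(r_new, r_new) / dot(r, r)
    p = [ri + beta * pi for ri, pi in zip(r_new, p)]
    r = r_new
```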

Possible Example
An anonymous user suggested an example be included on the Conjugate Gradient Method page. I'm considering adding this but will make a good attempt to keep the procedure and format as close as possible to the generalized algorithms already on the page. Please comment if you agree or disagree with this addition.

BigDix56 (talk) 03:49, 28 April 2009 (UTC)

First few lines are messy
I think that it is more correct to say: Conjugate gradient is a line search method for optimizing a function of several variables. It is often applied to solve linear systems of equations. (save the details for later) —Preceding unsigned comment added by 203.200.55.101 (talk • contribs) 05:55, 2 September 2006 (UTC)

This article is too condensed. Could you please give some more explanation? Where do the alphas come from? What does this mean: "This suggests taking the first basis vector p1 to be the gradient of f at x = x0, which equals -b"? Sorry, I don't understand what you are talking about.

Ulrich —Preceding unsigned comment added by 85.90.4.11 (talk • contribs) 07:46, 6 November 2006 (UTC)

Notation and clarification.
Could someone explain the beta? The article explains that $$\alpha_k$$ is the kth component of the solution in the p basis (i.e., it is how far to go at the kth step). But what about beta? We have
 * $$\beta_k := \frac{r_{k+1}^\top r_{k+1}}{r_k^\top r_k}=\frac{\|r_{k+1}\|^2}{\|r_k\|^2} $$

and it gets used to find the next p according to
 * $$p_{k+1} := r_{k+1} + \beta_k p_k \, $$

The preceding section says that we'll update p by
 * $$ p_{k+1} = r_{k} - \frac{p_k^\top A r_{k}}{p_k^\top A p_k} p_k $$

and
 * $$r_{k+1}=r_k -\alpha Ap_k$$

so
 * $$r_k = r_{k+1} + \alpha A p_k$$

so
 * $$ p_{k+1} = r_{k+1} + \alpha A p_k - \frac{p_k^\top A r_{k}}{p_k^\top A p_k} p_k = r_{k+1} + \alpha A p_k - \frac{\langle p_k, r_{k}\rangle_A}{\langle p_k, p_k\rangle_A} p_k $$

How does this algebra work out and what does the beta mean? Maybe it's the end of the day and my algebra is just failing me. Also, the Octave code should have similar variable names to the mathematics. —Ben FrantzDale (talk) 22:56, 15 May 2008 (UTC)
 * The explanation is presented in the context. Please derive it once again yourself using the hint provided. —Preceding unsigned comment added by 173.26.252.42 (talk) 21:46, 16 November 2009 (UTC)


 * While this is an old thread, it shows that this article has been very unclear for quite some time about how the formulae for $$\alpha_k$$ and $$\beta_k$$ as presented in the “The resulting algorithm” section are derived from those in previous sections. The root cause of the unclarity is that the article fails to point out the orthogonality of residuals, i.e., $$\langle r_i,r_j\rangle=0$$ for any $$i\neq j$$, which is an important property of the conjugate gradient method and crucial to the simplification of the formulae for $$\alpha_k$$ and $$\beta_k$$.
 * With the orthogonality of residuals, one can take advantage of the fact that $$p_k$$ is the sum of $$r_k$$ and some linear combination of $$r_0,r_1,\ldots,r_{k-1}$$ to show that


 * $$p_k^\top r_k=r_k^\top r_k$$


 * and thus


 * $$\alpha_k=\frac{p_k^\top r_k}{p_k^\top Ap_k}=\frac{r_k^\top r_k}{p_k^\top Ap_k}\text{.}$$


 * To simplify


 * $$p_{k+1}=r_{k+1}-\sum_{i\leq k}\frac{p_i^\top Ar_{k+1}}{p_i^\top Ap_i}p_i\text{,}$$


 * one can take advantage of the fact that for $$i\leq k$$,


 * $$p_i^\top Ar_{k+1}=\frac{1}{\alpha_i}(r_i-r_{i+1})^\top r_{k+1}=-\frac{1}{\alpha_i}r_{i+1}^\top r_{k+1}\neq0$$ only if $$i=k$$. [EDIT Apr 11, 2010: Corrected formula.]


 * Hence,


 * $$p_{k+1}=r_{k+1}+\frac{1}{\alpha_k}\frac{r_{k+1}^\top r_{k+1}}{p_k^\top Ap_k}p_k=r_{k+1}+\frac{p_k^\top Ap_k}{r_k^\top r_k}\cdot\frac{r_{k+1}^\top r_{k+1}}{p_k^\top Ap_k}p_k=r_{k+1}+\frac{r_{k+1}^\top r_{k+1}}{r_k^\top r_k}p_k\text{,}$$


 * which gives


 * $$\beta_k=\frac{r_{k+1}^\top r_{k+1}}{r_k^\top r_k}\text{.}$$


 * All these derivations, as well as the orthogonality property, should really have been readily available in the text to help people understand the subject more easily, rather than the overly simple and context-incoherent opening paragraph of the “The resulting algorithm” section. Not every reader can be assumed to have sufficient capability to derive these by themselves. Kxx (talk) 04:17, 5 February 2010 (UTC)
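The residual orthogonality that the derivation above hinges on is easy to verify numerically. A Python sketch (the 3×3 matrix, right-hand side, and helper names are my own toy choices) that runs CG and checks $$\langle r_i,r_j\rangle\approx0$$ for all $$i\neq j$$:

```python
# Numeric spot check (toy data of my own) of the residual orthogonality
# <r_i, r_j> = 0 for i != j used in the derivation above.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 0.0, 1.0]
n = len(b)
matvec = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

x = [0.0] * n
r = list(b)                  # r_0 = b - A*0
p = list(r)
residuals = [list(r)]
for _ in range(n):
    Ap = matvec(p)
    alpha = dot(r, r) / dot(p, Ap)
    x = [xi + alpha * pi for xi, pi in zip(x, p)]
    r_new = [ri - alpha * api for ri, api in zip(r, Ap)]
    beta = dot(r_new, r_new) / dot(r, r)
    p = [ri + beta * pi for ri, pi in zip(r_new, p)]
    r = r_new
    residuals.append(list(r))

# Distinct residuals are mutually orthogonal (up to rounding):
for i in range(len(residuals)):
    for j in range(i):
        assert abs(dot(residuals[i], residuals[j])) < 1e-10
```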

Mistake ?
In sections 'The conjugate gradient method as an iterative method' and 'The resulting algorithm', the definition of alpha changes from p.r to r.r. Is this intentional ? The p.r version is correct, right ? - max. —Preceding unsigned comment added by 78.226.234.207 (talk) 08:50, 14 October 2010 (UTC)
 * Both expressions are correct. They are equal. Kxx (talk | contribs) 19:59, 14 October 2010 (UTC)

Minor point about quadratic forms
Is x'A'x + b'x really a quadratic form? It's not homogeneous, and my understanding is that quadratic forms must be homogeneous. Note that this is just a usage quibble; I have no complaint about the math. Birge (talk) 03:26, 16 May 2009 (UTC)


 * It is not according to "Quadratic form". I have made the modifications.Kxx (talk) 05:48, 16 May 2009 (UTC)

Small error in code
"end if" causes a syntax error in Octave. Should be "endif" or just "end". Italo Tasso (talk) 20:05, 20 July 2009 (UTC)

Major conceptual flaw
The article gives the impression that you need to keep in memory all previously calculated vectors in order to find a new one that is conjugate to all of them, but this is not true. The vectors are calculated one per iteration, based just on the previous one, and this is precisely one of the most important aspects of the whole theory.

Yes, it's that awesome: it's not a simplification, it's not a crazy new direction conjugate just to the last one, it really is conjugate to all previous directions in spite of our throwing them away! And it must be made clear that the algorithm is direct _and_ iterative. Sometimes you can have good approximations before the last step and stop earlier, but this approximate algorithm is otherwise identical to the direct version. -- NIC1138 (talk) 05:04, 11 April 2010 (UTC)


 * It already mentions that, but is simply too vague in doing so. Kxx (talk | contribs) 06:09, 11 April 2010 (UTC)


 * I would love to see more discussion of this. I admit, much as I've tried to understand this algorithm, the details of why you only need O(n) memory and O(n) extra work per iteration elude me. It is pretty darn cool "magic" and could really do with more explanation. —Ben FrantzDale (talk) 13:56, 12 April 2010 (UTC)


 * Yousef Saad's book has a fairly detailed coverage of CG from the perspective of specialization of the Arnoldi iteration, including derivation of the three-term recurrences, of course. PDF of the first edition is online at Saad's homepage. You can check it out. Kxx (talk | contribs) 20:23, 12 April 2010 (UTC)

dead link correction for LSQR
old link = http://www.stanford.edu/group/SOL/software/lsqr.html new link = http://www.stanford.edu/group/SOL/software/lsqr/

Notsofastyou2 (talk) 15:10, 20 August 2014 (UTC)

The new link has now been added.

Original research in "description" section?
The "description" section has always been painfully confusing to anyone who wants to understand the theory behind the conjugate gradient method, as the two "as a direct/iterative method" subsections provide nothing close to the actual algorithm, which is only vaguely justified by a cryptic paragraph at the beginning of the "resulting algorithm" subsection.

It now seems even worse to me that a large portion of the section appears to consist of pure original research—although the equations are true, nobody derives the conjugate gradient method that way, i.e., expressing the solution as a linear combination of the search directions and then calculating coefficients. (Let's not mention the "numerical example" subsection, which is probably the most obvious OR offender.) I looked at Hestenes and Stiefel's paper and Jonathan Shewchuk's writing, which are close in the way of thinking. But they describe the method as a special case of the conjugate direction method, which readily gives the expressions for $$\alpha_i$$ and $$x_i$$, no need for linear expansion.

Probably the best way out is a rewrite. Just follow either or both of the two classical derivations (conjugate direction method and Lanczos iteration). Kxx (talk | contribs) 20:12, 13 April 2010 (UTC)

This issue appears to be fixed as of now. — Preceding unsigned comment added by 24.61.185.230 (talk) 16:42, 15 March 2021 (UTC)

Monotonic improvements?
"Fortunately, the conjugate gradient method can be used as an iterative method as it provides monotonically improving approximations x_k to the exact solution." Is this really true? I find, in practice, that the residual norm(A*x_k - b) is definitely not monotonic w.r.t. k. As an example, I invite you to try the following program with the given Octave code. You will see that the residuals, when x_0=[0;0;0], are [1.3776, 1.3986, 1.6422, 1.4e-14].

A = [0.100397715109637, 0.333310651201826, 0.197818584877521; 0.333310651201826, 1.443856474586787, 0.826552007973684; 0.197818584877521, 0.826552007973684, 1.143082020356276];

b = [0.964888535199277, 0.157613081677548, 0.970592781760616]';

Rocketman768 (talk) 19:32, 16 May 2012 (UTC)
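The observation reproduces in plain Python with the matrix and vector given above (a pure-Python CG sketch; the helper names are my own): the Euclidean residual norms need not shrink step by step, while the energy-norm errors $$\|x_*-x_k\|_A$$ do decrease at every step, which is the sense in which the approximations improve monotonically.

```python
import math

# Rerun of the example above in pure Python: residual norms are not
# monotone, but energy-norm errors (vs. the converged solution) are.
A = [[0.100397715109637, 0.333310651201826, 0.197818584877521],
     [0.333310651201826, 1.443856474586787, 0.826552007973684],
     [0.197818584877521, 0.826552007973684, 1.143082020356276]]
b = [0.964888535199277, 0.157613081677548, 0.970592781760616]
n = 3
matvec = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

x = [0.0] * n
r = list(b)                      # x_0 = 0, so r_0 = b
p = list(r)
xs = [list(x)]                   # iterates x_0, x_1, x_2, x_3
for _ in range(n):               # n steps give the exact solution
    Ap = matvec(p)
    alpha = dot(r, r) / dot(p, Ap)
    x = [xi + alpha * pi for xi, pi in zip(x, p)]
    r_new = [ri - alpha * api for ri, api in zip(r, Ap)]
    beta = dot(r_new, r_new) / dot(r, r)
    p = [ri + beta * pi for ri, pi in zip(r_new, p)]
    r = r_new
    xs.append(list(x))

res = []
for xk in xs:
    rk = [bi - yi for bi, yi in zip(b, matvec(xk))]
    res.append(math.sqrt(dot(rk, rk)))

xstar = xs[-1]                   # numerically converged solution
energy = []
for xk in xs[:-1]:
    e = [si - xi for si, xi in zip(xstar, xk)]
    energy.append(dot(e, matvec(e)))   # squared A-norm of the error

print(res)      # residual norms, non-monotone as reported above
print(energy)   # strictly decreasing
```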

A note is added that monotonicity is in the energy norm, so the issue is resolved as of now. — Preceding unsigned comment added by 24.61.185.230 (talk) 16:43, 15 March 2021 (UTC)

Complex (preconditioned) CG
The CG algorithm extends to complex matrices and vectors in both the original and the preconditioned case. The derivation for this can be found in Iterative Methods for Solving Linear Systems, A. Greenbaum. The changes to the algorithms are trivial: the transposes are replaced by the Hermitian (conjugate) transpose (which I always denote as superscript H). It's necessary in the preconditioned case to take care whether it is the update for z or r that is conjugated (in the real case, it doesn't matter which is transposed), but the algorithm shown is correct. I suggest a comment should be made somewhere that the complex case is also covered. Finding references to this is _not_ trivial (almost universally the derivations say "For real..."). [edit: claiming ownership of last comment] Hgomersall (talk) 19:19, 4 March 2014 (UTC)
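A minimal sketch of what the complex case looks like (the 2×2 Hermitian matrix and helper names are my own toy choices, not from Greenbaum): the only change is that plain dot products become $$u^H v$$, i.e., the left argument is conjugated.

```python
# Hermitian positive-definite toy system (my own data): CG carries over
# once every transpose becomes a conjugate transpose, so dot products
# become u^H v. For Hermitian A, r^H r and p^H A p are real.
A = [[2.0 + 0j, 0.0 + 1j],
     [0.0 - 1j, 2.0 + 0j]]       # Hermitian, eigenvalues 1 and 3
b = [1.0 + 0j, 0.0 + 0j]
n = 2
matvec = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
hdot = lambda u, v: sum(ui.conjugate() * vi for ui, vi in zip(u, v))  # u^H v

x = [0j] * n
r = list(b)
p = list(r)
rr = hdot(r, r).real             # r^H r is real and nonnegative
for _ in range(n):
    if rr < 1e-28:               # stop once the residual vanishes
        break
    Ap = matvec(p)
    alpha = rr / hdot(p, Ap).real    # p^H A p is real for Hermitian A
    x = [xi + alpha * pi for xi, pi in zip(x, p)]
    r = [ri - alpha * api for ri, api in zip(r, Ap)]
    rr_new = hdot(r, r).real
    p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
    rr = rr_new

print(x)   # approximately [2/3, i/3], the solution of A x = b
```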

Done by adding a new section. — Preceding unsigned comment added by 24.61.185.230 (talk) 16:45, 15 March 2021 (UTC)