Talk:Lagrange multiplier/Archive 1

Treatment of Lagrange Multipliers as a method ONLY
The treatment of Lagrange multipliers as a method only is narrow in scope. Although commonly used in some modern texts as a method, the multipliers are also variables with an important role, otherwise referred to as the shadow price or dual variable. --Kgcrowther 15:36, 16 August 2005 (UTC)

Latest change
There was an inconsistency with how I defined the Lagrangian (and how h was defined previously) in the article and the way it is defined in "without permanent scarring", which meant that I had left out a minus sign. I think the equations as written now are right, but I'm not sure which form is most standard. M0nkey 03:02, 10 February 2006 (UTC) (formerly 68.238.90.222)

The second total derivative of f(x, y)
What's that (very simple example)? - could somebody clarify this bit for me. Thanks --catslash 23:00, 18 February 2007 (UTC)


 * I'm not quite sure what the author meant, but it probably has something to do with total derivative. Anyway, I rewrote that part of the example so that it no longer refers to total derivatives. If anybody wants to give the general procedure for classifying constrained critical points, please go ahead, but for a simple example I think we should use the simple procedure. -- Jitse Niesen (talk) 13:17, 10 March 2007 (UTC)

Constrained Systems of Equations
Does anyone want to contribute a section and an example with polynomials that a high school algebra student could comprehend? Larry R. Holmgren 04:17, 24 March 2007 (UTC)


 * The section "Simple example" contains an example with polynomials. It's hard to come up with a simpler example, though the explanation in that section can probably be improved. Perhaps you can tell where you lose the plot when reading the section, and we can work on improving it?
 * PS: Please do not use ~ (four tildes) when you fill in the edit summary. It does no harm, but it looks a bit silly (look up your edit in the history to see what I mean). -- Jitse Niesen (talk) 12:46, 25 March 2007 (UTC)

How does one interpret the Lagrange multipliers produced on the Sensitivity Report of Solver® as an add-in to Excel®? Specifically, how does one interpret 0% compared to 0.00?

g(...) is the gradient?
g(x) is often used as notation for the gradient of the function f(x). Initially, I wasn't sure if that was what was meant in this article. Could it maybe be defined a bit more clearly somewhere? Maybe given explicitly:

"Suppose we have a function, f(x,y), to maximize subject to (add:) the constraint (/add)


 * $$g\left( x,y \right) = c$$,

where c is a constant. "

I'm very new to contributing to Wikipedia, so I would rather not make the change myself - especially because I'm not sure it's correct. :)

--Pellarin 10:11, 18 September 2007 (UTC)


 * Hi, I don't think the notation $$g(f)$$ for $$\operatorname{grad}(f)$$ or $$\nabla f$$ is very common; it's not mentioned in the gradient article, and I've not come across it myself (perhaps it is used in some particular field of study?). Conversely, I do think that the use of $$g$$ for an arbitrary function (having already used $$f$$) is pretty common and widely understood. Yes, adding the constraint would make it read better; I will do it, but be bold next time! --catslash 10:40, 18 September 2007 (UTC)

Unique solutions?
"a number of unique equations totaling the length of x plus the length of λ. Thus, it is possible to obtain unique values for every x and λk, without inverting the gk" - unless I've misunderstood something, this is inaccurate. This claim cites Mathworld's article on LMs, but that article makes no mention of uniqueness. Having n equations and n unknowns does not guarantee uniqueness.

For example: f(x1,x2) = x1^2 + x2^2, constraint g(x1,x2) = x1^2 + 2x2^2 -1 = 0. It's trivial to see that (1,0) and (-1,0) both maximise f on g, and (0,1/sqrt(2)) and (0,-1/sqrt(2)) both minimise it, so solutions most certainly aren't unique. --144.53.251.2 06:57, 15 October 2007 (UTC)
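The non-uniqueness claimed above is easy to confirm by direct substitution; here is a minimal sketch (the helper names and the tolerance are my own choices) checking that all four points satisfy $$\nabla f = \lambda \nabla g$$ together with the constraint:

```python
import math

def grad_f(x1, x2):          # f = x1^2 + x2^2
    return (2 * x1, 2 * x2)

def grad_g(x1, x2):          # g = x1^2 + 2*x2^2 - 1
    return (2 * x1, 4 * x2)

def is_stationary(x1, x2, lam, tol=1e-12):
    """True if (x1, x2) is feasible and grad f = lam * grad g there."""
    feasible = abs(x1**2 + 2 * x2**2 - 1) < tol
    gf, gg = grad_f(x1, x2), grad_g(x1, x2)
    return feasible and all(abs(a - lam * b) < tol for a, b in zip(gf, gg))

r = 1 / math.sqrt(2)
# two maximisers (lam = 1) and two minimisers (lam = 1/2): four solutions, not one
points = [(1, 0, 1.0), (-1, 0, 1.0), (0, r, 0.5), (0, -r, 0.5)]
print([is_stationary(*p) for p in points])  # [True, True, True, True]
```

All four distinct points pass the stationarity test, so n equations in n unknowns clearly do not force a unique solution here.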

$$\mbox{grad } g_i$$
The phrase below seems to apply to the case with multiple constraints when the current formulation is for a single constraint:

"$$\mbox{grad } f$$ is a linear combination of $$\mbox{grad } g_i$$."

so I am replacing it with

"$$\mbox{grad } f = \lambda \mbox{grad } g$$ for some real $$\lambda$$."

Also using $$\nabla$$ for consistency with the remaining of the article. —Preceding unsigned comment added by 128.32.39.137 (talk) 06:14, 16 October 2007 (UTC)

Rodrigo de Salvo Braz 06:21, 16 October 2007 (UTC)

Additional Sections for Lagrange Multipliers
Economic Interpretation
 * An important addition to the discussion on Lagrange multipliers might be the economic interpretation of such variables, wherein they act as a measure of the marginal utility of a specific constraint to the objective function.

Primal-Dual Relationship
 * They are also fundamental in understanding the relationship between a primal and a dual optimization problem. This is related to the economic interpretation, since in the dual problem the Lagrange multipliers become the decision variables and the primal decision variables take the role of the Lagrange multipliers.

Complementary Slackness
 * When constraints are inequalities, the multipliers indicate whether a constraint is binding. If a constraint is binding, then its multiplier is strictly greater than zero.

Trade-off between multiple objectives
 * In multi-objective optimization, one can formulate the problem using the epsilon-constraint method; the Lagrange multiplier can then be interpreted as the trade-off between multiple objectives, which becomes an important value in decision analysis for understanding the topology of the Pareto optimal frontier.

How to handle inequality
 * A discussion of how to introduce Lagrange multipliers in the case of inequality constraints would be nice, since the only change required is to impose a sign requirement on the Lagrange multiplier. A figure explaining why this is so would be even nicer.

Please add to this list, or post information in the article and delete from this list, otherwise I will try to get my facts together and some references and post, using the established notation.--Kgcrowther 01:00, 16 August 2005 (UTC)


 * I had roughly the same things to say... this article seriously needs expansion. --02:28, 21 November 2007 (UTC)

(n+k) Variables?
The opening paragraph, shouldn't it be (n-k) variables? —Preceding unsigned comment added by 134.10.122.120 (talk) 05:22, 4 December 2007 (UTC)
 * No. Each constraint adds an unknown multiplier. In the first example, f(x,y) with one constraint results in solving for Λ(x,y,λ). Two variables and one constraint give you three unconstrained variables. Robben (talk) 08:49, 11 January 2008 (UTC)

Describing the Gradient Vectors as Parallel Vectors
In the "Introduction" the gradient vectors are said to be parallel at an extremum. I think that this is not entirely correct. Please correct me if I am wrong, but I think that in the example given the gradient vectors of the surfaces in question are only parallel when projected onto the x,y plane. Perhaps this is implied in the description, but regardless, this is a point I am confused on. —Preceding unsigned comment added by Jfischoff (talk • contribs) 20:50, 22 February 2008 (UTC)

They are parallel (in the n-dimensional space), and not just in the projection. I think...
 * Veritas (talk) 20:35, 15 September 2008 (UTC)

The "general formulation" section
Hello, author of two recently tagged "vandalism" edits here. I was reading this article to learn about Lagrange multipliers, and understood it until this section. It took me nearly ten minutes to figure out what the first sentence was trying to say, so I thought I'd help out those who came after me by writing down what I figured out.

If you have specific technical reasons for reverting my changes, definitely let me know. It certainly wasn't vandalism. I'm pretty sure I didn't convert the page into a "wrong" state, nor do I believe I removed any helpful, correct information... so I'm not sure what reasons are left.

Thanks. --dmwit —Preceding unsigned comment added by 158.130.12.222 (talk) 00:37, 12 November 2008 (UTC)

I see the turnaround on this page is significantly longer than on the actual Lagrange Multipliers page, so in case I don't check back, here's the original and the changes I suggested.

Original:

Denote the objective function by $$f(\mathbf x)$$ and let the constraints be given by $$h_k(\mathbf x)=c_k$$, perhaps by moving constants to the left, as in $$h_k(\mathbf x)-c_k=g_k(\mathbf x)$$.

To understand how confusing this is, you first have to realize that $$h_k$$ is not mentioned anywhere else in the article, and that the ensuing equation for $$g_k$$ is in fact defining $$g_k$$. Neither of those things is clear from the text, so my first change was to:

Denote the objective function by $$f(\mathbf x)$$ and let the constraints be given by $$g_k(\mathbf x)=0$$. For nonzero constraints of the form $$h_k(\mathbf x)=c_k$$, you can of course move the constants to the left by defining $$g_k(\mathbf x)=h_k(\mathbf x)-c_k$$.

These two sentences preserve the meaning and content of the previous sentence. But when you write it this way, the second sentence should be totally obvious to any mathematician sophisticated enough to have read this far in the article. So my second edit is just to remove that sentence in favor of brevity.

I think each edit is totally justifiable. Please let me know what part of those changes is objectionable.

Thanks again, --dmwit —Preceding unsigned comment added by 158.130.12.222 (talk) 00:57, 12 November 2008 (UTC)


 * I agree, so I reinstated your version. Sorry about the rash treatment you received here. -- Jitse Niesen (talk) 13:52, 12 November 2008 (UTC)


 * Just for your information: The user who labelled your edits as vandalism has apparently done this with more edits, as his/her talk page records multiple complaints about this issue. Needless to say, such behaviour is totally inappropriate. -- Jitse Niesen (talk) 14:01, 12 November 2008 (UTC)

"Maximization" of Lagrangian
There seems to be a fundamental misconception in this article, summarized in the introduction:

Whenever we find a maximum (x,y,λ) for this new unconstrained problem, (x,y) furnishes a maximum for our original constrained problem.

As mentioned later in the article, the Lagrangian is actually unbounded from above and below (provided there is an infeasible point), so this statement cannot be true. I admit I'm also a bit confused at this point, but I believe the general technique is to find saddle points of the Lagrangian, since extrema carry no useful information. This corresponds to maximizing the Lagrangian over (x,y) given λ, yielding a function of λ, then minimizing this function over λ. This view is then a special case of the more general notion of Lagrangian duality.

This confusion also seems to be present in the "justification" and "caveat" subsections. The justification section was recently overhauled, but the logic seems to be incorrect--it assumes that one is to maximize the Lagrangian simultaneously over (x,y,λ). The caveat section also seems self-contradictory because again, the goal is not to maximize the Lagrangian over both the primal and dual variables.

Input would be appreciated.--Paul Vernaza (talk) 01:08, 5 January 2009 (UTC)
 * You are absolutely correct, you are pointing out an error that absolutely must be corrected. As you point out, this error is repeated below in Section 1.1, ( Justification ), since if $$g(x,y)-c\ne 0$$ then $$f(x,y) + \lambda(g(x,y)-c)$$ can be arbitrarily large, because $$\lambda$$ can be any real. I don't believe anything in the argument given in Section 1.1 can be rescued; the whole section seems to be wrong, and the method of Lagrange multipliers seems to make sense only in the context of local differential analysis, unless one also limits the range of $$\lambda$$ (in fact, this is done in Section 6, The strong Lagrangian principle: Lagrange duality, where the assumption $$\lambda\ge 0$$ is included). I believe the article needs urgent attention. I am not ready to undertake the editing of these parts of the article just yet, but I believe urgent input is needed as to what to do. Mateat (talk) 17:34, 7 January 2009 (UTC)

 Calqtopia:  I've been having a related discussion of some of these issues with Simplifix (talk) on our two talk pages. I was unaware of discussion pages for articles until he/she pointed it out to me, or else I would have posted here to begin with. As I understand things, even locally the Lagrangian cannot take maxima or minima (except where the conditions of the Lagrange Multiplier Theorem are violated) -- all stationary points can be shown to be saddle points of the Lagrangian. Calqtopia (talk) 04:53, 9 January 2009 (UTC)


 * You should all be less afraid of editing the articles. We have a guideline here that you should be bold when updating pages. That said, I'm grateful that you bring these problems to attention. Anyway, I did get rid of some problematic parts, in particular the sentence quoted by Paul Vernaza, the whole Justification subsection and most of the Caveat subsection. I also rewrote some of the Introduction section, but more work is surely needed. Are the conditions mentioned in the section on the weak Lagrange principle (objective and constraints are C^1 and constraints have nonzero gradient) enough? -- Jitse Niesen (talk) 17:14, 9 January 2009 (UTC)
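To make the unboundedness discussed in this thread concrete: at any fixed infeasible point, the Lagrangian is linear in λ, so it has neither a maximum nor a minimum there. A minimal numerical sketch (the particular f, g, and point are arbitrary choices of mine for illustration):

```python
# Lagrangian L(x, y, lam) = f(x, y) + lam * (g(x, y) - c)
f = lambda x, y: x + y
g = lambda x, y: x**2 + y**2       # constraint g(x, y) = c with c = 1
x0, y0 = 2.0, 0.0                  # infeasible point: g - c = 3 != 0
for lam in (0.0, 1e3, 1e6, -1e6):
    # L is linear in lam, hence unbounded above and below
    print(lam, f(x0, y0) + lam * (g(x0, y0) - 1.0))
```

This is why stationary points of the Lagrangian are saddle points rather than extrema, as noted above.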

This Article Is Almost Completely Wrong!?
Shouldn't the Lagrange multiplier be $$F(x,y,\lambda )=f(x,y)-\lambda (g(x,y)-c)$$. I know that λ is a variable, but I get a wrong answer when I try to use $$F(x,y,\lambda )=f(x,y)+\lambda (g(x,y)-c)$$. This same error occurs throughout the article. You should correct it! Please let me know if I am sorely mistaken.
 * Veritas (talk) 20:25, 15 September 2008 (UTC)
 * It doesn't matter whether you call the multiplier $$+\lambda$$ or $$-\lambda$$, you just get a different value when you solve for $$\lambda$$ (minus the value you would have got from introducing $$\lambda$$ with the opposite sign - and so it comes out the same) --catslash (talk) 22:02, 15 September 2008 (UTC)
 * This matter has come up before; I recall edits where the sign of λ has been reversed in a few places. Perhaps the article should mention that there are differing conventions. --catslash (talk) 22:11, 15 September 2008 (UTC)
 * I agree that this is possibly very wrong and that it should be $$F(x,y,\lambda )=f(x,y)-\lambda (g(x,y)-c)$$. -cmansley —Preceding undated comment was added at 05:06, 27 September 2008 (UTC).
 * That's equivalent to saying that the constraint should be written $$-g(x,y) = -c$$, not $$g(x,y) = c$$, whereas I would contend that these are the same thing. If you reckon there's a difference, then please explain why. I would be in favour of negating $$\lambda$$ throughout the article, if I didn't believe that next week somebody would say this was wrong, and it should be positive. --catslash (talk) 13:33, 28 September 2008 (UTC)
 * Agreed, this is purely a matter of convention. --Paul Vernaza (talk) 00:25, 5 January 2009 (UTC)
 * Somewhat along these lines, why is $$c$$ being included in $$F(x,y,\lambda )=f(x,y)-\lambda (g(x,y)-c)$$ at all? Since we are only concerned with $$ \nabla F $$, the constant is irrelevant. —Preceding unsigned comment added by 24.227.222.38 (talk) 03:56, 2 March 2009 (UTC)
 * The $$c$$ is inside the brackets (it's multiplied by $$\lambda$$), so in $$F$$ it's part of the coefficient of $$\lambda$$ rather than being a constant term.--catslash (talk) 11:19, 2 March 2009 (UTC)


 * And written this way, it also implies that critical points of the Lagrangian do satisfy the constraint $$g(x) = c$$ (expressed by the partial derivative in $$\lambda$$). Looking for critical points of the Lagrangian is usually considered a nice short way of summarizing things, which you would ruin by removing c. --Bdmy (talk) 13:15, 2 March 2009 (UTC)
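The equivalence argued in this thread can be written in one line: the two sign conventions produce identical stationarity systems under the substitution $$\mu = -\lambda$$, so they pick out exactly the same points $$(x,y)$$.

```latex
\nabla f(x,y) + \lambda\,\nabla g(x,y) = 0,\quad g(x,y) = c
\qquad\Longleftrightarrow\qquad
\nabla f(x,y) - \mu\,\nabla g(x,y) = 0,\quad g(x,y) = c,\quad \mu = -\lambda.
```

Only the reported value of the multiplier changes sign, which is why worked examples can look different under the two conventions while agreeing on the extrema.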

A wrong sign before lambda?
[see also preceding section] after the sentence "We introduce a new variable (λ) called a Lagrange multiplier, and study the Lagrange function defined by" at the beginning, should the sign be a "+" before lambda?


 * $$ \Lambda(x,y,\lambda) = f(x,y) + \lambda \Big(g(x,y)-c\Big).$$

I think it could be a typo, because the wiki versions in other languages have "+"; only here is there such an exception.

——Nutcracker胡桃夹子^.^tell me... 23:54, 16 April 2009 (UTC)


 * It works whichever sign you use with λ. See the discussion "This Article Is Almost Completely Wrong!?" above, and this edit I've just made. --catslash (talk) 16:23, 19 April 2009 (UTC)

I know that it seems to be just a sign convention for lambda, but in fact if we use the minus sign in some macroeconometric problems, we get wrong results. I cannot explain the reason very well, but I have tried both forms and compared them. If the same wiki articles in other languages use the plus sign, it seems better to use the same sign as the others.

——Nutcracker胡桃夹子^.^tell me... 17:58, 20 April 2009 (UTC)

Entropy Example
What about the constraint that the probabilities be non-negative? The article should mention something about this. —Preceding unsigned comment added by 128.12.188.47 (talk) 04:01, 1 May 2009 (UTC)

Linear systems and positive-definiteness
I'm reading The Finite Element Method by Hughes, which discusses Lagrange multipliers and points out that for positive-definite systems, the Lagrange-multiplier constrained system is not positive definite. For example, suppose
 * $$K=\begin{bmatrix}2 & -1 \\ -1 & 1\end{bmatrix}$$

that is, K describes two particles in 1D with the 1st DoF tied to the ground with a spring and the two tied together with a spring of the same stiffness. The force vector is then
 * $$F=-K x$$

and the spring energy is then
 * $$E=\frac{1}{2} x^\top K x$$.

If we want to find the pose for a given applied force, we just solve
 * $$x=-K^{-1}F$$.

Now, suppose we want to constrain it to $$x_2 = 4$$. We could insert the constraint, remove the corresponding row and column of the system, and solve, but suppose we want to use Lagrange multipliers. We could construct
 * $$K_L=\begin{bmatrix}2 & -1 & 0 \\ -1 & 1 & 1\\ 0 & 1 & 0\end{bmatrix}$$

and then augment F and x accordingly:
 * $$F_L = \begin{bmatrix}F\\ 4\end{bmatrix}\qquad \qquad x_L = \begin{bmatrix}x\\ \lambda\end{bmatrix}$$

then the Lagrange-multiplier version of the system is
 * $$\begin{bmatrix}2 & -1 & 0 \\ -1 & 1 & 1\\ 0 & 1 & 0\end{bmatrix} \begin{bmatrix}x_1\\x_2\\ \lambda\end{bmatrix} = \begin{bmatrix}F_1\\F_2\\ 4\end{bmatrix}$$

Unlike K, $$K_L$$ is not positive-definite (set x=0 and λ=1 and you get $$x_L^\top K_L x_L = 0$$).
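The loss of positive-definiteness is easy to confirm numerically; a sketch using NumPy's symmetric eigensolver on the matrices given above:

```python
import numpy as np

K = np.array([[2.0, -1.0],
              [-1.0, 1.0]])
K_L = np.array([[2.0, -1.0, 0.0],
                [-1.0, 1.0, 1.0],
                [0.0, 1.0, 0.0]])

print(np.linalg.eigvalsh(K))    # both eigenvalues positive: K is positive definite
print(np.linalg.eigvalsh(K_L))  # smallest eigenvalue negative: K_L is indefinite
```

The augmented matrix has the classic symmetric saddle-point structure, which is exactly why Cholesky-type solvers no longer apply and symmetric-indefinite factorizations (or iterative saddle-point solvers) are used instead.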

So...
 * 1) Should the page discuss application of Lagrange multipliers to linear systems?
 * 2) It sounds like this loss of positive-definiteness is important as it changes which solvers you can use. Does that deserve mention on this page? —Ben FrantzDale (talk) 00:17, 27 July 2009 (UTC)

Circular changes over sign of lambda
Following repeated and inconsistent changes to the sign of lambda in the article (see the two preceding sections), I inserted an explanation that the sign of lambda was arbitrary and that the article did not consistently adhere to any convention on this matter. This was subsequently trimmed to a note that the lambda term could either be added or subtracted - however I believe that the article was changed so that it was consistently added. The most recent edits have removed the note and changed the sign in a couple of places. I intend to revert these edits, as it seems (at least to me) clearer as it was. Please discuss if you disagree --catslash (talk) 16:07, 10 July 2010 (UTC)

Kuhn Tucker vector(s) and inequality constraints
Some hero of Wikipedia should add a section on inequality constraints and Lagrangian duality, perhaps using the references I added by Lemaréchal and Bertsekas.

Also, Rockafellar's Convex Analysis suggests disambiguating Lagrangian multipliers, which are used in iterative methods, from Kuhn Tucker vectors, which are Lagrangian multipliers at saddle points of the Lagrangian function.

Thanks, Kiefer.Wolfowitz (talk) 00:24, 17 December 2010 (UTC)

Sign of lambda revisited
The sign of λ in the formula given in the lede is arguably wrong for at least two reasons. First, and most simply, it does not agree with the convention in the cited source. Secondly, the interpretation of λ as it is usually construed is that it represents the rate of change of the critical values with respect to changes in the constraint. With the opposite sign, this is no longer true: rather it is the negative of this rate of change. It has falsely been claimed by User:Catslash that the sign of λ is purely a matter of convention, but in this geometrical context the sign is more than conventional. At any rate, the article as it presently stands is inconsistent, using different sign conventions in different places, and with the resulting geometrical interpretation of λ rendered unintelligible by the lack of consistency. Unless there are strenuous objections backed by good reasons and sources, I propose to make the article consistent with what appear to be the usual conventions once again. Sławomir Biały (talk) 16:44, 10 July 2010 (UTC)
 * I have no objection to the λ term being either consistently subtracted or consistently added throughout the article. My only objection is to the claim that one way is right and one is wrong, as this seems to lead to confusion and pointless arguments. --catslash (talk) 00:32, 11 July 2010 (UTC).
 * Yes, it does seem to be inconsistent at the moment. Don't forget the examples at the end and the accompanying diagrams if you're intending to restore consistency. --catslash (talk) 00:48, 11 July 2010 (UTC)
 * You are right, we should follow the sources. However a quick trawl through Google Books does not reveal an overwhelming majority in favour of either sign. Therefore I suggest we follow the originator of the article - as we would do for the choice between British and American spelling or between LaTex and HTML for in-line formulae. --catslash (talk) 12:49, 11 July 2010 (UTC)
 * This mathematically simple problem has brought about a lot of controversy. Creating confusion is obviously not what Wikipedia is meant to do, and yet a mathematics student ought to be able to see that an arbitrarily chosen variable has an arbitrary sign. So why not make the sign consistent in this article, use the sign used in the cited literature (or the sign most often used in the general literature), and then explain to anyone who has any doubts that the sign can be chosen either way, warning people that either sign may be found in the literature? Doesn't that solve problems of confusion about this page, the method described and the nature of an arbitrarily chosen variable, while educating readers about similar problems they will encounter in other mathematical literature of the same level? —Preceding unsigned comment added by 24.177.4.78 (talk) 22:06, 28 February 2011 (UTC)


 * My own trawl actually reveals the same, to my surprise. There seems to be a difference between economics conventions, which tend to favor −λ, and mathematics conventions favoring λ. (Not sure about, e.g., physics.) Perhaps the article should make a more careful and systematic effort to explain the two conventions, the reasons for them, etc.? This would seem also to be the best way to address the concerns of the earlier thread that the wrong conventions sometimes lead to the wrong answers. Best, Sławomir Biały (talk) 13:11, 11 July 2010 (UTC)


 * In economics and optimization, the convention is that the Lagrange-multiplier vector (or Kuhn-Tucker vector) for inequality constraints of the form g(x)≤0 is non-negative, because prices are non-negative. Kiefer.Wolfowitz (Discussion) 23:14, 28 February 2011 (UTC)

The sign in front of Lagrange multiplier in this article is bothering me for the following reasons:
 * An encyclopedia should present definitions in their most commonly stated forms. This is part of a widespread principle that says to use the most commonly used English names to describe objects. A reader of this article would like to see the Lagrange function as he can see it in the majority of reliable sources. It appears to me that in the majority of sources (including the Encyclopedia of Mathematics cited here right after the definition of Lagrange function but with the sign opposite to the one in the Encyclopedia) the Lagrange function is
 * $$ \Lambda(x,y,\lambda) = f(x,y) + \lambda \cdot \Big(c-g(x,y)\Big)$$


 * The Lagrange multiplier in the Lagrange function defined above has a clear interpretation. It represents the rate of change of the objective function with respect to changes in the independent term of the constraint. In the current form, the article says “λk is the rate of change of the quantity being optimized as a function of the constraint variable.” What “constraint variable” are we talking about here? In the current form, the interpretation of the multiplier is pretty much lost.


 * In the current form, there are still some inconsistencies in the article, for example, lambda in the following equation corresponds to the lambda in the definition above, but not to the current definition in the article
 * $$\nabla f=\lambda \, \nabla g$$


 * Changing the sign consistently across the article will require changing the last four plots. Two of them are available with Matlab code and can be easily reconstructed. The other two bitmap plots are in public domain and can be changed in a graphic editor. In the end, going back to the conventional sign is not too resource consuming.
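As a sketch of the interpretation claimed in the second bullet above (the standard envelope-theorem computation, assuming a smooth family of optimizers $$x^*(c)$$ satisfying the stationarity condition $$\nabla f = \lambda\,\nabla g$$ and the constraint $$g(x^*(c)) = c$$):

```latex
\frac{d}{dc}\, f\big(x^*(c)\big)
  = \nabla f \cdot \frac{dx^*}{dc}
  = \lambda\,\nabla g \cdot \frac{dx^*}{dc}
  = \lambda\,\frac{d}{dc}\, g\big(x^*(c)\big)
  = \lambda,
```

so with the convention $$\Lambda = f + \lambda\,(c - g)$$ the multiplier is exactly the rate of change of the optimal value with respect to the constraint constant c.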

Lagrange's Undetermined Multipliers
In my UK experience, as far as I recall, these have always been termed "Lagrange's Undetermined Multipliers", or just "Undetermined Multipliers". To aid searching, I suggest that the term "Lagrange's Undetermined Multipliers" be included at least once in the Article. 94.30.84.71 (talk) 19:10, 29 November 2011 (UTC)

Link dead: Lagrange_multipliers#The_strong_Lagrangian_principle:_Lagrange_duality
From the article shadow price a link is set to Lagrange_multipliers. Unfortunately, this article/section no longer exists, and I couldn't find any subtitle that would fit the term. Is there anyone around who knows enough on the subject to correct the link to the relevant section and/or add that section to the article?

Thanks a lot! --Alpenfreund (talk) 14:40, 4 January 2012 (UTC)
 * It seems you have solved this yourself. Isheden (talk) 15:04, 4 January 2012 (UTC)

incorrect paraphrasing of "tangential intersection"
A sentence in the first section reads as follows: "Only when the contour line for g=c meets contour lines of f tangentially, do we not increase or decrease the value of f — that is, when the contour lines touch but do not cross." Everything before the dash is correct. But it is wrong to paraphrase "tangential intersection" as "touch but do not cross." The levels y-x^3=0 and y+x^3=0 intersect tangentially at the origin (because their gradients are parallel) but the curves DO cross. It is misleading to paraphrase "tangential intersection" as "not crossing" -- because it makes it seem as though the condition of parallel gradients should be SUFFICIENT for an extremum when really it's only necessary, and obscures the fact that further analysis has to be done. The possibility of tangential-but-crossing levels amounts to the possibility of a vanishing derivative without a local extremum. Mlord (talk) 06:21, 12 December 2012 (UTC)

Inaccurate intuition: $$\nabla f=\lambda\nabla g$$ does not mean that contours must be tangent
This article provides the common intuition that, at a point of local maximum or minimum, a level curve of f should be tangent to a level curve of g---because f is increasing or decreasing as one crosses a level curve of f.

However, this is not always true. Suppose one adapts example 1 of the article so that we minimize $$f(x,y)=(x+y)^2$$ along the circle $$g(x,y)=x^2+y^2=1$$. Now the level sets of f are still lines of slope -1, and the points on the circle tangent to these level sets are the same solutions, $$(\sqrt{2}/2,\sqrt{2}/2)$$ and $$(-\sqrt{2}/2,-\sqrt{2}/2)$$.

And yet, the actual minima are at $$(\sqrt{2}/2,-\sqrt{2}/2)$$ and $$(-\sqrt{2}/2,\sqrt{2}/2)$$, where the level curves of f are exactly perpendicular to the circle, not tangent. The condition that $$\nabla f=\lambda\nabla g$$ correctly identifies all four points as extrema; the intuition that the circle should be tangent to a level curve misses two of the solutions, the two minima.

The problem is that this function f is minimized at all points along a level curve, specifically the level curve $$y=-x$$, so it is false to say that f is increasing or decreasing as one crosses a level curve of f. The level curve may actually be the maximum or minimum. This corresponds to points where $$\nabla f=(0,0)$$, which is a solution to $$\nabla f=\lambda\nabla g$$ with $$\lambda=0$$ --- a solution where $$\nabla f$$ and $$\nabla g$$ do not have to be pointing along the same line.

Perhaps an example 1.5 might help to clarify this potential pitfall. — Preceding unsigned comment added by 67.242.118.202 (talk) 05:35, 3 October 2013 (UTC)
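The four points described in this comment can be checked directly; a minimal sketch (the helper name and tolerance are my own choices) confirming that the λ = 0 minima satisfy the multiplier condition even though no tangency occurs there:

```python
import math

r = 1 / math.sqrt(2)

def stationary(x, y, lam, tol=1e-12):
    gf = (2 * (x + y), 2 * (x + y))   # grad of f = (x + y)^2
    gg = (2 * x, 2 * y)               # grad of g = x^2 + y^2
    on_circle = abs(x * x + y * y - 1) < tol
    return on_circle and all(abs(a - lam * b) < tol for a, b in zip(gf, gg))

# maxima: tangent level line, lam = 2; minima: grad f = (0, 0), lam = 0
cases = [(r, r, 2.0), (-r, -r, 2.0), (r, -r, 0.0), (-r, r, 0.0)]
print([stationary(*p) for p in cases])  # [True, True, True, True]
```

All four pass, which illustrates the point above: $$\nabla f = \lambda\nabla g$$ with $$\lambda = 0$$ captures extrema that the tangency picture misses.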

This sentence does not make much sense...
"Only when the contour we are crossing actually touches tangentially the contour g = c we are following will this not be possible"

Perhaps it should be changed to:

"Only when the contour we are crossing touches tangentially the value of f does not change."

??
 * I've changed it. How do you like it now? -lethe talk [ +] 18:53, 9 April 2006 (UTC)

I think this could still be cleaned up some; I spent some time looking at this and still don't get this tangent-touching thing, so lengthening this couldn't hurt. —Preceding unsigned comment added by 96.237.10.225 (talk) 03:25, 11 February 2009 (UTC)

Actually I think there is a major flaw in this part of the explanation. It refers to the graphic and talks about visualizing the contour lines of both f and g, but the graphic only shows the contour of f, in blue. The red line for g(x,y)=c is not the contour line of g. If it were, all its points would have the same "height" (z coordinate) equal to c, but they vary in height. The red line is in fact the set of points { (x,y,f(x,y)) | g(x,y)=c }. The contour line of g(x,y) = c is, instead, the set { (x,y,c) | g(x,y)=c }. --Rodrigo de Salvo Braz (talk) 17:24, 16 January 2014 (UTC)

Confusing statements
The phrase "the Lagrangian Λ" is suddenly thrown at us without any explanation of how or why it's a Lagrangian (I've been studying Lagrangians intensely and I can't see any connection at all).
 * I agree. Chris2crawford (talk) 11:59, 31 May 2014 (UTC)

Also, the article suddenly says "the concept of 'crossing' discussed above", when no such concept has been discussed. The word "crossing" does not even appear in the article prior to that point. 24.55.17.191 (talk) 14:34, 16 March 2014 (UTC)

Bad wording
In the phrase "if the direction that changes f": there are many directions that change f. You can only say that f is minimized along a constraint if every direction in which f changes violates the constraint. I see no way to improve the logic of this section, besides stating that a) the contours of f and g must be tangent and therefore the gradients parallel, or b) the contour of either function is a point, in which case one of the gradients is zero. Chris2crawford (talk) 11:59, 31 May 2014 (UTC)

Simplicity
This article barely introduces a non-mathematician to Lagrange multipliers. The first few sentences could do with a little plain English.


 * The obsession with tailoring all Wikipedia articles to non-experts is horrible. It makes articles like this completely useless to experts. Wikipedia is not a textbook!


 * That's why we organize each article into sections. Experts can simply skip the introductory sections if they do not want to hear intuitive-level explanations.--Headlessplatter (talk) 20:00, 24 June 2014 (UTC)

The article also doesn't mention the 'real world' applications of Lagrange multipliers. Kendirangu 07:32, 23 January 2007 (UTC)


 * I have to disagree; the lead has only words and a picture with no algebra. The remainder of the article only uses school-level algebra, and so should be comprehensible to many non-mathematicians. Applications in mechanics and economics are mentioned (in addition to the worked examples). --catslash 23:00, 18 February 2007 (UTC)


 * I agree with the simplicity comment, and added a gentler statement of the idea behind Lagrange multipliers. Having "only words" does not make the intro simple... the additional vocab such as "stationary points", "constrained functions", "Fermat's theorem", etc. make it a difficult read for the non-expert. Also, articles should provide intuition as well as mechanics. Triathematician (talk) 16:25, 1 January 2008 (UTC)


 * OK, yes a more intuitive/graphic description is desirable, and that was a valiant attempt, (and here comes the but) but the use of set of points is pretty technical, and I found it hard to relate the second sentence to the concept in question. Surely the point is that we can't just go anywhere we want on the mountain, but have to find the highest point we can reach whilst staying on the footpaths? Sorry not to be offering constructive suggestions! --catslash (talk) 21:16, 1 January 2008 (UTC)


 * My assumption was that "peak" implies the very highest point on the mountain. It's an inexact analogy, but I wanted to get the point of tangent surfaces across somehow without bringing in objective, constraint functions, etc. Triathematician (talk) 13:37, 4 January 2008 (UTC)

I completely agree. The first few sentences are very poorly written, not concise, and fail to quickly convey what a Lagrange multiplier is. After a few minutes I just gave up reading this article and checked one of my textbooks. Needless to say, my textbook explained it in about 5 minutes of reading. Even now I can still barely understand the first few sentences. Why use words like 'extrema' and 'stationary points'? —Preceding unsigned comment added by 128.100.241.23 (talk) 06:29, 21 October 2008 (UTC)


 * I reverted Jitse Niesen's changes to the first paragraph. I rewrote the paragraph a few months ago to make it as simple as possible. The purpose of the first paragraph should be to quickly convey the basics to an average reader. The practical application of a Lagrange multiplier is very simple and trivial to understand, so we don't need to introduce unusual terminology (stationary points) or get into corner cases. I deleted your new paragraph, but feel free to move it to another section on the page. Andrewcanis (talk) 10:22, 6 February 2009 (UTC)


 * I have restored Jitse Niesen's changes. I think it is important to keep the comment that this only provides a necessary condition. Perhaps we could take out the phrase "stationary point", but as it is standard terminology in discussing Lagrange multipliers it seems a bit strange to me to do so. Thenub314 (talk) 12:52, 6 February 2009 (UTC)


 * Andrew, the reason I felt it necessary to edit the paragraph you wrote is the following sentence: "We introduce a new variable ($$\lambda$$) called a Lagrange multiplier to rewrite the problem as: maximize $$f\left( x, y \right) + \lambda\left( g\left( x, y \right) - c \right). $$" As noted in the section, this maximization problem has no solution, so you cannot say that you rewrite the problem in this way. -- Jitse Niesen (talk) 16:58, 6 February 2009 (UTC)

Removal of civil engineering example
I have removed the example in the "Economics" section concerning the application of the Lagrange multipliers method to the construction of the Burj Khalifa, because I found no explicit mention of this method in the cited article. (Plus, civil engineering has nothing to do with economics, and even the title of the afore-mentioned article was wrong. The person who added that sentence was really careless.) — Preceding unsigned comment added by 109.102.209.229 (talk) 11:38, 24 January 2015 (UTC)

Rendering as PDF
I noticed that adding this page to Book Creator, or using the Download as PDF button, breaks both of these features. As I am able to export many other maths-related pages, there is something on this page which is causing the problem. However, I haven't got any further than narrowing it down to this page, I'm afraid!

Chris Alexander UK (talk) 16:28, 10 March 2017 (UTC)

Assessment comment
Substituted at 21:48, 26 June 2016 (UTC)


 * Regarding Figure 1 being wrong, someone also mentioned that above in November 2013. I have now removed the incorrect figure. Loraof (talk) 02:25, 9 April 2017 (UTC)

Legend of Figure 1.
The red line in Figure 1 is NOT the constraint, contrary to what the legend says. The constraint is a curve in the x-y plane and f(x,y) is a surface in 3 dimensions. Rather, the red line is the projection of the constraint in the direction of the +z-axis onto the f(x,y) surface. The constraint does NOT touch or cross the contour of f(x,y) as shown in Figure 1. Perhaps all discussions in the main body of the article about the constraint touching or crossing surfaces in 3 or more dimensions should be reexamined. 178.48.120.40 (talk) 15:17, 8 November 2013 (UTC)


 * You're right, and I've removed the incorrect figure. Loraof (talk) 02:26, 9 April 2017 (UTC)

Lagrange Multipliers Can Fail to Determine Extrema
I think there is a particular case that the article fails to address. The simplest example is: $$\max~x+y$$ under the condition $$x^2+y^2=0$$. The Lagrangian equations have no solution, yet there is a maximum, achieved at the point $$(0,0)$$.

More details/explanations can be found in the following paper: https://www.maa.org/sites/default/files/nunemacher01010325718.pdf

The following manuscripts state what many others seem to forget: the whole argument only holds at points where the gradient of $$g$$ is nonzero:

https://math.dartmouth.edu/archive/m14f04/public_html/math_14_lagrange.pdf

http://home.iitk.ac.in/~psraj/mth101/lecture_notes/lecture31.pdf

ps. You can also consider the problem $$\min~f(x,y)=x$$ under the constraint $$g(x,y)=x^3-y^2=0$$. The contour lines of $$f$$ and $$g$$ are tangent at $$(0,0)$$. One could say that the derivative of $$f$$ along $$g$$ is zero at $$(0,0)$$ (constrained critical point). Yet $$\nabla f=(1,0)$$ and $$\nabla g=(0,0)$$ and there is no $$\lambda$$ such that $$\nabla f=\lambda \nabla g$$.

Daniel.porumbel (talk) 21:02, 17 April 2017 (UTC)
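The two failure cases above are easy to check numerically. The sketch below (my own illustration, not taken from the cited papers) confirms that $$\nabla g$$ vanishes at the constrained optimum in both examples, so no $$\lambda$$ with $$\nabla f=\lambda\nabla g$$ can exist:

```python
# Both counterexamples: the constrained optimum sits at a point where
# grad g = 0, so grad f = lambda * grad g has no solution lambda.

def grad_f1(x, y):   # f(x, y) = x + y
    return (1.0, 1.0)

def grad_g1(x, y):   # g(x, y) = x^2 + y^2   (constraint g = 0)
    return (2.0 * x, 2.0 * y)

def grad_f2(x, y):   # f(x, y) = x
    return (1.0, 0.0)

def grad_g2(x, y):   # g(x, y) = x^3 - y^2   (constraint g = 0)
    return (3.0 * x * x, -2.0 * y)

def lagrange_condition_solvable(gf, gg):
    """True iff grad f = lambda * grad g has a solution lambda (2D case)."""
    if gg == (0.0, 0.0):
        # grad g vanishes: solvable only if grad f vanishes too.
        return gf == (0.0, 0.0)
    # Otherwise solvable iff the gradients are parallel (2x2 determinant = 0).
    return abs(gf[0] * gg[1] - gf[1] * gg[0]) < 1e-12

# Both constrained optima sit at the origin, where grad g vanishes:
print(lagrange_condition_solvable(grad_f1(0, 0), grad_g1(0, 0)))  # False
print(lagrange_condition_solvable(grad_f2(0, 0), grad_g2(0, 0)))  # False
```

In both cases the method's equations are unsolvable even though the constrained extremum exists, exactly as the linked manuscripts warn.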


 * This is very important and must be addressed. I'll try to correct the article myself as soon as I've time to spare.
 * Similarly to your references, I've at hand also one undergraduate-level textbook that states that in the case of a single constraint equation $$g(\mathbf{x})=b$$ any critical point of g (that is any point $$\mathbf{x} \colon \nabla g(\mathbf{x})=\mathbf{0}$$) can't be covered by the Lagrange multipliers method and should be analyzed separately.
 * On the other hand, I couldn't find any clear reference concerning the case with multiple constraint equations $$\mathbf{g}(\mathbf{x})=\mathbf{b}$$. To my understanding, the points where the method doesn't apply are the singular points of the search set (which must be an algebraic variety?), that is, the points where the tangent space may not be regularly defined. Note that in the case of a single constraint the search space is often a curve. Anyway, the singular points of an algebraic variety are those where the rank of the Jacobian matrix of g is not maximal, so yet again they're the critical points of g. — Esponenziale (talk) 00:23, 18 May 2018 (UTC)

Article name
One should consider changing the article name to "Method of Lagrange multipliers". It is true that Lagrange multipliers have interesting interpretations on their own, but they originate from a method of incorporating constraints and this is how the topic is typically introduced in courses. If the current title Lagrange multiplier is kept, then the lead sentence should be rewritten to comply with WP:LEADSENTENCE, i.e. concisely define what a Lagrange multiplier is. Isheden (talk) 12:17, 19 February 2013 (UTC)


 * A Lagrange multiplier itself is just a real number. I think it is clear that anyone talking about Lagrange multipliers is doing so in the context of using them as a method, and that it isn't necessary to make that explicit in the title. Unnachamois (talk) 02:33, 17 August 2013 (UTC)


 * So if the article title is kept, how could the first sentence define a Lagrange multiplier? How about: In mathematical optimization, a Lagrange multiplier (named after Joseph Louis Lagrange) is a weighting factor used to incorporate a constraint into the objective function. The method of Lagrange multipliers is a strategy for finding the local maxima and minima of a function subject to equality constraints. Isheden (talk) 09:21, 26 August 2013 (UTC)


 * My opinion is that the article is primarily about the Lagrange multipliers method and needs to be moved there. If no one is against it, I'll move it next week. —Esponenziale (talk) 15:39, 18 May 2018 (UTC)

Lagrange multipliers without the multipliers
The method of Lagrange multipliers can be taught without ever referring to the multipliers $$\lambda$$ or $$\mu$$. Indeed, in the case of a two-variable function $$f(x,y)$$ with one constraint $$g(x,y)=c$$, the gradients $$\nabla f$$ and $$\nabla g$$ are parallel (i.e., one is $$\lambda$$ times the other for some parameter $$\lambda$$) if, and only if, the 2-by-2 determinant $$f_{x}g_{y}-f_{y}g_{x}$$ vanishes. Thus the set of equations to be solved for $$x$$ and $$y$$ is simply $$f_{x}g_{y}-f_{y}g_{x}=0$$ and $$g(x,y)=c$$. There is no need to introduce the auxiliary variable $$\lambda$$, whose precise value is, after all, irrelevant to the problem (one only needs to find $$x$$ and $$y$$, not $$\lambda$$).

In the case of a three-variable function $$f(x,y,z)$$ with two constraints $$g(x,y,z)=c$$ and $$h(x,y,z)=d$$, the introduction of the extraneous parameters $$\lambda$$ and $$\mu$$ can be avoided as follows: $$\nabla f$$ lies on the plane generated by the vectors $$\nabla g$$ and $$\nabla h$$ (i.e., $$\nabla f=\lambda\nabla g+\mu\nabla h$$ for some parameters $$\lambda,\mu$$) if, and only if, the cross product vector $$\nabla g \times \nabla h$$ (which is normal to the plane in question) is perpendicular to $$\nabla f$$, i.e., $$\nabla f\cdot(\nabla g \times \nabla h)=0$$. Thus the set of equations to be solved for $$x$$, $$y$$ and $$z$$ is simply $$\nabla f\cdot(\nabla g \times \nabla h)=0$$, $$g(x,y,z)=c$$ and $$h(x,y,z)=d$$.

In practice, when the maximization problem at hand is not too complicated, the method described in the article (and in countless Calculus textbooks) and the one described above work equally well. But when the opposite is the case, especially when three variables are involved, the method presented above is significantly less laborious.
Try both methods on the following problem and you will see what I mean: maximize $$f(x,y,z)=20+2x+2y+z^{2}$$ subject to the constraints $$x^{2}+y^{2}+z^{2}=11$$ and $$x+y+z=3$$ (taken from "Calculus", ninth ed., by Larson and Edwards, Example 5, Section 13.10). Contributor: C.Gonzalez-Aviles, PhD, Universidad de La Serena, Chile.

146.83.237.121 (talk) 19:12, 24 September 2013 (UTC)
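For readers who want to experiment, here is a minimal sketch of the multiplier-free method on a toy problem of my own choosing (not the Larson–Edwards exercise quoted above, which takes longer to work through): maximize $$f(x,y)=x+y$$ subject to $$g(x,y)=x^2+y^2=1$$.

```python
import math

# Multiplier-free system:  f_x * g_y - f_y * g_x = 0  and  g(x, y) = 1.
# Here f_x = f_y = 1, g_x = 2x, g_y = 2y, so the determinant equation
# reduces to 2y - 2x = 0, i.e. y = x; substituting into the constraint
# gives x = +-1/sqrt(2).

def det_condition(x, y):
    fx, fy = 1.0, 1.0            # gradient of f(x, y) = x + y
    gx, gy = 2.0 * x, 2.0 * y    # gradient of g(x, y) = x^2 + y^2
    return fx * gy - fy * gx

candidates = [(1 / math.sqrt(2), 1 / math.sqrt(2)),
              (-1 / math.sqrt(2), -1 / math.sqrt(2))]

for x, y in candidates:
    assert abs(det_condition(x, y)) < 1e-12   # gradients are parallel
    assert abs(x * x + y * y - 1.0) < 1e-12   # constraint is satisfied
    print((round(x, 6), round(y, 6)), "f =", round(x + y, 6))

# The maximum is f = sqrt(2) at (1/sqrt(2), 1/sqrt(2)); note that the
# value of lambda never appears anywhere in the computation.
```

The same pattern extends to the three-variable case by replacing the determinant with the triple product $$\nabla f\cdot(\nabla g \times \nabla h)$$.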

The above observations are correct in my opinion. But it is not only a matter of how the method is taught. It is a genuine feature of the method that the values of the multiplier are not explicitly needed. The "single constraint" subsection I added to the "modern formulation" section addresses these issues somewhat. Jimmymath (talk) 04:53, 21 November 2018 (UTC)

Modern Formulation via Differentiable Manifolds missing references
The section "Modern Formulation via Differentiable Manifolds" is missing references and citations. Some links would be appreciated (and are a requirement for proper scientific articles...) — Preceding unsigned comment added by 131.155.221.94 (talk) 09:03, 17 May 2016 (UTC)

The previous version contained a few unnecessary invocations of outside notions (isomorphism theorems, Fermat's theorem, etc.). The new version I just published is much simpler, and I think it does not require any further references. (Jimmymath (talk) 07:06, 17 December 2018 (UTC))

Differential geometry section--several issues
It looks like there are a few places in the section on modern formulation via differentiable manifolds that could be more clear or are technically incorrect. (These are separate from the issue of no references that has already been commented on. See earlier in this talk page.) They are:

1. The functions $$f$$ and $$g$$ should have the same domain, so it doesn't make sense for $$f$$ to be defined on $$M$$, while $$g$$ is defined on $$\mathbb R^n$$. It would make more sense to define a smooth map $$f: \mathbb R^n \to\mathbb R$$ and discuss the restriction $$f|_M$$. In particular, if $$f$$ is defined on a smooth manifold rather than a Euclidean space, we would need to define a submanifold as the restriction set. However, in the following paragraph, we take $$M$$, not a submanifold of $$M$$, as the restriction set, so these two paragraphs read as though they are confusing the domain of $$f$$ with the constraint.

2. In the same line of thought, we should define $$M$$ as a smooth manifold of codimension $$p$$ in the first paragraph in order to make the first and second paragraphs consistent in their use of $$M$$. Then $$\varphi$$ would be a chart that maps into $$\mathbb R^{n-p}$$, and we would need to define $$f'$$ as $$f'=f|_V\circ\varphi^{-1}$$. I also would support an explicit statement that $$\varphi(V)=U$$.

3. There's a technical issue with $$D_x\varphi^{-1}$$ in the second paragraph. Because $$\varphi^{-1}$$ is the inverse of a chart on $$M$$, its differential maps $$\mathbb R^{n-p}$$ into $$T_yM$$, not $$\mathbb R^n$$. Of course, the inclusion map on $$M$$ naturally induces an isomorphism from $$T_yM$$ onto a subspace of $$T_y\mathbb R^n\cong\mathbb R^n$$, but I'm not sure that this identification is obvious without any explanation.

4. I wonder if it's worth remarking that $$D_x\varphi^{-1}$$ (or rather $$D_x(\iota\circ\varphi^{-1})$$ as per the previous point) and $$D_{\varphi^{-1}x}g$$ form a short exact sequence. I'm not familiar enough with that concept to say whether it would be a useful framework for these maps.

5. The statement regarding the first isomorphism theorem isn't completely right. The first isomorphism theorem gives us that the image of $$D_yf$$ is isomorphic to a subspace of the image of $$D_yg$$, but any linear map $$\mathcal L$$ further needs to be constructed. I'm not sure whether this map can be considered canonical in any sense, and it may be appropriate to invoke the third isomorphism theorem here. 35.1.175.50 (talk) 02:14, 2 September 2018 (UTC)


 * Update: I made a number of minor edits to the section designed to address the issues raised in points 1 through 3 here. I also made the dependence of (what was) $$f'$$ on individual charts explicit and added a paragraph showing that the notion of critical point is still well defined. I'm not sure how to address points 4 and 5 above, so I largely avoided implementing those suggestions. I would also like to raise the possibility that the last paragraph of this section be deleted. I'm not sure what point it's making. 35.1.175.50 (talk) 04:28, 2 September 2018 (UTC)

There were indeed several problems with the "Multiple constraints" subsection of the "Modern formulation" section. I have rewritten it completely in the manner of the "Single constraint" section that I added about a month ago. I was reluctant to change the "Multiple constraints" section at the time, because I hadn't been able to understand what it said well enough to know whether it was correct or not. After working it out on my own, it became clear that the previous version contained many errors and was not focused enough. (Jimmymath (talk) 07:11, 17 December 2018 (UTC))

Proofs?
I believe this section requires more information regarding how to prove that the points obtained from the method are indeed maxima or minima, i.e. more examples and proofs for the examples.
 * Proofs are not really encyclopedic, and they may get in the way of reading the article. More examples never hurt, and if a proof is really necessary at some point and it is helpful and short, one could put it in. Oleg Alexandrov (talk) 19:00, 3 May 2006 (UTC)
 * Theorems are only right when they are proven; one can understand how to use a theorem, but only when one understands the proof does one truly understand its meaning. I do appreciate how long proofs can do more harm than good to the article, but adding the proof for a specific example, no matter how short, would greatly aid the reader in comprehending the text. RZ heretic 10:51, 5 May 2006 (UTC)
 * Existence of Lagrange multipliers is a necessary condition, not a sufficient condition. In any case I think some sort of proof would be helpful; specifically, I think that geometric intuition suffices for the case of one constraint but is less clear when considering multiple constraints (with several multipliers). M0nkey 00:34, 15 June 2006 (UTC)

Theorem: The extrema of
 * $$f(x_1,\ldots,x_n)$$

subject to the constraints
 * $$g_i(x_1,\ldots,x_n)=0\,,\quad i=1,\ldots,m\qquad\qquad\qquad\qquad\qquad$$(∗)

occur at stationary points (I didn't say "extrema"!!)  of
 * $${\cal L}(x_1,\ldots,x_n,~\lambda_1,\ldots,\lambda_m)\,=\,f(x_1,\ldots,x_n)-\textstyle\sum_i \lambda_i g_i(x_1,\ldots,x_n)$$

without constraints.

Proof 1: Since (∗) makes $$\mathcal{L}$$ equal to $$f$$, the extrema of $$f$$ subject to (∗) are the extrema of $$\mathcal{L}$$ subject to (∗) and are therefore, as usual, stationary points of $$\mathcal{L}$$ subject to (∗). But at the stationary points of $$\mathcal{L}$$, condition (∗) is of no effect, because it merely says that $$\partial\mathcal{L}/\partial\lambda_i=0$$ and is therefore satisfied by the "unconstrained" stationary points of $$\mathcal{L}$$.

Proof 2: The extrema of $$f$$ subject to (∗) are extrema, and therefore stationary points, of $$\mathcal{L}$$ subject to the constraint that $$\mathcal{L}$$ shall be insensitive to the coefficients $$\lambda_i$$. But even the "unconstrained" stationary points of $$\mathcal{L}$$ satisfy that constraint.

Remark 1: I have assumed that extrema occur at stationary points. But even that assumption could have been avoided by referring to stationary points at the outset.

Remark 2: If we seek the stationary points of $$\mathcal{L}$$, we find that $$\nabla f$$ is a linear combination of the $$\nabla g_i$$ (if we must use such esoteric symbols). But so what?

Remark 3: Obviously it makes no difference to the argument if the minus sign in $$\mathcal{L}$$ is replaced by a plus.

— Gavin R Putland (talk) 05:54, 2 July 2019 (UTC).
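As a concrete sanity check of the theorem above (the example is my own, not from the thread): take $$f(x,y)=x+y$$ with the single constraint $$g(x,y)=x^2+y^2-1=0$$. The unconstrained stationary point of $$\mathcal{L}=f-\lambda g$$ recovers both the constrained extremum and the constraint itself:

```python
import math

# L(x, y, lam) = f(x, y) - lam * g(x, y), with f = x + y and
# g = x^2 + y^2 - 1.  Note dL/dlam = -g, so setting it to zero
# is exactly the constraint (∗), as Proof 1 observes.

def grad_L(x, y, lam):
    dLdx = 1.0 - lam * 2.0 * x
    dLdy = 1.0 - lam * 2.0 * y
    dLdlam = -(x * x + y * y - 1.0)   # = 0  <=>  the constraint holds
    return (dLdx, dLdy, dLdlam)

# Solving dL/dx = dL/dy = dL/dlam = 0 by hand gives x = y = lam = 1/sqrt(2)
# (the constrained maximum).  Verify that the full gradient vanishes there:
x = y = lam = 1.0 / math.sqrt(2.0)
assert all(abs(c) < 1e-9 for c in grad_L(x, y, lam))
print("unconstrained stationary point of L at", (round(x, 6), round(y, 6)))
```

The multiplier's value ($$\lambda=1/\sqrt{2}$$ here) drops out once $$x$$ and $$y$$ are known, in line with Remark 2 and the "without the multipliers" discussion above.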

Figure 1: constant zero?
In Figure 1, a constraint g(x,y)=0 would be more in line with the article (especially the introduction) than g(x,y)=c. An easy fix would be to simply change the red label inside the figure. --Fonss (talk) 12:10, 28 October 2017 (UTC)


 * Not knowing how to change the figure, and noticing that later in the article it uses g(x, y)=c, I’ve tweaked the text near Figure 1. Loraof (talk) 17:34, 31 May 2018 (UTC)

The article mentions, somewhat unclearly, that g(x,y) = c is essentially the same as g(x,y) = 0 since the constant can be absorbed into the function, but it goes on to mix the two arbitrarily. I changed one occurrence of g(x,y) = 0 to g(x,y) = c to make a paragraph more consistent. However, more consistency is needed overall. IMO g(x,y) = c distinguishes what is required of "g" from the "equals-zero" condition needed for the Lagrangian. — Preceding unsigned comment added by Dmumme (talk • contribs) 06:26, 25 October 2019 (UTC)

Notation Problem
Updated the "Statement" section, explaining what D means in this context.

Vlad Patryshev (talk) — Preceding undated comment added 02:26, 1 January 2021 (UTC)

clarification for stationary points in intro
In the intro, where it says "find the stationary points of $$\mathcal{L}$$ considered as a function of $$x$$ and the Lagrange multiplier $$\lambda$$", it should be clarified that we do require the partial derivative of $$\mathcal{L}$$ with respect to $$\lambda$$ to be zero, because that is equivalent to the constraint $$g(x)=0$$.

Kylrth (talk) 17:33, 20 December 2021 (UTC)

In the external link "Lagrange Multipliers without permanent scarring" by Dan Klein, you will see that all partial derivatives of the Lagrangian must equal zero, with respect to the variables as well as to lambda. — Preceding unsigned comment added by 2A02:908:1013:F460:7403:1F5D:6C00:8E66 (talk) 16:37, 7 February 2022 (UTC)


 * I have now added the clarification. Corneille pensive (talk) 08:01, 1 June 2022 (UTC)