Talk:Karush–Kuhn–Tucker conditions

Error in the main KKT theorem
The KKT theorem seems to refer to a different formulation of the problem than is actually there: the KKT theorem makes no reference to $$\lambda$$ or $$h$$, and the Lagrangian in the theorem is $$L(x,\mu)$$, whereas in the problem statement it is $$L(x,\mu,\lambda)$$. Also, the term "optimal vector" is not defined; if it means "globally optimal", then the theorem is wrong, since the saddle-point condition at most gives local optimality (and no convexity or the like is assumed here). In short, this central element of the page seems to be a mixture of different formulations. UffeHThygesen (talk) 09:47, 9 December 2020 (UTC)

KKT in complex spaces
Could anyone please write on validity of the KKT for real-valued functions of complex vector variables? The current revision of the article covers the real case only. (2andrewknyazev (talk) 23:48, 22 July 2010 (UTC)).
 * This would be a nonstandard topic, better suited to the research literature than to an encyclopedia. (A complex vector space of complex dimension D can be viewed as a real vector space of dimension 2D.) Kiefer.Wolfowitz (talk) 11:08, 25 July 2010 (UTC)
 * Is there a special reason for complex being "nonstandard"? Why, e.g., would minimizing a real quadratic form over a complex space be any less "standard" than over a real space? If it is true that the KKT conditions hold without any changes for complex vector variables, that would be an encyclopedia-level statement. If there is a special reason why it is not true, that would also be an encyclopedia-level statement, I guess. (2andrewknyazev (talk) 16:53, 26 July 2010 (UTC))
 * WP is not a forum for original research. An article on KKT conditions should respect WP guidelines, particularly wrt the weight of topics in standard, leading reliable references (e.g. textbooks and research monographs, or invited review articles).
 * In a constructive spirit, let me suggest some references. For the complex field, you may look at literature about Hilbert's 17th problem (sums-of-squares representations, e.g. J-B Lasserre), plurisubharmonic functions, etc. No reliable standard reference comes to mind for complex vector spaces: it may be worthwhile to look at Karlin's book from the 1950s, e.g. Sobyczyk (sic.). Kiefer.Wolfowitz (talk) 21:52, 29 July 2010 (UTC)

Complex optimization problems can be easily transformed to real optimization problems mechanically, so I do not think they have any special properties. Before the transformation you need to define an order on the complex space, to transform it to a real one. If you mean a real-valued function over a complex vector space, it is even simpler: just analyze a real vector space with 2n dimensions (real and imaginary parts). --91.213.255.7 (talk) 03:44, 10 April 2012 (UTC)

a minus or a plus ?
In other sources, there is a minus (instead of a plus) sign in front of the multiplier (or it is to the right of the equality sign). Is the English Wikipedia article wrong? See for instance the Swedish article. /Anton —Preceding unsigned comment added by 83.249.20.170 (talk) 20:00, 2 September 2009 (UTC)


 * I have edited http://sv.wikipedia.org/wiki/Karush-Kuhn-Tucker-villkor to make the sign correct. (2andrewknyazev (talk) 14:12, 29 July 2010 (UTC)).

typo
The condition seems to be $g_i(x) \le 0$ while g and f are stated to be concave; I think at least g should be convex. See e.g. http://www2.imm.dtu.dk/courses/02711/lecture3.pdf

There seems to be a typo here: should the complementarity condition be $\sum_i \mu_i g_i(x^\ast) = 0$, not $\mu_i g_i(x^\ast) = 0$ for each $i$? Tveldhui 21:05, 17 March 2006 (UTC)

These are equivalent statements: since $\mu_i \ge 0$ and $g_i(x^\ast) \le 0$, every term $\mu_i g_i(x^\ast)$ is nonpositive, so the sum is zero if and only if each term is zero.
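The equivalence of the two forms of complementary slackness can be sketched in a few lines of Python (my own illustration; the helper name `complementarity_equivalent` is made up, not from the article):

```python
# Sketch: with mu_i >= 0 and g_i(x*) <= 0, every product mu_i * g_i(x*)
# is nonpositive, so the sum vanishes exactly when every term vanishes.
def complementarity_equivalent(mu, g, tol=1e-12):
    """Evaluate both forms of complementary slackness for multiplier
    values mu_i >= 0 and constraint values g_i = g_i(x*) <= 0."""
    assert all(m >= 0 for m in mu) and all(v <= 0 for v in g)
    sum_form = abs(sum(m * v for m, v in zip(mu, g))) < tol
    each_form = all(abs(m * v) < tol for m, v in zip(mu, g))
    return sum_form, each_form

# The two forms agree whether slackness holds or fails:
print(complementarity_equivalent([0.0, 2.0], [-1.5, 0.0]))  # (True, True)
print(complementarity_equivalent([1.0, 2.0], [-1.5, 0.0]))  # (False, False)
```

The key point is the sign structure: a sum of nonpositive terms cannot cancel, unlike a general sum.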

The introduction says "Allowing inequality constraints, the KKT approach to nonlinear programming generalizes the method of Lagrange multipliers, which had traditionally required linear constraints." Shouldn't "linear" (the second-to-last word) be replaced with "equality"? AFAIK, KKT and Lagrange multipliers both handle nonlinear programming quite naturally, and the first three words of the sentence also seem to support the idea that the shortcoming of Lagrange multipliers is that they require "equality constraints." 18.214.1.205 (talk) 02:46, 31 March 2010 (UTC)


 * Thanks for noticing the problem with linear versus equality constraints, which I fixed. Kiefer.Wolfowitz (talk) 07:59, 31 March 2010 (UTC)

One Condition missing
h(x) = 0 193.143.32.39 09:48, 3 January 2007 (UTC)


 * Isn't that implied by the stipulation that x* be a feasible point? -- Jitse Niesen (talk) 12:44, 4 January 2007 (UTC)

Regularity section
Does the $\lambda$ under "regularity" mean the $\mu$ and $\nu$ in the section before that? In this case this should be corrected, because the $\lambda$ for the dual multipliers doesn't appear anywhere else in the article. —Preceding unsigned comment added by 84.75.149.134 (talk) 22:17, 30 January 2008 (UTC)

Suggestions for article improvement

 * The article needs historical background.
 * It needs to show how the Lagrange method is a special case of KKT, obtained when one uses two inequality constraints as an equivalent of one equality constraint.
 * From an economics student's point of view, the article should show more results related to the problem of cost minimization subject to constraints. Also, it would be a good idea to explain the different steps involved in applying the method to particular problems. For example, minimization problems imply a different set up of conditions than maximization problems. A reliable reference on this is Chiang and Wainwright's textbook of mathematical economics.
 * The article needs a graphical illustration of an example of functions defined over a set of inequality constraints.--Forich (talk) 03:13, 6 May 2008 (UTC)
 * How about finding a max or min value of f(x) on an interval [a, b] or perhaps a two dimensional simplex? The inequalities would be constraints on the domain of x. You could have similar constraints on the range (say f has to be less than or equal to another function g). That should yield an interesting problem graphically. -- KarlHallowell (talk) 13:04, 14 May 2013 (UTC)
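The suggestion above about recovering the Lagrange method from KKT can be sketched as follows (my own illustration, not text from the article):

```latex
% Sketch: write an equality constraint h_j(x) = 0 as the pair of
% inequalities h_j(x) <= 0 and -h_j(x) <= 0, with KKT multipliers
% \mu_j^+ \ge 0 and \mu_j^- \ge 0. Stationarity for the pair collapses
% to the classical Lagrange condition with a sign-free multiplier:
\nabla f(x^*) + \mu_j^{+}\,\nabla h_j(x^*) - \mu_j^{-}\,\nabla h_j(x^*)
  = \nabla f(x^*) + \lambda_j\,\nabla h_j(x^*) = 0,
\qquad \lambda_j := \mu_j^{+} - \mu_j^{-} \in \mathbb{R}.
```

Since the difference of two nonnegative numbers is unrestricted in sign, the equality-constraint multiplier $$\lambda_j$$ is free, exactly as in the method of Lagrange multipliers.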


 * The article refers to active constraints without defining what it means by "active". I suppose this refers to conditions with non-zero mu and lambda? Janez Demsar (talk) 19:41, 31 August 2009 (UTC)
 * It also doesn't define what is meant by "Primal feasibility", "Dual feasibility", and "Complementary slackness". -- KarlHallowell (talk) 13:04, 14 May 2013 (UTC)


 * I think the article should reference (and compare/contrast with) the well-known Fritz John conditions as well (see Bertsekas' Fritz John paper for some background). Also, more elaboration on the Slater condition would be great, since this is one of the most used conditions. Lavaka (talk) 16:47, 16 September 2009 (UTC)

Sufficient conditions
Someone correct me if I'm wrong, but shouldn't the sufficient conditions involve the Hessian? I think it's something like: the Hessian is positive definite with respect to feasible search directions? It appears the sufficient conditions are just copy/pasted necessary conditions... Alec.N (talk) 08:11, 27 October 2008 (UTC)

The only difference between the necessary and the sufficient conditions is that in the sufficient case f and g are convex and h is affine.


 * You are correct. It's just a particular case. I've clarified this section a bit. (2andrewknyazev (talk) 18:00, 28 July 2010 (UTC)).

math not displayed
I've tried different encodings, but all I see at the end of the article (regularity conditions) are little squares rather than math (evidently, with different encoding you get different gibberish). E.g. "It can be shown that LICQ⇒MFCQ" etc. I went to the source, hoping that it was some simple omission, but this wasn't the case.

Bottom line, the math needs to be fixed it seems. —Preceding unsigned comment added by 83.83.234.195 (talk) 09:53, 9 February 2009 (UTC)

The Hessian matrix determines whether the function is convex: H(x) > 0 means that it is strictly convex, and H(x) ≥ 0 means that it is convex. Strict convexity ensures that the solution found is a global minimum and thus optimal; otherwise it might simply be a saddle point. —Preceding unsigned comment added by 86.163.157.227 (talk) 11:18, 18 April 2009 (UTC)

First Order Necessary Conditions (FONC)
In the book Applied Optimization (2006) by Ross Baldick, the author makes almost no mention of the KKT conditions, and instead uses terms such as First Order Necessary Conditions (FONC) and Second Order Sufficient Conditions (SOSC). Are these terms used regularly in optimization? Do they warrant a mention in this article? ---201.127.248.133 (talk) 01:34, 10 February 2010 (UTC)


 * Now added (2andrewknyazev (talk) 18:02, 28 July 2010 (UTC)).

typo
In the discussion of the value function, shouldn't the \lambda really be \mu? 18.78.0.83 (talk) 21:10, 28 June 2010 (UTC)

Maximization case
In view of the confusion regarding the signs in the Lagrangian, it would be helpful with a short discussion of the maximization case also. Isheden (talk) 08:15, 14 December 2011 (UTC)

Example
The current example that is given seems to be confusing. Can someone please update it with more details? — Preceding unsigned comment added by 128.97.206.107 (talk) 20:33, 22 January 2013 (UTC)

How about some examples? The Lagrange multipliers article has multiple examples of using them to solve some optimizations. Similar simple examples could be created for this article. Does anybody have some external resources, books or articles, with sensible examples? --91.213.255.7 (talk) 03:45, 10 April 2012 (UTC)

The German and, even more so, the Portuguese Wikipedia articles look to have interesting examples. — Preceding unsigned comment added by 91.213.255.7 (talk) 03:49, 10 April 2012 (UTC)
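In the spirit of the requests above, here is a minimal worked example (my own, not taken from any of the articles mentioned): minimize $$f(x,y)=x^2+y^2$$ subject to $$g(x,y)=1-x-y\le 0$$. Solving the KKT system by hand gives the candidate $$x^*=(1/2,1/2)$$ with multiplier $$\mu=1$$; the hypothetical helper `kkt_check` below just verifies each condition numerically:

```python
# Minimize f(x, y) = x^2 + y^2 subject to g(x, y) = 1 - x - y <= 0.
# The KKT system is solved by hand; this code only verifies the four
# conditions at a given candidate point and multiplier.
def kkt_check(x, y, mu, tol=1e-9):
    grad_f = (2 * x, 2 * y)          # gradient of the objective
    grad_g = (-1.0, -1.0)            # gradient of the constraint
    stationarity = all(abs(df + mu * dg) < tol
                       for df, dg in zip(grad_f, grad_g))
    g = 1 - x - y
    primal = g <= tol                # primal feasibility: g(x*) <= 0
    dual = mu >= 0                   # dual feasibility: mu >= 0
    slackness = abs(mu * g) < tol    # complementary slackness: mu g(x*) = 0
    return stationarity and primal and dual and slackness

print(kkt_check(0.5, 0.5, 1.0))  # True: all four KKT conditions hold
print(kkt_check(1.0, 1.0, 0.0))  # False: stationarity fails at (1, 1)
```

Geometrically, the constraint is active at the optimum and the objective gradient is a nonnegative multiple of the inward constraint normal, which is exactly what stationarity plus dual feasibility encodes.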

Mistake in the example ?
I am not sure, but the following KKT condition in the economic example does not look correct to me: $$ (\text{d} R / \text{d} Q) (1+\mu ) - \mu (\text{d} C / \text{d} Q) \le 0,$$

I think it should be a "=".


 * The source, which I have now added, uses the inequality version along with a condition that either the inequality holds with equality or the choice variable equals zero. Loraof (talk) 17:51, 10 April 2017 (UTC)

Second order necessary conditions?
I do not know what they are called, but the second-order conditions should be discussed more. Second-order necessary conditions:

The Hessian matrix of the objective function is positive semidefinite.

Second-order sufficient conditions:

The Hessian matrix of the objective function is positive definite.

Source (a large number, but here is one: http://www.stanford.edu/class/msande211/KKTgeometry.ppt) — Preceding unsigned comment added by 150.135.222.234 (talk) 21:30, 8 March 2013 (UTC)
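The definiteness tests mentioned above can be sketched in code (my own illustration, restricted to the symmetric 2×2 case) using Sylvester's criterion:

```python
# Sylvester's criterion for a symmetric 2x2 matrix H = [[a, b], [b, d]]:
# H is positive definite iff both leading principal minors are > 0,
# i.e. a > 0 and det(H) > 0.  (The semidefinite test is subtler: it needs
# ALL principal minors >= 0, not only the leading ones.)
def is_positive_definite_2x2(H):
    (a, b), (c, d) = H
    return a > 0 and a * d - b * c > 0

# Hessian of f(x, y) = x^2 + y^2 is [[2, 0], [0, 2]]: positive definite.
print(is_positive_definite_2x2([[2.0, 0.0], [0.0, 2.0]]))   # True
# Hessian of the saddle f(x, y) = x^2 - y^2 is [[2, 0], [0, -2]]: not PD.
print(is_positive_definite_2x2([[2.0, 0.0], [0.0, -2.0]]))  # False
```

For constrained problems the relevant matrix is usually the Hessian of the Lagrangian tested over feasible directions, but the definiteness check itself is the same.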

matrix notation
What does the comma in this operation mean?


 * $$\left[ \frac{\partial g(x^*)}{\partial x}, \frac{\partial h(x^*)}{\partial x}  \right]^T$$

Can someone who understands it explain it better in the article?


 * Done. Loraof (talk) 03:01, 10 April 2017 (UTC)

Problem in the Mangasarian-Fromovitz constraint qualification conditions?
If I want to minimize $$7x+3y$$ under the constraints $$x^2+y^2=2$$ and $$x+y=2$$, there is only one feasible point $$x=y=1$$. The gradients of the constraints $$(2,2)$$ and $$(1,1)$$ are positive-linearly independent (the Mangasarian-Fromovitz condition, as stated in the table) and yet, the KKT conditions are not necessary to certify the optimality of $$x=1,~y=1$$. They actually do not hold in $$(1,1)$$ because there is no linear combination of $$(2,2)$$ and $$(1,1)$$ that is equal to $$(7,3)$$. Daniel.porumbel (talk) 13:14, 16 July 2017 (UTC)
 * In your example, the gradients of the constraints you provide are linearly dependent. Zfeinst (talk) 18:12, 13 July 2017 (UTC)
 * Yes, but I understand from the table that the Mangasarian-Fromovitz condition (positive linearly independence) would be enough to derive the necessity of the KKT conditions, see the table in Section "Regularity conditions (or constraint qualifications)". In my example, the Mangasarian-Fromovitz constraint qualification is satisfied, yet the KKT conditions do not hold. Daniel.porumbel (talk) 13:14, 16 July 2017 (UTC)
 * Positive linear INdependence is the statement for the Mangasarian-Fromovitz condition. Your example has positive linear dependence since $$(2,2) = 2 \times (1,1)$$, which means the condition does not hold. Zfeinst (talk) 23:15, 16 July 2017 (UTC)
 * Thanks for the reply, but sorry, I understand from the article that I do not have positive linear dependence for $$(1,1)$$ and $$ (2,2)$$. I repeat the given dependence definition: ($$v_1,\ldots,v_n$$) is positive-linear dependent if there exists $$a_1\geq 0,\ldots,a_n\geq 0$$ not all zero such that $$a_1v_1+\cdots+a_nv_n=0$$. If $$v_1=(1,1)$$ and $$v_2=(2,2)$$, what are the positive values of $$a_1$$ and $$a_2$$ so that $$a_1v_1+a_2v_2=0$$? Daniel.porumbel (talk) 13:51, 18 July 2017 (UTC)
 * See the updated definition in the text. Hopefully this answers your question. Zfeinst (talk) 15:15, 18 July 2017 (UTC)
 * Yes, perfect, thanks.Daniel.porumbel (talk) 21:41, 18 July 2017 (UTC)
 * In fact, one can find an analogous counterexample even for this second definition of positive linear dependence. Minimize $$7x+3y$$ under the constraints $$x^2+y^2=2$$ and $$x+y=-2$$. There is only one feasible point: $$x=y=-1$$. The gradients of the constraints at this point are $$(-2,-2)$$ and $$(1,1)$$. They are positive-linearly independent according to the new definition. The Mangasarian-Fromovitz condition should imply the necessity of the KKT conditions, but they do not hold at $$(-1,-1)$$. I repeat the new positive dependence definition: ($$v_1,\ldots,v_n$$) are positive-linearly dependent if there exist $$a_1\geq 0,\ldots,a_n\geq 0$$ and an element $$v_i$$ such that $$\sum_{j \neq i} a_jv_j=v_i$$. With this definition, $$(-2,-2)$$ and $$(1,1)$$ are positive-linearly independent. So either I missed something, or there is something missing in the Mangasarian-Fromovitz conditions as written in the article. Daniel.porumbel (talk) 10:23, 19 July 2017 (UTC)
 * You are yet again correct. In Bertsekas "Nonlinear Programming" (2nd edition), pages 329-330 define MFCQ as a necessary condition by: Let $$x^*$$ be a local minimum of the problem $$\min f(x) \text{ s.t. } h_1(x) = 0,\dots,h_m(x) = 0, g_1(x) \leq 0,\dots, g_r(x)\leq 0$$ where $$f,h_i,g_j: \mathbb{R}^n \to \mathbb{R}$$ are continuously differentiable functions. If the gradients $$(\nabla h_i(x^*))_{i = 1,...,m}$$ are linearly independent and there exists a vector $$d \in \mathbb{R}^n$$ such that $$\nabla h_i(x^*)^T d = 0$$ and $$\nabla g_j(x^*)^T d < 0$$ for all equality constraints and all active inequality constraints, then the KKT conditions are necessary.
 * Since your example only has equality constraints, this will require linear independence of the gradients (plus an additional condition), which is not satisfied. The text in the article should be updated to reflect this actual definition. Zfeinst (talk) 12:13, 19 July 2017 (UTC)
 * Thanks for this quick response. Indeed, even if I replace $$x^2+y^2=2$$ by $$x^2+y^2\leq 2$$, these new Mangasarian-Fromovitz conditions are not satisfied. The article should be updated. Daniel.porumbel (talk) 13:27, 19 July 2017 (UTC)
 * Actually, I think I managed to figure out what was on the mind of the first person who wrote the Mangasarian-Fromovitz condition in the table. Let us first simplify by assuming there are no equality constraints, and generalize afterwards. The existence of some $$d \in \mathbb{R}^n$$ such that $$\nabla g_j(x^*)^T d < 0$$ for all active inequality constraints is equivalent to the fact that there exist no $$a_1\geq 0,~a_2\geq 0,~a_3\geq 0\ldots$$ not all zero such that $$\sum a_j \nabla g_j(x^*) =0$$. This is a consequence of the hyperplane separation theorem. This formulation is equivalent to saying that the $$\nabla g_j(x^*)$$'s are positive-linearly independent, using the initial (not accurate, as far as I can see) definition of positive linear dependence. If there are some equality constraints $$h_i(x^*)=0~\forall i\in[1..m]$$, we need to project the gradients $$\nabla g_j(x^*)$$ onto the null space (kernel) of $$\nabla h_i(x^*)^\top,~ i\in[1..m]$$. For these projected gradients $$\nabla g_j^p(x^*)$$, there must be no $$a_1\geq 0,~a_2\geq 0,~a_3\geq 0\ldots$$ not all zero such that $$\sum a_j \nabla g_j^p(x^*) =0$$. Anyway, the article should be updated. Daniel.porumbel (talk) 14:07, 20 July 2017 (UTC)
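The counterexample above can be verified numerically (my own sketch; the helper `stationarity_has_multipliers` is made up, and hard-wired to two constraints in the plane):

```python
# At the unique feasible point (-1, -1) of
#   min 7x + 3y  s.t.  x^2 + y^2 = 2,  x + y = -2,
# the constraint gradients are (-2, -2) and (1, 1).  KKT stationarity
# would require  grad_f + a*g1 + b*g2 = 0  for some multipliers (a, b).
def stationarity_has_multipliers(grad_f, grads, tol=1e-12):
    """Does grad_f + a*g1 + b*g2 = 0 have a solution (a, b), for two
    constraint gradients g1, g2 in R^2?  Determinant / rank test."""
    g1, g2 = grads
    rhs = (-grad_f[0], -grad_f[1])
    det = g1[0] * g2[1] - g2[0] * g1[1]
    if abs(det) > tol:
        return True                  # independent gradients: unique (a, b)
    # Dependent gradients: solvable iff rhs lies in their common span.
    for g in (g1, g2):
        if abs(g[0]) > tol or abs(g[1]) > tol:
            return abs(g[0] * rhs[1] - g[1] * rhs[0]) < tol
    return abs(rhs[0]) < tol and abs(rhs[1]) < tol

# The counterexample: no multipliers exist, so KKT fails at (-1, -1).
print(stationarity_has_multipliers((7.0, 3.0), [(-2.0, -2.0), (1.0, 1.0)]))  # False
# Sanity check with independent gradients, where multipliers always exist:
print(stationarity_has_multipliers((7.0, 3.0), [(1.0, 0.0), (0.0, 1.0)]))   # True
```

The dependent case reduces to the inconsistent system $$-2a + b = -7$$ and $$-2a + b = -3$$, which is what the rank test detects.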

I was the one who wrote the first definition of MFCQ. The current definition of positive-linear dependence was wrong, so I have removed it. — Preceding unsigned comment added by 143.107.45.1 (talk) 13:56, 5 April 2018 (UTC)

Problem with second-order sufficient conditions?
I have the feeling that the sufficient conditions should involve a test of the positivity of the Hessian in more than just one direction. That is, as of now, the text states the test


 * $$s^T\nabla ^2_{xx}L(x^*,\lambda^*,\mu^*)s \ge 0$$

with $$s$$ being orthogonal to the gradients of the constraint functions. But shouldn't $$s$$ range over a whole cone of admissible directions (in the case of inequality constraints)? It's hard to find references, but here is something in French (see section 5.3.4): https://fr.wikipedia.org/wiki/Conditions_d%27optimalité#Problèmes_d'optimisation_avec_contraintes_d'égalité_et_d'inégalité

2003:D5:9F20:9500:5C01:2FA:636C:3372 (talk) 12:32, 19 November 2021 (UTC)

Grammar and layout
The current text includes:

"The solution $$x^*, \lambda^*, \mu^*$$ found in the above section is a constrained local minimum if for the Lagrangian,


 * $$ L(x,\lambda,\mu) = f(x) + \sum_{i=1}^m \mu_i g_i(x) + \sum_{j=1}^\ell \lambda_j h_j(x) $$

then,


 * $$s^T\nabla ^2_{xx}L(x^*,\lambda^*,\mu^*)s \ge 0$$"

My strong impression is that, grammatically, the word "then" is superfluous and confusing (to me, at least). Also, the text reads as if the first formula were merely an apposition to, or an explanation of, "the Lagrangian".

Now, I understand that two formulas separated by a mere comma may make for an ugly layout. Therefore, I propose:

"The solution $$x^*, \lambda^*, \mu^*$$ found in the above section is a constrained local minimum for the Lagrangian,


 * $$ L(x,\lambda,\mu) = f(x) + \sum_{i=1}^m \mu_i g_i(x) + \sum_{j=1}^\ell \lambda_j h_j(x) $$

if


 * $$s^T\nabla ^2_{xx}L(x^*,\lambda^*,\mu^*)s \ge 0$$"

Any motivated objection(s)? Redav (talk) 22:32, 28 January 2022 (UTC)