Talk:Jensen's inequality

Restrictions on Phi(x)
What restrictions must be placed on $$\varphi(x)$$?

I attempted to derive the triangle inequality for infinite series from Jensen's inequality, by assuming all $$a_i = 1$$ and taking $$\varphi(x) = |x|$$, the absolute value function.

The math works for the most part, but I'm not sure whether to treat $$|x|$$ as a linear function or not. Under the conditions listed, and assuming $$|x|$$ is linear, I arrive at the equality $$|\sum x| = \sum |x|$$, while the triangle inequality demands a strict inequality.

Am I doing something mathematically unsound? --67.168.137.181 (talk) 01:53, 1 December 2014 (UTC)
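For what it's worth, the finite form with all $$a_i = 1$$ (i.e., uniform weights $$1/n$$) and $$\varphi(x) = |x|$$ yields only the non-strict triangle inequality, with equality whenever all terms share a sign. A quick numerical sketch (Python; the sample values are made up):

```python
# Finite Jensen's inequality with phi(x) = |x| and uniform weights 1/n.
xs = [3.0, -1.5, 2.0, -4.0]          # made-up sample values
n = len(xs)

lhs = abs(sum(x / n for x in xs))            # phi(sum of lambda_i * x_i)
rhs = sum(abs(x) / n for x in xs)            # sum of lambda_i * phi(x_i)
assert lhs <= rhs                            # Jensen gives <=, never strict <

# Multiplying through by n recovers the triangle inequality for the sum:
assert abs(sum(xs)) <= sum(abs(x) for x in xs)

# Equality is attained when all terms share a sign (|x| is linear there):
ys = [1.0, 2.0, 3.0]
assert abs(sum(ys)) == sum(abs(y) for y in ys)
```

So the derivation can only ever produce the non-strict inequality, which is in fact all the triangle inequality asserts.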

University logo
Jensen's inequality serves as the logo for the mathematics department of Copenhagen University.

In the language of measure theory
The statement in the language of measure theory is true iff $$ \mu $$ is a probability measure, that is, a positive measure with total mass 1. So I do not see the point of keeping the two different statements (language of measure theory and language of probability theory). They are exactly the same, with two different notations! There are several other coherent notations used in measure/probability theory that we could then use here (but of course, the Jensen article is not the right place to discuss general notations). Therefore, I propose to delete the language-of-measure-theory section and just leave the theorem stated with the $$ \mathbb E $$ notation. I will delete it in a few days if I do not receive comments. gala.martin ( what? ) 18:39, 30 April 2006 (UTC)

Use of g
In the measure-theoretic notation, the use of g is IMHO misleading. We should simply replace it by x. Indeed, the inequality written with x is no less general than the one with g(x), since the generality of the measure mu allows one to recover any function with no effort. Putting a function in place of the identity here is not a generalization; it is confusing. To put it more clearly: if you want to write a theorem about a random variable, you say "let X be a random variable with properties A and B; then X has property C". This is no less general than "let X be a random variable such that g(X) has properties A and B; then g(X) has property C". I think that is exactly what we are writing. Am I right? --gala.martin ( what? ) 09:36, 29 August 2006 (UTC)
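A minimal discrete sketch of that point (Python; the measure and the function g are made up): integrating g against $$\mu$$ is the same as integrating the identity against the pushforward measure $$g_*\mu$$, so nothing is lost by stating the theorem with x alone.

```python
# The statement "with g under mu" equals the statement "with the identity
# under the pushforward measure g_* mu" -- a discrete illustration.
from collections import defaultdict

mu = {0: 0.2, 1: 0.5, 2: 0.3}        # a made-up probability measure on {0,1,2}
g = lambda t: (t - 1) ** 2           # an arbitrary measurable function

# integral of g d(mu)
int_g_dmu = sum(g(t) * w for t, w in mu.items())

# pushforward nu = g_* mu, then integral of x d(nu)
nu = defaultdict(float)
for t, w in mu.items():
    nu[g(t)] += w
int_x_dnu = sum(x * w for x, w in nu.items())

assert abs(int_g_dmu - int_x_dnu) < 1e-12    # the two integrals coincide
```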

Different proofs
The graphical proof can be made clearer with a concrete example, say $$\varphi(x)=e^x$$, and a particular distribution, say a discrete uniform random variable. The abstract proof number 2 using measure-theoretic notation can also be illustrated graphically, so that it ties in perfectly with the intuitive graphical argument.

The first proof by induction does not appear simple in the generalization step, with its use of the delta function and other notions. The third proof has overly complicated notation, and the proof idea is unclear at the end; a summary or conclusion would help clarify it. It would also be good to point out how it differs from the second proof, if at all, beyond the notation.

The second proof is concise yet general. A translation to probability notation should simply involve rewriting the integral as an expectation, and translating linearity of integration into linearity of expectation. It would be better if it were put first, with the following changes: --Chungc 05:55, 4 December 2006 (UTC)
 * 1) use $$X$$ instead of $$g$$ for the random variable
 * 2) point out at the end that any subderivative could have been used in place of the right-handed derivative
 * 3) tie it in with the graphical proof and a concrete example.
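A concrete check of the suggested example (Python sketch; the support points are made up), showing $$\varphi(\operatorname{E}[X]) \le \operatorname{E}[\varphi(X)]$$ for $$\varphi(x)=e^x$$ and a discrete uniform X:

```python
import math

# phi(x) = exp(x) with X uniform on made-up support points.
xs = [0.0, 1.0, 2.0, 3.0]                           # X uniform on these values
mean_x = sum(xs) / len(xs)                          # E[X] = 1.5
mean_exp = sum(math.exp(x) for x in xs) / len(xs)   # E[exp(X)]

# Jensen: phi(E[X]) <= E[phi(X)], since exp is convex
assert math.exp(mean_x) <= mean_exp
```

Here the gap is large (about 4.48 versus 7.80), which is what makes exp a good choice for the graphical illustration.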

The following equality in the third proof fails for $$\varphi(x)=e^x$$: for, say, $$y=1$$, the limit is 1 at $$x=0$$, while the infimum is 0, as one can see by letting $$\theta \to -\infty$$:
 * $$(D\varphi)(x)\cdot y:=\lim_{\theta \downarrow 0} \frac{\varphi(x+\theta\,y)-\varphi(x)}{\theta}=\inf_{\theta \neq 0} \frac{\varphi(x+\theta\,y)-\varphi(x)}{\theta}.$$

One should really use a subderivative here. On a related note, does anyone know a way to prove the existence of such a subderivative on an arbitrary vector space without using Zorn's lemma? --Pavel.zorin (talk) 10:14, 10 November 2009 (UTC)
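The discrepancy is easy to confirm numerically (a Python sketch of the counterexample above):

```python
import math

# phi = exp, x = 0, y = 1: the one-sided limit of the difference quotient
# is phi'(0) = 1, but the infimum over all theta != 0 is 0, approached
# as theta -> -infinity.
q = lambda theta: (math.exp(theta) - math.exp(0.0)) / theta

assert abs(q(1e-8) - 1.0) < 1e-6     # limit as theta decreases to 0 is 1
assert q(-1e12) < 1e-11              # huge negative theta drives quotient to 0
assert q(-1e12) < q(1e-8)            # so inf over theta != 0 undercuts the limit
```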

Image:Jensen_graph.png
This bot has detected that this page contains an image, Image:Jensen_graph.png, in a raster format. A replacement is available as a Scalable Vector Graphic (SVG) at File:Jensen graph.svg. If the replacement image is suitable, please edit the article to use the vector version. Scalable vector graphics should be used in preference to raster for images that can easily be represented in a vector graphic format. If this bot is in error, you may leave a bug report at its talk page. Thanks, SVnaGBot1 (talk) 15:09, 3 July 2009 (UTC)


 * Also, I noticed that in the bottom graph it says Y(E(X)) when it should actually say $$\varphi(E(X))$$. It would also be useful to add the X=Y line to the image, to make it easier to see that the Y values are larger than their corresponding X values. Any idea how to correct the image? Toby Dylan Hocking, 4 Feb 2010.


 * I agree with your suggestions. The file is an SVG so you can just open and change it in an ordinary text editor. For a GUI, see Inkscape, which seems to be what the SVG was made in. --C. lorenz (talk) 11:42, 7 February 2010 (UTC)

I believe that the y-axis label $$Y(E(X))$$ should instead be $$\varphi(E(X))$$.

Conditional expectation in Proof 3
Isn't it important to notice that the conditional expectation preserves order? I mean:
 * $$X \geq Y \Rightarrow \mathbb{E}\{ X |\mathfrak{G}\} \geq \mathbb{E}\{ Y |\mathfrak{G}\}.$$

The fact is not that obvious in my opinion. André Caldas (talk) 01:09, 4 August 2010 (UTC)
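It does deserve a remark; the standard sketch uses only positivity and linearity of conditional expectation:

```latex
X \ge Y \ \text{a.s.}
\;\Longrightarrow\; X - Y \ge 0 \ \text{a.s.}
\;\Longrightarrow\; \mathbb{E}[X - Y \mid \mathfrak{G}] \ge 0 \ \text{a.s.}
\;\Longrightarrow\; \mathbb{E}[X \mid \mathfrak{G}] \ge \mathbb{E}[Y \mid \mathfrak{G}] \ \text{a.s.}
```

Positivity here means that a nonnegative random variable has an almost surely nonnegative conditional expectation, which is immediate from the defining property of conditional expectation.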

Reference Missing for Special Result
There is a special form of Jensen's inequality given for probability density functions f ('Form involving a probability density function'):


 * $$\varphi\left(\int_{-\infty}^\infty g(x)f(x)\, dx\right) \le \int_{-\infty}^\infty \varphi(g(x)) f(x)\, dx. $$

However, there is no proof or reference for this formula, and it does not seem easy to derive it from the standard form. Can someone please add a reference (or a short proof if possible)? Thank you, --134.60.10.241 (talk) 10:50, 15 August 2011 (UTC)

Is it sufficient to set r.v. X to g(X) in the standard probabilistic form? Hupili (talk) 14:03, 12 March 2012 (UTC)
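That substitution does appear to suffice (assuming g is measurable and the relevant expectations are finite). Writing $$Y := g(X)$$ with X having density f, the law of the unconscious statistician gives

```latex
\mathbb{E}[Y] = \int_{-\infty}^{\infty} g(x) f(x)\, dx,
\qquad
\mathbb{E}[\varphi(Y)] = \int_{-\infty}^{\infty} \varphi(g(x)) f(x)\, dx,
```

so the density form is exactly the standard form $$\varphi(\mathbb{E}[Y]) \le \mathbb{E}[\varphi(Y)]$$ applied to $$Y = g(X)$$.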

"Subdifferential" in proof 3
The use of the "subdifferential" in Proof 3 is problematic. First, to make the two definitions (limit vs. infimum) agree, the infimum must be restricted to $$\theta > 0$$. Second, $$(D\varphi)(x)$$ is not linear in $$y$$: consider for example the function defined by $$f(x) = |x|$$ for $$x \geq 0$$ and $$f(x) = |x|/2$$ for $$x \leq 0$$. Then $$(D\varphi)(0)(y) = f(y)$$. Why not simply take any subderivative and link to the corresponding article for existence? Xvlcw (talk) 09:38, 17 January 2013 (UTC)
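The non-linearity is easy to verify numerically (a Python sketch of the example above, approximating the one-sided limit by a small positive $$\theta$$):

```python
# Piecewise function: f(x) = |x| for x >= 0, |x|/2 for x <= 0.
f = lambda x: abs(x) if x >= 0 else abs(x) / 2

def D(x, y, theta=1e-9):
    # one-sided directional derivative, limit as theta decreases to 0
    return (f(x + theta * y) - f(x)) / theta

# (Df)(0)(y) = f(y): slope 1 in positive directions, 1/2 in negative ones
assert abs(D(0.0, 1.0) - 1.0) < 1e-6
assert abs(D(0.0, -1.0) - 0.5) < 1e-6

# Linearity in y would require D(0)(-y) == -D(0)(y); it fails here
assert D(0.0, -1.0) != -D(0.0, 1.0)
```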

removed comment in Finite form section
A previous version included this statement in parentheses in the Finite Form section: "the function log(x) is concave (note that we can use Jensen's to prove convexity or concavity, if it holds for two real numbers whose functions are taken)". This statement does not make sense. Jensen's inequality doesn't say the function is concave if and only if the inequality holds. The easiest way to prove that log(x) is concave is to observe that its second derivative is negative, as described on the concave Wikipedia page. Once you know it is concave, you can then apply Jensen's inequality. John Lawrence (talk) 16:27, 18 April 2013 (UTC)

Conditions for equality to hold
The statement: "the equality holds if and only if X is constant (degenerate random variable) or $φ$ is linear" is not correct (at least the "only if" part). For instance, if X has an exponential distribution and $φ(x)=|x|$, then equality will hold.

I have to think about it a little, but I am almost sure that the correct statement is "... if and only if $φ$ is linear over a set A such that $Pr_X(A)=1$" (which is trivially true if X is constant).

Anyway, I feel the particular cases "X constant" and "$φ$ linear" are worth a mention, so I'd like to hear your opinions before making any changes. AleNS (talk) 02:51, 15 February 2017 (UTC)
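A quick simulated illustration of that equality case (Python sketch; data drawn from a unit-rate exponential), consistent with the proposed "linear on a set of probability 1" condition, since $$|x|$$ is linear on $$[0,\infty)$$:

```python
import random

# X exponential (so X >= 0 a.s.) and phi(x) = |x|: equality holds in Jensen.
random.seed(0)
xs = [random.expovariate(1.0) for _ in range(100_000)]

lhs = abs(sum(xs) / len(xs))                 # phi(E[X]), sample version
rhs = sum(abs(x) for x in xs) / len(xs)      # E[phi(X)], sample version
assert lhs == rhs    # exact equality: every sample is nonnegative
```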

Converse (partial) to Jensen Inequality
We need a new section on (partial) converses to Jensen's inequality. Kjetil B Halvorsen 13:25, 10 October 2017 (UTC) — Preceding unsigned comment added by Kjetil1001 (talk • contribs)

Proof for the finite case is unnecessarily complicated
The result for the finite case does not specify that the weights sum to 1 (and indeed, this wouldn't make the result any weaker). The proof does – if this requirement is taken out, it becomes even easier. One doesn't need the normalizing term $$\frac{\lambda_i}{1 - \lambda_1}$$. --109.192.165.115 (talk) 14:55, 1 January 2020 (UTC)

Edit: Never mind, I'm completely wrong, excuse me; the result does normalize the weights to sum to 1, and the proof is fine. --109.192.165.115 (talk) 20:37, 4 January 2020 (UTC)

proof 3 problem
Factoring out $$ (D\varphi)(\operatorname{E}[X\mid\mathfrak{G}]) $$ from the conditional expectation in the second-to-last line doesn't seem justified. Though it is $$ \mathfrak{G} $$-measurable, it isn't integrable, and neither its positive nor its negative part seems integrable in general. 2600:8803:8711:F900:2CE5:8D62:8F6B:81E9 (talk) 19:30, 20 March 2023 (UTC)