This post can be regarded as a sequel to my previous (and very ancient) post on 1+2+3+…. Though these two posts are not quite logically related, they share the same spirit (I’m asking a dumb question again).

How can one prove the following?

Theorem 1 For all real numbers $a$ and $b$, $2ab \le a^2 + b^2$.
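As a sketch, and assuming Lean 4 with a recent Mathlib, Theorem 1 ($2ab \le a^2 + b^2$ for all real $a$, $b$) can be machine-checked with the same idea as Proof 1 below. We hand the tactic `nlinarith` the hint $(a-b)^2 \ge 0$. The statement is written as an anonymous `example`, so no lemma name is assumed.

```lean
import Mathlib

-- Theorem 1: for all real a and b, 2ab ≤ a² + b².
-- `sq_nonneg (a - b)` supplies 0 ≤ (a - b)^2, the idea of Proof 1;
-- `nlinarith` expands it to 0 ≤ a^2 - 2ab + b^2 and closes the goal linearly.
example (a b : ℝ) : 2 * a * b ≤ a ^ 2 + b ^ 2 := by
  nlinarith [sq_nonneg (a - b)]
```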

This extremely simple inequality (perhaps second only to the simplest, but arguably most important, inequality $x^2 \ge 0$) turns out to be a testing ground for many more advanced inequalities (e.g. Cauchy-Schwarz or Newton’s inequality).

In this post, I record some proofs of this inequality that I can think of. Some of them are quite similar, or may be considered almost the same depending on how you measure similarity. I try to group proofs of roughly the same idea together. Some of these proofs use more advanced inequalities (which may themselves be proved from this simple inequality). For such a simple result, I guess there may be over a hundred proofs.

So what’s the point of doing all this, when a single proof suffices? Since not many people will have the patience to read to the end (or even to this paragraph), let me put the summary here:

Don’t just read it; fight it! Ask your own questions, look for your own examples, discover your own proofs. Is the hypothesis necessary? Is the converse true? What happens in the classical special case? What about the degenerate cases? Where does the proof use the hypothesis? (Paul Halmos, “I want to be a mathematician: an automathography”)

So, let’s begin.

- (My favourite) $a^2 + b^2 - 2ab = (a - b)^2 \ge 0$.
- By the AM-GM inequality, $\frac{a^2 + b^2}{2} \ge \sqrt{a^2 b^2} = |ab| \ge ab$.
- There are a number of variations on the above argument using the generalized mean inequality. For example, one can use the quadratic mean-arithmetic mean inequality to show that $\sqrt{\frac{a^2 + b^2}{2}} \ge \frac{|a| + |b|}{2}$.
Squaring this inequality gives $2(a^2 + b^2) \ge a^2 + 2|ab| + b^2$, i.e. $a^2 + b^2 \ge 2|ab| \ge 2ab$.

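One can also use the GM-HM inequality: for $a, b \neq 0$,

$$|ab| = \sqrt{a^2 b^2} \ge \frac{2}{\frac{1}{a^2} + \frac{1}{b^2}} = \frac{2a^2 b^2}{a^2 + b^2}.$$

After simplification (multiply by $a^2 + b^2$ and divide by $|ab|$), we get $a^2 + b^2 \ge 2|ab| \ge 2ab$.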

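- We only have to prove the case where $a, b \ge 0$ (since $2ab \le 2|a||b|$, while $a^2 + b^2$ is unchanged if $a, b$ are replaced by $|a|, |b|$). If $a = b$ this is trivial. Otherwise, we can assume $a > b$. The square below has side $a$, and we put a smaller square with side $b$ at the lower left corner as shown.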

The upper right square, of side $a - b$, clearly has positive area. It is obtained by cutting two rectangles of area $b(a - b)$ and the square of side $b$ from the large square, so $(a - b)^2 = a^2 - 2b(a - b) - b^2 = a^2 - 2ab + b^2 > 0$. This is nothing but the geometric interpretation of Proof 1.

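- Again assume $a, b \ge 0$. Without loss of generality $a \ge b$; then $a - b \ge 0$, and so $b(a - b) \le a(a - b)$, i.e. $ab - b^2 \le a^2 - ab$, which is $2ab \le a^2 + b^2$.
This computation has a geometric meaning as follows. The large rectangle below has height $a - b$ and base $a + b$.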

The lower left rectangle and the lower right rectangle have the same height, but their bases are $b$ and $a$ respectively. Clearly the lower right one has the larger area; their areas are $b(a - b)$ and $a(a - b)$ respectively.

- The squared distance between $(a, b)$ and $(b, a)$ is non-negative: $(a - b)^2 + (b - a)^2 = 2(a^2 + b^2) - 4ab \ge 0$.
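- The triangle inequality (or the Minkowski inequality when $p = 2$) states that $\|x + y\| \le \|x\| + \|y\|$ for vectors $x, y \in \mathbb{R}^2$.
Now take $x = (a, b)$ and $y = (b, a)$. Then $\sqrt{2}\,|a + b| \le 2\sqrt{a^2 + b^2}$.

Squaring both sides gives $2(a + b)^2 \le 4(a^2 + b^2)$, which simplifies to the result.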

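- By the Cauchy-Schwarz inequality (or, more generally, the Hölder or generalized Hölder inequality), $2ab = a \cdot b + b \cdot a \le \sqrt{a^2 + b^2}\,\sqrt{b^2 + a^2} = a^2 + b^2$.
Alternatively, one can use Lagrange’s identity

$$(x_1^2 + x_2^2)(y_1^2 + y_2^2) - (x_1 y_1 + x_2 y_2)^2 = (x_1 y_2 - x_2 y_1)^2$$

and choose $x = (a, b)$ and $y = (b, a)$ to prove the result.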
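Lagrange’s identity can be checked in exact integer arithmetic, as a quick sketch. The helper names `lagrange_sides` and `check_pair` are hypothetical. The substitution $x = (a, b)$, $y = (b, a)$ is the one assumed here, and under it the identity reads $(a^2 + b^2)^2 - (2ab)^2 = (a^2 - b^2)^2$.

```python
def lagrange_sides(x1, x2, y1, y2):
    """Both sides of Lagrange's identity in two dimensions:
    (x1^2 + x2^2)(y1^2 + y2^2) - (x1*y1 + x2*y2)^2  and  (x1*y2 - x2*y1)^2."""
    lhs = (x1**2 + x2**2) * (y1**2 + y2**2) - (x1 * y1 + x2 * y2) ** 2
    rhs = (x1 * y2 - x2 * y1) ** 2
    return lhs, rhs


def check_pair(a, b):
    """Specialise to x = (a, b), y = (b, a) and test 2ab <= a^2 + b^2."""
    lhs, rhs = lagrange_sides(a, b, b, a)
    # The identity itself: (a^2 + b^2)^2 - (2ab)^2 == (a^2 - b^2)^2.
    assert lhs == rhs == (a**2 - b**2) ** 2
    # rhs >= 0 gives (2ab)^2 <= (a^2 + b^2)^2, hence |2ab| <= a^2 + b^2.
    return 2 * a * b <= a**2 + b**2


# Exhaustive check on a small grid of integers (exact, no rounding).
print(all(check_pair(a, b) for a in range(-10, 11) for b in range(-10, 11)))
```

Integers keep every comparison exact. This is only a spot check on a grid, not a replacement for the one-line algebraic argument.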

- Let , then
- Consider the (possibly degenerate) parallelogram spanned by the vectors and . By the parallelogram law,
So

from which we obtain the result.

- Obviously we can assume . The (possibly degenerate) parallelogram spanned by and has squared area
Alternatively, this is the square of the norm of the cross product .

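- Clearly the inequality is invariant under scaling of $(a, b)$, since both sides are homogeneous of degree $2$. So rescale the vector $(a, b)$ to have norm $1$ (if $(a, b) = (0, 0)$ the inequality is trivial). Let $a = \cos\theta$ and $b = \sin\theta$. Then $2ab = \sin 2\theta \le 1 = a^2 + b^2$.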
- A variant of the above is that if and , then
This is also the geometric interpretation of the Cauchy-Schwarz inequality.

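- The polynomial $(x - a)(x - b) = x^2 - (a + b)x + ab$ clearly has two (possibly equal) real roots. So its discriminant satisfies $(a + b)^2 - 4ab \ge 0$, i.e. $a^2 + b^2 \ge 2ab$.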
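- By the convexity of $t \mapsto t^2$ (or by Jensen’s inequality), $\left(\frac{a + b}{2}\right)^2 \le \frac{a^2 + b^2}{2}$,
which implies $a^2 + 2ab + b^2 \le 2a^2 + 2b^2$, i.e. $2ab \le a^2 + b^2$.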

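- A variant of the above is to use the concavity of $\sqrt{t}$: $\frac{|a| + |b|}{2} = \frac{\sqrt{a^2} + \sqrt{b^2}}{2} \le \sqrt{\frac{a^2 + b^2}{2}}$.
Squaring this inequality gives the result.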

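- By the convexity of $e^t$, $e^{\frac{x + y}{2}} \le \frac{e^x + e^y}{2}$.
Taking $x = \log a^2$ and $y = \log b^2$ (for $a, b \neq 0$), we get $|ab| \le \frac{a^2 + b^2}{2}$.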

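- This is the dual version of the above proof. By the concavity of $\log$, for positive $x, y$, $\frac{\log x + \log y}{2} \le \log\frac{x + y}{2}$.
As $\exp$ is increasing, $\sqrt{xy} \le \frac{x + y}{2}$. Now just take $x = a^2$ and $y = b^2$.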

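- We have the standard formula for the variance: for a random variable $X$ with mean $\mu$, $0 \le \operatorname{Var}(X) = E[(X - \mu)^2] = E[X^2] - \mu^2$.
Now take $X$ to be the discrete random variable taking the values $a$ and $b$, each with probability $\frac{1}{2}$; then

$$0 \le \frac{a^2 + b^2}{2} - \left(\frac{a + b}{2}\right)^2.$$

From this we can get the result.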
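A minimal sketch of the variance computation for the two-point random variable, using exact rational arithmetic. The function name `two_point_variance` is hypothetical. The sketch computes $E[X^2] - (E[X])^2$ and confirms that it equals $\left(\frac{a - b}{2}\right)^2$, which is visibly non-negative.

```python
from fractions import Fraction


def two_point_variance(a, b):
    """Variance of X taking the values a and b with probability 1/2 each,
    computed as E[X^2] - (E[X])^2 with exact rational arithmetic."""
    a, b = Fraction(a), Fraction(b)
    mean = (a + b) / 2
    second_moment = (a * a + b * b) / 2
    return second_moment - mean * mean


for a, b in [(3, 7), (-2, 5), (4, 4)]:
    var = two_point_variance(a, b)
    # The variance simplifies to ((a - b) / 2)^2, which is >= 0 ...
    assert var == Fraction(a - b, 2) ** 2
    # ... and var >= 0 is (a^2 + b^2)/2 >= ((a + b)/2)^2, i.e. 2ab <= a^2 + b^2.
    print(a, b, var)
```

`Fraction` is used instead of floats so that the comparison with $\left(\frac{a - b}{2}\right)^2$ is an exact equality rather than an approximate one.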

- Without loss of generality . For , with , let
be the -th elementary symmetric function of . The Bohnenblust’s inequality states that is concave in the sense that

Take , and . Then by applying the above inequality, we have

Squaring this inequality, we can get the result.

- For any matrix , is non-negative definite. Without loss of generality assume . Take . Then and so
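- The Minkowski determinant theorem (which is a special case of Bohnenblust’s inequality) states that if $A, B$ are $n \times n$ non-negative definite symmetric matrices, then $A \mapsto (\det A)^{1/n}$ is concave: $(\det(A + B))^{1/n} \ge (\det A)^{1/n} + (\det B)^{1/n}$.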
Without loss of generality, assume . Now take and . Applying the inequality, we have

Squaring this inequality, we get the result.

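- Newton’s inequality states that for real numbers $x_1, \dots, x_n$, where $S_k = e_k(x_1, \dots, x_n) / \binom{n}{k}$ is the $k$-th elementary symmetric mean (with $S_0 = 1$), we have $S_{k-1} S_{k+1} \le S_k^2$ for $1 \le k \le n - 1$.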
Without loss of generality both . Let . We compute , , , (and ). Then

The other Newton’s inequality also gives the same result. From , we have .

Comparing with ,

from which we obtain .

One can also use Maclaurin’s inequality. For example, by cancelling terms in the inequality , we can also obtain the result.
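To see Newton’s inequalities in action, here is a small brute-force check. It assumes the standard normalisation $S_k = e_k / \binom{n}{k}$ of the elementary symmetric polynomials $e_k$. The names `elementary_symmetric`, `symmetric_mean` and `newton_holds` are hypothetical. For $n = 2$, the case $k = 1$ reads $S_0 S_2 \le S_1^2$, i.e. $ab \le \left(\frac{a + b}{2}\right)^2$, which is equivalent to Theorem 1.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb, prod


def elementary_symmetric(xs, k):
    """e_k(xs): sum of all products of k distinct entries (e_0 = 1)."""
    return sum(prod(c) for c in combinations(xs, k))


def symmetric_mean(xs, k):
    """S_k = e_k / C(n, k), the k-th elementary symmetric mean."""
    return Fraction(elementary_symmetric(xs, k), comb(len(xs), k))


def newton_holds(xs):
    """Check S_{k-1} * S_{k+1} <= S_k^2 for every 1 <= k <= n - 1."""
    n = len(xs)
    return all(
        symmetric_mean(xs, k - 1) * symmetric_mean(xs, k + 1) <= symmetric_mean(xs, k) ** 2
        for k in range(1, n)
    )


# n = 2 recovers ab <= ((a + b)/2)^2; n = 3 exercises the general statement.
print(all(newton_holds(xs) for xs in product(range(-4, 5), repeat=2)))
print(all(newton_holds(xs) for xs in product(range(-3, 4), repeat=3)))
```

This needs Python 3.8 or later for `math.comb` and `math.prod`. Newton’s inequalities hold for arbitrary real entries, so negative integers are included in the grid.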

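- The rearrangement inequality states that if $x_1 \le \dots \le x_n$ and $y_1 \le \dots \le y_n$, then $x_1 y_{\sigma(1)} + \dots + x_n y_{\sigma(n)} \le x_1 y_1 + \dots + x_n y_n$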
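for any permutation $\sigma$ of $\{1, \dots, n\}$. We can assume that $a \le b$. Let $x = y = (a, b)$. The rearrangement inequality (with $\sigma$ the transposition) states that $x_1 y_2 + x_2 y_1 \le x_1 y_1 + x_2 y_2$, and so $2ab \le a^2 + b^2$.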

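- The Chebyshev order inequality states that if $x_1 \le \dots \le x_n$ and $y_1 \le \dots \le y_n$, then $\frac{1}{n}\sum_{k=1}^n x_k y_k \ge \left(\frac{1}{n}\sum_{k=1}^n x_k\right)\left(\frac{1}{n}\sum_{k=1}^n y_k\right)$.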
Without loss of generality $a \le b$. Take $x = y = (a, b)$. Then the Chebyshev order inequality gives $\frac{a^2 + b^2}{2} \ge \left(\frac{a + b}{2}\right)^2$.

From this the result follows.
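Both ordering inequalities can be spot-checked by brute force. The helper names `dot`, `rearrangement_holds` and `chebyshev_holds` are hypothetical. Chebyshev’s inequality is multiplied through by $n^2$ so that everything stays in integers. With $x = y = (a, b)$ and $a \le b$, the two checks become $2ab \le a^2 + b^2$ and $2(a^2 + b^2) \ge (a + b)^2$.

```python
from itertools import permutations


def dot(xs, ys):
    """Sum of products x_i * y_i."""
    return sum(x * y for x, y in zip(xs, ys))


def rearrangement_holds(xs, ys):
    """For xs, ys both sorted ascending, no permutation of ys beats the
    similarly ordered pairing sum(x_i * y_i)."""
    best = dot(xs, ys)
    return all(dot(xs, p) <= best for p in permutations(ys))


def chebyshev_holds(xs, ys):
    """Chebyshev's sum inequality times n^2:
    n * sum(x_i * y_i) >= (sum x_i) * (sum y_i) for similarly ordered xs, ys."""
    return len(xs) * dot(xs, ys) >= sum(xs) * sum(ys)


# All pairs a <= b on a small grid, with xs = ys = (a, b).
pairs = [(a, b) for a in range(-5, 6) for b in range(a, 6)]
print(all(rearrangement_holds((a, b), (a, b)) and chebyshev_holds((a, b), (a, b)) for a, b in pairs))
```

The ordering hypothesis matters. For the oppositely ordered pair `(1, 2)`, `(2, 1)`, `chebyshev_holds` returns `False`, because $2 \cdot 4 < 3 \cdot 3$.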

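- Mahler’s inequality states that the geometric mean of a termwise sum is at least the sum of the geometric means: if $x_k, y_k > 0$ for $k = 1, \dots, n$, then $\prod_{k=1}^n (x_k + y_k)^{1/n} \ge \prod_{k=1}^n x_k^{1/n} + \prod_{k=1}^n y_k^{1/n}$.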
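Let $n = 2$, $x = (a, b)$, and $y = (b, a)$ (with $a, b > 0$). Then $a + b \ge 2\sqrt{ab}$, and squaring gives the result.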

- Suppose be fixed. We consider the problem of maximizing subject to . Since is compact, the maximum of can be attained. So by Lagrange multiplier method, we solve
We must have and so . For these , and so clearly the maximum value is . Since is arbitrary, we have .

- We can also carry out the dual form of the above argument by minimizing subject to . But we have to be a bit more careful because the set is not compact, so we must argue that the minimum of is attainable (while the supremum is not). This can be seen from the fact that as and so we only have to restrict on a (large enough) compact subset to look for a minimum.
- Let $b$ be fixed and consider $f(a) = a^2 + b^2 - 2ab$. As $f'(a) = 2(a - b)$, the only critical point of $f$ is $a = b$. This must be the minimum by, say, the (global version of the) first or second derivative test. So $a^2 + b^2 - 2ab = f(a) \ge f(b) = 0$.
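- (Worst one) This is the special case $p = q = 2$ of Young’s inequality $ab \le \frac{a^p}{p} + \frac{b^q}{q}$ for $a, b \ge 0$,
where $p, q > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$.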

- (A proof by Juno Mak)

(This can be regarded as the geometric version of Proof 27.)

It suffices to show it for $a, b > 0$. Consider a right-angled triangle with legs $a$, $b$ and hypotenuse $c$. We arrange four such triangles, with their right angles meeting at a common point, to form a rhombus as shown:

Suppose we vary $a$ and $b$ so as to maximize the area $2ab$ of this rhombus, subject to the condition that the length $c$ of the hypotenuse is fixed. Then the maximum is clearly attained when the rhombus becomes a square (i.e. $a = b$), which has area $c^2$. This is because all the rhombuses are parallelograms with the same base $c$, and the square attains the maximum height. We conclude that $2ab \le c^2 = a^2 + b^2$.

- (A proof by Anthony Suen) Let us fix a right-angled triangle with legs $a$, $b$ and hypotenuse $c$. We arrange four such triangles as shown: Clearly the large square, which has area $c^2$, has area at least the sum of the areas of the four right-angled triangles (each having area $\frac{ab}{2}$). So $2ab \le c^2 = a^2 + b^2$.
A variant is to see from the figure below that . (A similar idea is also suggested by Hon Leung.)

- This is a variant of Juno’s proof. Fix a semicircle; we want to maximize the area of a triangle inscribed in this semicircle as shown: By elementary geometry, all such triangles are right-angled, say with legs $a$, $b$, and so have the same hypotenuse $c$, which is the diameter of the semicircle. The maximum area is attained when the height is maximum, i.e. when the height equals the radius $\frac{c}{2}$. Therefore $\frac{ab}{2} \le \frac{1}{2} \cdot c \cdot \frac{c}{2}$, i.e. $2ab \le c^2 = a^2 + b^2$.
- (Another proof by Anthony Suen) Without loss of generality are positive and , then the angle between vectors and is less than (because makes an angle less than with the positive -axis, while the angle between and the positive -axis is ).

So

- (A proof by Lam Wai Kit)

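Without loss of generality assume $b \neq 0$ (the case $b = 0$ is trivial). Show that $-1 \le \frac{2x}{1 + x^2} \le 1$ by calculus and put $x = \frac{a}{b}$, which gives $\frac{2ab}{a^2 + b^2} \le 1$.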
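The calculus step can be sketched with exact derivatives. The function names `g` and `g_prime` are hypothetical, and the derivative is written out by hand using the quotient rule. The sign pattern of $g'$ shows that $g$ has its maximum $1$ at $x = 1$ and its minimum $-1$ at $x = -1$. The grid check is only a spot check of that sign pattern, not a proof.

```python
from fractions import Fraction


def g(x):
    """g(x) = 2x / (1 + x^2); for x = a/b with b != 0 this equals 2ab / (a^2 + b^2)."""
    x = Fraction(x)
    return 2 * x / (1 + x * x)


def g_prime(x):
    """Quotient rule: (2(1 + x^2) - 2x * 2x) / (1 + x^2)^2 = 2(1 - x^2) / (1 + x^2)^2."""
    x = Fraction(x)
    return 2 * (1 - x * x) / (1 + x * x) ** 2


# g' > 0 on (-1, 1) and g' < 0 for |x| > 1, so g rises to g(1) = 1 and then
# decays towards 0; by oddness g(-1) = -1 is the minimum.
grid = [Fraction(n, 8) for n in range(-40, 41)]
assert all(g_prime(x) > 0 for x in grid if abs(x) < 1)
assert all(g_prime(x) < 0 for x in grid if abs(x) > 1)
assert g(1) == 1 and g(-1) == -1
print(all(-1 <= g(x) <= 1 for x in grid))
```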

Remark 1 It’s always tempting to generalize a result to different settings. The different proofs of this very simple inequality give us some insight into further generalizations. For example, how many of the above proofs can be adapted to prove the slightly more complicated (but still quadratic) inequality

How many don’t, and why? How about the “weighted” version

where ? Or the cubic version

What about the continuous versions? E.g. can one prove

for any positive integrable function? (Why is this the analogue?) Is there a continuous analogue of the above methods that proves this version? Etc.

Yet another easy proof: show that $-1 \leq 2x/(1+x^2) \leq 1$ (by calculus) and put $x = a/b$.

[Nice! Thank you. -KKK]