Why is a² + b² ≥ 2ab ?

This post can be regarded as a sequel to my previous (and very ancient) post on 1+2+3+…. Though these two posts are not quite logically related, they share the same spirit (I’m asking a dumb question again).

How can one prove the following?

 Theorem 1 $\displaystyle \begin{array}{rl} \displaystyle a^2+b^2\ge 2ab.\end{array}$

This extremely simple inequality (just next to the simplest but arguably the most important inequality ${a^2\ge 0}$) turns out to be the testing ground for a lot of more advanced inequalities (e.g. Cauchy-Schwarz, or Newton’s inequality).

In this post, I record some proofs of this inequality that I can think of. Some of them are quite similar, or may be considered almost the same depending on how you measure the level of similarity. I try to group proofs of roughly the same idea together. In some of these proofs, we actually use more advanced inequalities (which perhaps are themselves proved from this simple inequality). For such a simple result, I guess there may be over a hundred proofs.

So what’s the point of doing all this, when a single proof suffices? Because not many people would have the patience to read till the end (or even this paragraph), let me put the summary here:

 Don’t just read it; fight it! Ask your own questions, look for your own examples, discover your own proofs. Is the hypothesis necessary? Is the converse true? What happens in the classical special case? What about the degenerate cases? Where does the proof use the hypothesis? (Paul Halmos, “I want to be a mathematician: an automathography”)

So, let’s begin.

1. (My favourite)

$\displaystyle \begin{array}{rl} \displaystyle 0\le(a-b)^2=a^2+b^2-2ab.\end{array}$
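As a quick sanity check of the identity behind Proof 1, here is a minimal sketch in Python. The helper names `gap` and `check_square_identity` are my own and purely illustrative; the check uses exact rational arithmetic, so no rounding is involved. It is of course no substitute for the one-line proof.

```python
from fractions import Fraction
import random

def gap(a, b):
    """Return a^2 + b^2 - 2ab, the quantity the theorem claims is non-negative."""
    return a * a + b * b - 2 * a * b

def check_square_identity(trials=1000, seed=0):
    """Check gap(a, b) == (a - b)^2 >= 0 on random rationals (exact arithmetic)."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = Fraction(rng.randint(-50, 50), rng.randint(1, 20))
        b = Fraction(rng.randint(-50, 50), rng.randint(1, 20))
        if gap(a, b) != (a - b) ** 2 or gap(a, b) < 0:
            return False
    return True

print(check_square_identity())
```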

2. By the AM-GM inequality,

$\displaystyle \begin{array}{rl} \displaystyle \frac{a^2+b^2}{2}\ge \sqrt{a^2b^2}=|ab|\ge ab.\end{array}$

3. There are a number of variations to the above argument, using the generalized mean inequality. For example one can use the quadratic mean-arithmetic mean inequality to show that

$\displaystyle \begin{array}{rl} \displaystyle \sqrt{\frac{a^2+b^2}{2}}\ge \frac{a+b}{2}.\end{array}$

Squaring this inequality, we can get the result.

One can also use the GM-HM inequality: for ${a, b>0}$,

$\displaystyle \begin{array}{rl} \displaystyle \sqrt{ab}\ge \frac{2}{\frac{1}{a}+\frac{1}{b}}=\frac{2ab}{a+b}.\end{array}$

We can get the result after simplification.
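The chain of means used in Proofs 2 and 3 (harmonic ≤ geometric ≤ arithmetic ≤ quadratic) can be illustrated numerically. The sketch below assumes ${a, b>0}$; the function name `means` is hypothetical.

```python
import math

def means(a, b):
    """Return (HM, GM, AM, QM) of two positive numbers a and b."""
    hm = 2 * a * b / (a + b)             # harmonic mean
    gm = math.sqrt(a * b)                # geometric mean
    am = (a + b) / 2                     # arithmetic mean
    qm = math.sqrt((a * a + b * b) / 2)  # quadratic mean
    return hm, gm, am, qm

hm, gm, am, qm = means(1.0, 4.0)
print(hm <= gm <= am <= qm)
```

Each adjacent pair in the chain, once squared and simplified, reduces to the same inequality ${a^2+b^2\ge 2ab}$.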

4. We only have to prove the case where ${a,b>0}$. If ${a=b}$ this is trivial. Otherwise, we can assume ${a<b}$. The square below has side ${b}$, and we put a smaller square with side ${a}$ at the lower left corner as shown.

The upper right square, which clearly has positive area, is obtained by cutting two rectangles of area ${a(b-a)}$ and the square of side ${a}$ from the large square, so

$\displaystyle \begin{array}{rl} \displaystyle 0<b^2-2a(b-a)-a^2=a^2+b^2-2ab.\end{array}$

This is nothing but just the geometric interpretation of Proof 1.

5. Again assume ${a,b\ge 0}$. Without loss of generality ${a\le b}$, then ${b-a\ge 0}$ and so

$\displaystyle \begin{array}{rl} \displaystyle b(b-a)\ge& \displaystyle a(b-a)\\ \textrm{i.e. }a^2+b^2\ge& \displaystyle 2ab. \end{array}$

This computation has a geometric meaning as follows. The large rectangle below has height ${b}$ and base ${a+b}$.

The lower left rectangle and the lower right rectangle have the same height ${b-a}$, but they have bases ${a}$ and ${b}$ respectively. Clearly the lower right one has the larger area, and their areas are respectively ${a(b-a)}$ and ${b(b-a)}$.

6. The squared distance between ${(a, b)}$ and ${(b, a)}$ is non-negative:

$\displaystyle \begin{array}{rl} \displaystyle 0\le \left|(a,b)-(b,a)\right|^2 = (a-b)^2+(b-a)^2 =2(a^2+b^2-2ab). \end{array}$

7. The triangle inequality (or Minkowski inequality when ${p=2}$) states that

$\displaystyle \begin{array}{rl} \displaystyle |\boldsymbol x+\boldsymbol y|\le |\boldsymbol x|+|\boldsymbol y|.\end{array}$

Now take ${\boldsymbol x=(a, b)}$ and ${\boldsymbol y=(b,a)}$. Then

$\displaystyle \begin{array}{rl} \displaystyle \sqrt{2(a+b)^2}\le 2\sqrt{a^2+b^2}.\end{array}$

Squaring both sides, we get the result.

8. By the Cauchy-Schwarz inequality (or more generally, the Hölder or generalized Hölder inequality),

$\displaystyle \begin{array}{rl} \displaystyle 2ab=(a, b)\cdot(b,a)\le a^2+b^2.\end{array}$

Alternatively, one can use Lagrange’s identity

$\displaystyle \begin{array}{rl} \displaystyle |\boldsymbol x|^2|\boldsymbol y|^2-(\boldsymbol x\cdot \boldsymbol y)^2=\frac{1}{2}\sum_{i\ne j}(x_iy_j-x_jy_i)^2\end{array}$

and choose ${\boldsymbol x=(a, b)}$ and ${\boldsymbol y=(b,a)}$ to prove the result.
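To see Lagrange’s identity at work for this particular choice of vectors, here is a small sketch with integer inputs (so the arithmetic is exact). The function name `lagrange_identity_sides` is my own.

```python
def lagrange_identity_sides(x, y):
    """Return both sides of Lagrange's identity for integer vectors x, y."""
    n = len(x)
    dot = sum(s * t for s, t in zip(x, y))
    lhs = sum(s * s for s in x) * sum(t * t for t in y) - dot * dot
    # The sum over ordered pairs i != j counts each unordered pair twice.
    doubled = sum((x[i] * y[j] - x[j] * y[i]) ** 2
                  for i in range(n) for j in range(n) if i != j)
    return lhs, doubled // 2

a, b = 3, 4
lhs, rhs = lagrange_identity_sides((a, b), (b, a))
# Both sides equal (a^2 + b^2)^2 - (2ab)^2, which is a sum of squares.
print(lhs == rhs == (a * a + b * b) ** 2 - (2 * a * b) ** 2)
```

For ${\boldsymbol x=(a, b)}$ and ${\boldsymbol y=(b,a)}$ the right-hand side is ${(a^2-b^2)^2\ge 0}$, which gives ${(a^2+b^2)^2\ge (2ab)^2}$.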

9. Let ${z=a+ib,w=b+ia}$, then

$\displaystyle \begin{array}{rl} \displaystyle 2ab =\mathrm{Re}(z \overline w)\le |z \overline w|=|z||w|= a^2+b^2. \end{array}$
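The computation in Proof 9 can be replayed with Python’s built-in complex numbers. This is only an illustrative sketch; `complex_check` is a hypothetical helper name.

```python
def complex_check(a, b):
    """Return (Re(z * conj(w)), |z| * |w|) for z = a + ib and w = b + ia."""
    z = complex(a, b)
    w = complex(b, a)
    product = z * w.conjugate()
    return product.real, abs(z) * abs(w)

real_part, modulus = complex_check(3, 4)
print(real_part, modulus)  # real part is 2ab, modulus is a^2 + b^2 (up to rounding)
```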

10. Consider the (possibly degenerate) parallelogram spanned by the vectors ${\boldsymbol x=(a, b)}$ and ${\boldsymbol y=(b, a)}$. By the parallelogram law,

$\displaystyle \begin{array}{rl} \displaystyle |\boldsymbol x-\boldsymbol y|^2+|\boldsymbol x+\boldsymbol y|^2=2(|\boldsymbol x|^2+|\boldsymbol y|^2).\end{array}$

So

$\displaystyle \begin{array}{rl} \displaystyle 2(a^2+b^2)=2(|\boldsymbol x|^2+|\boldsymbol y|^2)\ge |\boldsymbol x+\boldsymbol y|^2=2(a+b)^2, \end{array}$

from which we obtain the result.

11. Obviously we can assume ${a, b\ge 0}$. The (possibly degenerate) parallelogram spanned by ${(\sqrt{a}, \sqrt{b})}$ and ${(\sqrt{b}, \sqrt{a})}$ has squared area

$\displaystyle \begin{array}{rl} \displaystyle 0\le \mathrm{Area}^2 =\left|\det\begin{pmatrix} \sqrt{a}& \displaystyle \sqrt{b}\\ \sqrt{b}& \displaystyle \sqrt{a} \end{pmatrix} \right|^2=(a-b)^2=a^2+b^2-2ab. \end{array}$

Alternatively, this is the square of the norm of the cross product ${(\sqrt{a}, \sqrt{b}, 0)\times (\sqrt{b}, \sqrt{a}, 0)}$.

12. Clearly the inequality is invariant under scaling of ${(a, b)}$. So rescale the vector ${(a, b)}$ to have norm ${1}$ (if $(a, b)=(0,0)$ the inequality is trivial). Let ${(a, b)=(\cos \theta, \sin \theta)}$. Then

$\displaystyle \begin{array}{rl} \displaystyle a^2+b^2=1\ge \sin (2\theta)=2\sin \theta\cos \theta=2ab. \end{array}$

13. A variant of the above is that if ${(a, b)=(\cos \alpha, \sin \alpha)}$ and ${(b, a)=(\cos \beta, \sin \beta)}$, then

$\displaystyle \begin{array}{rl} \displaystyle a^2+b^2=1\ge \cos (\alpha-\beta) =\cos \alpha\cos \beta+\sin \alpha\sin\beta =ab+ba=2ab. \end{array}$

This is also the geometric interpretation of the Cauchy-Schwarz inequality.

14. The polynomial ${p(x)=(x-a)(x-b)=x^2-(a+b)x+ab}$ clearly has two real roots (counted with multiplicity). So its discriminant satisfies

$\displaystyle \begin{array}{rl} \displaystyle 0\le (a+b)^2-4ab=a^2+b^2-2ab. \end{array}$

15. By the convexity of ${x^2}$, (or by Jensen’s inequality),

$\displaystyle \begin{array}{rl} \displaystyle \left(\frac{a+b}{2}\right)^2\le& \displaystyle \frac{1}{2} a ^2 + \frac{1}{2} b ^2 \end{array}$

which implies ${2ab\le a^2+b^2}$.

16. A variant of the above is to use the concavity of ${\sqrt x}$:

$\displaystyle \begin{array}{rl} \displaystyle \sqrt{\frac{(2a)^2+(2b)^2}{2}}\ge \frac{1}{2}\left(\sqrt{(2a)^2}+\sqrt{(2b)^2}\right)=|a|+|b|. \end{array}$

Squaring this inequality gives the result.

17. By the convexity of ${e^x}$,

$\displaystyle \begin{array}{rl} \displaystyle e^{\frac{u+v}{2}} \le \frac{1}{2}e^u+\frac{1}{2}e^v. \end{array}$

Taking ${a=e^{\frac{u}{2}}}$ and ${b=e^{\frac{v}{2}}}$, we get

$\displaystyle \begin{array}{rl} \displaystyle ab\le \frac{1}{2}a^2+ \frac{1}{2}b^2.\end{array}$

18. This is the dual version of the above proof. By the concavity of ${\log}$, for positive ${x, y}$,

$\displaystyle \begin{array}{rl} \displaystyle \log\left(\frac{x+y}{2}\right) \ge \frac{1}{2}\log x+\frac{1}{2}\log y = \log \sqrt{xy}. \end{array}$

As ${\log}$ is increasing, ${x+y\ge 2\sqrt{xy}}$. Now just take ${x=a^2}$ and ${y=b^2}$.

19. We have the standard formula for the variance: for a random variable ${X}$ with mean ${\mu}$,

$\displaystyle \begin{array}{rl} \displaystyle E [(X-\mu)^2] =E[ X^2] -\mu^2.\end{array}$

Now take ${X}$ to be the discrete random variable ${a, b}$ with each having probability ${\frac{1}{2}}$, then

$\displaystyle \begin{array}{rl} \displaystyle 0\le E[(X-\mu)^2]=\frac{1}{2}(a^2+b^2)-\left(\frac{a+b}{2}\right)^2.\end{array}$

From this we can get the result.
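The two-point random variable in Proof 19 is easy to compute with exactly. The sketch below (with a hypothetical helper `two_point_variance`) evaluates the variance both directly and via the shortcut formula.

```python
from fractions import Fraction

def two_point_variance(a, b):
    """Variance of X taking values a, b with probability 1/2 each, computed two ways."""
    a, b = Fraction(a), Fraction(b)
    mu = (a + b) / 2
    direct = ((a - mu) ** 2 + (b - mu) ** 2) / 2  # E[(X - mu)^2]
    shortcut = (a * a + b * b) / 2 - mu ** 2      # E[X^2] - mu^2
    return direct, shortcut

direct, shortcut = two_point_variance(1, 5)
print(direct == shortcut == Fraction(1 - 5, 2) ** 2)
```

Both expressions equal ${\left(\frac{a-b}{2}\right)^2}$, which makes the non-negativity visible.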

20. Without loss of generality ${a, b\ge 0}$. For ${\boldsymbol x=(x_1, \cdots, x_n)}$, ${\boldsymbol y=(y_1, \cdots, y_n)}$ with ${x_i, y_i\ge 0}$, let

$\displaystyle \begin{array}{rl} \displaystyle E_k(\boldsymbol x):=\sum_{1\le i_1<\cdots<i_k\le n}x_{i_1}\cdots x_{i_k}\end{array}$

be the ${k}$-th elementary symmetric function of ${\boldsymbol x}$. Bohnenblust’s inequality states that ${E_k^{\frac{1}{k}}}$ is concave in the sense that

$\displaystyle \begin{array}{rl} \displaystyle E_k(\boldsymbol x+\boldsymbol y)^{\frac{1}{k}}\ge E_k(\boldsymbol x)^{\frac{1}{k}}+E_k(\boldsymbol y)^{\frac{1}{k}}.\end{array}$

Take ${k=2}$, ${\boldsymbol x=(a, b)}$ and ${\boldsymbol y=(b, a)}$. Then by applying the above inequality, we have

$\displaystyle \begin{array}{rl} \displaystyle a+b\ge 2\sqrt{ab}.\end{array}$

Squaring this inequality, we get the result.

21. For any matrix ${A}$, ${A^TA}$ is non-negative definite. Without loss of generality assume ${a, b\ge 0}$. Take ${A= \begin{pmatrix} \sqrt{a}& \displaystyle \sqrt{b}\\ \sqrt{b}& \displaystyle \sqrt{a} \end{pmatrix} }$. Then ${A^TA= \begin{pmatrix} a+b& \displaystyle 2\sqrt{ab}\\ 2\sqrt{ab}& \displaystyle a+b \end{pmatrix} }$ and so

$\displaystyle \begin{array}{rl} \displaystyle 0\le \det (A^TA)=(a+b)^2-4ab=a^2+b^2-2ab. \end{array}$

22. The Minkowski determinant theorem (which is a special case of Bohnenblust’s inequality) states that ${\det ^{\frac{1}{n}}}$ is concave on non-negative definite symmetric ${n\times n}$ matrices: if ${A, B}$ are such matrices, then

$\displaystyle \begin{array}{rl} \displaystyle \left(\det (A+B)\right)^{\frac{1}{n}}\ge \left(\det A\right)^{\frac{1}{n}}+\left(\det B\right)^{\frac{1}{n}}.\end{array}$

Without loss of generality, assume ${a, b\ge 0}$. Now take ${A= \begin{pmatrix} a& \displaystyle 0\\ 0& \displaystyle b \end{pmatrix} }$ and ${B=\begin{pmatrix} b& \displaystyle 0\\ 0& \displaystyle a \end{pmatrix}}$. Applying the inequality, we have

$\displaystyle \begin{array}{rl} \displaystyle |a+b|\ge 2\sqrt{ab}.\end{array}$

Squaring this inequality, we get the result.

23. Newton’s inequality states that for ${H_k=\frac{E_k(\pmb x)}{{n\choose k}}}$, where ${\pmb x=(x_1, \cdots, x_n)}$, we have

$\displaystyle \begin{array}{rl} \displaystyle H_j^2\ge H_{j-1}H_{j+1}.\end{array}$

Without loss of generality both ${a, b>0}$. Let ${\pmb x=(a, a, b, b)}$. We compute ${H_1=\frac{a+b}{2}}$, ${H_2=\frac{a^2+b^2+4ab}{6}}$, ${H_3=\frac{a^2b+ab^2}{2}}$, ${H_4=a^2b^2}$ (and ${H_0:=1}$). Then

$\displaystyle \begin{array}{rl} \displaystyle 0\le H_1^2-H_2=\frac{a^2+b^2-2ab}{12}. \end{array}$

The other instance of Newton’s inequality gives the same result. From ${H_3^2\ge H_2H_4}$, we have ${\frac{H_3^2}{H_4}\ge H_2}$, where

$\displaystyle \begin{array}{rl} \displaystyle \frac{H_3^2}{H_4}=\frac{a^2+2ab+b^2}{4}. \end{array}$

Comparing with ${H_2}$,

$\displaystyle \begin{array}{rl} \displaystyle \frac{a^2+b^2+2ab}{4}\ge\frac{a^2+b^2+4ab}{6}, \end{array}$

from which we obtain ${a^2+b^2\ge 2ab}$.

One can also use Maclaurin’s inequality. For example, the inequality ${H_1^3\ge H_3}$ reads ${\frac{(a+b)^3}{8}\ge \frac{ab(a+b)}{2}}$; cancelling the common factor ${\frac{a+b}{2}}$ gives ${(a+b)^2\ge 4ab}$, from which we also obtain the result.
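The values of ${H_k}$ quoted in Proof 23 can be double-checked by brute force. The following sketch (the function name `normalized_symmetric_means` is my own) enumerates all ${k}$-element subsets, so it is only meant for small ${n}$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, prod

def normalized_symmetric_means(xs):
    """Return [H_0, H_1, ..., H_n] with H_k = E_k(xs) / C(n, k)."""
    n = len(xs)
    result = []
    for k in range(n + 1):
        # E_k is the sum of products over all k-element subsets (E_0 = 1).
        e_k = sum(prod(c) for c in combinations(xs, k))
        result.append(Fraction(e_k) / comb(n, k))
    return result

a, b = 2, 5
H = normalized_symmetric_means((a, a, b, b))
print(H[1] ** 2 - H[2] == Fraction((a - b) ** 2, 12))  # Newton with j = 1
print(H[3] ** 2 >= H[2] * H[4])                        # Newton with j = 3
print(H[1] ** 3 >= H[3])                               # Maclaurin
```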

24. The rearrangement inequality states that if ${x_1\le\cdots\le x_n}$ and ${y_1\le\cdots\le y_n}$, then

$\displaystyle \begin{array}{rl} \displaystyle \sum_{i=1}^{n}x_iy_{\sigma(i)}\le\sum_{i=1}^{n}x_iy_i\end{array}$

for any permutation ${\sigma}$ of ${\{1,\cdots,n\}}$. We can assume that ${a\le b}$. Let ${(x_1, x_2)=(y_1, y_2)=(a, b)}$. The rearrangement inequality states that ${x_1y_2+x_2y_1\le x_1y_1+x_2y_2}$, and so

$\displaystyle \begin{array}{rl} \displaystyle ab+ba\le a^2+b^2.\end{array}$
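For small ${n}$ the rearrangement inequality can be checked by trying every permutation. This is a brute-force sketch under that assumption, with a hypothetical helper name `rearrangement_sums`.

```python
from itertools import permutations

def rearrangement_sums(xs, ys):
    """Return (sum with both lists sorted the same way, best sum over all permutations)."""
    xs, ys = sorted(xs), sorted(ys)
    same_order = sum(x * y for x, y in zip(xs, ys))
    best = max(sum(x * ys[s] for x, s in zip(xs, sigma))
               for sigma in permutations(range(len(ys))))
    return same_order, best

a, b = 2, 5
same_order, best = rearrangement_sums((a, b), (a, b))
print(same_order == best == a * a + b * b)
```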

25. The Chebyshev order inequality states that if ${p_i\ge 0}$, ${\sum_{i=1}^{n}p_i=1}$, ${x_1\le\cdots\le x_n}$ and ${y_1\le\cdots\le y_n}$, then

$\displaystyle \begin{array}{rl} \displaystyle \left(\sum_{i=1}^{n}p_i x_i\right) \left(\sum_{i=1}^{n}p_i y_i\right)\le \sum_{i=1}^{n}p_i x_iy_i.\end{array}$

Without loss of generality ${a\le b}$. Take ${(x_1, x_2)=(y_1, y_2)=(a, b)}$ and ${p_1=p_2=\frac{1}{2}}$. Then the Chebyshev order inequality gives

$\displaystyle \begin{array}{rl} \displaystyle \left(\frac{a+b}{2}\right) \left(\frac{a+b}{2}\right)\le \frac{1}{2}(a^2+b^2).\end{array}$

From this the result follows.

26. Mahler’s inequality states that the geometric mean of a sum is at least the sum of the geometric means: if ${x_k, y_k\ge 0}$, then

$\displaystyle \begin{array}{rl} \displaystyle \prod _{k=1}^n(x_k+y_k)^{\frac{1}{n}}\ge \prod _{k=1}^n x_k^{\frac{1}{n}}+ \prod _{k=1}^n y_k^{\frac{1}{n}}.\end{array}$

Let ${n=2}$, ${(x_1, x_2)=(a^2, b^2)}$, and ${(y_1, y_2)=(b^2, a^2)}$. Then

$\displaystyle \begin{array}{rl} \displaystyle a^2+b^2=(a^2+b^2)^{\frac{1}{2}} (b^2+a^2)^{\frac{1}{2}}\ge \left(a^2 b^2\right)^{\frac{1}{2}} +\left(b^2a^2\right)^{\frac{1}{2}}=2|ab|\ge 2ab. \end{array}$

27. Let ${c\ge 0}$ be fixed. We consider the problem of maximizing ${f(x, y)=2xy}$ subject to ${x^2+y^2=c^2}$. Since ${\{(x,y): x^2+y^2=c^2\}}$ is compact, the maximum of ${f}$ is attained. So by the method of Lagrange multipliers, we solve

$\displaystyle \begin{array}{rl} \displaystyle \begin{cases} (2y, 2x)=\lambda (2x, 2y)\\ x^2+y^2=c^2. \end{cases} \end{array}$

We must have ${x^2=y^2}$ and so ${x=\pm y=\pm\frac{c}{\sqrt{2}}}$. For these ${x, y}$, ${f(x, y)=\pm c^2}$ and so clearly the maximum value is ${c^2=x^2+y^2}$. Since ${c}$ is arbitrary, we have ${2xy\le x^2+y^2}$.
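A crude numerical counterpart of Proof 27 is to sample the circle and look at the largest value of ${2xy}$. The sketch below is only a sanity check, not a proof; the function name and the sample count are arbitrary choices of mine.

```python
import math

def max_of_2xy_on_circle(c, samples=3600):
    """Approximate the maximum of 2xy on the circle x^2 + y^2 = c^2 by sampling."""
    best = -math.inf
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        x, y = c * math.cos(theta), c * math.sin(theta)
        best = max(best, 2 * x * y)
    return best

c = 2.0
print(abs(max_of_2xy_on_circle(c) - c * c) < 1e-9)
```

The sampled maximum sits at the angle ${\frac{\pi}{4}}$ (and its antipode), i.e. at ${x=y=\pm\frac{c}{\sqrt 2}}$, in agreement with the Lagrange multiplier computation.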

28. We can also carry out the dual form of the above argument by minimizing ${f(x, y)=x^2+y^2}$ subject to ${2xy=\textrm{constant}}$. But we have to be a bit more careful because the set ${\{2xy=c\}}$ is not compact, so we must argue that the minimum of ${f}$ is attained (while the supremum is not). This can be seen from the fact that ${f\rightarrow \infty}$ as ${(x, y)\rightarrow \infty}$, and so we only have to restrict to a (large enough) compact subset to look for a minimum.

29. Let ${b}$ be fixed and consider ${f(x)=x^2-2bx+b^2}$. As ${f'(x)=2x-2b}$, the only critical point of ${f}$ is ${b}$. This must be the minimum by, say, (the global version of) the first or second derivative test. So ${f(a)=a^2+b^2-2ab\ge f(b)=0}$.

30. (Worst one) This is a special case of Young’s inequality

$\displaystyle \begin{array}{rl} \displaystyle ab\le \frac{a^p}{p}+\frac{b^q}{q}\end{array}$

where ${a, b\ge 0}$, ${p, q>1}$ and ${\frac{1}{p}+\frac{1}{q}=1}$. Take ${p=q=2}$.
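To see how Young’s inequality specializes, here is a small floating-point sketch, assuming ${a, b\ge 0}$ and ${p>1}$. The helper name `young_gap` is hypothetical.

```python
def young_gap(a, b, p):
    """Return a^p/p + b^q/q - ab for a, b >= 0 and p > 1, where 1/p + 1/q = 1."""
    q = p / (p - 1)  # conjugate exponent
    return a ** p / p + b ** q / q - a * b

a, b = 3.0, 5.0
# For p = q = 2 the gap is exactly (a - b)^2 / 2.
print(abs(young_gap(a, b, 2) - (a - b) ** 2 / 2) < 1e-12)
```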

31. (A proof by Juno Mak)
(This can be regarded as the geometric version of Proof 27.)
It suffices to show it for ${a, b>0}$. Consider a right-angled triangle with legs ${a, b}$ and hypotenuse ${c=\sqrt{a^2+b^2}}$. We arrange four such triangles gathered at their right angles to form a rhombus as shown:

Suppose we vary ${a, b}$ so as to maximize the area of this rhombus, subject to the condition that the length of the hypotenuse ${\sqrt{a^2+b^2}=c}$ is fixed. Then the maximum is clearly attained when the rhombus becomes a square (i.e. ${a=b}$), which has area ${a^2+b^2}$, because all the rhombuses are parallelograms having the same base ${c}$ but the square attains the maximum height. We conclude that ${2ab\le a^2+b^2}$.
32. (A proof by Anthony Suen) Let us fix a right-angled triangle with legs ${a, b}$ and hypotenuse ${\sqrt{a^2+b^2}}$. We arrange four such triangles as shown: Clearly the large square, which has area ${a^2+b^2}$, has area at least the sum of the areas of the four right-angled triangles (each having area ${\frac{1}{2}ab}$). So ${a^2+b^2\ge 2ab}$.

A variant is to see from the figure below that ${4 ab\le (a+b)^2}$. (A similar idea is also suggested by Hon Leung.)

33. This is a variant of Juno’s proof. Fix a semicircle; we want to maximize the area of the triangle inscribed in this semicircle as shown: By elementary geometry, all such triangles are right-angled and so have the same hypotenuse ${\sqrt{a^2+b^2}}$, which is the diameter of the semicircle. The maximum area is attained when the height is maximum, i.e. when the height is the radius ${\frac{1}{2}\sqrt{a^2+b^2}}$. Therefore

$\displaystyle \begin{array}{rl} \displaystyle \frac{1}{2}ab\le \frac{1}{2}\cdot\sqrt{a^2+b^2}\cdot \frac{\sqrt{a^2+b^2}}{2}=\frac{1}{4}(a^2+b^2).\end{array}$

34. (Another proof by Anthony Suen) Without loss of generality ${a,b}$ are positive and ${0<a<b}$. Then the angle between the vectors ${(a,b)}$ and ${(a-b, b-a)}$ is less than ${\frac{\pi}{2}}$ (because ${(a, b)}$ makes an angle less than ${\frac{\pi}{4}}$ with the positive ${y}$-axis, while the angle between ${(a-b,b-a)}$ and the positive ${y}$-axis is ${\frac{\pi}{4}}$).
So

$\displaystyle \begin{array}{rl} \displaystyle 0<(a,b)\cdot (a-b,b-a)=a(a-b)+b(b-a)=a^2+b^2-2ab.\end{array}$

35. (A proof by Lam Wai Kit)
Without loss of generality assume ${a, b>0}$. Show that ${\frac{2x}{ 1+x^2 } \leq 1 }$ by calculus and put ${x=\frac{a}{b}}$.
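The calculus step in Proof 35 can be sketched as follows: the derivative of ${g(x)=\frac{2x}{1+x^2}}$ is ${\frac{2(1-x^2)}{(1+x^2)^2}}$, so on ${x>0}$ the function increases up to ${x=1}$ and decreases afterwards. The Python names `g` and `g_prime` below are my own.

```python
def g(x):
    """The function 2x / (1 + x^2) from Proof 35."""
    return 2 * x / (1 + x * x)

def g_prime(x):
    """Its derivative, 2(1 - x^2) / (1 + x^2)^2, computed by the quotient rule."""
    return 2 * (1 - x * x) / (1 + x * x) ** 2

# Sign change of the derivative at x = 1: increasing before, decreasing after.
print(g_prime(0.5) > 0, g_prime(1.0) == 0, g_prime(2.0) < 0)
# The maximum value g(1) = 1 bounds g(a / b) for any a, b > 0.
a, b = 3.0, 7.0
print(g(a / b) <= g(1.0))
```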
 Remark 1 It’s always tempting to generalize a result to different settings. The different proofs of the very simple inequality ${a^2+b^2\ge 2ab}$ give us some insight into how to obtain further generalizations. For example, how many of the above proofs can be adapted to prove the slightly more complicated (but still quadratic) inequality

$\displaystyle \begin{array}{rl} \displaystyle ab+bc+ca\le a^2+b^2+c^2?\end{array}$

How many can’t, and why? How about the “weighted” version

$\displaystyle \begin{array}{rl} \displaystyle ab\le pa^{\frac{1}{p}}+qb^{\frac{1}{q}},\end{array}$

where ${p+q=1}$? Or the cubic version

$\displaystyle \begin{array}{rl} \displaystyle 3abc\le |a|^3+|b|^3+|c|^3?\end{array}$

What about the continuous versions? E.g. can one prove

$\displaystyle \begin{array}{rl} \displaystyle \exp\left(\int_{0}^{1} \log f(x)dx\right)\le \int_{0}^{1}f(x)dx\end{array}$

for any positive integrable function ${f}$? (Why is this the analogue?) Is there any continuous analogue of the above methods to prove this version? Etc.