## S-Lemma

This post proves a standard result in quadratic programming called the S-Lemma. Its use will become clear in a future post. We begin with a lemma.

Lemma 1.           Let $P, Q$ be two $n \times n$ real symmetric matrices such that $\text{tr} (P)\geq 0$ and $\text{tr} (Q) < 0$. Then there exists $e \in \mathbb{R}^n$ such that $e^T Pe \geq 0$ and $e^T Q e < 0$.

Proof. First write $Q=U^T\Lambda U$ where $U$ is orthogonal and $\Lambda$ is diagonal. Also consider a random vector $\xi = (\xi_1,\cdots,\xi_n)$ whose entries $\xi_i$ are independent and take the values $\pm 1$ with probability $1/2$ each. Since $\mathbb{E}(\xi_i \xi_j) = \delta_{ij}$ (Kronecker delta),

$\mathbb{E}((U^T\xi)^T P (U^T\xi)) = \mathbb{E}(\xi^T (UPU^T) \xi) = \text{tr}(UPU^T) = \text{tr}(P)\geq 0$.

Thus there is a sign vector $\bar{\xi} \in \{\pm 1\}^n$ such that $(U^T\bar{\xi})^T P (U^T \bar{\xi})\geq 0$. Set $e \triangleq U^T \bar{\xi}$. Then $e^T P e \geq 0$. On the other hand, for any $\xi \in \{\pm 1\}^n$,

$(U^T\xi)^T Q (U^T \xi) = (U^T \xi)^T U^T \Lambda U (U^T \xi) = \xi^T \Lambda \xi = \text{tr}(\Lambda) = \text{tr}(Q)<0$,

where $\xi^T \Lambda \xi = \text{tr}(\Lambda)$ because each $\xi_i^2 = 1$.

Thus our choice of $e$ does the job. $\square$
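The averaging argument in this proof can be checked numerically. The sketch below uses my own illustrative $2\times 2$ data (not from the post): it builds $Q = U^T \Lambda U$ from a rotation $U$ with $\text{tr}(Q) < 0$, enumerates all sign vectors $\xi$, and confirms that the values $(U^T\xi)^T P (U^T\xi)$ average exactly to $\text{tr}(P)$ while every $(U^T\xi)^T Q (U^T\xi)$ equals $\text{tr}(Q)$.

```python
from itertools import product
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def quad(M, v):  # v^T M v
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

t = math.pi / 6
U = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]  # orthogonal
Lam = [[1.0, 0.0], [0.0, -2.0]]                                # tr(Lam) = -1 < 0
Q = matmul(transpose(U), matmul(Lam, U))                       # Q = U^T Lam U
P = [[2.0, 3.0], [3.0, -1.0]]                                  # tr(P) = 1 >= 0

# For each sign vector xi, record (e^T P e, e^T Q e) with e = U^T xi.
vals = []
for xi in product([-1.0, 1.0], repeat=2):
    e = [sum(U[i][j] * xi[i] for i in range(2)) for j in range(2)]
    vals.append((quad(P, e), quad(Q, e)))

avg_P = sum(v for v, _ in vals) / len(vals)
print(abs(avg_P - 1.0) < 1e-9)                     # average equals tr(P)
print(any(v >= 0 for v, _ in vals))                # a good sign vector exists
print(all(abs(q + 1.0) < 1e-9 for _, q in vals))   # every value equals tr(Q)
```

Enumerating all $2^n$ sign vectors replaces the expectation exactly, since $\mathbb{E}(\xi_i\xi_j) = \delta_{ij}$ holds for the uniform average over $\{\pm 1\}^n$.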

Now we come to the main course. We write $X\succeq Y$ to mean that $X,Y$ are symmetric matrices and $X-Y$ is positive semidefinite.

Theorem 2 (S-Lemma).          Let $A,B$ be two $n \times n$ real symmetric matrices. Suppose there is $\bar{x}$ such that $\bar{x}^T A \bar{x} > 0$. Then the following are equivalent:

(1) If $x$ is such that $x^T A x \geq 0$, then $x^T B x\geq 0$.

(2) There is $\lambda \geq 0$ such that $B \succeq \lambda A$ (i.e., $x^T Bx \geq \lambda x^T Ax$ for any $x$).

Proof. (2) clearly implies (1): if $x^T Bx \geq \lambda x^T Ax$ for all $x$ with $\lambda \geq 0$, then $x^T Ax \geq 0$ forces $x^T Bx \geq 0$. To show the converse, suppose (1) holds. The problem

(i)          $\displaystyle \min_x \{x^T Bx: x^T Ax \geq 0, x^T x = 1\}$

is strictly feasible (normalizing $\bar{x}$ gives a feasible point with $x^T Ax > 0$) and, by (1), has nonnegative optimal value. We consider the following semidefinite relaxation of (i) (obtained by replacing $x x^T$ with $X$):

(ii)          $\displaystyle \min_X \{\langle B,X\rangle: \langle A,X\rangle \geq 0, \text{ tr}(X) = 1,X\succeq 0\}$

One can show by linear algebra that $\{X\succeq 0: \text{ tr}(X)=1\}$ is compact, hence (ii) is bounded from below. Setting $X$ to be the normalized (trace $= 1$) matrix of $\bar{x}\bar{x}^T + \sigma I$ for small positive $\sigma$, one sees that (ii) is strictly feasible (i.e., $\langle A,X\rangle >0$, $\text{tr}(X)=1$ and $X$ is symmetric positive definite). By the strong duality of conic linear programming (Remark: I don’t have a good reference for this now; probably I will write another post discussing this topic, which is indeed interesting), the dual problem of (ii), which is

(iii)          $\displaystyle \max_{\lambda, \mu} \{\mu: \lambda A + \mu I \preceq B, \lambda\geq 0\}$,

has an optimal solution whose optimal value is the same as that of (ii). So there exist $\lambda\geq 0$ and $\mu\in \mathbb{R}$ such that

$B \succeq \lambda A + \mu I$.

If the CLAIM “the optimal value of (ii) is nonnegative” is true, then strong duality says $\mu \geq 0$ so $B \succeq \lambda A$ which is the conclusion. Thus it remains to prove the CLAIM as follows.

Suppose $X^*$ is an optimal solution of (ii). Since $X^*\succeq 0$, by diagonalization we know there is $D$ such that $X^* = DD^T$. Let $P = D^T AD$ and $Q = D^T BD$. Then

$\text{tr}(P) = \text{tr}(D^T AD) = \text{tr} (ADD^T) = \text{tr}(AX^*)\geq 0$ (by feasibility of $X^*$ for (ii))

$\text{tr}(Q) = \text{tr}(D^T BD) = \text{tr}(BDD^T) = \text{tr}(BX^*)$

We want to show $\text{tr}(BX^*)\geq 0$. Suppose not. Then $\text{tr}(Q) = \text{tr}(BX^*)<0$. So by Lemma 1 there is $e$ such that $e^T Pe \geq 0$ and $e^T Q e <0$. Unwinding these and setting $x \triangleq De$, we see that

$x^T A x = (De)^T A (De) = e^T P e \geq 0$

$x^T Bx = (De)^T B (De) = e^T Q e <0$

contradicting the assumption (1). $\square$
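As a quick sanity check of the easy direction (2) $\Rightarrow$ (1), here is a sketch with made-up $2\times 2$ data (my own values, not from the post): take $\lambda = 1/2$, an indefinite $A$, and $B = \lambda A + D$ with $D \succeq 0$ diagonal, so that $B \succeq \lambda A$. Sampling the unit circle suffices because both quadratic forms are homogeneous.

```python
import math

lam = 0.5
A = [[1.0, 0.0], [0.0, -1.0]]                       # indefinite
# B = lam*A + diag(0.2, 0.1), hence B - lam*A is PSD, i.e., B >= lam*A
B = [[lam * 1.0 + 0.2, 0.0], [0.0, lam * (-1.0) + 0.1]]

def quad(M, v):  # v^T M v
    return sum(v[i] * M[i][j] * v[j] for i in range(2) for j in range(2))

# Check implication (1): x^T A x >= 0  =>  x^T B x >= 0, on the unit circle.
ok = True
for k in range(360):
    t = 2 * math.pi * k / 360
    x = [math.cos(t), math.sin(t)]
    if quad(A, x) >= 0 and quad(B, x) < 0:
        ok = False
print(ok)  # True
```

This only illustrates the trivial direction; the content of the theorem is that (1) conversely certifies the existence of such a $\lambda$.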

Since $x^T Ax$ and $x^T Bx$ are homogeneous quadratic polynomials, some people call the above S-Lemma the homogeneous S-Lemma. This result also works for inhomogeneous quadratic polynomials. It follows from Theorem 2 and homogenization:

Corollary 3.           Let $p(x) = x^T A x + 2 a^T x + \alpha$ and $g_1(x) = x^T A_1 x + 2a_1^T x + \alpha_1$ where $A,A_1$ are symmetric $n \times n$ matrices, $a,a_1\in \mathbb{R}^n$ and $\alpha,\alpha_1\in \mathbb{R}$. Notice that

$p(x) = y^T P y$ and $g_1(x) = y^T P_1 y$

where $y =\begin{bmatrix} 1 \\ x \end{bmatrix}$, $P = \begin{bmatrix} \alpha & a^T \\ a& A\end{bmatrix}$ and $P_1 = \begin{bmatrix} \alpha_1 & a_1^T \\ a_1 & A_1\end{bmatrix}$.

Suppose there exists $x_0\in \mathbb{R}^n$ with $g_1(x_0)>0$. Then the following assertions are equivalent:

(1) If $x$ is such that $g_1(x)\geq 0$, then $p(x)\geq 0$.

(2) There exists $\lambda\geq 0$ such that $P \succeq \lambda P_1$.

(3) There exists $\lambda\geq 0$ such that $p(x)\geq \lambda g_1(x)$ for all $x\in \mathbb{R}^n$.
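The homogenization step is easy to verify concretely. The sketch below (with illustrative data of my own choosing) assembles the block matrix $P$ from $(A, a, \alpha)$ and checks that $y^T P y$ with $y = (1, x)$ reproduces $p(x)$.

```python
# Made-up data for p(x) = x^T A x + 2 a^T x + alpha in dimension n = 2.
A = [[2.0, 1.0], [1.0, 3.0]]
a = [1.0, -1.0]
alpha = 4.0

# P = [[alpha, a^T], [a, A]] as in Corollary 3.
P = [[alpha] + a] + [[a[i]] + list(A[i]) for i in range(2)]

def p(x):
    qa = sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
    return qa + 2 * sum(a[i] * x[i] for i in range(2)) + alpha

def quad(M, v):  # v^T M v
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

x = [0.5, -2.0]
y = [1.0] + x   # y = (1, x)
print(abs(p(x) - quad(P, y)) < 1e-12)  # True: y^T P y = p(x)
```

The same construction with $(A_1, a_1, \alpha_1)$ gives $P_1$, which is how Theorem 2 transfers to the inhomogeneous setting.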

This entry was posted in Applied mathematics, Linear Algebra, Optimization.

### 4 Responses to S-Lemma

1. CHEN ZHI says:

Hi, thanks for sharing this post. I have some little questions as below:
(a) (ii) indeed is equivalent to (i) rather than relaxation because X always positive definite.
(b) The CLAIM is just the restatement of (1) because every feasible x in (i) (i.e., X in (ii)), x’Bx>=0, so the optimal value of (ii) is nonnegative.

• CHEN ZHI says:

Sorry, TYPO. X is always positive semidefinite.

• Hon Leung says:

No, I don’t think (i) and (ii) are equivalent because not every $X\succeq 0$ can be decomposed into $xx^\top$.