This post aims at proving a standard result in quadratic programming called S-Lemma. The use of this result will be obvious in a future post. We begin with a lemma.

Lemma 1.           Let P, Q be two n \times n real symmetric matrices such that \text{tr} (P)\geq 0 and \text{tr} (Q) < 0. Then there exists e \in \mathbb{R}^n such that e^T Pe \geq 0 and e^T Q e < 0.

Proof. First write Q=U^T\Lambda U where U is orthogonal and \Lambda is diagonal. Also consider a random vector \xi = (\xi_1,\cdots,\xi_n) where each \xi_i takes values \pm 1 with probability 1/2-1/2, and \xi_i‘s are independent. Since \mathbb{E}(\xi_i \xi_j) = \delta_{ij} (Kronecker delta),

\mathbb{E}((U\xi)^T P (U\xi)) = \mathbb{E}(\xi^T (UPU^T) \xi) = \text{tr}(UPU^T) = \text{tr}(P)\geq 0.

Thus there is a vector \bar{\xi} such that (U\bar{\xi})^T P (U \bar{\xi})\geq 0. Set e \triangleq U \bar{\xi}. Then e^T P e \geq 0. On the other hand, for any \xi \in \mathbb{R}^n,

(U\xi)^T Q (U^T \xi) = (U^T \xi)^T U^T \Lambda U (U^T \xi) = \xi^T \Lambda \xi = \text{tr}(\Lambda) = \text{tr}(Q)<0.

Thus our choice of e does the job. \square

Then we go to the main course. When we say X\succeq Y it means X,Y are symmetric matrices and X-Y is positive semidefinite.

Theorem 2 (S-Lemma).          Let A,B be two n \times n real symmetric matrices. Suppose there is \bar{x} such that \bar{x}^T A \bar{x} > 0. Then the following are equivalent:

(1) If x is such that x^T A x \geq 0, then x^T B x\geq 0.

(2) There is \lambda \geq 0 such that B \succeq \lambda A (i.e., x^T Bx \geq \lambda x^T Ax for any x).

Proof. (2) clearly implies (1). To show the converse, suppose (1) holds. By the existence of \bar{x} in the statement and (1) we know the problem

(i)          \displaystyle \min_x \{x^T Bx: x^T Ax \geq 0, x^T x = 1\}

is strictly feasible and has nonnegative value. We consider the following semidefinite relaxation of (i) (comes from replacing x x^T by X):

(ii)          \displaystyle \min_X \{\langle B,X\rangle: \langle A,X\rangle \geq 0, \text{ tr}(X) = 1,X\succeq 0\}

One can show by linear algebra that \{X\succeq 0: \text{ tr}(X)=1\} is compact in \{X: X\succeq 0\}, hence (ii) is bounded from below. By setting X to be the normalized (trace = 1) matrix of \bar{x}\bar{x}^T + \sigma I for small positive \sigma one sees that (ii) is strictly feasible (i.e., \langle A,X\rangle >0, \text{tr}(X)=1 and X is symmetric positive definite). By the strong duality of conic linear programming (Remark: I don’t have a good reference of this now, probably I will write another post discussing this topic which is indeed interesting), the dual problem of (ii), which is

(iii)          \displaystyle \max_{\lambda, \mu} \{\mu: \lambda A + \mu I \preceq B, \lambda\geq 0\} ,

has an optimal solution of which the optimal value is the same as that of (ii). So there exists \lambda\geq 0 and \mu\in \mathbb{R} such that

B \succeq \lambda A + \mu I

If the CLAIM “the optimal value of (ii) is nonnegative” is true, then strong duality says \mu \geq 0 so B \succeq \lambda A which is the conclusion. Thus it remains to prove the CLAIM as follows.

Suppose X^* is the optimal solution for (ii). Since X^*\succeq 0, by diagonalization we know there is D such that X = DD^T. Let P = D^T AD and Q = D^T BD. Then

\text{tr}(P) = \text{tr}(D^T AD) = \text{tr} (ADD^T) = \text{tr}(AX^*)\geq 0

\text{tr}(Q) = \text{tr}(D^T BD) = \text{tr}(BDD^T) = \text{tr}(BX^*)

We want to show \text{tr}(BX^*)\geq 0. Suppose not. Then \text{tr}(Q) = \text{tr}(BX^*)<0. So by Lemma 1 there is e such that e^T Pe \geq 0 and e^T Q e <0. Unwinding these and set x \triangleq De, we see that

x^T A x = (De)^T A (De) = e^T P e \geq 0

x^T Bx = (De)^T B (De) = e^T Q e <0

contradicting the assumption (1). \square

Since x^T Ax and x^T Bx are homogeneous quadratic polynomials, some people call the above S-Lemma the homogeneous S-Lemma. This result also works for inhomogeneous quadratic polynomials. It follows from Theorem 2 and homogenization:

Corollary 3.           Let p(x) = x^T A x + 2 a^T x + \alpha and g_1(x) = x^T A_1 x + 2a_1^T x + \alpha_1 where A,A_1 are symmetric n \times n matrices, \alpha,\alpha_1\in \mathbb{R}^n and \alpha,\alpha_1\in \mathbb{R}. Notice that

p(x) = y^T P y and g_1(x) = y^T P_1 y

where  y =\begin{bmatrix} 1 \\ x \end{bmatrix}, P = \begin{bmatrix} \alpha & a^T \\ a& A\end{bmatrix}P_1 = \begin{bmatrix} \alpha_1 & a_1^T \\ a_1 & A_1\end{bmatrix}.

Suppose there exists x_0\in \mathbb{R}^n with g_1(x_0)>0. Then the following assertions are equivalent:

(1) If x is such that g_1(x)\geq 0, then p(x)\geq 0.

(2) There exists \lambda\geq 0 such that P \succeq \lambda P_1.

(3) There exists \lambda\geq 0 such that p(x)\geq \lambda g_1(x) for all x\in \mathbb{R}^n.

Reference of this article: A. Nemirovski, Lectures on Modern Convex Optimization.

This entry was posted in Applied mathematics, Linear Algebra, Optimization. Bookmark the permalink.

4 Responses to S-Lemma

  1. CHEN ZHI says:

    Hi, thanks for sharing this post. I have some little questions as below:
    (a) (ii) indeed is equivalent to (i) rather than relaxation because X always positive definite.
    (b) The CLAIM is just the restatement of (1) because every feasible x in (i) (i.e., X in (ii)), x’Bx>=0, so the optimal value of (ii) is nonnegative.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s