## Understanding Lagrange multipliers (1)

SECTION 1     INTUITION

The first way to understand is: by picture.

One equality constraint:

For the first picture below, the problem is to minimize $f(x,y)$ subject to $g(x,y)=c$ . The necessary condition is $\nabla f=\lambda \nabla g$. Treat $\nabla f$ as the direction of the movement of $f$ and $\nabla g$ as the normal vector of the curve $g(x,y)=c$. One can observe the optimal value is attained when $\nabla f$ is parallel to $\nabla g$.

More than 1 constraint:

For pictures involving more than one constraint, if it is illustrated in the plane, then usually the feasible set is just one point. However I cannot find a 3-D picture. Maybe first consider a problem with inequality constraints. The formulae are similar.

This time the problem is to maximize $f(x)$, $x\in\mathbb{R}^2$ subject to three inequality constraints $g_i(x)\leq 0$ for $i=1,2,3$. The yellow region above is the feasible set.

May ignore the set $C$ in the figure first. $H$ represents a hyperplane under $f$, that means $H=\{f=\alpha\}$ for some number $\alpha$$x^*$ is a maximizer of the problem as if $H$ is moved further towards the direction of $\nabla f(x^*)$ (i.e. value of $f$ increases), it will leave the feasible set. This means at $x^*$, $f(x^*)$ attains maximum.

Same as before, $\nabla g_1(x^*)$ and $\nabla g_2(x^*)$ are the normal vectors of $g_1(x)=0$ and $g_2(x)=0$ at the point $x^*$. Then observe from the figure that $\nabla f(x^*)$ can be expressed as a linear combination of $\{\nabla g_i(x^*)\}_{i=1,2,3}$ , where the coefficient of $\nabla g_3(x^*)$ is zero. The meaning is that since $g_3$ is inactive at $x^*$ (i.e. $g_3(x^*)<0$), it has zero effect on the point $x^*$.

——————————————–

Next I formulate the example just above in a little bit more mathematical way.

Given smooth functions (e.g. $C^1$) $f,g_1,g_2,g_3:\mathbf{R}^2\rightarrow\mathbf{R}$. Want to solve the problem $(\mathcal{P})$:

$(\mathcal{P})$            min $f(x)$
subject to  $g_i(x)\leq 0$ for $i=1,2,3$

Then the first order necessary condition is stated as follows:

Theorem 1.1     Suppose $x^*$ is a local minimizer for the problem $(\mathcal{P})$, and a certain regularity condition (later) holds. Then there exists $\lambda_i$, $i=1,2,3$ such that

$\lambda_i\geq 0$, $g_i(x^*)\leq 0$ for all $i$,
$\displaystyle\nabla f(x^*)+\sum^3_{i=1}{\lambda_i\nabla g_i(x^*)}=0$, and
$\lambda_i g_i(x^*)=0$ for all $i$

Each sentence can be “interpreted” from picture as follows.

First notice that the picture concerns a maximization problem but the theorem discusses minimization. Of course we have $g_i(x^*)\leq 0$ for all $i$. Observe from the picture that $x^*$ is also a local solution of the problem:

min $f(x)$
subject to $g_i(x)=0$ for $i=1,2$

Apply the Lagrange multiplier rule in multivariable calculus, we have $\nabla f(x^*)+\lambda_1\nabla g_1(x^*)+\lambda_2\nabla g_2(x^*)=0$. From the picture, $\lambda_1,\lambda_2\leq 0$, so in our case, $\lambda_1,\lambda_2\geq 0$. Put $\lambda_3=0$, we thus have $\displaystyle\nabla f(x^*)+\sum^3_{i=1}{\lambda_i\nabla g_i(x^*)}=0$$\lambda_i\geq 0$ and $\lambda_i g_i(x^*)=0$ for all $i$.

In addition, notice that if $x^*$ is in the interior of the feasible set, that is, $g_i(x^*)<0$ for all $i$ (This is not the case in the above picture, but is the case in $f(x,y)=x^2+y^2$ on $[-1,1]\times[-1,1]$), $x^*$ is then a local minimizer of the unconstrained problem $\min{f(x)}$. Since differentiation is a local notion, we can apply the well known Fermat rule that $\nabla f(x^*)=0$ (Remark: although Fermat is renowned in his work on theorems (proved) about numbers, he also gave us a criterion of finding maximum and minimum of a function by looking whether the tangent line at a certain point on the graph of the function is horizontal). We see that this does not contradict the results in Theorem 1.1, as we can take all $\lambda_i$ to be zero.

——————————————–

In fact, the above are the materials covered in a standard undergraduate course on nonlinear programming, so I hope my explanations can be understood by undergrad math students, especially the part concerning inequality constraints.

Perhaps the above describes understanding the very basics of Lagrange multipliers, but it may be not enough for some readers, at least for me. There are still many questions that we can ask.

Question 1: How to prove Theorem 1.1? The above arguments can only taken as explanations but not a rigorous proof.

Question 2: What is the regularity condition? Classically, it is related to the rank of the Jacobian matrix of $g=(g_1,g_2,g_3)^T$. Then the proof relies on the implicit function theorem. Actually this condition is too strong. At least we can prove the results under a milder regularity, and without using the implicit function thoerem.

Question 3: Can we describe the above “geometric” (or “pseudo-geometric”)  phenomenon by rigorous mathematical language?

Question 4: Any other ways to look at this?

Before we proceed, as Question 2 mentions the regularity condition, the following is a simple example showing why regularity is required.

Example 1.2     Consider the problem

$\min{f(x)}=x_1+x_2$
s.t. $g_1(x)=(x_1+1)^2+x_2^2-1=0$
$g_2(x)=(x_1-2)^2+x_2^2-4=0$

illustrated in the following picture:

$x^*=(0,0)$ is the only element in the feasible set. Hence it is the solution. Observe from the picture that $\nabla f(x^*)$ cannot be written as a linear combination of $\nabla g_1(x^*)$ and $\nabla g_2(x^*)$. In other words, Lagrange multipliers do not exist. Recall in multivariable calculus that they exist if $\nabla g_1(x^*)$ and $\nabla g_2(x^*)$ are linearly independent (this is the regularity condition). However, this is not the case in this example.

Advertisements
This entry was posted in Analysis, Applied mathematics. Bookmark the permalink.

### 3 Responses to Understanding Lagrange multipliers (1)

1. wongting says:

Is there a picture for multiple constraints? In that case, grad f = \sum \lambda_i grad g_i.

2. kingleunglee says:

In fact, my understanding is regarding $z=f(x,y)-\lambda g(x,y)$ as an new function with 1 higher degree. then, $d _{\lambda}z=0$ iff $g(x,y)=0$.
(I just forget how to type the notation of partial d )

• KKK says:

I don’t understand what you are talking about. Can you elaborate?