**SECTION 1 INTUITION**

The first way to understand is by picture.

One equality constraint:

For the first picture below, the problem is to minimize $f$ subject to $g = 0$. The necessary condition is $\nabla f = \lambda \nabla g$. Treat $-\nabla f$ as the direction of the movement of $x$ and $\nabla g$ as the normal vector of the curve $g = 0$. One can observe that the optimal value is attained when $\nabla f$ is parallel to $\nabla g$.
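The parallelism condition can be checked by hand on a tiny example. The problem below (minimizing $x + y$ on the unit circle) is my own illustrative choice, not the one in the picture:

```python
import math

# Toy problem (my own choice): minimize f(x, y) = x + y
# subject to g(x, y) = x^2 + y^2 - 1 = 0.
# The minimizer is (-1/sqrt(2), -1/sqrt(2)).
x, y = -1 / math.sqrt(2), -1 / math.sqrt(2)

grad_f = (1.0, 1.0)            # gradient of f is constant
grad_g = (2 * x, 2 * y)        # gradient of g at (x, y) is (2x, 2y)

# "grad f = lambda * grad g" means the gradients are parallel:
# their 2-D cross product vanishes at the optimum.
cross = grad_f[0] * grad_g[1] - grad_f[1] * grad_g[0]
assert abs(cross) < 1e-12

# Recover the multiplier as the ratio of corresponding components.
lam = grad_f[0] / grad_g[0]    # lambda = -1/sqrt(2) for this problem
assert all(abs(gf - lam * gg) < 1e-12 for gf, gg in zip(grad_f, grad_g))
print("gradients parallel with multiplier", lam)
```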

More than 1 constraint:

For pictures involving more than one constraint, if the problem is illustrated in the plane, then usually the feasible set is just one point. However, I cannot find a 3-D picture. So let us first consider a problem with inequality constraints; the formulae are similar.

This time the problem is to maximize $f$, subject to three inequality constraints $g_i(x) \le 0$ for $i = 1, 2, 3$. The yellow region above is the feasible set.

We may ignore the set $L$ in the figure first. $L$ represents a hyperplane under $f$; that means $L = \{x : f(x) = c\}$ for some number $c$. The point $\bar{x}$ is a maximizer of the problem: if $\bar{x}$ is moved further in the direction of $\nabla f$ (i.e. the value of $f$ increases), it will leave the feasible set. This means $f$ attains its maximum at $\bar{x}$.

Same as before, $\nabla g_1(\bar{x})$ and $\nabla g_2(\bar{x})$ are the normal vectors of the curves $g_1 = 0$ and $g_2 = 0$ at the point $\bar{x}$. Then observe from the figure that $\nabla f(\bar{x})$ can be expressed as a linear combination of $\nabla g_1(\bar{x})$, $\nabla g_2(\bar{x})$, $\nabla g_3(\bar{x})$, where the coefficient of $\nabla g_3(\bar{x})$ is zero. The meaning is that since $g_3$ is inactive at $\bar{x}$ (i.e. $g_3(\bar{x}) < 0$), it has zero effect at the point $\bar{x}$.
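Here is a minimal numerical sketch of this active/inactive decomposition, using my own made-up data rather than the figure's: maximize $x + y$ over a box, where two constraints are active at the maximizer and one is not.

```python
# My own concrete instance (not the author's figure):
# maximize f(x, y) = x + y subject to
#   g1 = x - 1 <= 0,  g2 = y - 1 <= 0,  g3 = -x <= 0.
# The maximizer is (1, 1); g1 and g2 are active there, g3 is inactive.
x, y = 1.0, 1.0

grad_f  = (1.0, 1.0)
grad_g1 = (1.0, 0.0)
grad_g2 = (0.0, 1.0)

# Solve grad_f = l1*grad_g1 + l2*grad_g2 by Cramer's rule (a 2x2 system),
# and give the inactive constraint the coefficient l3 = 0.
det = grad_g1[0] * grad_g2[1] - grad_g1[1] * grad_g2[0]
l1 = (grad_f[0] * grad_g2[1] - grad_f[1] * grad_g2[0]) / det
l2 = (grad_g1[0] * grad_f[1] - grad_g1[1] * grad_f[0]) / det
l3 = 0.0

assert (l1, l2, l3) == (1.0, 1.0, 0.0)
assert l3 * (-x) == 0.0   # inactive constraint contributes nothing
print("multipliers:", l1, l2, l3)
```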

——————————————–

Next I formulate the example above in a slightly more mathematical way.

Given smooth (e.g. $C^1$) functions $f, g_1, \dots, g_m : \mathbb{R}^n \to \mathbb{R}$, we want to solve the problem $(P)$:

min $f(x)$

subject to $g_i(x) \le 0$ for $i = 1, \dots, m$.

Then the first order necessary condition is stated as follows:

**Theorem 1.1 **Suppose $\bar{x}$ is a local minimizer for the problem $(P)$, and a certain regularity condition (discussed later) holds. Then there exist $\lambda_1, \dots, \lambda_m \in \mathbb{R}$, such that

$\lambda_i \ge 0$ for all $i = 1, \dots, m$,

$\nabla f(\bar{x}) + \sum_{i=1}^m \lambda_i \nabla g_i(\bar{x}) = 0$, and

$\lambda_i\, g_i(\bar{x}) = 0$ for all $i = 1, \dots, m$.
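As a sanity check, the three conditions can be verified numerically on a small hand-solved problem (my own example, not taken from the picture): minimize $x^2 + y^2$ subject to $1 - x \le 0$.

```python
# Hand-solved example (my own) for checking the three conditions:
# minimize f(x, y) = x^2 + y^2  subject to  g1(x, y) = 1 - x <= 0.
# The minimizer is (1, 0), and the multiplier works out to lambda1 = 2.
x, y = 1.0, 0.0
lam1 = 2.0

grad_f  = (2 * x, 2 * y)     # (2, 0) at the minimizer
grad_g1 = (-1.0, 0.0)
g1 = 1 - x                   # = 0, so the constraint is active

# (i) nonnegativity of the multiplier
assert lam1 >= 0
# (ii) stationarity: grad f + lambda1 * grad g1 = 0
assert all(abs(gf + lam1 * gg) < 1e-12 for gf, gg in zip(grad_f, grad_g1))
# (iii) complementary slackness: lambda1 * g1 = 0
assert lam1 * g1 == 0
print("all three conditions hold at (1, 0)")
```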

Each condition can be “interpreted” from the picture as follows.

First notice that the picture concerns a maximization problem while the theorem discusses minimization. Of course we have $f(x) \le f(\bar{x})$, i.e. $-f(x) \ge -f(\bar{x})$, for all feasible $x$ near $\bar{x}$. Observe from the picture that $\bar{x}$ is also a local solution of the problem:

min $-f(x)$

subject to $g_i(x) = 0$ for $i = 1, 2$.

Applying the Lagrange multiplier rule from multivariable calculus, we have $\nabla(-f)(\bar{x}) = \mu_1 \nabla g_1(\bar{x}) + \mu_2 \nabla g_2(\bar{x})$. From the picture, $\nabla f(\bar{x})$ lies in the cone generated by $\nabla g_1(\bar{x})$ and $\nabla g_2(\bar{x})$, so in our case $\lambda_1 := -\mu_1 \ge 0$ and $\lambda_2 := -\mu_2 \ge 0$. Put $\lambda_3 = 0$; we thus have $\nabla(-f)(\bar{x}) + \sum_{i=1}^3 \lambda_i \nabla g_i(\bar{x}) = 0$, $\lambda_i \ge 0$, and $\lambda_i g_i(\bar{x}) = 0$ for all $i$.

In addition, notice that if $\bar{x}$ is in the interior of the feasible set, that is, $g_i(\bar{x}) < 0$ for all $i$ (this is not the case in the above picture, but it is the case in, e.g., minimizing $x^2$ on $[-1, 1]$), then $\bar{x}$ is a local minimizer of the unconstrained problem $\min f$. Since differentiation is a local notion, we can apply the well-known Fermat rule that $\nabla f(\bar{x}) = 0$. (Remark: although Fermat is renowned for his work on theorems about numbers, he also gave us a criterion for finding the maximum and minimum of a function: look at whether the tangent line at a given point on the graph of the function is horizontal.) We see that this does not contradict the results in **Theorem 1.1**, as we can take all $\lambda_i$ to be zero.

——————————————–

In fact, the above are the materials covered in a standard undergraduate course on nonlinear programming, so I hope my explanations can be understood by undergraduate math students, especially the part concerning inequality constraints.

Perhaps the above describes the very basics of understanding Lagrange multipliers, but it may not be enough for some readers, at least not for me. There are still many questions that we can ask.

**Question 1**: How do we prove **Theorem 1.1**? The above arguments can only be taken as explanations, not a rigorous proof.

**Question 2**: What is the regularity condition? Classically, it is related to the rank of the Jacobian matrix of $(g_1, \dots, g_m)$, and the proof then relies on the implicit function theorem. Actually this condition is too strong: we can prove the results under a milder regularity condition, and without using the implicit function theorem.

**Question 3**: Can we describe the above “geometric” (or “pseudo-geometric”) phenomenon in rigorous mathematical language?

**Question 4**: Any other ways to look at this?

Before we proceed: since Question 2 mentions the regularity condition, the following is a simple example showing why regularity is required.

**Example 1.2** Consider the problem

min $f(x)$

s.t. $g_1(x) \le 0$, $g_2(x) \le 0$,

illustrated in the following picture:

The point $\bar{x}$ is the only element in the feasible set; hence it is the solution. Observe from the picture that $\nabla f(\bar{x})$ cannot be written as a linear combination of $\nabla g_1(\bar{x})$ and $\nabla g_2(\bar{x})$. In other words, Lagrange multipliers do not exist. Recall from multivariable calculus that they exist if $\nabla g_1(\bar{x})$ and $\nabla g_2(\bar{x})$ are linearly independent (this is the regularity condition). However, that is not the case in this example.
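One standard concrete instance of this failure (my own reconstruction; the blog's picture may differ) is two unit disks touching at a single point, with an objective pointing perpendicular to both constraint gradients:

```python
# Standard instance of regularity failure (my own choice of data):
# minimize f(x, y) = y subject to
#   g1(x, y) = (x - 1)^2 + y^2 - 1 <= 0   (disk centred at (1, 0))
#   g2(x, y) = (x + 1)^2 + y^2 - 1 <= 0   (disk centred at (-1, 0))
# The two disks touch only at the origin, so (0, 0) is the whole feasible set.
x, y = 0.0, 0.0

grad_f  = (0.0, 1.0)
grad_g1 = (2 * (x - 1), 2 * y)   # = (-2, 0)
grad_g2 = (2 * (x + 1), 2 * y)   # = ( 2, 0)

# grad_g1 and grad_g2 are linearly dependent: the regularity condition fails.
det = grad_g1[0] * grad_g2[1] - grad_g1[1] * grad_g2[0]
assert det == 0

# Every combination l1*grad_g1 + l2*grad_g2 lies on the x-axis, so it can
# never equal grad_f = (0, 1): no Lagrange multipliers exist here.
assert grad_g1[1] == 0 and grad_g2[1] == 0 and grad_f[1] != 0
print("regularity fails at the origin; no multipliers exist")
```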
