## Introduction to linear programming

1. Introduction

This is intended to be supplementary notes for the course MATH3210 Linear Programming. My purpose is to aid your (and my) understanding by outlining the conceptual structure in just a few pages. I follow the lecture notes closely (you may download them from Professor Wei’s website). For detailed proofs of the theorems, refer to the lecture notes.

Linear programming is an subfield of (convex) optimization. We first formulate the linear programming problem (LPP). Using some linear algebra and convex analysis, we can interpret the problem geometrically. We then develop the simplex method, which relies heavily on basic feasible solution (BFS). We study carefully its implementation. A beautiful concept in linear programming is duality. We recall its key theorems and look at some applications. Finally we apply the theory to study transportation problems.

2. Formulation of LPP

In a LPP, we maximize or minimize a linear functional subject to linear constraints. In feasible canonical form (FCF), the problem is

$\text{maximize } z = c_1x_1 + ... + c_nx_n$

subject to the constraints

$\sum_{j = 1}^n a_{ij} x_j \leq b_i$$i = 1, ..., m$, $x_1, ..., x_n \geq 0$.

It is both convenient and advantageous to use matrix notation. Thus we have the problem

$\displaystyle \text{maximize } z = {\bf c}^T\textbf{x} \text{ subject to } A{\bf x} \leq \textbf{b}, {\bf x} \geq {\bf 0}.$

Another “canonical representation” is the standard form (SF), which is more convenient mathematically since we now have equations.

$\displaystyle (P) \text{ maximize } z = {\bf c}^T{\bf x} \text{ subject to } A{\bf x} = {\bf b}, {\bf x} \geq \textbf{0}.$

We can use either form with no loss of generality. For example, we can transform an LPP from FCF to SF by adding appropriate slack variables.

Until further notice we consider problem ${(P)}$ in SF. Technically, we assume that

$\displaystyle A \in M_{m \times n}({\mathbb{R}}), n > m, \ rank(A) = m.$

The feasible region is ${F = \{{\bf x} \in {\mathbb{R}}: {\bf x} \geq \textbf{0}, A{\bf x} = {\bf b}\}}$. It is a closed convex set (which may be unbounded). Its elements are called feasible solutions.

If ${{\bf x}^* \in F}$ and ${z^* = {\bf c}^T{\bf x}^* \geq {\bf c}^T{\bf x}}$ for all ${{\bf x} \in F}$, we call ${{\bf x}}$ an optimal solution (not necessarily unique), and call ${z^*}$ the optimal value (which is unique).

The first objective of this course is to develop the simplex method which decides whether the LPP has an optimal solution, and, if exists, find one. The theory uses concepts in linear algebra, our next topic.

3. Linear algebra

In secondary school, you should have learned, while sliding your ruler, that we only need to search for the ‘corners’. In convex analysis, these are called extreme points.

Definition 1.6. Let ${C \subset {\mathbb{R}}^n}$ be convex. An element ${x \in C}$ is an extreme point of ${C}$ if

$\displaystyle x = \lambda x_1 + (1 - \lambda) x_2, \ x_1, x_2 \in C, \ \lambda \in (0, 1) \Rightarrow x_1 = x_2 = x.$

Theorem 2.5. If the LPP ${(P)}$ has a finite optimal value, then the optimal value can be attained at an extreme point of the feasible region ${F}$.

Algebraically, an extreme point is a BFS of the linear system.

Definition 1.1. Consider the linear system ${A{\bf x} = {\bf b}, {\bf x} \geq \textbf{0}}$ of ${(P)}$. Pick ${m}$ linearly independent columns of ${A}$ and form a square matrix ${B}$. The solution ${{\bf x}_B}$ of the system ${B{\bf x}_B = {\bf b}}$ is called a basic solution (BS) with basic matrix ${B}$. If also ${{\bf x}_B \geq {\bf 0}}$, then ${{\bf x}_B}$ is called a basic feasible solution (BFS). We identify it with a feasible solution of ${(P)}$ by letting the non-basic variables be ${0}$.

Theorem 2.3, 2.4. The BFS of the linear system ${A{\bf x} = {\bf b}, {\bf x} \geq {\bf 0}}$ correspond precisely to the extreme points of the feasible region ${F}$.

Armed with these theorems, to solve an LPP, it suffices to search through all BFS by brute force and pick one with the largest objective value. However, observe that the number of BFS is roughly of the order

$\displaystyle \left( \begin{array}{c} n \\ m \\ \end{array} \right) = \frac{n!}{(n - m)!m!}.$

For large ${n}$ and ${m}$, it is simply infeasible to compute all BFS. For example, if ${n = 100}$ and ${m = 30}$, there are about ${2.9 \times 10^{25}}$ possible basic matrices.

Clearly, we need a systematic way to search through the BFS. In his ground-breaking work in 1947, George Dantzig invented the now famous simplex method that not only solves the problem but also provides a sound theoretical basis to analyze LPPs.

4. Simplex method

Again we work with a given LPP ${(P)}$ in SF. Suppose we have a BFS ${[{\bf x}_B, {\bf x}_N]^T}$ (${{\bf x}_N = 0}$) and ${A = [B \ N]}$, where ${B}$ is the basic matrix. How do we know whether it is optimal, and, if not, how to find a better one?

The answer is based on the following calculation. Let ${z_0}$ be the current objective value. Then

$\displaystyle \begin{array}{rcl} {\bf c}_B^T{\bf x}_B + {\bf c}_N^T{\bf x}_N &=& z_0\\ B{\bf x}_B + N{\bf x}_N &=& {\bf b} \end{array}$

We may transform it to get

$\displaystyle \begin{array}{rcl} {\bf c}_B^TB^{-1}b + ({\bf c}_N^T - {\bf c}_B^TB^{-1}N){\bf x}_N &=& z_0\\ {\bf x}_B + B^{-1}N{\bf x}_N &=& B^{-1}{\bf b} \end{array}$

(In practice, the transformation is done by row operations. The inverse ${B^{-1}}$, though useful in theory,  need not be computed.) Since ${{\bf x}_N = {\bf 0}}$, the current BFS and ${z_0}$ can be read off directly. Now, suppose that the row vector ${{\bf c}_N^T - {\bf z}_N^T := {\bf c}_N^T - {\bf c}_B^TB^{-1}N}$ contains a positive entry corresponding to ${x_j}$. (They are called reduced cost coefficients.) Then by changing ${x_j}$ from ${0}$ to positive, we may increase the objective value (see below) . On the other hand, if ${{\bf c}_N^T - {\bf z}_N^T \leq {\bf 0}}$, it is easy to see that the current BFS is optimal, i.e. for any feasible solution ${{\bf x}}$, ${{\bf c}^T{\bf x} \leq z_0}$.

Consider again the case that ${[x_B, x_N]^T}$ is not optimal. If we increase ${x_j}$, some basic variables may decrease and we need to check that the new solution is feasible. Let us change notation and write ${B{\bf x}_B = {\bf b}}$ as

$x_{B_1}{\bf b}_1 + ... + x_{B_m}{\bf b}_m = {\bf b}$, —(1)

where ${B = [{\bf b}_1, ..., {\bf b}_m]}$. On the other hand,

$y_{1j}{\bf b}_1 + ... + y_{mj}{\bf b}_j - {\bf a}_j = {\bf 0}$ —(2)

for some uniquely determined coefficients ${y_{ij}}$. To get a new BFS, we consider (1) minus a suitable multiple of (2), so that the coefficent of ${\bf b}_r$ becomes ${0}$ and ${x_j}$ becomes positive. This leads to the feasibility creteria that ${r}$ should be chosen with

$\displaystyle \frac{x_{B_r}}{y_{rj}} = \min\{\frac{x_{B_i}}{y_{ij}}: y_{ij} > 0\}.$

If such an ${r}$ exists, we obtain a new BFS. We may then apply the optimality criteria to this new BFS and iterate, until we find an optimal solution. This is the basic idea of simplex method.

Theorem 2.6, 2.7. Let ${{\bf x}_B}$ be a BFS of ${(P)}$ with basic matrix ${B}$ and objective value ${z_0}$. If (i) there exists a column ${{\bf a}_j}$ not in B such that ${c_j > z_j}$ and (ii) at least one ${y_{ij} > 0}$, then it is possible to obtain a new BFS by replacing one column in ${B}$ by ${{\bf a}_j}$, and the new objective value is at least ${z}$. If (i) does not hold, then ${{\bf x}_B}$ is optimal.

The above procedure can be rephrased in terms of the tableau (Section 3.1 and 3.2). In theory, we should also show that the simplex method solves the LPP within a ‘reasonable’ number of steps. This interesting and non-trivial question lies beyond the scope of this course. Suffice it to say that simplex method works amazingly well in actual problems.

5. Implementation of simplex method

This course emphasizes the implementation issues of simplex method.

5.1 Initialization. Suppose we standardize a LPP in FCF to get ${[A \ I][{\bf x} \ {\bf x}_s]^T = {\bf b}}$, where ${{\bf x}_s}$ is the vector of slack variables. We may simply pick ${B = I}$ and ${{\bf x}_B = {\bf x}_s = {\bf x}}$. However, given an arbitrary LPP in SF, a BFS may not been seen directly. Mathematicians have developed two methods for initialization. Both methods involve introducing artificial variables ${{\bf x}_a}$, so that the constraint becomes ${A{\bf x} + {\bf x}_a = {\bf b}}$, ${{\bf x}, {\bf x}_a \geq {\bf 0}}$, and we may pick ${{\bf x}_a}$ as a BFS of the new system.

In the ${M}$-method (Section 3.3), we maximize instead the functional ${{\bf c}^Tx - M \cdot {\bf 1}^T {\bf x}_a}$, where ${M}$ is very large. The idea is that since ${-M}$ penalizes the objective value, the artificial variables will be eliminated during simplex iteration. If all artificial variables are eliminated, we have found a BFS of ${(P)}$, and we may continue with the usual simplex method.

In the two-phase method (Section 3.4), we maximize instead the functional ${- {\bf 1}^T {\bf x}_a}$. Observe that the optimal value of the new problem is ${0}$ if and only if (P) is feasible, and any optimal BFS is a BFS of (P).

5.2 Dealing with ‘bad’ situations. In Theorem 2.6, 2.7 we saw that we need a ${y_{ij} > 0}$ in order to get a new BFS. If ${y_{ij} \leq 0}$ for all ${j}$, we may go infinitely far in the direction of ${x_j}$, hence the feasible region is unbounded. If also ${z_j > c_j}$, then the optimal value is infinite (Theorem 4.1).

In some situations, at an optimal BFS, the plane corresponding to ${{\bf c}}$ may intersect ${F}$ on a non-degenerate face. This implies that there are infinitely many optimal solutions, and can be detected from the ${z_0}$ row of the optimal value (Section 4.3).

A more serious problem is degeneracy, which occurs when some constraints are redundant. This may lead to cycling (Section 4.4).

6. Duality theory

Duality is a beautiful idea in linear programming. To motivate the dual problem, suppose we are given a primal problem in FCF (here the formulas become more ‘symmetric’ if we use FCF):

$\displaystyle (P') \text{ maximize } z = {\bf c}^T{\bf x} \text{ subject to } A{\bf x} \leq {\bf b}, {\bf x} \geq {\bf 0}.$

Suppose we form a non-negative linear combination ${{\bf u}^TA}$ of the rows of ${A}$, so that the resulting row vector ${{\bf u}^TA}$ dominates ${{\bf c}^T}$. Taking transpose, we get

$\displaystyle A^T{\bf u} \geq {\bf c}.$

Then we obtain an upper bound of the optimal value (write this out yourself):

$\displaystyle z \leq u_0 = {\bf b}^T{\bf u}.$

The minimization of this upper bound is the dual problem.

Definition 5.1. Let ${(P')}$ be the primal problem in FCF. The dual problem of ${(P)}$ is the LPP

$\displaystyle (D') \text{ minimize } u_0 = {\bf b}^T{\bf u} \text{ subject to } A^T{\bf u} \geq {\bf c}, {\bf u} \geq {\bf 0}.$

It is easy to see that the dual of the dual of ${(P')}$ is ${(P')}$, hence the name duality.

Similar to the dual case, each feasible solution of the primal problem gives a lower bound of the optimal value, so the optimal value is somewhere in between. Naturally, we expect that the maximum lower bound equals the minimal upper bound, and the common value is the optimal value.

Theorem. With the above notations, we have the following:

(Weak duality, Theorem 5.2) If ${{\bf x}}$ is feasible to ${(P')}$ and ${{\bf u}}$ is feasible to ${(D')}$, then ${{\bf c}^T{\bf x} \leq {\bf b}^T {\bf u}}$. In particular, if equality holds, then both solutions are optimal.

(Strong duality, Theorem 5.4) If ${(P')}$ has an optimal solution, then so does ${(D')}$.

We note that theory of simplex method are used in the proof of the strong duality theorem.

Another nice relation between ${(P')}$ and ${(D')}$ is complementary slackness:

Theorem 5.7. If ${[{\bf x}, {\bf x}_s]^T}$ is optimal to ${(P')}$ and ${[{\bf u}, {\bf u}_s]^T}$ is optimal to ${(D')}$, then ${\bf x}^T {\bf u}_s + {\bf x}_s^T{\bf u} = {\bf 0}$. Conversely, if a pair of feasible solutions satisfy (5), then both solutions are optimal.

Motivated by the duality theorems, you might think that we may use both the primal and dual problems to device an algorithm. This is possible, and we have the dual simplex method. It is just the usual simplex method applied to the dual problem, but here we work with the primal tableau. Simplex iteration generates both a dual and a primal solution with the same objective value and reduced cost coefficients. The dual solution is always a BFS but the primal solution is not necessarily feasible. When we get a feasible primal solution, we are done by weak duality.

The dual formalism allows us to view an optimal solution in terms of the primal and the dual (Theorem 5.5). This observation, together with the dual simplex method, is useful in post-optimality/sensitivity analysis.

7. Transportation problems.

In the transportation problem ${(TP)}$, we distribute goods from the sources ${\{s_i\}_{i = 1}^m}$ through channels with cost coefficients ${\{c_{ij}\}_{1 \leq i \leq m, 1 \leq j \leq n}}$ to the destinations ${\{d_j\}_{j = 1}^n}$. We assume that supply equals demand, thus ${\sum_{i = 1}^m s_i = \sum_{j = 1}^n d_j}$. The problem is to minimize the transportation cost. Thus, if ${x_{ij}}$ denotes the amount of goods transported from source ${i}$ to destination ${j}$, the problem is

$\text{minimize } \sum_{i = 1}^m \sum_{j = 1}^n c_{ij}x_{ij}$

$\text{subjec to } \sum_{j = 1}^n x_{ij} = s_i, \sum_{i = 1}^m x_{ij} = d_j, x_{ij} \geq 0$, $1 \leq i \leq m, 1 \leq j \leq n$.

It is easy to show that ${(TP)}$ always possesses an optimal solution. Of course we may apply the usual simplex method, but here the point is that we may take advantage of the special structure of the problem and reduce greatly the computational cost. To see this, arrange the decision variables in the vector

$\displaystyle {\bf x}^T = [x_{11}, .., x_{1n}, x_{21}, ..., x_{2n}, ..., x_{m1}, ..., x_{mn}]^T$

and define ${A}$, ${{\bf b}}$ and ${{\bf c}}$ accordingly. Then ${A}$ is an ${m + n}$ by ${mn}$ matrix, and its column ${{\bf a}_{ij}}$ corresponding to ${x_{ij}}$ is simply

$\displaystyle {\bf a}_{ij} = {\bf e}_i + {\bf e}_{m+j}$

where ${\{{\bf e}_k\}}$ is the standard basis of ${{\mathbb{R}}^{n+m}}$. It is not difficult to see that ${Rank(A) = m + n - 1}$ (Theorem 6.1). The structure of $A$ is exploited in the following

Theorem 6.4 (Stepping stone theorem). Let ${B}$ consists of ${n + m - 1}$ independent columns of ${A}$. For any column ${{\bf a}_{ij}}$ of ${A}$, the unqiue representation ${{\bf a}_{ij} = \sum_{\alpha\beta \text{ in } B}y_{(\alpha \beta)(ij)}{\bf a}_{\alpha \beta}}$ takes the form

$\displaystyle {\bf a}_{ij} = {\bf a}_{ii_1} - {\bf a}_{i_2i_1} + {\bf a}_{i_2i_3} +- ... + (-1)^k{\bf a}_{i_kj}.$

In particular, each ${y_{(\alpha \beta)(ij)}}$ is either ${1, -1}$ or ${0}$.

The theorem can be interpreted on the transportation tableau (Section 6.3). Starting at ${(ij)}$, you jump around the tableau along the unique closed path ${[(ij), (ii_1), (i_2i_1), ..., (i_kj)]}$. Each ${(i_li_{l+1})}$ is then a stepping stone.

The stepping stone theorem allows us to calculate ${y_{(\alpha \beta)(ij)}}$ rather quickly by diagram chasing. Moreover, at the BFS ${{\bf x}_B = B^{-1}{\bf b}}$, the reduced cost coefficients are nothing but

$\displaystyle z_{ij} - c_{ij} = c_{ii_1} - c_{i_2i_1} + c_{i_2i_3} -+ ... - c_{ij}.$.

In other words, we may just add and minus along the path. This motivates the table in Section 6.4. We may then develop the corresponding simplex iteration. Note that the transportation tableau has dimensions ${m \times n}$ instead of ${(m + n) \times (mn + m + n)}$ in the usual simplex tableau, a great reduction.

﻿