Constrained Optimization and lagrange Multipliers
This tutorial presents an introduction to optimization problems that involve finding a maximum or a minimum value of an objective function f(x1,x2,…,xn) subject to a constraint of the form g(x1,x2,…,xn) = k.
Maximum and Minimum
Finding optimum values of the function f(x1,x2,…,xn) without a constraint is a well known problem dealt with in calculus courses. One would normally use the gradient to find stationary points. Then check all stationary and boundary points to find optimum values.
Example
- f(x,y) = 2x2 + y2
- fx(x,y) = 4x = 0
- fy(x,y) = 2y = 0
f(x,y) has one stationary point at (0,0).
The Hessian
A common method of determining wether or not a function has an extreme value at a stationary point is to evaluate the hessian[3] of the function at that point. where the hessian is defined as $H(f)= \begin{bmatrix} \frac{{\partial}^2 f}{{\partial}^2 x_1} & \frac{{\partial}^2 f}{\partial x_1 \partial x_2} & \dots & \frac{{\partial}^2 f}{\partial x_1 \partial x_n} \\ \frac{{\partial}^2 f}{\partial x_2 \partial x_1} & \frac{{\partial}^2f}{{\partial}^2 x_2}& \dots & \frac{{\partial}^2f}{\partial x_n \partial x_n}\\ \vdots & \vdots & \ddots & \vdots \\ \frac{{\partial}^2f}{\partial x_n \partial x_1} & \frac{{\partial}^2f}{\partial x_n \partial x_2}& \dots & \frac{{\partial}^2f}{{\partial}^2 x_n}\\ \end{bmatrix}$
Second Derivative Test
The Second derivative test determines the optimality of stationary point x according to the following rules [2]:
- If H(f) > 0 at point x then f has a local minimum at x
- If H(f) < 0 at point x then f has a local maximum at } x
- If H(f) has negative and positive eigenvalues then x is a saddle point
- Otherwise the test is inconclusive
In the above example. $H(f)=\begin{bmatrix} 4 & 0\\ 0& 2 \end{bmatrix}$
Therefore f(x,y) has a minimum at (0,0)
Constrained Maximum and Minimum
When finding the extreme values of f(x1,x2,⋯,xn) subject to a constraint g(x1,x2,⋯,xn) = k, the stationary points found above will not work. This new problem can be thought of as finding extreme values of f(x1,x2,…,xn) when the point (x1,x2,…,xn) is restricted to lie on the surface g(x1,x2,…,xn) = k. The value of f(x1,x2,…,xn) is maximized(minimized) when the surfaces touch each other,i.e , they have a common tangent line. This means that the surfaces gradient vectors at that point are parallel, hence,
- ∇f(x1,x2,…,xn) = λ∇g(x1,x2,…,xn)
The number λ in the equation is known as the Lagrange multiplier.
Lagrange multiplier method
The Lagrange multiplier methods solves the constrained optimization problem by transforming it into a non-constrained optimization problem of the from:
- L(x1,x2,…,xn,λ) = f(x1,x2,…,xn) + λ(k−g(x1,x2,…,xn))
Then finding the gradient and hessian as was done above will determine any optimum values of L(x1,x2,…,xn,λ).
Suppose we now want to find optimum values for f(x,y) = 2x2 + y2 subject to x + y = 1 from [2].
Then the Lagrangian method will result in a non-constrained function.
- L(x,y,λ) = 2x2 + y2 + λ(1−x−y)
The gradient for this new function is
- $\frac{\partial L}{\partial x}(x,y,\lambda)= 4x+\lambda (-1)=0$
- $\frac{\partial L}{\partial y}(x,y,\lambda)= 2y+\lambda (-1)=0$
- $\frac{\partial L}{\partial \lambda}(x,y,\lambda)=1-x-y=0$
Finding the stationary points of the above equations can be obtained from their matrix from.
$\begin{bmatrix} 4 & 0 & -1 \\ 0& 2 & -1 \\ 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} x\\ y \\ \lambda \end{bmatrix}= \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix}$
This results in x = 1/3, y = 2/3, λ = 4/3.
Next we can use the hessian as before to determine the type of this stationary point.
$H(L)= \begin{bmatrix} 4 & 0 & 0 \\ 0& 2 & 0 \\ 0&0&0 \end{bmatrix}$
Since H(L) > 0 then the solution (1/3,2/3,4/3) minimizes f(x,y) = 2x2 + y2 subject to x + y = 1 with f(x,y) = 2/3.
References
[1] T.K. Moon and W.C. Stirling. Mathematical Methods and Algorithms for Signal Processing. Prentice Hall. 2000.
[2]http://mat.gsia.cmu.edu/QUANT/NOTES/chap4/node1.html
[3]http://www.ece.tamu.edu/~chmbrlnd/Courses/ECEN601/ECEN601-Chap3.pdf