Unit C5: Optimization and Lagrange Multipliers
Find and classify the extrema of multivariable functions, search closed regions for global extrema, and optimize under one or two constraints with Lagrange multipliers.
We begin with critical points and the second derivatives test, extend to global extrema on closed bounded regions, and then build the method of Lagrange multipliers for one and two constraints. Every idea rests on a first-order condition: the gradient either vanishes or aligns with the constraint gradients. Finally we examine the degenerate cases in which the standard tests fail.
Critical Points
For a differentiable function, an interior local extremum can occur only where all first-order partial derivatives vanish. Such a point is called a critical point. A vanishing gradient is necessary but not sufficient, so a critical point may be a maximum, a minimum, or a saddle point.
Let $f(x,y)$ be defined on an open set $D\subseteq\mathbb{R}^2$. We say $f$ has a local maximum at $(a,b)$ if $f(x,y)\le f(a,b)$ for all $(x,y)$ in some disk centered at $(a,b)$, and a local minimum if the reverse inequality holds.
A point $(a,b)$ in $D$ is a critical point of $f$ if $f_x(a,b)=0$ and $f_y(a,b)=0$, or one of the partials fails to exist. The geometric content is that the tangent plane $z=f(a,b)+f_x(a,b)(x-a)+f_y(a,b)(y-b)$ is horizontal at an interior extremum where $f$ is differentiable.
Going deeper: why the gradient must vanish at an interior extremum
Suppose $f$ has a local maximum at the interior point $(a,b)$ and that both partials exist there. Restrict $f$ to the horizontal line $y=b$, giving the single-variable function $g(x)=f(x,b)$. Then $g$ has a local maximum at $x=a$, so by the single-variable Fermat theorem $g'(a)=0$. But $g'(a)$ is exactly the partial derivative $f_x(a,b)$:
$$g'(a)=\lim_{x\to a}\frac{f(x,b)-f(a,b)}{x-a}=f_x(a,b)=0.$$
The identical argument along $x=a$ with $h(y)=f(a,y)$ gives $h'(b)=f_y(a,b)=0$. Therefore $\nabla f(a,b)=\mathbf{0}$. A local minimum is handled the same way by applying the argument to $-f$.
Worked Example 1.1: locating critical points
Find all critical points of $f(x,y)=x^3+y^3-3xy$.
Compute the partials and set them to zero:
$$f_x=3x^2-3y=0,\qquad f_y=3y^2-3x=0.$$
From the first equation $y=x^2$. Substituting into the second gives $3x^4-3x=0$, so $3x(x^3-1)=0$, whose real solutions are $x=0$ and $x=1$. Then $y=x^2$ yields the points
$$(0,0)\quad\text{and}\quad(1,1).$$
Both satisfy $\nabla f=\mathbf{0}$, and because $f$ is a polynomial its partials exist everywhere, so these are the only critical points.
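As a quick sanity check on Worked Example 1.1, the sketch below evaluates the gradient at the two candidate points and scans a coarse grid for any other exact zeros. The helper name `grad_f` and the grid bounds are illustrative choices, not part of the text, and a grid scan can only support the algebra, not replace it.

```python
def grad_f(x, y):
    """Gradient of f(x, y) = x**3 + y**3 - 3*x*y."""
    return (3 * x**2 - 3 * y, 3 * y**2 - 3 * x)

# The two critical points found algebraically.
candidates = [(0, 0), (1, 1)]
gradients = [grad_f(x, y) for x, y in candidates]

# Scan the grid [-3, 3]^2 with step 1/4 for points where the gradient is exactly zero.
# Multiples of 1/4 are exact in binary floating point, so the comparison is safe here.
steps = [k / 4 for k in range(-12, 13)]
grid_zeros = [(x, y) for x in steps for y in steps if grad_f(x, y) == (0, 0)]

print(gradients)
print(grid_zeros)
```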
Worked Example 1.2: a function with no critical points
Find all critical points of $f(x,y)=x^2-2xy+y^2+2x$.
The partials are
$$f_x=2x-2y+2,\qquad f_y=-2x+2y.$$
Setting $f_y=0$ gives $x=y$. Substituting into $f_x=0$ gives $2x-2x+2=2\ne 0$, a contradiction. The system is therefore inconsistent, so $f$ has no critical points at all. This is genuine behaviour rather than a computational slip: the quadratic part $x^2-2xy+y^2=(x-y)^2$ is degenerate (a trough that is flat along the line $y=x$, not a bowl), and the linear term $2x$ tilts the floor of the trough so that it never levels off. A function need not have any critical point.
Worked Example 1.3: a non-differentiable critical point
Locate and describe the extremum of $f(x,y)=\sqrt{x^2+y^2}$, the distance from the origin.
Away from the origin the partials are
$$f_x=\frac{x}{\sqrt{x^2+y^2}},\qquad f_y=\frac{y}{\sqrt{x^2+y^2}},$$
which never both vanish, since $f_x^2+f_y^2=1$ wherever they are defined. So there is no point where $\nabla f=\mathbf{0}$. Yet $f$ has an obvious global minimum of $0$ at the origin. The resolution is that $f$ is not differentiable at $(0,0)$: the cone has a sharp tip there. The full definition of a critical point includes points where a partial derivative fails to exist, and this is exactly such a point.
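The Second Derivatives Test
The sign of the Hessian determinant, together with the sign of $f_{xx}$, classifies a nondegenerate critical point as a local maximum, a local minimum, or a saddle point. The test reads off the local quadratic shape of the surface.
Define the discriminant (the Hessian determinant) at $(a,b)$ by
$$D=f_{xx}(a,b)\,f_{yy}(a,b)-\big(f_{xy}(a,b)\big)^2.$$
Let $(a,b)$ be a critical point of $f$ with continuous second partials. The classification is:
If $D>0$ and $f_{xx}(a,b)>0$, then $f$ has a local minimum at $(a,b)$.
If $D>0$ and $f_{xx}(a,b)<0$, then $f$ has a local maximum at $(a,b)$.
If $D<0$, then $(a,b)$ is a saddle point.
If $D=0$, the test is inconclusive.
When $D>0$ the two eigenvalues of the Hessian share a sign, so $f_{yy}$ has the same sign as $f_{xx}$ and either may be used. When $D<0$ the eigenvalues have opposite signs, so $f$ rises in one direction through the point and falls in another.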
Going deeper: the test from the quadratic Taylor expansion
Near a critical point $(a,b)$, write $\Delta x=x-a$, $\Delta y=y-b$. Because $\nabla f(a,b)=\mathbf{0}$, the second-order Taylor expansion is
$$f(x,y)-f(a,b)\approx Q=\tfrac12\big(f_{xx}\,\Delta x^2+2f_{xy}\,\Delta x\,\Delta y+f_{yy}\,\Delta y^2\big),$$
with all second partials evaluated at $(a,b)$. Assume $f_{xx}\ne 0$ and complete the square in $\Delta x$:
$$Q=\frac{1}{2f_{xx}}\Big[(f_{xx}\,\Delta x+f_{xy}\,\Delta y)^2+(f_{xx}f_{yy}-f_{xy}^2)\,\Delta y^2\Big].$$
The bracket contains a perfect square plus $D\,\Delta y^2$. If $D>0$ the bracket is positive for all nonzero displacements, so $Q$ has the sign of $f_{xx}$: a minimum when $f_{xx}>0$, a maximum when $f_{xx}<0$. If $D<0$ the bracket changes sign: it is positive when $\Delta y=0$ and negative when $f_{xx}\,\Delta x=-f_{xy}\,\Delta y$ with $\Delta y\ne 0$, giving a saddle. This is exactly the test.
Worked Example 2.1: classifying critical points
Classify the critical points of $f(x,y)=x^3+y^3-3xy$ found earlier, namely $(0,0)$ and $(1,1)$.
The second partials are $f_{xx}=6x$, $f_{yy}=6y$, $f_{xy}=-3$, so $D=36xy-9$.
At $(0,0)$: $D=-9<0$, so the origin is a saddle point.
At $(1,1)$: $D=36-9=27>0$ and $f_{xx}=6>0$, so $(1,1)$ is a local minimum, with value $f(1,1)=-1$.
Worked Example 2.2: a function with several critical points
Find and classify all critical points of $f(x,y)=x^4+y^4-4xy+1$.
The partials are $f_x=4x^3-4y$ and $f_y=4y^3-4x$. Setting both to zero gives $y=x^3$ and $x=y^3$. Substituting, $x=(x^3)^3=x^9$, so $x^9-x=0$, that is $x(x^8-1)=0$. The real roots are $x=0,\ x=1,\ x=-1$, giving the critical points
$$(0,0),\qquad (1,1),\qquad (-1,-1).$$
The second partials are $f_{xx}=12x^2$, $f_{yy}=12y^2$, $f_{xy}=-4$, so $D=144x^2y^2-16$.
At $(0,0)$: $D=0-16=-16<0$, a saddle. At $(1,1)$ and $(-1,-1)$: $D=144-16=128>0$ with $f_{xx}=12>0$, so both are local minima, each with value $f=1+1-4+1=-1$. This is a classic two-well surface with a saddle ridge separating the wells.
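The classifications in Worked Examples 2.1 and 2.2 can be reproduced mechanically. The sketch below is a minimal implementation of the second derivatives test; the function name `classify` and its string labels are illustrative assumptions, and the second partials are entered by hand from the examples rather than computed symbolically.

```python
def classify(fxx, fyy, fxy):
    """Second derivatives test from the second partials at a critical point."""
    D = fxx * fyy - fxy**2
    if D > 0:
        # D > 0 forces fxx != 0, so the sign of fxx decides.
        return "local minimum" if fxx > 0 else "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive"

# Example 2.1: f = x^3 + y^3 - 3xy, with f_xx = 6x, f_yy = 6y, f_xy = -3.
ex21 = {p: classify(6 * p[0], 6 * p[1], -3) for p in [(0, 0), (1, 1)]}

# Example 2.2: f = x^4 + y^4 - 4xy + 1, with f_xx = 12x^2, f_yy = 12y^2, f_xy = -4.
ex22 = {p: classify(12 * p[0]**2, 12 * p[1]**2, -4)
        for p in [(0, 0), (1, 1), (-1, -1)]}

print(ex21)
print(ex22)
```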
Worked Example 2.3: a local maximum
Classify the critical point of $f(x,y)=4+2x+2y-x^2-y^2$.
Setting $f_x=2-2x=0$ and $f_y=2-2y=0$ gives the single critical point $(1,1)$. The second partials are $f_{xx}=-2$, $f_{yy}=-2$, $f_{xy}=0$, so
$$D=(-2)(-2)-0^2=4>0,\qquad f_{xx}=-2<0.$$
Therefore $(1,1)$ is a local maximum, with value $f(1,1)=4+2+2-1-1=6$. Completing the square confirms it: $f=6-(x-1)^2-(y-1)^2\le 6$, so the maximum is global as well.
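A common mistake is to judge a critical point from $f_{xx}$ and $f_{yy}$ alone. The test is decided by the discriminant, not by the individual second partials; it is the cross term $f_{xy}$ that can rescue or ruin the classification. Equally common is reading $D>0$ by itself as "minimum"; the sign of $f_{xx}$ must still be checked to tell a maximum from a minimum. And $D=0$ never means "saddle": it means the test is inconclusive and $f$ must be analysed directly along curves through the point.
Absolute Extrema on Closed Regions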
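A continuous function on a closed, bounded region attains its global extrema. They occur at interior critical points or on the boundary, so the strategy is to collect all interior critical values and all boundary extrema and then compare them.
The closed-region method has three steps:
1. Find the critical points of $f$ in the interior of the region and evaluate $f$ at each.
2. Find the extreme values of $f$ on the boundary.
3. Compare all the candidate values: the largest is the absolute maximum and the smallest is the absolute minimum.
On a boundary curve one typically parametrizes the edge and reduces to a single-variable optimization, then checks the endpoints of each edge as well.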
Worked Example 3.1: extrema on a rectangle
Find the absolute extrema of $f(x,y)=x^2+y^2-x-y$ on the square $R=[0,1]\times[0,1]$.
Interior. Setting $f_x=2x-1=0$ and $f_y=2y-1=0$ gives the single interior critical point $(\tfrac12,\tfrac12)$ with $f=\tfrac14+\tfrac14-\tfrac12-\tfrac12=-\tfrac12$.
Boundary. On $y=0$: $f=x^2-x$, minimized at $x=\tfrac12$ giving $-\tfrac14$, with endpoint values $f(0,0)=0$, $f(1,0)=0$. By symmetry the edges $x=0$, $x=1$, $y=1$ give the same range, with $f(1,1)=0$.
Compare. The candidate values are $-\tfrac12$, $-\tfrac14$, and $0$. The absolute minimum is $-\tfrac12$ at $(\tfrac12,\tfrac12)$ and the absolute maximum is $0$ at the four corners.
Worked Example 3.2: extrema on a triangular region
Find the absolute extrema of $f(x,y)=1+4x-5y$ on the closed triangle $T$ with vertices $(0,0)$, $(2,0)$, and $(0,3)$.
Interior. Since $f_x=4$ and $f_y=-5$ never vanish, there are no interior critical points. A nonconstant linear function on a polygon attains its extrema only on the boundary, and always at a vertex.
Edges. We still check each edge to be systematic. On $y=0$, $0\le x\le 2$: $f=1+4x$ runs from $1$ to $9$. On $x=0$, $0\le y\le 3$: $f=1-5y$ runs from $1$ down to $-14$. On the hypotenuse from $(2,0)$ to $(0,3)$, parametrize $x=2-2t$, $y=3t$, $t\in[0,1]$: $f=1+4(2-2t)-15t=9-23t$, decreasing from $9$ to $-14$.
Compare. The vertex values are $f(0,0)=1$, $f(2,0)=9$, $f(0,3)=-14$. The absolute maximum is $9$ at $(2,0)$ and the absolute minimum is $-14$ at $(0,3)$.
Worked Example 3.3: a curved boundary handled with substitution
Find the absolute extrema of $f(x,y)=x^2+2y^2-x$ on the closed disk $x^2+y^2\le 1$.
Interior. $f_x=2x-1=0$ and $f_y=4y=0$ give the single critical point $(\tfrac12,0)$, which lies inside the disk. There $f=\tfrac14-\tfrac12=-\tfrac14$.
Boundary. On $x^2+y^2=1$ replace $y^2=1-x^2$, with $-1\le x\le 1$. Then
$$f=x^2+2(1-x^2)-x=-x^2-x+2.$$
This single-variable function has $\tfrac{d}{dx}(-x^2-x+2)=-2x-1=0$ at $x=-\tfrac12$, giving $f=-\tfrac14+\tfrac12+2=\tfrac94$. Check the interval endpoints $x=\pm 1$ (where $y=0$): $f(1,0)=0$ and $f(-1,0)=2$.
Compare. Candidates are $-\tfrac14$, $\tfrac94$, $0$, $2$. The absolute minimum is $-\tfrac14$ at $(\tfrac12,0)$ and the absolute maximum is $\tfrac94$ at $\left(-\tfrac12,\pm\tfrac{\sqrt3}{2}\right)$.
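The three-step comparison in Worked Example 3.3 can be cross-checked numerically. The sketch below samples the boundary circle with a trigonometric parametrization (a different route from the substitution $y^2=1-x^2$ used in the text) and compares against the interior critical value; the sample count `n` is an arbitrary choice.

```python
import math

def f(x, y):
    return x**2 + 2 * y**2 - x

# Step 1: interior critical value at (1/2, 0), found from grad f = 0.
interior = f(0.5, 0.0)

# Step 2: sample the boundary x = cos t, y = sin t at n equally spaced angles.
n = 3600
boundary = [
    f(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
    for k in range(n)
]

# Step 3: compare all candidate values.
abs_min = min(interior, min(boundary))
abs_max = max(interior, max(boundary))

print(abs_min, abs_max)
```

The sampled maximum agrees with $\tfrac94$ only up to floating-point rounding, so it should be read as supporting the algebra rather than proving it.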
Going deeper: why a closed bounded region guarantees extrema
The extreme value theorem in two variables rests on two topological facts. First, a closed and bounded subset of $\mathbb{R}^2$ is compact (the Heine-Borel theorem). Second, a continuous function maps a compact set to a compact set, and a nonempty compact subset of $\mathbb{R}$ is closed and bounded, hence contains its supremum and infimum.
To see why both hypotheses are needed, drop one at a time. On the open disk $x^2+y^2<1$ the function $f=x$ has supremum $1$ but never attains it, because the boundary where $x=1$ is excluded. On the unbounded strip $0\le x$, $0\le y\le 1$ the function $f=x$ is continuous but has no maximum. Closedness supplies the boundary; boundedness keeps the values from escaping to infinity. Remove either and the guarantee fails.
This is exactly why the closed-region procedure must scan the entire boundary, not just interior critical points: the theorem promises the extremum exists somewhere on the compact set, and the boundary is precisely where it hides when no interior critical point wins.
Lagrange Multipliers: One Constraint
To extremize $f$ subject to a constraint $g=k$, look for points where the gradient of $f$ is parallel to the gradient of the constraint. At a constrained extremum the level curve of $f$ is tangent to the constraint curve, so the two gradients are scalar multiples of each other:
$$\nabla f=\lambda\,\nabla g,\qquad g=k,$$
where the scalar $\lambda$ is the Lagrange multiplier. In two variables this is the system $f_x=\lambda g_x$, $f_y=\lambda g_y$, $g(x,y)=k$, three equations in the three unknowns $x,y,\lambda$. The method requires $\nabla g\ne\mathbf{0}$ on the constraint set.
Going deeper: why the gradients must be parallel
Parametrize the constraint curve $g(x,y)=k$ as $\mathbf{r}(t)$, with the constrained extremum at $\mathbf{r}(t_0)$. Along the curve define $\phi(t)=f(\mathbf{r}(t))$. At a constrained extremum $\phi'(t_0)=0$. By the chain rule,
$$\phi'(t_0)=\nabla f\cdot \mathbf{r}'(t_0)=0,$$
so $\nabla f$ is orthogonal to the tangent $\mathbf{r}'(t_0)$. But $\nabla g$ is also orthogonal to that tangent, since $g$ is constant along the curve and $\tfrac{d}{dt}g(\mathbf{r}(t))=\nabla g\cdot\mathbf{r}'=0$. In the plane, two vectors orthogonal to the same nonzero direction are parallel, and since $\nabla g\ne\mathbf{0}$ we can write
$$\nabla f=\lambda\,\nabla g.$$
Worked Example 4.1: extremize on a circle
Find the extreme values of $f(x,y)=xy$ subject to $x^2+y^2=8$.
Here $g=x^2+y^2$, so $\nabla f=(y,x)$ and $\nabla g=(2x,2y)$. The Lagrange system is
$$y=2\lambda x,\qquad x=2\lambda y,\qquad x^2+y^2=8.$$
Substituting the first into the second gives $x=2\lambda(2\lambda x)=4\lambda^2 x$, so $x=0$ or $\lambda^2=\tfrac14$. If $x=0$ then $y=0$, which violates the constraint, so $\lambda=\pm\tfrac12$, giving $y=\pm x$.
With $y=x$: $2x^2=8$, so $x=\pm 2$ and $f=4$ at $(2,2)$ and $(-2,-2)$. With $y=-x$: $f=-4$ at $(2,-2)$ and $(-2,2)$. The maximum is $4$ and the minimum is $-4$.
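To confirm that the four points really satisfy the Lagrange system of Worked Example 4.1, the sketch below substitutes each solution back into the three equations. The helper `lagrange_residuals` is a hypothetical name introduced for this check only.

```python
def lagrange_residuals(x, y, lam):
    """Residuals of y = 2*lam*x, x = 2*lam*y and x^2 + y^2 = 8 for f = x*y."""
    return (y - 2 * lam * x, x - 2 * lam * y, x**2 + y**2 - 8)

# (x, y, lambda) for the four solutions: y = x pairs with lambda = 1/2,
# y = -x pairs with lambda = -1/2.
solutions = [(2, 2, 0.5), (-2, -2, 0.5), (2, -2, -0.5), (-2, 2, -0.5)]

residuals = [lagrange_residuals(*s) for s in solutions]
values = [x * y for x, y, _ in solutions]

print(residuals)
print(values)
```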
Worked Example 4.2: maximize a product under a sum constraint
Maximize $f(x,y,z)=xyz$ subject to $x+y+z=12$ with $x,y,z>0$.
Here $\nabla f=(yz,xz,xy)$ and $\nabla g=(1,1,1)$. The Lagrange system is
$$yz=\lambda,\qquad xz=\lambda,\qquad xy=\lambda,\qquad x+y+z=12.$$
From $yz=xz$ and $z>0$ we get $x=y$. From $xz=xy$ and $x>0$ we get $y=z$. Hence $x=y=z$, and the constraint gives $3x=12$, so $x=y=z=4$. On the closed triangle $x,y,z\ge 0$ of the plane, $f$ is continuous and vanishes on the edges, so this positive critical value is the maximum:
$$f(4,4,4)=64.$$
This is the three-variable AM-GM inequality in disguise: among positive numbers with a fixed sum, the product is largest when they are equal.
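A brute-force scan gives a rough check of Worked Example 4.2. The sketch below restricts $x$ and $y$ to multiples of $\tfrac14$ (an arbitrary grid chosen so that the arithmetic is exact in floating point) and sets $z=12-x-y$; it tests only grid points, so it illustrates the result rather than proving it.

```python
# Candidate values 0.25, 0.5, ..., 11.75 for x and y.
steps = [k / 4 for k in range(1, 48)]

# Keep only triples with z = 12 - x - y > 0 and record (product, triple).
best = max(
    (x * y * (12 - x - y), (x, y, 12 - x - y))
    for x in steps
    for y in steps
    if x + y < 12
)

print(best)
```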
Worked Example 4.3: an inequality proved by Lagrange
Find the extreme values of $f(x,y)=2x+y$ on the ellipse $x^2+4y^2=1$, and read off the resulting inequality.
With $g=x^2+4y^2$, $\nabla f=(2,1)$ and $\nabla g=(2x,8y)$. The system is
$$2=2\lambda x,\qquad 1=8\lambda y,\qquad x^2+4y^2=1.$$
So $x=\tfrac{1}{\lambda}$ and $y=\tfrac{1}{8\lambda}$. Substituting into the constraint, $\tfrac{1}{\lambda^2}+4\cdot\tfrac{1}{64\lambda^2}=1$, that is $\tfrac{1}{\lambda^2}\left(1+\tfrac{1}{16}\right)=1$, so $\lambda^2=\tfrac{17}{16}$ and $\lambda=\pm\tfrac{\sqrt{17}}{4}$.
Then $f=2x+y=\tfrac{2}{\lambda}+\tfrac{1}{8\lambda}=\tfrac{17}{8\lambda}=\pm\tfrac{17}{8}\cdot\tfrac{4}{\sqrt{17}}=\pm\tfrac{\sqrt{17}}{2}$. So the maximum is $\tfrac{\sqrt{17}}{2}$ and the minimum is $-\tfrac{\sqrt{17}}{2}$. Equivalently $|2x+y|\le \tfrac{\sqrt{17}}{2}$ for every point on the ellipse, a sharp bound attained at the two tangency points.
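As an independent check on this bound, one can apply the Cauchy-Schwarz inequality, a route the worked example itself does not use. The sketch below assumes only the constraint $x^2+4y^2=1$.

```latex
(2x+y)^2=\Big(2\cdot x+\tfrac12\cdot 2y\Big)^2
\le\Big(2^2+\big(\tfrac12\big)^2\Big)\Big(x^2+(2y)^2\Big)
=\tfrac{17}{4}\cdot 1,
\qquad\text{so}\qquad
|2x+y|\le\tfrac{\sqrt{17}}{2}.
```

Equality requires $(x,2y)$ to be proportional to $(2,\tfrac12)$, that is $x=8y$, which matches the tangency points $x=\tfrac{1}{\lambda}$, $y=\tfrac{1}{8\lambda}$ found by the Lagrange system.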
Lagrange Multipliers: Two Constraints
To extremize $f$ subject to two constraints $g=k_1$ and $h=k_2$, introduce one multiplier per constraint and solve
$$\nabla f=\lambda\,\nabla g+\mu\,\nabla h,\qquad g=k_1,\qquad h=k_2.$$
In three variables this gives five equations ($f_x=\lambda g_x+\mu h_x$ and its $y,z$ analogues, plus the two constraints) in the five unknowns $x,y,z,\lambda,\mu$. The method requires $\nabla g$ and $\nabla h$ to be linearly independent along the intersection curve.
The geometry: the intersection curve has tangent direction $\nabla g\times\nabla h$. At an extremum $\nabla f$ must be orthogonal to this tangent, which forces $\nabla f$ into the span of $\nabla g$ and $\nabla h$.
Worked Example 5.1: two plane constraints
Maximize $f(x,y,z)=x+2y+3z$ on the line where the plane $g=x+y+z=1$ meets the plane $h=x-y+z=0$.
Gradients: $\nabla f=(1,2,3)$, $\nabla g=(1,1,1)$, $\nabla h=(1,-1,1)$. The condition $\nabla f=\lambda\nabla g+\mu\nabla h$ gives the component equations
$$1=\lambda+\mu,\qquad 2=\lambda-\mu,\qquad 3=\lambda+\mu.$$
The first and third equations conflict ($1\ne 3$), so $\nabla f$ does not lie in the span of $\nabla g,\nabla h$ at any point. Geometrically $f$ is linear and the feasible set is a line: subtracting the constraints gives $y=\tfrac12$ and $x+z=\tfrac12$, so with $x=t$ we get $f=t+1+3(\tfrac12-t)=\tfrac52-2t$, which is unbounded in both directions. Hence $f$ has no finite extremum on the line. Because $\nabla g$ and $\nabla h$ are linearly independent everywhere here, the inconsistent Lagrange system correctly signals that no constrained extremum exists.
Worked Example 5.2: plane meets sphere
Find the extreme values of $f(x,y,z)=z$ on the intersection of the plane $g=x+y+z=0$ and the sphere $h=x^2+y^2+z^2=1$.
Gradients: $\nabla f=(0,0,1)$, $\nabla g=(1,1,1)$, $\nabla h=(2x,2y,2z)$. The system $\nabla f=\lambda\nabla g+\mu\nabla h$ reads
$$0=\lambda+2\mu x,\quad 0=\lambda+2\mu y,\quad 1=\lambda+2\mu z.$$
Subtracting the first two gives $2\mu(x-y)=0$, so $\mu=0$ (impossible, since then $\lambda=0$ and $1=0$) or $x=y$. With $x=y$, the plane gives $z=-2x$. The sphere gives $x^2+x^2+4x^2=1$, so $x^2=\tfrac16$, $x=\pm\tfrac{1}{\sqrt6}$. Then $z=-2x=\mp\tfrac{2}{\sqrt6}$. The intersection is a circle, which is closed and bounded, so both extremes are attained:
$$z_{\max}=\frac{2}{\sqrt6},\qquad z_{\min}=-\frac{2}{\sqrt6}.$$
Worked Example 5.3: nearest point on a line in space
Find the point on the line of intersection of the planes $x+y+z=1$ and $x-y+2z=2$ that is closest to the origin.
Minimize $f=x^2+y^2+z^2$ subject to $g=x+y+z=1$ and $h=x-y+2z=2$. With $\nabla f=(2x,2y,2z)$, $\nabla g=(1,1,1)$, $\nabla h=(1,-1,2)$, the condition $\nabla f=\lambda\nabla g+\mu\nabla h$ gives
$$2x=\lambda+\mu,\qquad 2y=\lambda-\mu,\qquad 2z=\lambda+2\mu.$$
So $x=\tfrac{\lambda+\mu}{2}$, $y=\tfrac{\lambda-\mu}{2}$, $z=\tfrac{\lambda+2\mu}{2}$. Impose the two constraints. The sum $x+y+z=1$ gives $\tfrac{3\lambda+2\mu}{2}=1$, that is $3\lambda+2\mu=2$. For the second, $x-y=\mu$ and $2z=\lambda+2\mu$, so $x-y+2z=\mu+(\lambda+2\mu)=\lambda+3\mu=2$.
Solve the linear system $3\lambda+2\mu=2$, $\lambda+3\mu=2$. From the second, $\lambda=2-3\mu$; substituting into the first, $3(2-3\mu)+2\mu=2$, so $6-7\mu=2$ and $\mu=\tfrac{4}{7}$. Then $\lambda=2-3\cdot\tfrac{4}{7}=\tfrac{2}{7}$. The closest point is
$$x=\tfrac{\lambda+\mu}{2}=\tfrac{3}{7},\quad y=\tfrac{\lambda-\mu}{2}=-\tfrac{1}{7},\quad z=\tfrac{\lambda+2\mu}{2}=\tfrac{5}{7}.$$
A quick check: both $x+y+z=\tfrac{3-1+5}{7}=1$ and $x-y+2z=\tfrac{3+1+10}{7}=2$ hold, confirming the solution sits on the line. Since $f\to\infty$ along the line in both directions, this single candidate is the minimum, at distance $\sqrt{5/7}$ from the origin.
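The arithmetic of Worked Example 5.3 can be replayed exactly with rational numbers. The sketch below solves the $2\times 2$ system by Cramer's rule (a swapped-in technique; the text uses substitution) and adds a geometric check that is not in the text: the closest point must be orthogonal to the direction of the line, $(1,1,1)\times(1,-1,2)=(3,-1,-2)$.

```python
from fractions import Fraction as F

# Solve 3*lam + 2*mu = 2 and lam + 3*mu = 2 by Cramer's rule.
det = 3 * 3 - 2 * 1
lam = F(2 * 3 - 2 * 2, det)
mu = F(3 * 2 - 1 * 2, det)

# Recover the point from 2x = lam + mu, 2y = lam - mu, 2z = lam + 2*mu.
x, y, z = (lam + mu) / 2, (lam - mu) / 2, (lam + 2 * mu) / 2
dist_sq = x**2 + y**2 + z**2

# Orthogonality to the line's direction vector (3, -1, -2).
dot_with_direction = 3 * x - y - 2 * z

print(lam, mu, (x, y, z), dist_sq, dot_with_direction)
```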
Going deeper: why the gradient lies in the span of the constraint gradients
Suppose $g$ and $h$ are smooth with $\nabla g$ and $\nabla h$ linearly independent at a point $\mathbf{x}_0$ on the intersection curve $C=\{g=k_1,\ h=k_2\}$. By the implicit function theorem $C$ is a smooth curve near $\mathbf{x}_0$, and its tangent direction is $\mathbf{T}=\nabla g\times\nabla h$, which is nonzero by independence and orthogonal to both surface normals.
Parametrize $C$ as $\mathbf{r}(t)$ with $\mathbf{r}(t_0)=\mathbf{x}_0$, and set $\phi(t)=f(\mathbf{r}(t))$. At a constrained extremum, $\phi'(t_0)=\nabla f\cdot\mathbf{r}'(t_0)=0$, so $\nabla f\perp\mathbf{T}$.
Now $\{\nabla g,\nabla h,\mathbf{T}\}$ is a basis for $\mathbb{R}^3$: $\mathbf{T}$ is perpendicular to the plane spanned by $\nabla g$ and $\nabla h$. Any vector perpendicular to $\mathbf{T}$ must therefore lie in that plane. Since $\nabla f\perp\mathbf{T}$, we conclude $\nabla f=\lambda\nabla g+\mu\nabla h$ for some scalars. The independence of $\nabla g,\nabla h$ is exactly what guarantees the plane is two-dimensional and the multipliers are uniquely determined.
Applications
In applications the multiplier is more than a computational device; it is the rate of change of the optimal value with respect to the constraint level:
$$\lambda=\frac{df^{*}}{dk},$$
where $f^{*}(k)$ is the optimal value of $f$ when the constraint is $g=k$. Thus $\lambda$ measures how much the optimum improves per unit relaxation of the constraint, the shadow price in economics.
Worked Example 6.1: minimal-surface box
Find the dimensions of the closed rectangular box of volume $V=32$ with minimum surface area.
Minimize $S=2(xy+yz+zx)$ subject to $g=xyz=32$, with $x,y,z>0$. With $\nabla S=\lambda\nabla g$:
$$2(y+z)=\lambda yz,\quad 2(x+z)=\lambda xz,\quad 2(x+y)=\lambda xy.$$
Multiply the first by $x$, the second by $y$, the third by $z$. Each right-hand side becomes $\lambda xyz$, so $x(y+z)=y(x+z)=z(x+y)$. The first equality gives $xz=yz$, hence $x=y$; the second gives $xy=xz$, hence $y=z$. Then $x^3=32$, so the optimum is the cube $x=y=z=32^{1/3}\approx 3.17$, with surface area $S=6\cdot 32^{2/3}\approx 60.5$. (A volume of $V=27$ would give the cleaner cube $x=y=z=3$.) This is the expected result: the cube minimizes surface area for fixed volume.
Worked Example 6.2: closest point on a plane
Find the point on the plane $x+2y+3z=6$ closest to the origin.
Minimize $f=x^2+y^2+z^2$ subject to $g=x+2y+3z=6$. The condition $\nabla f=\lambda\nabla g$ gives $2x=\lambda$, $2y=2\lambda$, $2z=3\lambda$, so $x=\tfrac{\lambda}{2}$, $y=\lambda$, $z=\tfrac{3\lambda}{2}$.
Substituting into the constraint: $\tfrac{\lambda}{2}+2\lambda+\tfrac{9\lambda}{2}=6$, that is $7\lambda=6$, so $\lambda=\tfrac{6}{7}$. The closest point is
$$\left(\tfrac{3}{7},\tfrac{6}{7},\tfrac{9}{7}\right),\qquad \text{distance}=\frac{6}{\sqrt{14}}.$$
Worked Example 6.3: maximum-volume box with fixed surface area
An open-top rectangular box (no lid) is to be built from $12$ square metres of material. Find the dimensions that maximize the volume.
Let the base be $x$ by $y$ and the height $z$. The volume is $V=xyz$. An open-top box has base area $xy$ and four sides of total area $2xz+2yz$, so the surface constraint is
$$g=xy+2xz+2yz=12.$$
The system $\nabla V=\lambda\nabla g$ is $yz=\lambda(y+2z)$, $xz=\lambda(x+2z)$, $xy=\lambda(2x+2y)$. Subtracting the first two, $z(y-x)=\lambda(y-x)$, so $x=y$ or $z=\lambda$. The branch $z=\lambda$ is impossible: the first equation would become $yz=z(y+2z)$, forcing $z=0$. With $x=y$ the third equation gives $x^2=4\lambda x$, so $\lambda=\tfrac{x}{4}$, and the second becomes $xz=\tfrac{x}{4}(x+2z)$, that is $4z=x+2z$, so $x=y=2z$. Substituting into the constraint, $4z^2+4z^2+4z^2=12$, that is $12z^2=12$, so $z=1$, $x=y=2$.
The maximal volume is $V=2\cdot 2\cdot 1=4$ cubic metres. Note that the base is square with side twice the height, the standard shape for an open box.
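A coarse numerical search offers a sanity check on Worked Example 6.3. The sketch below eliminates $z$ through the surface constraint, a substitution not used in the text, and scans base dimensions on a grid with step $0.05$; the function name `volume` and the grid are illustrative assumptions.

```python
def volume(x, y):
    """Volume of the open box after solving xy + 2xz + 2yz = 12 for z."""
    z = (12 - x * y) / (2 * (x + y))
    return x * y * z

# Base dimensions 0.05, 0.10, ..., 4.95; keep only those with positive height.
grid = [k / 20 for k in range(1, 100)]
best = max(
    (volume(x, y), x, y)
    for x in grid
    for y in grid
    if x * y < 12
)

print(best)
```

The grid happens to contain the exact optimizer $x=y=2$, so the search lands on it; on a grid that missed it, the result would only approximate the true maximum.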
Worked Example 6.4: maximize utility on a budget (economics)
A consumer maximizes the Cobb-Douglas utility $U(x,y)=x^{1/2}y^{1/2}$ subject to the budget $p_x x+p_y y=I$, with prices $p_x=2$, $p_y=4$ and income $I=80$.
The condition $\nabla U=\lambda\nabla g$ with $g=2x+4y$ gives
$$\tfrac12 x^{-1/2}y^{1/2}=2\lambda,\qquad \tfrac12 x^{1/2}y^{-1/2}=4\lambda.$$
Dividing the first by the second eliminates $\lambda$: $\dfrac{y}{x}=\dfrac{2}{4}=\dfrac12$, so $x=2y$. The budget $2(2y)+4y=80$ gives $8y=80$, so $y=10$ and $x=20$. The optimal bundle is $(20,10)$ with utility $U=\sqrt{200}=10\sqrt2$.
The multiplier here is the marginal utility of income: from the second equation $\lambda=\tfrac{1}{8}x^{1/2}y^{-1/2}=\tfrac{1}{8}\sqrt{20/10}=\tfrac{\sqrt2}{8}$, the rate at which the maximal utility rises per extra dollar of income.
Going deeper: deriving the shadow-price interpretation of $\lambda$
Let $\mathbf{x}^{*}(k)$ be the optimizer of $f$ subject to $g(\mathbf{x})=k$, and let $f^{*}(k)=f(\mathbf{x}^{*}(k))$ be the optimal value as the constraint level $k$ varies. Differentiate $f^{*}$ using the chain rule:
$$\frac{df^{*}}{dk}=\nabla f(\mathbf{x}^{*})\cdot\frac{d\mathbf{x}^{*}}{dk}.$$
At the optimizer the Lagrange condition gives $\nabla f=\lambda\nabla g$, so
$$\frac{df^{*}}{dk}=\lambda\,\nabla g(\mathbf{x}^{*})\cdot\frac{d\mathbf{x}^{*}}{dk}.$$
Now differentiate the constraint identity $g(\mathbf{x}^{*}(k))=k$ with respect to $k$: the left side is $\nabla g\cdot\tfrac{d\mathbf{x}^{*}}{dk}$ and the right side is $1$. Substituting,
$$\frac{df^{*}}{dk}=\lambda\cdot 1=\lambda.$$
So the multiplier is exactly the sensitivity of the optimal value to a unit relaxation of the constraint, the shadow price. In Worked Example 6.4 this is the marginal utility of income; in a production problem it is the marginal value of one more unit of a scarce resource.
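The identity $\tfrac{df^{*}}{dk}=\lambda$ can be checked numerically on Worked Example 6.4. The sketch below assumes the tangency condition $p_x x=p_y y$ from that example, so half of the income is spent on each good, and estimates the slope of the optimal utility by a central difference; the function name `optimal_utility` and the step `h` are illustrative choices.

```python
import math

def optimal_utility(income, px=2.0, py=4.0):
    """Maximal utility sqrt(x*y) on the budget px*x + py*y = income.

    Uses px*x = py*y, so each good receives half of the income.
    """
    x = income / (2 * px)
    y = income / (2 * py)
    return math.sqrt(x * y)

lam = math.sqrt(2) / 8   # multiplier computed in Worked Example 6.4

# Central-difference estimate of d(U*)/dI at I = 80.
h = 1e-4
slope = (optimal_utility(80 + h) - optimal_utility(80 - h)) / (2 * h)

print(slope, lam)
```

Here the optimal utility is linear in income, $U^{*}(I)=\tfrac{\sqrt2}{8}I$, so the finite difference agrees with $\lambda$ up to rounding.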
Going Deeper
Second-order conditions for constrained problems can be unified through the bordered Hessian and the theory of quadratic forms. Degenerate cases ($D=0$, or a vanishing constraint gradient) require higher-order analysis or a direct argument.
Write $\bar H$ for the determinant of the bordered Hessian at a constrained critical point. The point is a local maximum when $\bar H>0$ and a local minimum when $\bar H<0$, the sign convention being opposite to the unconstrained test because the border row encodes the tangency.
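For reference, one common convention for $\bar H$ in two variables with one constraint is sketched below. The Lagrangian symbol $L=f-\lambda g$ and this particular matrix layout are assumptions, since the definition is not reproduced in this section; the formula is then checked against Worked Example 4.1.

```latex
\bar H=\det\begin{pmatrix}
0 & g_x & g_y\\
g_x & L_{xx} & L_{xy}\\
g_y & L_{xy} & L_{yy}
\end{pmatrix}
=-g_x^2L_{yy}+2g_xg_yL_{xy}-g_y^2L_{xx},
\qquad L=f-\lambda g.

% Worked Example 4.1: f=xy, g=x^2+y^2, so L_{xx}=L_{yy}=-2\lambda and L_{xy}=1.
\text{At }(2,2),\ \lambda=\tfrac12:\quad
\bar H=-16(-1)+2(16)(1)-16(-1)=64>0\quad(\text{maximum, } f=4).

\text{At }(2,-2),\ \lambda=-\tfrac12:\quad
\bar H=-16(1)+2(-16)(1)-16(1)=-64<0\quad(\text{minimum, } f=-4).
```

Under this convention the signs agree with the extreme values $\pm 4$ found on the circle $x^2+y^2=8$.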
When the second derivatives test fails ($D=0$) the quadratic terms do not determine the shape and one must inspect $f$ along curves through the point. The function $f(x,y)=x^4+y^4$ has $D=0$ at the origin yet a clear minimum, while $f(x,y)=x^3$ has $D=0$ at the origin and no extremum, showing both outcomes are possible.
Going deeper: a degenerate critical point深入探讨:一个退化的临界点
Consider $f(x,y)=x^2-y^4$ at the origin. The first partials $f_x=2x$, $f_y=-4y^3$ both vanish at $(0,0)$, so it is critical. The second partials give $f_{xx}=2$, $f_{yy}=-12y^2=0$, $f_{xy}=0$ at the origin, hence考察 $f(x,y)=x^2-y^4$ 在原点处。一阶偏导数 $f_x=2x$、$f_y=-4y^3$ 在 $(0,0)$ 处都为零,故为临界点。二阶偏导数在原点给出 $f_{xx}=2$、$f_{yy}=-12y^2=0$、$f_{xy}=0$,因此
$$D=(2)(0)-0^2=0,$$
and the test is inconclusive. Inspect directly: along the $x$-axis $f=x^2\ge 0$ rises, while along the $y$-axis $f=-y^4\le 0$ falls. Since $f$ takes both signs arbitrarily near the origin, the origin is a saddle even though $D=0$. The lesson is that a zero discriminant demands a direct path analysis.判别法无结论。直接考察:沿 $x$ 轴 $f=x^2\ge 0$ 上升,而沿 $y$ 轴 $f=-y^4\le 0$ 下降。由于 $f$ 在原点任意附近都取两种符号,尽管 $D=0$,原点仍是鞍点。教训是:判别式为零就要求直接的路径分析。
Worked Example 7.1: when Lagrange misses a cusp例题 7.1:拉格朗日法漏掉尖点之时
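Extremize $f(x,y)=x$ on the constraint $g=x^3-y^2=0$. The gradient $\nabla g=(3x^2,-2y)$ vanishes at the origin $(0,0)$, which lies on the curve. The Lagrange condition $\nabla f=\lambda\nabla g$ gives $1=3\lambda x^2$ and $0=-2\lambda y$. The first equation forces $\lambda\ne 0$ and $x\ne 0$, so the second forces $y=0$; but on the curve $y=0$ implies $x=0$, a contradiction. The system therefore has no solution anywhere on the curve, and in particular none at the cusp, where $\nabla g=\mathbf{0}$.在约束 $g=x^3-y^2=0$ 上求 $f(x,y)=x$ 的极值。梯度 $\nabla g=(3x^2,-2y)$ 在原点 $(0,0)$ 处为零,而原点在曲线上。拉格朗日条件 $\nabla f=\lambda\nabla g$ 给出 $1=3\lambda x^2$ 与 $0=-2\lambda y$。第一个方程要求 $\lambda\ne 0$ 且 $x\ne 0$,故第二个方程要求 $y=0$;但在曲线上 $y=0$ 意味着 $x=0$,矛盾。因此该方程组在曲线上处处无解,特别是在 $\nabla g=\mathbf{0}$ 的尖点处无解。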
Yet the curve has a cusp at the origin and $x\ge 0$ along it (since $x^3=y^2\ge 0$), so $x=0$ is the constrained minimum, attained at the very point where $\nabla g=\mathbf{0}$. This shows the Lagrange method can miss extrema where the constraint gradient degenerates, so such points must be checked separately.然而曲线在原点有一个尖点,且沿曲线 $x\ge 0$(因为 $x^3=y^2\ge 0$),所以 $x=0$ 是条件极小值,恰好在 $\nabla g=\mathbf{0}$ 的那一点达到。这说明拉格朗日法会漏掉约束梯度退化处的极值,因此这类点必须单独检查。
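The behaviour at the cusp can be seen by walking along the curve. The sketch below assumes the parametrization $(x,y)=(t^2,t^3)$, which satisfies $x^3=y^2$ for every $t$; the parametrization, the sample grid, and the helper names are choices made for this illustration only.

```python
def g(x, y):
    """Constraint function from the example: g = x^3 - y^2."""
    return x**3 - y**2


def grad_g(x, y):
    """Gradient of g: (3x^2, -2y)."""
    return (3.0 * x**2, -2.0 * y)


def curve(t):
    """Assumed parametrization (t^2, t^3) of the curve x^3 = y^2."""
    return (t**2, t**3)


# Sample the curve on both branches, including t = 0 (the cusp).
ts = [i / 10.0 for i in range(-20, 21)]
points = [curve(t) for t in ts]

# Every sampled point should satisfy the constraint up to rounding.
on_curve = all(abs(g(x, y)) < 1e-9 for x, y in points)

# f(x, y) = x, so the constrained minimum is the sampled point with the smallest x.
x_min, y_min = min(points, key=lambda p: p[0])

print(on_curve, (x_min, y_min), grad_g(x_min, y_min))
```

The smallest sampled $x$ occurs at $t=0$, and the constraint gradient evaluated there is the zero vector, which is why the Lagrange system cannot detect this point.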
Worked Example 7.2: resolving a degenerate test by path analysis例题 7.2:用路径分析解决退化判别
The origin is a critical point of $f(x,y)=x^2+y^4$ and also of $h(x,y)=x^2-y^4$. Both have $D=0$ there, since $f_{yy}=12y^2=0$ and $h_{yy}=-12y^2=0$ at the origin. The discriminant cannot tell them apart, so analyse each directly.原点既是 $f(x,y)=x^2+y^4$ 的临界点,也是 $h(x,y)=x^2-y^4$ 的临界点。两者在那里都有 $D=0$,因为原点处 $f_{yy}=12y^2=0$、$h_{yy}=-12y^2=0$。判别式无法区分它们,故对每个直接分析。
For $f=x^2+y^4$: every term is nonnegative and $f(0,0)=0$, so $f\ge 0$ with equality only at the origin. The origin is a strict local (indeed global) minimum.对 $f=x^2+y^4$:每一项都非负且 $f(0,0)=0$,故 $f\ge 0$,仅在原点取等。原点是严格的局部(实为全局)极小值。
For $h=x^2-y^4$: along the $x$-axis $h=x^2\ge 0$, but along the $y$-axis $h=-y^4\le 0$. The function takes both signs in every neighbourhood of the origin, so the origin is a saddle. Identical second-order data, opposite conclusions: this is the precise sense in which $D=0$ carries no information.对 $h=x^2-y^4$:沿 $x$ 轴 $h=x^2\ge 0$,但沿 $y$ 轴 $h=-y^4\le 0$。函数在原点的每个邻域内都取两种符号,故原点是鞍点。二阶数据完全相同,结论却相反:这正是 $D=0$ 不携带任何信息的确切含义。
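The path analysis above can be mirrored in a few lines of code. This is a rough sketch: the helper `signs_along` and its sample step sizes are hypothetical choices, and sampling only gives evidence. Finding both signs does establish a saddle, but finding one sign along a few lines does not prove a minimum (for $f$ that proof is the nonnegativity argument given above).

```python
def sign(v):
    """Return 1, 0, or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def signs_along(func, direction, steps=(1e-1, 1e-2, 1e-3)):
    """Signs of func(t*d) - func(0, 0) for small t of both signs along direction d."""
    dx, dy = direction
    base = func(0.0, 0.0)
    return {sign(func(t * dx, t * dy) - base) for s in steps for t in (s, -s)}


def f(x, y):
    return x**2 + y**4


def h(x, y):
    return x**2 - y**4


for name, func in (("f", f), ("h", h)):
    along_x = signs_along(func, (1.0, 0.0))  # path along the x-axis
    along_y = signs_along(func, (0.0, 1.0))  # path along the y-axis
    print(name, "x-axis:", along_x, "y-axis:", along_y)
```

For $f$ both axes report only positive differences, while for $h$ the $x$-axis reports positive and the $y$-axis negative, matching the minimum and saddle conclusions.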
Worked Example 7.3: confirming a constrained max with the bordered Hessian例题 7.3:用加边黑塞矩阵确认条件极大值
Maximize $f(x,y)=xy$ subject to $g=x+y=10$, and verify the candidate is a maximum using the bordered Hessian.在 $g=x+y=10$ 下最大化 $f(x,y)=xy$,并用加边黑塞矩阵验证该候选是极大值。
Lagrange: $\nabla f=(y,x)$, $\nabla g=(1,1)$, so $y=\lambda$, $x=\lambda$, and $x+y=10$ gives $x=y=5$, $\lambda=5$, with $f=25$. Now build the bordered Hessian. Here $g$ is linear so $g_{xx}=g_{yy}=g_{xy}=0$, and $f_{xx}=0$, $f_{yy}=0$, $f_{xy}=1$. With $g_x=g_y=1$,拉格朗日:$\nabla f=(y,x)$,$\nabla g=(1,1)$,故 $y=\lambda$、$x=\lambda$,而 $x+y=10$ 给出 $x=y=5$、$\lambda=5$,$f=25$。现在构造加边黑塞矩阵。此处 $g$ 是线性的,故 $g_{xx}=g_{yy}=g_{xy}=0$,且 $f_{xx}=0$、$f_{yy}=0$、$f_{xy}=1$。由 $g_x=g_y=1$,
$$\bar H=\det\begin{pmatrix}0&1&1\\ 1&0&1\\ 1&1&0\end{pmatrix}.$$
Expanding along the first row: $\bar H=0\cdot(0-1)-1\cdot(0-1)+1\cdot(1-0)=0+1+1=2>0$. By the sign convention ($\bar H>0$ means a constrained local maximum), $(5,5)$ is a maximum, confirming the obvious result that the product of two numbers with fixed sum is largest when they are equal.沿第一行展开:$\bar H=0\cdot(0-1)-1\cdot(0-1)+1\cdot(1-0)=0+1+1=2>0$。按符号约定($\bar H>0$ 表示条件局部极大值),$(5,5)$ 是极大值,验证了显然的结论:和固定的两数之积在相等时最大。
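The determinant can also be checked mechanically. The sketch below assumes the layout used in this example: a border of $(0,g_x,g_y)$ around the second partials of $L=f-\lambda g$, which here reduce to those of $f$ because $g$ is linear. The helper names `det3` and `bordered_hessian` are illustrative, not a standard API.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f3), (p, q, r) = m
    return a * (e * r - f3 * q) - b * (d * r - f3 * p) + c * (d * q - e * p)


def bordered_hessian(gx, gy, lxx, lxy, lyy):
    """Assumed layout: border (0, gx, gy) around the Hessian of L = f - lambda*g."""
    return [
        [0.0, gx, gy],
        [gx, lxx, lxy],
        [gy, lxy, lyy],
    ]


# Worked Example 7.3: g_x = g_y = 1, L_xx = L_yy = 0, L_xy = 1.
H = bordered_hessian(1.0, 1.0, 0.0, 1.0, 0.0)
print(det3(H))


def restricted(x):
    """f restricted to the constraint line y = 10 - x, as an independent check."""
    return x * (10.0 - x)


# Nearby points on the constraint should not beat the candidate x = 5.
print(restricted(4.9), restricted(5.0), restricted(5.1))
```

Substituting the constraint turns the problem into the one-variable parabola $x(10-x)$, whose vertex at $x=5$ agrees with the positive bordered determinant.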
Readiness Checklist备考清单
- Find all critical points of a function of two variables by solving $\nabla f=\mathbf{0}$.通过求解 $\nabla f=\mathbf{0}$ 找出二元函数的所有临界点。
- Compute the discriminant $D$ and classify a critical point as a max, min, or saddle.计算判别式 $D$ 并把临界点判别为极大、极小或鞍点。
- Explain why the second derivatives test is inconclusive when $D=0$ and analyze such a point directly.解释为何 $D=0$ 时二阶导数判别法无结论,并直接分析这类点。
- Find the absolute extrema of a continuous function on a closed bounded region.求连续函数在闭有界区域上的全局极值。
- Set up and solve the Lagrange system $\nabla f=\lambda\nabla g$ for a single constraint.为单约束建立并求解拉格朗日方程组 $\nabla f=\lambda\nabla g$。
- Set up the two-constraint system $\nabla f=\lambda\nabla g+\mu\nabla h$ and solve it.建立双约束方程组 $\nabla f=\lambda\nabla g+\mu\nabla h$ 并求解。
- Interpret the Lagrange multiplier as the sensitivity of the optimal value to the constraint.把拉格朗日乘数诠释为最优值对约束的灵敏度。
- Recognize degenerate cases where $\nabla g=\mathbf{0}$ and check those points separately.识别 $\nabla g=\mathbf{0}$ 的退化情形并单独检查这些点。