抱歉,您的浏览器无法访问本站
本页面需要浏览器支持(启用)JavaScript
了解详情 >

定义

A function f:RnRf: \mathbf{R}^n \rightarrow \mathbf{R} is convex if domf\operatorname{dom} f is a convex set and if for all xx, ydomfy \in \operatorname{dom} f, and θ\theta with 0θ10 \leq \theta \leq 1, we have

f(θx+(1θ)y)θf(x)+(1θ)f(y)f(\theta x+(1-\theta) y) \leq \theta f(x)+(1-\theta) f(y)

严格下凸(strictly convex)函数:当 xyx\not=y,则不等号严格成立。

严格上凸(strictly concave)函数:f-f 严格下凸。

为什么定义域要求是凸集,因为 θx+(1θ)y\theta x+(1-\theta)y 永远在定义域内。

扩展

It is often convenient to extend a convex function to all of Rn\mathbf{R}^n by defining its value to be \infty outside its domain. If ff is convex we define its extended-value extension f~:RnR{}\tilde{f}: \mathbf{R}^n \rightarrow \mathbf{R} \cup\{\infty\} by

f~(x)={f(x)xdomfxdomf\tilde{f}(x)= \begin{cases}f(x) & x \in \operatorname{dom} f \\ \infty & x \notin \operatorname{dom} f\end{cases}

The extension f~\tilde{f} is defined on all Rn\mathbf{R}^n, and takes values in R{}\mathbf{R} \cup\{\infty\}. We can recover the domain of the original function ff from the extension f~\tilde{f} as domf={xf~(x)<\operatorname{dom} f=\{x \mid \tilde{f}(x)< }\infty\}.

The extension can simplify notation, since we do not need to explicitly describe the domain, or add the qualifier 'for all xdomfx \in \operatorname{dom} f ’ every time we refer to f(x)f(x). Consider, for example, the basic defining inequality (3.1). In terms of the extension f~\tilde{f}, we can express it as: for 0<θ<10<\theta<1

f~(θx+(1θ)y)θf~(x)+(1θ)f~(y)\tilde{f}(\theta x+(1-\theta) y) \leq \theta \tilde{f}(x)+(1-\theta) \tilde{f}(y)

示性函数 Indicator

I~C(x)={0xCxC\tilde{I}_C(x)= \begin{cases}0 & x \in C \\ \infty & x \notin C\end{cases}

扩展到比较小的数总有反例。

一阶条件

Suppose ff is differentiable (i.e., its gradient f\nabla f exists at each point in domf\operatorname{dom} f, which is open). Then ff is convex if and only if domf\operatorname{dom} f is convex and

f(y)f(x)+f(x)T(yx)f(y) \geq f(x)+\nabla f(x)^T(y-x)

holds for all x,ydomfx, y \in \operatorname{dom} f. This inequality is illustrated in figure 3.23.2.

如果 f(x)=0\nabla f(x)=0,则是最小值点。也可以是最优解集。

一阶条件和定义等价

一阶条件推定义

x,ydomfty+(1t)xdomft~y+(1t~)xdomf\forall x,y \in \operatorname{dom} f\\ ty+(1-t)x \in \operatorname{dom} f\\ \tilde{t}y+(1-\tilde{t})x \in \operatorname{dom} f

一阶条件:

f(ty+(1t)x)f(t~y+(1t~)x)+f(t~y+(1t~)x)T(ty+(1t)xt~y+(1t~)x)=f(t~y+(1t~)x)+f(t~y+(1t~)x)T(yx)(tt~)f(ty+(1-t)x) \ge f(\tilde{t}y+(1-\tilde{t})x)+\nabla f(\tilde{t}y+(1-\tilde{t})x)^T(ty+(1-t)x-\tilde{t}y+(1-\tilde{t})x)\\=f(\tilde{t}y+(1-\tilde{t})x)+\nabla f(\tilde{t}y+(1-\tilde{t})x)^T(y-x)(t- \tilde{t})

定义要求 f(ty+(1t)x)tf(y)+(1t)f(x)f(ty+(1-t)x) \ge tf(y)+(1-t)f(x)

定义 g(t)=f(ty+(1t)x),g(t~)=f(t~y+(1t~)x)g(t)=f(ty+(1-t)x),g(\tilde{t})=f(\tilde{t}y+(1-\tilde{t})x)

g(t~)=f(t~y+(1t~)x)T(yx)g'(\tilde{t})=\nabla f(\tilde{t}y+(1-\tilde{t})x)^T(y-x)

g(t)g(t~)+g(t~)(tt~)g(t)\ge g(\tilde{t})+g'(\tilde t)(t-\tilde{t})

f(x)f(x) 为凸。

二阶条件

f:RnRf:\R ^n\to \R 二阶可微,则 ff 为凸 \Leftrightarrow domf\operatorname{dom} f 为凸,2f(x)0xdomf\nabla^2 f(x) \succeq 0 \forall x \in \operatorname{dom} f

但是 ff 严格凸不能推出 2f(x)0\nabla^2 f(x) \succ 0,例如 f(x)=x4f(x)=x^4

二次函数

f:RnRf(x)=12xTPx+qTx+rf:\R^n \to \R\\ f(x)=\frac{1}{2} x^T P x + q^Tx+r

一般要求 PSn,qRn,rRP \in S^n,q\in \R^n ,r\in \R

使用二阶条件:

2f(x)=P\nabla^2 f(x)=P 只要求半正定。

例:f(x)=1x2f(x)=\frac{1}{x^2}f(x)=6x4>0f''(x)=6x^{-4}>0,但是定义域非凸。

仿射函数

f(x)=Ax+b,2f(x)=0f(x)=Ax+b,\nabla^2 f(x)=0

指数函数

f(x)=eax,xRf(x)=e^{ax},x\in \R

f(x)=a2eaxf''(x)=a^2e^{ax}

幂函数

f(x)=xa,xR++f(x)=x^a,x\in \R_{++}

2f(x)0,a1ora0;0,0a1\nabla^2 f(x) \ge 0 ,a\ge 1 or a\le0;\le 0 ,0\le a\le1

绝对值的幂函数

对数函数

f(x)=logx,xR++f(x)=\log x,x\in \R_{++}

f(x)=1x2<0f''(x)=-\frac{1}{x^2} <0

凹函数

负熵函数

f(x)=xlogx,xR++f(x)=x\log x,x\in \R_{++}

(x+logx)=logx+1,(xlogx)=1x>0(x+\log x)'=\log x+1,(x\log x)''=\frac{1}{x} >0

严格凸,凹凸性翻转

极小化负熵函数比较容易。

范数

Rn\R ^n 空间的范数 p(x),xRnp(x),x\in \R^n

满足三个性质:

  1. P(ax)=aP(x)P(ax)=|a|P(x)
  2. P(x+y)P(x)+P(y)P(x+y)\le P(x)+P(y)
  3. P(x)=0x=0P(x)=0 \Leftrightarrow x=0

用原始定义

x,yRn,0θ1\forall x,y\in \R^n,\forall 0\le \theta\le 1

P(θx+(1θ)y)P(θx)+P((1θ)y)=θP(x)+(1θ)P(y)P(\theta x+(1-\theta)y)\le P(\theta x)+P((1-\theta)y)=\theta P(x)+(1-\theta)P(y)

二范数:可以看做旋转抛物面。

零范数:

x0=||x||_0=非零元素个数

但是范数,也不是凸函数。

极大值函数

f(x)=max{x1,,xn},xRnf(x)=\max\{x_1,\cdots,x_n\},x\in \R^n

x,yRn,0θ1\forall x,y\in \R^n,\forall 0\le \theta\le 1

f(θx+(1θ)y)=max{θxi+(1θ)yi,i=1,,n}θmax{xi,i=1,,n}+(1θ)max{yi,i=1,,n}=θf(x)+(1θ)f(y)f(\theta x+(1-\theta)y)=\max\{\theta x_i+(1-\theta)y_i,i=1,\cdots,n\}\\ \le \theta \max\{x_i,i=1,\cdots,n\}+(1-\theta)\max\{y_i,i=1,\cdots,n\}\\ =\theta f(x)+(1-\theta)f(y)

等号成立:极大值下标相同。

极大极小问题

minxmaxyf(x,y)\min_x\max_y f(x,y)

log-sum-up 函数

f(x)=log(ex1++exn),xRnf(x)=\log(e^{x_1}+\cdots+e^{x_n}),x\in \R^n

max{x1,,xn}f(x)max{x1,,xn}+logn\max\{x_1,\cdots,x_n\}\le f(x)\le \max\{x_1,\cdots,x_n\}+\log n

等号成立当 x1==xnx_1=\cdots=x_n

逼近的函数也是凸函数。

Hessen 矩阵:

z=[ex1,,exn]Tz=[e^{x_1},\cdots,e^{x_n}]^T

H=1(1Tz)2((1Tz)diag(z)zzT)H=\frac{1}{(1^Tz)^2}((1^Tz)diag(z)-zz^T)

用定义。

VTKV=(1Tz)(VTdiag(z)V)VTZZTVV^TKV=(1^Tz)(V^T diag(z) V)-V^TZZ^TV

=(zi)(Vi2zi)(Vizi)2=(\sum z_i)(\sum V_i^2 z_i)-(\sum V_iz_i)^2

柯西不等式

几何平均

f(x)=(x1+x2++xn)1n,xR++nf(x)=(x_1+x_2+\cdots+x_n)^\frac{1}{n} ,x\in \R_{++}^n

取等……

行列式的对数

f(x)=logdetx,domf=S++nf(x)=\log\det x,\operatorname{dom} f=S_{++}^n

凹函数,可以把 xx 看成标量。

n=1n=1 时……

n>1n >1 时,

g(t)=f(z+tv)=logdet(z+tv)=logdet(z12(I+tz12vz12)z12)g(t)=f(z+tv)=\log\det(z+tv)=\log\det(z^\frac{1}{2}(I+tz^{-\frac{1}{2}}vz^\frac{1}{2})z^\frac{1}{2})

logdetz+log(1+tλi)\log\det z+\sum\log(1+t\lambda_i)

λi\lambda_i 表示 z12vz12z^{-\frac{1}{2}}vz^\frac{1}{2} 的特征值。

vQΛQT,QQT=Iv\Rightarrow Q\Lambda Q^T,QQ^T=I

det(I+tz12vz12)=det(QQT+tQΛQT)=det(Q(I+tΛ)QT)\det(I+tz^{-\frac{1}{2}}vz^{-\frac{1}{2}})=\det(QQ^T+tQ\Lambda Q^T)=\det(Q(I+t\Lambda)Q^T)

g(t)0g''(t)\le0

评论