Convex Functions - StevenMengのBlog

Definition

A function $f: \mathbf{R}^n \rightarrow \mathbf{R}$ is convex if $\operatorname{dom} f$ is a convex set and if for all $x$ , $y \in \operatorname{dom} f$ , and $\theta$ with $0 \leq \theta \leq 1$ , we have

f(\theta x+(1-\theta) y) \leq \theta f(x)+(1-\theta) f(y)

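Strictly convex function: the inequality is strict whenever $x\neq y$ and $0<\theta<1$.

Strictly concave function: $-f$ is strictly convex.

The domain is required to be a convex set so that $\theta x+(1-\theta)y$ always lies in the domain, which makes the left-hand side of the inequality well defined.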
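The defining inequality can be spot-checked numerically. The following is a minimal sketch, not a proof: the helper name `violates_convexity` and the random sampling scheme are assumptions made here for illustration. It draws random $x$, $y$, $\theta$ on an interval and reports whether any sample violates the inequality.

```python
import random

def violates_convexity(f, lo, hi, trials=10_000, tol=1e-9, seed=0):
    """Return True if some sampled (x, y, theta) violates
    f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y) on [lo, hi].

    A False result is only evidence of convexity, not a proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        theta = rng.random()
        lhs = f(theta * x + (1 - theta) * y)
        rhs = theta * f(x) + (1 - theta) * f(y)
        if lhs > rhs + tol:
            return True
    return False

print(violates_convexity(lambda x: x * x, -5.0, 5.0))   # x^2 is convex
print(violates_convexity(lambda x: -x * x, -5.0, 5.0))  # -x^2 is concave
```

The tolerance `tol` only absorbs floating-point rounding; it is not part of the mathematical definition.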

Extended-value extension

It is often convenient to extend a convex function to all of $\mathbf{R}^n$ by defining its value to be $\infty$ outside its domain. If $f$ is convex we define its extended-value extension $\tilde{f}: \mathbf{R}^n \rightarrow \mathbf{R} \cup\{\infty\}$ by

\tilde{f}(x)= \begin{cases}f(x) & x \in \operatorname{dom} f \\ \infty & x \notin \operatorname{dom} f\end{cases}

The extension $\tilde{f}$ is defined on all $\mathbf{R}^n$, and takes values in $\mathbf{R} \cup\{\infty\}$. We can recover the domain of the original function $f$ from the extension $\tilde{f}$ as $\operatorname{dom} f=\{x \mid \tilde{f}(x)<\infty\}$.

The extension can simplify notation, since we do not need to explicitly describe the domain, or add the qualifier 'for all $x \in \operatorname{dom} f$' every time we refer to $f(x)$. Consider, for example, the basic defining inequality (3.1). In terms of the extension $\tilde{f}$, we can express it as: for $0<\theta<1$,

\tilde{f}(\theta x+(1-\theta) y) \leq \theta \tilde{f}(x)+(1-\theta) \tilde{f}(y)

Indicator function

\tilde{I}_C(x)= \begin{cases}0 & x \in C \\ \infty & x \notin C\end{cases}

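If the function is instead extended with a smaller, finite value outside its domain, counterexamples to the convexity inequality can always be found.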
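To illustrate why the value outside the domain is taken to be $\infty$, here is a small sketch. The function $f(x)=x^2$ on $[1,2]$, the helper names `extend` and `jensen_holds`, and the finite fill value 0 are all assumptions chosen for illustration.

```python
import math

def extend(f, in_domain, outside=math.inf):
    """Extend f to the whole line: f(x) inside the domain, `outside` elsewhere."""
    def f_ext(x):
        return f(x) if in_domain(x) else outside
    return f_ext

def square(x):
    return x * x

def in_dom(x):
    # The convex domain [1, 2].
    return 1.0 <= x <= 2.0

f_inf = extend(square, in_dom)                # +inf outside the domain
f_zero = extend(square, in_dom, outside=0.0)  # finite value outside the domain

def jensen_holds(g, x, y, theta):
    """Check g(theta*x + (1-theta)*y) <= theta*g(x) + (1-theta)*g(y)."""
    return g(theta * x + (1 - theta) * y) <= theta * g(x) + (1 - theta) * g(y)

# x = 0 and y = 3 are both outside the domain; their midpoint 1.5 is inside.
print(jensen_holds(f_inf, 0.0, 3.0, 0.5))   # 2.25 <= inf
print(jensen_holds(f_zero, 0.0, 3.0, 0.5))  # 2.25 <= 0 fails
```

Only $0<\theta<1$ is used here, which avoids the undefined product $0\cdot\infty$ in floating-point arithmetic.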

First-order condition

Suppose $f$ is differentiable (i.e., its gradient $\nabla f$ exists at each point in $\operatorname{dom} f$ , which is open). Then $f$ is convex if and only if $\operatorname{dom} f$ is convex and

f(y) \geq f(x)+\nabla f(x)^T(y-x)

holds for all $x, y \in \operatorname{dom} f$ . This inequality is illustrated in figure $3.2$ .

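If $\nabla f(x)=0$, then $f(y)\ge f(x)$ for all $y$, so $x$ is a global minimizer. The minimizer need not be unique: there may be a whole set of optimal solutions.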
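The first-order inequality says that the tangent line at any point lies below the graph. A minimal one-dimensional sketch, assuming $f(x)=e^x$ as the test function and a hypothetical helper `first_order_gap`:

```python
import math

def first_order_gap(f, df, x, y):
    """Return f(y) - [f(x) + f'(x)*(y - x)]; nonnegative when f is convex."""
    return f(y) - (f(x) + df(x) * (y - x))

# Check the inequality for f(x) = exp(x) on a grid over [-3, 3].
grid = [i / 10 for i in range(-30, 31)]
min_gap = min(first_order_gap(math.exp, math.exp, x, y) for x in grid for y in grid)
print(min_gap >= -1e-12)

# For f(x) = (x - 1)^2 the derivative vanishes at x = 1, so the gap at x = 1
# equals f(y) - f(1), which is nonnegative: x = 1 is a global minimizer.
print(first_order_gap(lambda x: (x - 1) ** 2, lambda x: 2 * (x - 1), 1.0, 3.0))
```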

Equivalence of the first-order condition and the definition

From the first-order condition to the definition

\forall x,y \in \operatorname{dom} f,\ \forall t,\tilde{t}\in[0,1]:\\ ty+(1-t)x \in \operatorname{dom} f\\ \tilde{t}y+(1-\tilde{t})x \in \operatorname{dom} f

By the first-order condition:

f(ty+(1-t)x) \ge f(\tilde{t}y+(1-\tilde{t})x)+\nabla f(\tilde{t}y+(1-\tilde{t})x)^T(ty+(1-t)x-\tilde{t}y-(1-\tilde{t})x)\\=f(\tilde{t}y+(1-\tilde{t})x)+\nabla f(\tilde{t}y+(1-\tilde{t})x)^T(y-x)(t- \tilde{t})

The definition requires $f(ty+(1-t)x) \le tf(y)+(1-t)f(x)$.

Define $g(t)=f(ty+(1-t)x)$, $g(\tilde{t})=f(\tilde{t}y+(1-\tilde{t})x)$.

Then

g'(\tilde{t})=\nabla f(\tilde{t}y+(1-\tilde{t})x)^T(y-x)

g(t)\ge g(\tilde{t})+g'(\tilde t)(t-\tilde{t})

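Taking $t=1$ and $t=0$ in this inequality, multiplying by $\tilde{t}$ and $1-\tilde{t}$ respectively, and adding, the derivative terms cancel and we get $\tilde{t}g(1)+(1-\tilde{t})g(0)\ge g(\tilde{t})$, that is, $f(\tilde{t}y+(1-\tilde{t})x)\le \tilde{t}f(y)+(1-\tilde{t})f(x)$. Hence $f$ is convex.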

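Second-order condition

If $f:\R ^n\to \R$ is twice differentiable, then $f$ is convex $\Leftrightarrow$ $\operatorname{dom} f$ is convex and $\nabla^2 f(x) \succeq 0$ for all $x \in \operatorname{dom} f$.

However, strict convexity of $f$ does not imply $\nabla^2 f(x) \succ 0$. For example, $f(x)=x^4$ is strictly convex but $f''(0)=0$.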

Quadratic functions

f:\R^n \to \R\\ f(x)=\frac{1}{2} x^T P x + q^Tx+r

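We generally require $P \in S^n,q\in \R^n ,r\in \R$.

Applying the second-order condition:

$\nabla^2 f(x)=P$, so convexity only requires $P$ to be positive semidefinite.

Example: $f(x)=\frac{1}{x^2}$ with $\operatorname{dom} f=\{x\in\R \mid x\neq 0\}$ has $f''(x)=6x^{-4}>0$, but its domain is not convex, so $f$ is not convex.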
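A short sketch of why a positive second derivative is not enough for $f(x)=1/x^2$ when the domain is not convex. The extended-value form `f_ext` and the chosen points $x=-1$, $y=1$ are assumptions made for illustration.

```python
import math

def f_ext(x):
    """Extended-value version of f(x) = 1/x^2 with dom f = {x != 0}."""
    return 1.0 / (x * x) if x != 0 else math.inf

x, y, theta = -1.0, 1.0, 0.5
mid = theta * x + (1 - theta) * y                 # 0.0, outside dom f
lhs = f_ext(mid)                                  # inf
rhs = theta * f_ext(x) + (1 - theta) * f_ext(y)   # 1.0
print(lhs <= rhs)  # False: the convexity inequality fails across the gap at 0
```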

Affine functions

f(x)=Ax+b,\nabla^2 f(x)=0

Exponential function

f(x)=e^{ax},x\in \R

f''(x)=a^2e^{ax}

Power functions

f(x)=x^a,x\in \R_{++}

f''(x)=a(a-1)x^{a-2} \begin{cases}\ge 0 & a\ge 1 \text{ or } a\le 0 \\ \le 0 & 0\le a\le 1\end{cases}
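The case distinction for $x^a$ on $x>0$ follows from the sign of $a(a-1)$, since $x^{a-2}>0$. A minimal sketch, with hypothetical helper names `power_second_derivative` and `classify_power`:

```python
def power_second_derivative(a, x):
    """Second derivative of x**a for x > 0: a*(a-1)*x**(a-2)."""
    return a * (a - 1) * x ** (a - 2)

def classify_power(a):
    """Classify x**a on x > 0 by the sign of a*(a-1)."""
    sign = a * (a - 1)
    if sign > 0:
        return "convex"
    if sign < 0:
        return "concave"
    return "affine"  # a == 0 or a == 1: both convex and concave

for a in (-1, 0, 0.5, 1, 2):
    print(a, classify_power(a))
```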

Powers of the absolute value

Logarithm

f(x)=\log x,x\in \R_{++}

f''(x)=-\frac{1}{x^2} <0

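So $\log x$ is concave.

Negative entropy

f(x)=x\log x,x\in \R_{++}

(x\log x)'=\log x+1,(x\log x)''=\frac{1}{x} >0

Strictly convex: the curvature flips relative to the concave $\log x$ (equivalently, the entropy $-x\log x$ is concave).

Minimizing the negative entropy is relatively easy, since it is a convex function.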

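Norms

A norm on $\R ^n$ is a function $P(x),x\in \R^n$

that satisfies three properties:

$P(ax)=|a|P(x)$ (homogeneity)
$P(x+y)\le P(x)+P(y)$ (triangle inequality)
$P(x)=0 \Leftrightarrow x=0$ (definiteness)

Every norm is convex. Using the original definition: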

$\forall x,y\in \R^n,\forall 0\le \theta\le 1$

P(\theta x+(1-\theta)y)\le P(\theta x)+P((1-\theta)y)=\theta P(x)+(1-\theta)P(y)

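2-norm: the graph of $\|x\|_2$ is a circular cone, while the squared 2-norm $\|x\|_2^2$ gives a paraboloid of revolution.

Zero "norm":

\|x\|_0=\text{number of nonzero entries of } x

However, it is not a norm (homogeneity fails, since scaling $x$ does not change the count), and it is not a convex function either.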
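Both failures of the zero "norm" can be seen on a two-dimensional example. This is a small sketch; the helper `l0` and the chosen vectors are assumptions for illustration.

```python
def l0(x):
    """Number of nonzero entries of the vector x."""
    return sum(1 for xi in x if xi != 0)

x, y, theta = (1.0, 0.0), (0.0, 1.0), 0.5
mid = tuple(theta * a + (1 - theta) * b for a, b in zip(x, y))

# Convexity would need l0(mid) <= theta*l0(x) + (1-theta)*l0(y), i.e. 2 <= 1.
print(l0(mid), theta * l0(x) + (1 - theta) * l0(y))  # 2 1.0

# Homogeneity would need l0(2x) == 2*l0(x), but scaling keeps the count at 1.
print(l0(tuple(2 * a for a in x)), 2 * l0(x))  # 1 2
```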

Max function

f(x)=\max\{x_1,\cdots,x_n\},x\in \R^n

$\forall x,y\in \R^n,\forall 0\le \theta\le 1$

f(\theta x+(1-\theta)y)=\max\{\theta x_i+(1-\theta)y_i,i=1,\cdots,n\}\\ \le \theta \max\{x_i,i=1,\cdots,n\}+(1-\theta)\max\{y_i,i=1,\cdots,n\}\\ =\theta f(x)+(1-\theta)f(y)

Equality holds when $x$ and $y$ attain their maximum at the same index.

Minimax problems
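The inequality for the max function, and its equality case, can be checked on concrete vectors. A minimal sketch, assuming a hypothetical helper `max_gap` that returns the right-hand side minus the left-hand side:

```python
import random

def max_gap(x, y, theta):
    """theta*max(x) + (1-theta)*max(y) - max(theta*x + (1-theta)*y)."""
    mix = [theta * a + (1 - theta) * b for a, b in zip(x, y)]
    return theta * max(x) + (1 - theta) * max(y) - max(mix)

# Random spot check: the gap should never be negative (up to rounding).
rng = random.Random(0)
gaps = []
for _ in range(1000):
    x = [rng.uniform(-1, 1) for _ in range(4)]
    y = [rng.uniform(-1, 1) for _ in range(4)]
    gaps.append(max_gap(x, y, rng.random()))
print(min(gaps) >= -1e-12)

# Both vectors attain their maximum at index 1: the gap vanishes.
print(max_gap([0.0, 3.0, 1.0], [2.0, 5.0, -1.0], 0.5))  # 0.0

# Maxima at different indices: the inequality is strict.
print(max_gap([1.0, 0.0], [0.0, 1.0], 0.5))  # 0.5
```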

\min_x\max_y f(x,y)

Log-sum-exp function

f(x)=\log(e^{x_1}+\cdots+e^{x_n}),x\in \R^n

\max\{x_1,\cdots,x_n\}\le f(x)\le \max\{x_1,\cdots,x_n\}+\log n

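Equality in the upper bound holds when $x_1=\cdots=x_n$.

The log-sum-exp function is a smooth approximation of the max function, and it is also convex.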
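The bounds $\max_i x_i \le f(x) \le \max_i x_i + \log n$ can be checked directly. The sketch below assumes a hypothetical helper `logsumexp` that factors out the maximum before exponentiating, a common trick to avoid overflow for large inputs.

```python
import math

def logsumexp(x):
    """log(sum(exp(x_i))), computed stably by factoring out max(x)."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))

x = [1.0, 2.0, 3.0]
value = logsumexp(x)
print(max(x) <= value <= max(x) + math.log(len(x)))

# With all entries equal, the upper bound is attained.
equal = [2.0, 2.0, 2.0, 2.0]
print(math.isclose(logsumexp(equal), max(equal) + math.log(len(equal))))
```

Computing `math.exp(1000.0)` directly would overflow, while `logsumexp([1000.0, 1000.0])` stays finite because only differences from the maximum are exponentiated.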

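Hessian matrix:

z=[e^{x_1},\cdots,e^{x_n}]^T

H=\frac{1}{(\mathbf{1}^Tz)^2}\left((\mathbf{1}^Tz)\operatorname{diag}(z)-zz^T\right)

Use the definition of positive semidefiniteness. Let $K=(\mathbf{1}^Tz)\operatorname{diag}(z)-zz^T$, so that $H=K/(\mathbf{1}^Tz)^2$. For any $v\in\R^n$:

$v^TKv=(\mathbf{1}^Tz)(v^T \operatorname{diag}(z) v)-v^Tzz^Tv$

$=(\sum_i z_i)(\sum_i v_i^2 z_i)-(\sum_i v_iz_i)^2 \ge 0$

by the Cauchy-Schwarz inequality applied to the vectors with components $\sqrt{z_i}$ and $v_i\sqrt{z_i}$. Hence $H\succeq 0$ and $f$ is convex.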
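The quadratic form that decides positive semidefiniteness of the log-sum-exp Hessian, $(\sum_i z_i)(\sum_i v_i^2 z_i)-(\sum_i v_i z_i)^2$ with $z_i=e^{x_i}$, can be evaluated numerically. A minimal sketch with a hypothetical helper `lse_quadratic_form`:

```python
import math
import random

def lse_quadratic_form(x, v):
    """(sum z_i)(sum v_i^2 z_i) - (sum v_i z_i)^2 with z_i = exp(x_i)."""
    z = [math.exp(xi) for xi in x]
    sum_z = sum(z)
    sum_v2z = sum(vi * vi * zi for vi, zi in zip(v, z))
    sum_vz = sum(vi * zi for vi, zi in zip(v, z))
    return sum_z * sum_v2z - sum_vz * sum_vz

# Random spot check: the form should be nonnegative (up to rounding).
rng = random.Random(0)
values = []
for _ in range(1000):
    x = [rng.uniform(-2, 2) for _ in range(5)]
    v = [rng.uniform(-1, 1) for _ in range(5)]
    values.append(lse_quadratic_form(x, v))
print(min(values) >= -1e-9)

# For a constant direction v = (1, ..., 1) the form is zero, so the Hessian
# is positive semidefinite but not positive definite.
print(abs(lse_quadratic_form([0.3, -1.0, 2.0], [1.0, 1.0, 1.0])) < 1e-9)
```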

Geometric mean

f(x)=(x_1x_2\cdots x_n)^\frac{1}{n} ,x\in \R_{++}^n

The geometric mean is concave on its domain.

Equality ……
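A numerical spot check of the concavity inequality $f(\theta x+(1-\theta)y)\ge\theta f(x)+(1-\theta)f(y)$ for the geometric mean. This is a sketch only: the helper names and the sampling range are assumptions, and a passing check is evidence rather than proof.

```python
import math
import random

def geometric_mean(x):
    """(x_1 * ... * x_n)**(1/n) for positive entries, computed via logs."""
    return math.exp(sum(math.log(xi) for xi in x) / len(x))

def concavity_gap(x, y, theta):
    """f(theta*x + (1-theta)*y) - [theta*f(x) + (1-theta)*f(y)]; >= 0 if concave."""
    mix = [theta * a + (1 - theta) * b for a, b in zip(x, y)]
    return geometric_mean(mix) - (
        theta * geometric_mean(x) + (1 - theta) * geometric_mean(y)
    )

rng = random.Random(0)
gaps = []
for _ in range(1000):
    x = [rng.uniform(0.1, 10.0) for _ in range(3)]
    y = [rng.uniform(0.1, 10.0) for _ in range(3)]
    gaps.append(concavity_gap(x, y, rng.random()))
print(min(gaps) >= -1e-9)
```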

Log-determinant

f(x)=\log\det x,\operatorname{dom} f=S_{++}^n

This function is concave. For intuition, think of $x$ as a scalar.

When $n=1$ ……

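When $n >1$, restrict $f$ to a line: take $z\in S_{++}^n$, $v\in S^n$, and consider

g(t)=f(z+tv)=\log\det(z+tv)=\log\det(z^\frac{1}{2}(I+tz^{-\frac{1}{2}}vz^{-\frac{1}{2}})z^\frac{1}{2})

=\log\det z+\sum_i\log(1+t\lambda_i)

where $\lambda_i$ are the eigenvalues of $z^{-\frac{1}{2}}vz^{-\frac{1}{2}}$.

To see this, write the eigendecomposition $z^{-\frac{1}{2}}vz^{-\frac{1}{2}}= Q\Lambda Q^T,QQ^T=I$. Then

\det(I+tz^{-\frac{1}{2}}vz^{-\frac{1}{2}})=\det(QQ^T+tQ\Lambda Q^T)=\det(Q(I+t\Lambda)Q^T)=\prod_i(1+t\lambda_i)

Therefore $g''(t)=-\sum_i\frac{\lambda_i^2}{(1+t\lambda_i)^2}\le0$, so $g$ is concave along every line and $f$ is concave.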
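The identity $\log\det(z+tv)=\log\det z+\sum_i\log(1+t\lambda_i)$ can be verified for $2\times 2$ matrices in plain Python. This is a sketch under stated assumptions: the matrices `z` and `v` are arbitrary illustrative values, and the eigenvalues are taken from $z^{-1}v$, which is similar to $z^{-\frac{1}{2}}vz^{-\frac{1}{2}}$ and therefore has the same eigenvalues.

```python
import math

def det2(a):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def eig_inv_z_v(z, v):
    """Eigenvalues of z^{-1} v for 2x2 symmetric z (positive definite) and v."""
    dz = det2(z)
    zi = ((z[1][1] / dz, -z[0][1] / dz), (-z[1][0] / dz, z[0][0] / dz))
    m = tuple(
        tuple(sum(zi[i][k] * v[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )
    tr, dm = m[0][0] + m[1][1], det2(m)
    # The discriminant is nonnegative in exact arithmetic; clamp rounding noise.
    root = math.sqrt(max(tr * tr - 4 * dm, 0.0))
    return (tr + root) / 2, (tr - root) / 2

def g(z, v, t):
    """g(t) = log det(z + t v)."""
    s = tuple(tuple(z[i][j] + t * v[i][j] for j in range(2)) for i in range(2))
    return math.log(det2(s))

def g_second(lams, t):
    """Closed form g''(t) = -sum(lam^2 / (1 + t*lam)^2)."""
    return -sum((lam / (1 + t * lam)) ** 2 for lam in lams)

z = ((2.0, 0.5), (0.5, 1.0))    # symmetric positive definite
v = ((1.0, -1.0), (-1.0, 0.5))  # symmetric direction
lams = eig_inv_z_v(z, v)
t = 0.1

direct = g(z, v, t)
via_eigs = math.log(det2(z)) + sum(math.log(1 + t * lam) for lam in lams)
print(math.isclose(direct, via_eigs))
print(g_second(lams, t) <= 0)
```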