Using the Chain Rule to calculate derivatives

Name the parameters: $w_1, w_2, w_3, b_1, b_2, b_3$.

Estimate $b_3$. Assume we have the optimal values for all of the parameters except for the last bias term $b_3$.

The activation function is the softplus function:

$f(x) = \log(1 + e^x)$
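
For reference, a minimal NumPy sketch of softplus (the function name and import are my own, not part of the original example):

```python
import numpy as np

def softplus(x):
    # f(x) = log(1 + e^x); log1p(e^x) is the numerically safer spelling
    return np.log1p(np.exp(x))
```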

$b_3$ is initialized to $0$.

Quantify the differences using residuals:

$\mathrm{Residual} = (\mathrm{Observed} - \mathrm{Predicted})$

The sum of the squared residuals (SSR):

$\mathrm{SSR} = \sum \mathrm{Residual}^2$
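
A minimal sketch of the residual and SSR computations, assuming `observed` and `predicted` are NumPy arrays of the same shape:

```python
import numpy as np

def sum_squared_residuals(observed, predicted):
    # Residual = (Observed - Predicted)
    residuals = observed - predicted
    # SSR = sum of Residual^2
    return np.sum(residuals ** 2)
```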

Plugging the derivatives into Gradient Descent to optimize parameters

Calculate

$\mathrm{d}\,\mathrm{SSR}/\mathrm{d} b_3 = \mathrm{d}\,\mathrm{SSR}/\mathrm{d}\,\mathrm{Predicted} \times \mathrm{d}\,\mathrm{Predicted}/\mathrm{d} b_3$

$\mathrm{Predicted} = \mathrm{const} + b_3 \Rightarrow \mathrm{d}\,\mathrm{Predicted}/\mathrm{d} b_3 = 1$

$\mathrm{d}\,\mathrm{SSR}/\mathrm{d}\,\mathrm{Predicted} = \sum -2\,(\mathrm{Observed} - \mathrm{Predicted})$, since differentiating each squared residual $(\mathrm{Observed} - \mathrm{Predicted})^2$ with respect to $\mathrm{Predicted}$ brings down the factor $-2$.
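
Putting the two pieces together, here is a sketch of gradient descent on $b_3$ alone; the toy numbers and the name `const` (the part of the prediction that does not depend on $b_3$) are illustrative assumptions, not values from the original example:

```python
import numpy as np

# Toy data (made up for illustration): 'const' is the network output
# computed with b3 = 0, i.e. the part of Predicted that ignores b3.
observed = np.array([0.0, 1.0, 0.0])
const = np.array([0.6, 1.2, 0.4])

b3 = 0.0              # initialized to 0, as above
learning_rate = 0.1

for step in range(100):
    predicted = const + b3
    # dSSR/db3 = dSSR/dPredicted * dPredicted/db3
    #          = sum(-2 * (Observed - Predicted)) * 1
    grad_b3 = np.sum(-2.0 * (observed - predicted))
    b3 -= learning_rate * grad_b3

# b3 converges to the mean of (observed - const), here -0.4
```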

Apply to multiple parameters simultaneously

We don't know $w_3, w_4, b_3$.

Initial values: $b_3 = 0$; $w_3$ and $w_4$ are random.

Fancy Notation

$x_{1,i} = \mathrm{input}_i \times w_1 + b_1; \quad x_{2,i} = \mathrm{input}_i \times w_2 + b_2$

$y_{1,i} = f(x_{1,i}); \quad y_{2,i} = f(x_{2,i})$

$\mathrm{Predicted}_i = y_{1,i} w_3 + y_{2,i} w_4 + b_3$
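
The same notation translated into a forward-pass sketch, assuming one input feature per sample and reusing the `softplus` helper from above:

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def forward(inputs, w1, b1, w2, b2, w3, w4, b3):
    x1 = inputs * w1 + b1          # x_{1,i} = input_i * w1 + b1
    x2 = inputs * w2 + b2          # x_{2,i} = input_i * w2 + b2
    y1 = softplus(x1)              # y_{1,i} = f(x_{1,i})
    y2 = softplus(x2)              # y_{2,i} = f(x_{2,i})
    return y1 * w3 + y2 * w4 + b3  # Predicted_i
```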

$\mathrm{d}\,\mathrm{SSR}/\mathrm{d} w_3 = \mathrm{d}\,\mathrm{SSR}/\mathrm{d}\,\mathrm{Predicted} \times \mathrm{d}\,\mathrm{Predicted}/\mathrm{d} w_3$

$\mathrm{Predicted} = \mathrm{const} + y_{1,i} w_3 \Rightarrow \mathrm{d}\,\mathrm{Predicted}/\mathrm{d} w_3 = y_{1,i}$
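
The corresponding gradients for all three output-layer parameters, as a sketch (the function name is mine; $\mathrm{d}\,\mathrm{Predicted}/\mathrm{d} w_4 = y_{2,i}$ by the same argument):

```python
import numpy as np

def output_layer_grads(observed, predicted, y1, y2):
    d_pred = -2.0 * (observed - predicted)  # dSSR/dPredicted, per sample
    grad_w3 = np.sum(d_pred * y1)           # dPredicted/dw3 = y_{1,i}
    grad_w4 = np.sum(d_pred * y2)           # dPredicted/dw4 = y_{2,i}
    grad_b3 = np.sum(d_pred)                # dPredicted/db3 = 1
    return grad_w3, grad_w4, grad_b3
```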

Apply to $w_1, b_1, w_2, b_2$

By the chain rule,

$\mathrm{d}\,\mathrm{SSR}/\mathrm{d} w_1 = \mathrm{d}\,\mathrm{SSR}/\mathrm{d}\,\mathrm{Predicted} \times \mathrm{d}\,\mathrm{Predicted}/\mathrm{d} y_1 \times \mathrm{d} y_1/\mathrm{d} x_1 \times \mathrm{d} x_1/\mathrm{d} w_1$

where

$\mathrm{d}\,\mathrm{SSR}/\mathrm{d}\,\mathrm{Predicted}$ is known from above,

$\mathrm{d}\,\mathrm{Predicted}/\mathrm{d} y_1 = w_3$

$\mathrm{d} y_1/\mathrm{d} x_1 = \dfrac{e^{x_1}}{1+e^{x_1}}$ (the derivative of softplus, i.e. the sigmoid)

$\mathrm{d} x_1/\mathrm{d} w_1 = \mathrm{Input}_i$
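
Multiplying the four links together gives the gradient; a sketch for $w_1$ and $b_1$ (for $b_1$ only the last factor changes, since $\mathrm{d} x_1/\mathrm{d} b_1 = 1$; $w_2$ and $b_2$ are identical with $x_2$ and $w_4$ in place of $x_1$ and $w_3$):

```python
import numpy as np

def hidden_layer_grads(inputs, observed, predicted, x1, w3):
    d_pred = -2.0 * (observed - predicted)          # dSSR/dPredicted
    d_y1 = d_pred * w3                              # * dPredicted/dy1 = w3
    d_x1 = d_y1 * np.exp(x1) / (1.0 + np.exp(x1))   # * dy1/dx1 (sigmoid)
    grad_w1 = np.sum(d_x1 * inputs)                 # * dx1/dw1 = Input_i
    grad_b1 = np.sum(d_x1)                          # * dx1/db1 = 1
    return grad_w1, grad_b1
```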
