
Commit 3091889

Merge pull request #135 from gustavdelius/adjoint-ODE-corrections
Correcting one equation and improving some phrases
2 parents 4792fbc + 955438c commit 3091889

File tree: 1 file changed (+11, -11 lines)


_weave/lecture11/adjoints.jmd

Lines changed: 11 additions & 11 deletions
@@ -281,10 +281,10 @@ That was just a re-arrangement. Now, let's require that
 $$\lambda^\prime = -\frac{df}{du}^\ast \lambda + \left(\frac{dg}{du} \right)^\ast$$
 $$\lambda(T) = 0$$
 
-This means that the boundary term of the integration by parts is zero, and also one of those integral terms are perfectly zero.
+This means that one of the boundary terms of the integration by parts is zero, and also one of those integrals is perfectly zero.
 Thus, if $\lambda$ satisfies that equation, then we get:
 
-$$\frac{dG}{dp} = \lambda^\ast(t_0)\frac{dG}{du}(t_0) + \int_{t_0}^T \left(g_p + \lambda^\ast f_p \right)dt$$
+$$\frac{dG}{dp} = \lambda^\ast(t_0)\frac{du(t_0)}{dp} + \int_{t_0}^T \left(g_p + \lambda^\ast f_p \right)dt$$
 
 which gives us our adjoint derivative relation.
 
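In code, this relation is what an adjoint sensitivity routine evaluates. A minimal sketch for illustration (not part of the commit), assuming DifferentialEquations.jl with SciMLSensitivity.jl and a hypothetical Lotka-Volterra setup; the `adjoint_sensitivities` keyword API has varied across versions:

```julia
# Sketch only: hypothetical problem and data, added for illustration.
using OrdinaryDiffEq, SciMLSensitivity

f(u, p, t) = [p[1]*u[1] - p[2]*u[1]*u[2],
              -p[3]*u[2] + p[4]*u[1]*u[2]]
p = [1.5, 1.0, 3.0, 1.0]
prob = ODEProblem(f, [1.0, 1.0], (0.0, 10.0), p)
sol = solve(prob, Tsit5(), abstol = 1e-10, reltol = 1e-10)

ts = 0.0:0.5:10.0
data = Array(sol(ts)) .+ 0.01 .* randn(2, length(ts))  # synthetic observations

# For the discrete L2 cost G(p) = Σᵢ ‖u(tᵢ,p) - dᵢ‖², dg/du at tᵢ is 2(u - dᵢ).
dg!(out, u, p, t, i) = (out .= 2 .* (u .- data[:, i]))

# Solves the λ ODE backwards from λ(T) = 0, applying the dg impulses at each tᵢ;
# returns dG/du₀ (i.e. λ*(t₀)) and dG/dp from the relation above.
du0, dp_adj = adjoint_sensitivities(sol, Tsit5(); t = ts, dgdu_discrete = dg!,
                                    sensealg = InterpolatingAdjoint())
```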
@@ -296,8 +296,8 @@ in which case
 
 $$g_u(t_i) = 2(d_i - u(t_i,p))$$
 
-at the data points $(t_i,d_i)$. Therefore, the derivative of an ODE solution
-with respect to a cost function is given by solving for $\lambda^\ast$ using an
+at the data points $(t_i,d_i)$. Therefore, the derivatives of a cost function with respect to
+the parameters are obtained by solving for $\lambda^\ast$ using an
 ODE for $\lambda^T$ in reverse time, and then using that to calculate $\frac{dG}{dp}$.
 Note that $\frac{dG}{dp}$ can be calculated simultaneously by appending a single
 value to the reverse ODE, since we can simply define the new ODE term as
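One way to check that reverse pass is to differentiate the same discrete cost with forward-mode AD; a sketch for illustration (not part of the commit), reusing the hypothetical `prob`, `ts`, and `data` from above. Note that the quadrature for $\int \lambda^\ast f_p\,dt$ is handled internally, either as an appended state of the reverse ODE or as a separate quadrature, depending on the chosen `sensealg`.

```julia
# Cross-check of the adjoint gradient against ForwardDiff (illustration only).
using ForwardDiff

function G(p)
    _sol = solve(remake(prob, p = p), Tsit5(), saveat = ts,
                 abstol = 1e-10, reltol = 1e-10)
    sum(abs2, Array(_sol) .- data)
end

isapprox(ForwardDiff.gradient(G, p), vec(dp_adj); rtol = 1e-5)  # expected: true
```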
@@ -327,15 +327,15 @@ on-demand. There are three ways which this can be done:
    numerically this is unstable and thus not always recommended (ODEs are
    reversible, but ODE solver methods are not necessarily going to generate the
    same exact values or trajectories in reverse!)
-2. If you solve the forward ODE and receive a continuous solution $u(t)$, you
-   can interpolate it to retrieve the values at any given the time reverse pass
+2. If you solve the forward ODE and receive a solution $u(t)$, you
+   can interpolate it to retrieve the values at any time at which the reverse pass
    needs the $\frac{df}{du}$ Jacobian. This is fast but memory-intensive.
 3. Every time you need a value $u(t)$ during the backpass, you re-solve the
    forward ODE to $u(t)$. This is expensive! Thus one can instead use
-   *checkpoints*, i.e. save at finitely many time points during the forward
+   *checkpoints*, i.e. save at a smaller number of time points during the forward
    pass, and use those as starting points for the $u(t)$ calculation.
 
-Alternative strategies can be investigated, such as an interpolation which
+Alternative strategies can be investigated, such as an interpolation that
 stores values in a compressed form.
 
 ### The vjp and Neural Ordinary Differential Equations
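For orientation (not part of the commit): these three strategies map onto selectable `sensealg` choices in SciMLSensitivity.jl.

```julia
using SciMLSensitivity

BacksolveAdjoint()                         # 1. re-solve the ODE backwards from u(T)
InterpolatingAdjoint()                     # 2. keep the forward interpolation in memory
InterpolatingAdjoint(checkpointing = true) # 3. checkpoint and re-solve locally as needed

# Pass one of these as `sensealg` to `adjoint_sensitivities`, or to `solve`
# inside a reverse-mode-differentiated loss function.
```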
@@ -348,11 +348,11 @@ backpass
 $$\lambda^\prime = -\frac{df}{du}^\ast \lambda - \left(\frac{dg}{du} \right)^\ast$$
 $$\lambda(T) = 0$$
 
-can be improved by noticing $\frac{df}{du}^\ast \lambda$ is a vjp, and thus it
+can be improved by noticing $\lambda^\ast \frac{df}{du}$ is a vjp, and thus it
 can be calculated using $\mathcal{B}_f^{u(t)}(\lambda^\ast)$, i.e. reverse-mode
 AD on the function $f$. If $f$ is a neural network, this means that the reverse
 ODE is defined through successive backpropagation passes of that neural network.
-The result is a derivative with respect to the cost function of the parameters
+The result is a derivative of the cost function with respect to the parameters
 defining $f$ (either a model or a neural network), which can then be used to
 fit the data ("train").
 
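Concretely, $\mathcal{B}_f^{u(t)}(\lambda^\ast)$ is a single reverse-mode sweep. A sketch with Zygote.jl for illustration (not part of the commit), with a toy one-layer network standing in for $f$ and its parameters held fixed:

```julia
using Zygote

W, b = randn(4, 4), randn(4)
f(u) = tanh.(W * u .+ b)   # toy "neural network" right-hand side

u = randn(4)               # state u(t) from the forward pass
λ = randn(4)               # current adjoint value

y, back = Zygote.pullback(f, u)  # forward pass, recording for reverse mode
vjp, = back(λ)                   # λ*(df/du): one backpropagation, no full Jacobian
```

In SciMLSensitivity.jl this choice of vjp backend is exposed via keywords such as `InterpolatingAdjoint(autojacvec = ZygoteVJP())`.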
@@ -385,7 +385,7 @@ spline:
 ![](https://user-images.githubusercontent.com/1814174/66883762-fc662500-ef9c-11e9-91c7-c445e32d120f.PNG)
 
 If that's the case, one can use the fit spline in order to estimate the derivative
-at each point. Since the ODE is defined as $u^\prime = f(u,p,t)$, one then then
+at each point. Since the ODE is defined as $u^\prime = f(u,p,t)$, one can then
 use the cost function
 
 $$C(p) = \sum_{i=1}^N \Vert\tilde{u}^{\prime}(t_i) - f(u(t_i),p,t)\Vert$$
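A sketch of evaluating this collocation cost for illustration (not part of the commit), assuming the hypothetical `f`, `ts`, and `data` from the earlier sketch and DataInterpolations.jl for the spline:

```julia
using DataInterpolations, LinearAlgebra

# Fit one spline per state component over the data columns.
splines = [CubicSpline(data[i, :], collect(ts)) for i in 1:size(data, 1)]
ũ(t)  = [s(t) for s in splines]                                # spline values
ũ′(t) = [DataInterpolations.derivative(s, t) for s in splines] # spline slopes

# C(p) = Σᵢ ‖ũ'(tᵢ) - f(ũ(tᵢ), p, tᵢ)‖ : no ODE solves are needed to evaluate it.
C(p) = sum(norm(ũ′(t) - f(ũ(t), p, t)) for t in ts)
```

Because evaluating $C(p)$ never solves the ODE, minimizing it is a cheap way to obtain initial parameter estimates.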
