@@ -281,10 +281,10 @@ That was just a re-arrangement. Now, let's require that
$$\lambda^\prime = -\frac{df}{du}^\ast \lambda - \left(\frac{dg}{du} \right)^\ast$$
$$\lambda(T) = 0$$
- This means that the boundary term of the integration by parts is zero, and also one of those integral terms are perfectly zero.
+ This means that one of the boundary terms of the integration by parts is zero (the term at $t = T$, since $\lambda(T) = 0$), and also one of those integrals is identically zero.
Thus, if $\lambda$ satisfies that equation, then we get:
- $$\frac{dG}{dp} = \lambda^\ast(t_0)\frac{dG}{du} (t_0) + \int_{t_0}^T \left(g_p + \lambda^\ast f_p \right)dt$$
+ $$\frac{dG}{dp} = \lambda^\ast(t_0)\frac{du(t_0)}{dp} + \int_{t_0}^T \left(g_p + \lambda^\ast f_p \right)dt$$
which gives us our adjoint derivative relation.
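To make the relation concrete, here is a minimal numerical sketch, not from the original notes: it assumes the toy problem $u^\prime = pu$ with running cost $g(u) = u^2$, solves the $\lambda$ ODE backwards from $\lambda(T) = 0$ with an appended quadrature state for the integral term, and checks the resulting $\frac{dG}{dp}$ against finite differences. All names are illustrative.

```python
# Minimal sketch: u' = p*u (so df/du = p, f_p = u) and g(u) = u^2,
# i.e. G(p) = int_0^T u(t)^2 dt.
import numpy as np
from scipy.integrate import solve_ivp

p, u0, T = -0.7, 1.5, 2.0

# Forward pass: keep a dense interpolant of u(t) for the reverse pass.
fwd = solve_ivp(lambda t, u: p * u, (0.0, T), [u0],
                dense_output=True, rtol=1e-10, atol=1e-12)

# Reverse pass: lambda' = -(df/du)* lambda - (dg/du)*, lambda(T) = 0,
# with an appended quadrature state q' = lambda* f_p for the integral term.
def backward(t, y):
    lam, _ = y
    u = fwd.sol(t)[0]
    return [-p * lam - 2.0 * u, lam * u]

bwd = solve_ivp(backward, (T, 0.0), [0.0, 0.0], rtol=1e-10, atol=1e-12)
q0 = bwd.y[1, -1]

# Here u(t0) does not depend on p, so the boundary term drops and dG/dp is
# just the integral; integrating q from T down to t0 flips its sign.
dGdp = -q0

# Finite-difference check on G(p), computed by appending g as a quadrature.
def G(pp):
    sol = solve_ivp(lambda t, y: [pp * y[0], y[0] ** 2], (0.0, T),
                    [u0, 0.0], rtol=1e-10, atol=1e-12)
    return sol.y[1, -1]

h = 1e-6
print(dGdp, (G(p + h) - G(p - h)) / (2 * h))  # the two should agree closely
```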
@@ -296,8 +296,8 @@ in which case
$$g_u(t_i) = 2(d_i - u(t_i,p))$$
- at the data points $(t_i,d_i)$. Therefore, the derivative of an ODE solution
- with respect to a cost function is given by solving for $\lambda^\ast$ using an
+ at the data points $(t_i,d_i)$. Therefore, the derivative of a cost function with respect to
+ the parameters is obtained by solving for $\lambda^\ast$ using an
ODE for $\lambda^T$ in reverse time, and then using that to calculate $\frac{dG}{dp}$.
Note that $\frac{dG}{dp}$ can be calculated simultaneously by appending a single
value to the reverse ODE, since we can simply define the new ODE term as
@@ -327,15 +327,15 @@ on-demand. There are three ways in which this can be done:
numerically this is unstable and thus not always recommended (ODEs are
reversible, but ODE solver methods are not necessarily going to generate the
same exact values or trajectories in reverse!)
- 2. If you solve the forward ODE and receive a continuous solution $u(t)$, you
- can interpolate it to retrieve the values at any given the time reverse pass
+ 2. If you solve the forward ODE and receive a solution $u(t)$, you
+ can interpolate it to retrieve the values at any time at which the reverse pass
needs the $\frac{df}{du}$ Jacobian. This is fast but memory-intensive.
3. Every time you need a value $u(t)$ during the backpass, you re-solve the
forward ODE to $u(t)$. This is expensive! Thus one can instead use
- *checkpoints*, i.e. save at finitely many time points during the forward
+ *checkpoints*, i.e. save at a smaller number of time points during the forward
pass, and use those as starting points for the $u(t)$ calculation.
- Alternative strategies can be investigated, such as an interpolation which
+ Alternative strategies can be investigated, such as an interpolation that
stores values in a compressed form.
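As a small sketch of the checkpointing idea in strategy 3 (same hypothetical $u^\prime = pu$ problem as above; all names are illustrative), the forward pass stores $u$ only at a few checkpoint times, and the reverse pass re-solves a short forward segment whenever it needs $u(t)$:

```python
import numpy as np
from scipy.integrate import solve_ivp

p, u0, T = -0.7, 1.5, 2.0
f = lambda t, u: p * u

# Forward pass: store u only at a few checkpoints instead of a dense output.
checkpoints = np.linspace(0.0, T, 5)
u_ckpt = [u0]
for a, b in zip(checkpoints[:-1], checkpoints[1:]):
    seg = solve_ivp(f, (a, b), [u_ckpt[-1]], rtol=1e-10, atol=1e-12)
    u_ckpt.append(seg.y[0, -1])

def u_at(t):
    # Re-solve from the nearest checkpoint at or below t: memory stays
    # O(#checkpoints) and each re-solve only spans a short segment.
    i = min(np.searchsorted(checkpoints, t, side="right") - 1,
            len(checkpoints) - 2)
    if np.isclose(t, checkpoints[i]):
        return u_ckpt[i]
    seg = solve_ivp(f, (checkpoints[i], t), [u_ckpt[i]], rtol=1e-10, atol=1e-12)
    return seg.y[0, -1]

print(u_at(1.3), u0 * np.exp(p * 1.3))  # matches the analytic solution
```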
### The vjp and Neural Ordinary Differential Equations
@@ -348,11 +348,11 @@ backpass
$$\lambda^\prime = -\frac{df}{du}^\ast \lambda - \left(\frac{dg}{du} \right)^\ast$$
$$\lambda(T) = 0$$
- can be improved by noticing $\frac{df}{du}^\ast \lambda $ is a vjp, and thus it
+ can be improved by noticing $\lambda^\ast \frac{df}{du}$ is a vjp, and thus it
can be calculated using $\mathcal{B}_f^{u(t)}(\lambda^\ast)$, i.e. reverse-mode
AD on the function $f$. If $f$ is a neural network, this means that the reverse
ODE is defined through successive backpropagation passes of that neural network.
- The result is a derivative with respect to the cost function of the parameters
+ The result is a derivative of the cost function with respect to the parameters
defining $f$ (either a model or a neural network), which can then be used to
fit the data ("train").
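As a sketch of that vjp (assuming JAX's `jax.vjp` as the reverse-mode AD tool; the tiny one-layer $f$ below is purely illustrative), a single backpropagation pass yields both $\lambda^\ast \frac{df}{du}$ and $\lambda^\ast \frac{df}{dp}$ without ever materializing a Jacobian:

```python
import jax
import jax.numpy as jnp

def f(u, p):
    # A tiny hypothetical "neural" right-hand side: one dense layer + tanh.
    W, b = p
    return jnp.tanh(W @ u + b)

u = jnp.array([0.1, -0.3])
p = (0.5 * jnp.ones((2, 2)), jnp.zeros(2))
lam = jnp.array([1.0, 2.0])

# One reverse-mode pass gives lambda* df/du and lambda* df/dp at once.
_, vjp_fun = jax.vjp(f, u, p)
lam_fu, lam_fp = vjp_fun(lam)

# Sanity check against the explicit Jacobian (only viable at toy sizes).
J = jax.jacobian(f, argnums=0)(u, p)
print(jnp.allclose(lam_fu, lam @ J))  # True
```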
@@ -385,7 +385,7 @@ spline:

If that's the case, one can use the fit spline in order to estimate the derivative
- at each point. Since the ODE is defined as $u^\prime = f(u,p,t)$, one then then
+ at each point. Since the ODE is defined as $u^\prime = f(u,p,t)$, one can then
use the cost function
$$C(p) = \sum_{i=1}^N \Vert\tilde{u}^{\prime}(t_i) - f(\tilde{u}(t_i),p,t_i)\Vert$$
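A short sketch of this two-stage idea, on synthetic data from the hypothetical model $u^\prime = pu$: a cubic spline stands in for $\tilde{u}$, and squared residuals replace the norm for convenience.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
t_i = np.linspace(0.0, 2.0, 15)
d_i = 1.5 * np.exp(-0.7 * t_i) + 0.01 * rng.standard_normal(t_i.size)

u_tilde = CubicSpline(t_i, d_i)          # spline fit of the data
du_tilde = u_tilde.derivative()(t_i)     # estimated u~'(t_i)

def C(p):
    # C(p) = sum_i || u~'(t_i) - f(u~(t_i), p, t_i) ||^2 with f(u,p,t) = p*u
    return np.sum((du_tilde - p * u_tilde(t_i)) ** 2)

print(minimize_scalar(C).x)  # recovers roughly p = -0.7
```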