Probabilistic ODE Solvers with Runge-Kutta Means

Michael Schober∗, David Duvenaud‡, Philipp Hennig∗

∗Research Group Elementary Intelligence
Department of Empirical Inference
Max Planck Institute for Intelligent Systems
Tübingen, Germany

‡Computational and Biological Learning Lab
Department of Engineering
Cambridge University

Can we assign a probability distribution over the solution to an ordinary differential equation (initial value problem)?

$$x(t_0) = x_0, \qquad x'(t) = f(x(t), t)$$

The Probabilistic View on Computation
computing as the collection of information [Poincaré, 1896; Diaconis, 1988; O’Hagan, 1992]

A numerical method estimates a function’s latent property given the result of computations.

quadrature estimates $\int_a^b f(x)\,dx$ given $\{f(x_i)\}$
linear algebra estimates $x$ s.t. $Ax = b$ given $\{As = y\}$
optimization estimates $x$ s.t. $\nabla f(x) = 0$ given $\{\nabla f(x_i)\}$
analysis estimates $x(t)$ s.t. $x' = f(x, t)$ given $\{f(x_i, t_i)\}$

▸ computations yield “data” / “observations”
▸ non-analytic quantities are “latent”
▸ even deterministic quantities can be uncertain.
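To make the quadrature line concrete, here is a minimal sketch of Bayesian quadrature in Python/NumPy. It assumes a Gaussian process prior on f with kernel k(s, t) = c + min(s, t) on [0, 1] (a Wiener process with a random offset); this kernel choice and the name `bq_wiener` are our own illustrative assumptions, not taken from the slides. The evaluations f(x_i) are treated as noise-free data. Under this prior the posterior mean of f is the piecewise-linear interpolant of the data, so when the nodes include 0 and 1 the posterior mean of the integral reproduces the trapezoidal rule.

```python
import numpy as np

def bq_wiener(nodes, fvals, c=1.0):
    """Posterior mean of int_0^1 f(x) dx under a GP prior with the
    (assumed) kernel k(s, t) = c + min(s, t), given exact values f(nodes)."""
    x = np.asarray(nodes, dtype=float)
    K = c + np.minimum.outer(x, x)        # Gram matrix k(x_i, x_j)
    z = c + x - x**2 / 2                  # z_i = int_0^1 k(t, x_i) dt
    return z @ np.linalg.solve(K, np.asarray(fvals, dtype=float))

# Usage: compare with the trapezoidal rule on non-uniform nodes in [0, 1]
x = np.array([0.0, 0.1, 0.35, 0.5, 0.8, 1.0])
f = np.sin(3 * x) + x**2
estimate = bq_wiener(x, f)
trapezoid = np.sum((f[1:] + f[:-1]) / 2 * np.diff(x))
print(estimate, trapezoid)
```

A smoother prior (for example an integrated Wiener process) would lead to a different, higher-order rule; the point here is only that a classical rule falls out as a posterior mean.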

Numerical Methods and Statistical Estimators
several classic numerical algorithms can be identified precisely as maximum a posteriori estimators

▸ quadrature [Diaconis, 1988; O’Hagan, 1991]: Gaussian quadrature ↔ Gaussian process regression
▸ linear algebra [Hennig, 2015]: conjugate gradients ↔ Gaussian conditioning
▸ nonlinear optimization [Hennig & Kiefel, 2013]: BFGS ↔ autoregressive filtering
▸ ordinary differential equations [Schober et al., 2014]: Runge-Kutta ↔ Gauss-Markov extrapolation

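Runge-Kutta methods
are linear extrapolators of high convergence order [Hairer et al., 1987]

[Figure: x(t) against t, with evaluation points t0, t0 + c1, t0 + c2, t0 + h]

$$\begin{array}{c|cccc} 0 & 1 & & & \\ c_1 & 1 & w_{11} & & \\ c_2 & 1 & w_{21} & w_{22} & \\ \hline h & 1 & b_1 & b_2 & b_3 \end{array}$$

$$\begin{aligned} Y_1 &= f(1\,x_0,\; t_0 + 0) \\ Y_2 &= f(1\,x_0 + w_{11} Y_1,\; t_0 + c_1) \\ Y_{s+1} &= f\Big(1\,x_0 + \sum_{i=1}^{s} w_{si} Y_i,\; t_0 + c_s\Big) \\ \hat x(t_0 + h) &= 1\,x_0 + \sum_i b_i Y_i \end{aligned}$$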
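The tableau can be read as a recipe. The Python sketch below (the names `rk_step` and `kutta3` are ours) follows the convention of this slide, in which the step size h is folded into c, w and b, so the usual Butcher coefficients a_ij and b_i appear as h·a_ij and h·b_i. As an example of the three-stage layout above it uses Kutta's classical third-order method, i.e. c = (0, h/2, h), w11 = h/2, w21 = −h, w22 = 2h and b = (h/6, 2h/3, h/6).

```python
import math

def rk_step(f, x0, t0, c, W, b):
    """One explicit Runge-Kutta step in the notation above, where the node
    offsets c and the weights W, b already include the step size h.

    c: offsets (0, c1, c2, ...) of the stage times from t0
    W: W[s-1] = (w_s1, ..., w_ss), the weights used to form stage Y_{s+1}
    b: final weights; x_hat(t0 + h) = x0 + sum_i b_i Y_i
    """
    Y = [f(x0, t0 + c[0])]                          # Y1 = f(x0, t0 + 0)
    for s in range(1, len(b)):
        xs = x0 + sum(w * y for w, y in zip(W[s - 1], Y))
        Y.append(f(xs, t0 + c[s]))                  # Y_{s+1}
    return x0 + sum(bi * y for bi, y in zip(b, Y))

def kutta3(h):
    """Kutta's classical third-order tableau with h folded into c, W and b."""
    c = [0.0, h / 2, h]
    W = [[h / 2], [-h, 2 * h]]
    b = [h / 6, 2 * h / 3, h / 6]
    return c, W, b

# Usage: one step of size h = 0.1 for x' = x, x(0) = 1 (exact: e^0.1)
x1 = rk_step(lambda x, t: x, 1.0, 0.0, *kutta3(0.1))
print(x1, math.exp(0.1))
```

For the linear test problem x' = x, any three-stage third-order method returns the Taylor polynomial 1 + h + h²/2 + h³/6, so the one-step error here is about h⁴/24.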

Gaussian process solvers
are also linear extrapolators

▸ linear extrapolation suggests a Gaussian process model
▸ Gaussian process solvers have been studied previously

[Skilling (1991), Chkrebtii et al. (2014), Hennig & Hauberg (2014)]

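Some properties of Gaussian measures
The only two equations you really need (in this group)

▸ closure under affine transformations ($x \in \mathbb{R}^N$, $y \in \mathbb{R}^M$)

$$x \sim \mathcal{N}(m, P), \quad y \mid x \sim \mathcal{N}(Hx + \nu, R) \;\Rightarrow\; \begin{bmatrix} x \\ y \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} m \\ Hm + \nu \end{bmatrix}, \begin{bmatrix} P & PH^\top \\ HP & HPH^\top + R \end{bmatrix}\right)$$

▸ inference involves only linear algebra operations

$$\begin{bmatrix} x \\ y \end{bmatrix} \sim \mathcal{N}\left(\begin{bmatrix} m_1 \\ m_2 \end{bmatrix}, \begin{bmatrix} P_1 & C \\ C^\top & P_2 \end{bmatrix}\right) \;\Rightarrow\; x \mid y \sim \mathcal{N}\left(m_1 + C P_2^{-1}(y - m_2),\; P_1 - C P_2^{-1} C^\top\right)$$

⇒ sequential Gaussian inference at linear cost (‘filtering’)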
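A short NumPy sketch of the two equations (the helper names `joint` and `condition` are ours). The usage example conditions on two independent noisy observations of x at once, and then one after the other, feeding the first posterior in as the prior for the second. Both routes give the same posterior, which is what allows the sequential (‘filtering’) form.

```python
import numpy as np

def joint(m, P, H, nu, R):
    """Joint Gaussian of (x, y) for x ~ N(m, P) and y | x ~ N(Hx + nu, R)."""
    mean = np.concatenate([m, H @ m + nu])
    cov = np.block([[P, P @ H.T], [H @ P, H @ P @ H.T + R]])
    return mean, cov

def condition(mean, cov, n, y):
    """Condition the first n coordinates of a joint Gaussian on the rest = y."""
    m1, m2 = mean[:n], mean[n:]
    P1, C, P2 = cov[:n, :n], cov[:n, n:], cov[n:, n:]
    G = np.linalg.solve(P2, C.T).T           # G = C P2^{-1} (P2 symmetric)
    return m1 + G @ (y - m2), P1 - G @ C.T

# Usage: conditioning on two observations at once ...
rng = np.random.default_rng(0)
m, P = np.zeros(2), np.eye(2)
Ha, Hb = rng.normal(size=(1, 2)), rng.normal(size=(1, 2))
R = 0.1 * np.eye(1)
ya, yb = np.array([0.3]), np.array([-1.2])
H = np.vstack([Ha, Hb])
m_batch, P_batch = condition(*joint(m, P, H, np.zeros(2), 0.1 * np.eye(2)), 2,
                             np.concatenate([ya, yb]))
# ... gives the same posterior as conditioning on them one after the other
m_seq, P_seq = condition(*joint(m, P, Ha, np.zeros(1), R), 2, ya)
m_seq, P_seq = condition(*joint(m_seq, P_seq, Hb, np.zeros(1), R), 2, yb)
```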

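Gaussian process solvers
implicitly define a Butcher tableau

$$\begin{array}{c|cccc} 0 & 1 & & & \\ c_1 & 1 & w_{11} & & \\ c_2 & 1 & w_{21} & w_{22} & \\ \hline h & 1 & b_1 & b_2 & b_3 \end{array}$$

$$\begin{aligned} y_1 &= f\big(\mu_{|x_0}(t_0 + 0),\; t_0 + 0\big) \\ y_2 &= f\big(\mu_{|x_0, y_1}(t_0 + c_1),\; t_0 + c_1\big) \\ y_{s+1} &= f\big(\mu_{|x_0, y_i}(t_0 + c_s),\; t_0 + c_s\big) \\ \hat x(t_0 + h) &= \mu_{|x_0, y_i}(t_0 + h) \end{aligned}$$

$$\mu_{|x_0}(t_0) := [k(t_0, t_0)]\,[k(t_0, t_0)]^{-1}\,(x_0) = 1\,x_0$$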

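$$\mu_{|x_0, y_1}(t_0 + c_1) := \begin{bmatrix} k(t_0 + c_1, t_0) & k^{\partial}(t_0 + c_1, t_0) \end{bmatrix} \begin{bmatrix} k(t_0, t_0) & k^{\partial}(t_0, t_0) \\ {}^{\partial}k(t_0, t_0) & {}^{\partial}k^{\partial}(t_0, t_0) \end{bmatrix}^{-1} \begin{pmatrix} x_0 \\ y_1 \end{pmatrix} = w_{10}\,x_0 + w_{11}\,y_1$$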
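The first-stage formula can be evaluated numerically. This sketch assumes the once-integrated Wiener process X(t) = ∫_τ^t W(u) du, with W a standard Wiener process started at τ (covariance min(ũ − τ, ũ′ − τ), which is our reading of the min(t̃, t̃′) used for the kernel); the closed-form covariances and the names `iwp1_kernels` and `stage_weights` are ours. Here k^∂ differentiates the second argument and ^∂k the first. Because this prior is Markov in (x, x′), conditioning on x0 and y1 at t0 gives (w10, w11) = (1, c1) for every τ < t0, i.e. an explicit Euler step to t0 + c1.

```python
import numpy as np

def iwp1_kernels(t, tp, tau):
    """Covariances of X(t) = int_tau^t W(u) du (W a standard Wiener process
    started at tau): returns k, k^d, ^dk, ^dk^d evaluated at (t, tp)."""
    s, sp = t - tau, tp - tau
    a, b = min(s, sp), max(s, sp)
    k = a**2 * (3 * b - a) / 6                          # Cov[X(t), X(tp)]
    k_d = s**2 / 2 if s <= sp else s * sp - sp**2 / 2   # Cov[X(t), X'(tp)]
    d_k = sp**2 / 2 if sp <= s else s * sp - s**2 / 2   # Cov[X'(t), X(tp)]
    d_k_d = a                                           # Cov[X'(t), X'(tp)]
    return k, k_d, d_k, d_k_d

def stage_weights(t0, c1, tau):
    """(w10, w11) such that mu(t0 + c1) = w10 * x0 + w11 * y1, given the
    noise-free data x(t0) = x0 and x'(t0) = y1."""
    k00, k00_d, d_k00, d_k00_d = iwp1_kernels(t0, t0, tau)
    K = np.array([[k00, k00_d], [d_k00, d_k00_d]])      # Gram matrix of data
    kc, kc_d, _, _ = iwp1_kernels(t0 + c1, t0, tau)
    return np.linalg.solve(K.T, np.array([kc, kc_d]))   # [kc, kc_d] K^{-1}

for tau in (-0.3, -1.0, -5.0):
    print(tau, stage_weights(0.0, 0.4, tau))            # (1, 0.4) each time
```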

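$$\mu_{|x_0, y_i}(t_0 + c_s) := \begin{bmatrix} k(t_0 + c_s, t_0) & k^{\partial}(t_0 + c_s, t_0 + c_i) \end{bmatrix} K^{-1} \begin{pmatrix} x_0 \\ y_i \end{pmatrix} = w_{s0}\,x_0 + \sum_{i=1}^{s} w_{si}\, y_i$$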

$$\mu_{|x_0, y_i}(t_0 + h) := \begin{bmatrix} k(t_0 + h, t_0) & k^{\partial}(t_0 + h, t_0 + c_i) \end{bmatrix} K^{-1} \begin{pmatrix} x_0 \\ y_i \end{pmatrix} = b_0\,x_0 + \sum_{i=1}^{s} b_i\, y_i$$

Gauss-Markov-Runge-Kutta methods
a GP solver whose mean matches RK exactly

▸ RK methods of order p choose (c, w, b) such that the local error is $\|\hat x(t_0 + h) - x(t_0 + h)\| = \mathcal{O}(h^{p+1})$
▸ the polynomial form suggests an integrated Wiener (polynomial spline) process

$$p(x(t)) = \mathcal{GP}\big(x(t);\, 0,\, k_s(t, t')\big), \qquad k_s(t, t') = \int_{\tau}^{t} \int_{\tau}^{t'} \min(\tilde t, \tilde t')\, d\tilde t\, d\tilde t'$$

▸ τ → −∞: improper prior p(x(t)), proper posterior after s observations
▸ the k-times integrated Wiener process gives a k-th order RK solver!
▸ inherits the RK guarantees, and gives a closed-form solution for the tableau (which previously required a numerical search!)
▸ a Markov (state-space) model, so inference costs O(s) (as opposed to the usual O(s³) cost of GP regression)
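For the twice-integrated Wiener process the claim can be checked in the Markov (state-space) form. This is our own illustration, not the paper's code: it approximates the improper limit τ → −∞ by a broad Gaussian prior (variance `prior_var`, an assumption) on the state (x, x′, x″) at t0, uses the standard transition matrix A and process noise Q of the integrated Wiener process, and reads off the weights (b0, b1, b2) of the posterior mean at t0 + h from noise-free data x0, y1 = x′(t0) and y2 = x′(t0 + c1). In the limit these weights are (1, h − h²/(2c1), h²/(2c1)), the two-stage second-order RK family in this slide's h-scaled notation; c1 = h/2 gives the midpoint rule and c1 = h gives Heun's method.

```python
import numpy as np

def iwp2_transition(dt, q=1.0):
    """Transition matrix A and process noise Q over a time step dt for the
    twice-integrated Wiener process with state (x, x', x'') and diffusion q."""
    A = np.array([[1.0, dt, dt**2 / 2],
                  [0.0, 1.0, dt],
                  [0.0, 0.0, 1.0]])
    Q = q * np.array([[dt**5 / 20, dt**4 / 8, dt**3 / 6],
                      [dt**4 / 8,  dt**3 / 3, dt**2 / 2],
                      [dt**3 / 6,  dt**2 / 2, dt]])
    return A, Q

def condition(m, P, H, y):
    """Noise-free update of N(m, P) on the scalar observation H @ state = y."""
    S = H @ P @ H                      # innovation variance (scalar)
    K = P @ H / S                      # gain vector
    return m + K * (y - H @ m), P - np.outer(K, K) * S

def gp_rk2_weights(h, c1, prior_var=1e6):
    """Weights (b0, b1, b2) with mean of x(t0 + h) = b0*x0 + b1*y1 + b2*y2,
    given x(t0) = x0, x'(t0) = y1 and x'(t0 + c1) = y2."""
    H_x, H_dx = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
    weights = []
    for data in np.eye(3):             # zero prior mean: mean is linear in data
        m, P = np.zeros(3), prior_var * np.eye(3)   # broad prior ~ tau -> -inf
        m, P = condition(m, P, H_x, data[0])        # x(t0) = x0
        m, P = condition(m, P, H_dx, data[1])       # x'(t0) = y1
        A, Q = iwp2_transition(c1)
        m, P = A @ m, A @ P @ A.T + Q               # predict to t0 + c1
        m, P = condition(m, P, H_dx, data[2])       # x'(t0 + c1) = y2
        A, _ = iwp2_transition(h - c1)
        weights.append((A @ m)[0])                  # mean of x(t0 + h)
    return np.array(weights)

print(gp_rk2_weights(1.0, 0.5))   # close to (1, 0, 1): the midpoint rule
print(gp_rk2_weights(1.0, 1.0))   # close to (1, 1/2, 1/2): Heun's method
```

Each observation is a rank-one update on a 3 × 3 state, which is where the O(s) cost of the Markov formulation comes from.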

Calibrating Uncertainty
within the parametrized class

▸ the posterior mean $\mu_{|y} = k K^{-1} y$ is invariant under $k \mapsto \theta^2 k$

▸ the posterior covariance $k_{|y} = k - k K^{-1} k$ is scaled by $\theta^2$

▸ initial ideas for uncertainty calibration in the paper (more to come)
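Both statements can be checked directly. In this sketch the Wiener kernel, the data and the scale θ² = 4 are arbitrary illustrative choices, and `gp_posterior` is our own helper. Multiplying the kernel by θ² leaves the noise-free posterior mean unchanged and multiplies the posterior covariance by θ², so θ² can be tuned for calibration without disturbing the Runge-Kutta mean.

```python
import numpy as np

def gp_posterior(k, X, y, Xs):
    """Noise-free GP posterior mean k K^{-1} y and covariance k - k K^{-1} k."""
    K = k(X[:, None], X[None, :])
    ks = k(Xs[:, None], X[None, :])
    mean = ks @ np.linalg.solve(K, y)
    cov = k(Xs[:, None], Xs[None, :]) - ks @ np.linalg.solve(K, ks.T)
    return mean, cov

def wiener(s, t):
    return np.minimum(s, t)                  # base kernel k (an assumption)

theta2 = 4.0                                  # illustrative scale theta^2

def scaled(s, t):
    return theta2 * wiener(s, t)             # the rescaled kernel theta^2 k

X = np.array([0.2, 0.5, 0.9])
y = np.array([0.1, -0.3, 0.4])
Xs = np.linspace(0.1, 1.0, 5)
mean1, cov1 = gp_posterior(wiener, X, y, Xs)
mean2, cov2 = gp_posterior(scaled, X, y, Xs)
```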

Multi-Step Extension
[A. Nordsieck, 1962]

[Figure: three panels, Naïve chaining, Smoothing, and Probabilistic continuation, each plotting x and the error x(t) − f(t) (scale ⋅10⁻²) against t over t0 + h, …, t0 + 4h]

▸ the probabilistic interpretation raises the question of what RK should do beyond s steps
▸ the ‘obvious’ solution is to continue the filtering process
▸ the result is very similar, though not identical, to multi-step methods
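A minimal sketch of the ‘continue the filtering process’ idea, under our own assumptions rather than the construction used for the figure: a twice-integrated Wiener process prior, a fixed step h, a broad prior on the unknown x″(t0), and noise-free ‘observations’ x′(t_n) = f(m_n, t_n), where m_n is the predicted mean (f is evaluated at the mean rather than integrated over the Gaussian). Each step costs one evaluation of f and a constant amount of linear algebra, so the cost grows linearly in the number of steps. The function name `filter_solve` is hypothetical.

```python
import numpy as np

def iwp2(h, q=1.0):
    """Transition A and process noise Q of a twice-integrated Wiener process
    with state (x, x', x'') over a step of length h."""
    A = np.array([[1.0, h, h**2 / 2], [0.0, 1.0, h], [0.0, 0.0, 1.0]])
    Q = q * np.array([[h**5 / 20, h**4 / 8, h**3 / 6],
                      [h**4 / 8, h**3 / 3, h**2 / 2],
                      [h**3 / 6, h**2 / 2, h]])
    return A, Q

def filter_solve(f, x0, t0, h, n_steps, prior_var=1e2):
    """Keep filtering across steps: predict with the prior, evaluate f at the
    predicted mean, and condition on x' = f without observation noise."""
    m = np.array([x0, f(x0, t0), 0.0])
    P = np.diag([0.0, 0.0, prior_var])       # x(t0), x'(t0) known; x'' unknown
    A, Q = iwp2(h)
    H = np.array([0.0, 1.0, 0.0])            # we observe the derivative x'
    for n in range(1, n_steps + 1):
        m, P = A @ m, A @ P @ A.T + Q        # predict to t0 + n h
        y = f(m[0], t0 + n * h)              # 'observed' gradient
        S = H @ P @ H
        K = P @ H / S
        m, P = m + K * (y - H @ m), P - np.outer(K, K) * S
    return m, P

# Usage: x' = -x, x(0) = 1, integrated to t = 1 in 100 steps (exact: e^-1)
m, P = filter_solve(lambda x, t: -x, 1.0, 0.0, 0.01, 100)
print(m[0], np.exp(-1.0), P[0, 0])
```

Unlike chaining independent RK steps, the state (x, x′, x″) carries information from earlier gradient evaluations into later steps, which is what makes the result resemble a multi-step method.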

Some Conceptual Open Questions
precise interpretation of the posterior measure is still evolving

How precise can the connection to multi-step methods be?
▸ order / stability conditions currently not fully understood
▸ flexibility is also a design criterion
▸ what about stiff problems?

What, precisely, does the posterior mean?
▸ the width of the Gaussian posterior should be inferred from the regularity of the ‘observed’ gradients. How, precisely, should this be done? (We have one particular solution.)
▸ is the Gaussian family enough? How expensive is it to move beyond Gauss?

What we’ve done so far:
▸ Numerical methods can be interpreted as performing statistical inference from noise-free data
▸ in some cases, e.g. Runge-Kutta, this link can be made precise
▸ the probabilistic solver inherits convergence guarantees, but also gains extensibility & uncertainty estimates

What we’re working on next:
▸ understand the connection to multi-step methods
▸ construct a robust probabilistic IVP solver
▸ continue finding model-based interpretations of numerical solvers

Bibliography

P. Diaconis. Bayesian numerical analysis. Statistical Decision Theory and Related Topics, IV(1):163–175, 1988.

E. Hairer, S. P. Nørsett, and G. Wanner. Solving Ordinary Differential Equations I – Nonstiff Problems. Springer, 1987.

S. Hauberg, M. Schober, M. Liptrot, P. Hennig, and A. Feragen. A random Riemannian metric for probabilistic shortest-path tractography. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. Springer, 2015.

P. Hennig. Probabilistic interpretation of linear solvers. SIAM Journal on Optimization, 25(1):210–233, 2015.

P. Hennig and M. Kiefel. Quasi-Newton Methods – a new direction. Journal of Machine Learning Research, 14:834–865, March 2013.

A. O’Hagan. Bayes–Hermite quadrature. Journal of Statistical Planning and Inference, 29(3):245–260, 1991.

A. O’Hagan. Some Bayesian Numerical Analysis. Bayesian Statistics, 4:345–363, 1992.

H. Poincaré. Calcul des probabilités. Gauthier-Villars, Paris, 1896.

S. Särkkä. Recursive Bayesian Inference on Stochastic Differential Equations. PhD thesis, Helsinki University of Technology, 2006.

M. Schober, D. Duvenaud, and P. Hennig. Probabilistic ODE Solvers with Runge-Kutta Means. In Advances in Neural Information Processing Systems (NIPS), 2014.

M. Schober, N. Kasenburg, A. Feragen, P. Hennig, and S. Hauberg. Probabilistic shortest path tractography in DTI using Gaussian Process ODE solvers. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014. Springer, 2014.