


Aim of this work. A general and efficient formulation to impose sets of constraints (equality and inequality) on Deep Probabilistic Models and their derivatives of any order, focusing on Deep Gaussian Processes.

Constraining the Dynamics of Deep Probabilistic Models (ICML 2018 poster)
Marco Lorenzi¹ & Maurizio Filippone²

1 - Université Côte d’Azur, Inria Sophia Antipolis, France; 2 - EURECOM, Sophia Antipolis, France

Challenge. The translation of complex learning methods to the natural sciences is challenged by the need for interpretable solutions that follow given mechanistic constraints.

[Architecture diagram: input t is mapped through random-feature layers with parameters Ω(1), W(1) (and Ω(2), W(2) for the second layer); the features φ(tΩ) yield f(t), while the derivative features φ'(tΩ)Ωᵀ yield df(t)/dt.]
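As a sketch of the diagram above: one random-feature layer can produce both f(t) and its exact derivative, because the feature map φ and its analytic derivative φ' share the same weights W. All names and sizes here are illustrative, not taken from the poster.

```python
import numpy as np

rng = np.random.default_rng(1)
n_rf = 100
Omega = rng.standard_normal((1, n_rf))                    # spectral frequencies
W = rng.standard_normal((2 * n_rf, 1)) / np.sqrt(n_rf)    # output weights

def phi(t):
    """Random features [cos(t Omega), sin(t Omega)] for inputs t of shape (n, 1)."""
    p = t @ Omega
    return np.hstack([np.cos(p), np.sin(p)])

def dphi(t):
    """Analytic derivative of the features w.r.t. t: [-sin(t Omega), cos(t Omega)] * Omega."""
    p = t @ Omega
    return np.hstack([-np.sin(p) * Omega, np.cos(p) * Omega])

t = np.array([[0.3]])
f = phi(t) @ W            # f(t)
df = dphi(t) @ W          # exact df(t)/dt of the same model

# finite-difference sanity check of the analytic derivative
eps = 1e-6
df_fd = (phi(t + eps) @ W - phi(t - eps) @ W) / (2 * eps)
```

The key point is that df is a deterministic function of the same Ω and W as f, so constraints on derivatives can be imposed directly on the model parameters.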

Scalability: solving high-dimensional equations (Lorenz96 model)
• 32 noisy observations,
• up to 1000 equations,
• only 2/3 of the equations (the observed states) used for training.
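For reference, the Lorenz96 dynamics used in this experiment have a standard form; a minimal NumPy sketch (the forcing value F = 8 is the conventional choice, not stated on the poster):

```python
import numpy as np

def lorenz96_rhs(x, F=8.0):
    """Right-hand side of the Lorenz96 system:
    dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F,
    with cyclic indices implemented via np.roll."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F
```

The constant state x_i = F is an equilibrium (the derivative vanishes there), which makes a convenient sanity check.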

Inequality constraints for monotonic regression
[Figure panels: monotonic regression with Poisson likelihood vs. unconstrained regression with Poisson likelihood; data from [Broffit et al, 1988].]

Scalability: large N

GP and derivatives as Bayesian Neural Networks
• Efficiency
• Flexibility
• Extension to “deep” models
• Extends [Cutajar et al, 2017]

\( f \sim \mathcal{GP}(0, \Sigma(x, x')) \;\Rightarrow\; \frac{d}{dx} f \sim \mathcal{GP}\!\left(0, \frac{\partial^2}{\partial x\, \partial x'} \Sigma(x, x')\right) \)

Observation 1. GPs are stochastic processes closed under linear operations.

Observation 2. GPs can be approximated via the spectral representation of kernels [Rahimi & Recht, 2008].

\( \Sigma_{\mathrm{rbf}}(x_i, x_j) \approx \frac{1}{N_{RF}} \sum_r \left[\cos(x_i^\top \omega_r), \sin(x_i^\top \omega_r)\right] \left[\cos(x_j^\top \omega_r), \sin(x_j^\top \omega_r)\right]^\top \)
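A quick numerical check of this approximation (unit lengthscale and illustrative sizes; the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_features(X, omega):
    """Map inputs to [cos(X w_r), sin(X w_r)] / sqrt(N_RF)."""
    proj = X @ omega                                   # (n, N_RF)
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(omega.shape[1])

d, n_rf = 2, 20000
omega = rng.standard_normal((d, n_rf))                 # spectral samples for unit-lengthscale RBF
X = rng.standard_normal((5, d))

K_approx = rff_features(X, omega) @ rff_features(X, omega).T
K_exact = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
```

The Monte Carlo error decays as \( O(1/\sqrt{N_{RF}}) \), so with 20000 features the two Gram matrices agree to a few decimal places.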

Constraints on function derivatives:

Equality: \( \mathcal{C}_{hi} = \left\{ f(t) \ \text{s.t.}\ \frac{d^h f_i(t)}{dt^h} = H_{hi}\!\left(t, f, \frac{df}{dt}, \ldots, \frac{d^q f}{dt^q}, \theta\right) \Big|_t \right\} \)

Inequality: \( \mathcal{C}_{hi} = \left\{ f(t) \ \text{s.t.}\ \frac{d^h f_i(t)}{dt^h} \ge H_{hi}\!\left(t, f, \frac{df}{dt}, \ldots, \frac{d^q f}{dt^q}, \theta\right) \Big|_t \right\} \)

Full constraint set: \( \mathcal{C} = \bigcap_{h,i} \mathcal{C}_{hi} \)

Deriving a lower bound for the log-marginal:

\( \log p(Y, \mathcal{C} \mid t, \Omega, \psi, \psi_D) \ge \mathrm{E}_{q(W)} \log p(Y \mid \Omega, W, \psi) + \mathrm{E}_{q(W)q(\theta)} \log p(\mathcal{C} \mid \Omega, W, \psi_D, \theta) - D_{\mathrm{KL}}[q(W) \,\|\, p(W)] - D_{\mathrm{KL}}[q(\theta) \,\|\, p(\theta)] \)
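The two KL terms of the bound have closed form for Gaussian approximations and priors; a sketch for the diagonal case (the helper name and shapes are ours, not from the poster):

```python
import numpy as np

def kl_diag_gauss(m_q, s2_q, m_p, s2_p):
    """KL[ N(m_q, diag(s2_q)) || N(m_p, diag(s2_p)) ], summed over dimensions."""
    return 0.5 * np.sum(np.log(s2_p / s2_q) + (s2_q + (m_q - m_p) ** 2) / s2_p - 1.0)

m = np.zeros(10)
s2 = np.ones(10)
```

The expectation terms of the bound are estimated by Monte Carlo, while these KL terms are evaluated exactly.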

VariationalapproximationsforGPandconstraintparameters

q(W ) = p(Wjk(l ) )

i , j ,l∏ = N (mjk

(l ) ,(s2 ) jk(l ) )
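Sampling from q(W) with the reparameterization trick keeps the Monte Carlo terms of the bound differentiable in the variational parameters; a minimal sketch with illustrative shapes:

```python
import numpy as np

rng = np.random.default_rng(2)
shape = (50, 1)                    # weights of one layer (hypothetical sizes)
m = np.zeros(shape)                # variational means m_jk^(l)
log_s2 = np.full(shape, -1.0)      # variational log-variances log (s^2)_jk^(l)

def sample_weights(m, log_s2, rng):
    """Reparameterized draw W = m + s * eps, eps ~ N(0, I),
    so gradients flow through m and log_s2."""
    eps = rng.standard_normal(m.shape)
    return m + np.exp(0.5 * log_s2) * eps

W = sample_weights(m, log_s2, rng)
```

Parameterizing the variance on the log scale keeps it positive without constrained optimization.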

Formulating GP regression with constraints. The log-marginal:


Efficient implementation of gradient-based optimization through automatic differentiation and stochastic gradient descent.

Experimental validation: ODE modeling with the FitzHugh-Nagumo equations
[Figure: estimation error across 5 folds, for N = 80 and N = 1000; additional ODE benchmarks (“More ODEs”).]
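The FitzHugh-Nagumo system itself is standard; a sketch with commonly used parameter values (assumed here, not read off the poster):

```python
import numpy as np

def fitzhugh_nagumo_rhs(state, I=0.5, a=0.7, b=0.8, tau=12.5):
    """FitzHugh-Nagumo dynamics:
        dv/dt = v - v^3/3 - w + I
        dw/dt = (v + a - b*w) / tau
    state = (v, w); parameter values are conventional defaults."""
    v, w = state
    return np.array([v - v ** 3 / 3.0 - w + I, (v + a - b * w) / tau])
```

In the constrained-DGP setting, this right-hand side plays the role of H in the derivative constraints, with θ = (I, a, b, τ) the constraint parameters to infer.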

Acknowledgments. This work has been supported by the French government, through the UCAJEDI Investments in the Future project with the reference number ANR-15-IDEX-01. MF gratefully acknowledges support from the AXA Research Fund.

\( q(\theta) = \mathcal{N}(\mu_\theta, \Sigma_\theta) \)

\( p(Y, \mathcal{C} \mid t, \psi, \psi_D) = \int p(Y \mid F, \psi)\, p(\mathcal{C} \mid \tilde{F}, \theta, \psi_D)\, p(F, \tilde{F} \mid t, \psi, \theta)\, p(\theta)\, dF\, d\tilde{F}\, d\theta \)

Constraint likelihood: \( \prod_{h,i} p(\mathcal{C}_{hi} \mid \tilde{F}, \theta, \psi_D) \) — e.g. Gaussian (DGP-G), Student-t (DGP-t), …

Data likelihood: \( \prod_n p(Y_n \mid F, \psi) \) — e.g. Gaussian, Poisson, …
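The practical difference between the Gaussian and Student-t constraint likelihoods is tail behavior: the heavy-tailed Student-t penalizes occasional large constraint residuals far less. A self-contained sketch with the log-densities written out (residual values and scales are illustrative):

```python
import math
import numpy as np

def gauss_logpdf(r, scale):
    """Log-density of N(0, scale^2) at residuals r."""
    return -0.5 * np.log(2 * np.pi * scale ** 2) - r ** 2 / (2 * scale ** 2)

def student_t_logpdf(r, df, scale):
    """Log-density of a scaled Student-t with df degrees of freedom at residuals r."""
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * np.log(df * np.pi * scale ** 2)
            - (df + 1) / 2 * np.log1p(r ** 2 / (df * scale ** 2)))

# constraint residuals d^h f/dt^h - H(...); the last one is a gross violation
residual = np.array([0.0, 0.1, 3.0])
ll_gauss = gauss_logpdf(residual, scale=0.1).sum()
ll_t = student_t_logpdf(residual, df=3, scale=0.1).sum()
```

The outlier dominates the Gaussian log-likelihood quadratically but only logarithmically under the Student-t, which is why a heavy-tailed constraint likelihood is more robust to misspecified dynamics.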

\( p(F, \tilde{F} \mid t, \psi, \theta) \): prior on the GP \( F \) and its derivatives \( \tilde{F} \); \( p(\theta) \): prior on the constraint parameters.

References
[1] K. Cutajar, E. V. Bonilla, P. Michiardi, and M. Filippone. Random feature expansions for deep Gaussian processes. ICML 2017.
[2] A. Rahimi and B. Recht. Random Features for Large-Scale Kernel Machines. NIPS 2008.