


Aim of this work. A general and efficient formulation to impose sets of constraints (equality and inequality) on Deep Probabilistic Models and their derivatives of any order, focusing on Deep Gaussian Processes.

Constraining the Dynamics of Deep Probabilistic Models (ICML 2018 poster)
Marco Lorenzi¹ & Maurizio Filippone²

1 - Université Côte d’Azur, Inria Sophia Antipolis, France; 2 - EURECOM, Sophia Antipolis, France

Challenge. The translation of complex learning methods to the natural sciences is challenged by the need for interpretable solutions that follow given mechanistic constraints.

[Architecture diagram: input t is mapped through random-feature layers with parameters Ω(1), W(1) (and Ω(2), W(2) for the second layer); the features φ(tΩ) yield f(t), while the derivative features φ'(tΩ)Ωᵀ yield df(t)/dt.]
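As a sketch of the diagram above: one random-feature layer can produce both f(t) and its exact derivative, because the feature map φ and its analytic derivative φ' share the same weights W. All names and sizes here are illustrative, not taken from the poster.

```python
import numpy as np

rng = np.random.default_rng(1)
n_rf = 100
Omega = rng.standard_normal((1, n_rf))                    # spectral frequencies
W = rng.standard_normal((2 * n_rf, 1)) / np.sqrt(n_rf)    # output weights

def phi(t):
    """Random features [cos(t Omega), sin(t Omega)] for inputs t of shape (n, 1)."""
    p = t @ Omega
    return np.hstack([np.cos(p), np.sin(p)])

def dphi(t):
    """Analytic derivative of the features w.r.t. t: [-sin(t Omega), cos(t Omega)] * Omega."""
    p = t @ Omega
    return np.hstack([-np.sin(p) * Omega, np.cos(p) * Omega])

t = np.array([[0.3]])
f = phi(t) @ W            # f(t)
df = dphi(t) @ W          # exact df(t)/dt of the same model

# finite-difference sanity check of the analytic derivative
eps = 1e-6
df_fd = (phi(t + eps) @ W - phi(t - eps) @ W) / (2 * eps)
```

The key point is that df is a deterministic function of the same Ω and W as f, so constraints on derivatives can be imposed directly on the model parameters.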

Scalability: solving high-dimensional equations (Lorenz96 model)
• 32 noisy observations,
• up to 1000 equations,
• only 2/3 of the equations (the observed states) used for training.
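For reference, the Lorenz96 dynamics used in this experiment have a standard form; a minimal NumPy sketch (the forcing value F = 8 is the conventional choice, not stated on the poster):

```python
import numpy as np

def lorenz96_rhs(x, F=8.0):
    """Right-hand side of the Lorenz96 system:
    dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F,
    with cyclic indices implemented via np.roll."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F
```

The constant state x_i = F is an equilibrium (the derivative vanishes there), which makes a convenient sanity check.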

Inequality constraints for monotonic regression
[Figure panels: monotonic regression with Poisson likelihood vs. unconstrained regression with Poisson likelihood; data from [Broffit et al, 1988].]

Scalability: large N

GP and derivatives as Bayesian Neural Networks
• Efficiency
• Flexibility
• Extension to “deep” models
• Extends [Cutajar et al, 2017]

\( f \sim \mathcal{GP}(0, \Sigma(x, x')) \;\Rightarrow\; \frac{d}{dx} f \sim \mathcal{GP}\!\left(0, \frac{\partial^2}{\partial x\, \partial x'} \Sigma(x, x')\right) \)

Observation 1. GPs are stochastic processes closed under linear operations.

Observation 2. GPs can be approximated via the spectral representation of kernels [Rahimi & Recht, 2008].

\( \Sigma_{\mathrm{rbf}}(x_i, x_j) \approx \frac{1}{N_{RF}} \sum_r \left[\cos(x_i^\top \omega_r), \sin(x_i^\top \omega_r)\right] \left[\cos(x_j^\top \omega_r), \sin(x_j^\top \omega_r)\right]^\top \)
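A quick numerical check of this approximation (unit lengthscale and illustrative sizes; the variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_features(X, omega):
    """Map inputs to [cos(X w_r), sin(X w_r)] / sqrt(N_RF)."""
    proj = X @ omega                                   # (n, N_RF)
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(omega.shape[1])

d, n_rf = 2, 20000
omega = rng.standard_normal((d, n_rf))                 # spectral samples for unit-lengthscale RBF
X = rng.standard_normal((5, d))

K_approx = rff_features(X, omega) @ rff_features(X, omega).T
K_exact = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
```

The Monte Carlo error decays as \( O(1/\sqrt{N_{RF}}) \), so with 20000 features the two Gram matrices agree to a few decimal places.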

Constraints on function derivatives:

Equality: \( \mathcal{C}_{hi} = \left\{ f(t) \ \text{s.t.}\ \frac{d^h f_i(t)}{dt^h} = H_{hi}\!\left(t, f, \frac{df}{dt}, \ldots, \frac{d^q f}{dt^q}, \theta\right) \Big|_t \right\} \)

Inequality: \( \mathcal{C}_{hi} = \left\{ f(t) \ \text{s.t.}\ \frac{d^h f_i(t)}{dt^h} \ge H_{hi}\!\left(t, f, \frac{df}{dt}, \ldots, \frac{d^q f}{dt^q}, \theta\right) \Big|_t \right\} \)

Full constraint set: \( \mathcal{C} = \bigcap_{h,i} \mathcal{C}_{hi} \)

Deriving a lower bound for the log-marginal:

\( \log p(Y, \mathcal{C} \mid t, \Omega, \psi, \psi_D) \ge \mathrm{E}_{q(W)} \log p(Y \mid \Omega, W, \psi) + \mathrm{E}_{q(W)q(\theta)} \log p(\mathcal{C} \mid \Omega, W, \psi_D, \theta) - D_{\mathrm{KL}}[q(W) \,\|\, p(W)] - D_{\mathrm{KL}}[q(\theta) \,\|\, p(\theta)] \)
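The two KL terms of the bound have closed form for Gaussian approximations and priors; a sketch for the diagonal case (the helper name and shapes are ours, not from the poster):

```python
import numpy as np

def kl_diag_gauss(m_q, s2_q, m_p, s2_p):
    """KL[ N(m_q, diag(s2_q)) || N(m_p, diag(s2_p)) ], summed over dimensions."""
    return 0.5 * np.sum(np.log(s2_p / s2_q) + (s2_q + (m_q - m_p) ** 2) / s2_p - 1.0)

m = np.zeros(10)
s2 = np.ones(10)
```

The expectation terms of the bound are estimated by Monte Carlo, while these KL terms are evaluated exactly.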

VariationalapproximationsforGPandconstraintparameters

q(W ) = p(Wjk(l ) )

i , j ,l∏ = N (mjk

(l ) ,(s2 ) jk(l ) )
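Sampling from q(W) with the reparameterization trick keeps the Monte Carlo terms of the bound differentiable in the variational parameters; a minimal sketch with illustrative shapes:

```python
import numpy as np

rng = np.random.default_rng(2)
shape = (50, 1)                    # weights of one layer (hypothetical sizes)
m = np.zeros(shape)                # variational means m_jk^(l)
log_s2 = np.full(shape, -1.0)      # variational log-variances log (s^2)_jk^(l)

def sample_weights(m, log_s2, rng):
    """Reparameterized draw W = m + s * eps, eps ~ N(0, I),
    so gradients flow through m and log_s2."""
    eps = rng.standard_normal(m.shape)
    return m + np.exp(0.5 * log_s2) * eps

W = sample_weights(m, log_s2, rng)
```

Parameterizing the variance on the log scale keeps it positive without constrained optimization.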

Formulating GP regression with constraints. The log-marginal:


Efficient implementation of gradient-based optimization through automatic differentiation and stochastic gradient descent.

Experimental validation: ODE modeling with the FitzHugh-Nagumo equations
[Figure: estimation error across 5 folds, for N = 80 and N = 1000; additional ODE benchmarks (“More ODEs”).]
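The FitzHugh-Nagumo system itself is standard; a sketch with commonly used parameter values (assumed here, not read off the poster):

```python
import numpy as np

def fitzhugh_nagumo_rhs(state, I=0.5, a=0.7, b=0.8, tau=12.5):
    """FitzHugh-Nagumo dynamics:
        dv/dt = v - v^3/3 - w + I
        dw/dt = (v + a - b*w) / tau
    state = (v, w); parameter values are conventional defaults."""
    v, w = state
    return np.array([v - v ** 3 / 3.0 - w + I, (v + a - b * w) / tau])
```

In the constrained-DGP setting, this right-hand side plays the role of H in the derivative constraints, with θ = (I, a, b, τ) the constraint parameters to infer.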

Acknowledgments. This work has been supported by the French government, through the UCAJEDI Investments in the Future project with the reference number ANR-15-IDEX-01. MF gratefully acknowledges support from the AXA Research Fund.

\( q(\theta) = \mathcal{N}(\mu_\theta, \Sigma_\theta) \)

\( p(Y, \mathcal{C} \mid t, \psi, \psi_D) = \int p(Y \mid F, \psi)\, p(\mathcal{C} \mid \tilde{F}, \theta, \psi_D)\, p(F, \tilde{F} \mid t, \psi, \theta)\, p(\theta)\, dF\, d\tilde{F}\, d\theta \)

Constraint likelihood: \( \prod_{h,i} p(\mathcal{C}_{hi} \mid \tilde{F}, \theta, \psi_D) \) — e.g. Gaussian (DGP-G), Student-t (DGP-t), …

Data likelihood: \( \prod_n p(Y_n \mid F, \psi) \) — e.g. Gaussian, Poisson, …
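The practical difference between the Gaussian and Student-t constraint likelihoods is tail behavior: the heavy-tailed Student-t penalizes occasional large constraint residuals far less. A self-contained sketch with the log-densities written out (residual values and scales are illustrative):

```python
import math
import numpy as np

def gauss_logpdf(r, scale):
    """Log-density of N(0, scale^2) at residuals r."""
    return -0.5 * np.log(2 * np.pi * scale ** 2) - r ** 2 / (2 * scale ** 2)

def student_t_logpdf(r, df, scale):
    """Log-density of a scaled Student-t with df degrees of freedom at residuals r."""
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * np.log(df * np.pi * scale ** 2)
            - (df + 1) / 2 * np.log1p(r ** 2 / (df * scale ** 2)))

# constraint residuals d^h f/dt^h - H(...); the last one is a gross violation
residual = np.array([0.0, 0.1, 3.0])
ll_gauss = gauss_logpdf(residual, scale=0.1).sum()
ll_t = student_t_logpdf(residual, df=3, scale=0.1).sum()
```

The outlier dominates the Gaussian log-likelihood quadratically but only logarithmically under the Student-t, which is why a heavy-tailed constraint likelihood is more robust to misspecified dynamics.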

\( p(F, \tilde{F} \mid t, \psi, \theta) \): prior on the GP \( F \) and its derivatives \( \tilde{F} \); \( p(\theta) \): prior on the constraint parameters.

References
[1] K. Cutajar, E. V. Bonilla, P. Michiardi, and M. Filippone. Random feature expansions for deep Gaussian processes. ICML 2017.
[2] A. Rahimi and B. Recht. Random Features for Large-Scale Kernel Machines. NIPS 2008.