Big Data Meets Big Models: Solution of Large-Scale Bayesian Inverse Problems
Omar Ghattas
joint work with:
Tan Bui-Thanh, Carsten Burstedde, James Martin, Noemi Petra, Georg Stadler, Hari Sundar, Lucas Wilcox
Institute for Computational Engineering & Sciences, Departments of Geological Sciences and Mechanical Engineering
The University of Texas at Austin
NSF Cyberbridges 2013, Arlington, VA, July 15, 2013
ICES (UT-Austin) Large-Scale Bayesian Inverse Problems NSF Cyberbridges 2013 1 / 49
Outline
1 Background, motivation, and goals
2 Langevin MCMC methods and stochastic Newton
3 Low rank Hessian approximation and scalability
4 Example: Full waveform global seismic inversion
1 Background, motivation, and goals
The inverse problem: The quest for knowledge from data and models
Input parameters, computational model, and output observables
The forward problem:
Given input parameters, solve model to yield output observables
Well-posed: solution exists, is unique, and is stable to perturbations in inputs
Causal: later-time solutions depend only on earlier-time solutions
Local: the forward operator includes derivatives that couple nearby solutions in space and time
The inverse problem:
Given output observations and model, infer input parameters
Ill-posed: observations are usually sparse; many different parameter values may be consistent with the data
Non-causal: the inverse operator couples earlier-time solutions with later-time ones
Global: the inverse operator couples solution values across all of space and time
Uncertainty is a fundamental feature of ill-posed inverse problems:
Deterministic approach to ill-posedness: employ regularization to penalize unwanted solution features, guarantee unique solution
Bayesian approach to ill-posedness: describe probability of all models that are consistent with the data, the model, and any prior knowledge of the parameters
Global seismology inverse problem: Observational data
Figure: Left. USArray network of 400 broadband seismic stations with 70 km spacing over 1000 km aperture. Past/present stations in green, future stations in blue. Right. Shear waves from deep South American earthquake plotted on top of map of arrival time. Early arrivals in blue, late arrivals in red. (Courtesy D. Helmberger, Caltech)
Global seismology inverse problem: Parameter field
Figure 6. Maps of shear-velocity heterogeneity at, from top to bottom, 100 km, 600 km, 1500 km and 2800 km depth for, from left to right, a total number of unknowns equal to N = 5000, N = 8000 and N = 11 000 (eq. 5). Model S40RTS has 8000 unknowns.
Typically, N is estimated from misfit curves and by cross-validation (Hastie & Tibshirani 1990). Fig. 8 illustrates how model misfit varies as a function of N. Shown is the misfit of fundamental-mode and overtone Rayleigh waves at a period of 62 s (Fig. 8a), the traveltimes of S, SS, ScS and SKS (Fig. 8b), and the combined splitting functions (Fig. 8c). As expected, misfit decreases when N increases. The misfit is lowest for the fundamental mode Rayleigh wave which propagates through the strongly heterogeneous crust and uppermost mantle with well-resolved long-wavelength velocity heterogeneity. Misfit curves for phase delays for Rayleigh waves at other periods and the traveltimes of other body-wave phases have similar behaviour. For each data type, the decline in misfit is relatively small when N is larger than 8000.
Selecting N is, to large extent, subjective since we do not fully understand the measurement errors, the systematic errors originating from unmodelled crustal effects and the effects of theoretical simplifications. Ideally, model uncertainties are evaluated by the analysis of 3-D synthetics (Komatitsch et al. 2002; Qin et al. 2009; Bozdag & Trampert 2010). On the basis of the misfit curves of Fig. 8 and inspection of maps and cross sections we adopt S40RTS as the model with N = 8000 effective unknowns but we emphasize that the misfit varies little for models with N between 5000 and 11 000.
4.1 Model images
Since S20RTS and S40RTS are based on the same data types and modelling approaches, it is not surprising that they correlate extremely well. Many of the model characteristics of S20RTS that we have discussed previously are still present in S40RTS. Although differences between S20RTS and S40RTS are subtle, they may have important implications for model interpretations.
Fig. 9 shows images of the upper mantle beneath the Atlantic and surrounding regions. Low velocity anomalies beneath the Mid-Atlantic Ridge, the Red Sea and East African Rift are narrower in S40RTS since lateral resolution is higher. The low-velocity anomaly beneath Iceland extends much deeper than elsewhere along the Mid-Atlantic Ridge (Montagner & Ritsema 2001) but it does not extend below the 660-km discontinuity. In S40RTS, this anomaly is significantly stronger (>3 per cent) than in S20RTS and may inhibit a pure thermal explanation.
© 2010 The Authors, GJI, 184, 1223-1236. Geophysical Journal International © 2010 RAS
Maps of shear velocity heterogeneity at different depths and resolutions (source: J. Ritsema, et al., Geophysical Journal International, 2011)
Bayesian inference framework for inverse problem
Given:
$\pi_{pr}(m)$ := prior p.d.f. of model parameters $m$
$\pi_{obs}(d)$ := prior p.d.f. of the observables $d$
$\pi_{model}(d|m)$ := conditional p.d.f. relating $d$ and $m$
Then posterior p.d.f. of model parameters is given by:
\[
\pi_{post}(m) \overset{\mathrm{def}}{=} \pi_{post}(m|d_{obs})
\propto \pi_{pr}(m) \int_D \frac{\pi_{obs}(d)\,\pi_{model}(d|m)}{\mu(d)}\, dd
\propto \pi_{pr}(m)\,\pi(d_{obs}|m)
\]
From A. Tarantola, Inverse Problem Theory, SIAM, 2005
Markov chain Monte Carlo method
Explore the Bayesian posterior probability density $\pi_{post}(m)$
$m$ are model parameters; $f(m)$ is the parameter-to-observable map; $d_{obs}$ are data; $\Gamma_{pr}$ and $\Gamma_{noise}$ are prior and noise covariances
\[
\pi_{post}(m) \propto \exp\left( -\tfrac{1}{2} \| f(m) - d_{obs} \|^2_{\Gamma_{noise}^{-1}} - \tfrac{1}{2} \| m - m_{pr} \|^2_{\Gamma_{pr}^{-1}} \right)
\]
[Figure: example probability density and its sampled representation]
Given a probability density $\pi(m)$:
How do we explore the distribution?
Often high dimensional
Computationally expensive
The MCMC Approach
Replace $\pi(m)$ by a sample chain $\{m_k\}$
Compute using ergodic averages
\[
E[f(M)] = \int_{\mathbb{R}^n} f(m)\,\pi(dm) \approx \frac{1}{N} \sum_{k=1}^{N} f(m_k)
\]
High dimensional space: the final frontier
The curse of dimensionality: Consider a hypersphere inscribed in a hypercube; what is the probability that a random sample will lie in the hypersphere as dimension increases?
dimension     hypersphere/hypercube
1             1.00
2             0.785
3             0.524
4             0.308
5             0.164
10            0.00249
100           1.87 x 10^-70
158           5.76 x 10^-126
1000          2.87 x 10^-1187
10,000        6.65 x 10^-16,851
1,000,000     8.53 x 10^-2,684,797
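The volume ratio in the table above has a closed form, $\pi^{n/2} / (2^n\,\Gamma(n/2+1))$, which underflows quickly in floating point. A minimal sketch, assuming only that closed form, evaluates its base-10 logarithm with `math.lgamma`; the function name `log10_ball_to_cube_ratio` is introduced here for illustration.

```python
import math


def log10_ball_to_cube_ratio(n: int) -> float:
    """log10 of vol(unit-radius n-ball) / vol(enclosing cube of side 2)."""
    log_ball = 0.5 * n * math.log(math.pi) - math.lgamma(0.5 * n + 1.0)
    log_cube = n * math.log(2.0)
    return (log_ball - log_cube) / math.log(10.0)


# Work in logarithms throughout: the ratio itself underflows for large n.
for n in (1, 2, 3, 4, 5, 10, 100, 158, 1000):
    print(f"{n:>5d}  log10(ratio) = {log10_ball_to_cube_ratio(n):.3f}")
```

The exponent of each table entry is the floor of the printed value, and the mantissa is 10 raised to the fractional remainder.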
Metropolis-Hastings algorithm to sample $\pi(m)$
1 $m_k \leftarrow m_0$
2 $k \leftarrow 0$
3 Choose a point $y$ from the proposal density $q(m_k, \cdot)$
4 $\alpha \leftarrow \min\left(1, \dfrac{\pi(y)\,q(y, m_k)}{\pi(m_k)\,q(m_k, y)}\right)$
5 If $\alpha > \mathrm{rand}([0, 1])$ Then
Accept: $m_{k+1} = y$
Otherwise
Reject: $m_{k+1} = m_k$
End If
6 $k \leftarrow k + 1$
7 Repeat from step 3
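The seven steps above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the function name `metropolis_hastings`, its argument names, and the one-dimensional standard normal test target are all assumptions made here. The acceptance test is done in log space to avoid underflow.

```python
import numpy as np


def metropolis_hastings(log_pi, propose, log_q, m0, n_steps, rng):
    """Run a Metropolis-Hastings chain.

    log_pi(m)      : log of the (possibly unnormalized) target density
    propose(m, rng): draws y from the proposal density q(m, .)
    log_q(a, b)    : log q(a, b), the log-density of proposing b from a
    """
    m = np.asarray(m0, dtype=float)
    chain = np.empty((n_steps + 1,) + m.shape)
    chain[0] = m
    accepted = 0
    for k in range(n_steps):
        y = propose(m, rng)
        # log of alpha = min(1, pi(y) q(y, m) / (pi(m) q(m, y)))
        log_alpha = min(0.0, log_pi(y) + log_q(y, m) - log_pi(m) - log_q(m, y))
        if np.log(rng.uniform()) < log_alpha:
            m = y                      # accept
            accepted += 1
        chain[k + 1] = m               # on rejection the old point is repeated
    return chain, accepted / n_steps


# Hypothetical test problem: standard normal target, Gaussian random-walk proposal.
rng = np.random.default_rng(0)
log_pi = lambda m: -0.5 * float(m @ m)
propose = lambda m, rng: m + 2.4 * rng.standard_normal(m.shape)
log_q = lambda a, b: 0.0               # symmetric proposal: the q terms cancel

chain, acc_rate = metropolis_hastings(log_pi, propose, log_q, np.zeros(1), 20000, rng)
print("acceptance rate:", acc_rate)
print("ergodic averages of m and m^2:", chain.mean(), (chain**2).mean())
```

The two printed ergodic averages estimate the mean and second moment of the target, which are 0 and 1 for this test density.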
Challenges in large-scale Bayesian inversion
Method of choice is to sample the posterior density using Markov chain Monte Carlo (MCMC); growth in the 1980s transformed Bayesian inference
For inverse problems characterized by high-dimensional parameter spaces and expensive forward simulations, conventional MCMC is prohibitive
Conventional MCMC methods view the parameter-to-observable map as a black box
Goals: overcome bottlenecks of MCMC:
avoid black-box MCMC (might be embarrassingly parallel, but algorithmic scaling is embarrassingly poor!)
develop specialized MCMC algorithms that reduce effective problem dimension by exploiting infinite-dimensional structure of the Hessian
structure-exploiting algorithms must map well onto extreme-scale systems, and scale independently of parameter dimension, state dimension, data dimension, and number of cores
References for ∞-D Bayesian inversion
J. Martin, L.C. Wilcox, C. Burstedde, and O. Ghattas, A Stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion, SIAM Journal on Scientific Computing, 34(3):A1460-A1487, 2012.
T. Bui-Thanh, O. Ghattas, J. Martin, and G. Stadler, A computational framework for infinite-dimensional Bayesian inverse problems. Part I: The linearized case, with applications to global seismic inversion, SIAM Journal on Scientific Computing, submitted, 2012.
T. Bui-Thanh and O. Ghattas, A scalable MAP solver for Bayesian inverse problems with Besov priors, Inverse Problems, submitted, 2012.
T. Bui-Thanh and O. Ghattas, Analysis of the Hessian for inverse scattering problems. Part III: Inverse medium scattering of electromagnetic wave in three dimensions, Inverse Problems, submitted, 2012.
T. Bui-Thanh and O. Ghattas, An analysis of infinite dimensional Bayesian inverse shape acoustic scattering and its numerical approximation, SIAM Journal on Numerical Analysis, submitted, 2012.
T. Bui-Thanh and O. Ghattas, A scaled stochastic Newton algorithm for Markov chain Monte Carlo simulations, SIAM Journal on Uncertainty Quantification, submitted, 2012.
T. Bui-Thanh and O. Ghattas, Analysis of the Hessian for inverse scattering problems. Part II: Inverse medium scattering of acoustic waves, Inverse Problems, 28(5):055002, 2012.
T. Bui-Thanh and O. Ghattas, Analysis of the Hessian for inverse scattering problems. Part I: Inverse shape scattering of acoustic waves, Inverse Problems, 28(5):055001, 2012.
2 Langevin MCMC methods and stochastic Newton
Langevin MCMC (Grenander & Miller, 1994)
Given the target density $\pi(m)$, the associated Langevin SDE is given by:
\[
dm_t = -A \nabla_m(-\log \pi)\,dt + \sqrt{2}\,A^{1/2}\,dW_t
\]
Discretize with timestep $\Delta t$ to derive proposal for Metropolis-Hastings:
\[
m^{prop}_{k+1} = m_k - A \nabla_m(-\log \pi)\,\Delta t + \sqrt{2\Delta t}\,A^{1/2} N(0, I)
\]
Notes:
Preconditioner $A$ must be symmetric positive definite
Process is ergodic (convergence of time averages)
$W_t$ is i.i.d. vector of standard Brownian motions
$W_t$ has independent increments given by
\[
W_{t+\Delta t} - W_t \sim N(0, \Delta t\, I)
\]
See work by A. Stuart, Y. Efendiev, ...
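The discretized Langevin proposal can be sketched as follows, assuming a fixed (position-independent) preconditioner $A$ so that the normalizing constant of the proposal density cancels in the Metropolis-Hastings ratio. The names `langevin_propose`, `langevin_log_q`, `grad_phi` (the gradient of $-\log\pi$), and the small Gaussian test target are hypothetical and chosen here for illustration.

```python
import numpy as np


def langevin_propose(m, grad_phi, A, dt, rng):
    """Draw m - dt*A*grad_phi(m) + sqrt(2 dt) A^{1/2} N(0, I), with phi = -log pi."""
    L = np.linalg.cholesky(A)              # A = L L^T, so L z has covariance A
    z = rng.standard_normal(m.shape)
    return m - dt * (A @ grad_phi(m)) + np.sqrt(2.0 * dt) * (L @ z)


def langevin_log_q(a, b, grad_phi, A, dt):
    """log q(a, b) up to a constant: b ~ N(a - dt*A*grad_phi(a), 2 dt A)."""
    r = b - (a - dt * (A @ grad_phi(a)))
    return -0.25 / dt * (r @ np.linalg.solve(A, r))


# Hypothetical Gaussian target with precision matrix P, so grad(-log pi) = P m.
P = np.array([[2.0, 0.5], [0.5, 1.0]])
grad_phi = lambda m: P @ m
A = np.diag([0.5, 1.0])
dt = 0.1
rng = np.random.default_rng(1)
m_k = np.array([1.0, -1.0])

draws = np.array([langevin_propose(m_k, grad_phi, A, dt, rng) for _ in range(20000)])
expected_mean = m_k - dt * (A @ grad_phi(m_k))
print("sample mean:", draws.mean(axis=0), "expected:", expected_mean)
print("sample covariance:\n", np.cov(draws.T), "\nexpected:\n", 2.0 * dt * A)
```

Because the drift makes the proposal asymmetric, both `langevin_log_q(y, m_k, ...)` and `langevin_log_q(m_k, y, ...)` are needed in the acceptance ratio.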
Stochastic Newton's method
Langevin Metropolis-Hastings MCMC proposal given by:
\[
m^{prop}_{k+1} = m_k - A \nabla_m(-\log \pi)\,\Delta t + \sqrt{2\Delta t}\,A^{1/2} N(0, I)
\]
Take $A$ to be the inverse of the (local) Hessian and set $\Delta t = 1$:
\[
A = H(m)^{-1} \equiv \left[\nabla^2_m(-\log \pi(m))\right]^{-1}
= \left(F^T \Gamma_{noise}^{-1} F + \Gamma_{pr}^{-1}\right)^{-1} \quad \text{(local covariance matrix)}
\]
Then we have the stochastic equivalent of Newton's method:
\[
m^{prop}_{k+1} = m_k - H^{-1} \nabla_m(-\log \pi) + N(0, H^{-1})
\]
Often leads to several orders of magnitude reduction in number of samples.
Details in: J. Martin, L.C. Wilcox, C. Burstedde, and O. Ghattas, A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion, SIAM Journal on Scientific Computing, 34(3):A1460-A1487, 2012.
T. Bui-Thanh and O. Ghattas, A scaled stochastic Newton algorithm for Markov chain Monte Carlo simulations, submitted.
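A minimal dense-matrix sketch of the stochastic Newton proposal follows, assuming the gradient and Hessian of $-\log\pi$ are available as callables (`grad_phi` and `hess_phi` are names introduced here, as is the Gaussian test target). For a Gaussian target the Hessian is constant, so the proposal from any starting point is an exact draw from the target, which the example checks empirically.

```python
import numpy as np


def stochastic_newton_propose(m, grad_phi, hess_phi, rng):
    """Draw m - H^{-1} g + N(0, H^{-1}), with g and H the gradient and Hessian of -log pi."""
    g = grad_phi(m)
    H = hess_phi(m)
    L = np.linalg.cholesky(H)                  # H = L L^T
    newton_step = np.linalg.solve(H, g)
    z = rng.standard_normal(m.shape)
    noise = np.linalg.solve(L.T, z)            # covariance L^{-T} L^{-1} = H^{-1}
    return m - newton_step + noise


# Hypothetical Gaussian target N(m_bar, P^{-1}): -log pi = 0.5 (m - m_bar)^T P (m - m_bar).
P = np.array([[2.0, 0.5], [0.5, 1.0]])
m_bar = np.array([0.5, -0.2])
grad_phi = lambda m: P @ (m - m_bar)
hess_phi = lambda m: P

rng = np.random.default_rng(2)
start = np.array([10.0, -7.0])                 # deliberately far from the mode
draws = np.array([stochastic_newton_propose(start, grad_phi, hess_phi, rng)
                  for _ in range(20000)])
print("sample mean:", draws.mean(axis=0), "target mean:", m_bar)
print("sample covariance:\n", np.cov(draws.T), "\ntarget covariance:\n", np.linalg.inv(P))
```

For a non-Gaussian target the Hessian varies with $m$, so the proposal density is not symmetric and the Metropolis-Hastings ratio has to include both $q(m_k, y)$ and $q(y, m_k)$, each evaluated with the Hessian at its own base point.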
Rosenbrock illustration: Gaussian random walk
[Figure: sample chain on the Rosenbrock density, axes x and y]
\[
m^{prop}_{k+1} = m_k + N(0, I)
\]
Rosenbrock illustration: Unpreconditioned Langevin
[Figure: sample chain on the Rosenbrock density, axes x and y]
\[
m^{prop}_{k+1} = m_k - \Delta t\,\nabla_m(-\log \pi) + \sqrt{2\Delta t}\,N(0, I)
\]
Rosenbrock illustration: Hessian-preconditioned Langevin
[Figure: sample chain on the Rosenbrock density, axes x and y]
\[
m^{prop}_{k+1} = m_k - H^{-1} \nabla_m(-\log \pi) + N(0, H^{-1})
\]
Convergence comparison: different MCMC methods
Multivariate potential scale reduction factor (MPSRF) convergence statistic for 65-parameter problem
[Figure: unpreconditioned Langevin vs. stochastic Newton vs. Adaptive Metropolis]
3 Low rank Hessian approximation and scalability
Large-scale local covariance estimates
Stochastic Newton requires a (local) Gaussian approximation whose covariance is given by the inverse of the Hessian, which is formally a dense operator. Key idea: never form $H$ (every column would require a forward solve); instead:
recognize that $H$ is the sum of a data misfit term, which is often equivalent to a compact operator, and (the inverse of) a prior, which is often equivalent to a differential operator:
\[
H = F^T \Gamma_{noise}^{-1} F + \Gamma_{pr}^{-1}
\]
invoke low rank (truncated spectral decomposition) approximation of data misfit operator using randomized SVD; often requires constant number of forward/adjoint solves, independent of problem size
combine with Sherman-Morrison-Woodbury to invert/factor
Details in: H.P. Flath, L.C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, and O. Ghattas, Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM Journal on Scientific Computing, 33(1):407-432, 2011.
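Low rank approximation of data misfit Hessian
\[
\begin{aligned}
\Gamma_{post} = H^{-1}
&= \left(F^T \Gamma_{noise}^{-1} F + \Gamma_{pr}^{-1}\right)^{-1} \\
&= \Gamma_{pr}^{1/2} \left(\Gamma_{pr}^{1/2} F^T \Gamma_{noise}^{-1} F\, \Gamma_{pr}^{1/2} + I\right)^{-1} \Gamma_{pr}^{1/2} \\
&\approx \Gamma_{pr}^{1/2} \left(V_r \Lambda_r V_r^T + I\right)^{-1} \Gamma_{pr}^{1/2} \\
&= \Gamma_{pr}^{1/2} \left[ I - V_r D_r V_r^T + O\left(\sum_{i=r+1}^{n} \frac{\lambda_i}{\lambda_i + 1}\right) \right] \Gamma_{pr}^{1/2}
\end{aligned}
\]
where $V_r, \Lambda_r$ are truncated eigenvectors/eigenvalues of the prior-preconditioned data misfit Hessian, and $D_r = \mathrm{diag}(\lambda_i/(\lambda_i + 1))$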
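The low-rank formula can be checked numerically on a small dense problem. Everything in the setup below is a hypothetical stand-in chosen for illustration: an exponential-kernel prior covariance, a random linear forward map `F` with `q = 5` observations, and white noise. With only five observations the prior-preconditioned misfit Hessian has rank five, so truncating at `r = 5` should reproduce the exact posterior covariance up to roundoff.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 40, 5                                    # parameters, observations

# Hypothetical ingredients: smooth prior covariance and a random linear forward map.
x = np.linspace(0.0, 1.0, n)
G_pr = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2)    # SPD exponential kernel
F = rng.standard_normal((q, n)) / np.sqrt(n)
G_noise_inv = np.eye(q) / 0.1**2

# Symmetric square root of the prior covariance.
w, U = np.linalg.eigh(G_pr)
G_pr_half = (U * np.sqrt(w)) @ U.T

# Prior-preconditioned data misfit Hessian and its eigenpairs, largest first.
H_misfit_pp = G_pr_half @ F.T @ G_noise_inv @ F @ G_pr_half
lam, V = np.linalg.eigh(H_misfit_pp)
order = np.argsort(lam)[::-1]
lam, V = lam[order], V[:, order]


def low_rank_posterior(r):
    """Gamma_pr^{1/2} (I - V_r D_r V_r^T) Gamma_pr^{1/2}, D_r = diag(lam_i / (lam_i + 1))."""
    Vr, lr = V[:, :r], lam[:r]
    Dr = lr / (lr + 1.0)
    return G_pr_half @ (np.eye(n) - (Vr * Dr) @ Vr.T) @ G_pr_half


G_post_exact = np.linalg.inv(F.T @ G_noise_inv @ F + np.linalg.inv(G_pr))
errors = {}
for r in (1, 3, q):
    diff = low_rank_posterior(r) - G_post_exact
    errors[r] = np.linalg.norm(diff) / np.linalg.norm(G_post_exact)
    print(f"rank {r}: relative error {errors[r]:.2e}")
```

The dense eigendecomposition is used only to keep the check short; at scale the dominant eigenpairs would come from a matrix-free method such as Lanczos or randomized SVD.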
Computations required for stochastic Newton
Never need to form dense Hessian:
\[
\begin{aligned}
H &= \Gamma_{pr}^{-1/2} \left[ V_r \Lambda_r V_r^T + I \right] \Gamma_{pr}^{-1/2} \\
H^{-1} g &= \Gamma_{pr}^{1/2} \left\{ V_r \left[ (\Lambda_r + I_r)^{-1} - I_r \right] V_r^T + I \right\} \Gamma_{pr}^{1/2}\, g && \text{(Newton step)} \\
H^{-1/2} x &= \Gamma_{pr}^{1/2} \left\{ V_r \left[ (\Lambda_r + I_r)^{-1/2} - I_r \right] V_r^T + I \right\} x && \text{(drawing a sample)} \\
\det(H^{-1/2}) &= (\det \Gamma_{pr})^{1/2} \prod_{i=1}^{r} (\lambda_i + 1)^{-1/2} && \text{(accept/reject criterion of M-H)}
\end{aligned}
\]
Complexity of these operations is scalable (i.e. requires a number of forward PDE solves that is independent of the parameter dimension) when:
prior-preconditioned data misfit Hessian is compact with mesh-independent dominant spectrum (theoretical results)
dominant spectrum is captured in a number of matvecs that is a constant multiple of number of dominant eigenvalues (e.g., using Lanczos or randomized SVD)
Hessian-vector products carried out matrix-free using adjoint methods
square root prior $\Gamma_{pr}^{1/2}$ taken as inverse of elliptic operator; fast elliptic solver for computing its action $\Gamma_{pr}^{1/2} z$
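The operations listed above can be sketched matrix-free. This is an illustrative sketch under stated assumptions, not the authors' implementation: it works in prior-preconditioned coordinates (equivalently, it takes $\Gamma_{pr} = I$), uses a basic randomized eigensolver that touches the operator only through a `matvec` callable, and tests on a synthetic rank-6 operator. All function names are introduced here.

```python
import numpy as np


def randomized_eigh(matvec, n, r, oversample=10, rng=None):
    """Approximate top-r eigenpairs of a symmetric PSD operator using only matvecs."""
    rng = np.random.default_rng() if rng is None else rng
    k = r + oversample
    Omega = rng.standard_normal((n, k))
    Y = np.column_stack([matvec(Omega[:, j]) for j in range(k)])     # k matvecs
    Q, _ = np.linalg.qr(Y)                                           # basis for the range
    T = Q.T @ np.column_stack([matvec(Q[:, j]) for j in range(k)])   # k more matvecs
    lam, S = np.linalg.eigh(0.5 * (T + T.T))
    idx = np.argsort(lam)[::-1][:r]
    return lam[idx], Q @ S[:, idx]


def apply_inverse(lam, V, g):
    """(V diag(lam) V^T + I)^{-1} g  =  V[(Lam + I)^{-1} - I] V^T g + g."""
    return V @ ((1.0 / (lam + 1.0) - 1.0) * (V.T @ g)) + g


def apply_inverse_sqrt(lam, V, x):
    """(V diag(lam) V^T + I)^{-1/2} x  =  V[(Lam + I)^{-1/2} - I] V^T x + x."""
    return V @ ((1.0 / np.sqrt(lam + 1.0) - 1.0) * (V.T @ x)) + x


# Hypothetical low-rank "prior-preconditioned misfit Hessian" with known spectrum.
rng = np.random.default_rng(3)
n, r = 200, 6
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
true_lam = np.array([100.0, 50.0, 20.0, 10.0, 5.0, 1.0])
Ht = (U * true_lam) @ U.T
matvec = lambda v: Ht @ v

lam, V = randomized_eigh(matvec, n, r, rng=rng)
g = rng.standard_normal(n)
step = apply_inverse(lam, V, g)                       # Newton-step analogue
sample = apply_inverse_sqrt(lam, V, g)                # sample-drawing analogue
log_det_inv_sqrt = -0.5 * np.sum(np.log(lam + 1.0))   # log det of the inverse square root

print("max relative eigenvalue error:", np.max(np.abs(lam - true_lam) / true_lam))
print("residual of (Ht + I) step = g:", np.linalg.norm((Ht + np.eye(n)) @ step - g))
```

The eigensolver costs `2 * (r + oversample)` operator applications regardless of `n`, which is the property the scalability argument relies on when each application is a pair of forward/adjoint PDE solves.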
4 Example: Full waveform global seismic inversion
Elastic/acoustic wave equations
Governing equations in velocity-strain form
\[
\begin{aligned}
\frac{\partial E}{\partial t} &= \tfrac{1}{2}\left(\nabla v + \nabla v^T\right) && \text{in } B \\
\rho \frac{\partial v}{\partial t} &= \nabla \cdot \left(\lambda\, \mathrm{tr}(E)\, I + 2\mu E\right) + \rho f && \text{in } B \\
S n &= t^{bc}(t) && \text{on } \partial B \\
v &= v_0(x) && \text{at } t = 0 \\
E &= E_0(x) && \text{at } t = 0
\end{aligned}
\]
$E$ strain tensor
$S$ stress tensor
$\rho$ mass density
$v$ displacement velocity
$f$ body force per unit mass
$\lambda$ and $\mu$ Lamé parameters
$I$ identity tensor
$t^{bc}$ traction bc
$v_0, E_0$ initial conditions
$t$ time
$x$ point in the body
$B$ solution body
Forward discontinuous Galerkin wave propagation solver
[Figure: nonconforming element faces and their mortars M1 to M7]
nonconforming hexahedral elements with Kopriva's mortar approach for hyperbolic equations (same convergence rate as conforming elements)
tensor product Lagrange basis on the Legendre-Gauss-Lobatto (LGL) nodes
LGL quadrature (diagonal mass matrix)
time integration by classical 4-stage RK4
integrated parallel mesh generation/adaptivity
L.C. Wilcox, G. Stadler, C. Burstedde, and O. Ghattas, A high-order discontinuous Galerkin method for wave propagation through coupled elastic-acoustic media, Journal of Computational Physics, 229(24):9373-9396, 2010.
T. Bui-Thanh and O. Ghattas, Analysis of an hp-non-conforming discontinuous Galerkin spectral element method for wave propagations, SIAM Journal on Numerical Analysis, 50(3):1801-1826, 2012.
Point source approximation of M9 Tohoku earthquake
Animation by Greg Abram, TACC
Strong scalability of global seismic wave propagation
Excellent strong scalability on Jaguar for meshing + wave propagation
# proc cores   meshing time (s)   wave prop per step (s)   par eff wave   Tflops
32,640         6.32               12.76                    1.00           25.6
65,280         6.78               6.30                     1.01           52.2
130,560        17.76              3.12                     1.02           105.5
223,752
Extreme granularity limits for strong scaling of forward DG wave propagation solver on ORNL Cray XK6
#cores    time per step (ms)   elem/core   efficiency (%)
256       1630.80              4712        100.0
512       832.46               2356        98.0
1024      411.54               1178        99.1
8192      61.69                148         82.6
65536     11.79                19          54.0
131072    7.09                 10          44.9
262144    4.07                 5           39.2
table shows wall clock time per time step in ms, elements per core, and parallel efficiency for 3 orders of magnitude increase in core count
just 1.21 million 3rd order DG elements (694 million unknowns)
parallel efficiency remains at 39% with just 4 or 5 elements/core
T. Bui-Thanh, C. Burstedde, O. Ghattas, J. Martin, G. Stadler, and L.C. Wilcox, Extreme-scale UQ for Bayesian inverse problems governed by PDEs, Proceedings of IEEE/ACM SC12, 2012 (Gordon Bell Prize Finalist).
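The efficiency column of the second table can be reproduced from the timings, assuming the usual strong-scaling definition relative to the smallest run: efficiency = (p0 * t0) / (p * t). The function name `strong_scaling_efficiency` is introduced here; the numbers are copied from the table.

```python
# Core counts and wall-clock time per step (ms), copied from the table above.
cores = [256, 512, 1024, 8192, 65536, 131072, 262144]
ms_per_step = [1630.80, 832.46, 411.54, 61.69, 11.79, 7.09, 4.07]
reported_pct = [100.0, 98.0, 99.1, 82.6, 54.0, 44.9, 39.2]


def strong_scaling_efficiency(cores, times):
    """Parallel efficiency in percent, relative to the first (smallest) run."""
    base = cores[0] * times[0]
    return [100.0 * base / (p * t) for p, t in zip(cores, times)]


computed_pct = strong_scaling_efficiency(cores, ms_per_step)
for p, e, rep in zip(cores, computed_pct, reported_pct):
    print(f"{p:>7d} cores: computed {e:5.1f}%  reported {rep:5.1f}%")
```

The computed values agree with the reported column to within the rounding of the published timings.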
Gradient and Hessian for full waveform seismic inversion
Would like to compute gradients and Hessian actions w.r.t. $c$ of
\[
J(c) := \frac{1}{2} \int_0^T \!\! \int_\Omega (Bv(c) - v_{obs})^T\, \Gamma_{noise}^{-1}\, (Bv(c) - v_{obs})\, dx\, dt + R_{pr}(c)
\]
where the dependence of $v$ on $c$ is given by solving the forward wave propagation equations:
\[
\begin{aligned}
\rho v_t - \nabla(\rho c^2 e) &= g && \text{in } \Omega \times (0, T), \\
e_t - \nabla \cdot v &= 0 && \text{in } \Omega \times (0, T), \\
v = 0, \quad e &= 0 && \text{in } \Omega \times \{t = 0\}, \\
e &= 0 && \text{on } \partial\Omega \times (0, T).
\end{aligned}
\]
$v, e$ are velocity and strain dilation
$c$ is the uncertain local wave speed parameter
$\rho$ and $g$ are known density and seismic source
$v_{obs}$ are observations at receivers, $B(x)$ is an observation operator
$\Gamma_{noise}$ is the noise covariance
$R_{pr}$ is the prior term involving $\Gamma_{prior}^{-1}$
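The gradient computation
Gradient expression with respect to $c$ given by
\[
G(c) := 2\rho c \int_0^T e\,(\nabla \cdot w)\, dt + R'_{pr}(c)
\]
where $v, e$ satisfy the forward wave propagation equations
\[
\begin{aligned}
\rho v_t - \nabla(\rho c^2 e) &= g && \text{in } \Omega \times (0, T), \\
e_t - \nabla \cdot v &= 0 && \text{in } \Omega \times (0, T), \\
v = 0, \quad e &= 0 && \text{in } \Omega \times \{t = 0\}, \\
e &= 0 && \text{on } \partial\Omega \times (0, T).
\end{aligned}
\]
$w, d$ (adjoint velocity, dilation) satisfy the adjoint wave propagation equations
\[
\begin{aligned}
-\rho w_t + \nabla(\rho c^2 d) &= -B^* \Gamma_{noise}^{-1} (Bv - v_{obs}) && \text{in } \Omega \times (0, T), \\
-d_t + \nabla \cdot w &= 0 && \text{in } \Omega \times (0, T), \\
w = 0, \quad d &= 0 && \text{in } \Omega \times \{t = T\}, \\
d &= 0 && \text{on } \partial\Omega \times (0, T).
\end{aligned}
\]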
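The forward-solve, adjoint-solve, gradient pattern above can be illustrated on a small steady-state algebraic analogue. This toy is not the wave equation: the forward operator `A(c) = K + diag(c^2)`, the observation matrix `B`, and all data are hypothetical choices made here so that the adjoint gradient can be verified against finite differences. As in the slide, the gradient is a pointwise product of forward and adjoint fields with a factor `2c`.

```python
import numpy as np


def assemble(c):
    """Hypothetical forward operator A(c) = K + diag(c^2), K a 1-D Laplacian stencil."""
    n = c.size
    K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    return K + np.diag(c**2)


def misfit_and_gradient(c, g, B, d_obs, noise_var):
    """J(c) = 0.5 |B u - d_obs|^2 / noise_var with A(c) u = g, and dJ/dc by the adjoint method."""
    A = assemble(c)
    u = np.linalg.solve(A, g)                         # forward solve
    r = B @ u - d_obs
    J = 0.5 * (r @ r) / noise_var
    w = np.linalg.solve(A.T, -B.T @ r / noise_var)    # adjoint solve
    grad = 2.0 * c * u * w                            # w^T (dA/dc_i) u with dA/dc_i = 2 c_i e_i e_i^T
    return J, grad


rng = np.random.default_rng(4)
n = 20
c = 1.0 + 0.1 * rng.standard_normal(n)
g = rng.standard_normal(n)
B = np.eye(n)[::4]                                    # observe every 4th unknown
d_obs = rng.standard_normal(B.shape[0])
noise_var = 0.01

J, grad = misfit_and_gradient(c, g, B, d_obs, noise_var)

# Central finite-difference check of every gradient component.
h = 1e-6
fd = np.empty(n)
for i in range(n):
    e_i = np.zeros(n)
    e_i[i] = h
    fd[i] = (misfit_and_gradient(c + e_i, g, B, d_obs, noise_var)[0]
             - misfit_and_gradient(c - e_i, g, B, d_obs, noise_var)[0]) / (2.0 * h)

rel_err = np.linalg.norm(fd - grad) / np.linalg.norm(grad)
print("relative difference, adjoint vs finite differences:", rel_err)
```

The adjoint route costs one forward and one adjoint solve for the whole gradient, whereas the finite-difference check needs two forward solves per parameter.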
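Computation of action of Hessian in given direction
Action of the Hessian operator in direction $\hat{c}$ at a point $c$ given by
\[
H(c)\hat{c} := 2\rho \int_0^T \left[ \hat{c}\, e\,(\nabla \cdot w) + c\, \tilde{e}\,(\nabla \cdot w) + c\, e\,(\nabla \cdot \tilde{w}) \right] dt + R''_{pr}(c)(\hat{c}),
\]
where $\tilde{v}, \tilde{e}$ satisfy the incremental forward wave propagation equations
\[
\begin{aligned}
\rho \tilde{v}_t - \nabla(\rho c^2 \tilde{e}) &= \nabla(2\rho c \hat{c}\, e) && \text{in } \Omega \times (0, T), \\
\tilde{e}_t - \nabla \cdot \tilde{v} &= 0 && \text{in } \Omega \times (0, T), \\
\tilde{v} = 0, \quad \tilde{e} &= 0 && \text{in } \Omega \times \{t = 0\}, \\
\tilde{e} &= 0 && \text{on } \partial\Omega \times (0, T).
\end{aligned}
\]
and $\tilde{w}, \tilde{d}$ satisfy the incremental adjoint wave propagation equations
\[
\begin{aligned}
-\rho \tilde{w}_t + \nabla(\rho c^2 \tilde{d}) &= -\nabla(2\rho c \hat{c}\, d) - B^* \Gamma_{noise}^{-1} B \tilde{v} && \text{in } \Omega \times (0, T), \\
-\tilde{d}_t + \nabla \cdot \tilde{w} &= 0 && \text{in } \Omega \times (0, T), \\
\tilde{w} = 0, \quad \tilde{d} &= 0 && \text{in } \Omega \times \{t = T\}, \\
\tilde{d} &= 0 && \text{on } \partial\Omega \times (0, T).
\end{aligned}
\]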
Application to synthetic global seismic inversion
invert for anomaly from radially-varying PREM model (left)
observations: from laterally-varying S20RTS model (right)
The prior
2 fields prior samples
ground truth
Prior is defined by square of generalized anisotropic Poisson operatorA := + , with
= (I3 (r)rrT
)with (r) :=
1 r2
(2r r2
)if r ≠ 0
0 if r = 0,
Samples from prior and Gaussianized posterior distributions
1.07 million uncertain acoustic wave speed parameters
0.07 Hz maximum frequency, 3rd order DG elements, 630 million wave propagation unknowns, 2400 time steps (1000 s inversion time window)
up to 100K cores on Jaguar XK6 (single forward solve is 1 minute on 64K cores)
2000x reduction in problem dimension (488 dominant eigenvectors)
Top row: Samples from prior
Bottom row: Samples from the posterior
Right: true earth model (black dots = 5 sources, white dots = 100 receivers)
Comparison of true model (S20RTS, left) with MAP solution (right)
black dots = 3 earthquake sources; white dots = 130 receivers
MCMC for posterior distribution
Solving the full UQ problem:
Repeated Hessian evaluations too expensive for this problem
Use Gaussian approximation at MAP as a proposal for MCMC
Accept/Reject framework corrects for errors in approximation
Sampling performance for a coarser problem (with 78k parameters):
15,587 MCMC samples (each requires 1 forward PDE solve)
4399 samples accepted (28%)
Integrated autocorrelation time of about 16-20, giving an effective sample size of about 800
Total runtime of about 96 hours on 2048 cores
Samples and point marginals
[Figure: samples no. 1 to 4 from the posterior distribution, each shown with the pointwise prior variance and the pointwise posterior variance]
Spectral decay for refined parameter meshes
Spectrum of prior-preconditioned misfit Hessian for global seismic inversion problem
[Figure: eigenvalue (log scale) versus number (0 to 700) for 40,842 parameters, 67,770 parameters, and 431,749 parameters]
largest 700 eigenvalues of prior preconditioned data misfit Hessian for different discretizations
Summary and conclusions
Stochastic Newton MCMC sampling algorithm reduces number of samples versusconventional MCMC by several orders of magnitude, and makes UQ for Bayesianinverse problems tractable
Compactness of local data-misfit Hessian operator provides several orders ofmagnitude effective dimension reduction without introducing bias
Randomized SVD extracts low rank approximation of data-misfit Hessian indimension-independent number of matvecs
Matrix-free Hessian matvecs implemented through consistent first and secondorder adjoints
Adaptive discontinuous Galerkin forward/adjoint wave propagation solver scales to262K cores with small number of elements per core
Scalability of elliptic solve for action of prior operator assured by hybrid GMG-AMG on forest of octrees, with scalability to 262K cores
Stochastic Newton MCMC applied to synthetic inverse problem in 3D global seismology with 1M earth model parameters and 630M forward unknowns, on up to 100K cores, leading to 3 orders of magnitude dimension reduction
References
A. Alexanderian, N. Petra, G. Stadler, and O. Ghattas, A-Optimal design for infinite-dimensional Bayesian linear inverse problems with regularized ℓ0-sparsification, 2013.
N. Petra, J. Martin, G. Stadler, and O. Ghattas, A computational framework for infinite-dimensional Bayesian inverse problems. Part II: Stochastic Newton MCMC with application to ice sheet flow inverse problems, 2013.
T. Bui-Thanh and O. Ghattas, A scalable MAP solver for Bayesian inverse problems with Besov priors, submitted.
T. Bui-Thanh and O. Ghattas, A scaled stochastic Newton algorithm for Markov Chain Monte Carlo simulations, submitted.
H. Sundar, G. Biros, C. Burstedde, J. Rudi, G. Stadler, Parallel geometric-algebraic multigrid on unstructured forests of octrees, submitted, Proceedings of IEEE/ACM SC12, 2012.
T. Bui-Thanh, C. Burstedde, O. Ghattas, J. Martin, G. Stadler, and L.C. Wilcox, Extreme-scale UQ for Bayesian inverse problems governed by PDEs, Proceedings of IEEE/ACM SC12, 2012. (2012 Gordon Bell Prize Finalist)
T. Bui-Thanh and O. Ghattas, Analysis of the Hessian for inverse scattering problems. Part II: Inverse medium scattering of acoustic waves, Inverse Problems, 28(5):055002, 2012.
T. Bui-Thanh and O. Ghattas, Analysis of the Hessian for inverse scattering problems. Part I: Inverse shape scattering of acoustic waves, Inverse Problems, 28(5):055001, 2012.
J. Martin, L.C. Wilcox, C. Burstedde, and O. Ghattas, A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion, SIAM Journal on Scientific Computing, 34(3):A1460-A1487, 2012.
T. Bui-Thanh and O. Ghattas, Analysis of an hp-non-conforming discontinuous Galerkin spectral element method for wave propagation, SIAM Journal on Numerical Analysis, 50(3):1801-1826, 2012.
T. Isaac, C. Burstedde, and O. Ghattas, Low-Cost Parallel Algorithms for 2:1 Octree Balance, Proceedings of IPDPS 12.
H.P. Flath, L.C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, and O. Ghattas, Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM Journal on Scientific Computing, 33(1):407-432, 2011.
C. Burstedde, L.C. Wilcox, and O. Ghattas, p4est: Scalable algorithms for parallel adaptive mesh refinement on forests of octrees, SIAM Journal on Scientific Computing, 33(3):1103-1133, 2011.
T. Bui-Thanh, O. Ghattas, and D. Higdon, Adaptive Hessian-based non-stationary Gaussian process response surface method for probability density approximation with application to Bayesian solution of large-scale inverse problems, SIAM Journal on Scientific Computing, 2011, submitted.
L.C. Wilcox, G. Stadler, C. Burstedde, and O. Ghattas, A high-order discontinuous Galerkin method for wave propagation through coupled elastic-acoustic media, Journal of Computational Physics, 229(24):9373-9396, 2010.
C. Burstedde, O. Ghattas, M. Gurnis, T. Isaac, G. Stadler, T. Warburton, L.C. Wilcox, Extreme-Scale AMR, Proceedings of ACM/IEEE SC10, 2010.
Acknowledgements
Research program supported by:
NSF CMMI-1028889 (CDI), ARC-0941678 (CDI)
AFOSR grant FA9550-12-1-0484 (Computational Math)
DOE grants DE-SC0009286 (MMICCs), DE-SC0006656 (SciDAC), DE-FG02-08ER25860 (ASCR), DE-SC0002710 (SciDAC)
Resources on ORNL Jaguar Cray XT-5/XK-6 supercomputer provided through ALCC award at ORNL Leadership Computing Facility
Resources on TACC Lonestar, Longhorn, and Stampede systems provided through awards from TACC and XSEDE
Discussion questions
1 How can big data and big models be integrated to produce better predictive models?
2 What are promising new ideas for exploring high-dimensional space?
3 What are promising new ideas for quantifying uncertainties in modeling and simulation?
4 How can we adapt/reinvent the important algorithms of CS&E so they better map onto high-throughput accelerators? Onto systems with massive numbers of cores?
5 How can we transform our universities and federal agencies to become more hospitable to cross-cutting research/education at the interfaces of science/engineering, mathematics, statistics, and computing?