9
Signal-based Bayesian Seismic Monitoring David A. Moore Stuart J. Russell University of California, Berkeley [email protected] University of California, Berkeley [email protected] Abstract Detecting weak seismic events from noisy sensors is a difficult perceptual task. We for- mulate this task as Bayesian inference and propose a generative model of seismic events and signals across a network of spatially dis- tributed stations. Our system, SIGVISA, is the first to directly model seismic waveforms, allowing it to incorporate a rich represen- tation of the physics underlying the signal generation process. We use Gaussian pro- cesses over wavelet parameters to predict de- tailed waveform fluctuations based on his- torical events, while degrading smoothly to simple parametric envelopes in regions with no historical seismicity. Evaluating on data from the western US, we recover three times as many events as previous work, and re- duce mean location errors by a factor of four while greatly increasing sensitivity to low- magnitude events. 1 Introduction The world contains structure: objects with dynamics governed by physical law. Intelligent systems must infer this structure from noisy, jumbled, and lossy sensory data. Bayesian statistics provides a natu- ral framework for designing such systems: given a prior distribution p(z) on underlying descriptions of the world, and a forward model p(x|z) describing the process by which observations are generated, math- ematical probability defines the posterior p(z|x) p(z)p(x|z) over worlds given observed data. Bayesian generative models have recently shown exciting results in applications ranging from visual scene understand- Proceedings of the 20 th International Conference on Artifi- cial Intelligence and Statistics (AISTATS) 2017, Fort Laud- erdale, Florida, USA. JMLR: W&CP volume 54. Copy- right 2017 by the author(s). ing (Kulkarni et al., 2015; Eslami et al., 2016) to in- ferring celestial bodies (Regier et al., 2015). In this paper we apply the Bayesian framework directly to a challenging perceptual task: monitoring seismic events from a network of spatially distributed sensors. This task is motivated by the Comprehensive Test Ban Treaty (CTBT), which bans testing of nuclear weapons and provides for the establishment of an International Monitoring System (IMS) to detect nuclear explosions, primarily from the seismic signals that they generate. The inadequacy of existing monitoring systems was cited as a factor in the US Senate’s 1999 decision not to ratify the treaty. Our system, SIGVISA (Signal-based Vertically Inte- grated Seismic Analysis), consists of a generative prob- ability model of seismic events and signals, with in- terpretable latent variables for physically meaningful quantities such as the arrival times and amplitudes of seismic phases. A previous system, NETVISA (Arora et al., 2013), assumed that signals had been prepro- cessed into discrete detections; we extend this by di- rectly modeling seismic waveforms. This allows our model to capture rich physical structure such as path- dependent modulation as well as predictable travel times and attenuations. Inference in our model recov- ers a posterior over event histories directly from wave- form traces, combining top-down with bottom-up pro- cessing to produce a joint interpretation of all observed data. In particular, Bayesian inference provides a principled approach to combining evidence from phase travel times and waveform correlations, a previously unsolved problem in seismology. A full description of our system is given by Moore (2016); this paper describes the core model and key points of training and inference algorithms. Evaluat- ing against existing systems for seismic monitoring, we show that SIGVISA significantly increases event recall at the same precision, detecting many additional low- magnitude events while reducing mean location error by a factor of four. Initial results indicate we also perform as well or better than existing systems at de- tecting events with no nearby historical seismicity.

Signal-based Bayesian Seismic Monitoringrussell/papers/aistats... · 2017-03-17 · Signal-based Bayesian Seismic Monitoring David A. Moore Stuart J. Russell University of California,

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Signal-based Bayesian Seismic Monitoringrussell/papers/aistats... · 2017-03-17 · Signal-based Bayesian Seismic Monitoring David A. Moore Stuart J. Russell University of California,

Signal-based Bayesian Seismic Monitoring

David A. Moore Stuart J. RussellUniversity of California, Berkeley

[email protected]

University of California, [email protected]

Abstract

Detecting weak seismic events from noisysensors is a difficult perceptual task. We for-mulate this task as Bayesian inference andpropose a generative model of seismic eventsand signals across a network of spatially dis-tributed stations. Our system, SIGVISA, isthe first to directly model seismic waveforms,allowing it to incorporate a rich represen-tation of the physics underlying the signalgeneration process. We use Gaussian pro-cesses over wavelet parameters to predict de-tailed waveform fluctuations based on his-torical events, while degrading smoothly tosimple parametric envelopes in regions withno historical seismicity. Evaluating on datafrom the western US, we recover three timesas many events as previous work, and re-duce mean location errors by a factor of fourwhile greatly increasing sensitivity to low-magnitude events.

1 Introduction

The world contains structure: objects with dynamicsgoverned by physical law. Intelligent systems mustinfer this structure from noisy, jumbled, and lossysensory data. Bayesian statistics provides a natu-ral framework for designing such systems: given aprior distribution p(z) on underlying descriptions ofthe world, and a forward model p(x|z) describing theprocess by which observations are generated, math-ematical probability defines the posterior p(z|x) ∝p(z)p(x|z) over worlds given observed data. Bayesiangenerative models have recently shown exciting resultsin applications ranging from visual scene understand-

Proceedings of the 20th International Conference on Artifi-cial Intelligence and Statistics (AISTATS) 2017, Fort Laud-erdale, Florida, USA. JMLR: W&CP volume 54. Copy-right 2017 by the author(s).

ing (Kulkarni et al., 2015; Eslami et al., 2016) to in-ferring celestial bodies (Regier et al., 2015).

In this paper we apply the Bayesian framework directlyto a challenging perceptual task: monitoring seismicevents from a network of spatially distributed sensors.This task is motivated by the Comprehensive Test BanTreaty (CTBT), which bans testing of nuclear weaponsand provides for the establishment of an InternationalMonitoring System (IMS) to detect nuclear explosions,primarily from the seismic signals that they generate.The inadequacy of existing monitoring systems wascited as a factor in the US Senate’s 1999 decision notto ratify the treaty.

Our system, SIGVISA (Signal-based Vertically Inte-grated Seismic Analysis), consists of a generative prob-ability model of seismic events and signals, with in-terpretable latent variables for physically meaningfulquantities such as the arrival times and amplitudes ofseismic phases. A previous system, NETVISA (Aroraet al., 2013), assumed that signals had been prepro-cessed into discrete detections; we extend this by di-rectly modeling seismic waveforms. This allows ourmodel to capture rich physical structure such as path-dependent modulation as well as predictable traveltimes and attenuations. Inference in our model recov-ers a posterior over event histories directly from wave-form traces, combining top-down with bottom-up pro-cessing to produce a joint interpretation of all observeddata. In particular, Bayesian inference provides aprincipled approach to combining evidence from phasetravel times and waveform correlations, a previouslyunsolved problem in seismology.

A full description of our system is given by Moore(2016); this paper describes the core model and keypoints of training and inference algorithms. Evaluat-ing against existing systems for seismic monitoring, weshow that SIGVISA significantly increases event recallat the same precision, detecting many additional low-magnitude events while reducing mean location errorby a factor of four. Initial results indicate we alsoperform as well or better than existing systems at de-tecting events with no nearby historical seismicity.

Page 2: Signal-based Bayesian Seismic Monitoringrussell/papers/aistats... · 2017-03-17 · Signal-based Bayesian Seismic Monitoring David A. Moore Stuart J. Russell University of California,

Signal-based Bayesian Seismic Monitoring

events

detections

waveform signals

stationprocessing

NET-VISA

SIG-VISA

modelinference

modelinference

GA/ SEL3

network processing(classical approach)

Figure 1: High level structure of traditionaldetection-based monitoring (GA), Bayesian monitor-ing (NETVISA), and signal-based Bayesian monitor-ing (SIGVISA, this work). Compared to detection-based approaches, inference in a signal-based modelincorporates rich information from seismic waveforms.

Figure 2: Aligned seismic waveforms (station MDJ,amplitude-normalized and filtered to 0.8-4.5Hz) fromfive events at the North Korean nuclear test site, show-ing strong inter-event correlation. By modeling thisrepeated structure, new events at this site can be de-tected even if only observed by a single station.

2 Seismology background

We model seismic events as point sources, localizedin space and time, that release energy in the form ofseismic waves. These include compression and shear(P and S) waves that travel through the solid earth, aswell as surface waves such as Love and Rayleigh waves.Waves are further categorized into phases according tothe path traversed from the source to a detecting sta-tion; for example, we distinguish P waves propagatingdirectly upwards to the surface (Pg phases) from thesame waves following a path guided along the crust-mantle boundary (Pn phases), among other options.

Given an event’s location, it is possible to predict ar-rival times for each phase by considering the lengthand characteristics of the event–station path. Seis-mologists have developed a number of travel time mod-els, ranging from simple models that condition only onevent depth and event-station distance (Kennett andEngdahl, 1991), to more sophisticated models that usean event’s specific location to provide more accuratepredictions incorporating the local velocity structure(Simmons et al., 2012). Inverting the predictions of atravel-time model allows events to be located by tri-angulation given a set of arrival times.

Seismic stations record continuous ground motionalong one or more axes,1 including background noisefrom natural and human sources, as well as the sig-nals generated by arriving seismic phases. The de-tailed fluctuations in these signals are a function of thesource as well as a path-dependent transfer function,in which seismic energy is modulated and distortedby the geological characteristics of the event–stationpath. Since geology does not change much over time,events with similar locations and depths tend to gener-ate highly correlated waveforms (Figure 2); the length-scale at which such correlations are observed dependson the local geology and may range from hundreds ofmeters up to tens of kilometers.

2.1 Seismic monitoring systems

Traditional architectures for seismic monitoring oper-ate via bottom-up processing (Figure 1). The wave-form at each station is thresholded to produce a featurerepresentation consisting of discrete “detections” ofpossible phase arrivals. Network processing attemptsto associate detections from all stations into a coher-ent set of events. Potential complications include falsedetections caused by noise, and missed detections fromarrivals for which the station processing failed to trig-ger. In addition, the limited information (estimatedarrival time, amplitude, and azimuth) in a single de-tection means that an event typically requires detec-tions from at least three stations to be formed.

The NETVISA system (Arora et al., 2010, 2013) re-places heuristic network processing with Bayesian in-ference in a principled model of seismic events anddetections, including probabilities of false and missingdetections. Maximizing posterior probability in thismodel via hill-climbing search yields an event bulletinthat incorporates all available data accounting for un-certainty. Because NETVISA separates inference fromthe construction of an explicit domain model, domainexperts can improve system performance simply by re-fining the model. Compared to the IMS’s previous net-work processing system (Global Association, or GA),NETVISA provides significant improvements in loca-tion accuracy and a 60% reduction in missed events;as of this writing, it has been proposed by the UN asthe new production monitoring system for the CTBT.

Recently, new monitoring approaches have been pro-posed using the principle of waveform matching, whichexploits correlations between signals from nearbyevents to detect and locate new events by matchingincoming signals against a library of historical signals.These promise the ability to detect events up to anorder of magnitude below the threshold of a detection-

1This work considers vertical motion only.

Page 3: Signal-based Bayesian Seismic Monitoringrussell/papers/aistats... · 2017-03-17 · Signal-based Bayesian Seismic Monitoring David A. Moore Stuart J. Russell University of California,

David A. Moore, Stuart J. Russell

Figure 3: Global prior on seismic event locations.

based system (Gibbons and Ringdal, 2006; Schaff andWaldhauser, 2010), and to locate such events evenfrom a single station (Schaff et al., 2012). However,adoption has been hampered by the inability to detectevents in locations with no historical seismicity, a cru-cial requirement for nuclear monitoring. In addition,it has not been clear how to quantify the reliability ofevents detected by waveform correlation, how to recon-cile correlation evidence from multiple stations, or howto combine correlation and detection-based methods ina principled way. Our work resolves these questions byshowing that both triangulation and waveform match-ing behaviors emerge naturally during inference in aunified generative model of seismic signals.

3 Modeling seismic waveforms

SIGVISA (our current work) extends NETVISA byincorporating waveforms directly in the generativemodel, eliminating the need for bottom-up detectionprocessing. Our model describes a joint distributionp(E,S) = p(S|E)p(E) on nE seismic events E and sig-nals S observed across nS stations. We model eventoccurrence as a time-homogenous Poisson process,

p(nE) = Poisson(λT ), p(E) = p(nE)nE !

nE∏i=1

p(ei),

so that the number of events generated during a timeperiod of length T is itself random, and each event issampled independently from a prior p(ei) over surfacelocation, depth, origin time, and magnitude. The ho-mogeneous process implies a uniform prior on origintimes, with a labeling symmetry that we correct bymultiplying by the permutation count nE !. Locationand depth priors, along with the event rate λ, are esti-mated from historical seismicity as described by Aroraet al. (2013). The location prior is a mixture of a ker-nel density estimate of historical events, and a uniformcomponent to allow explosions and other events in lo-cations with no previous seismicity (Figure 3).

We assume that signals at different stations are condi-tionally independent, given events, and introduce aux-

arrivaltime𝜏

amplitudeα

risetimeρ decayrates𝛾,β

(a) Parametric form g(t; θ) modeling the signal enve-lope of a single arriving phase. (eq. (2))

(b) Signal for an arriving phase, generated by multiply-ing the parametric envelope g (shaded) by a zero-meanmodulation signal m(t;w). (eq. (3))

(c) The final generated signal sj sums the contribu-tions of all arriving phases with an autoregressive back-ground noise process. (eqs. (4) and (5))

Figure 4: Steps of the SIGVISA forward model.

iliary variables θ and w governing signal generation ateach station, so that the forward model has the form

p(S|E) =

nS∏j=1

(∫∫p(sj |θj ,wj)p(θj |E)p(wj |E)dwjdθj

).

(1)The parameters θ describe an envelope shape for eacharriving phase, and w describes repeatable modulationprocesses that multiply these envelopes to generate ob-served waveforms (Figure 4). In particular, we de-compose these into independent (conditioned on eventlocations) components θi,j,k and wi,j,k describing thearrival of phase k of event i at station j.

The envelope of each phase is modeled by a linear onsetfollowed by a poly-exponential decay (Figure 4a),

g(t; θi,j,k) =

0 if t ≤ τα(t− τ)/ρ if τ < t ≤ τ + ρα(t− τ + 1)−γe−β(t−τ) otherwise

(2)with parameters θi,j,k = (τ, ρ, α, γ, β)i,j,k consisting ofan arrival time τ , rise time ρ, amplitude α, and de-cay rates γ and β governing respectively the envelopepeak and its coda, or long-run decay. This decay form

Page 4: Signal-based Bayesian Seismic Monitoringrussell/papers/aistats... · 2017-03-17 · Signal-based Bayesian Seismic Monitoring David A. Moore Stuart J. Russell University of California,

Signal-based Bayesian Seismic Monitoring

(a) Co-located sources. (b) Adding a distant source.

Figure 5: Samples from a GP prior on db4 wavelets.

is inspired by previous work modeling seismic coda(Mayeda et al., 2003), while the linear onset followsthat used by Cua (2005) for seismic early warning.

To produce a signal, the envelope is multiplied by amodulation process m (Figure 4b), parameterized bywavelet coefficients wi,j,k so that

m(t; wi,j,k) =

{(Dwi,j,k)(t) if 0 ≤ t < 20sε(t) otherwise

(3)

where D is a discrete wavelet transform matrix, andε(t) ∼ N (0, 1) is a Gaussian white noise process.We explicitly represent coefficients describing the first20 seconds2 of each arrival, and model the modula-tion as random after that point. We use an order-4Daubechies wavelet basis (Daubechies, 1992), so thatfor 10Hz signals each wi,j,k is a vector of 220 coeffi-cients. As described below, we model w jointly acrossevents using a Gaussian process, so that our modula-tion processes are repeatable: events in nearby loca-tions will generate correlated signals (Figure 5).

Summing the signals from all arriving phases yieldsthe predicted signal sj ,

sj(t) =∑i,k

g(t; θi,j,k) ·m(t− τi,j,k; wi,j,k); (4)

and we generate the observed signal sj (Figure 4c) byadding an order-R autoregressive noise process,

p(sj |θj ,wj) = pAR(sj − sj − µj ;σ2j , φj) (5)

pAR(z;σ2, φ) =

T∏t=1

N

(z(t);

R∑r=1

φrz(t− r), σ2

).

We adapt the noise process online during inference, fol-lowing station-specific priors on the mean p(µj), vari-ance p(σ2

j ), and autoregressive coefficients p(φj).

3.1 Repeatable signal descriptions

For each station j and phase k, we model the signal de-scriptions p(θj,k|E) and p(wj,k|E) jointly across events

2This cutoff was chosen to capture repeatability of theinitial arrival period, which is typically the most clearlyobserved, while still fitting historical models in memory.

Parameter Features φ(e)Arrival time (τ) n/aAmplitude (log α)

(1,∆, sin( ∆

15000 ), cos( ∆15000 )

)Onset (log ρ) (1, mb)Peak decay (log γ) (1, mb, ∆)Coda decay (log β) (1, mb, ∆)Wavelet coefs (w) n/a

Table 1: Feature representations in terms of magni-tude mb and event–station distance ∆ (km).

as Gaussian process transformations of the event spaceE (Rasmussen and Williams, 2006), so that events innearby locations will tend to generate similar envelopeshapes and correlated modulation signals, allowing oursystem to detect and locate repeated events even froma weak signal recorded at a single station. Since theevents are unobserved, this is effectively a GP latentvariable model (Lawrence, 2004), with the twist thatin our model the outputs θ,w are themselves latentvariables, observed only indirectly through the signalmodel p(S|θ,w).

We borrow models from geophysics to predict the ar-rival time and amplitude of each phase: given theorigin time, depth, and event-station distance, theIASPEI-91 (Kennett and Engdahl, 1991) travel-timemodel predicts an arrival time τ , and the Brune sourcemodel (Brune, 1970) predicts a source (log) amplitudelog α as a function of event magnitude. We then useGPs to model deviations from these physics-based pre-dictions; that is, we take τ and log α as the mean func-tions for GPs modeling τ and logα respectively. Theremaining shape parameters ρ, γ, β (in log space) andthe 220 wavelet coefficients in wi,j,k are each modeledby independent zero-mean GPs.

All of our GPs share a common covariance form,

k(e, e′) = φ(e)TBφ(e′) + σ2fkMatern(d`(e, e

′)) + σ2nδ,

where the first term models an unknown function thatis linear in some feature representation φ, chosen sep-arately for each parameter (Table 1). This representsgeneral regularities such as distance decay that we ex-pect to hold even in regions with no observed train-ing data. For efficiency and to avoid degenerate co-variances we represent this component in weight space(Rasmussen and Williams, 2006, section 2.7); at testtime we choose B to be the posterior covariance giventraining data. The second term models an unknowndifferentiable function using a stationary Matern (ν =3/2) kernel (Rasmussen and Williams, 2006, Chap-ter 4), with great-circle distance metric controlled bylengthscale hyperparameters `; this allows our modelto represent detailed local seismic structure. We alsoinclude iid noise σ2

nδ to encode unmodeled variation

Page 5: Signal-based Bayesian Seismic Monitoringrussell/papers/aistats... · 2017-03-17 · Signal-based Bayesian Seismic Monitoring David A. Moore Stuart J. Russell University of California,

David A. Moore, Stuart J. Russell

between events in the same location.

To enable efficient training and test-time GP predic-tions, we partition the training data into spatially lo-cal regions using k-means clustering, and factor thenonparametric (Matern) component into independentregional models. This allows predictions in each re-gion to be made efficiently using only a small numberof nearby events, avoiding a naıve O(n2) dependenceon the entire training set. Each region is given sep-arate hyperparameters, allowing our models to adaptto spatially varying seismicity.

3.2 Collapsed signal model

The explicit model described thus far exhibits tightcoupling between envelope parameters and modula-tion coefficients; a small shift in a phase’s arrival timemay significantly change the wavelets needed to ex-plain an observed signal, causing inference moves thatdo not account for this joint structure to fail. Fortu-nately, it is possible to exactly marginalize out thecoefficients w so that they do not need to be rep-resented in inference. This follows from the linearGaussian structure of our signal model: GPs inducea Gaussian distribution on wavelet coefficients, whichare observed under linear projection (a wavelet trans-form followed by envelope scaling) with autoregressivebackground noise. Thus, the collapsed distributionp(sj |θj ,E) =

∫p(sj |θj ,wj)p(wj |E)dwj is multivariate

Gaussian, and in principle can be evaluated directly.

Doing this efficiently in practice requires exploitinggraphical model structure. Specifically, we formulatethe signal model at each station as a linear Gaussianstate space model, with a state vector that tracks theAR noise process as well as the set of wavelet coef-ficients that actively contribute to the signal. Dueto the recursive structure of wavelet bases, the num-ber of such coefficients is only logarithmic at eachtimestep. We exploit this structure, the same used byfast wavelet transforms, to efficiently3 compute coef-ficient posteriors and marginal likelihoods by Kalmanfiltering (Grewal and Andrews, 2014).

We assume for efficiency that test-time signals fromdifferent events are independent given the trainingdata. During training, however, it is necessary tocompute the joint density of signals involving multipleevents so that we can find correct alignments. This re-quires us to pass messages fj(wj) = p(sj |wj , θj) fromeach signal upwards to the GP prior (Koller and Fried-

3Requiring time linear in the signal length:O(T (K logC + R)2) for a signal of length T withorder-R AR noise and at most K simultaneous phasearrivals, each described by C wavelet coefficients.

man, 2009). We compute a diagonal approximation

fj(wj) =1

Zj

220∏c=1

N (wj,c; νj,c, ξj,c)

by dividing the Kalman filtering posterior on waveletcoefficients by the (diagonal Gaussian) prior. Theproduct of these messages with the GP priorsp(wc|E) ∼ N (µc(E),Kc(E)), integrated over coeffi-cients w, gives an approximate joint density

p(S|θ,E) ≈

N∏j=1

1

Zj

∏c

N(νc;µc(E),Kc(E) + ξc

),

(6)in which the wavelet GPs are evaluated at the valuesof (and with added variance given by) the approxi-mate messages from observed signals. We target thisapproximate joint density during training, and con-dition our test-time models on the upwards messagesgenerated by training signals.

4 Training

We train using a bulletin of historical event locationswhich we take as ground truth (relaxing this assump-tion to incorporate noisy training locations, or evenfully unsupervised training, is important future work).Given observed events, we use training signals to esti-mate the GP models—hyperparameters as well as theupwards messages from training signals—along withpriors p(µj), p(σ

2j ), and p(φj) on background noise

processes at each station. We use the EM algorithm(Dempster et al., 1977) to search for maximum likeli-hood parameters integrating over the unobserved noiseprocesses and envelope shapes.

The E step runs MCMC inference (Section 5) tosample from the posterior over the latent variablesθ, µ, σ2, φ under the collapsed objective describedabove. We approximate the sampled posterior on θby univariate Gaussians to compute approximate up-wards messages. These are used in the M step to fitGP hyperparameters via gradient-based optimizationof the marginal likelihood (6). The priors on back-ground noise means p(µj) and coefficients p(φj) are fitas Gaussian and multivariate Gaussian respectively.For the noise variance σ2

j at each station, we fit log-normal, inverse Gamma, and truncated Gaussian pri-ors and select the most likely; this adapts for differentnoise distributions between stations.

5 Inference

We perform inference using reversible jump MCMC(Hastie and Green, 2012) applied to the collapsed

Page 6: Signal-based Bayesian Seismic Monitoringrussell/papers/aistats... · 2017-03-17 · Signal-based Bayesian Seismic Monitoring David A. Moore Stuart J. Russell University of California,

Signal-based Bayesian Seismic Monitoring

model. Our algorithm consists of a cyclic sweep ofsingle-site, random-walk Metropolis-Hastings movesover all currently instantiated envelope parameters θ,autoregressive noise parameters µ, σ2, φ at each sta-tion, and event descriptions (ei)

nEi=1 including surface

location, depth, time, and magnitude. We also includecustom moves that propose swapping the associationsof consecutive arrivals, aligning observed signals withGP predicted signals, and shifting envelope peak timesto match those of the observed signal.

To improve event mixing, we augment our model to in-clude unassociated arrivals: phase arrivals not gener-ated by any particular event, with envelope parametersand modulation from a fixed Gaussian prior. Unasso-ciated arrivals are useful in allowing events to be builtand destroyed piecewise, so that we are not requiredto perfectly propose an event and all of its phases in asingle shot. They can also be viewed as small eventswhose locations have been integrated out. Our birthproposal generates unassociated arrivals with proba-bility proportional to the signal envelope, so that pe-riods of high signal energy are quickly explained byunassociated arrivals which may then be associatedinto larger events.4

Event birth moves are constructed using two comple-mentary proposals. The first is based on a Houghtransform of unassociated arrivals; it grids the 5Devent space (longitude, latitude, depth, time, magni-tude) and scores each bin using the log likelihood ofarrivals greedily associated with an event in that bin.The second proposal is a mixture of Gaussians cen-tered at the training events, with weights determinedby waveform correlations against test signals. Thisallows us to recover weak events that correlate withtraining signals, while the Hough proposal can con-struct events in regions with no previous seismicity.

In proposing a new event we must also propose en-velope parameters for all of its phases. Each phasemay associate a currently unassociated arrival; wherethere are no plausible arrivals to associate we parent-sample envelope parameters given the proposed event,then run auxiliary Metropolis-Hastings steps (Storvik,2011) to adapt the envelopes to observed signals. Theproposed event, associations, and envelope parametersare jointly accepted or rejected by a final MH step.Event death moves similarly involve jointly proposingan event to kill along with a set of phase arrivals todelete (with the remainder preserved as unassociated).

4In this sense the unassociated arrivals play a role sim-ilar to detections produced by traditional station process-ing. However, they are not generated by bottom-up pre-processing, but as part of a dynamic inference procedurethat may create and destroy them using information fromobserved signals as well as top-down event hypotheses.

Figure 6: Training events (blue dots) from the westernUS dataset, with region of interest outlined. Trianglesindicate IMS stations.

Figure 7: Precision-recall performance over the two-week test period, relative to the reference bulletin.

By chaining birth and death moves we also constructmode-jumping moves that repropose an existing event,along with split and merge moves that replace two ex-isting events with a single one, and vice versa. Moore(2016) describes our inference moves in more detail.

6 Evaluation

We consider the task of monitoring seismic events inthe western United States, which contains both sig-nificant natural seismicity and regular mining explo-sions. We focus in particular on the time period im-mediately following the magnitude 6.0 earthquake nearWells, NV, on February 21, 2008, which generated alarge number of aftershocks. We train on one year ofhistorical data (Figure 6), from January to December2008; to enable SIGVISA to recognize aftershocks us-ing waveform correlation, we also train on the first sixhours following the Wells mainshock. The test periodis two weeks long, beginning twelve hours after theWells mainshock; the six hours immediately preceding

Page 7: Signal-based Bayesian Seismic Monitoringrussell/papers/aistats... · 2017-03-17 · Signal-based Bayesian Seismic Monitoring David A. Moore Stuart J. Russell University of California,

David A. Moore, Stuart J. Russell

Figure 8: Event recall by magnitude range. TheSIGVISA (top) bulletin is defined to match the preci-sion of SEL3 (51%).

were used as a validation set.

We compare SIGVISA’s performance to that of ex-isting systems that also process data from the Inter-national Monitoring System. SEL3 is the final-stageautomated bulletin from the CTBTO’s existing sys-tem (GA); it is reviewed by a team of human analyststo produce the Late Event Bulletin (LEB) reportedto the member states. We also compare to the auto-mated NETVISA bulletin(Arora et al., 2013), whichimplements detection-based Bayesian monitoring.

We construct a reference bulletin by combining eventsfrom regional networks aggregated by the NationalEarthquake Information Center (NEIC), and an anal-ysis of aftershocks from the Wells earthquake based ondata from the transportable US Array and temporaryinstruments deployed by the University of Nevada,Reno (UNR) (Smith et al., 2011). Because the refer-ence bulletin has access to many sensors not includedin the IMS, it is a plausible source of “ground truth”to evaluate the IMS-based systems.

We trained two sets of SIGVISA models, on broad-band (0.8-4.5Hz) signals as well as a higher-frequencyband (2.0-4.5Hz) intended to provide clearer evidenceof regional events, using events observed from the ref-erence bulletin. To produce a test bulletin, we ranthree MCMC chains on broadband signals and two onhigh-frequency signals, and merged the results usinga greedy heuristic that iteratively selects the highest-scoring event from any chain excluding duplicates.Each individual chain was parallelized by dividing thetest period into 168 two-hour blocks and running in-dependent inference on each block of signals. OverallSIGVISA inference used 840 cores for 48 hours.

We evaluate each system by computing a minimumweight maximum cardinality matching between the in-

ferred and reference bulletins, in a bipartite graph withedges weighted by distance and restricted to eventsseparated by at most 2◦ in distance and 50s in time.Using this matching, we report precision (the percent-age of inferred events that are real), recall (the per-centage of real events detected by each system), andmean location error of matched events. For NETVISAand SIGVISA, which attach a confidence score to eachevent, we report a precision-recall curve parameterizedby the confidence threshold (Figure 7).

Our results show that the merged SIGVISA bulletindominates both NETVISA and SEL3. When operat-ing at the same precision as SEL3 (51%), SIGVISAachieves recall three times higher than SEL3 (19.3%vs 6.4%), also eclipsing the 7.3% recall achieved byNETVISA at a slightly higher precision (54.7%). Thehuman-reviewed LEB achieves near-perfect precisionbut only 9% recall, confirming that many events re-covered by SIGVISA are not obvious to human ana-lysts. The most sensitive SIGVISA bulletin recovers afull 33% of the reference events, at the cost of manymore false events (14% precision).

Signal-based modeling particularly improves recall forlow-magnitude events (Figure 8) for which bottom-upprocessing may not register detections. We also ob-serve improved locations (Figure 9) for clusters suchas the Wells aftershock sequence where observed wave-forms can be matched against training data.

Interestingly, because we do not have access to abso-lute ground truth, some events labeled as false in ourevaluation may actually be genuine events missed bythe reference bulletin. Figure 10 shows two candidatesfor such events, with strong correspondence betweenthe model-predicted and observed waveforms. The ex-istence of such events provides reason to believe thatSIGVISA’s true performance on this dataset is mod-estly higher than our evaluation suggests.

6.1 de novo events

For nuclear monitoring it is particularly important todetect de novo events: those with no nearby historicalseismicity. We define “nearby” as within 50km. Ourtwo-week test period includes only three such events,so we broaden the scope to the three-month periodof January through March 31, 2008, which includes24 de novo events. We evaluated each system’s recallspecifically on this set: of the 24 de novo referenceevents, how many were detected?

As shown in Figure 11, SIGVISA’s performancematches or exceeds the other systems. Operatingat the same precision as SEL3, it detects the samenumber (6/24) of de novo events. This suggeststhat SIGVISA’s improved performance on repeated

Page 8: Signal-based Bayesian Seismic Monitoringrussell/papers/aistats... · 2017-03-17 · Signal-based Bayesian Seismic Monitoring David A. Moore Stuart J. Russell University of California,

Signal-based Bayesian Seismic Monitoring

(a) NETVISA (139 events) (b) SIGVISA top events (393 events) (c) Distribution of location errors.

Figure 9: Inferred events (green), with inset close-up of Wells aftershocks. Reference events are in blue.

(a) Likely mining explosion at Black Thunder Mine.Location 105.21◦ W, 43.75◦ N, depth 1.9km, origintime 17:15:58 UTC, 2008-02-27, mb 2.6, recorded atPDAR (PD31).

(b) Event near Cloverdale, CA along the Rodgers Creekfault. Location 122.79◦ W, 38.80◦ N, depth 1.6km,origin time 05:20:56 UTC, 2008-02-29, mb 2.6, recordedat NVAR (NV01).

Figure 10: Waveform evidence for two events detectedby SIGVISA but not the reference bulletin. Green in-dicates the model predicted signal (shaded ±2σ); blackis the observed signal (filtered 0.8-4.5Hz).

events—including almost all of the natural seismicityin the western US during our two-week test period—does not come at a cost for de novo events. To thecontrary, the full high-sensitivity SIGVISA bulletinincludes six genuine events missed by all other IMS-based systems.

7 Discussion

Our results demonstrate the promise of Bayesian infer-ence on raw waveforms for monitoring seismic events.Applying MCMC inference to a generative probabilitymodel of repeatable seismic signals, we recover up tothree times as many events as a detection-based base-line (SEL3) while operating at the same precision, andreduce mean location errors by a factor of four whilegreatly increasing sensitivity to low-magnitude events.Our system maintains effective performance even forevents in regions with no historical seismicity in thetraining set.

Figure 11: Recall for 24 de novo events between Jan-uary and March 2008.

A major advantage of the generative formulation isthat the explicit model is interpretable by domain ex-perts. We continue to engage with seismologists onpotential model improvements, including tomographictravel-time models, directional information from seis-mic arrays and horizontal ground motion, and explicitmodeling of earthquake versus explosion sources. Ad-ditional directions include more precise investigationsof our model’s ability to quantify uncertainty and toestimate its own detection limits as a function of net-work coverage and historical seismicity. We also ex-pect to continue scaling to global seismic data, ex-ploiting parallelism and refining our inference movesand implementation. More generally, we hope thatsuccessful application of complex Bayesian models willinspire advances in probabilistic programming systemsto make generative modeling accessible to a wider sci-entific audience.

Acknowledgements

We are very grateful to Steve Myers (LLNL) and KevinMayeda (Berkeley/AFTAC) for sharing their expertiseon seismic modeling and monitoring and advising onthe experimental setup. This work is supported by theDefense Threat Research Agency (DTRA) under grant#HDTRA-1111-0026, and experiments by an Azurefor Research grant from Microsoft.

Page 9: Signal-based Bayesian Seismic Monitoringrussell/papers/aistats... · 2017-03-17 · Signal-based Bayesian Seismic Monitoring David A. Moore Stuart J. Russell University of California,

David A. Moore, Stuart J. Russell

References

Arora, N., Russell, S., and Sudderth, E. (2013). Net-visa: Network Processing Vertically Integrated Seis-mic Analysis. Bulletin of the Seismological Societyof America, 103(2A):709–729.

Arora, N., Russell, S. J., Kidwell, P., and Sudderth,E. B. (2010). Global seismic monitoring as proba-bilistic inference. In Advances in Neural InformationProcessing Systems, pages 73–81.

Brune, J. N. (1970). Tectonic stress and the spectraof seismic shear waves from earthquakes. Journal ofGeophysical Research, 75(26):4997–5009.

Cua, G. B. (2005). Creating the Virtual Seismologist:Developments in Ground Motion Characterizationand Seismic Early Warning. PhD thesis, CaliforniaInstitute of Technology.

Daubechies, I. (1992). Ten lectures on wavelets. InRegional conference Series in Applied Mathematics(SIAM), Philadelphia.

Dempster, A. P., Laird, N. M., and Rubin, D. B.(1977). Maximum likelihood from incomplete datavia the EM algorithm. Journal of the royal statisti-cal society. Series B (methodological), pages 1–38.

Eslami, S., Heess, N., Weber, T., Tassa, Y.,Kavukcuoglu, K., and Hinton, G. E. (2016). Attend,infer, repeat: Fast scene understanding with gener-ative models. arXiv preprint arXiv:1603.08575.

Gibbons, S. J. and Ringdal, F. (2006). The detectionof low magnitude seismic events using array-basedwaveform correlation. Geophysical Journal Interna-tional, 165(1):149–166.

Grewal, M. S. and Andrews, A. P. (2014). Kalman Fil-tering: Theory and Practice with MATLAB. JohnWiley & Sons.

Hastie, D. I. and Green, P. J. (2012). Model choiceusing reversible jump Markov chain Monte Carlo.Statistica Neerlandica, 66(3):309–338.

Kennett, B. and Engdahl, E. (1991). Traveltimes forglobal earthquake location and phase identification.Geophysical Journal International, 105(2):429–465.

Koller, D. and Friedman, N. (2009). ProbabilisticGraphical Models: Principles and Techniques. MITPress.

Kulkarni, T. D., Kohli, P., Tenenbaum, J. B., andMansinghka, V. (2015). Picture: A probabilisticprogramming language for scene perception. In Pro-ceedings of the IEEE Conference on Computer Vi-sion and Pattern Recognition (CVPR), pages 4390–4399.

Lawrence, N. D. (2004). Gaussian process latent vari-able models for visualisation of high dimensional

data. In Advances in Neural Information ProcessingSystems (NIPS).

Mayeda, K., Hofstetter, A., O’Boyle, J. L., and Wal-ter, W. R. (2003). Stable and transportable re-gional magnitudes based on coda-derived moment-rate spectra. Bulletin of the Seismological Society ofAmerica, 93(1):224–239.

Moore, D. A. (2016). Signal-based Bayesian SeismicMonitoring. PhD thesis, University of California,Berkeley.

Rasmussen, C. and Williams, C. (2006). Gaussian Pro-cesses for Machine Learning. MIT Press.

Regier, J., Miller, A., McAuliffe, J., Adams, R., Hoff-man, M., Lang, D., Schlegel, D., and Prabhat, M.(2015). Celeste: Variational inference for a genera-tive model of astronomical images. In Proceedingsof the 32nd International Conference on MachineLearning.

Schaff, D. P., Kim, W.-Y., and Richards, P. G. (2012).Seismological constraints on proposed low-yield nu-clear testing in particular regions and time peri-ods in the past. Science and Global Security, 20(2-3):155–171.

Schaff, D. P. and Waldhauser, F. (2010). One magni-tude unit reduction in detection threshold by crosscorrelation applied to Parkfield (California) andChina seismicity. Bulletin of the Seismological Soci-ety of America, 100(6):3224–3238.

Simmons, N. A., Myers, S. C., Johannesson, G., andMatzel, E. (2012). Llnl-g3dv3: Global P wave to-mography model for improved regional and teleseis-mic travel time prediction. Journal of GeophysicalResearch: Solid Earth, 117(B10).

Smith, K., Pechmann, J., Meremonte, M., andPankow, K. (2011). Preliminary analysis of themw 6.0 wells, nevada, earthquake sequence. NevadaBureau of Mines and Geology Special Publication,36:127–145.

Storvik, G. (2011). On the flexibility of Metropolis–Hastings acceptance probabilities in auxiliary vari-able proposal generation. Scandinavian Journal ofStatistics, 38(2):342–358.