Computationally Efficient Earthquake Detection in Continuous Waveform Data
Clara Yoon¹, Ossian O’Reilly¹, Karianne Bergen², Gregory Beroza¹
¹Department of Geophysics, ²Institute for Computational & Mathematical Engineering, Stanford University
1. Introduction
2. FAST Method: Single Channel
3. Network Detection
4. FAST Detection Results
5. Summary and Future Work
[Figure 1 chart: STA/LTA, template matching, autocorrelation, and the new approach (FAST) placed against three axes: detection sensitivity, general applicability, computational efficiency]
Figure 1: Comparison of earthquake detection methods in terms of three qualitative metrics: (1) detection sensitivity, (2) general applicability, and (3) computational efficiency. FAST scores high on all three metrics, while each of the other detection methods scores high on only two of the three.
New approach to earthquake detection:
• Apply “big data” methods to observational seismology
• Adapt an efficient search algorithm for finding similar audio clips [1] to detect similar waveforms in continuous seismic data
New earthquake detection algorithm: Fingerprint and Similarity Thresholding (FAST)
1) Sensitive: waveform correlation over a network of stations
2) General: finds seismic signals from unknown sources
3) Efficient: fast and scalable to years of continuous data
Potential applications: find unknown seismic events
• Reduce catalog completeness magnitudes
• Identify small repeating earthquakes
• Find low-SNR, non-impulsive events
• Monitor during seismically active periods
  - Foreshocks, aftershocks, swarms
• Find events in sparse seismic networks
  - Induced seismicity
Motivation:
• Improve earthquake detection and monitoring
• Find more low-magnitude events in very large continuous data sets
[Graphic: existing uses of similarity search: identify songs, find duplicate web pages, search for copyright content]
A. Feature Extraction
[Figure 2 panels: continuous time series seismic data (amplitude vs. time, s) → spectrogram (frequency, Hz, vs. time, s; color: log10(|Spectrogram|)) → spectral images (short spectrogram windows) → Haar wavelet transform (for fast data compression) → top deviation wavelet coefficients (extract only key discriminative features) → fingerprints (sparse, binary)]
Figure 2: Feature extraction computes binary fingerprints, which are compact proxies for the original waveforms.
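The Figure 2 pipeline can be sketched in NumPy as below. This is an illustration under stated assumptions, not the FAST implementation: the STFT parameters (nperseg, hop), the 32 × 32 spectral-image size, the lag between images, and top_k are hypothetical. We also assume that "top deviation" means the largest deviation of each Haar coefficient from its median over all spectral images, scaled by the median absolute deviation (MAD), and that each kept coefficient's sign is encoded in two bits, in the style of Waveprint [1].

```python
import numpy as np


def spectrogram(x, nperseg=64, hop=8):
    """Magnitude STFT (frequency x time) with a Hann window (parameters are assumptions)."""
    window = np.hanning(nperseg)
    starts = range(0, len(x) - nperseg + 1, hop)
    frames = np.array([x[s:s + nperseg] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def spectral_images(spec, n_freq=32, width=32, lag=4):
    """Overlapping short spectrogram windows, cropped to n_freq rows so each is square."""
    spec = spec[:n_freq]
    return np.array([spec[:, t:t + width]
                     for t in range(0, spec.shape[1] - width + 1, lag)])


def haar_2d(image):
    """Pyramid (non-standard) 2D orthonormal Haar decomposition of a square
    image whose side is a power of two."""
    out = image.astype(float).copy()
    n = out.shape[0]
    while n > 1:
        block = out[:n, :n]
        # Rows: pairwise sums (averages) followed by pairwise differences.
        block = np.hstack([block[:, 0::2] + block[:, 1::2],
                           block[:, 0::2] - block[:, 1::2]]) / np.sqrt(2)
        # Columns: the same operation, then recurse on the top-left quarter.
        out[:n, :n] = np.vstack([block[0::2] + block[1::2],
                                 block[0::2] - block[1::2]]) / np.sqrt(2)
        n //= 2
    return out


def fingerprints(images, top_k=100):
    """Binary fingerprints: keep the top_k Haar coefficients with the largest
    standardized deviation (assumed median/MAD scaling) and encode each sign in two bits."""
    coeffs = np.array([haar_2d(img) for img in images])
    median = np.median(coeffs, axis=0)
    mad = np.median(np.abs(coeffs - median), axis=0)
    z = ((coeffs - median) / np.where(mad > 0, mad, 1.0)).reshape(len(images), -1)
    fps = np.zeros((len(images), 2 * z.shape[1]), dtype=bool)
    for row, zi in enumerate(z):
        top = np.argsort(np.abs(zi))[-top_k:]
        fps[row, 2 * top] = zi[top] > 0       # bits "10": positive deviation
        fps[row, 2 * top + 1] = zi[top] < 0   # bits "01": negative deviation
    return fps


rng = np.random.default_rng(0)
x = rng.standard_normal(2000)  # stand-in for filtered, decimated seismic data
fps = fingerprints(spectral_images(spectrogram(x)), top_k=100)
print(fps.shape)  # (53, 2048)
```

Each fingerprint here is sparse by construction: only top_k of its 2048 bits are set, which is what makes the set-based similarity measures in the next step cheap to compare.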
B. Database Generation
[Figure 3 panels: (Left) Similar waveform pair, windows starting at 1266.95 s and 1629 s (amplitude vs. time, s); waveforms compared by correlation coefficient, CC = 0.9808. (Right) Similar fingerprint pair (fingerprint x index vs. fingerprint y index; cells where both fingerprints are 0, one is 1, or both are 1); fingerprints compared by Jaccard similarity, J = 0.7544.]
Figure 3: (Left) The correlation coefficient CC measures how similar two waveforms a and b are. (Right) The Jaccard similarity J(A,B) measures how similar two binary fingerprints A and B are.
• Locality-sensitive hashing (LSH) functions [2] place highly similar fingerprints in the same group of the database with high probability
• Min-Hash [3] reduces each fingerprint to a short integer array, the Min-Hash signature (MHS), while preserving Jaccard similarity probabilistically: the probability that two signatures agree at a given position equals the Jaccard similarity of the two fingerprints
Figure 4 panels: h(A) = 155, 64, 231, 35, 110, 21 and h(B) = 155, 64, 207, 35, 110, 21, split into one MHS subset per hash table:
Hash table | Subset of h(A) | Subset of h(B) | MHS subset match?
Table 1 | 155, 64 | 155, 64 | Yes
Table 2 | 231, 35 | 207, 35 | No
Table 3 | 110, 21 | 110, 21 | Yes
Figure 4: LSH example showing how two similar fingerprints A and B are grouped in the database. The MHS has 6 integers. Each hash table (red box) gets a different MHS subset; A and B enter the same group (oval) in Tables 1 and 3, where their MHS subsets match. In Table 2, however, the MHS subsets differ, so A and B enter different groups.
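The two similarity measures in Figure 3 and the Min-Hash grouping in Figure 4 can be sketched in pure Python. This is a minimal illustration, assuming fingerprints are stored as sets of the indices of their 1 bits and that Min-Hash uses explicit random permutations of the bit positions (practical implementations usually substitute cheap hash functions). All function names are hypothetical.

```python
import random


def jaccard(a, b):
    """Jaccard similarity J(A, B) = |A ∩ B| / |A ∪ B| of two fingerprints,
    each stored as the set of indices of its 1 bits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def correlation_coefficient(x, y):
    """Pearson correlation coefficient CC of two equal-length waveforms."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return cov / (vx * vy) ** 0.5


def make_minhash(n_bits, n_hashes, seed=0):
    """Return a function that maps a (non-empty) fingerprint to its MHS.

    Entry k is the smallest position, under random permutation k of the bit
    positions, of any 1 bit; two fingerprints agree there with probability J."""
    rng = random.Random(seed)
    perms = [rng.sample(range(n_bits), n_bits) for _ in range(n_hashes)]

    def signature(fp):
        return [min(perm[i] for i in fp) for perm in perms]

    return signature


def mhs_subsets(sig, per_table):
    """Split an MHS into consecutive subsets, one subset per hash table."""
    return [tuple(sig[i:i + per_table]) for i in range(0, len(sig), per_table)]


# Figure 4: 6-integer signatures, 3 hash tables with 2 integers each.
h_a = [155, 64, 231, 35, 110, 21]
h_b = [155, 64, 207, 35, 110, 21]
same_group = [sa == sb for sa, sb in zip(mhs_subsets(h_a, 2), mhs_subsets(h_b, 2))]
print(same_group)  # [True, False, True]: same group in Tables 1 and 3 only
```

As a check on the Min-Hash property, two fingerprints with 1 bits at positions 0-599 and 200-799 have J = 400/800 = 0.5, and the fraction of matching entries in 500-integer signatures should land close to 0.5.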
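C. Similarity Search
[Figure 5 panels: (A) LSH database with 3 hash tables; (B) similarity search for a query waveform, with (query, database) pairs such as (A, B) placed on a FAST similarity scale of 0, 1/3, 2/3, 1]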
Figure 5: (A) Example database generated by LSH, with 3 hash tables (red boxes). Each table has many groups (ovals); similar earthquake waveforms (colored) are likely to be in the same group, while noise waveforms (black) are in other groups. (B) Search for waveforms in the database that are similar to a query waveform (blue). First, LSH determines the group in each table to which the query waveform belongs. Then we collect all other database waveforms in these groups, form (query, database) waveform pairs, and compute their FAST similarity. All other groups in the database are ignored, so the query time is nearly constant and scales to large data sets.
FAST similarity: fraction of hash tables in which the fingerprint pair falls in the same group
Note: We show waveforms for easy visualization, but the groups actually store references to fingerprints. The “groups” are technically hash buckets.
Figure 6: Similarity search output for a single channel of data, as a sparse similarity matrix: every waveform in the data is used as a search query, with near-linear runtime. Each square represents a pair of fingerprints at two different times. Black squares indicate high FAST similarity, where highly similar waveforms are found.
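The search in Figures 5 and 6 can be sketched in pure Python, assuming Min-Hash signatures have already been computed for every fingerprint (as in Figure 4). Each hash table is a dict from an MHS subset to a bucket of fingerprint indices. Instead of issuing one query per fingerprint, the sketch enumerates the pairs inside each bucket, which counts exactly the collisions a query-by-query search would find. The function names and the threshold parameter are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_tables(signatures, per_table):
    """One hash table per MHS subset: subset (tuple) -> bucket of fingerprint indices."""
    n_tables = len(signatures[0]) // per_table
    tables = [defaultdict(list) for _ in range(n_tables)]
    for idx, sig in enumerate(signatures):
        for t in range(n_tables):
            key = tuple(sig[t * per_table:(t + 1) * per_table])
            tables[t][key].append(idx)  # indices are appended in increasing order
    return tables


def similarity_matrix(tables, threshold):
    """Sparse FAST similarity: {(i, j): fraction of tables where i and j share a bucket}.

    Pairs that never share a bucket are never visited and are implicitly 0;
    only pairs at or above `threshold` are kept."""
    counts = defaultdict(int)
    for table in tables:
        for bucket in table.values():
            # Every fingerprint in a bucket is a candidate match for every other one;
            # buckets are sorted, so each pair is keyed consistently as (i, j), i < j.
            for i, j in combinations(bucket, 2):
                counts[(i, j)] += 1
    n_tables = len(tables)
    return {pair: c / n_tables for pair, c in counts.items() if c / n_tables >= threshold}


sigs = [[155, 64, 231, 35, 110, 21],   # h(A) from Figure 4
        [155, 64, 207, 35, 110, 21],   # h(B) from Figure 4
        [1, 2, 3, 4, 5, 6]]            # hypothetical unrelated fingerprint
print(similarity_matrix(build_tables(sigs, per_table=2), threshold=0.5))
# {(0, 1): 0.6666666666666666}
```

A and B share a bucket in 2 of the 3 tables, so their FAST similarity is 2/3. Runtime stays near-linear only while buckets stay small; a very large bucket makes the within-bucket pair enumeration quadratic in the bucket size.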
Example: One Pair of Waveforms (Time 1 and Time 2)
Station | FAST Similarity
CCOB.EHZ | 0.31
CADB.EHZ | 0.03
CAO.EHZ | 0.02
CHR.EHZ | 0.40
CML.EHZ | 0.12
Network Similarity = 0.31 + 0.03 + 0.02 + 0.40 + 0.12 = 0.88
Why detect over a distributed network of stations?
• Detect more low-magnitude events, as shown by template matching studies [4]
• Fewer false positive detections: a coherent signal at multiple stations is more likely to be an earthquake than local noise
• An earthquake must be detected on at least 4 stations to be located
Figure 7: Example network similarity matrix calculation (one element) for one pair of similar earthquake waveforms from two different times, at 5 stations. FAST similarity values from each station add coherently. We apply a detection threshold to the network similarity matrix.
Method: Network Similarity Matrix
• Sum the single-channel similarity matrices from the different stations
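The element-wise sum in Figure 7 can be sketched as follows, assuming each single-channel similarity matrix is stored sparsely as a dict that maps a pair of fingerprint time indices to its FAST similarity. The sketch also assumes every channel is indexed by the same fingerprint times, so a pair (t1, t2) lines up across stations; it ignores travel-time differences between stations. The pair indices and the 0.5 detection threshold are hypothetical; only the five station values come from Figure 7.

```python
from collections import defaultdict


def network_similarity(channel_matrices):
    """Add sparse single-channel similarity matrices element by element."""
    total = defaultdict(float)
    for matrix in channel_matrices:
        for pair, sim in matrix.items():
            total[pair] += sim
    return dict(total)


def detect(network_matrix, threshold):
    """Pairs of times whose network similarity reaches the detection threshold."""
    return sorted(pair for pair, sim in network_matrix.items() if sim >= threshold)


# Figure 7 example: one pair of times seen at 5 stations (pair indices are hypothetical).
pair = (100, 250)
station_sims = {"CCOB.EHZ": 0.31, "CADB.EHZ": 0.03, "CAO.EHZ": 0.02,
                "CHR.EHZ": 0.40, "CML.EHZ": 0.12}
channel_matrices = [{pair: s} for s in station_sims.values()]
channel_matrices[2][(40, 90)] = 0.33  # hypothetical pair similar at CAO.EHZ only

net = network_similarity(channel_matrices)
print(round(net[pair], 2))         # 0.88
print(detect(net, threshold=0.5))  # [(100, 250)]
```

The pair that is similar at only one station stays below the threshold, which illustrates why coherent similarity across stations reduces false detections from local noise.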
Goal: Detect uncataloged earthquakes in continuous data with FAST
• Aftershocks of the Mw 4.1 event on the Calaveras Fault
• Data from the Northern California Seismic Network (NCSN)

Multiple stations, 1 day of continuous data
[Figure 8 map: 122°W to 121.3°W, 37°N to 37.6°N; San Jose, Calaveras Fault, NCSN stations CCOB, CADB, CAO, CHR, CML; mainshock Mw 4.1; catalog earthquakes; cities; 10 km scale bar]
Figure 8: Map of catalog events and the 5 stations; 7 channels are used in the network similarity matrix (CCOB: 3 components; other stations: vertical only). Bandpass filters are applied to remove correlated noise: 4-10 Hz for CCOB, 2-6 Hz for CML, and 2-10 Hz for all others. Data are decimated to 20 samples per second (sps).
FAST detections: all 13 catalog events, 26 new events, 8 false detections
[Figure 9 panels: (Top) 24 hours of continuous data on 2011-01-08, channels CCOB.EHE, CCOB.EHN, CCOB.EHZ, CADB.EHZ, CAO.EHZ, CHR.EHZ, CML.EHZ (amplitude vs. time, hr). (Top right) Example uncataloged earthquakes on the same 7 traces (trace number vs. time, s): start = 1733 s, network similarity = 1.14; start = 70510 s, network similarity = 0.58; start = 11295 s, network similarity = 0.53. (Bottom right) Detection performance: autocorrelation 37 earthquakes, FAST 39 earthquakes; runtime on 1 processor: autocorrelation 31 hours 25 minutes, FAST 46 minutes.]
Figure 9: (Top) FAST detection results plotted on the continuous data used for detection. (Top right) Example uncataloged earthquakes detected with FAST. (Bottom right) FAST finds about the same total number of events as autocorrelation, but runs 40 times faster.

Single channel, 1 week of continuous data
[Figure 10 map: 121.8°W to 121.5°W, 37.1°N to 37.4°N; Calaveras Fault, station CCOB.EHN; mainshock Mw 4.1; catalog earthquakes (detected); catalog earthquakes (missed); 10 km scale bar]
Figure 10: Map of catalog events and the 1 station; the single channel CCOB.EHN is used for detection. A 4-10 Hz bandpass filter is applied to remove correlated noise. Data are decimated to 20 sps.
FAST detections: 21/24 catalog events, 68 new events
FAST detection errors: 12 false detections, 22 missed detections
[Figure 11 panels (waveforms labeled by event or detection time in continuous data, s): catalog events; FAST new events, also in autocorrelation; FAST new events, not in autocorrelation; false detections; missed catalog events; autocorrelation new events missed by FAST. Bar charts: detection performance, autocorrelation 86 earthquakes, FAST 89 earthquakes; runtime on 1 processor, autocorrelation 9 days 13 hours, FAST 1 hour 36 minutes.]
Figure 11: (Top) Waveforms of FAST detections: catalog events (blue) and new uncataloged events (red). (Top right) Waveforms of FAST detection errors: false positives (green) and false negatives (black). (Bottom right) FAST finds about the same total number of events as autocorrelation, but runs 140 times faster.
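[Figure 12 plot: runtime (s) vs. data duration (s), log-log axes (duration 10^3 to 10^8 s, runtime 10^-4 to 10^12 s); curves: autocorrelation, 1 processor; autocorrelation, 1000 processors; FAST, 1 processor. Speedup labels: Day 40x, Week 140x, Month 600x, Year 7500x (month and year values are extrapolated)]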
Figure 12: Detection algorithm runtime as a function of continuous data duration. Dashed lines are extrapolations based purely on runtime scaling (ignoring memory constraints): quadratic for autocorrelation and near-linear for FAST. FAST should be most useful for longer-duration data sets (years), where it could run faster than even massively parallel autocorrelation.
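As a rough consistency check on the Figure 12 speedups (not the authors' extrapolation procedure), assume autocorrelation runtime grows as N² and FAST runtime as N for data duration N, with hypothetical constants c_a and c_f. The speedup then grows linearly with N, and scaling the measured 1-week runtimes gives:

```latex
\begin{aligned}
T_{\mathrm{auto}}(N) &\approx c_a N^2, \qquad T_{\mathrm{FAST}}(N) \approx c_f N
  \;\Longrightarrow\; S(N) = \frac{T_{\mathrm{auto}}(N)}{T_{\mathrm{FAST}}(N)} \approx \frac{c_a}{c_f}\,N \\
S_{\mathrm{week}} &\approx \frac{9\ \mathrm{d}\ 13\ \mathrm{h}}{1\ \mathrm{h}\ 36\ \mathrm{min}}
  = \frac{229\ \mathrm{h}}{1.6\ \mathrm{h}} \approx 143 \\
S_{\mathrm{month}} &\approx 143 \times \tfrac{30}{7} \approx 613, \qquad
S_{\mathrm{year}} \approx 143 \times \tfrac{365}{7} \approx 7460
\end{aligned}
```

These agree with the 600x and 7500x labels to within rounding. Because FAST is only near-linear, its real speedup would grow somewhat more slowly than this idealized estimate.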
Summary:
• The FAST algorithm adapts efficient audio search technology to detect earthquakes with similar waveforms
• FAST finds about as many total events as autocorrelation, with more false detections, but runs 140x faster on 1 week of continuous data
• The network version of FAST detected earthquakes using continuous data from 5 stations
Future Work:
• Process longer-duration continuous data (months, years): find infrequently repeating events?
• Process data in parallel
• “Data mining” on diverse data sets to detect earthquakes: foreshocks, aftershocks, swarms, induced seismicity, uncataloged events
References
[1] Baluja and Covell (2008), “Waveprint”
[2] Leskovec et al. (2014), “Mining of Massive Datasets”
[3] Broder et al. (2000), “Min-Wise Independent Permutations”
[4] Shelly et al. (2007), “Non-volcanic tremor”