faganetal_twocol

  • Upload
    ddpros

  • View
    214

  • Download
    0

Embed Size (px)

DESCRIPTION

seismic

Citation preview

  • Clustering revisited: a spectral analysis of microseismic events

    Deborah Fagan, Kasper van Wijk and James Rutledge

    ABSTRACT

    Identifying individual subsurface faults in a larger faultsystem is important to characterize and understandthe relationship between microseismicity and subsur-face processes. This information can potentially helpdrive reservoir management and mitigate risks of nat-ural or induced seismicity. We present a method ofstatistically clustering power spectra from microseismicevents associated with an enhanced oil recovery oper-ation in southeast Utah. Specifically, we are able toprovide a clear distinction within a set of events origi-nally designated in the time domain as a single clusterand to identify evidence of en echelon faulting. Sub-tle time-domain differences between events are accen-tuated in the frequency domain. Power spectra basedon the Fourier transform of the time-domain autocorre-lation function are utilized, as this formulation resultsin statistically independent intensities and is supportedby a full body of statistical theory upon which decisionframeworks can be developed.

    INTRODUCTION

    Monitoring induced seismicity provides valuable informa-tion regarding fluid transport properties in hydrocarbon(Maxwell and Urbancic, 2001) and geothermal (Saar andManga, 2003) reservoirs, geomechanical effects in carbonsequestration (Streit et al., 2005), and hydraulic proper-ties in aquifer recharge and recovery. In energy extrac-tion operations, identifying through microseismicity individual faults, fault systems, and associated structuresinfluence the location and depth of production wells to en-sure the correct fluid conducting faults are tapped. Simi-larly, location of microseismicity is important for injectionwells to ensure that locations, depths, and injection rateswill generate fractures in desired fluid-bearing layers.

    Waveform correlations in the time domain are used todetect low-magnitude and microseismic events (Schaff andBeroza, 2004; Schaff and Richards, 2004, 2011), refinephase arrivals and event hypocenters (Hansen et al., 2006;Rowe et al., 2002; Snieder and Vrijlandt, 2005; Song et al.,2010), determine event focal mechanisms (Hansen et al.,2006), and to classify unknown source events (Harris, 1991;Shumway, 2003; Shumway and Stoffer, 2006). In thismanuscript, we explore correlations in the power spectraof events originally clustered based on a high correlationin the time domain. A permanent vertical borehole arrayrecorded over 3800 microseismic events in 2008 and 2009at the Aneth field, located in southeast Utah. Differencesin the power spectra of microseismic events from a clusterare small, but consistent in terms of their spatial distribu-tion. As a result, we suggest that these events should befurther sub-clustered before further location analysis withmore advanced techniques have been developed based ondifferences between closely spaced events (Waldhauser andEllsworth, 2000; Zhang and Thurber, 2003).

    Geologic Background and Production His-tory

    The Aneth unit is one of four oil producing units locatedwithin the Greater Aneth oil field of the Paradox Basinin southeast Utah (see Figure 1). The lower portion ofa stratigraphic column for the Greater Aneth Field fromHintze and Kowallis (2009) is shown in Figure 2. Of in-terest here are the Pennsylvanian and Mississippian sub-periods, from which oil and gas are extracted, into whichCO2 is injected, and where microseismicity occurs. Theformations of interest, from oldest to youngest, are theMississippian Leadville (two strata: lower dolostone andupper lime mudstone to peloidal to crinoidal lime wacke-stone), and Pennsylvanian Molas (poorly stratified siltand sandstones), Pinkerton Trail (alternating thin bedsof mudstone and shales interbedded with limestone), andParadox (cyclic intervals of dolostone, shale, and salt).

    1

  • Spectral earthquake clustering 2

    Figure 1: Paradox Basin oil and gas fields (modified fromChidsey, Jr., 2009). The study area is in the GreaterAneth and marked by the thick black oval.

    Within the Aneth Unit, the Paradox Formation has beeninformally divided into production zones; from oldestto youngest, named: Alkali Gulch, Barker Creek, Akah,Desert Creek, and Ismay. Oil is produced from the DesertCreek and Ismay zones, the lowest three are referred to asParadox salts (Carney, 2010).

    Chidsey, Jr. (2009) provides a detailed discussion of thesurface and subsurface geology and production history ofAneth; relevant details are summarized here. Structuralorientation within the Aneth Unit is consistent with thatof the Paradox Basin and Greater Aneth (Chidsey, Jr.,2009; Carney, 2010). Figure 1 shows northwest-southeastoriented fold and fault belt along the northeast portionof the Paradox Basin. Surface deformation bands withinGreater Aneth indicate a northwest-southeast trend. Sim-ilarly, structure contour maps of Paradox Formation strataidentify the Aneth area as a northwest-southeast contourorientation. Three-dimensional seismic studies of GreaterAneth show basement faults of Mississippian and Penn-sylvanian age that strike northwest-southeast (Rutledgeand Soma, 2010).

    This field was discovered in 1956, with water-enhancedoil recovery commencing in 1961. The process of con-verting water injection wells to gas began in 2008 andcoincided with the installation of a permanent, 60-level,down-hole geophone array in a former injection well on thewestern side of the field (Rutledge and Soma, 2010). Geo-phones were set between 800 and 1700 m at 15-m spacing.With a sampling rate of 1500 Hz, the deepest eighteen geo-phones are 3-component, the other sensors record the ver-

    Figure 2: The lower portion of a stratigraphic columnfor Aneth (based on Hintze and Kowallis, 2009). Oil isproduced from the Isamy and Desert Creek zones withinthe Paradox formation. The lower zones of the Paradoxformation (Akah, Barker Creek, and Alkali Gulch) arereferred to as Paradox salts, as seen in Figure 4.

    tical component only. The 3-component geophones havea right-hand three-dimensional coordinate system so thata Z signal is positive for a compression arrival from above.The 60-sonde array was built by VCable, LLC and con-sists of OYO Geospace Corporation GS-14-L3 geophoneswith 243000 Hz bandwith and 290 mV/ips sensitivity.

    With the conversion to gas-injection, a deep brine in-jection well was installed to dispose of excess water. Thewell was drilled approximately 1 km northwest of the geo-phone array, into Leadville formation limestone to a depthof approximately 2240 m. The well has four lateral armsand injection rates varied between 12,000 and 20,000 m3

    per week between March, 2008 and March, 2009 (Chidsey,Jr., 2009).

    MICROSEISMIC EVENTS

    Passive seismic data were collected at the geophone arraybetween March, 2008 and March, 2009. Of the 3800 eventsrecorded, P- and S-wave arrival times were obtained for1212 events at 24 geophones. Events organize into twoclusters on opposite sides of the reservoir; the northeastside includes 44 events, the southwest side, 1168 events,which is the focus of discussion and analysis here. Eventsin the southwest cluster trend along a northwest-southeastfracture zone (consistent with the geologic structure of thearea) at least 1500 m long within a depth range of 1830

  • Spectral earthquake clustering 3

    !!

    "#

    #!

    !#

    ##

    !$

    "#

    #!

    $#

    ##

    !"

    ##

    #"

    ##

    %&

    ' ()

    * +,

    -./0'&12/1,/0*2324(

    .56(/762(*2324(

    822721(*92&7)&42

    :'24;*6/42

    # "## $### $"## !### !"## 1(*+,-

    82

    7( )

    * +,

    -

    !#

    ##

    $?

    ##

    @1,>A8212'(BC'22D

    E>'>;&FBG>6(1

    E/4D2'(&4B:'>/6

    E!H>32**326&0/(A

    Figure 3: The 1168 events located on the southwestboundary of the reservoir follow a NW-SE orientation.The 67 events shown as asterisks belong to multiplet 18and separate into two spatial groupings.

    to 2037 m, which places them in either the Paradox saltlayer or the Pinkerton Trail formation (Figure 2). Figure 3shows the southwest cluster of events along with the loca-tion of the geophone array. Asterisks in Figure 3 identifythe location of the 67 events which belong to multiplet 18,discussed in the next section.

    On the 16 shallower receivers, the P- and S-wave phasestend to be impulsive and easy to identify. Events wereinitially screened by manually picking P-wave arrivals onwaveforms from just a few receivers. Using a correlationthreshold of 0.8 in the time domain, these waveforms werethen clustered and re-picked using the methods of Roweet al. (2002), which resulted in 47 sets of events with sim-ilar wavefields called multiplets, using a threshold correla-tion coefficient of 0.8.

    The velocity model used for location was estimated froma 100-ft median smooth of a sonic log obtained from thedeep borehole prior to commencing salt-water injection.Figure 4 shows the sonic log, velocity model and geophonedepths for the receivers used to estimate event locations.

    Rutledge and Soma (2010) applied a master-slave lo-cation scheme to tie the weaker events to the first ar-rivals of the stronger events. High signal-to-noise ratiomaster events were formed by stacking multiplets or werechosen from a single large magnitude event from a mul-tiplet group so that the true first arrivals could be reli-ably picked. For each master event the source locations(radial and depth positions with respect to the verticalreceiver array) were determined using a iterative least-

    !!"#$%&$%'()*+,&-./01

    2%

    3+ 4

    & -.

    1

    5666 7666 8666

    56

    66

    9:

    66

    96

    66

    :6

    6

    ;4*

    !#?#C(DBE#'+0

    !*%?+(

  • Spectral earthquake clustering 4

    events shown in Figure 3. The moveout of first arrivalsfor the earthquake clearly indicate an upward propagatingarrival across the array, and given the polarity of the ver-tical components on the array, indicate dilatational firstmotion. Similarly, for the microseismic events, which alloccur below the bottom of the array, the vertical compo-nent at a subset of geophones show dilatational first mo-tion. Horizontal component first motions are all oppositeof the Bluff earthquake, putting the microseismic eventsin the southeast quadrant with respect to the array.

    Rutledge and Soma (2010) study both the space-timedevelopment of microearthquake occurrence and the cor-relation between seismicity and injection/production ac-tivities. Using the Gutenburg-Richter law log(N) = a bM , where N is the number of events having magnitudethat exceeds M , Rutledge and Soma (2010) fit a valueb = 2.0. Tectonic seismicity typically results in b 1. Thehigh b-value indicates a higher proportion of small eventsto large ones, characteristic of fluid-injection induced seis-micity or natural swarm seismicity. Natural swarm seis-micity typically precedes volcanic eruption or is associatedwith migration of crustal fluids through fracture networks(Lay and Wallace, 1995). The population of earthquakesdominated by small events is generally considered an indi-cation of many small discontinuous faults accommodatingstrain accumulation.

    In a temporal analysis of events, Rutledge and Soma(2010) observe that higher magnitude events occur neargaps in seismicity, indicating that the structure is com-posed of discrete segments, such as en echelon jogs wherestress would tend to concentrate near discontinuities. Upto this point, further inferences regarding subsurface struc-ture are limited, however, because of the single well arraygeometry and large source-array differences.

    Based on the distance between the salt water injectionwell and microseismicity (1200 m), no discernable correla-tion between injection rates and seismicity, and the longhistory of injection at Aneth, Rutledge and Soma (2010)conclude that events occurring within the study periodwere not a direct result of deep salt water injection intothe Leadville or CO2 injection into the Desert Creek zone,but rather stress release related to overall reservoir reduc-tion over the production life of the area.

    MULTIPLET ANALYSIS

    Of the 47 multiplets located on the southwest boundaryof the field, multiplet 18 was chosen to research because ofits highly correlated waveforms that cover the 12-monthstudy period. This multiplet also has a large number ofevents with high-confidence picks used for multiplet iden-tification and location. Multiplet 18 includes 73 events,however, the number of P-wave picks varies from receiverto receiver along the array (as well as from component tocomponent). For instance, 70 picks are available from theeast component data for the geophone at 1552 m (the 11th

    geophone from the bottom), whereas 67 picks are availablefrom the deepest receiver (geophone 1).

    Time (s)

    0.2 0.4 0.6 0.8 1

    ev46

    ev53

    ev67

    ev60

    ev27

    ev58

    ev35

    ev68

    ev33

    ev48

    ev59

    ev55

    ev37

    ev52

    ev26

    ev61

    ev34

    ev23

    ev13

    ev22

    ev2

    ev25

    ev11

    ev31

    ev1

    Figure 5: Waveforms from multiplet 18 from Geophone 1,the deepest receiver, aligned by the lag of the maximumcorrelation coefficient around the maximum amplitude.Each of the 25 waveforms starts 0.1 s prior to the P-wavearrival and is normalized by its maximum amplitude.

    We investigated statistical clustering of multiplet 18events from geophone 1. Data from this receiver were ex-cluded in the location estimation by Rutledge and Soma(2010), because of complications likely stemming fromguided-wave arrivals. While this was an issue with thelocation algorithm used, (guided-wave) data from this geo-phone may contain information on the relatively unchar-acterized volume underlying the production reservoirs.

    Rotated east-component waveforms from the deepestgeophone for 25 events from multiplet 18 are shown inFigure 5. Peak-normalized waveforms starting 0.1 s priorto the P-wave arrival are aligned according to the max-imum correlation in the first 0.25 s, similar to the pro-cess of Rowe et al. (2002) used to create the mulitplets.At the geophone 11 (1552 m depth), waveform correla-tion coefficients for multiplet 18 events exceed 0.90. Crosscorrelation coefficients at geophone 1 (1704 m) range be-tween 0.68 and 0.98. Rutledge and Soma (2010) suggestray paths for these events to the deepest 8 geophones deeper than 1590 m include critically refracted paths inthe Leadville formation. Defined by their strong wavefieldcorrelation in the time domain, these events belong to thesame multiplet 18 and are located using a single masterevent.

  • Spectral earthquake clustering 5

    ev

    3e

    v1

    6e

    v1

    7e

    v2

    6e

    v6

    7e

    v4

    3e

    v4

    9e

    v5

    5e

    v6

    2e

    v1

    4e

    v1

    8 ev

    9e

    v3

    5e

    v4

    7e

    v5

    2e

    v6

    4e

    v6

    1e

    v3

    6e

    v2

    7e

    v4

    4e

    v4

    ev

    12

    ev

    13

    ev

    65

    ev

    31

    ev

    39

    ev

    56

    ev

    19

    ev

    25

    ev

    23

    ev

    29

    ev

    50

    ev

    42

    ev

    57 e

    v1

    1e

    v5

    1e

    v2

    0e

    v5

    3e

    v6

    3e

    v3

    0e

    v4

    1 ev

    6e

    v5

    9e

    v4

    5e

    v5

    8e

    v2

    2e

    v3

    3e

    v2

    8e

    v4

    0e

    v1

    ev

    10 e

    v5

    ev

    24

    ev

    8e

    v1

    5e

    v5

    4e

    v3

    2e

    v4

    6e

    v3

    8e

    v6

    6e

    v3

    4e

    v6

    0e

    v3

    7e

    v4

    8e

    v2

    1e

    v2

    ev

    7

    05

    01

    00

    15

    0

    Di s

    si m

    i l ar i

    t y (

    Hz

    ^ 2)

    Group 3 Group 6 Group 4 Group 5 Group 1 Group 2

    Figure 6: Dendrogram for the spectra of multiplet 18events from Geophone 1, with rectangles identifying clus-ter membership. The longest branch of the dendrogramindicates spectra of Group 3 members are most disparate;the shortest branch shows that Group 1 and 2 event spec-tra are the most similar.

    Spectral Correlations

    The spectra of the events in multiplet 18 were clustered viahierarchical agglomerative clustering methods using spec-tral distance as the dissimilarity metric. See the Appendixfor details, but in essence the spectral distance is definedas the Euclidean distance using normalized spectral powerby frequency. Each event starts out as its own cluster,with similar clusters being iteratively combined until onlya single cluster remains. The final number of clusters cho-sen is based on a dendrogram. Because it works on adistance matrix, the algorithm is robust to the order ofthe events. It is a widely used algorithm, successfully uti-lized in other seismic applications (Bardainne et al., 2006;Rowe et al., 2002).

    Using Fourier frequencies in the range (1, 375) Hz, eventspectra organize spatially, consistent with location. Thecluster dendrogram is shown in Figure 6, with rectanglescorresponding to cluster membership. Height on the y-axis represents dissimilarity, a higher horizontal agglom-eration line indicates greater spectral dissimilarity than alower one. In order to study the spectral characteristicsthat affect clustering, it is useful to examine events corre-sponding to individual agglomerations in the dendrogramof Figure 6. Events in Group 3 in Figure 6 are agglom-erated early in the algorithm, but only join the rest ofthe events at the last agglomeration step. Figure 9 showsthat Group 3 events are geographically separated fromthe rest of the events in the multiplet. Examination ofthe mean spectra in Figure 14 identifies the same range ofcontributing frequencies, but intensities differ markedly,lending insight to how the clustering algorithm utilizesspectral distance to differentiate groups of events.

    The next branch in the dendrogram in Figure 6 revealsadditional geographic organization of events. All events

    !!

    "#

    !!

    ##

    !"

    "#

    !"

    ##

    !$

    "#

    !$

    ##

    %&

    ' ()

    * +,

    -

    ./0(*1'&23450(*1'&231'&23*6

    7""# 7!## 7!"# 78## 78"#

    9#

    6#

    9#

    7#

    450(*+,-

    :/

    3( )

    * +,

    -Figure 7: Results for spectral clustering with three clus-ters; the group of circles in Figure 9 has been subdividedinto western and eastern clusters based on spectral clus-tering.

    !" #

    !" $

    !" %

    !" &

    '()*+,-.

    /(

    01 2

    32

    * 45 6

    77

    * 46

    5 58

    9 (: 1

    6;

    !< ! < = >

    ?87:*@563AB(7:*@563A

    Figure 8: Maximum cross correlations between the west-ern cluster mean and individual events, coded by clustermembership. Positive lags to approximately 3 Hz supportthe mean frequency shift indicated in Figure 11.

  • Spectral earthquake clustering 6

    !!

    "#

    !!

    ##

    !"

    "#

    !"

    ##

    !$

    "#

    !$

    ##

    %&

    ' ()

    * +,

    -.'&/0*123,45673'

    8""# 8!## 8!"# 89## 89"#

    :#

    1#

    :#

    8#

    ;4

  • Spectral earthquake clustering 7

    !!

    "#

    !!

    ##

    !"

    "#

    !"

    ##

    !$

    "#

    !$

    ##

    %&

    ' ()

    * +,

    -.'&/0*1.'&/0*2.'&/0*3.'&/0*$.'&/0*".'&/0*!

    1""# 1!## 1!"# 14## 14"#

    2#

    3#

    2#

    1#

    567(*+,-

    89

    0( )

    * +,

    -

    Figure 12: Further subclustering leads to event locationscoded by spectral cluster membership of Figure 6. Sub-clusters identified align into what appear to be individualen echelon fractures.

    mean (Group 5) are shown in Figure 13. Frequency shiftsare more complicated, but generally follow the trend ob-served in Figure 10. While lags show more variabilitywithin clusters, the geographical organization of clustersis a compelling reasons to consider these as legitimate sub-clusters. In the next Section, we present geologic argu-ments for the significance of these subclusters.

    DISCUSSION

    While we observe spectral differences within a cluster,there are no other particularly distinguishing features thatdifferentiate these clustered events. A time domain analy-sis of the same data in Fagan (2012) does not reveal consis-tent agglomeration of events anywhere near the resolutionobserved in the frequency domain. This result may seemsurprising, given the time and frequency domain analysisare based on the same data. However, it may be that afocus on the power spectra (ignoring phase information)makes small temporal differences easier to identify.

    Differentiation within cluster with spectral intensities inFigure 14 can be attributed to source and/or path effects.These events are nearly co-located relative to the sensorarray (with the exception of Group 3), thereby minimiz-ing path effects to the extent possible. However, there issufficient ambiguity in subsurface structure at the eventsto not rule out a path effect yet. Sensor locations are notsufficient to determine the orientation of events or focalmechanism, but the analysis of Rutledge and Soma (2010)indicate non-tectonic seismicity. If attenuation was a fac-

    !! " ! #"

    "$ %

    "$ &

    "$ '

    #$ "

    ()*+,-./

    0)

    12 3

    43

    + 56 7

    88

    + 57

    6 69

    : ); 2

    7+#=674>+?=674>+@=674>+%=674>+!=674>+&

    Figure 13: Maximum cross correlations for each multiplet18 event cross correlated with the western-most spectralcluster mean (Group 5). Events are generally less corre-lated away from the Group 5 centroid (the western-mostgroup), but lags follow a generally positive trend, an indi-cation of a frequency shift with distance from the receiver.

    tor in the spectral separation of these events one wouldexpect the opposite of what is observed: events fartherfrom the receiver would have less spectral energy in thehigher frequencies than events farther afield.

    After a more detailed subclustering in Figure 12, we candistinguish the individual en echelon faults that were ini-tially hypothesized by Rutledge and Soma (2010), basedupon gaps in the seismicity. Uncertainty elllipses areoriented SW-NE, generally perpendicular to the linear ori-entation of events (see Fagan, 2012), but their size blursthe distinction between sub-groups. That, along with hightime-domain waveform correlation, made distinguishingdetailed subsurface structure difficult. Spectral correla-tion, however, reveals event subclusters suggesting sepa-rate parallel fractures striking NNE-SSW, oblique to themain fault lineament and characteristic of echelon fractur-ing.

    En echelon fractures are planar structures. Althoughthe formation depths below the oil producing zones are notwell understood in the region of microseismicity, assumingthe layer thickness contour plot in Chidsey, Jr. (2009) rep-resents thicknesses patterns of the underlying formations,the consistent depths of multiplet 18 events indicates theyoccur within the same structure, and are therefore planaras well. This collection of evidence leads to the character-ization of en echelon faulting as a result of stress changesassociated with Aneths 50-plus year production history.In future work, we can test this model. Popular tech-niques to refine hypocenter distributions include hypodd

  • Spectral earthquake clustering 8

    0 100 200 300 400

    0. 0

    0. 2

    0. 4

    0. 6

    0. 8

    Frequency (Hz)

    St a

    nd

    ar d

    i ze

    d I

    nt e

    ns

    i ty

    Group 3Remainder

    Figure 14: Mean peak normalized power spectra for thetwo clusters of events identified in Figure 9. Shapes aremarkedly different, providing insight to the spectral clus-tering process.

    (Waldhauser and Ellsworth, 2000) and tomodd (Zhang andThurber, 2003), based on the double differences betweenP- and S-wave arrivals of closely spaced events. The meth-ods rely on inter-event distances not to exceed the scalelength of the velocity model, or bias may be introduced.An assumption is that events are from the same fractureor medium. We propose our spectral method helps pre-define subsets of hypocenters, which in turn can then berelocated via double-difference tomography.

    CONCLUSIONS

    Microseismicity in areas of oil and gas production, as wellas CO2 injection, provides valuable information abouttime-lapse reservoir conditions. Based on an analysis ofevents clustered in the time domain, a novel spectral clus-tering technique reveals more subtle variations in the dis-tribution of hypocenters than is available through timedomain analysis. The dissimilarity metric upon whichthis spectral clustering is based illuminates subtle, po-tentially undetectable differences in the time domain andis supported by a fully developed body of statistical the-ory upon which future applications of statistical decisionmaking using event spectra may be based.

    We hypothesize the new fracture patterns revealed bythe spectral clustering technique are characteristic of stressrelease as en echelon faulting. These details may be valu-able as a preprocessing step for high-resolution time-domainlocation algorithms such as double difference tomography.Further study is needed to determine whether these resultscan be observed for other clusters within this data set andwhether other field sites yield similar results.

    ACKNOWLEDGMENTS

    We would like to acknowledge Schlumberger, and in par-ticular Ian Bradford and Daniel Raymer, for their supportin this research. In addition, we thank Ludmila Adam,Jack Pelton, and Martin Smith for their valuable feed-

    back on the contents of this manuscript.

    REFERENCES

    Bardainne, T., P. Gaillot, N. Dubos-Sallee, J. Blanco, andG. Senechal, 2006, Characterization of seismic wave-forms and classification of seismic events using chirpletatomic decomposition. Example from the Lacq gas field(Western Pyrenees, France): Geophysical Journal In-ternational, 166, 699718.

    Carney, S., 2010, Chapter 3, Suburface Structural andThickness Mapping: Technical report, Utah GeologicalSurvey, Salt Lake City, UT.

    Chidsey, Jr., T. C., 2009, Surface and subsurface geolog-ical characterization of the Aneth Unit, Greater Anethfield, Paradox Basin, Utah - final report: Technical re-port, Utah Geological Survey, Salt Lake City, UT.

    Diggle, P. J., 1991, Time series: A biostatistical introduc-tion: Clarendon Press, Oxford.

    Fagan, D. K., 2012, Statistical Clustering of MicroseismicEvent Spectra to Identify Subsurface Structure: Mas-ters thesis, Boise State University, Boise, Idaho.

    Hansen, S. E., S. Y. Schwartz, H. R. DeShon, and V.Gonzalez, 2006, Earthquake relocation and focal mech-anism determination using waveform cross correlation,Nicoya Peninsula, Costa Rica: Bulletin of the Seismo-logical Society of America, 96, 1003 1011.

    Harris, D. B., 1991, A waveform correlation method foridentifying quarry explosions: Bulletin of the Seismo-logical Society of America, 81, 2395 2418.

    Hintze, L. F., and B. J. Kowallis, 2009, Geologic history ofUtah: Brigham Young University Geology Studies Spe-cial Publication 9: Technical report, Brigham YoungUniversity, Salt Lake City, UT.

    Kaufman, L., and P. J. Rosseeuw, 1990, Finding groups indata: An introduction to cluster analysis: Wiley, NewYork.

    Lay, T., and T. C. Wallace, 1995, Modern global seismol-ogy: Academic Press, San Diego.

    Maxwell, S. C., and T. I. Urbancic, 2001, The role ofmicroseismic monitoring in the instrumented oil field:The Leading Edge, 20, 636 639.

    Phillips, W. S., T. D. Fairbanks, J. T. Rutledge, andD. W. Anderson, 1989, Induced microearthquake pat-terns and oil-producing fracture systems in the AustinChalk: Tectonophysics, 289, 153 169.

    R Development Core Team, 2008, R: A language andenvironment for statistical computing. R Foundationfor Statistical Computing, Vienna, Austria. (ISBN 3-900051-07-0).

    Rowe, C. A., R. C. Aster, B. Borchers, and C. J. Young,2002, An automatic, adaptive algorithm for refiningphase picks in large seismic data sets: Bulletin of theSeismological Society of America, 92, 1660 1674.

    Rutledge, J., and N. Soma, 2010, Chapter 12, Microseis-mic Monitoring of CO2 Enhanced Oil Recovery in theAneth Field: Technical Report LA-UR-10-08213, LosAlamos National Laboratory, Los Alamos, NM.

  • Spectral earthquake clustering 9

    Rutledge, J. T., and W. S. Phillips, 2003, Hydraulic stim-ulations of natural fracture as revealed by induced mi-croearthquakes, Carthage Cotton Valley gas field, eastTexas: Geophysics, 68, 441 452.

    Saar, M. O., and M. Manga, 2003, Seismicity induced byseasonal groundwater recharge at Mt. Hood, Oregon:Earth and Planetary Science Letters, 214, 605 618.

    Schaff, D. P., and G. C. Beroza, 2004, Coseismicand postseismic velocity changes measured by re-peating earthquakes: J. Geophys. Res., 109, 14.(doi:200410.1029/2004JB003011).

    Schaff, D. P., and P. G. Richards, 2004, Repeatingseismic events in China: Science, 303, 11761178.(doi:10.1126/science.1093422).

    , 2011, On finding and using repeating seismic eventsin and near China,: Journal of Geophysical Research,116. (doi:201110.1029/2010JB007895).

    Shumway, R. H., 2003, Time-frequency clustering and dis-criminant analysis: Statistics & Probability Letters, 63,307 314.

    Shumway, R. H., and D. S. Stoffer, 2006, Time series anal-ysis and its applications: With r examples, 2nd ed.:Springer-Verlag New York, LLC.

    Snieder, R., and M. Vrijlandt, 2005, Constraining relativesource locations with coda wave interferometry: Theoryand application to earthquake doublets in the Haywardfault, California: J. Geophys. Res., 110, B04301.

    Song, F., H. S. Kuleli, M. N. Toksoz, E. Ay, and H. Zhang,2010, An improved method for hydrofracture-inducedmicroseismic event detection and phase picking: Geo-physics, 75, A74 A52.

    Streit, J. E., A. F. Siggins, and B. J. Evans, 2005, Predict-ing and monitoring geomechanical effects of CO2 injec-tion, in Carbon Dioxide Capture for Storage in DeepGeologic Formations, Volume 2: Elsevier Ltd., 751 770.

    Venables, W. N., and B. D. Ripley, 1999, Modern and ap-plied statistics with S-PLUS, 3rd ed.: Springer-VerlagNew York, LLC.

    Waldhauser, F., and W. L. Ellsworth, 2000, A double-difference earthquake location algorithm: Method andapplication to the northern hayward fault, california:Bulletin of the Seismological Society of America, 90,13531368.

    Zhang, H., and C. H. Thurber, 2003, Double-DifferenceTomography: The Method and Its Application to theHayward Fault, California: Bulletin of the Seismologi-cal Society of America, 93, 1875 1889.

    APPENDIX A

    This appendix summarizes hierarchical agglomerative clus-tering using the discrete Fourier transform of the auto-covariance function as the power spectrum. See (Dig-gle, 1991; R Development Core Team, 2008; Venables andRipley, 1999) for details of calculating the spectrum and(Kaufman and Rosseeuw, 1990; R Development Core Team,

    2008) for additional details of hierarchical agglomerativeclustering.

    In agglomerative clustering, each event starts out as itsown cluster, clusters are agglomerated (combined) intolarger ones, culminating in a single cluster that containsall events. Agglomeration decisions are based on a dis-similarity metric, among which there are many to choose,including Euclidean distance, Mahalanobis distance, andKullback-Liebler divergence, to mention a few. This ap-plication uses squared spectral distance, defined below, asthe dissimilarity metric.

    Among clustering algorithms, hierarchical agglomera-tive clustering is attractive for several reasons: 1) unlikekmeans, the number of clusters is not required in ad-vance, 2) since it works on a distance matrix, it is ro-bust to the order of events, and 3) interactive determi-nation of the number of clusters provides flexibility, withthe added advantage of quickly being able to identify out-liers and singletons. Hierarchical agglomerative clusteringis a widely used algorithm, successfully applied in otherseismic applications (Bardainne et al., 2006; Rowe et al.,2002).

    For an observed waveform (event) {yt : t = 1, 2, . . . , n}with zero mean, the sample autocovariance for lag k is

    f(k) =1n

    nt=k+1

    ytytk, k = 0, 1, . . . , n 1. (A-1)

    The spectrum is the discrete Fourier transform of f andhas the form

    F () = f(0) + 2n1k=1

    f(k) cos(k), (A-2)

    where f(0) is the variance of y and = 2pij/n for positiveinteger j < n/2.

    Starting with the P-wave arrival, each waveform is trimmedto the length of the shortest duration event so that theFourier frequencies are constant between events and in-terpolation is not necessary. The first 400 Fourier fre-quencies are then used for clustering. The shortest lengthevent is 1598 samples, which is then padded with zeros ton = 1600 to optimize the discrete Fourier transform. Theresult is 800 Fourier frequencies with a delta-frequency of0.937 Hz.

    Squared spectral distance is defined as the squared Eu-clidean distance using the first 400 spectrum amplitudes.The squared spectral distance for events a and b is

    d2(a, b) = [Fa() Fb()]T [Fa() Fb()] (A-3)with F from equation (A-2).

    At each agglomeration step the dissimilarity measuremust be updated. We utilize Wards minimum variancecriterion, which minimizes Var[d2] over all candidate ag-glomerations. Wards minimum variance only works withtrue distance measures, which is our motivation for usingspectral distance. The Lance-Williams algorithm provides

  • Spectral earthquake clustering 10

    an efficient method of performing these calculations, andis discussed in detail in Kaufman and Rosseeuw (1990).If two clusters, C1 and C2 have been agglomerated, thedissimilarity between their union and another cluster Mis given by

    D(C1C2,M) = D(C1,M)+D(C2,M)0.5D(C1, C2),(A-4)

    where D is Var[d2], with d2 as defined in Equation A-3.As noted, we use the Fourier transform of the ACF as

    our spectral estimate. The statistical properties of thisestimator lend themselves to future research and statisti-cal decision making opportunities. Statistical instabilityof intensity estimates is offset by fine tuning taper andsmoothing parameters, which, if varying between events,could result in different peak locations. We mitigate vari-ability between event spectra by using the same param-eters among events, as well as by making all events thesame length. Each spectrum has identical Fourier fre-quencies and and equal number of observations for each in-tensity estimate across events. Slight differences betweenspectra may be present at the extreme frequencies, whichmay be impacted by edge effects, but this is also miti-gated by clustering over a range of frequencies inside theextremes.

    The ordinates (autocorrelation coefficients) of the ACFare not statistically independent, whereas the Fourier trans-form of the ACF is an orthogonal partitioning of the sumof squares that results in statistically independent ordi-nates (intensities). The absolute value of the ACF can givean indication of whether certain frequencies are prevalentin the waveform, but because spectrum coefficients (in-tensities) are independent, we can use fully establishedtheory to statistically compare spectral intensities. It isclear that event spectra do not cluster randomly, as evi-denced by their spatial organization. Statistical analysisof cluster membership is a topic of future research, asrequirements of statistical decision making in this appli-cation are developed.