02 Asoc Published

7/27/2019 02 Asoc Published

1/15

Applied Soft Computing 13 (2013) 13141328

Contents lists available at SciVerse ScienceDirect

Applied Soft Computing

j ournal homepage: www.elsevier .com/ locate /asoc

Neural networks to predict earthquakes in Chile

J. Reyesa, A. Morales-Esteban b, F. Martnez-lvarez c,

a TGT, Camino Agrcola 1710, Torre A, Of 701, Santiago, Chileb Department of ContinuumMechanics, University of Seville, Spainc Department of Computer Science, Pablo de Olavide University of Seville, Spain

a r t i c l e i n f o

Article history:

Received 23 December 2011Received in revised form26 September 2012Accepted 31 October 2012Available online 23 November 2012

Keywords:

Earthquake predictionArtificial neural networksSeismic riskTime series

a b s t r a c t

A new earthquake prediction system is presented in this work. This method, based on the application

ofartificial neural networks, has been used to predict earthquakes in Chile, one of the countries withlarger seismic activity. The input values are related to the b-value, the Baths law, and the OmoriUtsuslaw, parameters that are strongly correlated with seismicity, as shown in solid previous works. Two kindof prediction are provided in this study: The probability that an earthquake ofmagnitude larger than athreshold value happens, and the probability that an earthquake ofa limited magnitude interval mightoccur, both during the next five days in the areas analyzed. For the four Chiles seismic regions examined,with epicenters placed on meshes with dimensions varying from 0.5 0.5 to 1 1, a prototype ofneuronal network is presented. The prototypes predict an earthquake every time the probability ofanearthquake of magnitude larger than a threshold is sufficiently high. The threshold values have beenadjusted with the aim ofobtaining as few false positives as possible. The accuracy ofthe method hasbeen assessed in retrospective experiments by means ofstatistical tests and compared with well-knownmachine learning classifiers. The high success rate achieved supports the suitability of applying softcomputing in this field and poses new challenges to be addressed.

2012 Elsevier B.V. All rights reserved.

1. Introduction

Chile emerges as, perhaps, the highest seismic region in theworld,withthepossibleexceptionofJapan.Theearthquakeofmag-nitude 9.5Mw occurred in Chile in 1960 is, probably, the largestearthquake in world history [42]. The mega-earthquake causedmore than 2000 deaths, more than 2 million people lost theirhomes, and the tsunami generated caused damage in distant loca-tions, such as Japan and Hawaii [8]. Fifty years later, on 27thFebruary 2010, the south of Chile was affected by another mega-earthquake of moment magnitude 8.8. The earthquake and theensuing tsunami caused claimed together more than 500 lives andsevere damage along the adjacent coasts [43].

Despiteforecastandpredictionareusedassynonymousinmany

fields,therearesubtledifferences,asdiscussedin [48]. Inthissense,it has to be pointed that this work is about earthquake prediction.Given a setof inputs, a prediction consists in theinteraction of suchinputs through laws or well defined rules such as thermodynamic,rigidbodymechanics,etc.Asaresult,thefuturehastobecalculatedwith a high degree of accuracy as kinematics describes the trajec-tory of a projectile. In seismology theinput valuescorrespond tothe

Corresponding author.E-mail addresses:[email protected] (J. Reyes),

[email protected] (A. Morales-Esteban), [email protected] (F. Martnez-lvarez).

stress point to point and the asperities or plates sub-topography,which are almost impossible to obtain.

Although several works claim to provide earthquake predic-tion, an earthquake prediction must provide, according to [3], thefollowing information:

1. A specific location or area.2. A specific span of time.3. A specific magnitude range.4. A specific probability of occurrence.

Thatis,anearthquakepredictionshouldstatewhen,where,howbig, andhowprobable thepredictedeventis andwhythe predictionis made. Unfortunately, no general useful method to predict earth-quakes has been found yet. And it may never be possible to predictthe exact time when a damaging earthquake will occur, becausewhen enough strain has built up, a fault may become inherentlyunstable, and any small background earthquake may or may notcontinue rupturing and turn into a large earthquake.

This is, precisely, the challenge faced in this work and its mainmotivation. In particular, the main novelty of this contributionis the development of a system capable of predicting earthquakeoccurrence for periods of time statistically significant, in order toallow the authorities to deploy precautionary policies.

Indeed, a system able to generate confident earthquake predic-tions allows the shutdown of systems susceptible to damage such

1568-4946/$ see front matter 2012Elsevier B.V. All rights reserved.

http://dx.doi.org/10.1016/j.asoc.2012.10.014
http://localhost/var/www/apps/conversion/tmp/scratch_3/dx.doi.org/10.1016/j.asoc.2012.10.014http://www.sciencedirect.com/science/journal/15684946http://www.elsevier.com/locate/asocmailto:[email protected]:[email protected]:[email protected]://localhost/var/www/apps/conversion/tmp/scratch_3/dx.doi.org/10.1016/j.asoc.2012.10.014http://localhost/var/www/apps/conversion/tmp/scratch_3/dx.doi.org/10.1016/j.asoc.2012.10.014mailto:[email protected]:[email protected]:[email protected]://www.elsevier.com/locate/asochttp://www.sciencedirect.com/science/journal/15684946http://localhost/var/www/apps/conversion/tmp/scratch_3/dx.doi.org/10.1016/j.asoc.2012.10.014


2/15

J. Reyes et al. / Applied Soft Computing 13 (2013) 13141328 1315

as power stations,transportationand telecommunicationssystems[16], helping the authorities to focalize their efforts to reduce dam-age and human looses.

Since false alarmsare cause fordeep concern in society, the pro-posed approach has been especially adjusted to generate as fewfalse alarms as possible. In other words, when the output of thesystem is there is an upcoming earthquake, it is highly probable anearthquake occurs, as observed in the four Chilean areas examined.

It is worth noting that the system proposed is almost real-time, therefore, the authority receiving this information does notneed to distinguish between mainshock and aftershock. That is,both mainshocks and aftershocks are considered because theyare retrospective designations that can only be identified after anearthquake sequence has been completed[62].

To achieve the goalof predictingearthquakes, artificialneuronalnetworks (ANN) are used to predict earthquakes when the proba-bility of exceeding a threshold value is sufficiently high. The ANNspresented in this paper fulfill the requirements above discussed.In particular, the specific location or area varies from 0.5 0.5 to11. The specific span of time is within the next five days. Thespecific magnitude range is determined from the magnitude thattriggers thealarm(theaverage magnitude of thearea plus 0.6timesthe typical deviation, as discussed in Section 4). Finally, the rate ofearthquakes successfully predictedhas been empirically calculatedandvaries from 18%to 92%(Section 6) dependingontheareaunderanalysis.

The main contribution of this work can be then summarized asfollows. A new earthquake prediction system, based on ANNs, hasbeen proposed. This system makes use of a combination seismicityindicators never used before, based on information correlated withearthquake occurrence,as inputparameters. Especiallyremarkableis the small spatial and temporal uncertainty of this system, thuscontrasting with other existing methods that mainly predict earth-quakes in huge areas with high temporal uncertainty (see Section2 for further information).

The rest of the work is divided as follows. Section 2 introducesthe methods used to predict the occurrence of earthquakes. The

geophysical fundamentals are described in Section 3. A descriptionof the ANNs used in this work is provided in Section 4. In Section5,the training of the ANNs isexplained as well asthe methodto eval-uate the networks. In Section 6, the results obtained for every ANNin the testing process are shown, as well as a comparative anal-ysis with other classifiers. Also, a fractal hypothesis that explainsthe solid performance observed is presented. Section 7 is includedto assess the robustness of the method considering the possibleinfluence of earthquake clustering and different time periods ofapplication. Finally, Section 8 summarizes the conclusions drawn.

2. Related work

Despite the great effort made and the multiple models devel-oped by different authors [83], no successful method has beenfound yet. Due to the random behavior of earthquakes generation,it may never be possible to ascertainthe exact time, magnitudeandlocation of the next damaging earthquake.

Neural networks have been successfully used for solving com-plicated pattern recognition and classification problems in manydomains such as financial forecasting, signal processing, neuro-science, optimization, etc. (in [1] neural applications are detailed).But only very few and very recent studies have been publishedon the application of neural networks for earthquake prediction[4,45,63,64,82].

Actually, there is no consensus among researchers in how tobuild a time-dependent earthquake forecasting model [17] nowa-

days, therefore, it is impossible to define a unique model. Hence,

the Regional Earthquake Likelihood Model (RELM) has developed18 different models.

The U.S. Geological Survey (USGS) and the California GeologicalSurvey (CGS) have developed a time-independent model based onthe assumption that the probability of the occurrence of an earth-quake, in a given periodof time, follows a Poisson distribution[66].The forecast for California is composed of four types of earthquakesources: two types of faults, zones, and distributed backgroundseismicity. The authors also presented a time-dependent modelbased on the time-independent 2002 national seismic hazardmodel and additional recurrence information.

Kagan et al. [37] presented a smoothed-seismicity model forsouthern California. This is a five-year forecast of earthquakes withmagnitudes5.0 and greater, basedon spatially smoothed historicalearthquake catalog [36]. The forecast is based on observed regular-ityofearthquakeoccurrenceratherthanonanyphysicalmodel.Themodelpresentedin [29] issimilartothatofKaganetal.[37] anditisbased on a smoothed seismicity model. This model is based on theassumptionthatsmallearthquakesareusedtomap the distributionof large damaging earthquakes and removing aftershocks.

Shen et al. [80] present a probabilistic earthquake forecastmodel of intermediate to long times during the interseismic timeperiod, constrained by geodetically derived crustal strain rates.Moreover, the authors in [7] propose simple methods to estimatelong-term average seismicity of any region. For California, theycompute a long-term forecast based on a kinematic model of neo-tectonics derived from a weighted least-squares fit to availabledata. On the other hand, Console et al. [11] present two 24-h fore-casts based on two Epidemic-Type Aftershock Sequence model(ETAS). These models arebased on theassumption that everyearth-quake can be regarded as both triggered by previous events and asa potential triggering event for subsequent earthquakes [10].

The model presentedby Rhoades [70] is a method of long-rangeforecasting that uses the previous minor earthquakes in a catalogto forecast the major ones. This model is known as EEPAS (EveryEarthquake a PrecursorAccordingto Scale). Alternatively,Ebel et al.[15] havepresented a setof long term (five-year) forecast ofM5.0

earthquakes and two different methods for short-term (one-day)forecasts ofM4.0 earthquakes for California and some adjacentareas.Five differentmethods forthe RELM were presentedby Ward[92]. The first one is based on smoothed seismicity for earthquakesof magnitude larger or equal to 5.0. The second is based on GPSderived strainand Kostrovsformula.The third oneis based on geo-logical fault slip-data. The fourth model is simply an average of thethree aforementioned models and, finally, the last one is based onan earthquake simulator. Thelast modelof theRELMwas presentedby Gerstenberguer et al. [19] and is a short term earthquake prob-ability model that is a 24-h forecast based on foreshock/aftershockstatistics.

Theworkin [95] presentedandAsperity-basedlikelihoodmodelfor California. This method is based on three assumptions: the

b-value has been shown to be inversely dependent of appliedshear stress so the b-value can be used as a stressmeter withinthe Earths crust where no direct measurements exist [78]. Sec-ondly, asperities are found to be characterized by low b-values[77]. Finally, data from various tectonic regimes suggest that theb-value of micro-earthquakes is quite stationary with time [75].Equally remarkable is the pattern informatics (PI) method [30] thatdoesnotpredictearthquakesbutforecaststheregionswhereearth-quakes are most likely to occur in the relatively near future (510years).

Many other methods have been proposed by different authorsto predict earthquakes: From unusual animal behavior[38] to elec-tromagnetic precursors [89] to any imaginable precursor. Despitesomecorrectpredictionsandmanyeffortsdone,errorsarepredom-

inant(i.e.Parkfield[22] orLomaPrieta[91]). Nevertheless, thereare


3/15

1316 J. Reyes et al. / Applied Soft Computing 13 (2013) 13141328

some exceptions such as the Xiuyen prediction [99] or the predic-tion by Matsuzawa et al. [49], who predicted that an earthquakeof magnitude 4.80.1 would hit Sanriku area (NE Japan) with 99%probability on November 2001. An earthquake of magnitude 4.7actuallyoccurredonNovember13th2001,asexpected.Themethodused by the authors was the comparison of recurrent seismogramsevery 5.30.53 years altogether with the deduction of the exist-ence of a small asperity with a dimension of1 km surrounded bystable sliding area on the plate boundary.

Despite thevast variety of methods attempting to predict earth-quakes, the analysis of the b-value as seismic precursor has stoodout during the last decade. Indeed, the authors in [50] appliedartificial intelligence to obtain patterns which model the behav-ior of seismic temporal data and claimed that this informationcouldhelptopredictmedium-largeearthquakes.Theauthorsfoundsome patterns in thetemporal seriesthat correlated with thefutureoccurrence of earthquakes. This method was based on the work by[54], where the authors had found spatial and temporal variationofb-values preceding the NW Sumatra earthquake of December2004.Also, the use of artificial intelligence techniques can be foundin [47]. In particular, the authors discovered the relation exist-ing between the b-value and the occurrence of large earthquakeby extracting numeric association rules and the well-knownM5

algorithm, obtaining also linear models.Due to the multiple evidences discovered, the variations of the

b-value have been decided to be part of the input data used in theANNs designed.

3. Geophysical fundamentals

Thissectionexposesthefundamentalsthatsupportthemethod-ology used to build the multilayer ANNs. That is, the informationunderlying the input/output parameters of the designed ANNs.Firstly, the GutenbergRichter law is explained. Then, theb-valueof the GutenbergRichter law is examined as seismic precursor.Later, the Omori/Utsu and Baths laws, that are related to the fre-quency and magnitude of the aftershocks, are presented. Finally,the six seismic regions of Chile are described[69], and their year ofcompleteness calculated.

3.1. GutenbergRichters law

The size distribution of earthquakes has been investigatedsincethe beginning of the 20th century. Omori [61] illustrated a table ofthe frequency distribution of maximum amplitudes recorded by aseismometer in Tokyo. Moreover, Ishimoto and Iida [32] observedthat the number of earthquakes,N, of amplitude greater or equaltoA follows a power law distribution defined by:

N(A) = aAB (1)

where a and B are the adjustment parameters.Richter[71] notedthatthenumberofshocksfallsoffveryrapidlyfor the higher magnitudes. Later, Gutenberg and Richter [24] sug-gested an exponential distribution for the number of earthquakesversus the magnitude. Again, Gutenberg and Richter [25] trans-formed this power law into a linear law, expressing this relationfor the magnitude frequency distribution of earthquakes as:

log10N(M) = a bM (2)

This law relates the cumulative (or absolute) number of eventsN(M) with magnitudelarger or equal toMwith the seismic activity,a, andthe size distributionfactor,b.Thea-valuedepends of thevol-ume and the temporal window chosen and is the logarithm of thenumberof earthquakes with magnitudelarger or equal to zero. The

b-value is a parameter that reflects the tectonic of the area under

analysis [40] and has been related with the physical characteristicsof the area. A high value of the parameter implies that the numberof earthquakes of small magnitude is predominant and, therefore,the region has a low resistance. Contrarily, a low value shows asmaller difference between the relative number of small and largeevents,implying a higher resistance of the material. The coefficientb usually takes a value around 1.0 and has been widely used byinvestigators [87].

It is assumed that the magnitudes of the earthquakes that occurin a region and in a certain period of time are independent andidentically distributed variables that follow the GutenbergRichterlaw [67]. IfearthquakeswithmagnitudeM0 andlargerareused,thishypothesis is equivalent to suppose that the probability density ofthe magnitudeMis exponential:

f(M,) = e(MM0) (3)

Gutenberg and Richter [26] used the least squares method toestimate coefficients in the frequencymagnitude relation. Shi andBolt [81] pointed out that the b-value can be obtained by thismethod but the presence of even a few large earthquakes influ-ences the result significantly. The maximum likelihood method,hence, appears as an alternative to theleast squares method, whichproducesestimatesthataremorerobustwhenthenumberofinfre-quent large earthquakes changes.

If the magnitudes are distributed according to theGutenbergRichter law and we have a complete data for earth-quakes with magnitude M0 and larger, the maximum likelihoodestimate ofb is Aki [2] and Utsu [86]:

b =log e

MM0(4)

where M is the mean magnitude and M0 is the cutoff magnitude[67].

The authors in [81] also demonstrated that for large samplesand low temporal variations ofb, the standard deviation of theestimated b is:

2

= 2.30b2

(M) (5)where

2(M) =

ni=1(Mi M)

2

n(6)

where n is the number of events andMi is the magnitude of everyevent. Note that theb-valueuncertaintyis notrequiredin this workbecause it is based on references [50,54] whereonlyvariationsofb-valueare usedas seismic precursor,not consideringits uncertainty.

It should be pointed out that different databases provide dif-ferent locations, magnitudes and focal mechanisms in the catalogs[90]. Nevertheless, the data used in this study is that of provided byChiles National Seismological Service, as it is considered the mostconfident regarding Chile seismicity.

Since Gutenberg and Richter estimated the b-value in variousregions of the world, there is controversy between the investiga-tors about the spatial and temporal variations of theb-value. Someauthors [34,60] suggest that the value is universal and constantwhereas some other authors [5,19,84,93] assert that there are spa-tial and temporal variations of the b-value. It is true that for largeregionsandforlongperiodsoftimeb1.However,significantvari-ations have been observed in limited regions and for short periodsof time [40]. Local earthquakes have been associated with varia-tions of the b-value of days or even hours [23].

From all the aforementioned possibilities, the maximum likeli-hood method has been selected for the estimation of theb-value inthis work. The cutoff magnitude used isM0 =3.0.

It is expected that the ANNs will be able to incorporate the

GutenbergRichter law from the b-value variation series with the


4/15


2 3 4 5 6 7 8 9 10Ms

y

(Ms

)

M0

P(Ms) 6

Fig. 1. Probability of observing an earthquake with magnitude larger or equal to 6(from theinclusion of GutenbergRichters law to ANNs).

theoreticalprobabilitytopredictanearthquakeofmagnitudeequalor larger to 6.0Ms, in concordance with the current configurationof the probability density function, considering the last 50 earth-quakes in the calculation procedure (see Fig. 1).

3.2. The b-value as seismic precursor

The b-value of the GutenbergRichter law is an importantparameter, because it reflects the tectonics and geophysical prop-erties of the rocks and fluid pressure variations in the regionconcerned [40,100]. Thus, the analysis of its variation has beenoften used in earthquake prediction [54]. It is important to knowhowthe sequence ofb-values has been obtained,before presentingconclusions about its variation. The studies of both Gibowitz [20]

and Wiemer et al. [94] on the variation of the b-value over timerefer to aftershocks. The found an increase in b-value after largeearthquakes in NewZealand and a decrease before the next impor-tant aftershock. In general, they showed that theb-value tends todecrease when many earthquakes occur in a local area during ashort period of time.

Schorlemmer et al. [78] and Nuannin et al. [54] infer that theb-value is a stressmeter that depends inversely on differentialstress. Hence, Nuannin et al. [54] presented a very detailed anal-ysis on b-value variations. They studied the earthquakes in theAndamanSumatra region. To consider variations in b-value, asliding timewindow method was used. From the earthquake cat-alog, the b-value was calculated for a group of 50 events. Thenthe window was shifted by a time corresponding to five events.

They concluded that earthquakes are usually preceded by a largedecrease in b, although in some cases a small increase in thisvalue precedes the shock. Previously, the authors in [74] clarifiedthe stress changes in the fault and the variations on the stresschanges in the fault and the variations of theb-value surroundingan important earthquake. In fact,they claimed: A systematic studyoftemporalchangesinseismic b-valueshasshownthatlargeearth-quakes are often preceded by an intermediate-term increase in b,followed by a decrease in the months to weeks before the earth-quake.The onset of theb-value can precede earthquake occurrenceby as much as seven years.

The neural networks used in this work to predict earthquakesdo not use the temporal sequence of magnitude of earthquakesbut the temporal sequence of variations of the b-value (calculated

from the sequence of magnitudeof earthquakes), as the correlation

between the variations of the b-value and the later occurrence ofearthquakes is proved to be high.

3.3. Aftershocks

In the theoretical Poisson law dependent data (aftershocks andforeshocks) must be removed [51,68]. However, it is known thatalmost all larger earthquakes trigger aftershocks with a temporal

decaying probability [27]. In this study, the complete database ofearthquakes has been used because the variation of the b-value(which is the input data for the ANNs) is strongly correlated withthe aftershocks and foreshocks. It is intended that the ANNs pre-sentedin this paper incorporate the OmoriUtsu[88] and Baths [6]laws from the sequence of the b-value variations altogether withthevalueof thelargest earthquake registered in thelast seven days.

3.3.1. Omori/Utsus law

The rate of aftershocks with time follows Omoris law [88] inwhichthe aftershockfrequencydecreasesby roughly the reciprocalof time after the main shock according to:

N(t) =k

c+ t(7)

whereNistheoccurrencerateofaftershocks, tindicatestheelapsedtime since the mainshock [88], k is the amplitude, and the c-valueis the timeoffsetparameter which is a constant typically much lessthan one day.

The modified version of Omoris lawwas proposed by Utsu [85],which is known as OmoriUtsu law:

N(t) =k

(c+ t)p(8)

Thep-valueis thefitnessparameter, that modifies thedecayrate,and typically falls in the range 0.81.2 [88].

These patterns describe only the statistical behavior of after-shocks. The actual times, numbers and locations of the aftershocksare stochastic, while tending to follow these patterns.

Alternative models for describing the aftershock decaying ratehave been proposed [21,52], however the OmoriUtsu law gener-ally provides a very good adjustment to the data [27].

3.3.2. Baths law

The empirical Baths law [6] states that the average differencein magnitude between a mainshock and its largest aftershock isapproximately constant and equals to 1.11.2Mw, regardless of themainshock magnitude.

It is expected that the ANNs, presented in this paper, will beable to deuce Baths law from the incorporation of the magnitudeof the largest earthquake registered in the last seven days.

3.4. Seismic areas in Chile

A global analysis of mainland Chile would involve a huge spatialuncertainty since its surface is 756,000 km2. Thus, the predictionswould lack of interest because it is obvious that earthquakes arelikely to occur daily in a so vast area with an associated high rateoccurrence.

Also,Chiles geography presents different particularities; there-fore, a global analysis of all regions does not seem adequate in thissituation. Some of these peculiarities are now listed:

1. Every seismogenic area described in [69] is characterized by dif-ferent histograms based on GutenbergRichter law.

2. In Northern Chile the speed of subduction is 8 cm/year, whereasSouthern Chile presents a speed of 1.3cm/year.


5/15


Table 1

Seismic Chilean areas analyzed.

Parameters Seismic area Vertices Dimensions

Talca #3 (35CS, 72 W) and (36 S,71 W) 1 1

Pichilemu #4 (34 S, 72.5 W) and (34.5 S,72 W) 0.50.5

Santiago #5 (33 S,71 W) and (34 S,70 W) 1 1

Valparaso #6 (32.5 S,72 W) and (33.5 S,71 W) 1 1

3. Chile is longitudinally divided into coastal plains, coast moun-tains, intermediate depression, Andes mountains, which isrelated with different seismogenic sources: interplate thrust,medium depth intraplate, cortical and outer-rise.

4. Complex fault systems can be found across Continental Chile(Mejillones, San Ramn, Liquine-Ofqui faults, etc.),which meansthat every area has a specific fractal complexity, possibly char-acterized by Hurts exponent (Section 6.6).

Therefore, the zonification proposedby Reyesand Crdenas[69]is used in this work. Based on Kohonen-Self Organized Map(SOM),the authors demonstrated that considering the similarity of his-tograms, frequency versus magnitude, Chile can be divided into sixseismic regions.

The whole continental territory of Chile was thus divided into156 squares of 11 and the SOM defined six seismic regions.Later, it will be demonstrated that it is possible to predict earth-quakes in Chile using multilayer ANNs without creating 156 ANN(one for each square of the grid). Instead of the 156 ANNs, a pro-totype for each seismic region is created. However, the Southernregions 1 and 2 are characterized by a low seismic activity (thesum of earthquakes of magnitude equal or larger to 4Ms, for bothregions, in the period 20012010, is only 32). The joint surface ofboth zones, corresponding to the Chilean Patagonia, is approxi-mately 245,000km2, producing 1.3105 earthquakes per year andkm2. Furthermore, the subduction speed is 1.5cm/year approx-imately or, in other words, there occur mainly low magnitudeearthquakes (central Chiles speed of subduction is 7.5 cm/year).

Moreover,Chilean Patagoniadoes not generate the minimum num-ber of events to create the training sets. In short, the Southernregions lack of interest due its huge spatial uncertainty, its lowearthquake occurrence,and the low magnitudecharacterizingsuchshocks. In this study,only the four ANNs, corresponding to regions#36, are presented.

In Section 4, it will be demonstrated thata minimum of 70+ 122earthquakesisnecessarytocreateanANN.Furthermore,another50extra earthquakes arerequiredto test every ANN. Thelapseof time,from the database, used in this study, is limited from 1st January2001 to March 31st 2011 although the precise limits depend on thesquare under study.

To accumulate the largest quantity of earthquakes, for the sta-tistical analysis of an epicentral region, the observed area must beextended,whichincreasesthe spatial uncertainty of the prediction.In order to achieve themaximum spatial accuracy a plot that showsthe quantity of earthquakes observed for 2010 is shown in Fig. 2.

Thequantityofearthquakesismaximizedforthe34.5 Slatitude.This location guarantees the minimum quantity of data necessary(approximately 70+ 122 + 50) from the information contained inrelatively small areas.

Finally, the four seismic regions of Chile that will be modeled bythe ANNs are summarized in Table 1.

3.5. Chiles earthquake database

The database of earthquakes used in this paper has beenobtainedfromtheChilesNationalSeismologicalService.Inorderto

calculate the b-value from [4] the database must be complete. To

guarantee the completeness of the data the analysis should onlyinclude earthquakes recorded after the year of completeness ofthe database. So, once the seismic database has been obtained, thefirst step is to calculate, for every area, the year of completeness ofthe seismic catalog, defined as the year from which all the earth-quakes of magnitude larger or equal to the minimum magnitudehave been recorded. The Chilean earthquake database only con-tains earthquakes of magnitude equal or larger to 3.0. The year ofcompleteness of the seismic catalog can be graphically calculated.With that purpose the time is inversely plot in thex-axis and theaccumulated number of earthquakes in they-axis, as illustrated inFig. 3.

Itcanbeobservedthattheslopeofthecurve,thatrepresentstheannualrate,is approximately constant until 2001, been reduced for

thepreviousyears.Ifthearrivalofearthquakesisadmittedtofollowa Poisson stationary process, the annual rate should be time inde-pendent, so a systematic decrease of this rate, when it approachestheinitial years of thedatabasemust be concludedas a lack of com-pleteness, ought to the small number of seismic stations availableat that moment. In this case, 2001 is the year of completeness ofthe catalog for the four areas for a magnitude larger or equal to 3.0.

4. ANN applied to predict earthquakes in Chile

This section discusses the ANNs configuration for predictingearthquakes in Chile. In particular, the number of layers and neu-ronsperlayer,theactivationfunction,thetopologyandthelearningparadigm are introduced.

4.1. ANNs configuration

One neural network has been used for each seismic area. Notethat these areas are tagged accordingly to the main city existingin their area of interest or cells: Talca (seismic area #3), Pichilemu(seismic area #4), Santiago (seismic area #5) and Valparaso (seis-mic area #6).

Although one different ANN has been applied to each area, theyall share the same architecture. The common features are listed inTable 2.

Note that all the four ANNs have been constructed followingthe scheme discussed and successfully applied in [65], where the

15 20 25 30 35 40 45 500

50

100

150

200

250

300

Latitude (South)

Numberofseisms

Fig. 2. Seismicity in Chile during 2010.


6/15


2001

0

30

60

90

120

150

180

210

240

270

300

330

196119711981199120012011

Talca

Year

ycneuqe

Year

ycneuqe

2001

0

50

100

150

200

250

300

350

400

450

196119711981199120012011

Pichilemu

Year

ycneuqe

2001

0

200

400

600

800

196119711981199120012011

Santiago

2001

0

200

400

600

800

1000

1200

1400

1600

196119711981199120012011

Valparaiso

Year

ycneuqe

Fig. 3. Year of completeness forall thestudied areas.

authors used such an ANN configuration to predict atmosphericpollution.

4.2. Input neurons

Every time an earthquake occursat thecell subjected to analysisa new training vector, composed of seven inputs and one output,is created. First, the GutenbergRichter laws b-value is calculatedusing the last 50 quakes recorded [53]:

bi =log(e)

(1/50)49

j=0Mij 3(9)

where Mi is the magnitude for the ith earthquake and three is thereference magnitude, M0.

Then, increments ofb are calculated:

b1i = bi bi4 x1i (10)

b2i = bi4 bi8 x2i (11)

b3i = bi8 bi12 x3i (12)

b4i = bi12 bi16 x4i (13)

b5i = bi16 bi20 x5i (14)

Table 2

Commonfeatures in ANNs.

Parameters Values

Input neurons 7Neurons in hidden layer 15Output neurons 1Activation function Sigmoid shapeTopology of the network FeedforwardLearning paradigm Backpropagation

From these equations it can be concluded that 70 earthquakesare required to calculate thexi values, therefore, to obtain the fiveof the ANN inputs.

The sixth input variable, x6i, is the maximum magnitude Ms

from the quakes recorded during the last week in the area ana-lyzed. The use of this information as input is to indirectly providethe ANNs with the required information to model Omori/Utsu andBaths laws.

x6i = max{Ms}, when t [7,0) (15)

where the time tis measured in days.Note that Omori/Utsu law has been used to design the ANN

architecture, but the equality in Eq.(8) is not applied as such, sincethe correlation betweenx6i andtherateofaftershocksisquitehigh.Indeed, if the GutenbergRichter law was considered in any par-ticular area, it would be reported that, in a period equals to thecompleteness time, large earthquakes would generate many more

aftershocks than small ones. Therefore, it is expected that theinfor-mation contained in Omori/Utsu law (number aftershocks per unitof time) will be codified in variablex6i.

The lastinput variable,x7i, identifiesthe probability of recordinganearthquake withmagnitude largeror equalto 6.0. Theaddition ofthis information as input is to include GutenbergRichters lawin adynamic way. It is calculated from the probability density function(PDF):

x7i = P(Ms 6.0) = e3bi/log(e) = 103bi (16)

Finally, there is one output variable,yi, which is the maximummagnitude Ms observed in the cell under analysis, in the next five

days. Note that yi has been set to 0 for such situations where


7/15


no earthquake with magnitude equal or greater to 3, Ms3 wasrecorded. Formally:

yi = max{Ms}, when t (0,5] (17)

where the time tis measured in days.Mathematically, the training vector associated to the ith earth-

quake can be expressed as:

Ti = {x1i, x2i, x3i, x4i, x5i, x6i, x7i, yi} (18)The minimum number of input vectors linearly independent

forming the training set depends on the number of synapticweights, discussed in Section 4.3.

4.3. Hidden layer neurons

Although the four ANNs have one input layer, the number ofhidden layers, and in particular the number of neurons formingthose layers, may vary.

The number of input neurons has already been discussed andset to 7. Also, following with the scheme introduced in [65], thenumber of hidden layers is 1. To determine the number of neuronsthat compose the hidden layer, the Kolmogorovs theorem[39] hasbeen applied. In particular, this theorem was adapted to ANNs byHecht-Nielsen [28]. The theorem states that an arbitrary continu-ous functionf(x1,x2, . . .,xn)on an n-dimensional cube (of arbitrarydimension n) can be represented as a composition of addition of2n+ 1 functions of one variable. Therefore, the number of neuronsin the hidden layer is 2 7+1 = 15, as there are n= 7 input neurons.

In general, the number of synaptic connections of an ANN is cal-culated as: [input neurons][hidden layer neurons] + [hidden layerneurons][number of hidden layers] +2. Consequently, the num-ber of connections is 7 15+15 1+2= 122. This number meansthat a minimum of 122 training vectors, linearly independent,are required to find all these 122 synaptic connections. However,the training set may include more vectors particularly selected toimprove the learning under certain situations (i.e. to increase or

decrease the mean value ofyi during the selftest).

4.4. Output neuron

Regarding the outputs, the ANNs only have one: the maximumvalue observed in the quakes occurred during the next five days,in their corresponding cell. The procedure followed to trigger theANN is now detailed. The mean magnitude Ms is calculated fromthe 122earthquakes observed during the training set and, then, 0.6times the standard deviation is added to this value. Therefore thisvalue is the threshold that activates the ANNs output.

4.5. Activation function

The activation function selected is the sigmoid. Mathematically,this function is formulated as:

(Xi) =1

1+ eXi(19)

where

Xi =

i

wijxi

(20)

and wij are the connection weights between unit i and unitj, andui are the signals arriving from unit i. The signal generated by uniti is sent to every node in the following layer or is registered as anoutput if the output layer is reached.

4.6. Network topology

A feedforward neural network is an ANN where connectionsbetween the units do not form a directed cycle (unlike RecurrentNeural Networks, RNN [31]).

The feedforward neural network wasthe first and simplest typeof artificial neural network devised. In this network, the informa-tion moves in only one direction, forward, from the input nodes,through the hidden nodes and to the output nodes. There are nocycles or loops in the network.

4.7. Learning paradigm

The learning method selected is backpropagation [73], a com-mon supervised method to train ANNs, especially useful forfeedforward networks, the topology chosen for these ANNs. Back-propagation usually consists of two main phases: propagation andweight update.

5. Training the ANNs

This section introduces the quality parameters used to train

the ANNs. Also, the results obtained after the training phase arepresented and discussed for every area.Asstatedin [33], anypredictionshouldbetestedattwostages:A

training period, when suitability of the method is ascertained, andvaluesofadjustableparametersestablished,andaevaluationstage,wherenoparameterfittingisallowed.Thisisexactlywhatitisdonein this work, presenting in this section the first stage (the trainingof the method) and in Section 6 the second stage (evaluation of themodel).

5.1. Performance evaluation

ToassesstheperformanceoftheANNsdesigned,severalparam-eters have been used. In particular:

1. True positives (TP). The number of times that the ANN predictedan earthquake and an earthquake did occur during the next fivedays.

2. True negatives (TN). The number of times that the ANN did notpredict an earthquake and no earthquake occurred.

3. Falsepositives (FP). The number of times that the ANN predictedan earthquake but no earthquake occurred during the next fivedays.

4. False negatives (FN). The number of times that the ANN did notpredict an earthquake but an earthquake did occur during thenext five days.

If the output does not exceed the threshold and the maximumobserved magnitude M

sduring the next five days is less than the

threshold(Ms < Tk innextfivedays),thissituationiscalledzero-levelhit, and denoted byP0. On the other hand, if the output thresholdandthemaximumobservedmagnitudeMs duringthenextfivedaysis larger than the threshold (Ms Tk in next five days), the situationis calledone-level hit, and denoted byP1. Both probabilities, widelyused by seismologists to evaluate the performance of approaches,are calculated as follows:

P0 =#(zero-level hits)

N0=

TN

TN+ FN(21)

P1 =#(one-level hits)

N1=

TP

TP+ FP(22)

where N0 =TN+FNdenotes the times that the ANN predicted the

zero-level and N1 =TP+FP the times that the ANN predicted the


8/15


Table 3

Training values forTalca.

Parameters Value

TP 17TN 77FP 10FN 18P1 63.0%P0 81.1%

Sn 48.6%Sp 88.5%Average 70.3%

one-level. Obviously,N=N0 +N1 denotes the total number of pos-sible predictions.

Additionally, two more parameters have been used to evalu-ate the performance of the ANNs, as they correspond to commonstatistical measures of supervised classifiers performance. Thesetwo parameters, sensitivity or rate of actual positives correctlyidentified as such (denoted by Sn) and specificity or rate of actualnegatives correctly identified (denoted by Sp), are defined as:

Sn = TPTP+ FN

(23)

Sp =TN

TN+ FP(24)

It is worth noting that the use of the first two parameters (P0,P1) is particularly suitable for earthquake prediction as they satisfythe prediction criterion provided by several authorities [3,97].

Also note that several empirical rules, derived from a 17-yearworkintheUniversityofSantiagosNationalCenterofEnvironment[57,65], are used to accurately train the networks. These rules are:

1. P0

and P1

values greater than 95% in the training set involvesoverlearning, that is,the ANNmaynotbe capable of generalizingresults. This effectcan beavoided bymodifying this setrepeatingor adding new vectors to the original training set until reachingsatisfactory probabilities, but never greater than 95%.

2. The lower P0 value in the training set must be greater than 70%.If this value is not reached, the training set has to be modifieduntil reaching acceptable values.

3. Sometimes, the ANN may not be able to predict particularly loworhigh valuesin thetraining set(probability= 0/0). This situationcan be avoided whether adding vectors with very low or veryhigh outputs.

5.2. ANN for Talca

The time interval used to train this ANN was from June 19th2003 to March 21st 2010. However, when the selftest was carriedout(to test using the training set),P1 was equal to 0%, meaning thatthe ANN was unable to generate high values ofyi. This issue wassolved byadding three times to the training set the 20vectors withhighervalues (magnitude largeror equal to4.2)and training for500epochs. This new situation allowed the ANN to obtain satisfactoryP1 valuesduringtheselftest,allowingthusthelearningofsituationswhen the ANNs output was activated.

The values of the quality parameters obtained during the train-ing of this ANN are summarized in Table 3. Note that when theoutput reached values less than 3.0, these values were arbitrarilysubstituted by 2.9, since no values less than 3.0 are present in the

dataset.

Table 4

Training values for Pichilemu.

Parameters Value

TP 56TN 39FP 11FN 16P1 70.9%P0 83.6%

Sn 77.8%Sp 78.0%Average 77.6%

Table 5

Training values for Santiago.

Parameters Value

TP 10TN 94FP 2FN 26P1 83.3%P0 78.3%Sn 27.8%Sp 97.9%

Average 71.8%

5.3. ANN for Pichilemu

The time interval used to train this ANN was from August 10th2005 to March 31st 2010. In this area, the vectors presented a highlogiccoherence,asevidencedtheresultsobtainedafter500epochs.Such results are shown in Table 4.

5.4. ANN for Santiago

ThetimeintervalusedtotrainthisANNwasfromMay13th2003to June 2nd 2004. After the first 500 training epochs,P1 was equal

to 0%. This problem was overcome by adding 10 linearindependentvector, registered before May 13th 2003, and 10 repeated vectorsfrom theoriginalset with a particular feature: they represented themaximumpossiblevaluesregisteredofyi.Theresultsobtainedafterthe modification of the original training set are shown inTable 5.

5.5. ANN for Valparaso

The time interval used to train this ANN was from 31st January2006 to December 19th 2008. The training results are shown inTable 6.

6. Results

This section shows the results obtained when the proposedANNs were applied to the test sets representing the four

Table 6

Training values for Valparaso.

Parameters Value

TP 15TN 77FP 5FN 25P1 75.0%P0 75.5%Sn 37.5%Sp 93.9%Average 70.5%


9/15


seismogenic Chilean zonesanalyzed. Thesesets can be downloadedfrom the University of Chiles National Service of Seismology [55].

First, the type of predictions performed by the ANN areintroduced. Then, the results for every area are summarized interms of the quality parameters described in Section 5.1. Finally,the last subsection is devoted to discuss the results achieved.

Note that, to theauthorsknowledge, there is no other approachcapable of predicting earthquake using only the information ana-lyzed in this work.However,for thesakeof comparison, theauthorshave turned the prediction problem into a classification one, sothat well-known classifiers can be applied and compared. Notethat this conversion has already been used in [1]. To achieve thisgoal, a new attribute has been added: A label has been assigned toevery sample (composed of the attributes described in Section 4.2)indicating whether an earthquake with magnitude larger than Toccurred within the following seven days or not. That is, the valuesrepresented by the label are 1 if such an earthquake occurred, and0 ifnot.

Themethodsselectedforthecomparativeanalysisare K-nearestneighbors (KNN) [13], support-vector machines (SVM) [12], andclassificationvia K-meansclusteringalgorithm[44]. Allthesemeth-ods are implemented on the Weka platform [59] and a guide tooptimally set these classifiers can be found in [96]. These methodshave been chosen because they cover the different classificationstrategies based on machine learning.

6.1. Type of predictions

The ANNs designed are capable of providing two kind of pre-dictions:

1. There isa probability P1 thatanearthquakeoccurswithMs mag-nitude larger or equal to T, during the next five days, in the areasubjected to analysis.

2. There is a probability P1 that an earthquake occurs with magni-tude in intervalMs [T, E], during the next five days, in the areasubjected to analysis.

Tdenotes the threshold that activates the ANNs output, calcu-lated as the addition of the mean value of the 122 earthquakesobserved duringthe training periodand 0.6times itsstandarddevi-ation:

T= mean(Ms) + 0.6Ms (25)

whereMs isthesetofmagnitudesoftheearthquakesinthetrainingset.

Ecould be arbitrarily selected and represents theend PDFs cut-off value. However, considering the social impact that a false alarmmay cause,thevalueforEchosenisthemaximumyi valueobservedduring the training period Eq.(26) shows this value:

E= max(yi) such that yi Training period (26)

P1 can be calculated according to two different strategies. First,P1 can be the observed P1 value during the training set. Alterna-tively, if the historical data the area under study was long enough(if many earthquakes were available to construct the system), itwould be desirable to construct several ANNs, each one consider-ing at least 122 training vectors chronologically ordered. That is,if the historical data is such that the number of training vectorsis much greater than 122, consecutive ANNs are constructed usingonly 122training vectors each (i.e. first ANNwould usethe first 122training vectors, second ANN vectors 123244, and so on), leadingto robust estimations [58,65]. For this second situation:

P

1= QPCTS

1(27)

1 2 3 4 5 6 7 8 9Ms

y(M

s)

T EM

0

P(Ms

< 3) = 0

Fig. 4. GutenbergRichters law. Obtention of the probability that an earthquakewith Ms [T, E] occurs fromthe PDF.

where PCTS1 is the P1 associated with the current training set, andQ is the average quotient of the observed values ofP1 in the nconsecutive ANNs constructed before the current one is analyzed(containing the 122n training vectors occurred before the 122onesselected to train the current ANN). Formally:

Q= 1n

ni=1

P(i)test

P(i)training

(28)

Unfortunately, the datasets analyzed (periods 20012011) donot meet the required condition in any area, therefore P1 = P1 forall the four areas.

Fig. 4 illustrates how to calculate P1 from theGutenbergRichters law. Mathematically,P1 can be formulated as:

P

1 = P1P (29)

where Pis:

P= E

TdMsy(Ms)

T dMsy

(Ms

)(30)

6.2. Earthquake prediction in Talca

For this cell, the training set contained the 122 linearly inde-pendent vectors occurred from June 19th 2003 to March 21st 2010.Analogously, the test set included the 45 vectors generated fromMarch 24th 2010 to 4th January 2011.

Table 7 shows the results in terms of the quality parametersdescribed in Section 5.1 as well as theresults obtained by themeth-ods the ANN is compared to. Note that both KNN and SVM wereunable to distinguish between earthquake and no earthquake, inother words, they classified all the samples as absence of earth-quake (obviously, the specificity is then 100%). By contrast, theK-means classified all the samples in the test set as earthquake(obviously, the sensitivity is then 100%). The dash means that thevaluecouldnotbecalculatedasadivisionbyzerowas encountered.

Table 7

ANNs performance for Talca.

Parameters ANN KNN SVM K-means

TP 3 0 0 8TN 23 37 37 0FP 14 0 0 37FN 5 8 8 0P1 17.6% 17.8%P0 82.1% 82.2% 82.2% Sn 37.5% 0.0% 0.0% 100%Sp 62.2% 100% 100% 0.0%Average 49.86%


10/15


Table 8

ANN performance for Pichilemu.


TP 13 20 0 16TN 91 38 93 84FP 2 55 0 9FN 16 9 29 13P1 86.7% 26.7% 64.0%P0 85.0% 80.9% 76.2% 86.6%

Sn 52.0% 69.0% 0.0% 55.2%Sp 97.8% 40.9% 100% 90.3%Average 78.6% 54.3% 74.0%

P1 =17.6% might be understood as an unsatisfactory result.However, it is important to highlight that in the seismology fieldlogarithmic scales are usually applied in earthquake occurrenceprobabilities [56].

6.3. Earthquake prediction in Pichilemu

For this cell, the training set contained the linearly independentvectors occurred from August 10th 2005 to March 31st 2010. Anal-ogously, the test set included the vectors generated from April 1st

2010 to October 8th 2011.Both sets contain 122 vectors. However, every new earthquakedeeply affects the soil geometry in this area and, therefore, to pre-dict the earthquake 123 the ANN had to be retrained. In particular,the 122 first vectors of the initial test set were used as a new train-ing set for earthquake 123 and successive earthquakes. In practicalterms the prediction of a maximum of 50 earthquakes is advisedfor further research work. Anyway, this paper is focused on study-ing the extreme situation, where the test set was composed of 122vectors.

Table 8 shows the results in terms of the quality parametersdescribedin Section 5.1, as well as theresults obtained by theothermachinelearningclassifiers.AsoccurredinTalca,theSVMclassifiedall samples as no earthquake. KNN obtained poorer results thanthose of the ANN. As for the results ofK-means, they were slightlyworse than those of the ANN on average, however the number ofFP was 9, more than four times greater than in then ANN.

6.4. Earthquake prediction in Santiago

For this cell, the training set contained the 122 linearly inde-pendent vectors occurred from May 13th 2003 to June 2nd 2004.Analogously, the test set included the 122 vectors generated from

June 23rd 2004 to 16th January 2006.Table 9 shows the results in terms of the quality parameters

described in Section 5.1. The results of the application of the otherclassifiers is also summarized in this table. Again, the SVM wasunable to distinguish between earthquake and no earthquake, andin particular, this time classified all the samples as earthquake. Ontheotherhand,theANN,KNNandK-means obtainedsimilar resultson average (KNN slightly worse and K-means slightly better).

Table 9

ANN performance for Santiago.


TP 5 6 0 7FN 101 93 108 95FP 7 15 14 13TN 9 8 0 7P1 41.7% 28.6% 0.0% 35.0%P0 91.8% 92.1% 100% 93.1%Sn 35.7% 42.9% 50.0%Sp 93.5% 86.1% 88.5% 88.0%Average 65.7% 62.4% 66.5%

Table 10

ANN performance for Valparaso.


TP 20 30 35 43TN 59 52 46 0FP 3 10 16 62FN 24 14 9 1P1 87.0% 75.0% 68.6% 41.0%P0 71.1% 78.8% 83.6% 0.0%

Sn 45.5% 68.2% 79.5% 97.7%Sp 95.2% 83.9% 74.2% 0.0%Average 74.7% 76.5% 76.5% 34.7%

However, to reach these values both methods reported twice moreFPs than the ANN did, which was one of the main objectives set atthe beginning: To have the smallest number of FPs possible.

6.5. Earthquake prediction in Valparaso

Forthis cell, thetrainingset containedthe 122 linearly indepen-dent vectors occurred from 31st January 2006 to December 19th2008. Analogously, the test set included the 106 vectors generatedfrom December 19th 2008 to February 10th 2011.

Table 10 shows the results in terms of the quality parametersdescribed in Section 5.1. The results derived from the applicationof other classifiers is also reported. This time wastheK-means whoshowed its inability to distinguish between both situations. In par-ticular it also classified one sample as no upcoming earthquakeand it did it erroneously. On the contrary, KNN and SVM exhibitedslightly better behavior on average. But again, when examining theFPs they obtained values much larger than those of the ANN.

6.6. Discussion of the results

The high values ofP0 and P1 obtained for all the zones indi-cate that the input variables were, indeed, strongly correlated withthe observed magnitude in a near future. The ANNs were capable

of indirectly learning Omori/Utsu and GutenbergRichters laws,confirming thus the great ability these techniques have in the seis-mology field [69]. This fact confirms that the choice of such inputvectors was adequate.

With reference to the specificity, all the zones obtained valuesespecially high (87.2% on average). This fact is of the utmost signif-icance, as it is extremely important not to activate false alarms inseismology due to the social impact they may cause. The sensitiv-ityachieved is, in general, not so high (40.9% on average). However,the scientific community accepts the fact of missing one-level hits[9] and such a sensitivity can be considered successful consideringthe application domain and the rest of the results.

From allthe networks constructed,the oneapplied to Pichilemuwas the easiest to construct and, also, the one with better perfor-

mance for both training and test sets. This fact can be understoodas that forPichilemu thecausality between oneearthquake andthenext one is greater than in the remaining areas. Following with ageological interpretation, it is probable that the subsoil configura-tion is such that the occurrence of seisms cracks the soil causingnew seisms in the near surroundings. This could be the reason forthe area of Pichilemu to be more predictable than the other threeones.

Regarding the comparison made to other methods, several con-clusions can be drawn. First of all, none of them obtained betterresults in more than one area and when this fact happened, theresults were just slightly better on average and triggered muchmore FPs. Secondly, there were many situations in which the algo-rithms classified all the samples into a same class. Finally, the

results obtained could be considered satisfactory in other contexts


11/15


as some of them provides consistent results. This means that theinput parameters have been properly chosen, providing thus onesatisfactory solution for one of the most critical problems in earth-quake prediction.

On theotherhand,systems with memorywhosepreviousstatesinfluence next ones (such as twisters, stock market or atmosphericpollution)presentafractalstructureduetotheinherentfeedbackinthe variables causing these phenomena. In this sense, Omori/Utsuand Baths laws reveal that aftershocks cause more aftershocks.That is,the land is characterized by feedbackloopsbetween seisms,geometrical alterations andcracks that affectits state.It canbe con-cluded, then, that time series of earthquakes must present a fractalstructure [46].

Therefore the authors wondered if, from a single chronologi-cally ordered series of quakes, it could be mathematically inducedthat the earthquakes in Pichilemu were more predictable than inother areas. It is well-known that the fractal dimension of a timeseries expresses the degree of causal interdependencebetween onedatum and another[41]. Moreover, the inverse fractal dimension ofa time series, the so-called Hurst exponent (H), relates autocorre-lations and lag pairs in time series. The application ofHto the fourareas revealed that Pichilemu has a value notably higher (Talca:H= 0.0037, Pichilemu: H= 0.1580, Santiago:H= 0.0729, Valparaso:H= 0.0538), which supports the initial assumption and encouragesfurther research in this way.

7. Additional remarks

This section is devoted to discuss the robustness of the resultsobtained, providing a discussion about possible earthquake clus-tering and analyzing different time intervals.

7.1. About the influence of earthquake clustering

One may think that the high performance of the designedANNsis dueto earthquake clustering influence. Indeed, Kagan[33]

stated: Any prospective earthquakeprediction technique needstodemonstrate that its success is not due to the influence of earth-quake clustering (foreshockmainshockaftershock sequences).On the other hand, earthquakes are inherently presented in clus-ters. Any model that ignores this fact could be, therefore, pointedas incomplete. To this regard, Zechar and Jordan[97] claimed that:Earthquakes cluster in space and time, and therefore, any forecastthat captures this clusteringbehavior should outperform a uniformreference model.

In this section, the hypotheses obtained by comparing theresults described in Section 6 with those obtainable by means ofthe null hypothesis is discussed. To complete this task a two-stepstrategy has been used:

1. First, the Poisson probability for each hit supposing pure chanceis calculated.2. Second, the probability of obtaining a result better than that of

summarized in Tables 710 is calculated, using the probabilityobtained in the first step.

Even if there is evidence that thePoisson distributionis notidealfor describing earthquakenumber variation[35,76], such referencemodel has been usedas done byZecharand Zhuang[98], duetoitsuniform applicability to all alarms and because of its simplicity-namely, it includes few assumptions and it is easily interpreted.

Actually, the authors listed 29 RTP alarms in this paper andthey accepted as one-level hits statements with a maximum Pois-son probability of 29.12%. As shown in Table 11, the probabilities

reported by the constructed ANNs are below this threshold. The

Table 11

Influence of earthquake clustering in theproposed models.

Parameter Talca Pichilemu Santiago Valparaso

h 3 13 5 20N 8 29 14 44PA 0.049 0.080 0.233 0.100P 5.47103 1.09107 0.210 1.61109

following summarizes the Poisson probability calculations usingonly the information prior to the test set, and taking into consider-ation that the horizon of prediction was set to five days:

1. Talca. 25 one-level events (or target earthquakes), in a period of2467 days. Therefore, the Poisson probability within five days is4.9%.

2. Pichilemu.28one-levelevents(ortargetearthquakes),inaperiodof 1694 days. Therefore, the Poisson probability within five daysis 8.0%.

3. Santiago.19 one-level events (or target earthquakes),in a periodof 386 days. Therefore, the Poisson probability within five daysis 23.3%.

4. Valparaso. 22 one-level events (or target earthquakes), in a

period of 1053 days. Therefore, the Poisson probability withinfive days is 10.0%.

Now,theprobabilityofobtainingresultsbetterthanthehitratespresented in Tables 710 is calculated, by means of the techniqueintroduced in [97]. This probability is calculated as follows:

P=

Nn=h

B(n|PA) (31)

whereh isthenumberofhits,Nisthenumberoftargetearthquakes,PA is the true probability of occurrence of an earthquake exceed-ing the threshold in the area of study, and B(n|PA) is the binomialdistribution, calculated as:

B(n|PA) = PUA (1 PA)

Nu (32)

Given this equation, Table 11 summarizes the parametersobtained for every area.

Although the probability in Santiago is 21.0%, the probabilityassociated with other areas is very low (p1). Therefore, in allcasesthehypothesismostlikelytobecorrectisnotthenullhypoth-esis, but the one that states that the rates achieved by the ANNsare superior to pure chance. Note that even if the null hypothesisis not rejected with a small probability (typically 5% in sociologicalresearch [18] or 1% in biomedical research [72]), this does notmeanit is true, as thoroughly discussed by Dimer de Oliveira [14].

7.2. About the use of different time intervals

Again, one may think that the results shown are the best onesfound from a large dataset. This section is therefore to prove therobustness of the ANNs built over time, discussing the resultsobtained for different time intervals.

As discussed in Section 6.1, the generalization ability for anyANN decreases over time, fact reflected in the existence of the Qfactor see Eq. (28). Consequently, the synaptic weights have tobe periodically recalculated so that the ANN is always updated,since ANNs can only be applied to predict the same number ofitems forming thetrainingset, considering this numberas itsupperbound. In other words, the test set can contain maximum thesame number of elements existing in the training set as shown,

for instance, in [79].


12/15


A B C16065

70

75

80

85

90Zero level hits in Talca

Scenarios

(%)

P(train.)

0

P(test)

0

mean(P0)

A B C110

20

30

40

50

60

70

Scenarios

(%)

One level hits in Talca

P(train.)

1

P(test)

1

mean(P1)

Fig. 5. Variations ofP0 and P1 in Talca, fordifferent scenarios.

A B C14550

55

60

65

70

75

80

85

90

95

100

Zero level hits in Pichilemu

Scenarios

(%)

P(train.)

0

P(test)0

mean(P0)

A B C11020

30

40

50

60

70

80

90

Scenarios

(%)

One level hits in Pichilemu

P(train.)

1

P(test)1

mean(P1)

Fig. 6. Variations ofP0 and P1 in Pichilemu, for different scenarios.

C2 A B6065

70

75

80

85

90

95

Scenarios

(%)

Zero level hits in Santiago

P(train.)

0

P(test)

0

mean(P0)

C2 A B2030

40

50

60

70

80

One level hits in Santiago

Scenarios

(%)

P(train.)

1

P(test)

1

mean(P1)

Fig. 7. Variations ofP0 and P1 in Santiago, for different scenarios.


13/15


C2 A B3040

50

60

70

80

90

Scenarios

(%)

Zero level hits in Valparaso

P(train.)

0

P(test)

0

mean(P0)

C2 A B4550

55

60

65

70

75

80

85

90

95

One level hits in Valparaso

Scenarios

(%)

P(train.)

1

P(test)

1

mean(P1)

Fig. 8. Variations ofP0 and P1 in Valparaso, for different scenarios.

On the other hand, a work based on the application of ANNs to

predict atmospheric pollution [65] used different synaptic weightsfor a same architecture to predict different time intervals (seeTables 15 of this paper for further information).

The authors want to highlight that this standard weights updat-ing are notrelated with the possibility of being tuning the model inretrospective experiments to fit the data, as training and test setsare not overlapped in any element (the intersection of both sets is).

To show the robustness of the architecture designed, the ANNsare applied to three different time intervals. To make such a com-parison, the original dataset was updated and extended so that theANNs could be applied to new events: The new dataset time inter-valis from 1stJanuary2001to March 14th 2012. Thearea subjectedto analysis with less earthquakes is Talca (266), which led to the

definition of three new scenarios:

1. Scenario A. The original scenario described in Section 6.2. Scenario B. The new training set was slid 15 earthquakes from

the original one forward.3. Scenario C1. The new training set was slid 30 earthquakes from

the original one forward.4. Scenario C2. The new training set was slid 15 earthquakes from

the original one backward.

The use of the scenarios C1 or C2 depends on the number ofearthquakes found when the dataset was extended. As the datasetcould not be slid 30 earthquakes forward in Santiago and Val-paraso, the scenario C

2was chosen for these two areas and,

contrarily, scenario C1 was selected for both Talca and Pichilemu.Obviously, the ANNs architectures did not change and the

synaptic weights were calculated using only the information con-tained in the training sets (no information from the test sets wereused to adjust the model), using the same restrictions described inSection 5.

Figs. 58 show the performance of the ANNs for the three sce-narios described above. The parameters depicted on them mean:

1. P(train.)0 is the P0 value obtained in the training set.

2. P(test)0 is the P0 value obtained in the test set.

3. mean(P0) = (P(train.)0 + P

(test)0 )/2

4. P(train.)

1

is the P1 value obtained in the training set.

Table 12

P0 values fordifferent periods of time.

Scenario Talca Pichilemu Santiago Valparaso Mean (%)

A 82.1% 85.0% 91.8% 71.1% 82.5B 89.7% 90.9% 93.6% 51.1% 81.3C1 87.8% 98.7% 93.3C2 90.3% 66.0% 78.2

Table 13

P1 values fordifferent periods of time.

Scenario Talca Pichilemu Santiago Valparaso Mean (%)

A 17.6% 86.7% 41.7% 87.0% 58.3B 16.7% 83.8% 41.7% 91.7% 58.5C1 25.0% 12.8% 18.9C2 55.6% 66.7% 61.2

5. P(test)1 is the P1 value obtained in the test set.

6. mean(P1) = (P(train.)1 + P

(test)1 )/2

From all these figures, it can be concluded that the ANNs pre-sented robust and consistent results regardless of the time intervalused to assess their accuracy. Nevertheless, Tables 12 and 13 arealso provided to quantify these values for all the three studiedscenarios.

As it can be observed, P0 values remain almost constant inall scenarios, even exhibiting some degree of improvement onaverage when the training and test sets were slid forward. A sim-

ilar situation occurs with P1 values, except for the scenario C1 inPichilemu. However, this fact is partly due to the inclusion of theaftershocks occurred after the 8.8Mw earthquake in February 27th2010. This involves that the threshold set during the training stagewas much bigger than earthquakes reported in its associated testset Omori/Utsu law, Eq.(8). It is also possible that the uncertaintyinthetrainingishigh,whichmaynegativelyinfluence the results asstated by Wang [90]: the uncertainties decrease with time, exceptthat they increase temporarily after large earthquakes.

8. Conclusions

Three-layer feedforward artificial neural networks have beenconstructed to predict earthquakes in four seismogenic areas in

Chile. These networks included as input parameters information


14/15
http://ssn.dgf.uchile.cl/seismo.htmlhttp://ssn.dgf.uchile.cl/seismo.htmlhttp://www.geofisica.cl/


15/15


[56] US Deparment of Interior. US Geological Survey Website. http://earthquake.usgs.gov, 2011.

[57] University of Santiago. National Centre of Environment. http://www.geofisica.cl/macam/cma usach.htm

[58] University of Santiago de Chile. Web service: Online PM10 forecasting.http://fisica.usach.cl/pronostico/index.htm

[59] WEKA The University of Waikatu. Data mining with open source machinelearning software in Java. http://www.cs.waikato.ac.nz/ml/weka/

[60] E.A. Okal, B.A. Romanovicz, On thevariation ofb-value with earthquake size,Physics of theEarth andPlanetary Interiors 87 (1994) 5576.

[61] F. Omori,Macroseismic measurements in Tokyo,II and III, EarthquakeInves-

tigation Communications 11 (1902) 195.[62] International Commission on Earthquake Forecasting for Civil Protection,Operational earthquake forecasting. State of knowledge and guidelines forutilization,Annals of Geophysics 54 (4)(2011)2127.

[63] A. Panakkat,H. Adeli,Neural network modelsfor earthquakemagnitude pre-diction using multiple seismicity indicators, International Journal of NeuralSystems 17 (1) (2007) 1333.

[64] A. Panakkat, H. Adeli, Recurrentneural network forapproximateearthquaketime and location predictionusing multiple sesimicity indicators,Computer-Aided Civil and Infrastructure Engineering 24 (2009) 280292.

[65] P. Prez, J. Reyes, An integrated neural network model for PM10 forecasting,Atmospheric Environment 40 (2006) 28452851.

[66] M.D. Petersen, T. Cao, K.W. Campbell, A.D. Frankel, Time-independent andtime-dependent seismic hazard assessment for the State of California:Uniform California Earthquake Rupture Forecast Model 1.0, SeismologicalResearch Letters 78 (1)(2007)99109.

[67] G. Ranalli, A statistical study of aftershock sequences, Annali di Geofisica 22(1969) 359397.

[68] P.A.Reasenberg,L.M. Jones, Earthquakehazardaftera mainshockin California,Science 243 (4895) (1989) 11731176.

[69] J. Reyes, V. Crdenas, A Chilean seismic regionalization through a Kohonenneural network, Neural Computing and Applications 19 (2010) 10811087.

[70] D.A. Rhoades, Application of the EEPAS model to forecasting earthquakes ofmoderate magnitude in Southern California, Seismological Research Letters78 (1) (2007) 110115.

[71] C.F. Richter, An instrumental magnitude scale, Bulletin of the SeismologicalSociety of America 25 (1935) 132.

[72] R. Romero-Zliz, C. Rubio-Escudero, J.P. Cobb, F. Herrera, O. Cordn, I. Zwir,A multiobjective evolutionary conceptual clustering methodology for geneannotation within structural databases: a case of study on the gene ontol-ogy database, IEEE Transactions on Evolutionary Computation122 (6) (2008)679701.

[73] D.E. Rumelhart, G.E. Hinton, R.J. Willians, Learning Internal Representationsby Error Propagation,MIT Press, Massachussets, 1986.

[74] P.R. Sammonds, P.G. Meredith, I.G. Main, Role of pore fluid in the generationofseismic precursors to shear fracture, Nature 359 (1992) 228230.

[75] D. Schorlemmer, S. Wiemer, M. Wyss, Earthquake statistics at Parkfield: 1.

Stationarityof b values, Journal of Geophysical Research 109 (2004) B12307.[76] D. Schorlemmer,J.D. Zechar, M.J. Werner, E.H. Field,D.D. Jackson, T.H. Jordan,First results of theRegional Earthquake Likelihood Models experiment, Pureand Applied Geophysics 167 (2010) 89.

[77] D.G. Schorlemmer, S. Wiemer, Microseismicity data forecast rupture area,Nature 434 (7037) (2005) 1086.

[78] D.G. Schorlemmer, S. Wiemer, M. Wyss, Variations in earthquake-size dis-tribution across different stress regimes, Nature 437 (7058) (2005) 539542.

[79] F.M. Shah, M.K. Hasan, M.M. Hoque, S. Ahmmed, Architecture and weightoptimization of ANN using sensitive analysis and adaptive particle swarm

optimization, International Journal of Computer Science and Network Secu-rity 10 (8) (2010) 103111.

[80] Z.Z. Shen, D.D. Jackson, Y.Y. Kagan, Implications of geodetic strain rate forfuture earthquakes, with a five-year forecast of M5 Earthquakes in SouthernCalifornia, Seismological Research Letters 78 (1) (2007) 116120.

[ 81] Y. Shi, B.A. Bolt, The standard error of the magni tudefrequency bvalue, Bulletin of the Seismological Society of America 72 (5) (1982)16771687.

[82] S. Srilakshmi, R.K. Tiwari, Model dissection from earthquake time series: acomparative analysis using nonlinear forecasting and artificial neural net-work approach, Computers and Geosciences 35 (2009) 191204.

[83] K.F. Tiampo, R. Shcherbakov, Seismicity-based earthquake forecasting tech-niques: tenyears of progress, Tectonophysics 522523 (2012) 89121.[ 84] C .I. Trifu , V.I. Shumila , A method for multidimensional anal ysi s of

earthquake frequencymagni tude distribution with and applica-ti on to the Vrancea region of Romania , Tectonophysics 261 (1997)922.

[85] T. Utsu,A statisticalstudy on theoccurrence of aftershocks, GeophysicalMag-azine 30 (1961) 521605.

[86] T. Utsu, A method for determining the value ofb in a formula logn= abMshowingthe magnitudefrequencyrelationfor earthquakes, Geophysical bul-letin of Hokkaido University13 (1965) 99103.

[87] T. Utsu, Representation and analysis of the earthquake size distribution: ahistorical review and some new approaches, Pure and Applied Geophysics155 (1999) 509535.

[88] T. Utsu, Y. Ogata, R.S. Matsuura, The centenary of the Omori formula for adecay law of aftershock activity, Journal of Physics of the Earth 43 (1995)133.

[89] P. Varotsos,K. Alexopoulos, K.Nomicos,Seismicelectriccurrents, Proceedingsofthe Academy of Athens 56 (1981) 277286.

[90] Q. Wang, D.D. Jackson, Y.Y. Kagan,California Earthquakes,18002007: a uni-fied catalog with moment magnitudes, uncertainties, and focal mechanisms,Seismological Research Letters 80 (3) (2009) 446457.

[91] P.L. Ward, R.A. Page, The Loma Prieta Earthquake of October 17, 1989. Whathappened what is expected what can be done. Technical report. UnitedStates Government Printing Office; 1990.

[92] S.N. Ward,Methods forevaluatingearthquakepotential andlikelihoodin andaround California, Seismological Research Letters 78 (1) (2007) 121133.

[93] S. Wiemer, J. Benoit, Mapping the b value anomaly at 100km depth in theAlaska and New Zealand Subduction Zones, Geophysical Research Letters 23(1996) 15571660.

[94] S. Wiemer, M. Gerstenberger, E. Hauksson, Properties of the aftershocksequence of the1999Mw 7.1 Hector Mine Earthquake: implications for after-shock hazard, Bulletin of the Seismological Society of America 92 (4) (2002)12271240.

[95] S. Wiemer, D. Schorlemmer, ALM: an asperity-based likelihood model forCalifornia, Seismological Research Letters 78 (1) (2007) 134143.

[96] I.H.Witten,E. Frank, Data Mining. Practical Machine Learning Toolsand Tech-

niques, Elsevier, Amsterdam, 2005.[97] J.D. Zechar, T.H. Jordan, Testing alarm-based earthquake predictions, Geo-physical Journal International 172 (2008) 715724.

[98] J.D. Zechar, J. Zhuang, Risk and return: evaluating reverse tracing of precur-sorsearthquake predictions, Geophysical Journal International182 (3)(2010)13191326.

[99] S.D. Zhang, The 1999 XiuyenHaichung, Liaoning, M5.4 Earthquake, BeijingSeismological Press, Beijing, 2004.

[100] A.Zollo, W. Marzocchi, P. Capuano, A. Lomaz, G. Iannaccone, Space and timebehaviourof seismic activity at Mt. Vesuvius volcano, southern Italy, Bulletinofthe Seismological Society of America 92 (2) (2002) 625640.
http://earthquake.usgs.gov/http://earthquake.usgs.gov/http://www.geofisica.cl/macam/cma_usach.htmhttp://www.geofisica.cl/macam/cma_usach.htmhttp://fisica.usach.cl/pronostico/index.htmhttp://www.cs.waikato.ac.nz/ml/weka/http://www.cs.waikato.ac.nz/ml/weka/http://fisica.usach.cl/pronostico/index.htmhttp://www.geofisica.cl/macam/cma_usach.htmhttp://www.geofisica.cl/macam/cma_usach.htmhttp://earthquake.usgs.gov/http://earthquake.usgs.gov/

Documents

02 Asoc Published