Data Mining Using Eigenpattern Analysis in Simulations and Observed Data Woodblock Print, from “Thirty-Six Views of Mt. Fuji”, by K. Hokusai, ca. 1830 John B. Rundle Department of Physics and Colorado Center for Chaos & Complexity University of Colorado, Boulder, CO Presented at the GEM/ACES Workshop Maui, HI July 30, 2001



Page 1

Data Mining Using Eigenpattern Analysis in Simulations and Observed Data

Woodblock Print, from “Thirty-Six Views of Mt. Fuji”, by K. Hokusai, ca. 1830

John B. Rundle

Department of Physics and Colorado Center for Chaos & Complexity

University of Colorado, Boulder, CO

Presented at the GEM/ACES Workshop, Maui, HI

July 30, 2001

Page 2

Activity Correlation Operators

Let $y(x_i,t)$ be the number of earthquakes per unit time at location $x_i$ and time $t$.

Now center each time series (subtract its mean and divide by its standard deviation):

$$y(x_i,t) \rightarrow z(x_i,t)$$

where $z(x_i,t)$ is the centered time series.
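The centering step can be sketched in a few lines of NumPy. This is only an illustration: the array layout (one row per site, one column per time sample), the function name `center_time_series`, and the choice to leave constant series at zero are assumptions, not details given in the slides.

```python
import numpy as np

def center_time_series(y):
    """Give each site's series zero mean and unit standard deviation.

    y has shape (N, Q): N sites (rows) by Q time samples (columns).
    """
    y = np.asarray(y, dtype=float)
    mean = y.mean(axis=1, keepdims=True)
    std = y.std(axis=1, keepdims=True)
    # Assumption of this sketch: a site whose series never varies would
    # divide by zero, so it is left as an all-zero series instead.
    safe_std = np.where(std > 0, std, 1.0)
    return (y - mean) / safe_std

# Two hypothetical sites: one with varying activity, one constant.
y_demo = np.array([[0, 1, 0, 3],
                   [2, 2, 2, 2]])
z_demo = center_time_series(y_demo)
```

After this step every non-constant row of `z_demo` has zero mean and unit variance, so products of rows behave like correlation coefficients.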

Define two correlation operators, a static correlation operator $C(x_i,x_j)$ and a rate correlation operator $K(x_i,x_j)$:

$$C(x_i,x_j) = \int z(x_i,t)\, z(x_j,t)\, dt \qquad \text{(Static)}$$

$$K(x_i,x_j) = (2\pi)^{2} \int \frac{\partial z(x_i,t)}{\partial t}\, \frac{\partial z(x_j,t)}{\partial t}\, dt \qquad \text{(Rate)}$$
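As a rough numerical illustration of the two operators, the time integrals can be approximated by sums over sampled data. The sketch below assumes a uniform sampling interval `dt`, uses a forward finite difference for the time derivative, and omits the constant prefactor of K, since a constant only rescales the eigenvalues. The function name and array layout are hypothetical.

```python
import numpy as np

def correlation_operators(z, dt=1.0):
    """Approximate the static (C) and rate (K) operators from centered series.

    z has shape (N, Q): N sites by Q uniformly spaced time samples.
    """
    z = np.asarray(z, dtype=float)
    # Static operator: the integral of z_i * z_j over time as a Riemann sum.
    C = (z @ z.T) * dt
    # Rate operator: the same sum applied to a finite-difference estimate
    # of dz/dt (constant prefactor omitted in this sketch).
    dz_dt = np.diff(z, axis=1) / dt
    K = (dz_dt @ dz_dt.T) * dt
    return C, K

# Synthetic centered series standing in for real activity data.
rng = np.random.default_rng(0)
z_demo = rng.standard_normal((4, 200))
C_demo, K_demo = correlation_operators(z_demo)
```

Both results are N x N and symmetric by construction, which is what makes the diagonalization step possible.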

Page 3

Diagonalize the Correlation Operators

$C(x_i,x_j)$ and $K(x_i,x_j)$ are both symmetric, square, and positive definite matrix operators. We can therefore apply singular value decomposition to find the eigenvectors and eigenvalues:

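$$C(x_i,x_j) = \Phi\, \Lambda^{2}\, \Phi^{T}$$

$$K(x_i,x_j) = \Psi\, \Omega^{2}\, \Psi^{T}$$

where $T$ denotes the transpose.

$\Phi$ is a matrix of static eigenpatterns $\phi_n(x_i)$

$\Lambda^{2}$ is a diagonal matrix of eigenprobabilities $\lambda_i^{2}$

$\Psi$ is a matrix of rate eigenpatterns $\psi_n(x_i)$

$\Omega^{2}$ is a diagonal matrix of eigenfrequencies $\omega_i^{2}$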
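A minimal sketch of the diagonalization follows. It uses `numpy.linalg.eigh` rather than a general SVD routine; for a symmetric matrix with non-negative eigenvalues the two give the same patterns, and `eigh` exploits the symmetry. The function name `eigenpatterns` and the synthetic input are assumptions made for illustration.

```python
import numpy as np

def eigenpatterns(op):
    """Eigen-decompose a symmetric operator.

    Returns (patterns, eigenvalues) with the largest eigenvalue first.
    Column n of `patterns` is the n-th eigenpattern over the sites.
    """
    op = np.asarray(op, dtype=float)
    # eigh assumes a symmetric matrix and returns eigenvalues in ascending order.
    vals, vecs = np.linalg.eigh(op)
    order = np.argsort(vals)[::-1]
    return vecs[:, order], vals[order]

# Synthetic static operator built from random centered series.
rng = np.random.default_rng(1)
z_demo = rng.standard_normal((5, 300))
C_demo = z_demo @ z_demo.T

patterns, eigvals = eigenpatterns(C_demo)
# Rebuild the operator from its patterns and eigenvalues as a consistency check.
reconstructed = patterns @ np.diag(eigvals) @ patterns.T
```

The returned eigenvalues play the role of the squared diagonal entries in the decomposition, and the leading columns of `patterns` are the modes that would be mapped, as in the "Hazard Mode" and "Landers Mode" figures.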

Page 4

Comparison of Eigenpatterns 1,2 for 0 (Top) with Eigenpatterns 1,2 for = 0 (Bottom)

Positively correlated: (red - red) & (blue - blue). Negatively correlated: (red - blue). Uncorrelated: (red - green) & (blue - green).

JBR et al, Phys. Rev. E., v 61, 2000, & AGU Monograph “GeoComplexity & the Physics of Earthquakes”

Page 5

Patterns of Earthquakes in Southern California

Earthquakes in southern California have been systematically recorded since 1932. The rate at which these events occur can be used to define activity time series in 10 km x 10 km spatial boxes, which in turn can be used to find the spatial patterns.

Above is a map of the relative intensity of seismic activity in southern California, 1932-1999. This can be considered to be a seismic “hazard map”.

Below is a map of the first PCA mode, which we call the “Hazard Mode”. Red areas tend to be active or inactive at the same time.

Above is a map of the second PCA mode, which we call the “Landers Mode”. Red areas tend to be inactive when blue areas are active & vice versa. All sites in a blue or red area tend to be active (or inactive) at the same time.

Figures courtesy KF Tiampo

Page 6

Comparison of log likelihoods for PDPC maps computed from 500 random catalogs of seismic activity in Southern California, each scored against the occurrence of future events (M > 5), with the log likelihoods of the hazard map and of the PDPC computed from the actual catalog.

Actual catalog: PDPC for 1978-Dec 31, 1991

Histogram: Log Likelihoods for 500 random catalogs. RSV: Use hazard map as predictor. Actual PDPC: Plot at left

Example of a PDPC arising from a catalog that has been randomized in space and time.
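The comparison described on this slide can be illustrated with a simplified stand-in. The sketch below scores a probability map by the summed log probability of the boxes where later events occurred, then builds a null distribution by shuffling the map over boxes. Shuffling the map is an assumption that replaces recomputing a PDPC map from each randomized catalog, which this sketch does not attempt; the function names, the probability floor, and the toy numbers are likewise hypothetical.

```python
import numpy as np

def log_likelihood(prob_map, event_boxes, floor=1e-12):
    """Sum of log probabilities assigned to the boxes that hosted events."""
    p = np.asarray(prob_map, dtype=float)
    p = p / p.sum()  # normalize so the map is a probability distribution
    # The floor avoids log(0) for boxes given zero probability.
    return float(np.sum(np.log(np.maximum(p[event_boxes], floor))))

def shuffled_map_scores(prob_map, event_boxes, n_random=500, seed=0):
    """Null distribution: score spatially shuffled copies of the map."""
    rng = np.random.default_rng(seed)
    p = np.asarray(prob_map, dtype=float)
    return np.array([log_likelihood(rng.permutation(p), event_boxes)
                     for _ in range(n_random)])

# Toy example: six boxes, with later events falling in boxes 2 and 3.
prob_map_demo = [0.05, 0.05, 0.4, 0.4, 0.05, 0.05]
event_boxes_demo = [2, 3, 2]
actual_score = log_likelihood(prob_map_demo, event_boxes_demo)
null_scores = shuffled_map_scores(prob_map_demo, event_boxes_demo)
# Fraction of shuffled maps that score at least as well as the actual map.
fraction_as_good = float(np.mean(null_scores >= actual_score - 1e-9))
```

A map with forecast skill should sit in the upper tail of the null histogram, which is the comparison the histogram on this slide presents.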

Page 7

Using this new technique, one can compute the Phase Dynamical Probability Change (PDPC) anomalies that develop during the years 1988-1999.

Our retrospective studies indicate that colored anomalies can be regarded as indicating high probability for current and future major earthquakes (M > 6) over the period ~ 1999-2009, and have considerable forecast skill.

Earthquake Forecasting via the Mathematics of Quantum Mechanics

Pattern techniques suggest a new approach to forecasting earthquakes. The idea is to view the patterns in the context of PHASE DYNAMICAL SYSTEMS, whose mathematics can be mapped into the mathematics of QUANTUM MECHANICS.

See JB Rundle et al. (2000); KF Tiampo et al. (2000)

In the PDPC method, intensity of seismic activity is mapped to a “wave function $\psi(x,t)$”.

Intensity of seismic activity, 1932-1999

Page 8

One way to test the forecast for events from 2000-2010 is to plot all events with M > 4.0 that have occurred since Jan 1, 2000, superposed upon the colored forecast anomalies. These events are the small circles at right.

Note that our method should really only forecast events with M > 6.0

Testing the Forecast

Page 9
Page 10

Space-Time Patterns in Complex Multi-Scale Earthquake Fault Systems

Since much of the dynamics is not accessible to direct observations, we must focus on learning about the system through analysis of the observable patterns

Space-time patterns in the system are mathematical expressions of the strong statistical correlations between various parts of the system

The system state vector characterizes the current state of the system -- it has an amplitude and a phase angle

Page 11

Mapping Earthquake Dynamics into the Mathematics of Quantum Mechanics

(or “Phase Dynamics”) (JB Rundle et al., Phys. Rev. E, v61, 2416, 2000)

This new technique can be regarded as a novel datamining method

Quantum Mechanical systems are strongly correlated systems (QM is a nonlocal theory)

The mathematics of QM describe systems with periodic and quasiperiodic observables, as well as hidden variables

Relative probabilities are well-defined quantities in QM

Normalized system state vectors are actually “WAVE FUNCTIONS” that describe earthquake probability amplitudes
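The idea of a normalized state vector acting as a probability amplitude can be illustrated numerically. The slides do not spell out the normalization, so the sketch below assumes a unit Euclidean norm and treats squared components as relative probabilities; the function name `to_wave_function` and the example values are hypothetical.

```python
import numpy as np

def to_wave_function(state):
    """Scale a real state vector to unit Euclidean norm."""
    s = np.asarray(state, dtype=float)
    norm = np.linalg.norm(s)
    if norm == 0:
        raise ValueError("state vector is all zeros and cannot be normalized")
    return s / norm

# Hypothetical three-site state vector; the sign carries the phase (0 or pi).
psi_demo = to_wave_function([3.0, -4.0, 0.0])
# Squared amplitudes act as relative probabilities and sum to one.
prob_demo = psi_demo ** 2
```

With this normalization the squared amplitudes sum to one, so they can be compared across sites as relative probabilities regardless of the overall activity level.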

Page 12

Using our technique, we can compute the PDPC anomalies that develop during the years 1988-1999.

Our retrospective studies indicate that these anomalies can be regarded as forecasts for major earthquakes (M > 6) over the period ~ 1999-2009

An Earthquake Forecast ?

Page 13

                                      

                                          

Earthquake Fault System Dynamics are Strongly Correlated in Space and Time and Lead to Patterns

Data from Last Tuesday

PDPC Forecast for ~ 1999-2010

Page 14

Summary & Future Directions

The methods described here can be used to understand many classes of driven threshold systems

Network dynamics are determined importantly by the network connectivity as well as the details of the nonlinear threshold process

Meanfield threshold systems appear to have locally ergodic behavior

Space-time patterns of observable failures (earthquakes) can be used to understand many facets of the underlying, unobservable dynamics (physical state variables)

Page 15
Page 16

Boolean Correlation Operators and Space-Time Patterns

We can define a set of basis patterns of earthquake activity using Boolean correlation operators. To do so, we need to define a Boolean activity time series: y(xi,t)

As a first step, we coarse grain the domain in space and time, i.e., we divide the region of interest up into N boxes (say, ~10 km on a side) and time into a series of Q short intervals (say, 8 hours).

If an earthquake occurs in the spatial box centered at xi during the time interval at t, we assign the value

y(xi,t) = 1 ;

y(xi,t) = 0 otherwise.

We therefore have a set of N time series, all Q elements long:

y(xi,t) = 0,0,0,1,0,0,0,0,0,0,1,0,0,0,0… etc.
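The coarse-graining step can be sketched as a simple binning routine. The function name, the flat coordinates in kilometres, the row-major box numbering, and the decision to ignore events outside the grid are all assumptions of this sketch; only the ~10 km box size and 8-hour interval come from the slide.

```python
import numpy as np

def boolean_activity(x_km, y_km, t_hours, nx, ny, n_intervals,
                     box_km=10.0, interval_hours=8.0):
    """Build N = nx * ny Boolean time series, each n_intervals long.

    Entry [box, interval] is 1 if at least one event fell in that
    space-time cell and 0 otherwise.
    """
    ix = np.floor(np.asarray(x_km, dtype=float) / box_km).astype(int)
    iy = np.floor(np.asarray(y_km, dtype=float) / box_km).astype(int)
    it = np.floor(np.asarray(t_hours, dtype=float) / interval_hours).astype(int)
    # Events outside the grid or the time window are ignored in this sketch.
    inside = ((ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
              & (it >= 0) & (it < n_intervals))
    activity = np.zeros((nx * ny, n_intervals), dtype=np.uint8)
    # Boxes are numbered row by row: box = iy * nx + ix.
    activity[iy[inside] * nx + ix[inside], it[inside]] = 1
    return activity

# Three hypothetical events on a 3 x 2 grid of 10 km boxes, four 8-hour intervals.
demo = boolean_activity(x_km=[5.0, 5.0, 25.0],
                        y_km=[5.0, 5.0, 15.0],
                        t_hours=[1.0, 30.0, 9.0],
                        nx=3, ny=2, n_intervals=4)
```

Here box 0 is active in the first and last intervals and box 5 in the second, giving N = 6 series that are each Q = 4 elements long.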

Page 17

Boolean Activity Eigenpatterns from Simulations

Here we show Static or Activity Eigenpatterns from the simulation…these constitute one possible basis set for all possible space-time patterns displayed by the system

Key to Correlation Patterns:

Red sites are positively correlated with red (and blue with blue)

Red sites are negatively correlated with blue

Red sites & Blue sites are uncorrelated with green

The Activity Eigenpatterns are RELATIVE PROBABILITY AMPLITUDES.

( JBR et al, Phys. Rev. E., v 61, 2000, & AGU Monograph “GeoComplexity & the Physics of Earthquakes” )

Page 18

Summary & Future Directions

Numerical simulations (“Third Leg of Science”) are now being used to understand many classes of driven threshold systems (systems with many scales of length and time)

Network dynamics of these complex systems are determined importantly by the network connectivity as well as the details of the nonlinear threshold process

Meanfield threshold systems have dynamics that demonstrate first and second order (phase) transitions.

Threshold systems are capable of universal computation such as that which occurs in the human brain

Space-time patterns of observable failures (earthquakes) can be used to understand many facets of the underlying, unobservable dynamics (physical state variables)