Karsten Steinhaeuser, University of Minnesota; joint work with Nitesh Chawla & Auroop Ganguly. 12th International Symposium on Spatial and Temporal Databases, Minneapolis, MN, August 24, 2011. Comparing Predictive Power in Climate Data: Clustering Matters


Karsten Steinhaeuser, University of Minnesota

joint work with Nitesh Chawla & Auroop Ganguly

12th International Symposium on Spatial and Temporal Databases

Minneapolis, MN

August 24, 2011

Comparing Predictive Power in Climate Data: Clustering Matters


Outline

• Motivation

• Networks Primer

• From Data to Networks

• Motivating Networks in Climate Science

• Descriptive Analysis and Predictive Modeling

• Empirical Evaluation & Comparison

• Conclusions


Mining Complex Data

• Complex spatio-temporal data pose unique challenges

• Tobler’s First Law of Geography: “Everything is related, but near things more than distant.”

– But are all near things equally related?

– Are there phenomena explained by interactions among distant things? (teleconnections)


Networks Primer

What is a Network?

• Oxford English Dictionary: network, n.: Any netlike or complex system or collection of interrelated things, as topographical features, lines of transportation, or telecommunications routes (esp. telephone lines).

• My working definition: Any set of items that are connected or related to each other. (“items” and “connections” can be concrete or abstract)


Networks Primer

Community Detection in Networks

• Identify groups of nodes that are relatively more tightly connected to each other than to other nodes in the network

• Computationally challenging problem for real-world networks


From Data to Networks

• Networks are pervasive in social science, technology, and nature

• Many datasets explicitly define network structure

• But networks can also represent other types of data, providing a framework for identifying relationships, patterns, etc.


Motivating Networks in Climate

Uncertainty derives from many known and often many more unknown sources


Motivating Networks in Climate

• Projections of climate rely on many factors

– Understanding of the physical processes

– Ability to implement this understanding in computational models

– Assumptions about the future

Source: IPCC SRES and AR-4


Motivating Networks in Climate

• Some processes are well understood and modeled, others are much less credible

• Comparison to observations shows varying skill


Motivating Networks in Climate

• Models cannot capture some features/processes

• Comparison to observations illustrates severe geographic variability, topographic bias


Motivating Networks in Climate

Research Question:

Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them to improve or refine our understanding?

Answer:

Stay Tuned…


Knowledge Discovery for Climate

[Figure: system diagram. Labels recovered from the slide: System Inputs; Observations; GCM outputs; ARM Data; Data Storage & Management; HPSS; Oracle DB; Basic OLAP Capabilities; High-Performance Computing; HPCC; Lens - Jaguar; Knowledge Discovery; Complex Networks; Data Mining; Visualization; ArcGIS; PowerWall; System Outputs; Novel Insights; Decision Support]


Historical Climate Data

• NCEP/NCAR Reanalysis (proxy for observation)

• Monthly for 60 years (1948-2007) on 5° × 5° grid

• Seven variables:

– Sea surface temperature (SST)

– Sea level pressure (SLP)

– Geopotential height (GH)

– Precipitable water (PW)

– Relative humidity (RH)

– Horizontal wind speed (HWS)

– Vertical wind speed (VWS)

[Figure: raw data is de-seasonalized to obtain anomaly series]
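The slides do not say how the de-seasonalizing step is carried out. A common choice, assumed here, is to subtract the long-term mean of each calendar month from every value in that month. The function name `deseasonalize` and the synthetic input are illustrative only, not taken from the talk.

```python
import numpy as np

def deseasonalize(monthly, months_per_year=12):
    """Remove the mean annual cycle from a monthly series.

    Assumption: the anomaly is the raw value minus the long-term mean of
    the same calendar month. The series must cover whole years.
    """
    x = np.asarray(monthly, dtype=float)
    if x.size % months_per_year != 0:
        raise ValueError("series must cover whole years")
    by_year = x.reshape(-1, months_per_year)   # rows = years, columns = calendar months
    climatology = by_year.mean(axis=0)         # mean value of each calendar month
    return (by_year - climatology).ravel()     # anomaly series, same length as input


# Synthetic example: 60 years of monthly values with a seasonal cycle and a slow trend.
t = np.arange(60 * 12)
raw = 10.0 * np.sin(2 * np.pi * t / 12) + 0.01 * t
anomaly = deseasonalize(raw)
```

After this step each calendar month has zero mean, so the seasonal cycle no longer dominates correlations between locations.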


Network Construction

• View global climate system as a collection of interacting oscillators [Tsonis & Roebber, 2004]

– Vertices represent locations in space

– Edges denote correlation in variability

• Link strength estimated by correlation, low-weight edges are pruned from the network

• Construct networks only for locations over the oceans

– Relatively better captured by models
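The construction described above can be sketched as follows. This is a minimal illustration under stated assumptions: link strength is taken as the absolute Pearson correlation between anomaly series, and the pruning threshold of 0.5 is arbitrary. Neither choice is given on the slides, and `correlation_network` is a hypothetical name.

```python
import numpy as np

def correlation_network(anomalies, threshold=0.5):
    """Build a weighted adjacency matrix from anomaly series.

    anomalies: array of shape (n_locations, n_months), one row per grid point.
    threshold: assumed cutoff; edges with weight below it are pruned.
    """
    corr = np.corrcoef(anomalies)         # pairwise Pearson correlation between rows
    weights = np.abs(corr)                # assumption: link strength is |r|
    np.fill_diagonal(weights, 0.0)        # no self-loops
    weights[weights < threshold] = 0.0    # prune low-weight edges
    return weights


# Synthetic example: locations 0 and 1 share a signal, location 2 is independent.
rng = np.random.default_rng(0)
base = rng.standard_normal(720)
series = np.vstack([
    base + 0.1 * rng.standard_normal(720),
    base + 0.1 * rng.standard_normal(720),
    rng.standard_normal(720),
])
adj = correlation_network(series, threshold=0.5)
```

In this toy case only the edge between locations 0 and 1 survives pruning.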


Geographic Properties

• Examine network structure in spatial context

– Link lengths computed as great-circle distance

– Compare autocorrelation / de-correlation lengths for different variables, interpret within the domain

[Figure: panels for Sea Level Pressure, Precipitable Water, and Vertical Wind Speed; annotations: Autocorrelation, Teleconnection]
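Link lengths as great-circle distances can be computed with the haversine formula. The sketch below assumes a spherical Earth with a mean radius of 6371 km; the constant and the function name are illustrative, not from the slides.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    a = min(1.0, a)  # guard against rounding slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Two equatorial grid points 5 degrees of longitude apart (one grid step).
step_km = great_circle_km(0.0, 0.0, 0.0, 5.0)
```

Short links of roughly this length reflect spatial autocorrelation, while links spanning thousands of kilometres are candidates for teleconnections.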


Clustering Climate Networks

• Apply community detection to partition networks

– Use Walktrap algorithm [Pons & Latapy, 2006]

– Efficient and works well for dense networks

• Visualize spatial pattern using GIS tools

• Cluster structure suggests relationships within the climate system

[Figure: panels for Sea Level Pressure and Precipitable Water]
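Walktrap groups vertices whose short random walks visit similar places. The sketch below computes only the vertex-to-vertex random-walk distance that the algorithm is built on, r_ij(t) = sqrt(sum_k (P^t_ik - P^t_jk)^2 / d(k)), where P is the transition matrix and d(k) the degree of vertex k. It does not perform the agglomerative merging that yields the final communities, and it omits refinements of the published algorithm. The walk length of 3 steps and the toy graph are assumptions for illustration.

```python
import numpy as np

def walktrap_distance(adj, steps=4):
    """Pairwise random-walk distance between vertices of a (weighted) graph.

    adj: symmetric adjacency matrix with no isolated vertices.
    Note: uses O(n^3) memory, so this is only suitable for small graphs.
    """
    a = np.asarray(adj, dtype=float)
    degree = a.sum(axis=1)                         # (weighted) degree of each vertex
    if np.any(degree == 0):
        raise ValueError("isolated vertices have no random-walk transitions")
    p = a / degree[:, None]                        # one-step transition matrix
    pt = np.linalg.matrix_power(p, steps)          # t-step transition probabilities
    scaled = pt / np.sqrt(degree)[None, :]         # scale column k by 1/sqrt(d(k))
    diff = scaled[:, None, :] - scaled[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


# Toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
dist = walktrap_distance(adj, steps=3)
```

Vertices in the same triangle end up closer to each other than to vertices in the other triangle, which is the signal the merging procedure exploits.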


Update!

Research Question:

Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them to improve or refine our understanding?

Revised Answer:

Yes… but that’s not all.


Descriptive → Predictive

• Network representation is able to capture interactions, reveal patterns in climate

– Validate existing assumptions / knowledge

– Suggest potentially new insights or hypotheses for climate science

• Want to extract the relationships between atmospheric dynamics over ocean and land

– i.e., “Learn” physical phenomena from the data


Predictive Modeling

• Use network clusters as candidate predictors

• Create response variables for target regions around the globe (illustrated below)

• Build regression model relating ocean clusters to land climate
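The three steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each cluster is summarized by the mean anomaly series of its grid points, that the model is ordinary least squares, and that the last quarter of the record is held out for testing. All names and the synthetic data are hypothetical.

```python
import numpy as np

def cluster_predictors(anomalies, labels):
    """Average the anomaly series of all grid points in each cluster.

    anomalies: (n_locations, n_months); labels: (n_locations,) cluster ids.
    Returns an array of shape (n_months, n_clusters).
    """
    labels = np.asarray(labels)
    return np.column_stack([anomalies[labels == c].mean(axis=0) for c in np.unique(labels)])

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]


# Synthetic data: two ocean clusters of three grid points each, 240 months.
rng = np.random.default_rng(1)
signal = rng.standard_normal((2, 240))
ocean = np.repeat(signal, 3, axis=0) + 0.1 * rng.standard_normal((6, 240))
labels = np.array([0, 0, 0, 1, 1, 1])
land = 2.0 * signal[0] - 1.0 * signal[1] + 0.1 * rng.standard_normal(240)  # response for one land region

X = cluster_predictors(ocean, labels)
train, test = slice(0, 180), slice(180, 240)    # assumed chronological split
coef = fit_linear(X[train], land[train])
rmse = np.sqrt(np.mean((predict(coef, X[test]) - land[test]) ** 2))
```

Scoring on held-out months rather than on the training period gives a fairer picture of predictive skill.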


Illustrative Example

• Predictive model for air temperature in Peru

– Long-term variability highly predictable due to well-documented relation to El Niño

• Small number of clusters have majority of skill

– Feature selection (blue line) improves predictions

[Figure legend: Raw Data, All Clusters, Feature Selection]
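The slides do not name the feature selection method. One simple option, assumed here purely for illustration, is greedy forward selection: repeatedly add the cluster predictor that most reduces error on held-out data, and stop when the gain becomes negligible. The function names, the `min_gain` parameter, and the synthetic data are hypothetical.

```python
import numpy as np

def holdout_rmse(X_tr, y_tr, X_va, y_va):
    """Fit least squares (with intercept) on the training part, score on the held-out part."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    A_va = np.column_stack([np.ones(len(X_va)), X_va])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return float(np.sqrt(np.mean((A_va @ coef - y_va) ** 2)))

def forward_select(X_tr, y_tr, X_va, y_va, min_gain=0.01):
    """Greedy forward selection of predictor columns by held-out RMSE."""
    selected, remaining = [], list(range(X_tr.shape[1]))
    best = float(np.sqrt(np.mean((y_va - y_tr.mean()) ** 2)))   # intercept-only baseline
    while remaining:
        # Score every candidate added to the current set; keep the best one.
        score, j = min(
            (holdout_rmse(X_tr[:, selected + [c]], y_tr, X_va[:, selected + [c]], y_va), c)
            for c in remaining
        )
        if best - score < min_gain:      # stop when the improvement is negligible
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best


# Synthetic data: 10 candidate predictors, only columns 0 and 3 carry signal.
rng = np.random.default_rng(2)
X = rng.standard_normal((300, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.5 * rng.standard_normal(300)
chosen, rmse = forward_select(X[:200], y[:200], X[200:], y[200:])
```

In this toy setting the two informative columns are picked first, mirroring the observation that a small number of clusters carry most of the skill.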


Update!

Research Question:

Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them to improve or refine our understanding?

Revised Answer:

Yes and Yes… but wait, there’s more.


Results on Train/Test


Predictive Skill


Update!

Research Question:

Can we characterize the credible variables, identify relationships to the relatively less credible variables, and leverage them to improve or refine our understanding?

Revised Answer:

Yes, Yes, and Yes.


Variations / Extensions

• Compare network approach to traditional clustering methods

– k-means, k-medoids, spectral, EM, etc.

• Compare different types of predictive models

– (linear) regression, regression trees, neural nets, support vector regression


Compare Clustering Methods


Compare Predictive Models


Refining Model Projections


Conclusions

• Networks capture behavior of the climate system

• Clusters (or “communities”) derived from these networks have useful predictive skill

– Statistically significantly better than predictors based on clusters derived using traditional methods

• Potential for advancing climate science

– Understanding of physical processes

– Complement climate model simulations


Upcoming Events

1. First International Workshop on Climate Informatics, New York, NY, August 26, 2011
   http://www.nyas.org/climateinformatics

2. NASA Conference on Intelligent Data Understanding (CIDU), Mountain View, CA, Oct 19-21, 2011
   https://c3.ndc.nasa.gov/dashlink/projects/43/

3. IEEE ICDM Workshop on Knowledge Discovery from Climate Data, Vancouver, Canada, December 10, 2011
   http://www.nd.edu/~dial/climkd11/


Thanks & Questions

Contact

[email protected]

Personal Homepage

http://www.nd.edu/~ksteinha

NSF Expeditions on Understanding Climate Change

http://climatechange.cs.umn.edu

This work was supported in part by the National Science Foundation under Grants OCI-1029584 and BCS-0826958. This research was also funded in part by the project entitled “Uncertainty Assessment and Reduction for Climate Extremes and Climate Change Impacts” under the initiative “Understanding Climate Change Impact: Energy, Carbon, and Water Initiative” within the Laboratory Directed Research and Development (LDRD) Program of the Oak Ridge National Laboratory, managed by UT-Battelle, LLC for the U.S. Department of Energy under Contract DE-AC05-00OR22725.