Mining Weather Data for Decision Support Roy George Army High Performance Computing Research Center Clark Atlanta University Atlanta, GA 30314

Mining Weather Data for Decision Support

Roy George
Army High Performance Computing Research Center
Clark Atlanta University
Atlanta, GA 30314

Research

• Clustering Algorithms for Data Mining
  • Spatio-Temporal Domain
  • Parallelization of Algorithms
• Algorithms for Feature Extraction and Knowledge Discovery

Challenges of Geographical Data

• Complexities associated with data volume
  • Terabyte databases
• Domain complexities
  • Interesting signals hidden by stronger patterns
• Complexities caused by local variation
  • Systems are interconnected
• Data gathering and sampling
  • Interpretation of aggregated data
• Formalizing the domain

Background: Issues with Hard Clustering

• Issue: Hard clustering forces data with imprecision and/or uncertainty into discrete classes
• Result: Important outliers and boundary patterns are missed
• Approach: Use an approximate clustering technique

Background: K-Means Clustering

• Partition the data into K clusters that are homogeneous
• Algorithm
  • Select K time series as initial centroids
  • Assign each time series to the most similar centroid
  • Re-compute the centroids
  • Repeat until the centroids do not change
• Variations based on different measures of similarity
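The K-Means steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the work: the use of Pearson correlation as the similarity measure, the `init` parameter for choosing the starting centroids, and the toy series are all assumptions made for the example.

```python
import random

def correlation(a, b):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5

def kmeans(series, k, init=None, similarity=correlation, max_iter=100, seed=0):
    """Cluster time series; `init` optionally lists the indices of the initial centroids."""
    if init is None:
        init = random.Random(seed).sample(range(len(series)), k)
    centroids = [list(series[i]) for i in init]
    labels = None
    for _ in range(max_iter):
        # Assign every series to its most similar centroid.
        new_labels = [max(range(k), key=lambda j: similarity(s, centroids[j]))
                      for s in series]
        if new_labels == labels:
            break  # assignments are stable, so the centroids no longer change
        labels = new_labels
        # Re-compute each centroid as the point-wise mean of its members.
        for j in range(k):
            members = [s for s, lab in zip(series, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy data (made up): three rising and three falling series.
series = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 2, 3, 5],
          [4, 3, 2, 1], [5, 4, 3, 2], [5, 3, 2, 1]]
labels, centroids = kmeans(series, k=2, init=[0, 3])
```

Passing a different `similarity` function (for example a negated Euclidean distance) gives the variations mentioned on the slide.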

Unsupervised Fuzzy K-Means (UKFM) Clustering

• Choose the initial number of clusters
• Develop a clustering using Fuzzy K-Means
• Merge the cluster pair that has the maximum correlation
• Compute the validity measure
• Repeat until the termination condition is reached
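The loop above can be sketched as follows. The slides do not specify the validity measure, how cluster correlation is computed, or the termination condition, so this sketch makes assumptions: the partition coefficient stands in for the validity measure, correlation is taken between cluster centroids, and the loop stops at a hypothetical minimum cluster count `k_min`. The function names are illustrative only.

```python
import math

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def memberships(data, centroids, m=2.0):
    """Fuzzy membership of every point in every cluster; each row sums to 1."""
    U = []
    for x in data:
        d = [max(math.dist(x, c), 1e-12) for c in centroids]
        U.append([1.0 / sum((di / dk) ** (2.0 / (m - 1.0)) for dk in d) for di in d])
    return U

def fuzzy_kmeans(data, centroids, m=2.0, iters=50):
    """A fixed number of standard fuzzy k-means updates from the given centroids."""
    for _ in range(iters):
        U = memberships(data, centroids, m)
        centroids = [
            [sum(U[j][i] ** m * x[t] for j, x in enumerate(data)) /
             sum(U[j][i] ** m for j in range(len(data)))
             for t in range(len(data[0]))]
            for i in range(len(centroids))
        ]
    return memberships(data, centroids, m), centroids

def partition_coefficient(U):
    """Assumed validity measure: mean squared membership (1.0 means crisp clusters)."""
    return sum(u * u for row in U for u in row) / len(U)

def ukfm(data, k_init, k_min=2):
    centroids = [list(x) for x in data[:k_init]]  # naive initialization (assumption)
    history = []  # (number of clusters, validity) per clustering stage
    while True:
        U, centroids = fuzzy_kmeans(data, centroids)
        history.append((len(centroids), partition_coefficient(U)))
        if len(centroids) <= k_min:  # assumed termination condition
            return history
        # Merge the pair of clusters whose centroids are most correlated.
        pairs = [(a, b) for a in range(len(centroids))
                 for b in range(a + 1, len(centroids))]
        a, b = max(pairs, key=lambda p: pearson(centroids[p[0]], centroids[p[1]]))
        merged = [(u + v) / 2 for u, v in zip(centroids[a], centroids[b])]
        centroids = [c for i, c in enumerate(centroids) if i not in (a, b)] + [merged]
```

The returned history lets a caller pick the stage with the best validity value, which is one way to distinguish an "optimal" from a "final" clustering.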

UKFM Results
Weather Data Set

Initial: 11 Clusters
Optimal: 8 Clusters
Final: 4 Clusters

Global Earth Science Data

• Collaborative effort with V. Kumar (UMinn)
• Test bed for UKFM (comparison with existing techniques)
• Data Set
  • Global Sea Pressure (1989 – 1993)
  • Ocean Climate Indices
• Capture Teleconnections
• Result
  • UKFM can capture even weaker OCIs using coarse clusters

Global Climate Data (Sea Level Pressure)

Intermediate: 60 Clusters

Global Climate Data (Sea Level Pressure)

Final: 26 Clusters

Relation with SOI

Integrating Multiple Datasets in UKFM Clustering

• Motivation: Data-based approach to determining “interesting” clusters
  • Validate using multiple datasets
• Rule: Retain clusters that have supporting data
• Applicable in data-rich environments

UKFM Clustering with Multi-Dataset Validation

• Choose the initial number of clusters
• Develop a clustering using Fuzzy K-Means
• Validate each cluster with the other datasets D_i, i = 1, ..., n
• Merge if the clusters are uncorrelated; else consider the next candidate pair to merge
• Repeat until the termination condition is reached
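The validation step above is stated only briefly, so the following sketch rests on one possible reading: a cluster "has supporting data" if its member series are mutually correlated in at least one of the other datasets, and a candidate pair is merged only when neither cluster has such support. The support score, the `threshold` value, and all function names are hypothetical.

```python
from itertools import combinations
import math

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def cluster_support(members, dataset):
    """Mean pairwise correlation of the cluster's member series within one dataset."""
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0
    return sum(pearson(dataset[i], dataset[j]) for i, j in pairs) / len(pairs)

def is_supported(members, other_datasets, threshold=0.5):
    """A cluster is retained if any other dataset D_i shows it as coherent."""
    return any(cluster_support(members, d) >= threshold for d in other_datasets)

def choose_merge(candidates, clusters, other_datasets, threshold=0.5):
    """Return the first candidate pair whose clusters both lack supporting data.

    `candidates` holds cluster-id pairs, ordered by decreasing correlation
    in the primary dataset; `clusters` maps cluster id -> member indices.
    """
    for a, b in candidates:
        if (not is_supported(clusters[a], other_datasets, threshold)
                and not is_supported(clusters[b], other_datasets, threshold)):
            return (a, b)
    return None  # every candidate involves a supported cluster

# Toy example (made-up values): one secondary dataset indexed by grid point.
temperature = [[1, 2, 3, 4], [2, 4, 6, 8],   # points 0, 1: coherent
               [1, 2, 3, 4], [4, 3, 2, 1],   # points 2, 3: not coherent
               [1, 3, 2, 4], [4, 2, 3, 1]]   # points 4, 5: not coherent
clusters = {0: [0, 1], 1: [2, 3], 2: [4, 5]}
pair = choose_merge([(0, 1), (1, 2), (0, 2)], clusters, [temperature])
```

Here cluster 0 is retained because the secondary dataset supports it, so the first mergeable pair is clusters 1 and 2.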

UKFM Multi-Dataset Results

[Figure panels: Height, Pressure, Temperature, Windspeed]

Multi-threading Parallel Algorithm

For each clustering stage
  For each iteration
    Slaves: Calculate M for each cluster
    Master: Normalize M
    Slaves: Calculate C for each cluster
    Master: Normalize C
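One iteration of this scheme might look like the sketch below. The slide does not define M and C; here they are assumed to be the fuzzy membership matrix and the cluster centroids. Worker threads compute unnormalized per-cluster quantities and the main thread normalizes them. In CPython, threads will not speed up this pure-Python arithmetic, so the sketch shows only the division of work, not the reported speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
import math

def raw_membership(data, centroid, m):
    """Worker: unnormalized membership of every point in one cluster."""
    p = 2.0 / (m - 1.0)
    return [max(math.dist(x, centroid), 1e-12) ** -p for x in data]

def raw_centroid(data, weights, m):
    """Worker: weighted coordinate sums and total weight for one cluster."""
    num = [sum(w ** m * x[t] for w, x in zip(weights, data))
           for t in range(len(data[0]))]
    den = sum(w ** m for w in weights)
    return num, den

def parallel_iteration(data, centroids, m=2.0, workers=4):
    k, n = len(centroids), len(data)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Workers: calculate M for each cluster.
        raw = list(pool.map(lambda c: raw_membership(data, c, m), centroids))
        # Master: normalize M so each point's memberships sum to 1.
        totals = [sum(raw[i][j] for i in range(k)) for j in range(n)]
        M = [[raw[i][j] / totals[j] for j in range(n)] for i in range(k)]
        # Workers: calculate C for each cluster.
        sums = list(pool.map(lambda w: raw_centroid(data, w, m), M))
        # Master: normalize C by the total weight of each cluster.
        C = [[v / den for v in num] for num, den in sums]
    return M, C  # M is indexed [cluster][point]

# Toy data (made up): two well-separated groups of points.
points = [[0, 0], [0, 1], [10, 10], [10, 11]]
M, C = parallel_iteration(points, [[0, 0.5], [10, 10.5]])
```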

Multi-threading Result

Implemented on Sun Fire workstation with four 900-MHz UltraSPARC® III processors

Near Linear Speed Up Obtained

Relevance to the Army

• Directly supports the FBKOF STO (B. Broome)
• Development of the Weather Information and Tactical Support (WITS) System

Weather Information and Tactical Support (WITS)

Objective: Extract patterns from weather data and fuse them with external databases (logistics, terrain, forces, etc.) for higher-level planning

Approach

• Development of an OLAP Weather Repository
  • GA Weather (1981-2002)
  • Sources: Nat. Weather Svc, GA Env. Network
• Development of WITS Modules
  • Ad-hoc Querying
  • Real time Analysis and Planning
  • Effects on Army Systems
• Integration with IWEDA

[Figure: Abstract Data Representation, with labels YEAR, MONTH, DAY and TEMPERATURE, PRECIPITATION, WIND SPEED, etc.]
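As an illustration of how a YEAR / MONTH / DAY representation supports OLAP-style queries, the sketch below stores measures per day and rolls them up to a coarser level. The dictionary layout, the `roll_up` function, and the observation values are assumptions made for the example; they are not taken from the WITS repository.

```python
from collections import defaultdict

# Hypothetical observations keyed by (year, month, day); values are made up.
observations = {
    (2001, 1, 1): {"temperature": 40.0, "precipitation": 0.0},
    (2001, 1, 2): {"temperature": 44.0, "precipitation": 0.2},
    (2001, 2, 1): {"temperature": 50.0, "precipitation": 0.0},
    (2002, 1, 1): {"temperature": 38.0, "precipitation": 0.5},
}

# Number of key components kept at each level of the time hierarchy.
LEVELS = {"year": 1, "month": 2, "day": 3}

def roll_up(cube, level, measure):
    """Average `measure` over all cells that share a key prefix at `level`."""
    depth = LEVELS[level]
    groups = defaultdict(list)
    for key, measures in cube.items():
        groups[key[:depth]].append(measures[measure])
    return {key: sum(values) / len(values) for key, values in groups.items()}

monthly_temperature = roll_up(observations, "month", "temperature")
yearly_precipitation = roll_up(observations, "year", "precipitation")
```

An ad-hoc query then reduces to choosing a level and a measure, for example the mean temperature per month.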

WITS System Design

[Figure: system diagram with components USER INTERFACE; REAL TIME MODULE; TAPS MODULE; IQ MODULE; QUERY MODULES; DATA MINING MODULES; KNOWLEDGE BASES (IWEDA); DATA WAREHOUSE; DATA CLEANING & TRANSFORMATION; DATA ACQUISITION AGENTS]

WITS/IQ

WITS/IQ

WITS/IWEDA

WITS/Analysis

WITS/Analysis

Work in Progress

• Characterization of Analysis Queries
• Incorporation of Data Mining Algorithms into WITS
• Enhancement of WITS/TAPS
• Implementation of WITS/Real

Hybrid Genetic Fuzzy Systems for Feature Extraction and Knowledge Discovery

Project Goals

• Design and implement a hybrid genetic fuzzy system for knowledge discovery.
• Develop API/Tools.
• Apply tools to Army-related problems.

Contribution

• Hybrid system based on the Simple Genetic Algorithm (SGA).
• Enhanced the SGA by adding three levels of knowledge discovery.

Level 1: Discovers up to k possible rules for a given set of inputs and outputs. It then attempts to minimize the number of rules and tune the knowledge base.

Level 2: Takes the set of rules from Level 1 and further minimizes the rules. In addition, it also tunes the knowledge base.

Level 3: Makes one last attempt to further tune the architecture of the knowledge base.

Rule Discovery

• Search for k possible rules from the set of p possible rules; k is an input parameter of the GA application.
• Discover the smallest value of k, thereby reducing the number of rules needed.

Example Rules:

If INPUT_1 is low AND INPUT_2 is medium THEN OUTPUT_1 is high

If INPUT_1 is high THEN OUTPUT_1 is low
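To make the rule format concrete, the sketch below evaluates the two example rules with a small fuzzy inference routine. The slides do not give the membership functions or the inference method, so the triangular terms on a 0 to 1 scale, the use of `min` for AND, and the weighted-average defuzzification are all assumptions.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Assumed linguistic terms on a normalized 0..1 scale: (left foot, peak, right foot).
TERMS = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

# The two example rules: (antecedent terms per input, output term).
RULES = [
    ({"INPUT_1": "low", "INPUT_2": "medium"}, "high"),
    ({"INPUT_1": "high"}, "low"),
]

def infer(inputs, rules=RULES):
    """Return a crisp OUTPUT_1 value, or None if no rule fires."""
    num = den = 0.0
    for antecedent, out_term in rules:
        # AND is taken as the minimum of the antecedent memberships.
        strength = min(tri(inputs[name], *TERMS[term])
                       for name, term in antecedent.items())
        num += strength * TERMS[out_term][1]  # weight the output term's peak
        den += strength
    return num / den if den > 0 else None

low_medium = infer({"INPUT_1": 0.0, "INPUT_2": 0.5})
high_input = infer({"INPUT_1": 1.0, "INPUT_2": 0.5})
```

With these assumed terms, the first call fires only the first rule and the second call fires only the second rule.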

Relevance to the Army

• Collaborators: Jeff Passner, John Raby (ARL)
• IMETS weather modeling
• Post processing used to predict additional parameters
  • Visibility, Turbulence, Fog, etc.
• Use of Knowledge Discovery to predict parameters

Visibility Application

• Generate and tune a system that can predict visibility based on input parameters
• Tasks for the fuzzy genetic system
  • Search for a set of k rules from p possible rules that describe the relationship of the input parameters with the output (visibility)
  • Concurrently discover the architecture, and optimize the performance, of the knowledge bases in relation to the k rules
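The first task, searching for k rules among p candidates, can be illustrated with a generic genetic algorithm over rule subsets. This is not the SGA-based hybrid described in the slides and it does not tune the knowledge base; the chromosome encoding, the operators, the parameter defaults, and the toy fitness function are all assumptions. In a real application the fitness would score how well the selected rules predict visibility on training data.

```python
import random

def ga_select_rules(p, k, fitness, pop_size=20, generations=40,
                    mutation_rate=0.2, seed=0):
    """Search for k distinct rule indices out of range(p) that maximize `fitness`."""
    rng = random.Random(seed)
    population = [tuple(sorted(rng.sample(range(p), k))) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best]  # elitism: the best rule set so far always survives
        while len(children) < pop_size:
            # Tournament selection of two parents.
            mum = max(rng.sample(population, 2), key=fitness)
            dad = max(rng.sample(population, 2), key=fitness)
            # Crossover: draw k rules from the union of both parents.
            child = set(rng.sample(sorted(set(mum) | set(dad)), k))
            # Mutation: swap one rule for one that is not in the child.
            if rng.random() < mutation_rate:
                child.discard(rng.choice(sorted(child)))
                child.add(rng.choice([r for r in range(p) if r not in child]))
            children.append(tuple(sorted(child)))
        population = children
        best = max(population, key=fitness)
    return best

# Toy fitness (hypothetical): reward rule sets that overlap a known "useful" set.
useful = {1, 4, 7}
best_rules = ga_select_rules(p=10, k=3, fitness=lambda rules: len(set(rules) & useful))
```

Running the search for decreasing values of k, and keeping the smallest k that still meets an accuracy target, is one way to approach the rule minimization the slides describe.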

Results for Low Visibility Classifier

Results for Medium Visibility Classifier