Mining Weather Data for Decision Support
Roy George
Army High Performance Computing Research Center
Clark Atlanta University
Atlanta, GA 30314
2
Research
• Clustering Algorithms for Data Mining
  • Spatio-Temporal Domain
  • Parallelization of Algorithms
• Algorithms for Feature Extraction and Knowledge Discovery
3
Challenges of Geographical Data
• Complexities associated with data volume
  • Terabyte databases
• Domain complexities
  • Interesting signals hidden by stronger patterns
• Complexities caused by local variation
  • Systems are interconnected
• Data gathering and sampling
  • Interpretation of aggregated data
• Formalizing the domain
4
Background: Issues with Hard Clustering
Issue: Force data with imprecision and/or uncertainty into discrete classes
Result: Missing important outliers, boundary patterns
Approach: Use of Approximate Clustering Technique
5
Background: K-Means Clustering
Partition the data into K clusters that are homogeneous
Algorithm
• Select K time series as initial centroids
• Assign all time series to the most similar centroid
• Re-compute the centroids
• Repeat until the centroids do not change
Variations based on different measures of similarity
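The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: Euclidean distance is assumed as the similarity measure, and the names `k_means`, `euclidean`, `mean_series`, and the sample series are hypothetical.

```python
import math
import random

def euclidean(a, b):
    """Distance between two equal-length time series (assumed similarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_series(group):
    """Point-wise mean of a non-empty list of time series."""
    return [sum(vals) / len(group) for vals in zip(*group)]

def k_means(series, k, distance=euclidean, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: select K time series as initial centroids.
    centroids = [list(s) for s in rng.sample(series, k)]
    labels = []
    for _ in range(max_iter):
        # Step 2: assign every series to the most similar centroid.
        labels = [min(range(k), key=lambda j: distance(s, centroids[j]))
                  for s in series]
        # Step 3: re-compute the centroids (an empty cluster keeps its old one).
        new_centroids = []
        for j in range(k):
            members = [s for s, lab in zip(series, labels) if lab == j]
            new_centroids.append(mean_series(members) if members else centroids[j])
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Made-up example: two low series and two high series.
series = [[1.0, 1.1, 0.9], [1.2, 1.0, 1.0], [5.0, 5.2, 4.9], [5.1, 4.9, 5.0]]
labels, centroids = k_means(series, k=2)
```

Passing a different `distance` function gives the variations mentioned above, for example a correlation-based dissimilarity instead of Euclidean distance.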
6
Unsupervised Fuzzy K-Means (UKFM) Clustering
• Choose the initial number of clusters
• Develop a clustering using Fuzzy K-Means
• Merge the cluster pair that has maximum correlation
• Compute the validity measure
• Repeat until the termination condition is reached
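The loop above might look roughly as follows. The slides do not specify the validity measure, the correlation, or the termination condition, so this sketch assumes the partition coefficient as validity, Pearson correlation between centroids, merging by averaging the two centroids, and stopping at a hypothetical `min_clusters`. All function names and data are illustrative.

```python
import math

def memberships_for(data, centroids, m):
    """Standard fuzzy k-means memberships; each row sums to 1."""
    rows = []
    for x in data:
        d = [max(math.dist(x, c), 1e-12) for c in centroids]
        rows.append([1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in d)
                     for dj in d])
    return rows

def fuzzy_k_means(data, centroids, m=2.0, iters=50):
    for _ in range(iters):
        u = memberships_for(data, centroids, m)
        centroids = [[sum(row[j] ** m * x[t] for row, x in zip(u, data)) /
                      sum(row[j] ** m for row in u)
                      for t in range(len(data[0]))]
                     for j in range(len(centroids))]
    return memberships_for(data, centroids, m), centroids

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def partition_coefficient(u):
    """Assumed validity measure: closer to 1 means crisper clusters."""
    return sum(v * v for row in u for v in row) / len(u)

def unsupervised_fuzzy_k_means(data, initial_centroids, min_clusters=2):
    centroids = [list(c) for c in initial_centroids]
    history = []  # one (validity, centroids) entry per clustering stage
    while True:
        u, centroids = fuzzy_k_means(data, centroids)
        history.append((partition_coefficient(u), centroids))
        if len(centroids) <= min_clusters:  # assumed termination condition
            break
        # Merge the pair of clusters whose centroids correlate most strongly.
        pairs = [(i, j) for i in range(len(centroids))
                 for j in range(i + 1, len(centroids))]
        i, j = max(pairs, key=lambda p: pearson(centroids[p[0]], centroids[p[1]]))
        merged = [(a + b) / 2 for a, b in zip(centroids[i], centroids[j])]
        centroids = [c for k, c in enumerate(centroids) if k not in (i, j)] + [merged]
    return max(history, key=lambda h: h[0]), history

# Made-up series: three rising and two falling.
data = [[0.0, 1.0, 2.0, 3.0], [0.1, 1.1, 2.1, 2.9], [3.0, 2.0, 1.0, 0.0],
        [2.9, 2.1, 1.1, 0.1], [0.2, 0.9, 2.0, 3.1]]
best, history = unsupervised_fuzzy_k_means(data, data[:3], min_clusters=2)
```

Keeping the whole `history` makes it possible to report an initial, an optimal, and a final clustering, with the optimal one chosen by the validity measure.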
7
UKFM Results
Weather Data Set
Initial: 11 Clusters
Optimal: 8 Clusters
Final: 4 Clusters
8
Global Earth Science Data
• Collaborative effort with V. Kumar (UMinn)
• Test bed for UKFM (comparison with existing techniques)
• Data set
  • Global Sea Pressure (1989 – 1993)
  • Ocean Climate Indices (OCIs): capture teleconnections
• Result
  • UKFM can capture even weaker OCIs using coarse clusters
9
Global Climate Data (Sea Level Pressure)
Intermediate: 60 Clusters
10
Global Climate Data (Sea Level Pressure)
Final: 26 Clusters
11
Relation with SOI
12
Integrating Multiple Datasets in UKFM Clustering
Motivation: Data-based approach to determining “interesting” clusters
• Validate using multiple datasets
Rule: Retain clusters that have supporting data
Applicable in Data Rich Environment
13
UKFM Clustering with Multi-Dataset Validation
• Choose the initial number of clusters
• Develop a clustering using Fuzzy K-Means
• Validate the clusters with the other datasets D_i, i = 1, ..., n
• Merge if the clusters are uncorrelated; else consider the next candidate pair to merge
• Repeat until the termination condition is reached
14
UKFM Multi-Dataset Results
Height Pressure
Temperature Windspeed
15
Multi-threading Parallel Algorithm
For each clustering stage
  For each iteration
    Slaves: Calculate M for each cluster
    Master: Normalize M
    Slaves: Calculate C for each cluster
    Master: Normalize C
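One iteration of this master/worker split could be sketched as below. The slides do not define M and C; this sketch assumes M is the fuzzy membership matrix and C the set of cluster centroids, with workers producing unnormalized per-cluster values and the master normalizing them. The function names and data are hypothetical, and Python threads are used only to show the structure (the interpreter lock prevents the near-linear speed-up a native threaded implementation can reach).

```python
import math
from concurrent.futures import ThreadPoolExecutor

def raw_membership_column(data, centroid, m):
    """Worker task: unnormalized membership of every point in one cluster."""
    power = -2.0 / (m - 1.0)
    return [max(math.dist(x, centroid), 1e-12) ** power for x in data]

def raw_centroid(data, column, m):
    """Worker task: weighted sum of the points and total weight for one cluster."""
    weights = [u ** m for u in column]
    sums = [sum(w * x[t] for w, x in zip(weights, data))
            for t in range(len(data[0]))]
    return sums, sum(weights)

def parallel_iteration(data, centroids, pool, m=2.0):
    # Workers: calculate M for each cluster (one task per cluster).
    columns = list(pool.map(lambda c: raw_membership_column(data, c, m), centroids))
    # Master: normalize M so that each point's memberships sum to 1.
    totals = [sum(col[i] for col in columns) for i in range(len(data))]
    columns = [[col[i] / totals[i] for i in range(len(data))] for col in columns]
    # Workers: calculate C for each cluster.
    raw = list(pool.map(lambda col: raw_centroid(data, col, m), columns))
    # Master: normalize C by dividing each weighted sum by its total weight.
    centroids = [[s / total for s in sums] for sums, total in raw]
    return columns, centroids

# Made-up data: two points near the origin, two near (4, 4).
data = [[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.1, 3.9]]
centroids = [[0.0, 0.0], [4.0, 4.0]]
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(10):
        columns, centroids = parallel_iteration(data, centroids, pool)
```

The two normalization steps are the only points where results from all clusters must be combined, which is why they sit with the master in this layout.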
16
Multi-threading Result
Implemented on Sun Fire workstation with four 900-MHz UltraSPARC® III processors
Near Linear Speed Up Obtained
17
Relevance to the Army
• Directly supports the FBKOF STO (B. Broome)
• Development of the Weather Information and Tactical Support (WITS) System
18
Weather Information and Tactical Support (WITS)
Objective: Extract patterns from weather data and fuse them with external databases (logistics, terrain, forces, etc.) for higher-level planning
19
Approach
• Development of an OLAP Weather Repository
  • GA Weather (1981-2002)
  • Sources: Nat. Weather Svc, GA Env. Network
• Development of WITS Modules
  • Ad-hoc Querying
  • Real-time Analysis and Planning
  • Effects on Army Systems
• Integration with IWEDA
Abstract Data Representation
Axes: YEAR, MONTH, DAY
Values: TEMPERATURE, PRECIPITATION, WIND SPEED, etc.
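A representation like this supports OLAP-style roll-ups, for example from daily values to monthly or yearly averages. The sketch below is only an illustration of that idea: the observations are made-up numbers, and `roll_up` is a hypothetical helper, not part of WITS.

```python
from collections import defaultdict

# Made-up daily observations keyed by YEAR, MONTH, DAY.
observations = [
    {"year": 2001, "month": 1, "day": 1, "temperature": 4.0, "precipitation": 0.0},
    {"year": 2001, "month": 1, "day": 2, "temperature": 6.0, "precipitation": 1.2},
    {"year": 2001, "month": 2, "day": 1, "temperature": 9.0, "precipitation": 0.4},
    {"year": 2002, "month": 1, "day": 1, "temperature": 5.0, "precipitation": 0.0},
]

def roll_up(rows, dims, measure):
    """Average `measure` over all rows that share the same values of `dims`."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key][0] += row[measure]
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}

monthly = roll_up(observations, ("year", "month"), "temperature")
yearly = roll_up(observations, ("year",), "temperature")
```

Dropping a dimension from `dims` moves one level up the hierarchy, which is the kind of aggregation an ad-hoc query module would issue against the repository.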
20
WITS System Design
Components shown in the diagram:
• USER INTERFACE
• DATA WAREHOUSE
• DATA MINING MODULES
• QUERY MODULES
• KNOWLEDGE BASES (IWEDA)
• DATA CLEANING & TRANSFORMATION
• DATA ACQUISITION AGENTS
• REAL TIME MODULE
• TAPS MODULE
• IQ MODULE
21
WITS/IQ
22
WITS/IQ
23
WITS/IWEDA
24
WITS/Analysis
25
WITS/Analysis
26
Work in Progress
• Characterization of Analysis Queries
• Incorporation of Data Mining Algorithms into WITS
• Enhancement of WITS/TAPS
• Implementation of WITS/Real
27
Hybrid Genetic Fuzzy Systems for Feature Extraction and Knowledge Discovery
28
Project Goals
• Design and implement a hybrid genetic fuzzy system for knowledge discovery.
• Develop API/Tools.
• Apply tools to Army-related problems.
29
Contribution
• Hybrid system based on the Simple Genetic Algorithm (SGA).
• Enhanced the SGA by adding three levels of knowledge discovery.
Level 1: Discovers up to k possible rules for a given set of inputs and outputs. It then attempts to minimize the number of rules and tune the knowledge base.
Level 2: Takes the set of rules from Level 1 and further minimizes the rules. It also tunes the knowledge base.
Level 3: Makes one last attempt to further tune the architecture of the knowledge base.
30
Rule Discovery
• Search for k possible rules from the set of p possible rules. k is an input parameter of the GA application.
• Discover the smallest value of k, thereby reducing the number of rules needed.
Example Rules:
If INPUT_1 is low AND INPUT_2 is medium THEN OUTPUT_1 is high
If INPUT_1 is high THEN OUTPUT_1 is low
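To make the rule format concrete, the sketch below evaluates the two example rules for a given pair of inputs. The slides do not give the membership functions or the inference scheme, so several things are assumed here: inputs scaled to [0, 1], triangular sets peaking at 0, 0.5, and 1, AND taken as the minimum, and the output computed as a firing-strength-weighted average of the output peaks. All names are hypothetical.

```python
# Assumed peaks of the fuzzy sets on a [0, 1] scale.
PEAKS = {"low": 0.0, "medium": 0.5, "high": 1.0}

def degree(term, x, width=0.5):
    """Triangular membership: 1 at the peak, falling to 0 at `width` away."""
    return max(0.0, 1.0 - abs(x - PEAKS[term]) / width)

# The two example rules: (antecedent, linguistic value of OUTPUT_1).
RULES = [
    ({"INPUT_1": "low", "INPUT_2": "medium"}, "high"),
    ({"INPUT_1": "high"}, "low"),
]

def firing_strength(antecedent, inputs):
    """AND is taken as the minimum of the antecedent degrees."""
    return min(degree(term, inputs[name]) for name, term in antecedent.items())

def infer(inputs, rules=RULES):
    """Weighted average of output peaks; None when no rule fires."""
    strengths = [(firing_strength(a, inputs), out) for a, out in rules]
    total = sum(s for s, _ in strengths)
    if total == 0.0:
        return None
    return sum(s * PEAKS[out] for s, out in strengths) / total

low_input = infer({"INPUT_1": 0.2, "INPUT_2": 0.5})   # only the first rule fires
high_input = infer({"INPUT_1": 0.9, "INPUT_2": 0.5})  # only the second rule fires
```

In a genetic fuzzy system, both the rule list and the shapes of the membership functions are candidates for tuning; here they are fixed by hand.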
31
Relevance to the Army
• Collaborators: Jeff Passner, John Raby (ARL)
• IMETS weather modeling
  • Post-processing used to predict additional parameters
  • Visibility, Turbulence, Fog, etc.
• Use of Knowledge Discovery to predict parameters
32
Visibility Application
Generate and tune a system that can predict visibility based on input parameters
Tasks for the fuzzy genetic system:
• Search for a set of k rules from p possible rules that describe the relationship of the input parameters with the output (visibility)
• Concurrently discover the architecture and optimize the performance of the knowledge bases in relation to the k rules
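The first task, selecting a small rule set out of p candidates, can be illustrated with a toy genetic algorithm over bit masks. This is a stand-in, not the hybrid system described here: it uses binary tournament selection and elitism rather than the classic SGA roulette wheel, and the fitness function is a made-up coverage score (`COVERS` lists which hypothetical training samples each candidate rule handles correctly) with a penalty per selected rule.

```python
import random

def tournament(pop, fitness, rng):
    """Binary tournament: the fitter of two random individuals wins."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def simple_ga(fitness, length, pop_size=30, generations=60, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            cut = rng.randrange(1, length)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best

# Hypothetical candidate rules: the set of sample ids each one gets right.
COVERS = [{0, 1}, {1, 2}, {2, 3}, {0, 1, 2}, {3}, {4}, {3, 4}, {0}]

def fitness(mask):
    """Samples covered by the selected rules, minus a penalty per rule."""
    chosen = [COVERS[i] for i, bit in enumerate(mask) if bit]
    covered = set().union(*chosen)
    return len(covered) - 0.1 * len(chosen)

best = simple_ga(fitness, len(COVERS))
selected_rules = [i for i, bit in enumerate(best) if bit]
```

The per-rule penalty is what pushes the search toward the smallest k; in a real system the coverage term would be replaced by prediction accuracy of the fuzzy rule base on visibility data.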
33
Results for Low Visibility Classifier
34
Results for Medium Visibility Classifier