Geographic Knowledge Discovery in Spatial Interaction With Self-Organizing Maps

Ph.D. Dissertation Defense

Jun Yan

Geography Department, SUNY at Buffalo

July 29, 2004

Dissertation Committee: Dr. Jean-Claude Thill (Chair), Dr. Ling Bian, Dr. David Mark

Outline

Background

Spatial Interaction Data

Methodology: Self-Organizing Maps, Visual Data Mining

Case studies

Conclusions and Future Research

Background

Information technologies

More tools available

More data available

Two Legs!!!

Data-rich vs computation-rich:

challenge?

opportunity !!!

Background (Cont.)

Data Mining & Knowledge Discovery: “useful information from large databases”

useful, novel, valid, understandable

Geographic data mining (GDM) and geographic knowledge discovery (GKD)?

Background (Cont.)

Mining techniques: statistics, pattern recognition, machine learning, visualization, high performance computing …

Figure: Knowledge discovery process. Components shown: User, Controller, DB Interface, DBMS, Domain Knowledge, Knowledge Base. Stages shown: Selection, Target Data, Data Mining, Evaluation, Discoveries.

Background (Cont.)

Finding all the patterns in a database autonomously? Unrealistic,

because the patterns could be too numerous and mostly uninteresting

Data mining: an iterative, interactive, semi-automated process

people direct what is to be mined

Visualization: Geovisualization (GVis)

visual data mining !!!

Visualization in KDD Process

Selecting Application Domain

Selecting Target Data

Processing Data

Extracting Information/Knowledge

Interpretation and Evaluation

Understanding basic data distribution, selecting meaningful target datasets

Locating missing data, noise removal, data smoothing

Parameter setting, process tracking, process steering

Interpretation, reporting, comparison, validity checking

Background (Cont.)

Machine learning & Neural Networks

Figure: machine learning. Examples and (sometimes) background knowledge feed a learning algorithm, which produces a concept description or other knowledge.

Figure: neural network. Inputs pass through an input layer, a hidden layer, and an output layer to produce outputs.

Background (Cont.)

Objectives:

Explore the effectiveness of neural networks in GKD

Examine the roles of GVis in GKD

Spatial Interaction Data

What is spatial interaction? Pairs of places

Elemental: trips made by individuals

Aggregate: flows from origins to destinations

Examples: migration, freight shipment, movement of capital & information …

Spatial Interaction Data (Cont.)

Aggregate level

Basic O-D matrix

            Region 1   Region 2   Region 3
Region 1
Region 2
Region 3

Dyadic O-D matrix

                      Type 1   Type 2   Type 3
Region 1 > Region 1
Region 1 > Region 2
Region 1 > Region 3

Elemental level

Trip table

          Origin   Destination   Distance
Trip 1
Trip 2
Trip 3
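The relationship between the elemental trip table and the two aggregate matrices can be sketched in a few lines of Python. This is only an illustration: the trip records, the region names, and the function names `basic_od_matrix` and `dyadic_od_matrix` are hypothetical and do not come from the dissertation.

```python
from collections import Counter, defaultdict

# Hypothetical elemental records: (origin, destination, interaction type).
trips = [
    ("Region 1", "Region 2", "Type 1"),
    ("Region 1", "Region 2", "Type 2"),
    ("Region 1", "Region 3", "Type 1"),
    ("Region 2", "Region 1", "Type 1"),
    ("Region 1", "Region 2", "Type 1"),
]

def basic_od_matrix(trips):
    """Basic O-D matrix: number of trips per (origin, destination) pair."""
    return Counter((o, d) for o, d, _ in trips)

def dyadic_od_matrix(trips):
    """Dyadic O-D matrix: one row per origin>destination dyad,
    one column per interaction type."""
    table = defaultdict(Counter)
    for o, d, kind in trips:
        table[(o, d)][kind] += 1
    return table

basic = basic_od_matrix(trips)
dyadic = dyadic_od_matrix(trips)
print(basic[("Region 1", "Region 2")])             # 3
print(dyadic[("Region 1", "Region 2")]["Type 1"])  # 2
```

Each row of the dyadic matrix is one multidimensional record, which is the form of input a SOM would be trained on.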

Spatial Interaction Data (Cont.)

Exploring the Patterns of Interaction

Very necessary!!!

Existing Exploratory Data Analysis (EDA): lack of interactivity

Challenges:

a large number of interactions

a wide range of interaction magnitudes

multiple semantics

Spatial Interaction Data (Cont.)

Origin

Destination

Interaction semantics

O-D Matrices

Multidimensionality!!!

Spatial Interaction Data (Cont.)

Figure panels: Electronic products; Machinery; Vehicle and parts; Photographic products

Methodology

Self-Organizing Maps (SOM)

Visual Data Mining (VDM):

SOM as core DM engine

Interactivity

Self-Organizing Maps

A crucial task of KDD: reduce data complexity

1) Data Quantization: number of records, here number of spatial interactions

2) Data Projection: number of variables, here number of interaction semantics

By reducing data complexity, identification of meaningful geographic structures becomes possible

Traditional multivariate statistical methods have their limitations

Self-Organizing Maps (Cont.)

Figure: competitive network with an input layer and a competitive output layer; the winning node produces the output, the other nodes lose.

1. A special type of competitive neural network;

2. Based on some measure of dissimilarity in the attribute space;

3. Capable of reducing data complexity on two dimensions simultaneously;

4. Actually an unsupervised pattern classifier.

$m_k(t+1) = m_k(t) + \alpha(t)\, h_{ck}(t)\, [x(t) - m_k(t)]$

where $m_k$ is the weight vector of node $k$, $x$ the input vector, $c$ the winning node, $h_{ck}$ the neighborhood function, and $\alpha$ the learning rate

Self-Organizing Maps (Cont.)

1. The best matching unit (BMU) changes its value to fit the input data;

2. Its neighboring nodes change their values to fit the input data as well, but the magnitude of the change decreases with distance from the BMU;

3. Like a flexible net;

4. Similar data will be located close to each other in the mapping
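The training behavior described above (the BMU and its neighbors move toward each input, with the size of the move shrinking with grid distance) can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the implementation used in the dissertation: the 4 x 4 grid, the Gaussian neighborhood, the linear decay schedules, and the names `train_som` and `bmu` are all choices made here for illustration, and the data are synthetic.

```python
import numpy as np

def train_som(data, rows=4, cols=4, n_iter=500, alpha0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM; returns weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every node, used to measure distance to the BMU.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        alpha = alpha0 * (1.0 - frac)        # learning rate, linear decay (assumed)
        sigma = sigma0 * (1.0 - frac) + 0.5  # neighborhood radius, shrinks over time
        x = data[rng.integers(len(data))]    # one randomly drawn input vector
        # BMU: the node whose weight vector is closest to x.
        dist = np.linalg.norm(weights - x, axis=2)
        c = np.unravel_index(np.argmin(dist), dist.shape)
        # Gaussian neighborhood: 1 at the BMU, decreasing with grid distance.
        grid_d2 = np.sum((grid - np.array(c)) ** 2, axis=2)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        # Move every node toward x, scaled by alpha and its neighborhood value.
        weights += alpha * h[:, :, None] * (x - weights)
    return weights

def bmu(weights, x):
    """Grid position (row, col) of the best matching unit for x."""
    dist = np.linalg.norm(weights - x, axis=2)
    return np.unravel_index(np.argmin(dist), dist.shape)

# Synthetic data: two groups of 3-variable records, values kept in [0, 1].
rng = np.random.default_rng(1)
data = np.vstack([
    np.clip(rng.normal(0.2, 0.05, (50, 3)), 0, 1),
    np.clip(rng.normal(0.8, 0.05, (50, 3)), 0, 1),
])
som = train_som(data)
print(som.shape)          # (4, 4, 3)
print(bmu(som, data[0]))  # grid position of the first record's BMU
```

The 100 records are summarized by 16 prototype vectors (quantization) laid out on a 2-D grid (projection), which is the double reduction the slides refer to.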

Visual Data Mining

Figure: VDM framework. Labels shown: Visualization Forms, Interaction Forms, Operation. Operations listed: Assignment, Focusing, Brushing, Colormap manipulation, Dynamic linking.

Visualization Forms

Case Studies

Airline Origin and Destination Survey Market Table (DB1BMarket): http://www.bts.org

10% sample of air flight itineraries

Geographic scale: airport level; 280 metros in the contiguous US

Temporal range: 1993 to 2002

Two case studies on DB1BMarket:

Cross-sectional analysis

Temporal changes

Clustering Analysis

Figure: feature map with clusters labeled 1 to 9

1. A cluster is an area of low values (distance) surrounded by areas of high values (distance).

2. There are several clusters in the feature map

Clustering Analysis (Cont.)

Figure: feature map with clusters labeled 1 to 8 and sub-clusters 9-1 to 9-5

A cluster is a valley in a 3-D map
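One common way to obtain such a distance surface is a U-matrix, where each node stores the average distance between its weight vector and those of its grid neighbors. The slides do not name the exact measure used, so the sketch below is an assumption; the function name `u_matrix` and the tiny 1 x 6 map are hypothetical.

```python
import numpy as np

def u_matrix(weights):
    """Mean distance from each node's weight vector to its 4-connected neighbors."""
    rows, cols, _ = weights.shape
    u = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dists.append(np.linalg.norm(weights[r, c] - weights[rr, cc]))
            u[r, c] = np.mean(dists)
    return u

# Hypothetical 1 x 6 map: two groups of similar nodes separated by a jump.
weights = np.array([[[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]])
u = u_matrix(weights)
print(np.round(u, 2))  # low values at both ends, high values in the middle
```

Nodes 0-1 and 4-5 sit in "valleys" (values near 0.1), while nodes 2 and 3 form the "ridge" (2.45) that separates the two clusters.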

Cluster Analysis (Cont.)

Figure panels: Market Share; Contribution

Cluster Analysis (Cont.)

C #    Cluster Property (Airline)
1      America West (HP)
2      US Air (US)
3      Continental (CO), Continental Express (RU)
4      Northwest (NW), Mesaba (XJ)
5      Horizon (QX)
6      United (UA)
7      Air Wisconsin (ZW)
8      American (AA), American Eagle (MQ)
9-1    No dominant airlines
9-2    Southwest (WN)
9-3    Comair (OH)
9-4    Delta (DL)
9-5    Delta (DL), Atlantic Southeast (EV)

Figure: feature map annotated with the airline codes of each cluster (AA MQ; ZW; UA; QX; NW XJ; CO RU; US; HP; WN; QX DL; DL; EV; Multiple)
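A cluster property table of this kind could be derived by averaging carrier market shares over the markets assigned to each cluster. The sketch below shows one way to do that; the 50% threshold, the function name `dominant_carriers`, and the share values are hypothetical and are not taken from DB1BMarket or from the dissertation.

```python
import numpy as np

def dominant_carriers(shares, labels, carriers, threshold=0.5):
    """For each cluster, list carriers whose mean market share is >= threshold."""
    labels = np.array(labels)
    result = {}
    for cluster in sorted(set(labels.tolist())):
        mean_share = shares[labels == cluster].mean(axis=0)
        result[cluster] = [c for c, s in zip(carriers, mean_share) if s >= threshold]
    return result

carriers = ["AA", "DL", "WN"]
# Hypothetical market shares: one row per market, one column per carrier.
shares = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.3, 0.4, 0.3],
])
labels = ["8", "8", "9-2", "9-1"]  # cluster assigned to each market

print(dominant_carriers(shares, labels, carriers))
# {'8': ['AA'], '9-1': [], '9-2': ['WN']}
```

An empty list corresponds to a cluster with no dominant airline.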

Cluster Analysis (Cont.)

Cluster 2

Map panels: Markets with US Airways Market Share >= 50%; Markets Represented by Cluster 2

Cluster Analysis: Markets From Nashville

Figure: markets from Nashville, labeled by airline code (AA, US, NW, UA, DL, CO, RU, WN, EV)

Cluster Analysis: Markets From Nashville (Cont.)

Figure: markets from Nashville, labeled by airline code (AA, US, NW, UA, DL, CO, RU, WN, EV)

Association Analysis

Figure panels: Market Share; Average Airfare

Association Analysis (Cont.)

Figure panels: American; Delta

Association Analysis (Cont.)

Average Airfare, Delta (without competition of Airtran)

Average Airfare, Delta (with competition of Airtran)

Temporal Changes

Temporal Changes (Cont.)

Figure panels: AA 1993; TWA 2001; AA 2001; AA 2002

Temporal Changes (Cont.)

Figure panels: Continental share; Northwest share

Temporal Changes: Trajectory

Market from Buffalo to DC

Figure panels: US Airways share; Southwest share; US Airways fare (trajectory points labeled 93, 96, 98, 00, 01)

Conclusions

Data-rich environment: large databases and high dimensionality

Data complexity reduction is crucial

Results suggest that SOM:

summarizes the overall data distribution well

is capable of detecting clustered structures

can be used to analyze the properties of clustered structures

can be used to study the associations among input variables

Conclusions (Cont.)

Interactive visual data mining can:

examine subset data more closely

study relationships among interaction types

analyze how detected clusters are distributed in the actual geographic space

help us gain a better understanding of the factors and spatial processes behind

Future Research

SOM/VDM analysis:

DB1BMarket

Other types of spatial interaction data

Data at elemental level

Improved VDM environment:

Human subject testing

Seamlessly coupled

Thank You! Questions? Comments?

Contact: junyan@buffalo.edu

Background (Cont.)

Geographic databases fit the profile:

massive volume: GIS, GPS, Remote Sensing …

high dimensionality

Geographic data mining (GDM) and geographic knowledge discovery (GKD)?

Current topic in GIS research

Background (Cont.)

Roles of Visualization

Figure: roles of visualization along a time axis, with "Data driven" and "Model driven" marked:

Exploratory analysis: visual exploration & visual data mining

Knowledge construction: visual knowledge construction & refinement

Analysis and modeling: visual model tracking, model steering

Evaluation of results: data presentation, visualization of uncertainty

Modeling Flows

Spatial interaction models: “Gravity Models”

Other geographic factors:

Geographic relationships among origins?

Geographic relationships among destinations?

Association among types of interaction?

Modeling Flows

Spatial interaction models: “Gravity Models”

Push: origin

Pull: destination

Transportation cost: distance decay

$I_{ij} = k \dfrac{P_i P_j}{d_{ij}^{a}} = k\, P_i P_j\, d_{ij}^{-a}$
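As a small worked example of the gravity formula, the function below evaluates the predicted interaction for two distances. The values of k, a, the populations, and the distances are hypothetical and chosen only to make the arithmetic easy to follow; the function name `gravity_flow` is not from the slides.

```python
def gravity_flow(p_i, p_j, d_ij, k=1.0, a=2.0):
    """Predicted interaction I_ij = k * P_i * P_j / d_ij**a."""
    if d_ij <= 0:
        raise ValueError("distance must be positive")
    return k * p_i * p_j / d_ij ** a

near = gravity_flow(1000, 500, 10)  # 1000 * 500 / 10**2
far = gravity_flow(1000, 500, 20)   # 1000 * 500 / 20**2
print(near, far)  # 5000.0 1250.0
```

With a = 2, doubling the distance cuts the predicted flow to one quarter, which is the distance-decay effect.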

Spatial Interaction Data (Cont.)

Spatial Interaction Data (Cont.)

Limitations of Traditional Multivariate Methods

Data Projection: Factor analysis; Projection pursuit; Multi-dimensional scaling

Data Quantization: Partitioning methods; Hierarchical methods

o Linearity
o Stationary
o Normal distribution
o Limited data amount
o One dimension compression

o Non-linear
o Non-stationary
o Distribution unknown
o Sparse
o Large data amount
o Multi-dimensional

Visualization Forms

Interaction Forms

Interaction Forms

Data Distribution

1. Similar data distributions

2. But greatly reduced number of low values

3. SOM prototype represents original data well

Cluster Analysis (Cont.)

Cluster 9-2

Map panels: Markets with Southwest Market Share >= 50%; Markets Represented by Cluster 9-2; Markets with Southwest Market Share >= 20%

Temporal Changes (Cont.)

Figure panels: US Airways share; American share

Temporal Changes (Cont.)

Figure panels: Delta share; United share

Temporal Changes (Cont.)

Temporal Trend: Trajectory (Cont.)

Market from Buffalo to NYC

Figure panels: US Airways share; JetBlue share; US Airways fare (trajectory points labeled 93, 96, 00, 01)

Temporal Trend: Trajectory (Cont.)

Market from Buffalo to Atlanta

Figure panels: Airtran Airways share; Delta share; Delta fare (trajectory points labeled 93, 98)
