SPATIAL DATA MINING
Erki Saluveer

Data Mining Methods in Geography


16.12.2009 MTAT.03.249

SHOPPING CART

1. Classical shopping basket analysis
2. Association rule: wine -> cheese
3. Is there temporal variability in shopping behaviour?
4. Is there spatial correlation of basket items within a region?
5. Are there spatial and temporal variabilities in shopping behaviour?


WHY ARE WE LOOKING INTO SPATIAL DATA?

1. Loads of new digital spatial data are created every day
2. The number of possible geographic hypotheses is too large to explore manually
3. New dimensions:
   1. Temporal dimension – variability of the data in time
   2. Spatial dimension – localisation, neighbouring relations, sizes etc. in geographical space, social networking space, outer space, ...


DEFINITION

1. Geographic data mining involves the application of computational tools to reveal interesting patterns in objects and events distributed in geographic space and across time. (Miller & Han, 2001)


EXAMPLES

1. 1855 Asiatic Cholera in London: a water pump identified as the source
2. Meteorology
   1. Patterns of the Gulf Stream and predictive models
   2. Climate change
3. Biology
4. Geology
5. Medicine
   1. Disease clusters by Estonian Genome Project
6. Marketing
   1. Client clusters
7. Infrastructure
   1. Technical faults in electrical network systems
8. Police
   1. Crime analysis
9. Geography
   1. Census data
10. Agriculture


MOBILE POSITIONING - TRAJECTORIES


SPATIAL OBJECT

1. Conceptual model
   1. Field model – a function of location in 2D or 3D
   2. Discrete objects – point, line, polygon
2. Contains both spatial and nonspatial attributes.
3. Must have a location-type attribute:
   1. Latitude/longitude
   2. Zip code
   3. Street address
4. Objects may be retrieved using spatial attributes, nonspatial attributes, or both.
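As a rough illustration of a discrete point object, the sketch below stores spatial and nonspatial attributes side by side and retrieves objects by either kind. The class name `SpatialObject`, the helper `in_bbox`, the bounding-box query and the sample shops are all hypothetical and not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class SpatialObject:
    """A discrete point object with spatial and nonspatial attributes."""
    name: str
    lat: float  # spatial attribute: latitude in degrees
    lon: float  # spatial attribute: longitude in degrees
    attrs: dict = field(default_factory=dict)  # nonspatial attributes


def in_bbox(obj, lat_min, lat_max, lon_min, lon_max):
    """Spatial predicate: does the object lie inside a bounding box?"""
    return lat_min <= obj.lat <= lat_max and lon_min <= obj.lon <= lon_max


# Hypothetical sample data, invented for illustration only.
objects = [
    SpatialObject("shop A", 58.38, 26.72, {"type": "grocery"}),
    SpatialObject("shop B", 59.44, 24.75, {"type": "grocery"}),
    SpatialObject("shop C", 58.37, 26.73, {"type": "bookshop"}),
]

# Retrieval by spatial attributes only.
in_region = [o.name for o in objects if in_bbox(o, 58.3, 58.5, 26.6, 26.9)]

# Retrieval by both spatial and nonspatial attributes.
groceries_in_region = [
    o.name for o in objects
    if in_bbox(o, 58.3, 58.5, 26.6, 26.9) and o.attrs["type"] == "grocery"
]
```

A real system would normally use a spatial index rather than a linear scan; the scan is only meant to show the two kinds of predicates.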


CHALLENGES OF SPATIAL DATA

1. Tobler’s first law of geography
   1. Everything is related to everything else, but near things are more related than distant things
   2. This implies strong spatial autocorrelation between objects
2. Scale of resolution
   1. Dependencies on a small scale turn into random variation when analysed at a broader scale of measurement
3. Relationships of objects
4. Manhattan distance versus Euclidean distance
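The choice of distance measure can change which object counts as "nearest". A minimal sketch, assuming 2D points in a planar coordinate system; the function names and sample points are illustrative only.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """Distance along axis-parallel paths, e.g. a rectangular street grid."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


a, b = (0.0, 0.0), (3.0, 4.0)
d_euclid = euclidean(a, b)     # 5.0
d_manhattan = manhattan(a, b)  # 7.0

# The nearest neighbour of `a` depends on the metric used.
candidates = {"c": (3.0, 3.0), "d": (0.0, 4.5)}
nearest_euclid = min(candidates, key=lambda k: euclidean(a, candidates[k]))
nearest_manhattan = min(candidates, key=lambda k: manhattan(a, candidates[k]))
```

Here `c` is nearest to `a` in Euclidean terms, while `d` is nearest in Manhattan terms.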


TOPOLOGICAL AND DIRECTIONAL RELATIONSHIPS


HOW TO DEAL?

1. Spatial attributes as ordinary variables
2. Pre-process for feature extraction
3. Special algorithms


CLUSTERING

1. Detect clusters of irregular shapes
2. Clusters - non-overlapping heterogeneous groups
3. Use of centroids and simple distance approaches may not work well.
4. Clusters should be independent of order of input.

NEAREST NEIGHBOUR CLUSTERING ALGORITHM

Given n elements x1, x2, …, xn and a threshold t:

1. j ← 1, k ← 1, Clusters = {}
2. Repeat
   1. Find the nearest neighbour of xj
   2. Let the nearest neighbour be in cluster m
   3. If the distance to the nearest neighbour > t, then create a new cluster for xj and k ← k+1; else assign xj to cluster m
   4. j ← j+1
3. until j > n
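The steps above can be sketched in Python as follows. This is one reading of the pseudocode, under two assumptions that the slide does not spell out: the nearest neighbour is searched only among elements that have already been assigned, and the first element starts the first cluster. The function name is hypothetical.

```python
import math


def nearest_neighbour_clustering(points, t):
    """Assign each point to the cluster of its nearest already-clustered
    point, or start a new cluster if that neighbour is farther than t.

    Returns a list of 1-based cluster labels, one per input point.
    """
    labels = []
    k = 0  # number of clusters created so far
    for j, x in enumerate(points):
        if j == 0:
            # Assumption: no earlier points exist, so start cluster 1.
            k += 1
            labels.append(k)
            continue
        # Nearest neighbour among the points processed so far.
        i_near = min(range(j), key=lambda i: math.dist(x, points[i]))
        if math.dist(x, points[i_near]) > t:
            k += 1
            labels.append(k)  # too far away: new cluster
        else:
            labels.append(labels[i_near])  # join the neighbour's cluster
    return labels


pts = [(0, 0), (1, 0), (5, 5), (6, 5), (1, 1)]
labels = nearest_neighbour_clustering(pts, t=2.0)
```

With these sample points the labels come out as `[1, 1, 2, 2, 1]`. Note that the result of this kind of single-pass procedure can depend on the order in which the points are processed.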


ASSOCIATION RULES

1. Classical method:
   1. Association rules given item types and transactions
   2. Assumes spatial data can be decomposed into transactions
   3. However, such decomposition may alter spatial patterns
2. New spatial methods
   1. Spatial association rules
   2. Spatial co-locations
3. Note: Association rules and co-location rules are fast filters that reduce the number of pairs for rigorous statistical analysis, e.g. correlation analysis, cross-K-function for spatial interaction etc.
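For the classical, transaction-based case, a rule such as wine -> cheese is usually judged by its support and confidence. The sketch below computes both on a handful of invented baskets; the data and function names are assumptions for illustration. A spatial variant would first have to decide how to form the transactions, for example from neighbourhoods.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical baskets, invented for illustration only.
baskets = [
    {"wine", "cheese"},
    {"wine", "cheese", "bread"},
    {"wine", "bread"},
    {"cheese"},
    {"bread"},
]

s = support(baskets, {"wine", "cheese"})       # 2 of 5 baskets
c = confidence(baskets, {"wine"}, {"cheese"})  # 2 of the 3 wine baskets
```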


SPATIAL ASSOCIATION RULE ALGORITHM


CLASSIFICATION AND REGRESSION

1. K-nearest neighbour (kNN)
   1. Objects with similar characteristics possess similar class values
2. Model trees
3. Geographically weighted regression (GWR)
4. Kriging
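A minimal kNN sketch, assuming that "similar characteristics" simply means closeness in planar coordinates and that ties are not a concern. The function name, the value k = 3 and the land-use labels are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Predict the class of `query` by majority vote among the k
    training points closest to it (Euclidean distance)."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical labelled locations: ((x, y), land-use class).
train = [
    ((0.0, 0.0), "urban"),
    ((1.0, 0.5), "urban"),
    ((0.5, 1.0), "urban"),
    ((5.0, 5.0), "forest"),
    ((6.0, 5.5), "forest"),
    ((5.5, 6.0), "forest"),
]

pred_a = knn_classify(train, (0.8, 0.8))  # surrounded by "urban" points
pred_b = knn_classify(train, (5.2, 5.4))  # surrounded by "forest" points
```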


SPATIAL DECISION TREE

1. Approach similar to that used for spatial association rules.

2. Spatial objects can be described based on objects close to them – Buffer.

3. Description of class based on aggregation of nearby objects.
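The buffer idea can be sketched as a pre-processing step: describe each object by counts of the object types found within a given radius, and pass those counts to an ordinary decision tree learner as features. The circular buffer, the function name and the sample objects are assumptions for illustration, not the slide's algorithm.

```python
import math


def buffer_features(target, others, radius):
    """Describe `target` by aggregating the objects that lie within
    `radius` of it (a circular buffer): one count per object type."""
    counts = {}
    for location, kind in others:
        if math.dist(target, location) <= radius:
            counts[kind] = counts.get(kind, 0) + 1
    return counts


# Hypothetical neighbouring objects: ((x, y), type).
others = [
    ((1.0, 0.0), "school"),
    ((0.0, 1.5), "park"),
    ((0.5, 0.5), "park"),
    ((4.0, 4.0), "factory"),
]

# The factory lies outside the buffer and is not counted.
features = buffer_features((0.0, 0.0), others, radius=2.0)
```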


SPATIAL DECISION TREE ALGORITHM


SUBGROUP DISCOVERY

1. Analyses dependencies between a target variable and several explanatory variables

2. Subgroup discovery is a multi-relational approach that searches for probabilistically defined deviation patterns (Klösgen 1996, Wrobel 1997)

3. Top-down search from the most general to the most specific subgroups, exploiting the partial ordering of subgroups (S1 ≥ S2 means S1 is more general than S2)

4. Beam search: expanding only the n best subgroups at each level of search

5. Evaluating hypotheses according to a quality function:

T = long-term illness=high
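The quality function shown on the slide is not preserved in this text. As an illustration only, the sketch below uses weighted relative accuracy (WRAcc), a commonly used subgroup quality measure, which may differ from the one the slide showed. It rewards subgroups that are both large and unusually rich in the target. All counts are invented.

```python
def wracc(n_subgroup, n_subgroup_pos, n_total, n_total_pos):
    """Weighted relative accuracy of a subgroup: (n / N) * (p - p0),
    where p is the target share inside the subgroup and p0 is the
    target share in the whole population."""
    coverage = n_subgroup / n_total
    p = n_subgroup_pos / n_subgroup
    p0 = n_total_pos / n_total
    return coverage * (p - p0)


# Hypothetical counts: 1000 districts, 200 of them with the target
# (long-term illness = high); a candidate subgroup covers 100 districts,
# 50 of which have the target.
q = wracc(n_subgroup=100, n_subgroup_pos=50, n_total=1000, n_total_pos=200)

# A subgroup with the same target share as the population scores zero.
q_neutral = wracc(n_subgroup=100, n_subgroup_pos=20,
                  n_total=1000, n_total_pos=200)
```

In a beam search, candidate subgroups at each level would be ranked by such a score and only the n best kept for further specialisation.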


MOBILE POSITIONING – TRAJECTORIES IN TIMESPACE PRISM


What about time?


SPATIO-TEMPORAL DATA

Spatial data

1. n points (locations)
2. Distance is important: clustering pattern
3. Presence of attributes (e.g. man/woman): co-location patterns

Spatio-temporal data

1. n trajectories, each has t time steps
2. Distance is time-dependent: flock pattern, meet pattern
3. Heading and speed are important and are also time-dependent


TRAJECTORIES

1. Flock: near positions of (sub)trajectories for some subset of the entities during some time
2. Convergence: same destination region for some subset of the entities
3. Encounter: same destination region with same arrival time for some subset of the entities
4. Similarity of trajectories
5. Same direction of movement, leadership, ...

[Figure: flock and convergence patterns]
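A simplified flock test can be sketched as follows, assuming all trajectories are sampled at the same time steps and that "near" means within a fixed radius of the group's centroid. The function only checks a given subset of entities and does not search for subsets. The names, radius and sample trajectories are hypothetical.

```python
import math


def is_flock(trajectories, members, radius, min_steps):
    """Return True if the given entities stay within `radius` of their
    common centroid for at least `min_steps` consecutive time steps.

    trajectories: dict mapping entity id -> list of (x, y) positions,
    all lists having the same length (one position per time step).
    """
    n_steps = len(trajectories[members[0]])
    run = 0  # length of the current run of "together" time steps
    for step in range(n_steps):
        pts = [trajectories[m][step] for m in members]
        cx = sum(p[0] for p in pts) / len(pts)
        cy = sum(p[1] for p in pts) / len(pts)
        if all(math.dist(p, (cx, cy)) <= radius for p in pts):
            run += 1
            if run >= min_steps:
                return True
        else:
            run = 0
    return False


# Hypothetical trajectories with four time steps each.
trajs = {
    "a": [(0, 0), (1, 1), (2, 2), (3, 3)],
    "b": [(0, 1), (1, 2), (2, 3), (9, 9)],
    "c": [(8, 0), (6, 1), (4, 2), (3, 4)],
}

ab_flock = is_flock(trajs, ["a", "b"], radius=1.0, min_steps=3)
ac_flock = is_flock(trajs, ["a", "c"], radius=1.0, min_steps=3)
```

In this sample, `a` and `b` move side by side for three steps and count as a flock, while `a` and `c` only end up in the same area at the end, which is closer to the convergence pattern.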


CONCLUSION

1. Spatial patterns are the opposite of random
2. Common spatial patterns: location prediction, feature interaction, hot spots, geographically referenced statistical patterns, co-location, emergent patterns, …
3. SDM (spatial data mining) = search for unexpected interesting patterns in large spatial databases
4. Spatial patterns may be discovered using
   1. Techniques like classification, associations, clustering and outlier detection
   2. New techniques are needed for SDM due to
      1. Spatial autocorrelation
      2. Importance of non-point data types (e.g. polygons)
      3. Continuity of space
      4. Regional knowledge; also establishes a need for scoping
      5. Separation between spatial and non-spatial subspace: in traditional approaches clusters are usually defined over the complete attribute space


SOME GIS – DATA MINING SOFTWARE

1. GeoMiner
2. SPIN!
3. INGENS


REFERENCES

1. Giannotti F., Pedreschi D. “Mobility, Data Mining and Privacy”
2. Miller H. J., Han J. “Geographic Data Mining and Knowledge Discovery”
3. Dunham M. H. “Data Mining. Introductory and Advanced Topics. Part III”
4. Tama A. B. “Introduction to Data Mining”
5. Eick C. F. “Brief Introduction to Spatial Data Mining”
