16.12.2009M
TAT.03.249
SHOPPING CART
1. Classical shopping basket analysis
2. Association rule: wine -> cheese
9. Is there temporal variability in shopping behaviour?
10. Is there spatial correlation of basket items within a region?
11. Is there spatial and temporal variability in shopping behaviour?
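The rule wine -> cheese is usually judged by its support and confidence. The sketch below shows one way to compute both; the transactions and item names are made-up illustrative data, not taken from the slides.

```python
# Hypothetical toy baskets; item names are illustrative assumptions.
transactions = [
    {"wine", "cheese", "bread"},
    {"wine", "cheese"},
    {"wine", "olives"},
    {"cheese", "bread"},
    {"wine", "cheese", "olives"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / support(antecedent, transactions)


rule_support = support({"wine", "cheese"}, transactions)
rule_confidence = confidence({"wine"}, {"cheese"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
# prints: support=0.60, confidence=0.75
```

Questions 9 to 11 would amount to computing the same measures separately per time window or per region and comparing the results.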
WHY ARE WE LOOKING INTO SPATIAL DATA?
2. Loads of new digital spatial data are created every day
3. The number of possible geographic hypotheses is too large to explore manually
4. New dimensions:
   1. Temporal dimension – variability of the data in time
   2. Spatial dimension – localisation, neighbouring relations, sizes etc. in geographical space, social networking space, outer space, ...
DEFINITION
1. Geographic data mining involves the application of computational tools to reveal interesting patterns in objects and events distributed in geographic space and across time. (Miller & Han, 2001)
EXAMPLES
1. 1854 Asiatic cholera outbreak in London: a water pump identified as the source
3. Meteorology
   1. Patterns of the Gulf Stream and predictive models
   2. Climate change
4. Biology
5. Geology
6. Medicine
   1. Disease clusters by Estonian Genome Project
7. Marketing
   1. Client clusters
8. Infrastructure
   1. Technical faults in electrical network systems
9. Police
   1. Crime analysis
10. Geography
   1. Census data
11. Agriculture
SPATIAL OBJECT
1. Conceptual model
   1. Field model – a function of location in 2D or 3D
   2. Discrete objects – point, line, polygon
3. Contains both spatial and nonspatial attributes.
4. Must have a location-type attribute:
   1. Latitude/longitude
   2. Zip code
   3. Street address
5. May retrieve object using either (or both) spatial or nonspatial attributes.
CHALLENGES OF SPATIAL DATA
1. Tobler’s first law of geography
   1. Everything is related to everything else, but near things are more related than distant things
   2. It means strong autocorrelation between objects
2. Scale of resolution
   1. Dependencies on a small scale turn into random variation when analyzed using a broader scale of measure
3. Relationships of objects
4. Manhattan distance versus Euclidean distance
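The choice between Manhattan and Euclidean distance can change which object counts as "nearest". A minimal sketch, assuming planar (x, y) coordinates; the points are made up for illustration:

```python
import math


def euclidean(p, q):
    """Straight-line distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """City-block distance: sum of the absolute coordinate differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


print(euclidean((0, 0), (3, 4)), manhattan((0, 0), (3, 4)))  # prints: 5.0 7

# The nearest candidate to the query depends on the metric.
query = (0, 0)
candidates = [(3, 3), (0, 5)]
nearest_euclidean = min(candidates, key=lambda c: euclidean(query, c))
nearest_manhattan = min(candidates, key=lambda c: manhattan(query, c))
print(nearest_euclidean, nearest_manhattan)  # prints: (3, 3) (0, 5)
```

In a street grid the Manhattan result is often the more realistic one, since travel cannot follow the diagonal.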
HOW TO DEAL?
3. Spatial attributes as ordinary variables
5. Pre-process for feature extraction
7. Special algorithms
CLUSTERING
1. Detect clusters of irregular shapes
2. Clusters – non-overlapping heterogeneous groups
3. Use of centroids and simple distance approaches may not work well.
4. Clusters should be independent of order of input.
NEAREST NEIGHBOUR CLUSTERING ALGORITHM
2. Given n elements x1, x2, …, xn and a threshold t:
   1. j ← 1, k ← 1, Clusters = {}
   2. Repeat
      1. Find the nearest neighbour of xj among the elements already assigned to a cluster
      2. Let the nearest neighbour be in cluster m
      3. If no element has been clustered yet, or the distance to the nearest neighbour > t, then create a new cluster k containing xj and k ← k+1; else assign xj to cluster m
      4. j ← j+1
   3. until j > n
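The steps above can be sketched in Python as follows. This is one reading of the pseudocode, assuming Euclidean distance and that the first element always opens the first cluster; the function name and sample points are illustrative.

```python
import math


def nearest_neighbour_clustering(points, t):
    """Label each point with the cluster of its nearest already-clustered
    neighbour if that neighbour is within distance t; otherwise open a
    new cluster. Returns a list of labels parallel to `points`."""
    labels = []
    next_label = 0
    for j, p in enumerate(points):
        best_dist, best_label = math.inf, None
        for i in range(j):  # only points processed so far are clustered
            d = math.dist(p, points[i])
            if d < best_dist:
                best_dist, best_label = d, labels[i]
        if best_label is None or best_dist > t:
            labels.append(next_label)  # create a new cluster
            next_label += 1
        else:
            labels.append(best_label)  # join the neighbour's cluster
    return labels


points = [(0, 0), (1, 0), (10, 10), (11, 10), (1, 1)]
labels = nearest_neighbour_clustering(points, t=2.0)
print(labels)  # prints: [0, 0, 1, 1, 0]
```

Because each point is compared only with points seen earlier, the result can depend on the order of the input.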
ASSOCIATION RULES
1. Classical method:
   1. Association rule given item-types and transactions
   2. Assumes spatial data can be decomposed into transactions
   3. However, such decomposition may alter spatial patterns
2. New spatial methods
   1. Spatial association rules
   2. Spatial co-locations
3. Note: Association rules or co-location rules are fast filters to reduce the number of pairs for rigorous statistical analysis, e.g. correlation analysis, cross-K-function for spatial interaction etc.
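Co-location methods work on neighbourhoods rather than on transactions. One common measure for a pair of feature types is the participation index; the sketch below assumes Euclidean distance and a fixed neighbourhood radius r, and the event data and type names are hypothetical.

```python
import math

# Hypothetical point events: (feature type, x, y).
events = [
    ("bar", 0.0, 0.0), ("bar", 5.0, 5.0), ("bar", 20.0, 20.0),
    ("taxi", 0.5, 0.0), ("taxi", 5.0, 5.5),
    ("taxi", 40.0, 40.0), ("taxi", 41.0, 40.0),
]


def participation_ratio(events, a, b, r):
    """Fraction of type-`a` events with at least one type-`b` event within r.

    Assumes at least one event of type `a` exists.
    """
    a_pts = [(x, y) for f, x, y in events if f == a]
    b_pts = [(x, y) for f, x, y in events if f == b]
    hits = sum(any(math.dist(p, q) <= r for q in b_pts) for p in a_pts)
    return hits / len(a_pts)


def participation_index(events, a, b, r):
    """Smaller of the two participation ratios for the pair {a, b}."""
    return min(participation_ratio(events, a, b, r),
               participation_ratio(events, b, a, r))


pi_bar_taxi = participation_index(events, "bar", "taxi", r=1.0)
print(pi_bar_taxi)  # prints: 0.5
```

Pairs whose index passes a chosen threshold would then go on to the more rigorous statistical analysis mentioned in the note.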
CLASSIFICATION AND REGRESSION
1. K-nearest neighbour (kNN)
   1. Objects with similar characteristics possess similar class values
2. Model trees
4. Geographically weighted regression (GWR)
6. Kriging
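Of the methods listed, kNN is the simplest to sketch: a location receives the majority class of its k nearest labelled neighbours. The example assumes Euclidean distance on (x, y) coordinates; the training points and class names are made up.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train: list of ((x, y), label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled locations.
train = [
    ((0, 0), "urban"), ((1, 0), "urban"), ((0, 1), "urban"),
    ((10, 10), "rural"), ((11, 10), "rural"), ((10, 11), "rural"),
]

print(knn_predict(train, (0.5, 0.5)))  # prints: urban
print(knn_predict(train, (9, 9)))      # prints: rural
```

An odd k avoids tied votes when there are only two classes.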
SPATIAL DECISION TREE
1. Approach similar to that used for spatial association rules.
2. Spatial objects can be described based on objects close to them – Buffer.
3. Description of class based on aggregation of nearby objects.
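The buffer idea in points 2 and 3 can be sketched as a feature-extraction step: attributes of the objects inside a circular buffer are aggregated into new variables, which an ordinary decision tree learner could then use. The attribute name `income`, the radius and the data are assumptions made for illustration.

```python
import math


def buffer_features(target, neighbours, radius):
    """Aggregate attributes of all neighbours within `radius` of `target`.

    target: (x, y); neighbours: list of dicts with keys x, y, income.
    """
    inside = [n for n in neighbours
              if math.dist(target, (n["x"], n["y"])) <= radius]
    count = len(inside)
    mean_income = sum(n["income"] for n in inside) / count if count else None
    return {"count": count, "mean_income": mean_income}


# Hypothetical nearby objects.
neighbours = [
    {"x": 1, "y": 0, "income": 10},
    {"x": 0, "y": 2, "income": 30},
    {"x": 5, "y": 5, "income": 100},
]

features = buffer_features((0, 0), neighbours, radius=3)
print(features)  # prints: {'count': 2, 'mean_income': 20.0}
```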
SUBGROUP DISCOVERY
1. Analyses dependencies between a target variable and several explanatory variables
2. Subgroup discovery is a multi-relational approach that searches for probabilistically defined deviation patterns (Klösgen 1996, Wrobel 1997)
3. Top-down search from most general to most specific subgroups, exploiting partial ordering of subgroups (S1 ≥ S2: S1 is more general than S2)
4. Beam search expanding only the n best ones at each level of search
5. Evaluating hypotheses according to a quality function:
T = long-term illness=high
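One widely used quality function for subgroup discovery is weighted relative accuracy (WRAcc). It is used in the sketch below only as an assumed stand-in and is not necessarily the function meant on this slide. The records and the attribute names `region` and `illness` are hypothetical.

```python
def wracc(records, in_subgroup, is_target):
    """Weighted relative accuracy: (n / N) * (p - p0).

    n / N: share of records covered by the subgroup,
    p:     target share inside the subgroup,
    p0:    target share in the whole population.
    """
    total = len(records)
    subgroup = [r for r in records if in_subgroup(r)]
    if not subgroup:
        return 0.0
    p0 = sum(is_target(r) for r in records) / total
    p = sum(is_target(r) for r in subgroup) / len(subgroup)
    return (len(subgroup) / total) * (p - p0)


# Hypothetical records: 4 in the north (3 with high illness), 4 in the south (1).
records = (
    [{"region": "north", "illness": "high"}] * 3
    + [{"region": "north", "illness": "low"}]
    + [{"region": "south", "illness": "high"}]
    + [{"region": "south", "illness": "low"}] * 3
)

quality = wracc(
    records,
    in_subgroup=lambda r: r["region"] == "north",
    is_target=lambda r: r["illness"] == "high",
)
print(quality)  # prints: 0.125
```

A beam search would compute such a score for every candidate subgroup at a level and keep only the n best for further refinement.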
SPATIO-TEMPORAL DATA
Spatial data
   n points (locations)
   Distance is important: clustering pattern
   Presence of attributes (e.g. man/woman): co-location patterns

Spatio-temporal data
   n trajectories, each has t time steps
   Distance is time-dependent: flock pattern, meet pattern
   Heading and speed are important and are also time-dependent
TRAJECTORIES
Flock: near positions of (sub)trajectories for some subset of the entities during some time
Convergence: same destination region for some subset of the entities
Encounter: same destination region with same arrival time for some subset of the entities
Similarity of trajectories
Same direction of movement, leadership, ...
[Figure: flock and convergence patterns]
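A much simplified check for the flock pattern can be sketched as follows. It tests one given group of trajectories rather than searching over all subsets, and it treats "near" as "within distance r of the group centroid"; both are simplifying assumptions, as are the function name and the sample data. All trajectories are assumed to be non-empty and sampled at the same time steps.

```python
import math


def flock_intervals(trajectories, r, min_steps):
    """Return inclusive (start, end) time-step ranges of at least `min_steps`
    steps during which every trajectory stays within r of the group centroid."""
    steps = len(trajectories[0])
    intervals, start = [], None
    for t in range(steps):
        pts = [traj[t] for traj in trajectories]
        cx = sum(x for x, _ in pts) / len(pts)
        cy = sum(y for _, y in pts) / len(pts)
        together = all(math.dist(p, (cx, cy)) <= r for p in pts)
        if together and start is None:
            start = t  # a candidate flock interval begins
        elif not together and start is not None:
            if t - start >= min_steps:
                intervals.append((start, t - 1))
            start = None
    if start is not None and steps - start >= min_steps:
        intervals.append((start, steps - 1))
    return intervals


# Two hypothetical entities that move side by side during steps 1 to 3.
a = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
b = [(0, 5), (1, 0.5), (2, 0.5), (3, 0.5), (4, 6), (5, 6)]

flocks = flock_intervals([a, b], r=0.5, min_steps=3)
print(flocks)  # prints: [(1, 3)]
```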
CONCLUSION
1. Spatial patterns are opposite of random
2. Common spatial patterns: location prediction, feature interaction, hot spots, geographically referenced statistical patterns, co-location, emergent patterns, …
3. SDM (spatial data mining) = search for unexpected interesting patterns in large spatial databases
4. Spatial patterns may be discovered using
   1. Techniques like classification, associations, clustering and outlier detection
   2. New techniques are needed for SDM due to
      Spatial auto-correlation
      Importance of non-point data types (e.g. polygons)
      Continuity of space
      Regional knowledge; also establishes a need for scoping
      Separation between spatial and non-spatial subspace: in traditional approaches clusters are usually defined over the complete attribute space
REFERENCES
1. Giannotti F., Pedreschi D. “Mobility, Data Mining and Privacy”
2. Miller J. H., Han J. “Geographic Data Mining and Knowledge Discovery”
4. Dunham H. M. “Data Mining. Introductory and Advanced Topics. Part III”
5. Tama A. B. “Introduction to Data Mining”
6. Eick F. C. “Brief Introduction to Spatial Data Mining”