krishver2
Spatial Data Mining
Spatial Database
• Stores a large amount of space-related data
  • Maps
  • Remote sensing
  • Medical imaging
  • VLSI chip layout
• Has topological and distance information
• Requires spatial indexing, data access, reasoning, geometric computation, and knowledge representation techniques
Spatial Data Mining
• Extraction of knowledge and spatial relationships from spatial databases
• Can be used for understanding spatial data and spatial relationships
• Applications: GIS, geomarketing, remote sensing, image database exploration, medical imaging, navigation
• Challenges
  • Complexity of spatial data types and access methods
  • Large amounts of data
Cont.
• Non-spatial information
  • Same as data in traditional data mining
  • Numerical, categorical, ordinal, boolean, etc.
  • e.g., city name, city population
• Spatial information
  • Spatial attribute: geographically referenced
  • Neighborhood and extent
  • Location, e.g., longitude, latitude, elevation
• Spatial data representations
  • Raster: gridded space
  • Vector: point, line, polygon
  • Graph: node, edge, path
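The three representations can be illustrated with a minimal Python sketch. All names and toy values here (raster, polygon, graph, polygon_area) are assumptions made for illustration; real systems typically rely on a GIS library rather than plain lists.

```python
# Raster: space is divided into a regular grid, one value per cell.
# The integer codes are hypothetical land-cover classes.
raster = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 1],
]

# Vector: geometry is stored as explicit coordinates.
point = (2.0, 3.0)
line = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]

# Graph: nodes connected by edges (here an adjacency list),
# e.g. road junctions joined by road segments.
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to the first vertex
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


area = polygon_area(polygon)
```

The vector form makes geometric computation such as area direct, while the raster form makes cell-by-cell lookups (raster[row][col]) trivial.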
Spatial Data
Cont.
Statistical techniques
• Popular approach to analyzing spatial data
• Assumes independence among spatial data
• Can be performed only by experts
• Does not work well with symbolic values
Spatial Data Warehousing
• Spatial data warehouse: an integrated, subject-oriented, time-variant, and nonvolatile spatial data repository
  • It consists of both spatial and non-spatial data in support of spatial data mining and spatial-data-related decision-making processes
• Spatial data cube: a multidimensional spatial database
  • Both dimensions and measures may contain spatial components
• Challenging issues:
  • Spatial data integration: a big issue
    • Structure-specific formats (raster- vs. vector-based, OO vs. relational models, different storage and indexing, etc.)
    • Vendor-specific formats (ESRI, MapInfo, Intergraph, IDRISI, etc.)
  • Realization of fast and flexible OLAP in spatial data warehouses
Dimensions and Measures in Spatial Data Warehouse
• Dimensions
  • Non-spatial
    • e.g., “25-30 degrees” generalizes to “hot” (both are strings)
  • Spatial-to-non-spatial
    • e.g., Seattle generalizes to the description “Pacific Northwest” (as a string)
  • Spatial-to-spatial
    • e.g., Seattle generalizes to Pacific Northwest (as a spatial region)
• Measures
  • Numerical (e.g., monthly revenue of a region)
    • Distributive (e.g., count, sum)
    • Algebraic (e.g., average)
    • Holistic (e.g., median, rank)
  • Spatial
    • Collection of spatial pointers (e.g., pointers to all regions with a temperature of 25-30 degrees in July)
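The difference between distributive, algebraic, and holistic measures shows up when two regions are merged. The following is a minimal sketch; the revenue figures and the helper names summarize and merge are assumptions for illustration only.

```python
from statistics import median

# Hypothetical monthly revenue values for two sub-regions.
region_a = [10.0, 20.0, 30.0]
region_b = [40.0, 50.0]


def summarize(values):
    """Distributive measures: each can be computed per partition."""
    return {"count": len(values), "sum": sum(values)}


def merge(s1, s2):
    """Distributive summaries combine without revisiting the raw data."""
    return {"count": s1["count"] + s2["count"], "sum": s1["sum"] + s2["sum"]}


merged = merge(summarize(region_a), summarize(region_b))

# Algebraic: average is derived from a fixed number of distributive
# measures (sum and count).
average = merged["sum"] / merged["count"]

# Holistic: the median of the merged region needs all the raw values.
overall_median = median(region_a + region_b)

# Combining the per-region medians does not give the same answer.
mean_of_medians = (median(region_a) + median(region_b)) / 2
```

Here the merged average can be computed from the stored summaries alone, whereas overall_median and mean_of_medians differ, which is why holistic measures are costly to maintain in a data cube.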
Example: British Columbia Weather Pattern Analysis
• Input
  • A map with about 3,000 weather probes scattered in B.C.
  • Each probe records daily data for temperature, precipitation, wind velocity, etc. for a designated small area and transmits signals to a provincial weather station
  • Data warehouse using a star schema
• Output
  • A map that reveals patterns: merged (similar) regions
• Goals
  • Interactive analysis (drill-down, slice, dice, pivot, roll-up)
  • Fast response time
  • Minimizing storage space used
• Challenge
  • A merged region may contain hundreds of “primitive” regions (polygons)
Star Schema of the BC Weather Warehouse
• Spatial data warehouse
  • Dimensions
    • region_name
    • time
    • temperature
    • precipitation
  • Measurements
    • region_map
    • area
    • count
[Figure: star schema showing the fact table and the dimension tables]
• Can we precompute all of the possible spatial merges and store them in the corresponding cuboid cells of a spatial data cube?
  • Probably not: it requires multi-megabytes of storage
• On-line computation is slow and expensive
Dynamic Merging of Spatial Objects
Methods for Computing Spatial Data Cubes
• On-line aggregation: collect and store pointers to spatial objects in a spatial data cube
  • Expensive and slow; needs efficient aggregation techniques
• Precompute and store all the possible combinations
  • Huge space overhead
• Precompute and store rough approximations in a spatial data cube
  • Accuracy trade-off, e.g., using the minimum bounding rectangle (MBR)
• Selective computation: only materialize those which will be accessed frequently
  • A reasonable choice
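The accuracy trade-off of the rough-approximation option can be seen in a small sketch that stores only an MBR per region and merges MBRs instead of polygons. The two toy regions and the function names mbr, merge_mbr, and mbr_area are assumptions for illustration.

```python
def mbr(points):
    """Minimum bounding rectangle as (min_x, min_y, max_x, max_y)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def merge_mbr(a, b):
    """MBR of a merged region, computed from two stored MBRs only."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def mbr_area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


# Two hypothetical rectangular regions that touch at one corner.
region_1 = [(0, 0), (2, 0), (2, 1), (0, 1)]   # true area 2
region_2 = [(2, 1), (3, 1), (3, 3), (2, 3)]   # true area 2

approx = merge_mbr(mbr(region_1), mbr(region_2))
approx_area = mbr_area(approx)   # area of the approximation
true_area = 2 + 2                # area of the exact merged region
```

Merging the MBRs takes constant time and four numbers of storage per cell, but in this example the approximation covers more than twice the true merged area.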
Mining Spatial Association and Co-location Patterns
• Spatial association rule: A → B [s%, c%]
  • A and B are sets of spatial or non-spatial predicates
    • Topological relations: intersects, overlaps, disjoint, etc.
    • Spatial orientations: left_of, west_of, under, etc.
    • Distance information: close_to, within_distance, etc.
  • s% is the support and c% is the confidence of the rule
• Example
  • is_a(x, “School”) ^ close_to(x, “Sports_Center”) → close_to(x, “Park”) [7%, 85%]
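How support and confidence are computed for such a rule can be sketched as follows. The sketch assumes the usual association-rule definitions (support is the fraction of objects satisfying both A and B, confidence is the fraction of objects satisfying A that also satisfy B) and assumes the spatial predicates have already been evaluated for each object; the toy data and the function name support_confidence are hypothetical.

```python
# Each object is represented by the set of predicates that hold for it.
objects = [
    {"is_a_school", "close_to_sports_center", "close_to_park"},
    {"is_a_school", "close_to_sports_center", "close_to_park"},
    {"is_a_school", "close_to_sports_center"},
    {"is_a_school", "close_to_park"},
    {"close_to_park"},
]


def support_confidence(objects, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n_a = sum(1 for o in objects if antecedent <= o)
    n_ab = sum(1 for o in objects if (antecedent | consequent) <= o)
    support = n_ab / len(objects)
    confidence = n_ab / n_a if n_a else 0.0
    return support, confidence


rule_a = {"is_a_school", "close_to_sports_center"}
rule_b = {"close_to_park"}
support, confidence = support_confidence(objects, rule_a, rule_b)
```

With this toy data, three objects satisfy A and two of them also satisfy B, so the rule holds with support 2/5 and confidence 2/3.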
Progressive Refinement
• Progressive refinement: spatial association mining needs to evaluate multiple spatial relationships among a large number of spatial objects, which is expensive
• Hierarchy of spatial relationships:
  • First search for a rough relationship and then refine it
  • Superset coverage property: all the potential answers should be preserved (i.e., a false-positive test: the rough step may admit false positives but must not drop true answers)
• Two-step mining of spatial association:
  • Step 1: Rough spatial computation (as a filter)
    • Using MBR for rough estimation
  • Step 2: Detailed spatial algorithm (as refinement)
    • Apply only to those objects which have passed the rough spatial association test (no less than min_support)
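The two steps can be sketched for a close_to predicate. This is a minimal illustration under stated assumptions: objects are simplified to sets of points, the "detailed" step is a brute-force pairwise distance, and the names mbr_distance, exact_distance, and close_pairs are hypothetical. It covers only the spatial filter and refinement, not the min_support counting.

```python
from math import hypot


def mbr(points):
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def mbr_distance(a, b):
    """Distance between two MBRs: a lower bound on the true distance."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return hypot(dx, dy)


def exact_distance(p, q):
    """Detailed (expensive) test: minimum pairwise point distance."""
    return min(hypot(x1 - x2, y1 - y2) for x1, y1 in p for x2, y2 in q)


def close_pairs(objects, threshold):
    names = sorted(objects)
    boxes = {n: mbr(objects[n]) for n in names}
    # Step 1: rough filter. The MBR distance never exceeds the true
    # distance, so no true answer is lost (superset coverage).
    candidates = [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if mbr_distance(boxes[a], boxes[b]) <= threshold
    ]
    # Step 2: refinement, applied only to the surviving candidates.
    refined = [
        (a, b)
        for a, b in candidates
        if exact_distance(objects[a], objects[b]) <= threshold
    ]
    return candidates, refined


objects = {
    "school": [(0.0, 0.0), (1.0, 1.0)],
    "park": [(1.5, 1.5), (2.0, 2.0)],
    "lake": [(1.2, 5.0), (5.0, 0.5)],   # large MBR, but points are far away
    "mall": [(10.0, 10.0), (11.0, 11.0)],
}
candidates, refined = close_pairs(objects, threshold=1.0)
```

In this toy data the filter discards every pair involving the mall, keeps the lake pairs as false positives because of the lake's large MBR, and the refinement step then leaves only the (park, school) pair.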
Spatial co-locations
• Just what one really wants to explore
• Based on the property of spatial autocorrelation, interesting features likely coexist in closely located regions
• Efficient methods: Apriori, progressive refinement, etc.
Spatial Cluster Analysis
• Mining clusters—k-means, k-medoids, hierarchical, density-based, etc.
• Analysis of distinct features of the clusters
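As one instance of the methods listed above, k-means on point locations can be sketched in a few lines. This is a bare-bones illustration, assuming 2-D points, Euclidean distance, and caller-supplied initial centers; the function name kmeans and the toy coordinates are hypothetical.

```python
from math import dist


def kmeans(points, centers, iterations=10):
    """Plain k-means: assign each point to its nearest center, then
    move each center to the mean of its members, until stable."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(old)  # keep the center of an empty cluster
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return centers, clusters


# Two hypothetical groups of locations.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, centers=[(0, 0), (10, 10)])
```

Once the clusters are found, the second bullet applies: the non-spatial attributes of the members of each cluster can be compared to characterize what distinguishes one cluster from another.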
Spatial Classification
• Analyze spatial objects to derive classification schemes, such as decision trees, in relevance to certain spatial properties (district, highway, river, etc.)
  • Classifying medium-size families according to income, region, and infant mortality rates
  • Mining for volcanoes on Venus
• Employ methods such as:
  • Decision-tree classification, Naïve-Bayesian classifier + boosting, neural network, genetic programming, etc.
Spatial Trend Analysis
• Function
  • Detect changes and trends along a spatial dimension
  • Study the trend of non-spatial or spatial data changing with space
• Application examples
  • Observe the trend of changes of the climate or vegetation with increasing distance from an ocean
  • Crime rate or unemployment rate change with regard to city geo-distribution
  • Traffic flows in highways and in cities
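One simple way to detect a trend along a spatial dimension is to regress a non-spatial attribute on distance. The sketch below fits an ordinary least-squares line; the choice of linear regression, the function name linear_trend, and the rainfall figures are assumptions for illustration, not a method prescribed by the slides.

```python
def linear_trend(distances, values):
    """Least-squares slope and intercept of values against distances."""
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: rainfall (mm) at increasing distance (km)
# from an ocean.
distance_km = [0, 10, 20, 30, 40]
rainfall_mm = [100, 90, 80, 70, 60]

slope, intercept = linear_trend(distance_km, rainfall_mm)
```

A negative slope indicates that the attribute decreases with distance from the ocean; in this toy data rainfall drops by 1 mm per km.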
Mining Raster Databases
• Vector data mining
  • Maps
  • Graphs
  • Molecular chains
• Raster data mining
  • Satellite images
Other Applications
• Spatial data mining is used in:
  • NASA Earth Observing System (EOS): Earth science data
  • National Inst. of Justice: crime mapping
  • Census Bureau, Dept. of Commerce: census data
  • Dept. of Transportation (DOT): traffic data
  • National Inst. of Health (NIH): cancer clusters
  • Commerce, e.g., retail analysis