Title: Spatial Data Mining in Geo-Business


Overview

• Twisting the Perspective of Map Surfaces: describes the character of spatial distributions through the generation of a customer density surface
• Linking Numeric and Geographic Distributions: investigates the link between numeric and geographic distributions of mapped data
• Interpolating Spatial Distributions: discusses the basic concepts underlying spatial interpolation
• Interpreting Interpolation Results: describes the use of “residual analysis” for evaluating spatial interpolation performance
• Characterizing Data Groups: describes the use of “data distance” to derive similarity among the data patterns in a set of map layers
• Identifying Data Zones: describes the use of “level-slicing” for classifying locations with a specified data pattern (data zones)
• Mapping Data Clusters: describes the use of “clustering” to identify inherent groupings of similar data patterns
• Mapping the Future: describes the use of “linear regression” to develop prediction equations relating dependent and independent map variables
• Mapping Potential Sales: describes an extensive geo-business application that combines retail competition analysis and product sales prediction

Paper available online at www.innovativegis.com/basis/present/GeoTec08/


Density Surface Analysis

Processing flow: Customer Street Address → (Geo-Coding) → Customer GIS Location → (Vector to Raster) → Customer Counts (# per cell) → (Roving Window) → Density Surface Totals → (Classify) → Classified Density Levels

Vector to Raster: counts the number of customers (points) within each grid cell

Roving Window: calculates the total number of customers within a roving window (customer density)

Displays: 2D grid display of customer counts; 2D perspective display of density contours (Density Map); 3D surface plot
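The counting and roving-window steps on this slide can be sketched in Python. This is a minimal illustration, not the software used in the presentation: the function names `rasterize` and `roving_window_density`, the square window shape, the row/column orientation and the sample coordinates are all assumptions made for the example.

```python
def rasterize(points, cell_size, rows, cols):
    """Vector to raster: count the points that fall inside each grid cell.

    Assumes x maps to columns and y maps to rows, both starting at 0.
    """
    counts = [[0] * cols for _ in range(rows)]
    for x, y in points:
        r, c = int(y // cell_size), int(x // cell_size)
        if 0 <= r < rows and 0 <= c < cols:
            counts[r][c] += 1
    return counts


def roving_window_density(counts, radius=1):
    """Total the counts inside a square window centered on each cell.

    A square window of (2 * radius + 1) cells per side is an assumption;
    the window is clipped at the edges of the grid.
    """
    rows, cols = len(counts), len(counts[0])
    density = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    total += counts[rr][cc]
            density[r][c] = total
    return density


# Hypothetical customer locations in map units (not from the slides).
customers = [(0.5, 0.5), (1.5, 0.2), (1.7, 0.9), (3.9, 3.9)]
customer_counts = rasterize(customers, cell_size=1.0, rows=4, cols=4)
density_surface = roving_window_density(customer_counts, radius=1)
```

A larger `radius` smooths the surface more heavily, which trades local detail for a clearer view of broad pockets of customers.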


Identifying Pockets of High Density

Customer Density (Map Surface)

Customer Density (Non-spatial Statistics)

Unusually High = Mean + 1 Standard Deviation
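As a rough sketch of this cutoff, the Python below flags cells at or above the mean plus one standard deviation of a density surface. The function name `unusually_high_cells`, the use of the population standard deviation, the "at or above" comparison and the sample grid are assumptions for illustration only.

```python
import statistics


def unusually_high_cells(surface, n_std=1.0):
    """Return (cutoff, flags) where flags mark cells >= mean + n_std * stdev."""
    values = [v for row in surface for v in row]
    # Population standard deviation is an assumption; the slide does not say which is used.
    cutoff = statistics.mean(values) + n_std * statistics.pstdev(values)
    flags = [[1 if v >= cutoff else 0 for v in row] for row in surface]
    return cutoff, flags


# Hypothetical density surface (customers per roving window).
density = [[1, 2], [3, 10]]
cutoff, high_density = unusually_high_cells(density)
```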

Grid-based Analysis Frame (Keystone Concept)

Customer Database (non-spatial)

…GeoCoding plots each customer's address on the streets map: Vector (point)

…V to R Conversion plots each customer's location in the analysis frame (grid): Raster (cell)

…appends Lat, Lon, Column, Row location to customer records

Customer Database (spatial): Latitude, Longitude, C, R

Point Samples

Surface Modeling (Spatial Interpolation)

Surface Map

“Spikes ‘n Blanket”

Avg = 42.9

66.3

“Spikes”

66.3

…“maps the variance” by using geographic position to help explain the differences in the sample values.

IDW Interpolation (Inverse Distance Weighted)

1) Identify data points in window:

#11 value = 56.9, #14 value = 22.5, #15 value = 52.3, #16 value = 66.3

2) Calculate distance from location to data points (Pythagorean Theorem):

#11 distance = 22.80, #14 distance = 26.08, #15 distance = 6.32, #16 distance = 14.14

3) Weight-average values in the window based on distance to grid location: (1/Distance)^2 * Value, “closer has more influence”

4) Assign weight-averaged value: 53.35

5) Move window to next grid location and repeat

[Figure: Sampled Data, sample points numbered 1 to 16; points #11, #14, #15 and #16 fall within the window around grid location X]
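The weight-averaging step can be checked with a short Python sketch that uses the four distances and values listed on this slide. The function name `idw_estimate` is hypothetical, and dividing by the sum of the weights is an assumption about how the weighted average is normalized; with that assumption the result agrees with the 53.35 shown in step 4.

```python
def idw_estimate(samples, power=2):
    """Inverse-distance-weighted average of (distance, value) pairs.

    Each weight is (1 / distance) ** power, so closer samples have more influence.
    """
    for distance, value in samples:
        if distance == 0:
            return value  # a sample sitting exactly on the grid location
    weights = [(1.0 / distance) ** power for distance, _ in samples]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, samples))
    return weighted_sum / sum(weights)


# (distance, value) for sample points #11, #14, #15 and #16 from the slide.
window_samples = [(22.80, 56.9), (26.08, 22.5), (6.32, 52.3), (14.14, 66.3)]
estimate = idw_estimate(window_samples)
print(round(estimate, 2))
```

Sample #15, the closest at 6.32 units, carries roughly three quarters of the total weight, which is why the estimate sits near its value of 52.3.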

Average vs. IDW Interpolated Surface

Panels: Average; IDW Surface

Difference Surface (IDW – Average): Min = -26.1, Max = 29.5

Reds: Avg > IDW

Greens: Avg < IDW

IDW vs. Krig Interpolated Surfaces

Panels: Krig Surface; IDW Surface

Difference Surface (IDW – Krig): Min = -14.8, Max = 5.0

Reds: Krig > IDW

Greens: Krig < IDW

Assessing Relationships Among Maps

Housing Density (Units/ac): South has Lower Density

Home Value ($K): South has Higher Values

Home Age (Years): South has Newer Homes

[Figure: map layers in Geographic Space linked to a plot in Data Space with axes Density, Value and Age]

Geographic Space – relative spatial position of measurements

Point #1

Point #2

Data Space – relative numerical magnitude of measurements

Comparison Point #1 D= Low (2.4 units/ac) V= High ($407,000) A= Low (18.3 years)

Least Similar Point #2 D= High (4.8 units/ac) V= Low ($190,000) A= High (51.2 years)

Data Similarity is inversely proportional to Data Distance

…as data distance increases, the map values for two locations are less similar

Assessing Map Similarity

“Data Distance” determines similarity among data patterns

…the farthest point in data space (least similar) is set to 0 and the comparison point is set to 100

Data Space

[Scale: Percent Similar, from 0 (least similar point) to 100 (comparison point)]

Least Similar Point = 4.8, 190, 51.2

Comparison Point = 2.4, 407, 18.3

…all other Data Distances are scaled in terms of their relative similarity as “percent similar” to the comparison point (0 to 100)
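The percent-similar scaling can be sketched in Python as follows. This is only an illustration: it assumes data distance is the straight-line (Euclidean) distance on the raw map values, which lets Home Value dominate because of its larger numeric range; a production analysis might standardize each layer first. The function names and the third cell are hypothetical.

```python
import math


def data_distance(a, b):
    """Euclidean distance between two data patterns (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def percent_similar(comparison, cells):
    """Scale distances so the farthest cell scores 0 and the comparison point 100."""
    distances = [data_distance(comparison, cell) for cell in cells]
    farthest = max(distances)
    if farthest == 0:
        return [100.0] * len(cells)
    return [100.0 * (1.0 - d / farthest) for d in distances]


# (Density, Value, Age) patterns; the first two come from the slide.
comparison_point = (2.4, 407.0, 18.3)
cells = [
    comparison_point,
    (4.8, 190.0, 51.2),   # least similar point from the slide
    (3.6, 300.0, 30.0),   # hypothetical intermediate location
]
similarity = percent_similar(comparison_point, cells)
```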

Geographic Space

Identifying Data Patterns of Interest

[Panels: Geographic Space, Data Space, Geographic Space]

Housing Density: Mean = 3.56, +StDev = 0.80, LevelMin = 4.36 (Unusually High)

Home Value: Mean = 257.0, -StDev = 67.2, LevelMax = 189.8 (Unusually Low)

Level-Slicing Classifier (two variables)

Data Space: Unusually High Housing Density; Unusually Low Home Value

Geographic Space: Unusually High Density and Low Value

Level-Slicing Classifier (three variables)

…common “data zones” can be mapped by identifying specific levels of each mapped variable, then adding the binary maps

Geographic Space …locates combinations of selected measurements

(high D, low V, high A): 1 + 2 + 4 = 7

(high D, low V but not high A): 1 + 2 + 0 = 3

Data Space …identifies combinations of selected measurements (high D, low V, high A)
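The binary-map addition can be sketched in Python. Each condition is coded with its own power of two (1, 2, 4) so that every sum identifies a unique combination, matching the 7 and 3 on the slide. The density and value cutoffs (4.36 and 189.8) come from the earlier slide; the age cutoff of 40.0, the function name `level_slice` and the grid values are hypothetical.

```python
def level_slice(density, value, age, d_min, v_max, a_min):
    """Add three binary maps coded 1 (high D), 2 (low V) and 4 (high A)."""
    rows, cols = len(density), len(density[0])
    zones = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            high_d = 1 if density[r][c] >= d_min else 0
            low_v = 2 if value[r][c] <= v_max else 0
            high_a = 4 if age[r][c] >= a_min else 0
            zones[r][c] = high_d + low_v + high_a
    return zones


# Hypothetical 2 x 2 map layers.
density = [[4.8, 4.5], [2.4, 3.0]]        # units/ac
value = [[185.0, 180.0], [407.0, 250.0]]  # $K
age = [[51.2, 20.0], [18.3, 45.0]]        # years

# a_min = 40.0 is an assumed cutoff; the slides do not give the age level.
zones = level_slice(density, value, age, d_min=4.36, v_max=189.8, a_min=40.0)
```

A cell coded 7 meets all three conditions, while 3 means high density and low value without high age.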

Spatial Data Clustering

…“data clusters” are identified as groups of neighboring data points in Data Space, and then mapped as corresponding grid cells in Geographic Space

Data Space …plots and identifies groups of similar data values

Geographic Space …maps common data patterns (clusters)

Relatively high D, low V and high A

Relatively low D, high V and low A

Two Clusters

Three Clusters

Four Clusters
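The slides do not name a clustering algorithm, so the Python below uses a basic k-means loop purely as an illustrative stand-in. The function names, the choice of the first k cells as starting centers and all cell values beyond the two patterns quoted earlier are assumptions. Each label keeps the index of its grid cell, which is how groups found in Data Space are mapped back to Geographic Space.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two data patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=20):
    """Assign each point to the nearest of k centers, then recompute the centers."""
    centers = [list(p) for p in points[:k]]  # deterministic start (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda j: euclidean(p, centers[j]))
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# (Density, Value, Age) for six grid cells; only the first two appear in the slides.
cells = [
    (4.8, 190.0, 51.2), (2.4, 407.0, 18.3),
    (4.5, 200.0, 48.0), (2.6, 390.0, 20.0),
    (4.6, 195.0, 50.0), (2.5, 400.0, 19.0),
]
labels, centers = kmeans(cells, k=2)
```

Here label 0 collects the relatively high D, low V, high A cells and label 1 the relatively low D, high V, low A cells.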

Spatial Regression (prediction equation)

Map layers (each shaded Low to High): Housing Density, Home Value, Home Age, Loan Concentration

…relationship between Loan Concentration and independent variables Housing Density, Value and Age

Loan Concentration vs. Housing Density: Y = 26 - 5.7 * Xdensity [R^2 = 40%]

Loan Concentration vs. Home Value: Y = -13 + 0.074 * Xvalue [R^2 = 46%]

Loan Concentration vs. Home Age: Y = 17 - 0.074 * Xage [R^2 = 23%]
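A single-variable prediction equation of this form can be fitted and applied with ordinary least squares, sketched below in Python. The function names and the four sample cells are hypothetical; only the coefficients 26 and -5.7 in the last lines come from the slide, and applying them to a density of 2.4 units/ac is just an example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x, with R-squared."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1.0 - ss_res / ss_tot


def predict(intercept, slope, x):
    """Apply a fitted prediction equation to one map value."""
    return intercept + slope * x


# Hypothetical cell values: housing density (x) and loan concentration (y).
density_cells = [2.0, 3.0, 4.0, 5.0]
loan_cells = [16.0, 8.0, 6.0, 2.0]
intercept, slope, r_squared = fit_line(density_cells, loan_cells)

# Applying the slide's density equation, Y = 26 - 5.7 * Xdensity, at 2.4 units/ac.
slide_prediction = predict(26.0, -5.7, 2.4)
```

Evaluating the equation for every grid cell produces a predicted Loan Concentration map that can be compared with the observed one.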

Competition Analysis (Spatial Analysis Steps)

Step 1: Build travel time maps for entire market area
• Compute travel time from every location to our store
• This requires grid-based map analysis software
• Update customer record with travel time to our store
• Add this to every non-customer record in trading area

Step 2: Repeat for every competitor
• Update every customer record with travel time to competitor store
• Add to every non-customer record in trading area

Step 3: Compute Travel Time Gain for travel to main store
• Every customer and non-customer record is updated
• The greater gain indicates lower travel effort to visit our store
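The slides do not give a formula for Travel Time Gain. The Python sketch below assumes it is the competitor's travel time minus the travel time to our store, computed per competitor, so that a larger positive gain means our store is easier to reach. The record layout, field names and times are hypothetical.

```python
def travel_time_gains(record):
    """Return {competitor: competitor_time - our_time} for one record.

    Assumed definition: positive gain means our store is the shorter trip.
    """
    our_time = record["our_store_minutes"]
    return {
        name: minutes - our_time
        for name, minutes in record["competitor_minutes"].items()
    }


# Hypothetical customer and non-customer records with travel times in minutes.
records = [
    {"id": "C001", "our_store_minutes": 10.0,
     "competitor_minutes": {"A": 15.0, "B": 12.0}},
    {"id": "N042", "our_store_minutes": 20.0,
     "competitor_minutes": {"A": 8.0, "B": 30.0}},
]

for record in records:
    record["gain"] = travel_time_gains(record)
```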

Predictive Modeling (Spatial Statistics Steps)

Step 4: Build analytic dataset from customer data
• Geocoding information
• Transactions, sales, product category purchases
• Visitation frequency, recency, spend
• Customer Segment, travel times, demographics

Step 5: Build predictive models
• Probability of Visitation (not possible for this demo)
• Probability of Purchase by Product Category
• Expected Sales and Transactions
• Use store travel time and all competitive differences

Step 6: Map the scores
• The distribution of the scores provides visual evidence of the effects of travel time and competitive pressure
• Spatial hypotheses can be tested and evaluated

Map Analysis Framework

Mapping and Geo-query

While discrete sets of points, lines and polygons have served our mapping demands for over 8,000 years and keep us from getting lost…

…the expression of mapped data as continuous spatial distributions (surfaces) provides a new foothold for the contextual and numerical analysis of mapped data: “Thinking with Maps”


www.innovativegis.com/basis/present/GeoTec08/

…to download this PowerPoint slide set


Weighted Average Calculations for Inverse Distance Weighting (IDW) Spatial Interpolation Technique

Evaluating Interpolation Performance

…Residual Analysis is used to evaluate interpolation performance (Krig at 0.03 Normalized Error is best)

Techniques compared: Average, IDW, Krig
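A residual analysis of this kind can be sketched in Python by comparing estimates with measured values at a set of test locations. The slides do not define Normalized Error; the version below assumes it is the average absolute residual divided by the average measured value. The function name and all numbers are hypothetical.

```python
def residual_summary(measured, estimated):
    """Return residuals, average absolute error and an assumed normalized error."""
    residuals = [est - obs for obs, est in zip(measured, estimated)]
    avg_abs_error = sum(abs(r) for r in residuals) / len(residuals)
    # Assumed definition: average absolute error relative to the average measured value.
    normalized_error = avg_abs_error / (sum(measured) / len(measured))
    return residuals, avg_abs_error, normalized_error


# Hypothetical test set: measured values and one technique's estimates.
measured = [10.0, 20.0, 30.0]
estimated = [12.0, 18.0, 33.0]
residuals, avg_abs_error, normalized_error = residual_summary(measured, estimated)
```

Running the same summary for each technique on the same test locations gives directly comparable scores, with lower values indicating better estimates.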
