
Introduction to GIS Modeling Week 9 — Spatial Data Mining GEOG 3110 –University of Denver

Presented by Joseph K. Berry, W. M. Keck Scholar, Department of Geography, University of Denver. Basic Descriptive Statistics and its GIS Expression: Normalizing maps; Mapping spatial dependency.

Page 1: Introduction to GIS Modeling Week 9 — Spatial Data Mining GEOG 3110 –University of Denver

Presented by Joseph K. Berry
W. M. Keck Scholar, Department of Geography, University of Denver

Linking Numeric and Geographic Patterns: Map comparison; Similarity maps; Clustering mapped data; Investigating map correlation; Developing prediction models; Assessing prediction results

Basic Descriptive Statistics and its GIS Expression: Normalizing maps; Mapping spatial dependency

Page 2:

Kicking at the Finish (Waning Class Moments)

The last of the “Learning Opportunities” that remain are…

• Exercise #9 on Spatial Data Mining (or paper) for 50 points

• Exam #2 on Surface Modeling, Spatial Data Mining and Future Directions material for 150 points

• Optional Exercises for up to 50 extra credit points (can only improve your grade)

• Grad Student Presentations on a topic of their choice for 100 points (15-minute PowerPoint at last class meeting)

Special, special offer: provided you fully participate in the study question “group study”, you can choose not to take the second exam.

Fine print: I will simply allocate the points for the exam according to the current percentage of all of your graded materials, which means not taking the exam has no effect on your grade.

If you choose to take the exam and get a grade below your current percentage of all graded materials, the exam grade will be ignored …therefore taking the exam can only improve your grade.

2nd Exam Study Questions …posted Friday 3/8 by 12:00 noon. Class initiative to “group study” to collectively address the 24 study

Midterm Exam …you will download and take the 2-hour exam online (honor system) sometime between 10:00 am, Friday, March 15 and 5:00 pm, Tuesday, March 19

Page 3:

Spatial Statistics Operations – Numerical Context

GIS and Statistical Perspectives (SS)

Map Analysis Toolbox

Grid Map Layers

GIS Perspective:

Surface Modeling (Density Analysis, Spatial Interpolation, Map Generalization)

Spatial Data Mining (Descriptive, Predictive, Prescriptive)

Statistical Perspective:

Basic Descriptive Statistics (Min, Max, Median, Mean, StDev, etc.)

Basic Classification (Reclassify, Contouring, Normalization)

Map Comparison (Joint Coincidence, Statistical Tests)

Unique Map Statistics (Roving Window and Regional Summaries)

Surface Modeling (Density Analysis, Spatial Interpolation)

Advanced Classification (Map Similarity, Maximum Likelihood, Clustering)

Predictive Statistics (Map Correlation/Regression, Data Mining Engines)

…last week: Surface Modeling (Density Analysis, Spatial Interpolation)

(Berry)

Page 4:

Basic Concepts in Statistics (SN_Curve Shape)

Kurtosis …shape (positive = peaked; negative = flat)

(Berry) See Beyond Mapping III, Topic 7, Linking Data Space and Geographic Space

Page 5:

Basic Concepts in Statistics (SN_Curve Shape continued)

…multi-modal

…Skewness (positive = right; negative = left)

(Berry) See Beyond Mapping III, Topic 7, Linking Data Space and Geographic Space
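The sign conventions for skewness and kurtosis can be checked with a few lines of code. The slides do not give the formulas, so this sketch assumes the common moment-based definitions (population skewness and excess kurtosis, where a normal curve scores 0). The function name and sample values are illustrative only.

```python
import math

def shape_statistics(values):
    """Return (mean, stdev, skewness, excess kurtosis) from population moments.

    Assumed definitions (not stated on the slides):
      skewness        = m3 / m2**1.5   (positive = right tail, negative = left tail)
      excess kurtosis = m4 / m2**2 - 3 (positive = peaked, negative = flat)
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# Hypothetical samples: one symmetric and flat, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]

_, _, skew_sym, kurt_sym = shape_statistics(symmetric)
_, _, skew_right, _ = shape_statistics(right_tailed)
print(skew_sym, kurt_sym, skew_right)
```

The symmetric sample has zero skewness and a negative (flat) kurtosis, while the long right tail gives a positive skewness.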

Page 6:

Linking Numeric & Geographic Distributions

Key Concept

…a Histogram depicts the numeric distribution (Mean/Central Tendency focus)

…a Map depicts the geographic distribution (Variance/Variability focus)

…Data Values link the two views:

Click anywhere on the Map and the Histogram interval is highlighted

Click on the Histogram interval and the Map locations are highlighted

…simply different ways to organize and analyze “mapped data” (x,y = Where and z = What)

(Berry) (See Beyond Mapping III, “Topic 7” for more information)

Page 7:

An Analytic Framework for GIS Modeling

(Last week) Surface Modeling operations involve creating continuous spatial distributions from point sampled data (univariate).

(This week) Spatial Data Mining operations involve characterizing numerical patterns and relationships among mapped data (multivariate).

See www.innovativegis.com/basis/Download/IJRSpaper/

(Berry)

Page 8:

Preprocessing Mapped Data (Preprocessing Types 1-3)

Preprocessing involves conversion of raw data into consistent values that accurately represent mapped conditions (4 types of preprocessing)

1) Calibration: “tweaking” the values… sort of like a slight turn on a bathroom scale to alter the reading to what you know is your ‘true weight’

2) Translation: converts map values into appropriate units for analysis, such as feet into meters or bushels per acre (measure of volume) into tons per hectare (measure of mass)

3) Adjustment/Correction: dramatically changes the data, such as post processing GPS coordinates and/or Mass Flow Lag adjustment

Antenna Offset; GPS Fix Delay; Overlap and Multiple Passes; Mass Flow Lag and Mixing

… “trolling” for data

(Berry)

Page 9:

Normalizing Mapped Data (4th type of preprocessing)

Key Concept

Normalization involves standardization of a data set, usually for comparison among different types of data…

Goal …Norm_GOAL = (mapValue / 250) * 100

0-100 …Norm_0-100 = ((mapValue - min) * 100) / (max - min) + 0

SNV …Norm_SNV = ((mapValue - mean) / stdev) * 100

Note: the generalized rescaling equation is…

  Normalize a data set to a fixed range of Rmin to Rmax = (((X - Dmin) * (Rmax - Rmin)) / (Dmax - Dmin)) + Rmin

…where Rmin and Rmax are the minimum and maximum values for the rescaled range, Dmin and Dmax are the minimum and maximum values for the input data and X is any value in the data set to be rescaled.

See Beyond Mapping III, Topic 18, Understanding Grid-based Data

Since normalization involves scalar mathematics (constants), the pattern of the numeric distribution (histogram) and the spatial distribution (map) do not change …same relative distributions

Norm_GOAL = (Yield_Vol / 250) * 100

…generates a standardized map based on a yield goal of 250 bushels/acre. This map can be used in analysis with other goal-normalized maps, even from different crops

“apples and oranges to mixed fruit scale”

(Berry)
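The three normalization equations can be sketched in a few lines. This is a minimal illustration, not the course software: the function names and the yield values are made up, and the population standard deviation is assumed for SNV because the slide does not say which form is used.

```python
from statistics import mean, pstdev

def norm_goal(values, goal):
    """Norm_GOAL = (mapValue / goal) * 100."""
    return [v / goal * 100 for v in values]

def norm_range(values, r_min=0.0, r_max=100.0):
    """Generalized rescaling: (((X - Dmin) * (Rmax - Rmin)) / (Dmax - Dmin)) + Rmin."""
    d_min, d_max = min(values), max(values)
    return [((x - d_min) * (r_max - r_min)) / (d_max - d_min) + r_min for x in values]

def norm_snv(values):
    """Norm_SNV = ((mapValue - mean) / stdev) * 100 (population stdev assumed)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s * 100 for v in values]

# Hypothetical yield values in bushels/acre, listed cell by cell.
yield_vol = [100.0, 150.0, 200.0, 250.0]

goal_map = norm_goal(yield_vol, goal=250)  # percent of the 250 bu/ac goal
range_map = norm_range(yield_vol)          # rescaled to 0-100
snv_map = norm_snv(yield_vol)              # centered on the mean
print(goal_map, range_map, snv_map)
```

Each result keeps the cells in the same rank order as the input, which is the "same relative distributions" point made on the slide.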

Page 10:

Proximity Stratification

…Stratification partitions the data (numeric) or the project area (spatial) into logical groups:

…Yield map

…unusually high yield > Average + 1 Stdev

…proximity to high yield: “High Yield” vicinity (legend: Close, Far)

…proximity to field edge: Edge effects; “Sweet Spot” (interior)

…Proximity map identifies the distance from point, line or polygon features to all other locations

(Berry)

Page 11:

Summarizing Map Regions (template/data)

…creates a map summarizing values from a data map (Phosphorus levels) that coincide with the categories of a template map (Soil types) or stratification partitioning

…average phosphorus level for each soil type

Soil Type   Pavg
Ve          15.0
VdC         12.8
BIB         11.2
BIA         14.6
TuC         10.5
HvB         11.3

…average P-level for each soil unit (clump first before COMPOSITE)

Overall BIA Pavg = 14.6

Individual BIA clumps: 15.5, 13.6, 8.6

(Berry)
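A region-wide summary of this kind can be sketched as follows. The grids are tiny made-up examples rather than the field data behind the table, and `region_averages` and `composite_map` are illustrative names, not commands from the course software.

```python
from collections import defaultdict

def region_averages(template, data):
    """Average the data-map values that fall inside each template category."""
    totals, counts = defaultdict(float), defaultdict(int)
    for template_row, data_row in zip(template, data):
        for zone, value in zip(template_row, data_row):
            totals[zone] += value
            counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}

def composite_map(template, data):
    """Assign each cell the average of the region it belongs to."""
    averages = region_averages(template, data)
    return [[averages[zone] for zone in row] for row in template]

# Hypothetical 2x3 soil-type (template) and phosphorus (data) grids.
soil = [["Ve", "Ve", "BIA"],
        ["VdC", "BIA", "BIA"]]
phosphorus = [[14.0, 16.0, 15.0],
              [12.8, 14.0, 14.8]]

averages = region_averages(soil, phosphorus)
summary = composite_map(soil, phosphorus)
print(averages)
```

To get one average per individual soil unit instead of per soil type, the template would first need a unique ID for each contiguous clump, which is the "clump first" step noted on the slide.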

Page 12:

Data Analysis (establishing relationships)

On-farm studies, such as seed hybrid performance, can be conducted using actual farm conditions…

…management action recommendations are based on local relationships instead of Experiment Station research hundreds of miles away

…is radically changing research and management practices in agriculture and numerous other fields from business to epidemiology and natural resources

(Berry)

Page 13:

Comparing Discrete Maps (Multivariate analysis)

Spatial Precision (Where: boundaries) of Points, Lines and Areas (polygons) is a primary concern of GIS, but we are often less concerned with Thematic Accuracy (What: map values)

Thematic Categorization …we often represent continuous spatial data (map surfaces) as a set of discrete polygons

Which classified map is correct? How similar are the three maps?

Legend: High, Medium, Low

See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning

(Berry)

Page 14:

Comparing Discrete Maps

Two ways to compare Discrete Maps… Coincidence Summary; Proximal Alignment

…Coincidence Summary generates a cross-tabular listing of the intersection of two maps.

Table Interpretation:
Diagonal (Same)
Off-diagonal (Above/Below)
Percentages (% Same)

Overall Percentage:
((631+297+693)/1950)*100 = 83%
((475+297+563)/1950)*100 = 68%

Raster versus Vector

See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning

(Berry)

Page 15:

Comparing Discrete Maps (Coincidence Summary)

Two ways to compare Discrete Maps… Coincidence Summary; Proximal Alignment

…Coincidence Summary generates a cross-tabular listing of the intersection of two maps.

Table Interpretation:
Diagonal (Same)
Off-diagonal (Above/Below)
Percentages (% Same)

Overall Percentage:
((631+297+693)/1950)*100 = 83%
((475+297+563)/1950)*100 = 68%

Raster versus Vector

(Berry) See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning

Map2: Med-- 104 + 297 + 225 = 626; (297/626) *100= 47 percent matched

Map3: Med-- 260 + 297 + 335= 912; (297/912) *100= 33 percent matched

Cross-tabulation tables: Map1 versus Map2; Map1 versus Map3

…helpful in answering Question 2

475 + 297 + 563 = 1335; (1335/1950) * 100 = 68 percent matched

631 + 297 + 693 = 1621; (1621/1950) * 100 = 83 percent matched
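The cross-tabulation and the "percent matched" arithmetic can be sketched as follows. The two small grids are hypothetical stand-ins for the classified maps, and the function names are illustrative. The last line repeats the slide's own overall-percentage figures.

```python
from collections import Counter

def coincidence_table(map_a, map_b):
    """Count how often each (map_a category, map_b category) pair coincides."""
    return Counter((a, b)
                   for row_a, row_b in zip(map_a, map_b)
                   for a, b in zip(row_a, row_b))

def overall_percent_same(table):
    """Diagonal (same category on both maps) as a percentage of all cells."""
    total = sum(table.values())
    same = sum(count for (a, b), count in table.items() if a == b)
    return same / total * 100

def category_percent_same(table, category):
    """Share of map_b's cells in `category` that hold the same category on map_a."""
    category_total = sum(count for (_, b), count in table.items() if b == category)
    return table[(category, category)] / category_total * 100

# Hypothetical Low/Medium/High classifications of the same 2x3 area.
map1 = [["L", "M", "H"],
        ["L", "M", "H"]]
map2 = [["L", "M", "H"],
        ["M", "M", "H"]]

table = coincidence_table(map1, map2)
overall = overall_percent_same(table)
medium = category_percent_same(table, "M")

# The slide's own numbers: diagonal cells over the 1950 cells in the map.
slide_overall = (631 + 297 + 693) / 1950 * 100
print(round(overall), round(medium), round(slide_overall))
```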

Page 16:

Comparing Discrete Maps (Proximal Alignment)

Two ways to compare Discrete Maps… Coincidence Summary; Proximal Alignment

…Proximal Alignment isolates a category on one of the maps, generates its proximity, then identifies the proximity values that align with the same category on the other map.

Proximity_Map1_Category1 * Binary_Map3_Category1

…non-zero values identify changes and how far away

Table Interpretation: Zeros (Agreement); Values (> Disagreement)

PA Index (average)

(Berry) See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning
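The Proximity * Binary step can be sketched with a brute-force distance calculation. Several things here are assumptions: distances are straight-line and in cell units, the category is present on the first map, and the PA Index is taken as the average proximity over the cells holding the category on the second map (the slide only says "average"). Grids and names are illustrative.

```python
import math

def proximity(grid, category):
    """Distance in cell units from every cell to the nearest cell of `category`."""
    targets = [(r, c) for r, row in enumerate(grid)
               for c, value in enumerate(row) if value == category]
    return [[min(math.hypot(r - tr, c - tc) for tr, tc in targets)
             for c in range(len(row))]
            for r, row in enumerate(grid)]

def proximal_alignment(map_a, map_b, category):
    """Multiply map_a's proximity surface by map_b's binary mask for `category`."""
    prox = proximity(map_a, category)
    binary = [[1 if value == category else 0 for value in row] for row in map_b]
    aligned = [[p * b for p, b in zip(p_row, b_row)]
               for p_row, b_row in zip(prox, binary)]
    # Assumed index: mean proximity over map_b's cells in the category.
    cells = sum(sum(b_row) for b_row in binary)
    pa_index = sum(sum(a_row) for a_row in aligned) / cells
    return aligned, pa_index

# Hypothetical maps: category 1 reaches one cell farther east on map3.
map1 = [[1, 1, 2],
        [1, 2, 2],
        [2, 2, 2]]
map3 = [[1, 1, 1],
        [1, 2, 2],
        [2, 2, 2]]

aligned, pa_index = proximal_alignment(map1, map3, category=1)
print(aligned, pa_index)
```

The single non-zero cell marks where category 1 appears on map3 but not on map1, and its value says the nearest agreeing cell is one cell away.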

Page 17:

Comparing Map Surfaces (Statistical Tests)

Three ways to compare Map Surfaces… Statistical Tests; Percent Difference; Surface Configuration

…must be quantitative isopleth data

…Statistical Tests compare one set of cell values to that of another based on the differences in the distributions of the data: 1) data sets (partition or coincidence; continuous or sampled) 2) statistical procedure (t-test, F-test, etc.)

Box-and-whisker graphs

Table 1

See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning

(Berry)

Page 18:

Comparing Map Surfaces (%Difference)

Three ways to compare Map Surfaces… Statistical Tests; Percent Difference; Surface Configuration

…Percent Difference capitalizes on the spatial arrangement of the values by comparing the values at each map location: %Difference Map, %Difference Table

Question 3

See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning

(Berry)

Table 2
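A cell-by-cell percent difference can be sketched as below. The slide does not give the formula, so this assumes the difference is expressed relative to the first map, ((B - A) / A) * 100, with no zero values on that map. The 10 percent tolerance used for the table-style summary and the surface values are also hypothetical.

```python
def percent_difference(map_a, map_b):
    """%Difference map: change from map_a to map_b, relative to map_a, per cell."""
    return [[(b - a) * 100.0 / a for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(map_a, map_b)]

def share_within(pct_map, tolerance):
    """%Difference table entry: percent of cells within +/- tolerance percent."""
    cells = [value for row in pct_map for value in row]
    return 100.0 * sum(1 for value in cells if abs(value) <= tolerance) / len(cells)

# Hypothetical surfaces covering the same 2x2 area.
surface_a = [[100.0, 200.0],
             [50.0, 80.0]]
surface_b = [[110.0, 190.0],
             [50.0, 100.0]]

pct_map = percent_difference(surface_a, surface_b)
similar_share = share_within(pct_map, tolerance=10)
print(pct_map, similar_share)
```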

Page 19:

Comparing Map Surfaces (Surface Configuration)

Three ways to compare Map Surfaces… Statistical Tests; Percent Difference; Surface Configuration

…Surface Configuration capitalizes on the spatial arrangement of the values by comparing the localized trend in the values: Slope Map, Aspect Map, Surface Configuration Index

Table 3

See Beyond Mapping III, Topic 10, Analyzing Map Similarity and Zoning

(Berry)

Page 20:

Spatial Statistics Operations – Numerical Context

GIS and Statistical Perspectives (SS)

Map Analysis Toolbox

Grid Map Layers

GIS Perspective:

Surface Modeling (Density Analysis, Spatial Interpolation, Map Generalization)

Spatial Data Mining (Descriptive, Predictive, Prescriptive)

Statistical Perspective:

Basic Descriptive Statistics (Min, Max, Median, Mean, StDev, etc.)

Basic Classification (Reclassify, Contouring, Normalization)

Map Comparison (Joint Coincidence, Statistical Tests)

Unique Map Statistics (Roving Window and Regional Summaries)

Surface Modeling (Density Analysis, Spatial Interpolation)

Advanced Classification (Map Similarity, Maximum Likelihood, Clustering)

Predictive Statistics (Map Correlation/Regression, Data Mining Engines)

…last week: Surface Modeling (Density Analysis, Spatial Interpolation)

(Berry)

Page 21:

Spatial Dependency

Map Stack: relationships among maps are investigated by aligning grid maps with a common configuration… #cols/rows, cell size and geo-reference.

Data Shishkebab: each map represents a variable, each grid space a case and each value a measurement with all of the rights, privileges, and responsibilities of non-spatial mathematical, numerical and statistical analysis

Spatial Variable Dependence: what occurs at a location in geographic space is related to:

• the conditions of that variable at nearby locations, termed Spatial Autocorrelation (intra-variable dependence). Surface Modeling: Discrete Point Map to Continuous Map Surface

• the conditions of other variables at that location, termed Spatial Correlation (inter-variable dependence). Multivariate: Spatial Data Mining

(Berry) See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships

Page 22:

Visualizing Spatial Relationships

Interpolated Spatial Distribution

Phosphorus (P)

What spatial relationships do you see?

…do relatively high levels of P often occur with high levels of K and N?

…how often?

…where?

See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships

(Berry)

Page 23:

Identifying Unusually High Measurements …isolate areas above mean + 1 StDev (upper tail of normal curve)

(Berry) See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships

Page 24:

Level Slicing …simply multiply the two maps to identify joint coincidence

1*1=1 coincidence (any 0 results in zero)

2-dimensional data space: Box

Question 4

See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships

(Berry)

Page 25:

Multivariate Data Space …sum of a binary progression (1, 2, 4, 8, 16, etc.) provides level slice solutions for many map layers

(Berry) See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships

3-dimensional space: Cube (parallelepiped)
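Level slicing can be sketched as three small steps: flag the unusually high cells on each map, multiply the flags to find joint coincidence, and sum a binary progression to keep every combination distinct. The P and K grids are hypothetical, the function names are illustrative, and the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def high_mask(grid):
    """1 where a cell exceeds mean + 1 standard deviation of its map, else 0."""
    values = [value for row in grid for value in row]
    cutoff = mean(values) + pstdev(values)
    return [[1 if value > cutoff else 0 for value in row] for row in grid]

def joint_coincidence(mask_a, mask_b):
    """Multiply two binary maps: 1*1 = 1, any 0 gives 0."""
    return [[a * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]

def binary_progression(masks):
    """Weight layer i by 2**i (1, 2, 4, ...) and sum, so each combination is unique."""
    rows, cols = len(masks[0]), len(masks[0][0])
    return [[sum(mask[r][c] << i for i, mask in enumerate(masks))
             for c in range(cols)]
            for r in range(rows)]

# Hypothetical phosphorus (P) and potassium (K) surfaces.
p_map = [[10, 10, 10],
         [10, 10, 30],
         [10, 30, 30]]
k_map = [[5, 5, 20],
         [5, 5, 5],
         [5, 20, 20]]

p_high, k_high = high_mask(p_map), high_mask(k_map)
both_high = joint_coincidence(p_high, k_high)
codes = binary_progression([p_high, k_high])  # 1 = P only, 2 = K only, 3 = both
print(both_high, codes)
```

Adding a third layer would introduce a weight of 4, giving codes 0 through 7 for the eight possible combinations.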

Page 26:

Calculating Data Distance …an n-dimensional plot depicts the multivariate distribution; the distance between points determines the relative similarity in data patterns

…the farthest floating ball is the least similar (largest data distance) from the comparison point

(Berry) See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships

Page 27:

Identifying Map Similarity

…the relative data distance between the comparison point’s data pattern and those of all other map locations form a Similarity Index

The green tones indicate field locations with fairly similar P, K and N levels; red tones indicate dissimilar areas.

Question 5

(Berry) See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships
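A similarity map built from data distance can be sketched as follows. The scaling is an assumption: the index here runs from 100 (identical to the comparison point) down to 0 (the farthest data pattern in the map stack), and raw values are used even though layers would normally be normalized first. Grids and names are illustrative.

```python
import math

def similarity_map(map_stack, row, col):
    """Similarity (0-100) of every cell's data pattern to the cell at (row, col)."""
    rows, cols = len(map_stack[0]), len(map_stack[0][0])
    reference = [layer[row][col] for layer in map_stack]
    # Euclidean data distance between each cell's values and the reference values.
    distances = [[math.dist(reference, [layer[r][c] for layer in map_stack])
                  for c in range(cols)]
                 for r in range(rows)]
    farthest = max(max(d_row) for d_row in distances)
    if farthest == 0:  # every cell matches the comparison point
        return [[100.0] * cols for _ in range(rows)]
    return [[100.0 * (1 - d / farthest) for d in d_row] for d_row in distances]

# Hypothetical P, K and N layers on a 2x2 grid.
p_layer = [[10, 20], [10, 40]]
k_layer = [[5, 5], [5, 35]]
n_layer = [[1, 1], [1, 1]]

similarity = similarity_map([p_layer, k_layer, n_layer], row=0, col=0)
print(similarity)
```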

Page 28:

Clustering Maps for Data Zones

…a map stack is a spatially organized set of numbers

…groups of “floating balls” in data space identify locations in the field with similar data patterns: data zones

…fertilization rates vary for the different clusters “on-the-fly”

Variable Rate Application

Cyber-Farmer, Circa 1992

Question 6

(Berry) See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships

Page 29:

Assessing Clustering Results

…Clustering results can be roughly evaluated using basic statistics. Average, Standard Deviation, Minimum and Maximum values within each cluster are calculated. Ideally the averages between the two clusters would be radically different and the standard deviations small: large difference between groups and small differences within groups.

Standard Statistical Tests of two data sets

Box and Whisker Plots to visualize differences

(Berry) See Beyond Mapping III, Topic 16, Characterizing Patterns and Relationships
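The rough evaluation described above amounts to a per-cluster summary table. A minimal sketch, with made-up phosphorus values and cluster assignments, an illustrative function name, and the population standard deviation assumed:

```python
from statistics import mean, pstdev

def cluster_summary(values, cluster_ids):
    """Average, standard deviation, minimum and maximum of the values in each cluster."""
    groups = {}
    for value, cluster in zip(values, cluster_ids):
        groups.setdefault(cluster, []).append(value)
    return {cluster: {"average": mean(members),
                      "stdev": pstdev(members),
                      "minimum": min(members),
                      "maximum": max(members)}
            for cluster, members in groups.items()}

# Hypothetical phosphorus values and the cluster assigned to each cell.
p_values = [10, 12, 11, 30, 32, 31]
clusters = [1, 1, 1, 2, 2, 2]

summary = cluster_summary(p_values, clusters)
gap_between = abs(summary[1]["average"] - summary[2]["average"])
print(summary, gap_between)
```

Here the gap between cluster averages (20) is large compared with the spread inside each cluster (under 1), which is the pattern the slide calls ideal.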

Page 30:

How Clustering Works (IsoData algorithm)

Data Space

1) The scatter plot shows Height versus Weight data that might have been collected in your old geometry class

2) The data distance to each weight/height measurement pair is calculated and the point is assigned to the closest arbitrary cluster center

3) The average X,Y coordinates of the students assigned to each “working” cluster are calculated and used to reposition the cluster centers

4) Repeat data distances, cluster assignments and repositioning until no change in cluster membership (centers do not move)

(Berry) See Beyond Mapping III, Topic 7, Linking Data Space and Geographic Space
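Steps 2 through 4 can be sketched as a short loop. This covers only the assign-and-reposition core described on the slide (the same core as k-means); a full IsoData implementation also splits and merges clusters, which is omitted here. The height/weight pairs and starting centers are made up.

```python
import math

def cluster(points, centers, max_iterations=100):
    """Assign points to the closest center, reposition centers, repeat until stable."""
    centers = list(centers)
    membership = None
    for _ in range(max_iterations):
        # Step 2: data distance to every center; assign to the closest one.
        new_membership = [min(range(len(centers)),
                              key=lambda k: math.dist(point, centers[k]))
                          for point in points]
        # Step 4: stop when no point changes cluster (centers no longer move).
        if new_membership == membership:
            break
        membership = new_membership
        # Step 3: move each center to the average coordinates of its members.
        for k in range(len(centers)):
            members = [p for p, m in zip(points, membership) if m == k]
            if members:
                centers[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return membership, centers

# Hypothetical (height in inches, weight in pounds) pairs and arbitrary start centers.
students = [(60, 110), (62, 120), (61, 115), (72, 190), (74, 200), (73, 195)]
membership, centers = cluster(students, centers=[(60, 100), (75, 210)])
print(membership, centers)
```

Because height and weight are on different scales, weight dominates the raw distances here; normalizing both axes first would give them equal influence.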

Page 31: Introduction to GIS Modeling Week 9 — Spatial Data Mining GEOG 3110 –University of Denver

Map Correlation (How it works)

Elevation (Feet) and Slope (Percent) maps

Point-by-Point: X elev = 2,063 feet; Y slope = 38%

Spatially Aggregated Correlation: …one large data table with 25 rows x 25 columns = 625 map values for map-wide summary; r = .432 map wide

Localized Correlation (Roving Window): …625 small data tables within 5 cell reach = 81 map values for localized summary; r = .562 localized

…where x = Elevation value, y = Slope value and n = number of value pairs

(Berry)
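Assuming the correlation shown is the standard Pearson product-moment coefficient (the slide names only x, y and n, so this is an assumption), one common computational form is:

```latex
r = \frac{n\sum xy - \sum x \sum y}
         {\sqrt{\left[\,n\sum x^{2} - \left(\sum x\right)^{2}\right]
                \left[\,n\sum y^{2} - \left(\sum y\right)^{2}\right]}}
```

The same expression is evaluated once over all 625 value pairs for the map-wide summary, and once per grid cell over the values in the roving window for the localized summary.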

Map Correlation (Aggregated and Localized results)

Spatially Aggregated Correlation: r = .432 map wide

Scalar Value – one value represents the overall non-spatial relationship between the two map surfaces

Localized Correlation: r = .562 Localized

Map Variable – a continuous quantitative surface represents the localized spatial relationship between the two map surfaces

Map legend: Strong Positive, Minimal Correlation, Strong Negative

(Berry)
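The contrast between the scalar result and the map result can be sketched as follows. This is an illustration under stated assumptions: the two grids are synthetic stand-ins for the Elevation and Slope maps, Pearson's r is assumed as the correlation measure, and the roving window is assumed to be circular (a circle with a 5-cell reach contains exactly 81 cells, which matches the count on the slide). The function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson r for paired values; NaN when either variable is constant."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    var_x = n * sum(x * x for x in xs) - sum_x ** 2
    var_y = n * sum(y * y for y in ys) - sum_y ** 2
    if var_x <= 0 or var_y <= 0:
        return float("nan")
    cov = n * sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y
    return cov / math.sqrt(var_x * var_y)

def window_offsets(reach):
    """Cell offsets inside a circular roving window of the given reach."""
    span = range(-reach, reach + 1)
    return [(dr, dc) for dr in span for dc in span
            if dr * dr + dc * dc <= reach * reach]

def localized_correlation(map_x, map_y, reach=5):
    """Correlation map: r computed from the window around every cell."""
    rows, cols = len(map_x), len(map_x[0])
    offsets = window_offsets(reach)
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Windows are clipped at the map edge, so edge cells use fewer values
            cells = [(r + dr, c + dc) for dr, dc in offsets
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
            out_row.append(pearson([map_x[i][j] for i, j in cells],
                                   [map_y[i][j] for i, j in cells]))
        result.append(out_row)
    return result

# Synthetic 25 x 25 grids standing in for the Elevation and Slope maps
elevation = [[500 + 10 * r + (c * c) % 13 for c in range(25)] for r in range(25)]
slope = [[(3 * r + 5 * c) % 17 for c in range(25)] for r in range(25)]

# Spatially aggregated: one scalar from all 625 value pairs
flat_x = [v for row in elevation for v in row]
flat_y = [v for row in slope for v in row]
map_wide_r = pearson(flat_x, flat_y)

# Localized: one r per cell, forming a map variable
local_r = localized_correlation(elevation, slope, reach=5)
```

`map_wide_r` is a single number, while `local_r` is a 25 x 25 grid that can be displayed like any other map layer.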

An Analytic Framework for GIS Modeling

(Berry)

Spatial Data Mining operations involve characterizing numerical patterns and relationships among mapped data.

See www.innovativegis.com/basis/Download/IJRSpaper/

Regression (conceptual approach)

(Berry)

A line is “fitted” in data space that balances the data so the differences from the points to the line (residuals) for all the points are minimized and the sum of the differences is zero…

…the equation of the regression line is used to predict the “Dependent” variable (Y axis) using one or more “Independent” variables (X axis)
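A small sketch of the fitting idea, assuming ordinary least squares with one independent variable (the data values are made up). Least squares minimizes the sum of the squared residuals, and when the line includes an intercept the plain residuals sum to zero, which is the "balance" described above.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]               # hypothetical independent (X axis) values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]    # hypothetical dependent (Y axis) values
slope, intercept = fit_line(xs, ys)

# Residual = observed value minus the value predicted by the line
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
```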

Evaluating Prediction Maps (non-spatial)

(Berry)

Non-spatial …R-squared value looks at the deviations from the regression line; data patterns about the regression line
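As a rough illustration, the R-squared value can be computed from the deviations about the regression line. The sketch assumes the usual definition, 1 minus the ratio of residual to total sum of squares, and uses made-up values.

```python
def r_squared(actual, predicted):
    """R-squared = 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [2.0, 4.0, 6.0, 8.0]             # hypothetical dependent values
perfect = r_squared(actual, actual)        # predictions match the data exactly
no_skill = r_squared(actual, [5.0] * 4)    # always predicting the mean
```

Because it pools every deviation into one number, the measure says nothing about where on the map the predictions are good or poor.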

Map Variables

(Berry)

The Dependent Map variable is the one that you want to predict…

…from a set of existing or easily measured Independent Map variables

…derive from customer data

Question 7

See Beyond Mapping III , Topic 28, Spatial Data Mining in Geo-Business

Map Regression Results (Bivariate)

(Berry)

The “R-squared index” provides a general measure of how good the predictions ought to be: 40% and 46% indicate moderately weak predictors; 23% indicates a very weak predictor

(R-squared index = 100% indicates a perfect predictor; 0% indicates an equation with no predictive capabilities)

See Beyond Mapping III , Topic 28, Spatial Data Mining in Geo-Business

Scatter plots and regression equations relating Loan Density to three candidate driving variables (Housing Density, Value and Age)

Loans = fn( Housing Density )

Loans = fn( Home Value )

Loans = fn( Home Age )

Question 7: Creates the Loan Concentration map surface

Question 8: Creates regression equation and R² index

Generating a Multivariate Regression

(Berry)

…multiple linear regression with all three independent map variables produces a regression equation that is used to generate a prediction map

See Beyond Mapping III , Topic 28, Spatial Data Mining in Geo-Business
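A minimal sketch of the idea, not the procedure used in the course software: each grid cell contributes one row to a data table, a multiple linear regression is fitted by solving the normal equations, and the resulting equation is evaluated cell by cell to build the prediction map. The tiny 2 x 3 layers and all helper names are hypothetical.

```python
def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_multiple(samples, targets):
    """Least-squares coefficients [b0, b1, ...] via the normal equations."""
    design = [[1.0] + list(s) for s in samples]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)

def flatten(grid):
    """Turn a grid (list of rows) into one column of the data table."""
    return [v for row in grid for v in row]

# Hypothetical 2 x 3 map layers (values are made up for illustration)
housing_density = [[1, 2, 3], [4, 0, 5]]
home_value = [[2, 1, 5], [2, 3, 4]]
home_age = [[3, 0, 1], [2, 4, 1]]
loans = [[5.0, 9.5, 12.5], [12.0, 2.5, 16.0]]

samples = list(zip(flatten(housing_density), flatten(home_value), flatten(home_age)))
b0, b1, b2, b3 = fit_multiple(samples, flatten(loans))

# Prediction map: evaluate the regression equation cell by cell
prediction = [[b0 + b1 * d + b2 * v + b3 * a for d, v, a in zip(dr, vr, ar)]
              for dr, vr, ar in zip(housing_density, home_value, home_age)]
```

The made-up loan values follow an exact linear rule, so the fitted coefficients reproduce them; real map data would leave residuals at every cell.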

Question 9

Evaluating Regression Results (multiple linear)

(Berry)

…multiple linear regression with all three independent map variables produces a regression equation that is used to generate a prediction map…

…that is compared to the actual dependent variable data to produce an Error Surface

See Beyond Mapping III , Topic 28, Spatial Data Mining in Geo-Business

Optional Question 9-1
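The comparison can be sketched as a cell-by-cell subtraction. The grids below are made up, and the sign convention (predicted minus actual, so over-estimates are positive) is an assumption rather than something stated on the slide.

```python
# Hypothetical actual and predicted map layers on the same 2 x 3 grid
actual = [[5.0, 9.5, 12.5], [12.0, 2.5, 16.0]]
predicted = [[6.0, 9.0, 12.5], [10.0, 3.5, 16.5]]

# Error surface: one error value per grid cell (predicted minus actual)
error_surface = [[p - a for p, a in zip(pred_row, act_row)]
                 for pred_row, act_row in zip(predicted, actual)]
```

Unlike a single R-squared value, the error surface shows where the equation over- or under-predicts.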

Using the Error Map to Stratify

(Berry)

One way to improve the predictions, however, is to stratify the data set by breaking it into groups of similar characteristics …and then generating separate regressions

…generate a different regression for each of the stratified areas: red, yellow and green

…other stratification techniques include indigenous knowledge, level-slicing and clustering

See Beyond Mapping III , Topic 28, Spatial Data Mining in Geo-Business
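One way to picture the stratification is to level-slice the error values into three zones and fit a separate regression within each. The sketch below is illustrative only: the thresholds, zone names, cell values and the use of a single independent variable are all assumptions, and the zones are not claimed to correspond to the red, yellow and green areas on the slide.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def zone_of(error, low=-1.0, high=1.0):
    """Level-slice an error value into one of three strata (hypothetical cutoffs)."""
    if error < low:
        return "under"
    if error > high:
        return "over"
    return "near"

# Hypothetical cells: (independent value, dependent value, prediction error)
cells = [
    (1, 3.0, -2.0), (2, 5.0, -1.8), (3, 7.0, -2.2),
    (1, 2.0, 0.1), (2, 3.0, -0.1), (3, 4.0, 0.0),
    (1, 1.5, 2.0), (2, 2.0, 1.9), (3, 2.5, 2.3),
]

# Group the cells by error zone
strata = {}
for x, y, err in cells:
    strata.setdefault(zone_of(err), []).append((x, y))

# Fit a separate regression equation for each stratum
equations = {zone: fit_line([x for x, _ in pts], [y for _, y in pts])
             for zone, pts in strata.items()}
```

Each zone ends up with its own slope and intercept, so the prediction map can be rebuilt by applying the equation that matches each cell's zone.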

Optional Question 9-2

Spatial Data Mining (The Big Picture)

…making sense out of a map stack

Mapped data that exhibit high spatial dependency create strong prediction functions. As in traditional statistical analysis, spatial relationships can be used to predict outcomes

…the difference is that spatial statistics predicts where responses will be high or low

(Berry)

See Beyond Mapping III , Topic 16, Characterizing Patterns and Relationships

An Analytic Framework for GIS Modeling

(Berry)

Spatial Data Mining operations involve characterizing numerical patterns and relationships among mapped data.

See www.innovativegis.com/basis/Download/IJRSpaper/

Prescriptive Mapping

(Berry)

Four primary types of applied spatial models:

Suitability: mapping preferences (e.g., Habitat and Routing)

Economic: mapping financial interactions (e.g., Combat Zone and Sales Propensity)

Physical: mapping landscape interactions (e.g., Terrain Analysis and Sediment Loading)

Mathematical/Statistical: mapping numerical relationships…

- Descriptive math/stat models summarize existing mapped data (e.g., Standard Normal Variable Map for Unusual Conditions and Clustering for Data Zones)

- Predictive math/stat models develop equations relating mapped data (e.g., Map Regression for Equity Loan Prediction and Probability of Product Sales)

- Prescriptive math/stat models identify management actions based on descriptive/predictive relationships (e.g., Retail Marketing and Precision Ag)…

Discrete Actions: If <condition(s)> Then <Action(s)>

If P is 0-4 ppm, then apply 50 lbs P₂O₅/Acre

If P is 4-8 ppm, then apply 18 lbs P₂O₅/Acre

If P is 8-12 ppm, then apply 7 lbs P₂O₅/Acre

If P is >12 ppm, then apply 0 lbs P₂O₅/Acre

[Figure: P₂O₅ application rate (0 to 50 lbs/Acre) plotted against Phosphorus (P) level (0 to 12 ppm and more), with the discrete steps 50, 18, 7, 0]

Continuous Actions: Equation defining action(s)

Negative linear equation of the form: y = aX

Negative exponential equation of the form: y = e^-x
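The two kinds of prescription can be sketched as functions applied to every cell of a phosphorus map. The discrete rule uses the rates from the If/Then statements above; how the shared boundaries (4 and 8 ppm) are assigned is an assumption. The continuous version is purely illustrative: the slide gives only the general form of a negative exponential, so the maximum rate and decay constant below are hypothetical parameters.

```python
import math

def discrete_rate(p_ppm):
    """P2O5 application (lbs/acre) from the If/Then rules."""
    if p_ppm < 4:       # 0-4 ppm (a value of exactly 4 is assumed to fall in the next class)
        return 50
    if p_ppm < 8:       # 4-8 ppm
        return 18
    if p_ppm <= 12:     # 8-12 ppm
        return 7
    return 0            # more than 12 ppm

def continuous_rate(p_ppm, max_rate=50.0, decay=0.25):
    """Negative exponential action curve; max_rate and decay are hypothetical."""
    return max_rate * math.exp(-decay * p_ppm)

# Hypothetical phosphorus map (ppm) and the resulting prescription maps
p_map = [[2, 5], [10, 14]]
discrete_map = [[discrete_rate(p) for p in row] for row in p_map]
continuous_map = [[continuous_rate(p) for p in row] for row in p_map]
```

The discrete map changes in steps at the class boundaries, while the continuous map varies smoothly with the phosphorus level.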

Grid-Based Map Analysis

Spatial Data Mining investigates the “numerical” relationships in mapped data…

Descriptive: summary statistics, comparison, classification (e.g., clustering)
Predictive: math/stat relationships among map layers (e.g., regression)
Prescriptive: appropriate actions (e.g., optimization)

Surface Modeling maps the “spatial distribution” of point data…

Density Analysis: count/sum of points within a local window
Spatial Interpolation: weighted average of points within a local window
Map Generalization: fits mathematical relationship to all of the point data

Spatial Analysis investigates the “contextual” relationships in mapped data…

Reclassify: reassigning map values (position; value; size, shape; contiguity)
Overlay: map overlay (point-by-point; region-wide)
Distance: proximity and connectivity (movement; optimal paths; visibility)
Neighbors: “roving windows” (slope/aspect; diversity; anomaly)

(Berry)