Geographic Information Systems (GIS): Spatial Analysis November 1, 2005


Geographic Information Systems (GIS): Spatial Analysis

November 1, 2005


Notes

Oslo Project

Groups

Assignment

Due Date: December 15, 2005

Mid-term quiz 2: November 8

Progress in GI Science eSeminar Series


Existing Groups

1. Marita Sanni, Julie Aaraas, Kristin I. Dankel, Solveig Melå (4)
2. Åslaug Enger Olsen, Maria Lyngstad, Guro Bakke Håndlykken and Jorunn Randby (M3)
3. Nina Ambro Knutsen, Ellen Winje and Leif Ingholm (3)
4. Birte Mobraaten, Hans Petter Wiken, Silje Hernes and Bente Lise Stubberud (4)
5. Daniel Molin, Ida Sjølander, Anne-Lise Folland and Nicolai Steineger (4)
6. Hæge Skjæveland, Marie Aaberge, Cecilie Hirsch, Kaja Korsnes Kristensen
7. Urs Dippon, Steven huiching Yip, Harald Kvifte & Eirik Waag
8. Marthe Stiansen, Marielle Stigum, Tomas Nesset, Andreas Skjetne
9. Gjermund Steinskog (Archaeology – M16-18)
10. Solveig Lyby (Archaeology - M10-12)
11. Andreas Dyken, Håkon Grevbo, Terje-Andre Gudmundsen (3)


Project Examples from 2004

Accessibility of medical centers in the Grorud district

Immigrants' settlement patterns

Distinctions in Oslo: A Bourdieusian analysis of inequality using geographic information systems

Social divides in Oslo

Social inequalities in Oslo

Income and housing structure in Oslo: with a focus on the Gamle Oslo district

Privatization and income level in the Vestre Aker district


GI Science eSeminar Series


Outline for Today’s lecture

What is spatial analysis?

Queries and reasoning

Measurements

Spatial Interpolation

Descriptive Summaries

Optimization

Hypothesis Testing


Spatial Analysis

Turns raw data into useful information by adding greater informative content and value

Reveals patterns, trends, and anomalies that might otherwise be missed

Provides a check on human intuition by helping in situations where the eye might deceive


Definitions

A method of analysis is spatial if the results depend on the locations of the objects being analyzed

move the objects and the results change

results are not invariant (i.e., they vary!) under relocation

Spatial analysis requires both attributes and locations of objects

a GIS has been designed to store both


The Snow Map (cholera outbreaks in the 1850s)

Provides a classic example of the use of location to draw inferences

But the same pattern could arise from contagion (cholera spread through the air)

if the original carrier lived in the center of the outbreak

contagion was the hypothesis Snow was trying to refute

Today, a GIS could be used to show a sequence of maps as the outbreak developed

contagion would produce a concentric sequence, drinking water a random sequence


Types of Spatial Analysis

There are literally thousands of techniques

Six categories are used in this course, each having a distinct conceptual basis:

Queries and reasoning

Measurements

Transformations

Descriptive summaries

Optimization

Hypothesis testing


Queries and Reasoning

A GIS can respond to queries by presenting data in appropriate views

and allowing the user to interact with each view

It is often useful to be able to display two or more views at once

and to link them together

linking views is one important technique of exploratory spatial data analysis (ESDA)


The Catalog View

Shows folders, databases, and files on the left, and a preview of the contents of a selected data set on the right. The preview can be used to query the data set’s metadata, or to look at a thumbnail map, or at a table of attributes. This example shows ESRI’s ArcCatalog.


The Map View

A user can interact with a map view to identify objects and query their attributes, to search for objects meeting specified criteria, or to find the coordinates of objects. This illustration uses ESRI’s ArcMap.


The Table View

Here attributes are displayed in the form of a table, linked to a map view. When objects are selected in the table, they are automatically highlighted in the map view, and vice versa. The table view can be used to answer simple queries about objects and their attributes.


Measurements

Many tasks require measurement from maps

measurement of distance between two points

measurement of area, e.g. the area of a parcel of land

Such measurements are tedious and inaccurate if made by hand

measurement using GIS tools and digital databases is fast, reliable, and accurate


Measurement of Length

A metric is a rule for determining distance from coordinates

The Pythagorean metric gives the straight-line distance between two points on a flat plane

The Great Circle metric gives the shortest distance between two points on a spherical globe

given their latitudes and longitudes
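As a rough illustration of the two metrics, the sketch below computes a Pythagorean distance from planar coordinates and a great circle distance from latitudes and longitudes. The haversine form and the mean Earth radius of 6371 km are assumptions chosen for the example; the function names are hypothetical.

```python
import math

def pythagorean_distance(x1, y1, x2, y2):
    """Straight-line distance between two points on a flat plane."""
    return math.hypot(x2 - x1, y2 - y1)

def great_circle_distance(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Shortest distance over a sphere (haversine form); inputs in degrees.

    radius_km is an assumed mean Earth radius, not a value from the lecture.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))
```

A quarter of the equator, `great_circle_distance(0, 0, 0, 90)`, comes out at about 10,008 km with this radius.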


Issues with Length Measurement

The length of a true curve is almost always longer than the length of its polyline or polygon representation


Issues with Length Measurement

Measurements in GIS are often made on horizontal projections of objects

length and area may be substantially lower than on a true three-dimensional surface


Measurement of Area

Calculate and sum the areas of a series of polygons, formed by dropping perpendiculars to the x axis. Subtract the area of the extended trapezium (in this case, a rectangle).

The area for each polygon is calculated as the difference in x times the average of y.

[Figure: polygon with perpendiculars dropped to the x axis; labels x1, x2, y1, y2]
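The rule above (difference in x times the average of y, summed around the boundary) can be sketched as follows. The vertex-list input and the function name are assumptions made for illustration; taking the absolute value at the end makes the result independent of whether the vertices run clockwise or counterclockwise.

```python
def polygon_area(vertices):
    """Area of a simple polygon given as a list of (x, y) vertices.

    Each edge contributes a signed trapezium: (difference in x) times
    (average of y). Areas under edges running one way are added and those
    under edges running back are subtracted, leaving the enclosed area.
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += (x2 - x1) * (y1 + y2) / 2.0
    return abs(total)
```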


Measurement of Shape

Shape measures capture the degree of contortedness of areas, relative to the most compact circular shape

by comparing perimeter to the square root of area

normalized so that the shape of a circle is 1

the more contorted the area, the higher the shape measure
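One way to write such a measure, assuming the normalization is chosen so that a circle scores exactly 1, is perimeter divided by 2 times the square root of pi times area. The function name is hypothetical, and other normalizations exist.

```python
import math

def shape_index(perimeter, area):
    """Perimeter relative to the square root of area; 1.0 for a circle."""
    return perimeter / (2.0 * math.sqrt(math.pi * area))
```

Under this definition a unit square scores about 1.13, and a long thin rectangle scores considerably higher.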


Shape as an indicator of gerrymandering in elections

The 12th Congressional District of North Carolina was drawn in 1992 using a GIS, and designed to be a majority-minority district: with a majority of African American voters, it could be expected to return an African American to Congress. This objective was achieved at the cost of a very contorted shape. The U.S. Supreme Court eventually rejected the design.


Slope and Aspect

Calculated from a grid of elevations (a digital elevation model)

Slope and aspect are calculated at each point in the grid, by comparing the point’s elevation to that of its neighbors

usually its eight neighbors

but the exact method varies

in a scientific study, it is important to know exactly what method is used when calculating slope, and exactly how slope is defined
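As one example of an eight-neighbor method, the sketch below uses a weighted finite-difference scheme of the kind usually attributed to Horn (1981). This is only one of several methods in use and is not necessarily the one intended in the lecture; the 3 x 3 window layout (rows listed north to south, columns west to east) and the function name are assumptions.

```python
import math

def slope_degrees(window, cell_size):
    """Slope angle at the center of a 3 x 3 elevation window.

    window is [[a, b, c], [d, e, f], [g, h, i]]; the center value e is not
    used by this particular scheme.
    """
    (a, b, c), (d, _e, f), (g, h, i) = window
    # Weighted differences between the east and west columns, and between
    # the south and north rows.
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8.0 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8.0 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
```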


Alternative Definitions of Slope

The angle between the surface and the horizontal, range 0 to 90 degrees

The ratio of the change in elevation to the actual distance traveled, range 0 to 1

The ratio of the change in elevation to the horizontal distance traveled, range 0 to infinity
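The three definitions correspond to the angle, its sine, and its tangent. A small sketch, assuming a non-negative rise and horizontal run that are not both zero (the function and key names are hypothetical):

```python
import math

def slope_measures(rise, run):
    """Return the three slope definitions for a rise over a horizontal run."""
    travelled = math.hypot(rise, run)  # actual distance along the surface
    return {
        "angle_degrees": math.degrees(math.atan2(rise, run)),  # 0 to 90
        "rise_over_distance": rise / travelled,                # 0 to 1
        "rise_over_run": rise / run if run else math.inf,      # 0 to infinity
    }
```

For a rise of 3 over a run of 4, the second definition gives 0.6 and the third 0.75, which is why a study needs to state which one it reports.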


Transformations

Create new objects and attributes, based on simple rules

involving geometric construction or calculation

may also create new fields, from existing fields or from discrete objects


Buffering (Dilation)

Create a new object consisting of areas within a user-defined distance of an existing object

e.g., to determine areas impacted by a proposed highway

e.g., to determine the service area of a proposed hospital

Feasible in either raster or vector mode


Buffering

Point

Line

Polygon


Raster Buffering Generalized

Vary the distance buffered according to values in a friction layer

City limits

Areas reachable in 5 minutes

Areas reachable in 10 minutes

Other areas
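A minimal sketch of buffering over a friction layer, assuming a raster stored as a list of rows, movement between the four edge-adjacent cells only, and a step cost equal to the average friction of the two cells involved. These choices, and the function name, are illustrative assumptions rather than the method used to produce the figure. The accumulated cost is found with Dijkstra's algorithm.

```python
import heapq

def travel_time(friction, sources):
    """Least accumulated cost from any source cell to every raster cell."""
    rows, cols = len(friction), len(friction[0])
    best = [[float("inf")] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        best[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        t, r, c = heapq.heappop(heap)
        if t > best[r][c]:
            continue  # stale queue entry; a cheaper route was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                step = (friction[r][c] + friction[nr][nc]) / 2.0
                if t + step < best[nr][nc]:
                    best[nr][nc] = t + step
                    heapq.heappush(heap, (t + step, nr, nc))
    return best
```

A buffer is then the set of cells whose accumulated cost falls below a chosen threshold, for example 5 or 10 minutes.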


Point in Polygon Transformation

Determine whether a point lies inside or outside a polygon (enclosure)

Basis for answering many simple queries

used to assign crimes to police precincts, voters to voting districts, accidents to reporting counties


The Point in Polygon Algorithm

Draw a line from the point to infinity in any direction, and count the number of intersections between this line and each polygon’s boundary. The polygon with an odd number of intersections is the containing polygon: all other polygons have an even number of intersections.
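The counting rule can be sketched for a single polygon as follows, assuming the polygon is a list of (x, y) vertices and the line is drawn horizontally toward positive x. Points lying exactly on the boundary are not treated specially in this sketch.

```python
def point_in_polygon(x, y, polygon):
    """True if (x, y) is inside the polygon, by counting boundary crossings."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross it.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:           # crossing lies on the ray toward +x
                inside = not inside   # odd count means inside
    return inside
```

Assigning a crime to a precinct then amounts to testing the point against each precinct polygon until one returns true.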


Polygon Overlay

Two cases: for discrete objects and for fields

Discrete object case: find the polygons formed by the intersection of two polygons. There are many related questions, e.g.:

Do two polygons intersect?

Which areas fall in Polygon A but not in Polygon B?

The complexity of computing polygon overlays was one of the greatest barriers to the development of vector GIS


Polygon Overlay, Discrete Object Case

In this example, two polygons (A and B) are intersected to form 9 new polygons. One is formed from both input polygons; four are formed by Polygon A and not Polygon B; and four are formed by Polygon B and not Polygon A.


Polygon Overlay, Field Case

Two complete layers of polygons are input, representing two classifications of the same area

e.g., soil type and land ownership

The layers are overlaid, and all intersections are computed creating a new layer

each polygon in the new layer has both a soil type and a land ownership

the attributes are said to be concatenated

The task is often performed in raster


Polygon overlay, field case

A layer representing a field of land ownership (colors: Owner X, Owner Y, Public) is overlaid on a layer of soil type (layers offset for emphasis). The result after overlay will be a single layer with 5 polygons, each with a land ownership value and a soil type.


Spurious or Sliver Polygons

In any two such layers there will almost certainly be boundaries that are common to both layers

e.g. following rivers

The two versions of such boundaries will not be coincident

As a result large numbers of small sliver polygons will be created

these must somehow be removed

this is normally done using a user-defined tolerance


Overlay of fields represented as rasters

The two input data sets are maps of (A) travel time from the urban area shown in black, and (B) county (red indicates County X, white indicates County Y). The output map identifies travel time to areas in County Y only, and might be used to compute average travel time to points in that county in a subsequent step.


Spatial Interpolation

Values of a field have been measured at a number of sample points

There is a need to estimate the complete field

to estimate values at points where the field was not measured

to create a contour map by drawing isolines between the data points

Methods of spatial interpolation are designed to solve this problem


Spatial Interpolation

Thiessen polygons define individual areas of influence around each of a set of points. Each polygon encloses the area that is closer to its point than to any other point; its boundaries are formed by the perpendicular bisectors of the lines between neighboring points.
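Used for interpolation, Thiessen polygons amount to giving every location the value of its nearest sample point. The sketch below shows that lookup without constructing the polygons themselves; the (x, y, value) sample format and the function name are assumptions for illustration.

```python
import math

def nearest_sample(x, y, samples):
    """Return the (sx, sy, value) sample whose location is nearest to (x, y)."""
    return min(samples, key=lambda s: math.hypot(s[0] - x, s[1] - y))
```

Any location for which `nearest_sample` returns the same sample lies in that sample's Thiessen polygon.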


Inverse Distance Weighting (IDW)

The unknown value of a field at a point is estimated by taking an average over the known values

weighting each known value by a decreasing function of its distance from the point, giving greatest weight to the nearest points

an implementation of Tobler’s Law


Point $i$: known value $z_i$, location $\mathbf{x}_i$, weight $w_i$, distance $d_i$

Unknown value (to be interpolated) at location $\mathbf{x}$

The estimate is a weighted average:

$$z(\mathbf{x}) = \frac{\sum_i w_i z_i}{\sum_i w_i}$$

Weights decline with distance:

$$w_i = \frac{1}{d_i^2}$$
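A minimal sketch of inverse distance weighting, using a weighted average with weights that decline as the inverse square of distance. The (x, y, z) sample format, the `power` parameter, and the shortcut of returning the observed value when the location coincides with a sample point are assumptions added for the example.

```python
import math

def idw(x, y, samples, power=2):
    """Estimate the field at (x, y) from samples given as (sx, sy, z)."""
    numerator = denominator = 0.0
    for sx, sy, z in samples:
        d = math.hypot(sx - x, sy - y)
        if d == 0:
            return z              # exactly on a sample point
        w = 1.0 / d ** power      # weight declines with distance
        numerator += w * z
        denominator += w
    return numerator / denominator
```

Because the result is a weighted average with positive weights, it always lies between the smallest and largest sampled values.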


Issues with IDW

The range of interpolated values cannot exceed the range of observed values

it is important to position sample points to include the extremes of the field

this can be very difficult


A Potentially Undesirable Characteristic of IDW Interpolation

This set of six data points clearly suggests a hill profile (dashed line). But in areas where there is little or no data the interpolator will move towards the overall mean (solid line).


Kriging

A technique of spatial interpolation firmly grounded in geostatistical theory

Kriging is based on the assumption that the parameter being interpolated can be treated as a regionalized variable (intermediate between a truly random and a completely deterministic variable)

Points near each other have a certain degree of spatial autocorrelation, and points that are widely separate are statistically independent

Kriging is a set of linear regression routines which minimize estimation variance from a predefined covariance model


A semivariogram. Each cross represents a pair of points. The solid circles are obtained by averaging within the ranges or bins of the distance axis. The solid line represents the best fit to these five points, using one of a small number of standard mathematical functions.
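The binned averages (the solid circles) can be sketched as below. It assumes the usual definition of semivariance for a pair of points, half the squared difference of their values, along with equal-width distance bins and an (x, y, z) sample format; fitting the curve to the binned values is not shown.

```python
import math
from collections import defaultdict

def empirical_semivariogram(samples, bin_width):
    """Map each distance-bin center to the mean semivariance of its pairs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(samples)):
        xi, yi, zi = samples[i]
        for j in range(i + 1, len(samples)):
            xj, yj, zj = samples[j]
            b = int(math.hypot(xi - xj, yi - yj) // bin_width)
            sums[b] += 0.5 * (zi - zj) ** 2   # one "cross" on the plot
            counts[b] += 1
    return {(b + 0.5) * bin_width: sums[b] / counts[b] for b in sorted(counts)}
```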


Stages of Kriging

Analyze observed data to estimate a semivariogram

Estimate values at unknown points as weighted averages

obtaining weights based on the semivariogram

the interpolated surface replicates statistical properties of the semivariogram


Density Estimation and Potential

Spatial interpolation is used to fill the gaps in a field

Density estimation creates a field from discrete objects

the field’s value at any point is an estimate of the density of discrete objects at that point

e.g., estimating a map of population density (a field) from a map of individual people (discrete objects)


The Kernel Function

Each discrete object is replaced by a mathematical function known as a kernel

Kernels are summed to obtain a composite surface of density

The smoothness of the resulting field depends on the width of the kernel

narrow kernels produce bumpy surfaces

wide kernels produce smooth surfaces
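A sketch of density estimation with summed kernels, assuming a two-dimensional Gaussian kernel whose width is controlled by a `bandwidth` parameter. The choice of kernel and the function name are assumptions; the lecture does not say which kernel function was used.

```python
import math

def kernel_density(x, y, points, bandwidth):
    """Sum of Gaussian kernels centered on each point, evaluated at (x, y)."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)  # each kernel integrates to 1
    total = 0.0
    for px, py in points:
        d2 = (px - x) ** 2 + (py - y) ** 2
        total += norm * math.exp(-d2 / (2.0 * bandwidth ** 2))
    return total
```

Evaluating this function over a grid of locations gives the density surface; a small bandwidth yields a sharp peak at each point, while a large bandwidth merges the peaks into a smooth surface.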


A typical kernel function

The result of applying a 150 km wide kernel to points distributed over California


When the kernel width is too small (in this case 16 km, using only the Southern California part of the database) the surface is too rugged, and each point generates its own peak.


Other types of spatial analysis

Data mining

Descriptive summaries

Optimization

Hypothesis testing


Data Mining

Analysis of massive data sets in search of patterns, anomalies, and trends

spatial analysis applied on a large scale

must be semi-automated because of data volumes

widely used in practice, e.g. to detect unusual patterns in credit card use


Descriptive Summaries

Attempt to summarize useful properties of data sets in one or two statistics

The mean or average is widely used to summarize data

centers are the spatial equivalent

there are several ways of defining centers


The Centroid

Found for a point set by taking the weighted average of coordinates

The balance point
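A short sketch of the weighted average of coordinates, assuming points are (x, y) pairs and that weights default to 1 when none are supplied (the function name is hypothetical):

```python
def centroid(points, weights=None):
    """Weighted mean of x and of y: the balance point of the point set."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    cx = sum(w * x for (x, _), w in zip(points, weights)) / total
    cy = sum(w * y for (_, y), w in zip(points, weights)) / total
    return cx, cy
```

With weights such as population, the result is the point at which the weighted set would balance.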


The Histogram

A useful summary of the values of an attribute

showing the relative frequencies of different values

A histogram view can be linked to other views

e.g., click on a bar in the histogram view and objects with attributes in that range are highlighted in a linked map view


A histogram or bar graph, showing the relative frequencies of values of a selected attribute. The attribute is the length of street between intersections. Lengths of around 100 m are the most common.


Spatial Dependence

There are many ways of measuring this very important summary property

Most methods have been developed for points

Patterns can be random, clustered, or dispersed

Measures differ for unlabeled and labeled features (e.g. individual house locations, versus housing types)


Dispersion

A measure of the spread of points around a center (“standard deviation”)

Related to the width of the kernel used in density estimation
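One common way to express this spread is the standard distance: the square root of the mean squared distance of the points from their unweighted centroid. The sketch below assumes that definition and (x, y) point pairs.

```python
import math

def standard_distance(points):
    """Root mean squared distance of the points from their mean center."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)
```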


Fragmentation Statistics

Measure the patchiness of data sets

e.g., of vegetation cover in an area

Useful in landscape ecology, because of the importance of habitat fragmentation in determining the success of animal and bird populations

populations are less likely to survive in highly fragmented landscapes


Three images of part of the state of Rondonia in Brazil, for 1975, 1986, and 1992. Note the increasing fragmentation of the natural habitat as a result of settlement. Such fragmentation can adversely affect the success of wildlife populations.


Optimization

Spatial analysis can be used to solve many design problems or to create improved designs (e.g., minimizing distance traveled or construction costs, or maximizing profit)

A spatial decision support system (SDSS) is an adaptation of GIS aimed at solving a particular design problem


Optimizing Point Locations

The minimum aggregate travel (MAT) problem is a simple case: one location, with the goal of minimizing the total distance traveled to reach it

The operator of a chain of convenience stores (e.g. Seven Eleven) might want to solve for many locations at once

where are the best locations to add new stores?

which existing stores should be dropped?
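A simplified sketch of the single-location case, assuming the facility must be chosen from a finite list of candidate sites and that travel is measured in straight-line distance. Both are assumptions made to keep the example short; the general MAT problem allows any location in the plane.

```python
import math

def best_site(candidates, demand_points):
    """Candidate (x, y) site with the smallest total distance to all demand."""
    def total_distance(site):
        return sum(math.hypot(site[0] - x, site[1] - y)
                   for x, y in demand_points)
    return min(candidates, key=total_distance)
```

Choosing many store locations at once is much harder, because the number of combinations to compare grows rapidly with the number of sites.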


Routing Problems

Search for optimum routes among several destinations

The traveling salesperson problem

find the shortest tour from an origin, through a set of destinations, and back to the origin
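For a handful of destinations the problem can be solved by trying every visiting order, as in the sketch below. It assumes straight-line distances between (x, y) points and is only practical for very small inputs, since the number of orders grows factorially; practical routing tools rely on heuristics instead.

```python
import math
from itertools import permutations

def shortest_tour(origin, destinations):
    """Return (visiting order, tour length) for the shortest closed tour."""
    def length(order):
        stops = [origin, *order, origin]  # leave from and return to the origin
        return sum(math.dist(a, b) for a, b in zip(stops, stops[1:]))
    best = min(permutations(destinations), key=length)
    return list(best), length(best)
```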


Routing service technicians for Schindler Elevator. Every day this company’s service crews must visit a different set of locations in Los Angeles. GIS is used to partition the day’s workload among the crews and trucks (color coding) and to optimize the route to minimize time and cost.


Optimum Paths

Find the best path across a continuous cost surface

between defined origin and destination

to minimize total cost

cost may combine construction, environmental impact, land acquisition, and operating cost

used to locate highways, power lines, pipelines

requires a raster representation


Solution of a least-cost path problem. The white line represents the optimum solution, or path of least total cost, across a friction surface represented as a raster. The area is dominated by a mountain range, and cost is determined by elevation and slope. The best route uses a narrow pass through the range. The blue line results from solving the same problem using a coarser raster.


Hypothesis Testing

Hypothesis testing is a recognized branch of statistics

A sample is analyzed, and inferences are made about the population from which the sample was drawn

The sample must normally be drawn randomly and independently from the population


Hypothesis Testing with Spatial Data

Frequently the data represent all that are available

e.g., all of the census tracts of Los Angeles

It is consequently difficult to think of such data as a random sample of anything

not a random sample of all census tracts

Tobler’s Law guarantees that independence is problematic

unless samples are drawn very far apart


Possible Approaches to Inference

Treat the data as one of a very large number of possible spatial arrangements

useful for testing for significant spatial patterns

Discard data until cases are independent

no one likes to discard data

Use models that account directly for spatial dependence

Be content with descriptions and avoid inference
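The first approach, treating the observed map as one of many possible arrangements, can be sketched as a randomization test: shuffle the values among the locations many times and see how often a shuffled arrangement looks at least as patterned as the observed one. The statistic used here (sum of squared differences between neighboring values), the neighbor-pair input format, and the default of 999 shuffles are assumptions made for illustration.

```python
import random

def permutation_p_value(values, neighbor_pairs, n_permutations=999, seed=0):
    """Share of arrangements whose neighbors are at least as alike as observed.

    values[i] is the attribute at location i; neighbor_pairs lists (i, j)
    index pairs of adjacent locations. A small result suggests that similar
    values sit next to each other more than chance would produce.
    """
    def statistic(vals):
        return sum((vals[i] - vals[j]) ** 2 for i, j in neighbor_pairs)

    rng = random.Random(seed)
    observed = statistic(values)
    shuffled = list(values)
    as_small = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # one random spatial arrangement
        if statistic(shuffled) <= observed:
            as_small += 1
    return (as_small + 1) / (n_permutations + 1)
```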


Summary

All methods of spatial analysis work best in the context of a collaboration between human and machine. One benefit of the machine is that it sometimes serves to correct any misleading aspects of human intuition. (Humans can be poor at guessing the answers to optimization problems in space.)
