
Data Quality and Error Analysis in GIS

Joshua Greenfeld, PhD, LS Professor emeritus, NJIT

Professor, Israel Institute of Technology

Data Quality and Error Analysis in GIS (c) Dr. J. Greenfeld 1

ABSTRACT

One of the major challenges of GIS is dealing with uncertainty and assessing the quality of spatial information.

The challenge is to assess the quality of spatial information, not just the quality of spatial data.

Many professionals are involved in providing GIS services. Surveying is only one of them.

For surveying to make a mark on the GIS industry and become a prominent stakeholder in GIS, it has to offer some expertise that most other professionals cannot.

Unfortunately, the ability to collect spatial data is becoming a common skill, and the surveyor's positioning expertise is not as unique as it used to be.

ABSTRACT (cont.)

One area in which surveyors have an advantage over other GIS professionals is their propensity and ability to understand and quantify spatial errors and accuracies.

In surveying, uncertainty and quality assessment is mostly confined to positioning, or positional accuracies.

The quality of surveying results is typically assessed on the basis of measurement accuracy and the propagation of these accuracies into other computed quantities.

In GIS, uncertainty and quality issues are much broader. In addition to positional accuracy there are: attribute accuracy, completeness of the data, sources and lineage of the data, logical consistency, fuzziness of the spatial phenomenon, currency of the data, and other uncertainty issues.

Objective

The objective of this seminar is to enable surveyors to understand the broader issues of accuracy assessment beyond positional accuracies.

It will outline the extended definition of uncertainty and quality as it applies to GIS.

It will include an overview of the errors and uncertainties that could impact the quality of spatial data.

This will be followed by a discussion of the impact of errors in spatial data on spatial information.

The ISO geospatial standards will be reviewed as well.

Finally, some practical tools and examples of numerical and statistical assessment of uncertainty and quality of spatial information will be discussed and demonstrated.

Importance of Quality

Gain confidence in geodata

Reduce users' complaints

Achieve customer satisfaction

Minimize consequential costs caused by decisions or actions based on erroneous data

No unified definition of data quality

1. Data quality refers to the degree of excellence exhibited by the data in relation to the portrayal of the actual phenomena. (GIS Glossary)

2. The state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use. (Government of British Columbia)

3. The totality of features and characteristics of data that bear on their ability to satisfy a given purpose; the sum of the degrees of excellence for factors related to data. (Glossary of Quality Assurance Terms)

No unified definition of data quality (cont.)

4. Information quality: the fitness for use of information; information that meets the requirements of its authors, users, and administrators. (Martin Eppler)

5. Data quality: the processes and technologies involved in ensuring the conformance of data values to business requirements and acceptance criteria.

6. ISO/PAS 26183:2006 defines product data quality as a measure of the accuracy and appropriateness of product data, combined with the timeliness with which those data are provided to all the people who need them.

And more...

Error and Uncertainty in GIS

• One of the major problems currently existing within GIS is the aura of accuracy surrounding digital geographic data

• Hardcopy map sources often include a map reliability rating or confidence rating in the map legend

• This rating helps the user determine the fitness for use of the map

• However, this information is rarely encoded in the digital conversion process

• Because GIS data is in digital form and can be represented with high precision, it is often considered to be totally accurate

Error and Uncertainty in GIS

• In reality, a buffer exists around each feature, and the actual position of the feature lies somewhere within it

• For example, data captured at the 1:20,000 scale commonly has a positional accuracy of ±20 metres

• This means the actual location of a feature may lie up to 20 metres in any direction from the position identified for it on the map

• Considering that the use of GIS commonly involves the integration of several data sets, usually at different scales and of different quality, one can easily see how errors can be propagated during processing

Error and Uncertainty in GIS

• The ease with which geographic data in a GIS can be used at any scale highlights the importance of detailed data quality information.

• Although a data set may not have a specific scale once it is loaded into the GIS database, it was produced with levels of accuracy and resolution that make it appropriate for use only at certain scales, and in combination with data of similar scales.

Error and Uncertainty in GIS

• Error has two sources: inherent and operational

• Inherent error is the error present in source documents and data

• Operational error is the amount of error produced through the data capture and manipulation functions of a GIS

• Both contribute to the reduction in quality of the products that are generated by GIS.

Error and Uncertainty in GIS

• Possible sources of operational errors include:

  – Mislabelling of areas on thematic maps
  – Misplacement of horizontal (positional) boundaries
  – Human error in digitizing
  – Classification error
  – GIS algorithm inaccuracies
  – Human bias

• While error will always exist in any scientific process, the aim within GIS processing should be to identify existing error in data sources and minimize the amount of error added during processing

Errors in Database Creation

Errors are introduced at almost every step of database creation

Completeness concerns the degree to which the data exhaust the universe of possible items

Are all possible objects included within the database?

Affected by rules of selection, generalization and scale

Error and Uncertainty in GIS

[Figure: error induced by data cleaning. Source: Longley et al., chapter 6, pages 132-133]

[Figure: merging. Source: Longley et al., chapter 6, pages 132-133]

Classification error: difference in pixel class between the map and a reference

[Figure panels: 1939, 1956, 1971, 1995]
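As a minimal sketch of this definition, the snippet below counts the share of pixels whose class differs between a classified map and a reference grid. The two 3 x 3 grids, the class codes and the function name `classification_error_rate` are invented for illustration and do not come from the slides.

```python
def classification_error_rate(map_grid, reference_grid):
    """Return the fraction of pixels whose class differs from the reference."""
    total = 0
    mismatched = 0
    for map_row, ref_row in zip(map_grid, reference_grid):
        for map_class, ref_class in zip(map_row, ref_row):
            total += 1
            if map_class != ref_class:
                mismatched += 1
    return mismatched / total


# Hypothetical land-cover grids (F = forest, A = agriculture)
classified = [["F", "F", "A"],
              ["F", "A", "A"],
              ["A", "A", "A"]]
reference = [["F", "F", "F"],
             ["F", "F", "A"],
             ["A", "A", "A"]]

error_rate = classification_error_rate(classified, reference)
print(f"{error_rate:.1%} of pixels disagree with the reference")
```

A full accuracy assessment would usually break the disagreements down by class; this sketch only reports the overall rate.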

Error and Uncertainty in GIS

• Because of cost constraints it is often more appropriate to manage error than to attempt to eliminate it!

• There is a trade-off between reducing the level of error in a database and the cost to create and maintain the database

• An awareness of the error status of different data sets will allow the user to make a subjective statement on the quality and reliability of a product derived from GIS processing

• The validity of any decision based on a GIS product is directly related to the quality and reliability rating of the product

Error and Uncertainty in GIS

• Depending upon the level of error inherent in the source data, and the error operationally produced through data capture and manipulation, GIS products may possess significant amounts of error

Error and Uncertainty in GIS

Tools to get a handle on uncertainty

Models of uncertainty: methods for assessing and

describing error

Error propagation (during analysis)

Fuzzy approaches (membership of classes)

Sensitivity analysis (effect of errors)


Error and Uncertainty in GIS

Error assessment, reporting, interpretation - more difficult

Quality of data: standards and metadata

But: No professional GIS currently in use can present the

user with information about the confidence limits that

should be associated with the results of an analysis.


Classification of Errors in GIS [Hun '92]

Source of Error: Data Collection and Compilation; Data Processing; Data Usage

Resulting in

Forms of Error:
  Primary: Positional Error, Attribute Error
  Secondary: Logical Error, Completeness

Final Product Errors

Uncertainty

Uncertainties in geographic information originate from different sources:

Uncertainty due to the inherent nature of geography: different interpretations can be equally valid;

Cartographic uncertainty resulting in positional and attribute errors;

Conceptual uncertainty as a result of differences in “what it is that is being mapped”.

Uncertainty (Definition of a Forest)

[Figure: definitions of a forest plotted by Tree Height (m), 0 to 16, against Canopy Coverage (%), 0 to 90. Labeled points: Portugal, Mexico, U.S., Israel, Belgium, Malaysia, UN, Turkey, Estonia, Switzerland, Somalia, New Zealand, UNESCO, Australia, Japan, Denmark, Morocco, Kenya, Zimbabwe, Sudan, Tanzania, Ethiopia, South Africa]

Internal and External Data Quality

Internal quality: corresponds to the level of similarity that exists between the “perfect” data to be produced (what is called “nominal ground”) and the data actually produced

External quality: corresponds to the similarity between the data produced and user needs

[Figure: internal quality relates “Data that should have been produced” to “Data produced”; external quality relates “Data produced” to each set of user needs (User needs 1, User needs 2, ..., User needs n)]

Characteristics to define the internal quality

– Completeness: presence and absence of features, their attributes and relationships.

– Logical consistency: degree of adherence to logical rules of data structure, attribution, and relationships (data structure can be conceptual, logical or physical).

– Positional accuracy: accuracy of the position of features.

– Temporal accuracy: accuracy of the temporal attributes and temporal relationships of features.

– Thematic accuracy: accuracy of quantitative attributes and the correctness of non-quantitative attributes and of the classifications of features and their relationships.

Six characteristics to define the external quality (Beard and Vallière)

– Definition: to evaluate whether the exact nature of the data and of the object that they describe, that is, the “what”, corresponds to user needs (semantic, spatial and temporal definitions).

– Coverage: to evaluate whether the territory and the period for which the data exist, that is, the “where” and the “when”, meet user needs.

– Lineage: to find out where the data come from, their acquisition objectives, and the methods used to obtain them, that is, the “how” and the “why”, and to see whether the data meet user needs.

Six characteristics to define the external quality (Beard and Vallière) (cont.)

– Precision: to evaluate what the data are worth and whether they are acceptable for an expressed need (semantic, temporal, and spatial precision of the object and its attributes).

– Legitimacy: to evaluate the official recognition and the legal scope of the data and whether they meet the needs of de facto standards, respect recognized standards, have legal or administrative recognition by an official body, or a legal guarantee by a supplier, etc.

– Accessibility: to evaluate the ease with which the user can obtain the data analyzed (cost, time frame, format, confidentiality, respect of recognized standards, copyright, etc.).

Conceptual model of uncertainty in spatial data

Uncertainty
  Well Defined Objects
    Error: Probability
  Poorly Defined Objects
    Vagueness: Fuzzy Set Theory
    Ambiguity
      Discord: Expert Opinion, Dempster-Shafer
      Non-Specificity: Endorsement Theory, Fuzzy Set Theory

Definitions of geographic objects

An example of a well-defined geographical object is land ownership. The boundary between land parcels is commonly marked on the ground, and shows an abrupt and total change in ownership

Poorly defined geographical objects are the rule in natural resource mapping. The conceptualization of mappable phenomena and the spaces they occupy is rarely clear-cut

There are rarely sharp transitions from one vegetation type to another

In a region there could be several types of vegetation

Five dimensions of objects A and B

[Figure: objects A and B described along five dimensions: Relation, Scale, Space, Time, Attribute]

Error

Ideally, if an object is conceptualized as being definable in both attribute and spatial dimensions, then it has a Boolean occurrence; any location is either part of the object, or it is not.

Within GIS, for a number of reasons, a location, or the assignment of an object to a location or to a class, may be expressed as a probability.

Common reasons for a database being in error

Type of Error | Cause of error

Measurement | Measurement of a property is erroneous.

Assignment | The object is assigned to the wrong class because of measurement error by the scientist, in either the field or the laboratory, or by the surveyor.

Class Generalization | Following observation in the field, and for reasons of simplicity, the object is grouped with objects possessing somewhat dissimilar properties.

Spatial Generalization | Generalization of the cartographic representation of the object before digitizing, including displacement, simplification, etc.

Common reasons for a database being in error (cont.)

Type of Error | Cause of error

Entry | Data are miscoded during (electronic or manual) entry in a GIS.

Temporal | The object changes character between the time of data collection and the time of database use.

Processing | In the course of data transformations an error arises because of rounding or algorithm error.

Vagueness

Sorites Paradox (is a bald man with one additional hair still bald?)

When, exactly, is a house a house; a settlement, a settlement; a city, a city; an oak woodland, an oak woodland?

The questions always revolve around the threshold value of some measurable parameter or the opinion of some individual, expert or otherwise.

Vagueness

Fuzzy-set theory is an alternative to Boolean sets.

Membership of an object in a Boolean set is absolute, and defined by one of two integer values {0,1}.

Membership of a fuzzy set is defined by a real number in the range [0,1]. Membership or non-membership of the set is identified by the terminal values, while all intervening values define an intermediate degree of belonging to the set (a membership of 0.25 reflects a smaller degree of belonging to the set than a membership of 0.5).
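The contrast between Boolean and fuzzy membership can be sketched as follows, using "forest" defined by canopy coverage as the vague class. The 30% Boolean threshold, the 10% to 60% linear ramp and both function names are hypothetical choices made only for this illustration.

```python
def boolean_membership(canopy_pct, threshold=30.0):
    """Boolean set: a location is either forest (1) or not forest (0)."""
    return 1 if canopy_pct >= threshold else 0


def fuzzy_membership(canopy_pct, lower=10.0, upper=60.0):
    """Fuzzy set: membership rises linearly from 0 at `lower` to 1 at `upper`."""
    if canopy_pct <= lower:
        return 0.0
    if canopy_pct >= upper:
        return 1.0
    return (canopy_pct - lower) / (upper - lower)


# Compare the two memberships for a few assumed canopy coverages (%)
for pct in (5.0, 22.5, 35.0, 80.0):
    print(pct, boolean_membership(pct), fuzzy_membership(pct))
```

With these assumed limits, 22.5% canopy has a membership of 0.25 and 35% has a membership of 0.5, so the second location belongs to "forest" to a greater degree, while the Boolean version switches abruptly at the threshold.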

Ambiguity

Ambiguity occurs when there is doubt as to how a phenomenon should be classified because of differing perceptions of that phenomenon.

There are two types of ambiguity: discord and non-specificity.

Discord: different definitions and interpretations of the same piece of land (not a problem of a single classification but of multiple mappings of the same area). In defining soil, for example, many countries have slightly different definitions of what constitutes a soil, of the names for soils, and of the spatial and attribute boundaries between soil types.

Definition of a Forest

[Figure: definitions of a forest plotted by Tree Height (m), 0 to 16, against Canopy Coverage (%), 0 to 90. Labeled points: Portugal, Mexico, U.S., Israel, Belgium, Malaysia, UN, Turkey, Estonia, Switzerland, Somalia, New Zealand, UNESCO, Australia, Japan, Denmark, Morocco, Kenya, Zimbabwe, Sudan, Tanzania, Ethiopia, South Africa]

Discord Ambiguity

a) There is more class1 than class2

b) The “zone of transition” between classes 1 and 2 is represented by a mosaic of class1-&-class2

c) The whole area is allocated to a class1-&-class2 mosaic

d) The two distinct areas of class1 and class2 are separated by two mosaics of class1-&-class2 and class2-&-class1

Discord Ambiguity

Some solutions for the problem of discord include:

Use of expert look-up tables and producer-supplied metadata to compare classifications. This is an artificial intelligence based solution.

Use of personal (expert) judgment to compare classification and phenomenon changes over a longer period. This solution makes extensive use of rough and fuzzy sets to accommodate the uncertainty in the correspondence of classes.

Non-specificity Ambiguity

Ambiguity through non-specificity can be illustrated by geographical relationships.

The relation “A is north of B” is itself non-specific because it can mean:

A lies on exactly the same line of longitude and towards the north pole from B;

A lies somewhere to the north of a line running east to west through B;

A lies between perhaps north-east and north-west, but is most likely to lie in the sector between north-north-east and north-north-west of B.

Non-specificity Ambiguity

The first two definitions are precise and specific; the third is the natural language concept, which is itself vague.

Any lack of definition as to which should be used means that uncertainty arises in the interpretation of “north of”.

Uncertainty

Attribute uncertainty (Forest vs. Ag)

Positional uncertainty

Definitional uncertainty

Measurement uncertainty


The Necessity of “Fuzziness”

“Not only is it easy to lie with maps, it’s essential... to present a useful and truthful picture, an accurate map must tell white lies.” -- Mark Monmonier

distort 3-D world into 2-D abstraction

characterize most important aspects of spatial reality

portray abstractions (e.g., gradients, contours) as distinct spatial objects

Fuzziness (cont.)

All GIS subject to uncertainty

What the data tell us about the real world

Range of possible “truths”

Uncertainty affects results of analysis

Confidence limits - “plus or minus”

Difficult to determine

“If it comes from a computer it must be right”

“If it has lots of decimal places, it must be accurate”


Method for determining conformance quality levels

Assumptions

The more errors there are in a dataset, the higher the likelihood of applying erroneous data to decisions or actions

Each false decision or action leads to consequential costs

• costs of finding the right answer

• costs due to damage caused by false information, e.g. hitting a pipeline that was documented at a different location

• hidden costs from losing the confidence of the user community

• hidden costs from loss of image with the customer

A dataset is never completely free of errors

The effort to reach a certain quality level costs time and money

Data quality

Data Quality
  Lineage
  Accuracy
    Positional
    Attribute
  Completeness
  Logical Consistency
  Semantic Accuracy
  Currency

Positional accuracy (2D example)

We distinguish between point objects, line objects, and area objects.

For a point object with coordinates $(x \pm \sigma_x,\ y \pm \sigma_y)$, the values of $\sigma_x$ and $\sigma_y$ may be known from:

– previous studies

– specifications

– derivation from the collected data

The point positional accuracy is then

$$PA_P = \sqrt{\sigma_x^2 + \sigma_y^2}$$

[Figure: point (x, y) with its errors σx and σy]

Positional accuracy (2D example)

A simple approximation for line and area objects with n points is:

$$PA_{L,A} = \sqrt{n\,(\sigma_x^2 + \sigma_y^2)}$$

Note: The size of each error could be different

[Figure: a line through points (x1,y1), (x2,y2), (x3,y3) and a polygon through points (x1,y1), (x2,y2), (x3,y3), (x4,y4), each vertex shown with its own σx and σy]
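The point formula and the line/area approximation can be sketched in a few lines. The function names and the sample standard deviations (0.03 and 0.04, in arbitrary length units) are illustrative assumptions, and the sketch assumes every vertex shares the same σx and σy, which the slide notes need not be the case.

```python
import math


def point_accuracy(sigma_x, sigma_y):
    """Positional accuracy of a single point: sqrt(sigma_x^2 + sigma_y^2)."""
    return math.hypot(sigma_x, sigma_y)


def line_or_area_accuracy(n_points, sigma_x, sigma_y):
    """Simple approximation for a line or polygon with n_points vertices,
    assuming all vertices have the same sigma_x and sigma_y."""
    return math.sqrt(n_points * (sigma_x ** 2 + sigma_y ** 2))


pa_point = point_accuracy(0.03, 0.04)
pa_polygon = line_or_area_accuracy(4, 0.03, 0.04)
print(round(pa_point, 3), round(pa_polygon, 3))
```

When the vertex errors differ, the same idea applies with the sum of the individual variances in place of n times a common variance.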


Error Propagation or Propagation of Random Errors

Definition:

Given independent variables, each with an uncertainty, error propagation is the method of determining the uncertainty in a function of these variables.

Computed errors | Measured or given errors

$E_x$, $E_y$ | Angle and distance

$E_{area}$ | Coordinates

$E_{vol}$ | Distance and elevation

Error Propagation

Error propagation is a way of combining two or more random errors to get a third.

It can be used when you need to measure more than one quantity to get your final result, for example an angle and a distance to compute coordinates.

Error propagation can also be used to combine several independent sources of random error on the same measurement.

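Error Propagation

In general, in matrix form:

$$\Sigma_{zz} = A\,\Sigma_{xx}\,A^{T}$$

where $\Sigma_{xx}$ is the covariance matrix of the measured quantities, $A$ is the matrix of partial derivatives of the computed quantities with respect to the measured ones, and $\Sigma_{zz}$ is the covariance matrix of the computed quantities.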
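A small sketch of the matrix form, written in plain Python so it needs no external library. The helper names (`matmul`, `transpose`, `propagate_covariance`) and the example, in which z1 = x1 + x2 and z2 = x1 - x2 are computed from two independent measurements with standard deviations 0.01 and 0.02, are assumptions made for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(column) for column in zip(*m)]


def propagate_covariance(a, sigma_xx):
    """Return Sigma_zz = A * Sigma_xx * A^T."""
    return matmul(matmul(a, sigma_xx), transpose(a))


# Rows of A hold the partial derivatives of z1 and z2 with respect to x1, x2
A = [[1.0, 1.0],
     [1.0, -1.0]]
# Independent measurements: variances on the diagonal, zero covariances
sigma_xx = [[0.01 ** 2, 0.0],
            [0.0, 0.02 ** 2]]

sigma_zz = propagate_covariance(A, sigma_xx)
for row in sigma_zz:
    print(row)
```

The off-diagonal terms of the result are non-zero: although the inputs are independent, the two computed quantities share the same measurements and are therefore correlated.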

Derivation of formulas.

Suppose that x is a measured quantity and y is computed from

$y = a x + b$

If we knew the true value $x_t$ of x, we could compute $y_t$:

$y_t = a x_t + b$

The measured value of x has an error of dx, or

$x = x_t + dx$

Thus $y = a(x_t + dx) + b = a x_t + b + a\,dx$

$y = y_t + a\,dx$

$dy = a\,dx$

$\sigma_y = a\,\sigma_x$

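Error Propagation

A general formula (assuming independence or no correlation):

$$\sigma_y^2 = \left(\frac{\partial y}{\partial x_1}\sigma_{x_1}\right)^2 + \left(\frac{\partial y}{\partial x_2}\sigma_{x_2}\right)^2 + \left(\frac{\partial y}{\partial x_3}\sigma_{x_3}\right)^2 + \dots + \left(\frac{\partial y}{\partial x_n}\sigma_{x_n}\right)^2$$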
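The general formula for independent variables can be sketched numerically by estimating each partial derivative with a central difference. This is only an illustrative approach, not the method used in the seminar; the function name `propagate_independent`, the step-size rule and the test function y = 3x + 2 are assumptions.

```python
import math


def propagate_independent(func, values, sigmas, rel_step=1e-6):
    """Propagate independent random errors through func(*values).

    Each partial derivative is approximated by a central difference,
    then sigma_y = sqrt(sum((dy/dx_i * sigma_i)^2)).
    """
    variance = 0.0
    for i, (value, sigma) in enumerate(zip(values, sigmas)):
        step = rel_step * max(abs(value), 1.0)
        upper = list(values)
        lower = list(values)
        upper[i] = value + step
        lower[i] = value - step
        partial = (func(*upper) - func(*lower)) / (2.0 * step)
        variance += (partial * sigma) ** 2
    return math.sqrt(variance)


# Linear case y = a*x + b with a = 3: expect sigma_y = 3 * sigma_x
sigma_y = propagate_independent(lambda x: 3.0 * x + 2.0, [10.0], [0.05])
print(round(sigma_y, 4))
```

For simple functions the partial derivatives are usually written out analytically; the numerical version is mainly useful as a check.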

Error Propagation Examples:

Random error of a sum

If $y = x_1 + x_2 + x_3 + \dots + x_n$

Then

$$\sigma_y = \sqrt{\sigma_{x_1}^2 + \sigma_{x_2}^2 + \sigma_{x_3}^2 + \dots + \sigma_{x_n}^2}$$

Example: A leveling loop was measured with the following accuracies:

DH1 = 12.34 ±0.01
DH2 = -8.72 ±0.02
DH3 = 4.93 ±0.005
DH4 = -8.53 ±0.01

The closure is 0.02

The accuracy of the loop is:

$$\sqrt{0.01^2 + 0.02^2 + 0.005^2 + 0.01^2} = 0.025$$
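The leveling-loop example can be checked with a short script. The height differences and accuracies are the ones given on the slide; the variable names are illustrative.

```python
import math

# Height differences of the leveling loop and their standard deviations
height_differences = [12.34, -8.72, 4.93, -8.53]
sigmas = [0.01, 0.02, 0.005, 0.01]

# Loop closure: the sum of the height differences
closure = sum(height_differences)

# Random error of a sum: square root of the sum of the variances
loop_sigma = math.sqrt(sum(s ** 2 for s in sigmas))

print(f"closure = {closure:.2f}, accuracy of the loop = {loop_sigma:.3f}")
```

Here the closure (0.02) is smaller than the propagated accuracy of the loop (0.025), so the misclosure is within what the stated measurement accuracies would lead one to expect at the one-sigma level.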

Error Propagation Examples:

Random error of a series

If $y = x_1 + x_2 + x_3 + \dots + x_n$ and $\sigma_{x_1} = \sigma_{x_2} = \sigma_{x_3} = \dots = \sigma_{x_n} = \sigma_x$

Then

$$\sigma_y = \sqrt{n}\,\sigma_x$$

Example

$$\sqrt{0.01^2 + 0.01^2 + 0.01^2 + 0.01^2} = \sqrt{4} \times 0.01 = 0.02$$

Error Propagation Examples:

Random error of area

$A = a\,b$

$$\sigma_A = \sqrt{a^2 \sigma_b^2 + b^2 \sigma_a^2}$$

Example

The sides of an 80' x 100' rectangular lot were measured with an accuracy of ±0.02'. What is the accuracy of the area of the lot?

$$\sigma_A = \sqrt{80^2 \times 0.02^2 + 100^2 \times 0.02^2} = 2.56 \text{ sq. ft.}$$
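The rectangular-lot example can be reproduced as a quick check. The function name `rectangle_area_sigma` is an illustrative choice; the dimensions and accuracies are those of the slide.

```python
import math


def rectangle_area_sigma(a, b, sigma_a, sigma_b):
    """Random error of A = a * b for independent side measurements."""
    return math.sqrt((b * sigma_a) ** 2 + (a * sigma_b) ** 2)


# 80 ft x 100 ft lot, each side measured to +/- 0.02 ft
area = 80.0 * 100.0
sigma_area = rectangle_area_sigma(80.0, 100.0, 0.02, 0.02)
print(f"area = {area:.0f} sq. ft. +/- {sigma_area:.2f} sq. ft.")
```

Note that the result is in square feet: an error of 0.02 ft in each side becomes about 2.56 sq. ft. in the 8000 sq. ft. area.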


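Error Propagation of Azimuth and Distance to coordinates (x,y)

$$X_B = X_A + D_{AB} \sin AZ_{AB}$$

$$Y_B = Y_A + D_{AB} \cos AZ_{AB}$$

$$\sigma_{X_B}^2 = \sigma_{X_A}^2 + (\sin AZ_{AB})^2\,\sigma_D^2 + (D_{AB} \cos AZ_{AB})^2 \left(\frac{\sigma_{AZ}}{206265}\right)^2$$

$$\sigma_{Y_B}^2 = \sigma_{Y_A}^2 + (\cos AZ_{AB})^2\,\sigma_D^2 + (D_{AB} \sin AZ_{AB})^2 \left(\frac{\sigma_{AZ}}{206265}\right)^2$$

Here $\sigma_{AZ}$ is in arc-seconds; dividing by 206265 converts it to radians.

[Figure: line from point A to point B with distance D and azimuth AZ]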
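A minimal sketch of this forward computation, assuming the azimuth is given in decimal degrees, its standard deviation in arc-seconds, and that the errors of point A, the distance and the azimuth are independent. The function name and the sample values are hypothetical.

```python
import math

ARCSEC_PER_RADIAN = 206265.0


def forward_with_sigma(x_a, y_a, dist, az_deg,
                       sigma_xa, sigma_ya, sigma_d, sigma_az_arcsec):
    """Coordinates of B from A, distance and azimuth, with propagated errors."""
    az = math.radians(az_deg)
    x_b = x_a + dist * math.sin(az)
    y_b = y_a + dist * math.cos(az)
    sigma_az = sigma_az_arcsec / ARCSEC_PER_RADIAN  # arc-seconds to radians
    sigma_xb = math.sqrt(sigma_xa ** 2
                         + (math.sin(az) * sigma_d) ** 2
                         + (dist * math.cos(az) * sigma_az) ** 2)
    sigma_yb = math.sqrt(sigma_ya ** 2
                         + (math.cos(az) * sigma_d) ** 2
                         + (dist * math.sin(az) * sigma_az) ** 2)
    return x_b, y_b, sigma_xb, sigma_yb


# Hypothetical observation: 1000 units due east, D +/- 0.01, AZ +/- 10"
x_b, y_b, sigma_xb, sigma_yb = forward_with_sigma(
    1000.0, 2000.0, 1000.0, 90.0, 0.0, 0.0, 0.01, 10.0)
print(round(x_b, 3), round(y_b, 3), round(sigma_xb, 4), round(sigma_yb, 4))
```

For a line running due east, the distance error goes almost entirely into X and the azimuth error almost entirely into Y, which is what the two formulas predict.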

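Error Propagation of coordinates (x,y) to Azimuth and Distance

$$D = \sqrt{(X_B - X_A)^2 + (Y_B - Y_A)^2} = \sqrt{\Delta X^2 + \Delta Y^2}$$

$$AZ = \tan^{-1}\frac{X_B - X_A}{Y_B - Y_A} = \tan^{-1}\frac{\Delta X}{\Delta Y}$$

$$\sigma_D = \frac{1}{D}\sqrt{\Delta X^2(\sigma_{X_A}^2 + \sigma_{X_B}^2) + \Delta Y^2(\sigma_{Y_A}^2 + \sigma_{Y_B}^2)}$$

$$\sigma_{AZ} = \frac{1}{D^2}\sqrt{\Delta Y^2(\sigma_{X_A}^2 + \sigma_{X_B}^2) + \Delta X^2(\sigma_{Y_A}^2 + \sigma_{Y_B}^2)}$$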
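The inverse computation can be sketched in the same way. The sketch assumes uncorrelated coordinate errors and returns the azimuth and its standard deviation in radians; the function name and the sample coordinates are hypothetical.

```python
import math


def inverse_with_sigma(x_a, y_a, x_b, y_b,
                       sigma_xa, sigma_ya, sigma_xb, sigma_yb):
    """Distance and azimuth from A to B, with propagated standard deviations."""
    dx = x_b - x_a
    dy = y_b - y_a
    dist = math.hypot(dx, dy)
    # atan2 resolves the quadrant; azimuth is measured clockwise from north
    az = math.atan2(dx, dy) % (2.0 * math.pi)
    var_x = sigma_xa ** 2 + sigma_xb ** 2
    var_y = sigma_ya ** 2 + sigma_yb ** 2
    sigma_d = math.sqrt(dx ** 2 * var_x + dy ** 2 * var_y) / dist
    sigma_az = math.sqrt(dy ** 2 * var_x + dx ** 2 * var_y) / dist ** 2
    return dist, az, sigma_d, sigma_az


# Hypothetical points with 0.01 standard deviation in every coordinate
dist, az, sigma_d, sigma_az = inverse_with_sigma(
    0.0, 0.0, 300.0, 400.0, 0.01, 0.01, 0.01, 0.01)
print(round(dist, 3), round(math.degrees(az), 4),
      round(sigma_d, 4), round(sigma_az * 206265.0, 2), "arc-seconds")
```

Multiplying the azimuth error by 206265 converts it from radians to arc-seconds, the unit more commonly used for angular accuracy in surveying.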

Error Propagation of coordinates to area of a closed polygon

$$A = \frac{1}{2}\left|\sum_i x_i\,(y_{i-1} - y_{i+1})\right|$$

$$\sigma_A = \frac{1}{2}\sqrt{\sum_i \left[(y_{i-1} - y_{i+1})^2\,\sigma_{x_i}^2\right] + \sum_i \left[(x_{i+1} - x_{i-1})^2\,\sigma_{y_i}^2\right]}$$

Example:

Point   X          Y          Xi+1 - Xi-1   Yi-1 - Yi+1   Xi (Yi-1 - Yi+1)   [σx (Yi-1 - Yi+1)]^2   [σy (Xi+1 - Xi-1)]^2
A       10000.00   10000.00
B        9600.04   10599.96   -1300.01      -799.95       -7679521.06          78.76                  468.01
C        8699.99   10799.95    -500.06       649.94        5654467.30         216.58                  116.81
D        9099.98    9950.02    1300.01       799.95        7279498.42         570.06                 1457.26
A       10000.00   10000.00     500.06      -649.94       -6499391.57         545.43                  344.47
B        9600.04   10599.96
Σ                                                         -1244946.91        1410.83                 2386.55

AREA = 622473.45
σA = 30.81
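The tabulated computation can be sketched as a loop over the vertices. The example below uses the rounded coordinates from the table, but applies a single hypothetical standard deviation (0.02 in both X and Y) to every vertex, whereas the per-point values used on the slide vary and are not listed; its output is therefore not expected to match the tabulated σA. The function name is illustrative.

```python
import math


def polygon_area_with_sigma(points, sigma_x, sigma_y):
    """Area of a closed polygon and its propagated standard deviation.

    points: list of (x, y) vertices in order, without repeating the first.
    sigma_x, sigma_y: standard deviations assumed equal for every vertex.
    """
    n = len(points)
    signed_sum = 0.0
    variance_sum = 0.0
    for i, (x_i, _) in enumerate(points):
        x_prev, y_prev = points[i - 1]          # wraps to the last vertex
        x_next, y_next = points[(i + 1) % n]    # wraps to the first vertex
        signed_sum += x_i * (y_prev - y_next)
        variance_sum += (sigma_x * (y_prev - y_next)) ** 2
        variance_sum += (sigma_y * (x_next - x_prev)) ** 2
    return abs(signed_sum) / 2.0, math.sqrt(variance_sum) / 2.0


# Vertices A, B, C, D as printed in the table (rounded to 0.01)
lot = [(10000.00, 10000.00), (9600.04, 10599.96),
       (8699.99, 10799.95), (9099.98, 9950.02)]
lot_area, lot_sigma = polygon_area_with_sigma(lot, 0.02, 0.02)
print(round(lot_area, 2), round(lot_sigma, 2))
```

Because the table's coordinates are rounded, the area computed here can differ slightly from the tabulated value.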


POSITIONAL ACCURACY

Defined as the closeness of locational information (usually coordinates) to the true position.

How to test positional accuracy?

• Use an independent source of higher accuracy (e.g. GPS or raw survey data).
• Use internal evidence: unclosed polygons and lines which overshoot or undershoot junctions are indications of inaccuracy. The sizes of gaps, overshoots and undershoots may be used as a measure of positional accuracy.
• Compute accuracy from knowledge of the errors introduced by different sources, using error propagation.

The National Standard for

Spatial Data Accuracy (NSSDA)

A well-defined statistic and testing methodology for positional accuracy of spatial data.

Applicable to digital and graphic forms (aerial photographs, satellite imagery, and maps)

The standard does not define “pass-fail” accuracy values. (agencies are to set criteria)

Accuracy report

http://www.fgdc.gov/standards/projects/FGDC-standards-projects/accuracy/


Spatial Accuracy (Horizontal

Accuracy)

Circular error is based on the sample

standard deviation of di, the difference

between the data set coordinate value and

the coordinate value determined by an

independent check survey of higher accuracy

for the same point.


The standard deviation for the horizontal coordinate r is:

$$s_r = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$$

Where:

$$r_i = \sqrt{x_i^2 + y_i^2}, \qquad d_i = r_{data_i} - r_{check_i}$$

$$\bar{d} = \frac{\sum d_i}{n} \quad \text{(the mean discrepancy)}$$

n = total number of points checked

NSSDA horizontal accuracy is:

Accuracy_r = 2.4477 * s_r (95% confidence level)
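The horizontal statistic can be sketched as follows. It follows the formulas as written on the slide (sample standard deviation of the discrepancies, multiplied by 2.4477). The function name and the sample coordinates are hypothetical.

```python
import math
import statistics

def horizontal_accuracy(data_xy, check_xy):
    """95% horizontal accuracy per the slide: 2.4477 * s_r."""
    # d_i = r_data_i - r_check_i, with r = sqrt(x^2 + y^2) as written on the slide
    d = [math.hypot(xd, yd) - math.hypot(xc, yc)
         for (xd, yd), (xc, yc) in zip(data_xy, check_xy)]
    s_r = statistics.stdev(d)  # sample standard deviation, divisor n - 1
    return 2.4477 * s_r

# Invented sample: three data-set points and their check-survey positions
data_pts = [(100.1, 0.0), (200.3, 0.0), (300.2, 0.0)]
check_pts = [(100.0, 0.0), (200.0, 0.0), (300.0, 0.0)]
acc_r = horizontal_accuracy(data_pts, check_pts)
print(f"Accuracy_r = {acc_r:.3f}")
```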


The standard deviation for the z coordinate direction is:

$$s_z = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$$

where:

$$d_i = z_{data_i} - z_{check_i}$$

$$\bar{d} = \frac{\sum d_i}{n} \quad \text{(the mean discrepancy)}$$

n = total number of points checked

NSSDA vertical accuracy is: Accuracy_z = 1.96 * s_z (95% confidence level)
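The vertical case reduces to a one-dimensional version of the same computation. A minimal sketch following the slide's formulas, with a hypothetical function name and invented elevations:

```python
import statistics

def vertical_accuracy(z_data, z_check):
    """95% vertical accuracy per the slide: 1.96 * s_z."""
    d = [zd - zc for zd, zc in zip(z_data, z_check)]  # d_i = z_data_i - z_check_i
    return 1.96 * statistics.stdev(d)  # sample standard deviation, divisor n - 1

# Invented sample elevations
acc_z = vertical_accuracy([10.1, 20.3, 30.2], [10.0, 20.0, 30.0])
print(f"Accuracy_z = {acc_z:.3f}")
```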

Well-Defined Points

Features that can be identified within 1/3 of the maximum expected uncertainty for the data set.

Acceptable features:

Small scale                Large scale
Road/Rail intersections    Center of utility access cover
Small isolated shrubs      Sidewalk/curb/gutter intersections
Corners of structures      Monuments

Check survey points should have accuracies within one-third of the data set’s intended accuracy (95% CL).

Check Point Location (assuming rectangle area)

• Spaced at intervals of at least 10% of the diagonal.
• At least 20% of the points are located in each quadrant.
• Check points may be distributed more densely in the vicinity of important features.
• When data exist for only a portion of the data set, confine test points to that area.
• When the distribution of error is likely to be nonrandom, it may be desirable to locate check points to correspond to the error distribution.

Positional Accuracy evaluation of Orthophotos in New Jersey

Point   Accuracy (ft)
1       4.25
2       4.07
3       2.28
4       3.98
5       4.18


ATTRIBUTE ACCURACY

Defined as the closeness of attribute values to their true value

Note that while location does not change with time, attributes often do

Attribute accuracy must be analyzed in different ways depending on the nature of the data

For continuous attributes (surfaces) such as on a DEM or TIN:

accuracy is expressed as measurement error (e.g. ±1m)


ATTRIBUTE ACCURACY

For categorical attributes such as classified polygons:

• Are the categories appropriate, sufficiently detailed and defined?
• Is a polygon classified as A really A, or should it be B?
• How heterogeneous are the polygons (e.g. 70% A and 30% B)?
• How well are A and B defined (e.g. soil classifications)? The center area may be definitely A, but more like B at the edges.

ATTRIBUTE ACCURACY

How to test attribute accuracy?

prepare a misclassification matrix and calculate the degree of correctness

Examples:

The Kappa coefficient

Map Producer’s accuracy

Map User’s accuracy


The Kappa coefficient

(Figure panels: Dataset A, Dataset B, Comparing A to B)

Observed counts (O):

        A      B      Total
A       OAA    OAB    OAr
B       OBA    OBB    OBr
Total   OAc    OBc    Σ

Proportions (P):

        A      B      Total
A       PAA    PAB    PAr
B       PBA    PBB    PBr
Total   PAc    PBc    1

O – Observed
P – Percentage

$$P_0 = P_{AA} + P_{BB}$$

$$P_e = P_{Ac}\,P_{Ar} + P_{Bc}\,P_{Br}$$

$$Kappa = \frac{P_0 - P_e}{1 - P_e}$$


The Kappa coefficient

(Figure panels: Dataset A, Dataset B, Comparing A to B)

Observed counts:

        R     B     Total
R       58    6     64
B       7     28    35
Total   65    34    99

Proportions:

        R       B       Total
R       0.586   0.061   0.646
B       0.071   0.283   0.354
Total   0.657   0.343   1

$$P_0 = 0.586 + 0.283 = 0.869$$

$$P_e = 0.657 \times 0.646 + 0.343 \times 0.354 = 0.546$$

$$Kappa = \frac{0.869 - 0.546}{1 - 0.546} = 0.711$$
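The Kappa computation can be checked with a few lines of code. This is a minimal sketch that works from the observed counts (58, 6, 7, 28) rather than the rounded proportions; the function name `kappa_coefficient` is hypothetical.

```python
def kappa_coefficient(matrix):
    """Kappa for a square agreement matrix of observed counts."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    p0 = sum(matrix[i][i] for i in range(k)) / total           # observed agreement
    row_p = [sum(matrix[i]) / total for i in range(k)]          # row proportions
    col_p = [sum(matrix[r][c] for r in range(k)) / total for c in range(k)]
    pe = sum(r * c for r, c in zip(row_p, col_p))               # chance agreement
    return (p0 - pe) / (1.0 - pe)

kappa = kappa_coefficient([[58, 6], [7, 28]])
print(f"Kappa = {kappa:.3f}")
```

The same function accepts larger square matrices, so it applies unchanged to a classification with more than two categories.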

How to interpret Kappa

Kappa is always less than or equal to 1.

A value of 1 implies perfect agreement and values less

than 1 imply less than perfect agreement.

In rare situations, Kappa can be negative. This is a sign

that the two observers agreed less than would be

expected just by chance.

A possible interpretation of Kappa. The agreement is:

Kappa        Agreement
0.0 - 0.2    Poor
0.2 - 0.4    Fair
0.4 - 0.6    Moderate
0.6 - 0.8    Good
0.8 - 1.0    Very good

Other Accuracy Assessment

Assume we have a 9 cell land cover map, one from 1980 and one from 2000, with three categories: A, B, and C.

1980 LC      2000 LC      Cross Tabulated Grid
A B A        B B A        BA BB AA
B C C        B B C        BB BC CC
A A B        B A C        BA AA CB

The cross tabulation can be quantified into a matrix, oftentimes called a confusion matrix:

     A  B  C
A    2  2  0
B    0  2  1
C    0  1  1

The matrix shows the agreements between the 1980 and 2000 maps. As an example, 2 cells remained A (AA), 1 cell was C and is now B (CB), etc.

Other Accuracy Assessment

Sum up the rows and columns. But what do these numbers tell us?

         A  B  C  Total
A        2  2  0  4
B        0  2  1  3
C        0  1  1  2
Total    2  5  2

The bottom row tells us that there were two cells that were A, five B, and two C.

The rightmost column tells us that we mapped 4 cells as A, 3 as B, and 2 as C.

Adding up the diagonal cells says that 5 cells were right.

The overall agreement between maps is:

Σdii / n = 5/9 = 0.556 (about 56%)
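The cross tabulation and the overall agreement can be sketched in code. Treating the first grid as the matrix rows ("mapped") and the second grid as the columns ("reference") is an assumption, chosen so that the totals match the ones stated on the slide (rows 4, 3, 2; columns 2, 5, 2). Built this way, the first matrix row comes out as 2 2 0. All names are hypothetical.

```python
CLASSES = ["A", "B", "C"]

# The two 3x3 grids from the slide, listed row by row.
mapped = ["A", "B", "A", "B", "C", "C", "A", "A", "B"]      # first grid -> rows
reference = ["B", "B", "A", "B", "B", "C", "B", "A", "C"]   # second grid -> columns

def confusion_matrix(rows, cols, classes):
    """Count how often each (row class, column class) pair occurs."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for r, c in zip(rows, cols):
        matrix[index[r]][index[c]] += 1
    return matrix

matrix = confusion_matrix(mapped, reference, CLASSES)
row_totals = [sum(row) for row in matrix]
col_totals = [sum(col) for col in zip(*matrix)]
overall = sum(matrix[i][i] for i in range(len(CLASSES))) / len(mapped)
print(matrix, row_totals, col_totals, f"overall = {overall:.3f}")
```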


User and Producer Accuracy

The total correspondence of our example is about 56%. But that only tells us part of the story. What if we were really interested in classification B? Were there changes in classification B? Even here, there are two different ways of interpreting that question:

• If I were interested in mapping all the areas of B, how well did I get them all? This is called the Map Producer’s Accuracy. That is, how well did we produce a map of classification B?
• If I were to use the map to find B, how successful would I be? This is called the Map User’s Accuracy. That is, how much confidence should a user of the map have for a given classification?

User and Producer Accuracy

Map user’s accuracy = the total number correct within a row divided by the total number in the whole row.

Map producer’s accuracy = the total number correct within a column divided by the total number in the whole column.

         A  B  C  Total
A        2  2  0  4
B        0  2  1  3
C        0  1  1  2
Total    2  5  2

Example of classification B

Map user’s accuracy = 2/3 = 67%
Map producer’s accuracy = 2/5 = 40%
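Both measures follow directly from the confusion matrix. A minimal sketch, assuming rows hold the mapped classes and columns the reference classes, with the first row taken as 2 2 0 so that the totals agree with the slide (rows 4, 3, 2; columns 2, 5, 2). Function names are hypothetical.

```python
def users_accuracy(matrix, k):
    """Correct cells of class k divided by the total of row k."""
    return matrix[k][k] / sum(matrix[k])

def producers_accuracy(matrix, k):
    """Correct cells of class k divided by the total of column k."""
    return matrix[k][k] / sum(row[k] for row in matrix)

matrix = [[2, 2, 0],
          [0, 2, 1],
          [0, 1, 1]]
B = 1  # index of classification B
user_b = users_accuracy(matrix, B)
producer_b = producers_accuracy(matrix, B)
print(f"user's accuracy (B) = {user_b:.0%}, producer's accuracy (B) = {producer_b:.0%}")
```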

User and Producer Accuracy

How can we use the above results?

• A user’s accuracy of 67% means that if we were to use this map and look for classification B, we would be correct 67% of the time.
• A producer’s accuracy of 40% means that the map produced only 40% of all the B’s that were out there.
• This also gives us some indication of the nature of the errors. For instance, it appears that we confused classification A with classification B (we said on two occasions that B was A). By understanding the nature of the errors, perhaps we can go back, look over our process and correct for that mistake.

LOGICAL CONSISTENCY

Refers to the degree of adherence to logical rules of data structures (conceptual, logical or physical), attribution and relationships. It includes:

• Conceptual consistency: adherence to rules of the conceptual schema
• Domain consistency: adherence of values to the value domain
• Format consistency: degree to which data is stored in accordance with the physical structure of the dataset


LOGICAL CONSISTENCY

Topological consistency; correctness of the explicitly

encoded topological characteristics of a dataset. For

example:

• If there are polygons, do they close?

• Is there exactly one label within each polygon?

• Are there nodes wherever arcs cross, or do arcs

sometimes cross without forming nodes?


COMPLETENESS

Refers to the presence and absence of features, their attributes and relationships in spatial data, compared with what is defined in the data model or what is in the real world.

• Error of commission: data present in a data set that is not present in the data model or the real world.
• Error of omission: data that is present in the data model or the real world but is absent from the data set.

Affected by rules of selection, generalization and scale.

LINEAGE

A record of the data sources and of the operations

which created the database

How was it digitized, from what documents?

When was the data collected?

What agency collected the data?

What steps were used to process the data?

• precision of computational results

Is often a useful indicator of accuracy


An Example of Data Quality Elements and Sub-elements for Buildings

Each quality element is listed with its quality sub-elements and a description by example.

Completeness
• Commission error: Buildings with area less than 4 m² are presented in the Building Polygon layer of the 1:1000 data set.
• Omission error: Buildings with area equal to or larger than 4 m² are absent from the Building Polygon layer.

Positional accuracy
• Horizontal accuracy: RMSE of a building polygon, based on a comparison of the horizontal coordinates of all the nodes of the building's footprint in the GIS with the corresponding reference values.
• Vertical accuracy: RMSE of a building polygon, based on a comparison of the vertical coordinates of all the nodes of the building's footprint in the GIS with the corresponding reference values.

Attribute accuracy
• Classification correctness: Whether a building or related feature is correctly classified as one (or more) building-related features.
• Non-quantitative attribute correctness: The Name of a building polygon may be correct or wrong in a GIS.
• Quantitative attribute correctness: The value of the field "Building Top Level" of a Building Polygon may be correct or wrong.

Logical consistency
• Conceptual consistency: A tower is described to be under its podium.
• Domain consistency: The classification of feature code for a building polygon is beyond any of the following given classes: BR, BAR, BUP, IBP, OSP, PWP, TSP.
• Format consistency: Building names in title case (Hong Kong Airport) are consistent, while a name such as "HONG KONG Airport" is not consistent in format.
• Topological consistency: When the outline of a building polygon is closed, the topology is consistent; when the outline is not closed, the topology is not consistent.

Uncertainties Measured Based on Various Mathematical Theories

(Figure: uncertainty divided into imprecision, ambiguity and vagueness. Measures shown: confidence region model (Shi 1994); entropy (Shannon 1948); Hartley’s measure (1928); discord measure, confusion measure and non-specificity measure; U-uncertainty; fuzzy measure; fuzzy topology measure. Underlying theories shown: probability and statistical theory; evidence theory; fuzzy sets, probability and fuzzy topology.)

A framework for modeling uncertainties in spatial data and analysis

(Figure: flow diagram with the stages Real World; Data type classification of spatial data; Description of uncertainty; Uncertainty modeling in spatial analysis and query; Control of uncertainties; Visualization of uncertainties. Data types shown: object (point, line, polygon, 3D objects) and field (raster image, DEM surface). Other boxes shown: positional uncertainty; uncertain topology; uncertainty from multi-data source; uncertainty of remote sensing data; errors in DEM; hybrid DEM; interpolation; uncertain spatial query; geometric correction and image fusion; uncertainty in spatial analysis; processing and the uncertain control of spatial data; visualization and the distribution of uncertainty information.)

Error model of point – Error Ellipse

(Figure: error ellipse with axes U and V rotated by the angle t relative to the X and Y axes; Sx, Sy, Su and Sv are marked.)

The transformation equation between U,V and X,Y is:

$$\begin{bmatrix} U \\ V \end{bmatrix} = \begin{bmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix}$$

t is the rotation angle from the Y axis to the axis of largest error.
Su is the semi-major axis of the ellipse (largest error).
Sv is the semi-minor axis of the ellipse (least error).
Sx is the standard deviation in the X coordinate.
Sy is the standard deviation in the Y coordinate.


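Error model of point – Error Ellipse

(Figure: error ellipse with axes U and V, rotation angle t, and Sx, Sy, Su, Sv marked.)

$$\tan 2t = \frac{2 S_{XY}}{S_X^2 - S_Y^2}$$

$$K = \sqrt{(S_X^2 - S_Y^2)^2 + 4 S_{XY}^2}$$

$$S_u^2 = \frac{S_X^2 + S_Y^2 + K}{2}$$

$$S_v^2 = \frac{S_X^2 + S_Y^2 - K}{2}$$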
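The ellipse parameters can be computed from the standard deviations and the covariance. The sketch below assumes tan 2t = 2·Sxy / (Sx² − Sy²) and uses `atan2` to resolve the quadrant, which is a choice made here. The axis convention for t should be checked against the original slide before relying on the orientation. The function name and sample values are hypothetical.

```python
import math

def error_ellipse(sx, sy, sxy):
    """Return (Su, Sv, t in degrees) from sigma_x, sigma_y and covariance sxy."""
    k = math.sqrt((sx**2 - sy**2) ** 2 + 4.0 * sxy**2)
    su = math.sqrt((sx**2 + sy**2 + k) / 2.0)
    # max() guards against a tiny negative value caused by round-off
    sv = math.sqrt(max((sx**2 + sy**2 - k) / 2.0, 0.0))
    t = 0.5 * math.atan2(2.0 * sxy, sx**2 - sy**2)
    return su, sv, math.degrees(t)

# Hypothetical point: equal sigmas of 0.02 with positive covariance 0.0002
su, sv, t_deg = error_ellipse(0.02, 0.02, 0.0002)
print(f"Su = {su:.4f}, Sv = {sv:.4f}, t = {t_deg:.1f} deg")
```

Note that Su² + Sv² equals Sx² + Sy², which gives a quick sanity check on any computed ellipse.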

Error model of line - Epsilon band

Assumptions:

1. Each error effect relevant to a particular digital line in a GIS can be treated as a random variable, perturbing the true line to obtain the observed line.
2. The processes of generating a digital line in a GIS can be treated as being independent.

The bandwidth is determined from a statistical function of those positional errors on the line accumulated from the first stage to the final stage of data capture.

(Figure: the measured line and the true line.)

Error model of a polygon

The area S of the polygon is computed from:

$$S = \frac{1}{2}\sum_{i=1}^{n}\left[x_i\,(y_{i+1} - y_{i-1})\right] = \frac{1}{2}\sum_{i=1}^{n}\left[x_i\,\Delta y_{i-1,i+1}\right]$$

The differential of the area is given as:

$$dS = \frac{1}{2}\sum_{i=1}^{n}\left[\Delta y_{i-1,i+1}\,dx_i - \Delta x_{i-1,i+1}\,dy_i\right]$$

Error model of a polygon

For simplicity, assume all coordinate accuracies are equal to σo and the covariance is 0. We get:

$$\sigma_S^2 = \frac{1}{4}\sum_{i=1}^{n}\left[\Delta y_{i-1,i+1}^2 + \Delta x_{i-1,i+1}^2\right]\sigma_o^2 = \frac{1}{4}\sum_{i=1}^{n}\left[l_{i-1,i+1}^2\right]\sigma_o^2$$

Where: l(i-1,i+1) is the distance between points P(i-1) and P(i+1).
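With equal, uncorrelated coordinate accuracies, the area standard deviation depends only on the chord lengths between each vertex's two neighbours. A minimal sketch with a hypothetical function name and a hypothetical 80 x 100 rectangle as input:

```python
import math

def polygon_area_sigma_equal(xs, ys, sigma_o):
    """sigma_S = (sigma_o / 2) * sqrt(sum of squared chords P(i-1)P(i+1))."""
    n = len(xs)
    sum_l2 = 0.0
    for i in range(n):
        prev, nxt = (i - 1) % n, (i + 1) % n
        # squared distance between the two neighbours of vertex i
        sum_l2 += (xs[nxt] - xs[prev]) ** 2 + (ys[nxt] - ys[prev]) ** 2
    return 0.5 * sigma_o * math.sqrt(sum_l2)

# Hypothetical rectangle, every coordinate known to +/- 0.02
sigma_s = polygon_area_sigma_equal([0.0, 80.0, 80.0, 0.0], [0.0, 0.0, 100.0, 100.0], 0.02)
print(f"sigma_S = {sigma_s:.2f}")
```

For a quadrilateral each chord is a diagonal, so long, thin parcels and compact ones with the same diagonals receive the same area uncertainty under this simplified model.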


What is a standard?

Standards are documented agreements containing technical specifications or other precise criteria to be used consistently as rules, guidelines, or definitions of characteristics, to ensure that materials, products, processes and services are fit for their purpose.

(as defined by ISO)

Examples of Everyday Standards

• Traffic Signals – Road Signs
• VISA / Mastercard: standards allow people to use a single card to obtain cash in the local currency around the world
• Commerce/Manufacturing/Industry
• World War II - Allied supplies and facilities were severely strained due to the incompatibility of tools, replacement parts, and equipment. The establishment of international standards helped to increase compatibility.

The Importance of Standards (when standards do not exist)

• Disasters (fire, flood, …): Great Baltimore Fire of 1904 - fire engines from different regions arrived to help put out the fire, only they had different hose coupling sizes that did not fit the Baltimore hydrants. The fire burned over 30 hours and resulted in the destruction of 1526 buildings covering 70 city blocks.
• Metric System vs US Customary System

The Need for Standards in Geographic Information

To ensure common understanding through a common set of

terminology

To promote/enable interoperability

To support the establishment of geospatial infrastructures at

local, regional, and global levels

To promote data and information sharing/exchange


Types of geospatial standards

Data Classification

e.g., Vegetation Classification

Data Content

e.g., Digital Geospatial Metadata, Spatial Schema

Data Symbology or Presentation

e.g., Digital Geologic Map Symbolization

Data Transfer

Data Usability

e.g., Geospatial Positioning Accuracy


Evaluating and Reporting Quality Evaluation Results [ISO 19114]

5 step process on quality evaluation:

1. Identify an applicable data quality element, data quality subelement, and data quality scope
2. Identify a data quality measure
3. Select and apply a data quality evaluation method
4. Determine the data quality result
5. Determine conformance

Inputs shown: dataset as specified by the scope; product specification or user requirements; conformance quality level.

Outputs shown: report data quality result (quantitative); report data quality result (pass / fail).

(Figure labels: ISO 19113, ISO 19114, work item 19131.)

Metadata Example

Without…

With…


Metadata need Example

WQPW-ID   DIN    Pb
PB-31     .34    .012
HK-14     .12    .023
PB12      35     034
PB-12     .35    .034
WA-3      .28    .001
PB-4      .23    .022
PB-5      .21    .013

HUH?

The Standard

Metadata has four major roles:

Availability- information needed to determine the

sets of data that exist for a geographic location.

Fitness for use- information needed to determine if a

set of data meets a specific need.

Access- information needed to acquire an identified

set of data.

Transfer- information needed to process and use a

set of data


Information that can be found in Metadata

• Title, Abstract, Publication Date (Section 1: Identification Information)
• Data Accuracy and Completeness (Section 2: Data Quality Information)
• Data Form: Vector or Raster? (Section 3: Spatial Data Organization Information)
• Projection or Geographic Reference System (Section 4: Spatial Reference Information)
• What Values Are Associated with Geodata? (Section 5: Entity and Attribute Information)
• How Do You Get It? Cost? (Section 6: Distribution Information)
• How Current Is the Documentation? (Section 7: Metadata Reference Information)

The Value of Metadata

Organize and maintain an organization’s investment in data

Provide information to data catalogs and clearinghouses

Provide information to aid data transfer

Food for thought... Nothing happens overnight: get used to thinking of the long term benefits

of metadata. $$$

Documentation = defense

The Standard: don't judge a book by its cover


Metadata resources

The FGDC (Federal Geographic Data Committee): interagency committee that coordinates federal geo-data activities.

The Content Standard for Digital Geospatial Metadata (CSDGM)

• The current US Federal Metadata standard
• Often referred to as the 'FGDC Metadata Standard'
• Has been implemented in federal, state and local governments

The International Organization for Standardization (ISO) has developed and approved an international metadata standard, ISO 19115 – Geographic Information Metadata.

Metadata resources

• The objective of this International Standard is to provide a clear

procedure for the description of digital geographic datasets so that users

will be able to determine whether the data in a holding will be of use to

them and how to access the data. By establishing a common set of

metadata terminology, definitions and extension procedures, this

standard will promote the proper use and effective retrieval of geographic

data.

• Supplementary benefits of this standard for metadata are to facilitate the

organization and management of geographic data and to provide

information about an organization’s database to others.

• This standard for the implementation and documentation of metadata

furnishes those unfamiliar with geographic data the appropriate

information to characterize their geographic data and it makes possible

dataset cataloguing enabling data discovery, retrieval and reuse.

(Figure: Graphical Representation of the US Geological Survey Biological Resources Division DRAFT Content Standard for Biological Metadata. Based on the Federal Geographic Data Committee’s Content Standard for Digital Geospatial Metadata, June 8, 1994, version 1.0. Prepared by Susan Stitt, Center for Biological Informatics. Metadata sections shown: 1. Identification Information; 2. Data Quality Information; 3. Spatial Data Organization Information; 4. Spatial Reference Information; 5. Entity and Attribute Information; 6. Distribution Information; 7. Metadata Reference Information. Legend: Mandatory; Mandatory if Applicable; Optional; Biological Items Added.)

Best Practices for Writing Quality Metadata

Writing Principles

Write simply but completely

Document for a general audience

Adopt a consistent style

Avoid using jargon

Define technical terms


Best Practices for Writing Quality Metadata

In Practice

State clearly what your data are not

Find, evaluate, and reuse good examples

See examples from FGDC workbook

Mine the Clearinghouse for other examples

Use keywords as indicators of the contents of a dataset

Use a thesaurus or controlled vocabulary when possible


Best Practices for Writing Quality Metadata

In Practice (continued)

Use subtitles to define and clarify long passages

Quantify assessments wherever possible

Use “None” and “Unknown” carefully

Format date: YYYYMMDD

Avoid using confusing symbols & conventions:

! @ # % { } | / \ < > ~

Unnecessary carriage returns, tabs, indents, etc.
