Trust as a Proxy Measure for the Quality of VGI in the Case of OSM

Carsten Keßler (a, b) and René de Groot (a)
(a) Institute for Geoinformatics, University of Münster | (b) soon: Hunter College, CUNY

http://carsten.io | @carstenkessler

The Idea

‣ Develop a measure to assess the degree to which a data consumer can trust the quality of a feature

‣ Trust measure is based on a feature’s editing history

‣ Benefits:
  ‣ Works at feature level
  ‣ Filter features by quality
  ‣ Spot problematic features

Does this work?

Can we reliably assess the quality of a feature in OpenStreetMap based on its editing history?

amenity = university
name = Institute for Geoinformatics

v1

amenity = university
building = yes
name = Institute for Geoinformatics

v1 v2

http://wiki.openstreetmap.org/wiki/DE:Key:amenity?uselang=de


http://wiki.openstreetmap.org/wiki/Tag:amenity=university?uselang=de


http://wiki.openstreetmap.org/wiki/DE:Key:building?uselang=de


http://wiki.openstreetmap.org/wiki/DE:Key:name?uselang=de




amenity = university
building = yes
name = Institute for Geoinformatics

addr:city = Münster
addr:country = DE
addr:housenumber = 253
addr:street = Weseler Straße
building = yes
wheelchair = limited

v1 v2 v3 …











http://wiki.openstreetmap.org/wiki/DE:Key:wheelchair?uselang=de


OSM Heatmap (kudos: Johannes Trame)

OSM Provenance Ontology

http://carsten.io/osm/osm-provenance.rdf

[Figure: diagram of the OSM Provenance Ontology.
Class labels: Changeset, Edit, User, FeatureState, NodeState, WayState, prv:Tag, prv:DataItem, prv:DataCreation, prv:HumanActor, prv:CreationGuideline, rdfs:Literal.
Property labels: includesEdit, addsTag, removesTag, changesValueOfKey, changesGeometry, hasTag, prv:createdBy, prv:precededBy, prv:usedData, prv:performedBy, subClassOf.]
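To make the edit properties named in the ontology more concrete, the following minimal sketch derives them by comparing the tags of two consecutive feature versions. The function name `classify_tag_edits` and the dictionary-based representation are assumptions made for illustration; the slides do not show how the edits were actually extracted. The example input is the v1 and v2 tag sets of the Institute for Geoinformatics feature.

```python
def classify_tag_edits(old_tags, new_tags):
    """Group the differences between two tag dictionaries by edit type.

    The result keys reuse the property names from the ontology diagram;
    the function itself is a hypothetical helper, not part of the ontology.
    """
    added = {k: v for k, v in new_tags.items() if k not in old_tags}
    removed = {k: v for k, v in old_tags.items() if k not in new_tags}
    changed = {
        k: (old_tags[k], new_tags[k])
        for k in old_tags.keys() & new_tags.keys()
        if old_tags[k] != new_tags[k]
    }
    return {"addsTag": added, "removesTag": removed, "changesValueOfKey": changed}


# Tag sets of the example feature in versions 1 and 2.
v1 = {"amenity": "university", "name": "Institute for Geoinformatics"}
v2 = {
    "amenity": "university",
    "building": "yes",
    "name": "Institute for Geoinformatics",
}

edits = classify_tag_edits(v1, v2)
print(edits)
```

For this pair of versions, the only difference is the added `building = yes` tag.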



Does this work?

‣ Get a first idea whether this is a viable approach
‣ Compare results of
  ‣ a simple trust measure and
  ‣ observed feature quality
‣ Is there a correlation between the two?

Study area: Münster’s old town

Feature Selection

‣ Re-mapping the whole district was not feasible
‣ Up to 100 features were manageable
‣ Selection based on minimum number of versions
‣ 74 features with 6+ versions

74 features selected
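The selection step can be sketched as a simple filter on the version count. The `Feature` class, its field names, and the sample values are hypothetical; only the threshold of six versions comes from the slides.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    osm_id: int   # hypothetical identifier
    version: int  # number of versions the feature has gone through


def select_features(features, min_versions=6):
    """Keep only features with at least `min_versions` versions."""
    return [f for f in features if f.version >= min_versions]


# Made-up sample data, not taken from the study area.
sample = [Feature(1, 2), Feature(2, 6), Feature(3, 11), Feature(4, 5)]
selected = select_features(sample)
print([f.osm_id for f in selected])
```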

Trust measure

‣ Positive factors:
  ‣ Versions
  ‣ Users
  ‣ Indirect confirmations = edits in the direct vicinity (50 m)
‣ Negative factors:
  ‣ Tag corrections
  ‣ Rollbacks
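One way to count indirect confirmations is to count the edits that fall within 50 m of a feature. The sketch below uses the haversine great-circle distance; the function names, the point-based representation of features and edits, and the sample coordinates are assumptions. The slides do not state which distance computation was used, or whether any time window was applied.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 positions."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def indirect_confirmations(feature_pos, edit_positions, radius_m=50.0):
    """Count edits located within `radius_m` of the feature position."""
    lat, lon = feature_pos
    return sum(
        1
        for elat, elon in edit_positions
        if haversine_m(lat, lon, elat, elon) <= radius_m
    )


# Made-up positions: two edits nearby, one roughly 110 m to the north.
feature = (51.9625, 7.6256)
nearby_edits = [(51.9626, 7.6256), (51.9635, 7.6256), (51.9625, 7.6260)]
count = indirect_confirmations(feature, nearby_edits)
print(count)
```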

Trust measure (contd.)

‣ Classification for each factor: 5 equal classes
‣ Combined into one classification
‣ Equal weights
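The classification and combination can be sketched as follows. Several details are assumptions, because the slides do not spell them out: "5 equal classes" is read as equal-interval classes over the observed range of each factor, negative factors are inverted so that a higher class always means more trust, and the combined value is the unweighted mean of the five factor classes. The factor keys and sample counts are hypothetical.

```python
POSITIVE = ("versions", "users", "confirmations")
NEGATIVE = ("tag_corrections", "rollbacks")


def equal_interval_class(value, lo, hi, n_classes=5):
    """Map `value` from the range [lo, hi] to a class in 1..n_classes."""
    if hi == lo:
        return 1  # degenerate range: every feature lands in the same class
    idx = int((value - lo) / (hi - lo) * n_classes)
    return min(idx, n_classes - 1) + 1  # the maximum falls into the top class


def trust_values(features, n_classes=5):
    """Combine per-factor classes into one value per feature (equal weights)."""
    factors = POSITIVE + NEGATIVE
    ranges = {
        f: (min(v[f] for v in features.values()),
            max(v[f] for v in features.values()))
        for f in factors
    }
    result = {}
    for fid, values in features.items():
        classes = []
        for f in factors:
            c = equal_interval_class(values[f], *ranges[f], n_classes)
            if f in NEGATIVE:
                c = n_classes + 1 - c  # assumption: many corrections = low trust
            classes.append(c)
        result[fid] = sum(classes) / len(classes)
    return result


# Made-up factor counts for three features.
sample = {
    "a": {"versions": 6, "users": 1, "confirmations": 0,
          "tag_corrections": 3, "rollbacks": 1},
    "b": {"versions": 20, "users": 10, "confirmations": 15,
          "tag_corrections": 0, "rollbacks": 0},
    "c": {"versions": 13, "users": 5, "confirmations": 7,
          "tag_corrections": 1, "rollbacks": 0},
}
trust = trust_values(sample)
print(trust)
```

The result is a mean class per feature; rounding it to an integer class would be one further design choice.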

Trust measure

Field Survey

‣ Thematic accuracy, 4 classes, with results:
  1. Main tag wrong: 6 features (~8%)
  2. Other tags wrong: 2 features (~3%)
  3. Thematic ambiguities: 9 features (~12%)
  4. Thematically correct: 57 features (~77%)

Field Survey (contd.)

‣ Topological consistency
  ‣ Is the feature correctly positioned relative to the surrounding features?
  ‣ Results: 73 out of 74 features (~99%)

‣ Information completeness
  ‣ TF-IDF measure to identify relevant tags per main tag
  ‣ ~37% tags missing (avg.)
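The TF-IDF idea can be sketched by treating each main tag as a "document" whose "terms" are the other tag keys found on features carrying that main tag. This is one plausible reading, not the formulation confirmed by the slides; the function names, the plain `tf * log(N / df)` weighting, and the sample features are assumptions.

```python
import math
from collections import Counter, defaultdict


def tfidf_per_main_tag(features):
    """Score tag keys per main tag.

    `features` is a list of (main_tag, set_of_other_keys) pairs.
    """
    docs = defaultdict(Counter)
    for main_tag, keys in features:
        docs[main_tag].update(keys)

    n_docs = len(docs)
    doc_freq = Counter()  # number of main tags under which a key occurs
    for counts in docs.values():
        doc_freq.update(counts.keys())

    scores = {}
    for main_tag, counts in docs.items():
        total = sum(counts.values())
        scores[main_tag] = {
            key: (count / total) * math.log(n_docs / doc_freq[key])
            for key, count in counts.items()
        }
    return scores


def missing_share(feature_keys, relevant_keys):
    """Fraction of the relevant keys that a feature does not carry."""
    if not relevant_keys:
        return 0.0
    return len(relevant_keys - feature_keys) / len(relevant_keys)


# Made-up features for illustration.
sample = [
    ("amenity=restaurant", {"name", "cuisine", "opening_hours"}),
    ("amenity=restaurant", {"name", "cuisine"}),
    ("shop=bakery", {"name", "opening_hours"}),
    ("amenity=place_of_worship", {"name", "religion"}),
]
scores = tfidf_per_main_tag(sample)
restaurant = scores["amenity=restaurant"]
print(sorted(restaurant, key=restaurant.get, reverse=True))
```

In this plain formulation, a key that occurs under every main tag (here `name`) scores zero, so it would not count as characteristic of any one main tag. Whether such universal keys should still count toward completeness is a separate decision.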

Observed quality: combined results (mean quality class: ~4.2)

Trust measure (mean trust class: ~2.8)

Do we get the trend right?

‣ Removed outliers
‣ Kendall’s τ: 0.52
‣ Moderate, but significant positive correlation
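For reference, a rank correlation between trust classes and quality classes can be computed as sketched below. The sketch implements the tau-b variant, which corrects for ties and therefore suits ordinal classes. Which variant was used in the study is not stated on the slides, and the sample values are made up, so the result is not meant to reproduce the reported 0.52.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equally long sequences of ordinal values."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue            # tied in both variables: ignored
        elif dx == 0:
            ties_x += 1         # tied only in x
        elif dy == 0:
            ties_y += 1         # tied only in y
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt(
        (concordant + discordant + ties_x) * (concordant + discordant + ties_y)
    )
    return (concordant - discordant) / denom if denom else float("nan")


# Made-up trust and quality classes for four features.
trust_classes = [1, 2, 2, 3]
quality_classes = [1, 2, 3, 3]
tau = kendall_tau_b(trust_classes, quality_classes)
print(round(tau, 2))
```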

Conclusions

‣ Initial study
‣ A feature’s history can determine its trustworthiness
‣ Trust values correlate with observed quality
‣ Even with a very simple model
‣ Outliers cannot be explained yet

Tons of Future Work

‣ Extend and refine the trust model: classification, weighting, positive vs. negative aspects, …
‣ Social aspects: Who has edited a feature?
‣ Repeat study without spatial focus
‣ How to scale the data collection?
‣ Learn the trust model from the data

Thank you!

All data used in this research © OpenStreetMap contributors.

[email protected] | http://carsten.io | @carstenkessler

Carsten Keßler | René de Groot
