Paper presented at AGILE 2013 in Leuven, Belgium. The paper is available from http://carsten.io/kessler-de_groot-agile-2013.pdf
Carsten Keßler (a, b) and René de Groot (a)
(a) Institute for Geoinformatics, University of Münster | (b) soon: Hunter College, CUNY
http://carsten.io | @carstenkessler
Trust as a Proxy Measure for the Quality of VGI in the Case of OSM
The Idea
‣ Develop a measure to assess the degree to which a data consumer can trust the quality of a feature
‣ Trust measure is based on a feature’s editing history
‣ Benefits
  ‣ Works at feature level
  ‣ Filter features by quality
  ‣ Spot problematic features
Does this work?
Can we reliably assess the quality of a feature in OpenStreetMap based on its editing history?
v1:
  amenity = university
  name = Institute for Geoinformatics
v2:
  amenity = university
  building = yes
  name = Institute for Geoinformatics
v3:
  addr:city = Münster
  addr:country = DE
  addr:housenumber = 253
  addr:street = Weseler Straße
  building = yes
  wheelchair = limited
…
OSM Heatmap (kudos: Johannes Trame)
OSM Provenance Ontology
http://carsten.io/osm/osm-provenance.rdf
[Figure: diagram of the OSM Provenance Ontology. Node labels: Changeset, Edit, FeatureState, NodeState, WayState, User, prv:Tag, prv:DataCreation, prv:DataItem, prv:HumanActor, prv:CreationGuideline, rdfs:Literal. Edge labels: includesEdit, hasTag, addsTag, removesTag, changesValueOfKey, changesGeometry, prv:createdBy, prv:precededBy, prv:usedData, prv:performedBy, subClassOf.]
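The ontology's edit properties (addsTag, removesTag, changesValueOfKey) suggest how two successive versions of a feature could be compared. The following is a minimal sketch, not the authors' implementation: the function name `classify_tag_edit` and the representation of a version as a plain dictionary of tags are assumptions made for illustration. Geometry changes (changesGeometry) are left out.

```python
def classify_tag_edit(old_tags, new_tags):
    """Describe the tag changes between two versions of a feature.

    Both arguments are dicts mapping tag keys to values (an assumed,
    simplified representation of an OSM feature version).
    The result keys reuse the property names from the ontology.
    """
    old_keys, new_keys = old_tags.keys(), new_tags.keys()
    added = {k: new_tags[k] for k in new_keys - old_keys}
    removed = {k: old_tags[k] for k in old_keys - new_keys}
    changed = {
        k: (old_tags[k], new_tags[k])
        for k in old_keys & new_keys
        if old_tags[k] != new_tags[k]
    }
    return {"addsTag": added, "removesTag": removed, "changesValueOfKey": changed}


# Versions v1 and v2 of the example feature shown earlier in the slides.
v1 = {"amenity": "university", "name": "Institute for Geoinformatics"}
v2 = {
    "amenity": "university",
    "building": "yes",
    "name": "Institute for Geoinformatics",
}

edit = classify_tag_edit(v1, v2)
print(edit)
```

For the step from v1 to v2, only `building = yes` is reported under addsTag; nothing is removed or changed.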
Does this work?
‣ Get a first idea whether this is a viable approach
‣ Compare results of
  ‣ a simple trust measure and
  ‣ observed feature quality
‣ Is there a correlation between the two?
Study area: Münster’s old town
Feature Selection
‣ Re-mapping the whole district was not feasible
‣ Up to 100 features were manageable
‣ Selection based on minimum number of versions
‣ 74 features with 6+ versions
74 features selected
Trust measure
‣ Positive factors:
  ‣ Versions
  ‣ Users
  ‣ Indirect confirmations = edits in the direct vicinity (50 m)
‣ Negative factors:
  ‣ Tag corrections
  ‣ Rollbacks
Trust measure (contd.)
‣ Classification for each factor: 5 equal classes
‣ Combined into one classification
‣ Equal weights
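The classification and combination steps could be sketched as follows. This is an illustration under stated assumptions, not the authors' code: "5 equal classes" is read here as equal-interval classes between the minimum and maximum observed value of each factor, negative factors are inverted so that a higher class always means more trust, and the combined class is the rounded mean of the five factor classes. All names and sample values are hypothetical.

```python
POSITIVE = ("versions", "users", "confirmations")
NEGATIVE = ("tag_corrections", "rollbacks")


def equal_interval_class(value, lo, hi, n_classes=5):
    """Map value to a class 1..n_classes using equal-width intervals on [lo, hi]."""
    if hi == lo:
        return 1  # assumption: no spread in the data, so use the lowest class
    width = (hi - lo) / n_classes
    cls = int((value - lo) / width) + 1
    return min(max(cls, 1), n_classes)  # the maximum value falls into the top class


def trust_classes(features, n_classes=5):
    """Combine the per-factor classes with equal weights into one trust class."""
    factors = POSITIVE + NEGATIVE
    ranges = {
        f: (min(v[f] for v in features.values()), max(v[f] for v in features.values()))
        for f in factors
    }
    result = {}
    for name, values in features.items():
        classes = []
        for f in factors:
            lo, hi = ranges[f]
            c = equal_interval_class(values[f], lo, hi, n_classes)
            if f in NEGATIVE:
                c = n_classes + 1 - c  # many corrections or rollbacks lower the trust
            classes.append(c)
        mean = sum(classes) / len(classes)  # equal weights
        result[name] = int(mean + 0.5)      # round half up to the nearest class
    return result


# Hypothetical factor counts for three features.
sample = {
    "A": {"versions": 6, "users": 2, "confirmations": 0, "tag_corrections": 4, "rollbacks": 2},
    "B": {"versions": 16, "users": 7, "confirmations": 10, "tag_corrections": 0, "rollbacks": 0},
    "C": {"versions": 11, "users": 4, "confirmations": 5, "tag_corrections": 2, "rollbacks": 1},
}
trust = trust_classes(sample)
print(trust)
```

A quantile-based classification (equal number of features per class) would be an equally plausible reading of "equal classes" and would only require replacing `equal_interval_class`.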
Trust measure
Field Survey
‣ Thematic accuracy, 4 classes:
  1. Main tag wrong
  2. Other tags wrong
  3. Thematic ambiguities
  4. Thematically correct
‣ Results:
  ‣ 6 features (~8%)
  ‣ 2 features (~3%)
  ‣ 9 features (~12%)
  ‣ 57 features (~77%)
Field Survey (contd.)
‣ Topological consistency
  ‣ Is the feature correctly positioned relative to the surrounding features?
  ‣ Results: 73 out of 74 features (~99%)
‣ Information completeness
  ‣ TF-IDF measure to identify relevant tags per main tag
  ‣ ~37% tags missing (avg.)
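One possible reading of the TF-IDF step is sketched below. The slides do not give the exact formulation, so everything here is an assumption: each main tag value (for example `amenity = restaurant`) is treated as one "document", the term frequency of a key is the share of features with that main tag that carry the key, and the inverse document frequency penalises keys that occur under every main tag. The function name `relevant_tags`, the choice of `amenity` as main key, and the sample features are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


def relevant_tags(features, main_key="amenity"):
    """Score the tag keys per main tag value with a simple TF-IDF variant.

    features: list of dicts mapping tag keys to values.
    Returns {main_value: {key: score}}; higher scores mark keys that are
    typical for this main tag but not for all others.
    """
    key_counts = defaultdict(Counter)  # main value -> how often each key occurs
    feature_counts = Counter()         # main value -> number of features
    for tags in features:
        main = tags.get(main_key)
        if main is None:
            continue
        feature_counts[main] += 1
        for key in tags:
            if key != main_key:
                key_counts[main][key] += 1

    n_docs = len(feature_counts)
    # Number of main tag values under which a key occurs at least once.
    doc_freq = Counter(key for counts in key_counts.values() for key in counts)

    scores = {}
    for main, n_features in feature_counts.items():
        scores[main] = {
            key: (count / n_features) * log(n_docs / doc_freq[key])
            for key, count in key_counts[main].items()
        }
    return scores


# Hypothetical sample features.
sample_features = [
    {"amenity": "restaurant", "name": "A", "cuisine": "italian"},
    {"amenity": "restaurant", "name": "B", "cuisine": "thai"},
    {"amenity": "bank", "name": "C", "atm": "yes"},
]
scores = relevant_tags(sample_features)
print(scores)
```

In this toy data, `name` occurs under both main tags and therefore scores zero, while `cuisine` and `atm` stand out as specific to restaurants and banks. Completeness of a feature could then be estimated as the share of high-scoring keys it actually carries.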
Observed quality (combined results): mean quality class ~4.2
Trust measure: mean trust class ~2.8
Do we get the trend right?
‣ Removed outliers
‣ Kendall’s τ: 0.52
‣ Moderate, but significant positive correlation
Conclusions
‣ Initial study
‣ A feature’s history can determine its trustworthiness
‣ Trust values correlate with observed quality
‣ Even with a very simple model
‣ Outliers cannot be explained yet
Tons of Future Work
‣ Extend and refine the trust model: classification, weighting, positive vs. negative aspects, …
‣ Social aspects: Who has edited a feature?
‣ Repeat study without spatial focus
‣ How to scale the data collection?
‣ Learn the trust model from the data
Thank you!
All data used in this research © OpenStreetMap contributors.
http://carsten.io | @carstenkessler
Carsten Keßler | René de Groot