By Tatiana Cristea
Supervised by Lora Aroyo (VU) & Robert-Jan Sips (IBM)
Visualizations for quality assessment of crowdsourced data
Noisy crowdsourced data
Quality data
Current practices: based on the consensus of workers
CrowdTruth metrics: treat disagreement as informative
Select from the list the objects depicted in the image:
Balloon, Flower, Human, Car, Ghost, Person
[Figure: answers of Worker 1, Worker 2 and Worker 3]
Can you identify the low-quality worker(s)?
Unclear image (content unit)
Select from the list the objects depicted in the image:
Balloon, Flower, Human, Car, Ghost, Person
[Figure: answers of Worker 1, Worker 2 and Worker 3]
Can you identify the low-quality worker(s)?
Not clearly separable answers
Select from the list the objects depicted in the image:
Balloon, Flower, Human, Car, Ghost, Person
[Figure: answers of Worker 1, Worker 2 and Worker 3]
Can you identify the low-quality workers?
Low-quality workers
How good is the unit for the specific task?
How well did the worker understand the task?
Are the annotation options clear and separable?
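The first two questions above can be approximated numerically. What follows is a minimal Python sketch under stated assumptions, not the CrowdTruth formulas: the worker selections are invented for illustration, and the names `worker_agreement` and `unit_clarity` are hypothetical. Each worker's answer becomes a 0/1 vector over the options; a worker is compared (by cosine similarity) with the summed answers of the other workers, and the unit is scored by the share of workers who picked its most popular option.

```python
from math import sqrt

OPTIONS = ["Balloon", "Flower", "Human", "Car", "Ghost", "Person"]

# Hypothetical selections for one unit (not taken from the slides).
selections = {
    "Worker 1": {"Balloon", "Human", "Person"},
    "Worker 2": {"Balloon", "Person"},
    "Worker 3": {"Car", "Ghost"},
}


def to_vector(chosen):
    """Encode a set of chosen options as a 0/1 vector over OPTIONS."""
    return [1 if opt in chosen else 0 for opt in OPTIONS]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def worker_agreement(selections):
    """Similarity of each worker's vector to the sum of all other workers."""
    vectors = {w: to_vector(c) for w, c in selections.items()}
    scores = {}
    for worker, vec in vectors.items():
        others = [v for w, v in vectors.items() if w != worker]
        rest = [sum(col) for col in zip(*others)]
        scores[worker] = cosine(vec, rest)
    return scores


def unit_clarity(selections):
    """Share of workers who chose the unit's most popular option."""
    vectors = [to_vector(c) for c in selections.values()]
    unit_vector = [sum(col) for col in zip(*vectors)]
    return max(unit_vector) / len(vectors)


scores = worker_agreement(selections)
clarity = unit_clarity(selections)
for worker, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{worker}: agreement {score:.2f}")
print(f"unit clarity: {clarity:.2f}")
```

With this invented data, Worker 3 shares no option with the others and ends up with the lowest agreement. A low score alone does not separate a careless worker from an unclear image or overlapping options, which is why the three questions have to be read together.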
[Diagram: Unit, Annotation, Worker]
[Diagram: JOB 1, JOB 2, ..., JOB N, with their Units, Workers and Annotations]
Visualization approach for quality assessment of crowdsourced data:
a) at an aggregate level
b) at a specific level
c) in the context of their interdependencies
Extracted through interviews
Visualization of properties, statistics and metrics of:
a single job/unit/worker
a collection of jobs/units/workers
Functional requirements:
Filtering, sorting
Support for detection of outliers
Visualization of connected workers, content units and jobs
Support for comparative analysis
Support for navigation between connected elements, etc.
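The filtering, sorting and outlier-detection requirements can be illustrated on plain records before any visualization is drawn. This is a small Python sketch under assumptions: the worker records, the `agreement` field and the helper names `filter_by_job`, `sort_by` and `low_outliers` are all hypothetical, and the outlier rule (more than k standard deviations below the mean) is just one simple choice, not the rule used in the presented tool.

```python
from statistics import mean, stdev

# Hypothetical per-worker records; values are invented for illustration.
workers = [
    {"id": "w1", "job": "JOB 1", "agreement": 0.82},
    {"id": "w2", "job": "JOB 1", "agreement": 0.79},
    {"id": "w3", "job": "JOB 1", "agreement": 0.85},
    {"id": "w4", "job": "JOB 2", "agreement": 0.80},
    {"id": "w5", "job": "JOB 2", "agreement": 0.10},
    {"id": "w6", "job": "JOB 2", "agreement": 0.77},
]


def filter_by_job(records, job):
    """Keep only the records that belong to the given job."""
    return [r for r in records if r["job"] == job]


def sort_by(records, key, descending=False):
    """Return a new list sorted on one field."""
    return sorted(records, key=lambda r: r[key], reverse=descending)


def low_outliers(records, key, k=1.5):
    """Flag records more than k sample standard deviations below the mean."""
    values = [r[key] for r in records]
    if len(values) < 2:
        return []  # stdev needs at least two data points
    threshold = mean(values) - k * stdev(values)
    return [r for r in records if r[key] < threshold]


job2 = filter_by_job(workers, "JOB 2")
ranked = sort_by(workers, "agreement")
flagged = low_outliers(workers, "agreement")
print([r["id"] for r in flagged])
```

Keeping these operations as small functions over records means the same logic can back a single-job view, a single-worker view, or a comparison across jobs.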
DEMO TOUR
We evaluated the design with 9 people
Different levels of experience with crowdsourcing tasks
Found useful for:
the assessment of quality
deep analysis of the data
But…
The amount of information was a (little) bit overwhelming…
The interactions are great!
… if you know about them
The time dimension is not always present…
Create user profiles
Decouple the visualization component and provide it as a separate plugin
Add the time dimension to the visualizations
Time