
Data Science with Humans in the Loop


Page 1: Data Science with Humans in the Loop

http://lora-aroyo.org @laroyo

Lora Aroyo

DATA SCIENCE WITH HUMANS IN THE LOOP

Page 2: Data Science with Humans in the Loop

ABOUT ME ... Personal Data Science

Sofia, Bulgaria; The Netherlands; NYC

Page 3: Data Science with Humans in the Loop

E-LEARNING & AI

To understand the value of semantic technologies for e-learning, we need to understand the people, specifically how they interact with and consume information.

Page 4: Data Science with Humans in the Loop


MY RESEARCH FAMILY


… and many many many many more


Page 6: Data Science with Humans in the Loop


CROWDTRUTH TEAM

Page 7: Data Science with Humans in the Loop

EVOLUTION OF SEMANTIC WEB

Great moments from the 1980s until now

Page 8: Data Science with Humans in the Loop

EMPIRE OF THE EXPERTS

80’s

Advances in hardware and SDEs: PCs, workstations, Symbolics, Sun

New architectures like Hypercube

LISP, Prolog, OPS

AI can now BUILD SYSTEMS

Primary focus on experts and rules

What is the knowledge of experts? Graphs, logic, rules, frames

How do experts reason? Deduction, induction

Academic work on form & process: inside the system, to make the reasoning inside the system proper and as good as possible

Industry forged ahead with ad-hoc & proprietary systems and actually tried to build expert systems

Origins of uncertain KR: fuzzy, probabilistic

Page 9: Data Science with Humans in the Loop

EMPIRE OF THE EXPERTS

80’s

Piero Bonissone and the DELTA/CATS expert system for locomotive repair with David Smith, a locomotive repair expert

Page 10: Data Science with Humans in the Loop

EMPIRE OF THE EXPERTS

80’s

Buchanan and Shortliffe’s MYCIN project at Stanford built a huge rule base for medical diagnosis, working with an extensive team of medical experts.

Page 11: Data Science with Humans in the Loop

KNOWLEDGE ACQUISITION FROM EXPERTS

90’s

CommonKADS

Founded by Bob Wielinga as a methodology for expert knowledge acquisition. It was deeply psychology-based: it was about people, about their knowledge and especially about their expertise. How do people know what they know, and how can you acquire that knowledge?

Page 12: Data Science with Humans in the Loop


STRUCTURED KNOWLEDGE ENGINEERING

Page 13: Data Science with Humans in the Loop

INTEROPERABILITY & STANDARDS ODYSSEY

00’s

Page 14: Data Science with Humans in the Loop

AI AWAKENS

10’s

Page 15: Data Science with Humans in the Loop

IBM WATSON @JEOPARDY

2011

Page 16: Data Science with Humans in the Loop

BIG DATA

10’s

Page 17: Data Science with Humans in the Loop

BIG CROWDS

10’s

Human Annotation Central in Machine Learning Training & Evaluation

Page 18: Data Science with Humans in the Loop

COMFORT ZONE

7 MYTHS ABOUT HUMAN ANNOTATION

Page 19: Data Science with Humans in the Loop

ONE TRUTH

One truth: knowledge acquisition for the semantic web assumes one correct interpretation for every example

7 MYTHS ABOUT HUMAN ANNOTATION

“Truth is a Lie: 7 Myths about Human Annotation”, AI Magazine 2014, L. Aroyo, C. Welty

Page 20: Data Science with Humans in the Loop

7 MYTHS ABOUT HUMAN ANNOTATION

ALL EXAMPLES EQUAL

All examples are created equal: triples are triples, one is not more important than another, they are all either true or false

Page 21: Data Science with Humans in the Loop

7 MYTHS ABOUT HUMAN ANNOTATION

DISAGREEMENT BAD

Disagreement bad: when people disagree, they don’t understand the problem

Page 22: Data Science with Humans in the Loop

7 MYTHS ABOUT HUMAN ANNOTATION

EXPERTS RULE

Experts rule: knowledge is captured from domain experts

Page 23: Data Science with Humans in the Loop

7 MYTHS ABOUT HUMAN ANNOTATION

ONE IS ENOUGH

One is enough: knowledge by a single expert is sufficient

Page 24: Data Science with Humans in the Loop

7 MYTHS ABOUT HUMAN ANNOTATION

BINARY WORLD

Detailed explanations help: if examples cause disagreement, add instructions

Once done, forever valid: knowledge is not updated; new data not aligned with old

Page 25: Data Science with Humans in the Loop

DOES THIS SENTENCE EXPRESS TREATS RELATION?

Treats: Chloroquine, Malaria

Rheumatoid arthritis and MALARIA have been treated with CHLOROQUINE for decades.

Page 26: Data Science with Humans in the Loop

DOES THIS SENTENCE EXPRESS TREATS RELATION?

Treats: Chloroquine, Malaria

For prevention of malaria, use only in individuals traveling to malarious areas where CHLOROQUINE resistant P. falciparum MALARIA has not been reported.

Rheumatoid arthritis and MALARIA have been treated with CHLOROQUINE for decades.

Page 27: Data Science with Humans in the Loop

DOES THIS SENTENCE EXPRESS TREATS RELATION?

Treats: Chloroquine, Malaria

For prevention of malaria, use only in individuals traveling to malarious areas where CHLOROQUINE resistant P. falciparum MALARIA has not been reported.

Rheumatoid arthritis and MALARIA have been treated with CHLOROQUINE for decades.

Among 56 subjects reporting to a clinic with symptoms of MALARIA 53 (95%) had ordinarily effective levels of CHLOROQUINE in blood.

Page 28: Data Science with Humans in the Loop

WHAT DO EXPERTS SAY?

Treats: Chloroquine, Malaria

For prevention of malaria, use only in individuals traveling to malarious areas where CHLOROQUINE resistant P. falciparum MALARIA has not been reported.

Rheumatoid arthritis and MALARIA have been treated with CHLOROQUINE for decades.

Among 56 subjects reporting to a clinic with symptoms of MALARIA 53 (95%) had ordinarily effective levels of CHLOROQUINE in blood.

Page 29: Data Science with Humans in the Loop

WHAT DOES A LAY ANNOTATOR SAY?

Treats: Chloroquine, Malaria

For prevention of malaria, use only in individuals traveling to malarious areas where CHLOROQUINE resistant P. falciparum MALARIA has not been reported.

Rheumatoid arthritis and MALARIA have been treated with CHLOROQUINE for decades.

Among 56 subjects reporting to a clinic with symptoms of MALARIA 53 (95%) had ordinarily effective levels of CHLOROQUINE in blood.

Page 30: Data Science with Humans in the Loop

WHAT DOES ANOTHER LAY ANNOTATOR SAY?

Treats: Chloroquine, Malaria

For prevention of malaria, use only in individuals traveling to malarious areas where CHLOROQUINE resistant P. falciparum MALARIA has not been reported.

Rheumatoid arthritis and MALARIA have been treated with CHLOROQUINE for decades.

Among 56 subjects reporting to a clinic with symptoms of MALARIA 53 (95%) had ordinarily effective levels of CHLOROQUINE in blood.

Page 31: Data Science with Humans in the Loop

WHAT DOES A THIRD LAY ANNOTATOR SAY?

Treats: Chloroquine, Malaria

For prevention of malaria, use only in individuals traveling to malarious areas where CHLOROQUINE resistant P. falciparum MALARIA has not been reported.

Rheumatoid arthritis and MALARIA have been treated with CHLOROQUINE for decades.

Among 56 subjects reporting to a clinic with symptoms of MALARIA 53 (95%) had ordinarily effective levels of CHLOROQUINE in blood.

Page 32: Data Science with Humans in the Loop

WHAT DOES THE CROWD SAY?

Treats: Chloroquine, Malaria

For prevention of malaria, use only in individuals traveling to malarious areas where CHLOROQUINE resistant P. falciparum MALARIA has not been reported.

Rheumatoid arthritis and MALARIA have been treated with CHLOROQUINE for decades.

Among 56 subjects reporting to a clinic with symptoms of MALARIA 53 (95%) had ordinarily effective levels of CHLOROQUINE in blood.

Intuition: This is better

95%   75%   50%

Page 33: Data Science with Humans in the Loop

WHAT DOES THE CROWD SAY?

Treats: Chloroquine, Malaria

For prevention of malaria, use only in individuals traveling to malarious areas where CHLOROQUINE resistant P. falciparum MALARIA has not been reported.

Rheumatoid arthritis and MALARIA have been treated with CHLOROQUINE for decades.

Among 56 subjects reporting to a clinic with symptoms of MALARIA 53 (95%) had ordinarily effective levels of CHLOROQUINE in blood.

95%   75%   50%

BETTER / WORSE

There’s a difference between these two

This one isn’t utterly wrong

Page 34: Data Science with Humans in the Loop

COMFORT ZONE Disrupted

One truth: knowledge acquisition for the semantic web assumes one correct interpretation for every example
All examples are created equal: triples are triples, one is not more important than another, they are all either true or false
Disagreement bad: when people disagree, they don’t understand the problem
Experts rule: knowledge is captured from domain experts
One is enough: knowledge by a single expert is sufficient
Detailed explanations help: if examples cause disagreement, add instructions
Once done, forever valid: knowledge is not updated; new data not aligned with old


Page 39: Data Science with Humans in the Loop

ENCOURAGING DISAGREEMENT

Treats: Chloroquine, Malaria

For prevention of malaria, use only in individuals traveling to malarious areas where CHLOROQUINE resistant P. falciparum MALARIA has not been reported.

Rheumatoid arthritis and MALARIA have been treated with CHLOROQUINE for decades.

Among 56 subjects reporting to a clinic with symptoms of MALARIA 53 (95%) had ordinarily effective levels of CHLOROQUINE in blood.

Intuition: This is better

95%   75%   50%

Page 40: Data Science with Humans in the Loop


CROWD TASK

Page 41: Data Science with Humans in the Loop

WORKER VECTOR FOR A SENTENCE

Relations: treats, associated_with, other, symptom

Among 56 subjects reporting to a clinic with symptoms of MALARIA 53 (95%) had ordinarily effective levels of CHLOROQUINE in blood.

Page 42: Data Science with Humans in the Loop

MANY WORKERS FOR THE SAME SENTENCE

Relations: treats, other, associated_with, symptom

Among 56 subjects reporting to a clinic with symptoms of MALARIA 53 (95%) had ordinarily effective levels of CHLOROQUINE in blood.

Page 43: Data Science with Humans in the Loop

ALL WORKER VECTORS AGGREGATED IN A SENTENCE VECTOR

Among 56 subjects reporting to a clinic with symptoms of MALARIA 53 (95%) had ordinarily effective levels of CHLOROQUINE in blood.

Relations: treats, other, none, associated_with, symptom, manifestation, side effect
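The aggregation named on this slide can be sketched in a few lines. This is a minimal illustration and not the CrowdTruth implementation: it assumes each worker vector is a binary vector over a fixed relation list and that the sentence vector is their element-wise sum. The relation list, the helper names `worker_vector` and `sentence_vector`, and the three workers' answers are all made up for the example.

```python
# Hypothetical relation set; the real task may use a different list.
RELATIONS = ["treats", "associated_with", "symptom", "manifestation",
             "side_effect", "other", "none"]


def worker_vector(selected, relations=RELATIONS):
    """Binary vector: 1 where the worker picked the relation, else 0."""
    chosen = set(selected)
    unknown = chosen - set(relations)
    if unknown:
        raise ValueError(f"unknown relations: {sorted(unknown)}")
    return [1 if r in chosen else 0 for r in relations]


def sentence_vector(worker_vectors):
    """Element-wise sum of all worker vectors for one sentence."""
    return [sum(column) for column in zip(*worker_vectors)]


# Invented answers from three workers for the same sentence.
votes = [
    worker_vector(["treats"]),
    worker_vector(["treats", "associated_with"]),
    worker_vector(["symptom"]),
]
sent_vec = sentence_vector(votes)
print(dict(zip(RELATIONS, sent_vec)))
```

Keeping the counts, rather than a single majority label, is what lets disagreement between workers stay visible in the later slides.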

Page 44: Data Science with Humans in the Loop

SENTENCE VECTORS FOR THE 3 SENTENCES

treats, other, none, associated_with, symptom, manifestation, side effect

treats, other, none, associated_with, contraindicates, manifestation

treats, other

Page 45: Data Science with Humans in the Loop

TIME TO DISRUPT THE COMFORT ZONE

Page 46: Data Science with Humans in the Loop

EXCITING DISCOVERIES

Page 47: Data Science with Humans in the Loop


UMLS RELATION EXTRACTION PROJECT

NLP

UMLS

Page 48: Data Science with Humans in the Loop


The final frontier

VECTOR SPACE

Page 49: Data Science with Humans in the Loop

KNOWLEDGE ACQUISITION

STRUCTURED KNOWLEDGE ENGINEERING

Page 50: Data Science with Humans in the Loop


HYPER-DIMENSIONAL SPACE

Page 51: Data Science with Humans in the Loop


HYPER-DIMENSIONAL SPACE 3-axis tensor

Page 52: Data Science with Humans in the Loop


HYPER-DIMENSIONAL SPACE … matrix

Page 53: Data Science with Humans in the Loop

3-AXIS TENSOR

Workers axis

Page 54: Data Science with Humans in the Loop


Relations axis

3-AXIS TENSOR

Page 55: Data Science with Humans in the Loop


Sentences axis

3-AXIS TENSOR

Page 56: Data Science with Humans in the Loop

HYPER-DIMENSIONAL SPACE

Worker votes

Page 57: Data Science with Humans in the Loop

HYPER-DIMENSIONAL SPACE

R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12

Sentence plane into a sentence vector
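As a rough sketch of the 3-axis tensor (workers, sentences, relations) and of collapsing a sentence plane into a sentence vector, the snippet below uses nested Python lists. The sizes and votes are invented, and the function names are hypothetical; the slides only name the three axes and the collapse step.

```python
# tensor[w][s][r] = 1 if worker w chose relation r for sentence s.
# Toy data: 3 workers x 2 sentences x 3 relations.
tensor = [
    [[1, 0, 0], [0, 1, 0]],  # worker 0
    [[1, 1, 0], [0, 0, 1]],  # worker 1
    [[0, 0, 1], [0, 1, 0]],  # worker 2
]


def sentence_slice(tensor, s):
    """Workers x relations plane for sentence index s."""
    return [worker[s] for worker in tensor]


def collapse_workers(plane):
    """Sum a sentence plane along the workers axis into a sentence vector."""
    return [sum(column) for column in zip(*plane)]


def sentence_matrix(tensor):
    """One sentence vector per sentence (sentences x relations)."""
    n_sentences = len(tensor[0]) if tensor else 0
    return [collapse_workers(sentence_slice(tensor, s))
            for s in range(n_sentences)]


matrix = sentence_matrix(tensor)
print(matrix)
```

Slicing along the workers axis instead gives one worker's answers over all sentences, which is the view used when looking at worker quality.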

Page 58: Data Science with Humans in the Loop

HYPER-DIMENSIONAL SPACE

Sentence Slice

Page 59: Data Science with Humans in the Loop

HYPER-DIMENSIONAL SPACE

3 Sentence Slices

Page 60: Data Science with Humans in the Loop

DISAGREEMENT IS SIGNAL

Variety of sources for disagreement

Page 61: Data Science with Humans in the Loop


Source 1: People’s bias & perspective

DISAGREEMENT IS SIGNAL

Page 62: Data Science with Humans in the Loop

DISAGREEMENT IS SIGNAL

Source 1: Workers systematically give the same answer
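One simple way to surface workers who systematically give the same answer is to check whether a worker's set of chosen relations never changes across sentences. The sketch below is an illustrative heuristic only, not a metric taken from the slides; the data layout, the `min_sentences` cut-off, and the worker and sentence ids are assumptions.

```python
def constant_answer_workers(annotations, min_sentences=3):
    """Return ids of workers whose answer is identical on every sentence.

    annotations: {worker_id: {sentence_id: frozenset of chosen relations}}
    Workers with fewer than `min_sentences` answers are not judged.
    """
    flagged = []
    for worker, by_sentence in annotations.items():
        answers = list(by_sentence.values())
        if len(answers) >= min_sentences and len(set(answers)) == 1:
            flagged.append(worker)
    return sorted(flagged)


# Invented annotations for three workers.
annotations = {
    "w1": {"s1": frozenset({"treats"}),
           "s2": frozenset({"symptom"}),
           "s3": frozenset({"treats", "other"})},
    "w2": {"s1": frozenset({"other"}),
           "s2": frozenset({"other"}),
           "s3": frozenset({"other"})},
    "w3": {"s1": frozenset({"treats"}),
           "s2": frozenset({"treats"})},  # too few answers to judge
}
flagged = constant_answer_workers(annotations)
print(flagged)
```

A flag like this is only a hint: a worker may legitimately repeat an answer if the sentences really do express the same relation.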


Page 65: Data Science with Humans in the Loop


Source 2: Target semantics

DISAGREEMENT IS SIGNAL

Page 66: Data Science with Humans in the Loop

DISAGREEMENT IS SIGNAL

Source 3: Sentences

Page 67: Data Science with Humans in the Loop

TRIANGLE OF MEANING

Model of semantic interpretation (Ogden & Richards, 1936)

Page 68: Data Science with Humans in the Loop

TRIANGLE OF MEANING

Model of semantic interpretation

Page 69: Data Science with Humans in the Loop


treats other

CrowdTruth metrics for quality assessment

TRIANGLE OF MEANING

Page 70: Data Science with Humans in the Loop


treats other

Spam

QUALITY ASSESSMENT
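The slide does not say how spam is scored. As an assumption-laden sketch, one can compare each worker's vector with the summed vectors of the other workers on the same sentence using cosine similarity, and treat very low agreement as a warning sign. The function names, the relation order, the four worker vectors, and the 0.3 threshold are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def worker_agreement(worker_vec, all_worker_vecs):
    """Cosine between one worker's vector and the sum of everyone else's.

    `worker_vec` must be one of the vectors in `all_worker_vecs`.
    """
    total = [sum(column) for column in zip(*all_worker_vecs)]
    rest = [t - w for t, w in zip(total, worker_vec)]
    return cosine(worker_vec, rest)


# Invented worker vectors over (treats, associated_with, other).
vecs = {
    "w1": [1, 0, 0],
    "w2": [1, 0, 0],
    "w3": [1, 1, 0],
    "w4": [0, 0, 1],
}
THRESHOLD = 0.3  # arbitrary cut-off for this example
scores = {w: worker_agreement(v, list(vecs.values())) for w, v in vecs.items()}
low_agreement = sorted(w for w, s in scores.items() if s < THRESHOLD)
print(scores, low_agreement)
```

In practice such a score would be averaged over many sentences before drawing any conclusion about a worker, since disagreement on a single ambiguous sentence is expected.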

Page 71: Data Science with Humans in the Loop

QUALITY ASSESSMENT

Sentence ambiguity

Among 56 subjects reporting to a clinic with symptoms of MALARIA 53 (95%) had ordinarily effective levels of CHLOROQUINE in blood.

Relations: treats, other, none, associated_with, symptom, manifestation, side effect
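Sentence ambiguity can be made concrete with a small sketch. It assumes, as one plausible formulation that the slides do not spell out, that a relation's score is the cosine between the sentence vector and that relation's unit vector, and that a sentence is clearer the higher its best relation score. The function names and both example vectors are invented.

```python
import math


def relation_scores(sentence_vec):
    """Cosine between the sentence vector and each relation's unit vector.

    For unit vector e_r this reduces to sentence_vec[r] / ||sentence_vec||.
    """
    norm = math.sqrt(sum(v * v for v in sentence_vec))
    if norm == 0:
        return [0.0] * len(sentence_vec)
    return [v / norm for v in sentence_vec]


def clarity(sentence_vec):
    """Highest relation score; lower values suggest a more ambiguous sentence."""
    return max(relation_scores(sentence_vec), default=0.0)


clear_sentence = [8, 0, 0, 0]      # all workers chose the same relation
ambiguous_sentence = [3, 3, 2, 2]  # votes spread over several relations
print(clarity(clear_sentence), clarity(ambiguous_sentence))
```

Under this sketch a sentence with votes spread across many relations never reaches a high score for any single one, which matches the intuition that the sentence itself, and not the workers, is the source of disagreement.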

Page 72: Data Science with Humans in the Loop

TREATS RELATION

Yes or No?

treats, other, none, associated_with, contraindicates, manifestation

treats, other, none, associated_with, contraindicates, manifestation

treats, other

Page 73: Data Science with Humans in the Loop


THE WORLD IS SMOOTH AND NOT BINARY

Page 74: Data Science with Humans in the Loop

CROWDTRUTH METRICS

Agreement as percentage

For prevention of malaria, use only in individuals traveling to malarious areas where CHLOROQUINE resistant P. falciparum MALARIA has not been reported.

Relations: treats, other, none, associated_with, contraindicates, manifestation

75% 25% 25% 12% 12% 50%

25% 25% 75% 12% 12% 50%
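Reading "agreement as percentage" as the share of workers who chose each relation, the computation is a single division per relation. The sketch below assumes that reading; the worker count and vote counts are invented and do not reproduce the slide's figures or their mapping to relations.

```python
def agreement_percentages(sentence_vec, n_workers):
    """Percentage of workers who chose each relation, rounded to whole numbers."""
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    return [round(100 * votes / n_workers) for votes in sentence_vec]


# Invented example: 8 workers, vote counts for three relations.
# Percentages need not sum to 100 because a worker may pick several relations.
percentages = agreement_percentages([6, 2, 4], n_workers=8)
print(percentages)
```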

Page 75: Data Science with Humans in the Loop

CROWDTRUTH METRICS

Applying all sides of the triangle

For prevention of malaria, use only in individuals traveling to malarious areas where CHLOROQUINE resistant P. falciparum MALARIA has not been reported.

treats, other

Page 76: Data Science with Humans in the Loop

CROWDTRUTH METRICS

Applying all sides of the triangle

For prevention of malaria, use only in individuals traveling to malarious areas where CHLOROQUINE resistant P. falciparum MALARIA has not been reported.

treats, other

99%

Page 77: Data Science with Humans in the Loop

CROWDTRUTH METRICS

Applying all sides of the triangle

Page 78: Data Science with Humans in the Loop


For prevention of malaria, use only in individuals traveling to malarious areas where CHLOROQUINE resistant P. falciparum MALARIA has not been reported.

CROWDTRUTH KNOWLEDGE TENSOR

Page 79: Data Science with Humans in the Loop

CROWDTRUTH VS. EXPERTS

Crowd annotations are as good as or better than those from experts

Page 80: Data Science with Humans in the Loop

AMBIGUITY IMPACTS ACCURACY

More ambiguous sentences were harder to classify

Page 81: Data Science with Humans in the Loop

CROWDTRUTH METRICS

Quality assessment

Page 82: Data Science with Humans in the Loop

CROWDTRUTH.ORG

A spatial representation of meaning that harnesses disagreement

Page 83: Data Science with Humans in the Loop

VIDEO METADATA ENRICHMENT

The Netherlands Institute for Sound and Vision

On the role of user-generated metadata in audio visual collections (2011). R. Gligorov, M. Hildebrand, J. van Ossenbruggen, G. Schreiber, L. Aroyo. K-CAP 2011

1

Page 84: Data Science with Humans in the Loop

DIVE+

Explorative Search

2

DIVE into the event-based browsing of linked historical media (2015). V. de Boer, J. Oomen, O. Inel, L. Aroyo, E. van Staveren, in Journal of Web Semantics

Page 85: Data Science with Humans in the Loop

DEEP QA IN CULTURAL HERITAGE

Mauritshuis use case

3

Page 86: Data Science with Humans in the Loop

CROWDTRUTH IN DEPLOYMENT

Google Maps questions

Google Maps reviewers

4

Page 87: Data Science with Humans in the Loop

CROWDTRUTH IN DEPLOYMENT

Google Maps emotions

mTurk crowd

5

Page 88: Data Science with Humans in the Loop


WHAT DOES THE FUTURE HOLD

Page 89: Data Science with Humans in the Loop

USER-CENTRIC DATA SCIENCE

Formerly the Web & Media group

Page 90: Data Science with Humans in the Loop

H2020 ReTV

Trans-Vector Platform (TVP)

Lora Aroyo (coordinator), VU Amsterdam, Computer Science

Lyndon Nixon, MODUL, AT

Vasileios Mezaris, CERTH, GR

Arno Scharl, Weblyzard, AT

Bea Knecht, Zattoo, DE

Johan Oomen, Sound and Vision, NL

Nicolas Patz, Rundfunk Berlin-Brandenburg, DE

Page 91: Data Science with Humans in the Loop

CAPTURING BIAS

Startimpuls for the Dutch National Science Agenda

Lora Aroyo (coordinator), VU Amsterdam, Computer Science

Alessandro Bozzon, TU Delft CS & Delft Data Science

Alec Badenoch, Utrecht University, Media & Culture Studies

Antoaneta Dimitrova, Leiden University, Institute of Public Administration

Johan Oomen, Netherlands Institute for Sound and Vision

Page 92: Data Science with Humans in the Loop


CROWDTRUTH ROCKS!

Disagreement is signal

CrowdTruth is a spatial representation of meaning that harnesses disagreement

Crowds bring natural diversity

CrowdTruth defines hyper-dimensional space to represent ambiguity

Crowds help gather real human semantics

Page 93: Data Science with Humans in the Loop

TIME TO BREAK FREE

The world is full of shades of grey

Experts and crowds are complementary

Capturing and understanding opinions, perspectives and contexts is at the center of understanding people

CrowdTruth defines multi-dimensional space to measure quality

Page 94: Data Science with Humans in the Loop


Lora Aroyo

DATA SCIENCE WITH HUMANS IN THE LOOP