Invited talk at LIBER2014
CROWDSOURCING
CONTENT
MANAGEMENT:
CHALLENGES AND
OPPORTUNITIES
ELENA SIMPERL
UNIVERSITY OF SOUTHAMPTON
03-Jul-14
LIBER2014
EXECUTIVE SUMMARY
Crowdsourcing helps with content
management tasks.
However,
• there is crowdsourcing and crowdsourcing → pick your favourites and mix them
• human intelligence is a valuable resource → experiment design is key
• sustaining engagement is an art → crowdsourcing analytics may help
• computers are sometimes better than humans → the age of ‘social machines’
CROWDSOURCING:
PROBLEM SOLVING VIA
OPEN CALLS
"Simply defined, crowdsourcing represents the act of a
company or institution taking a function once performed by
employees and outsourcing it to an undefined (and generally
large) network of people in the form of an open call. This can
take the form of peer-production (when the job is performed
collaboratively), but is also often undertaken by sole
individuals. The crucial prerequisite is the use of the open
call format and the large network of potential laborers."
[Howe, 2006]
THE MANY FACES OF
CROWDSOURCING
CROWDSOURCING AND
RESEARCH LIBRARIES
CHALLENGES
Understand what drives
participation
Design systems to reach
critical mass and sustain
engagement
OPPORTUNITIES
Better ‘customer’ experience
Enhanced information
management
Capitalize on crowdsourced
scientific workflows
IN THIS TALK:
CROWDSOURCING AS
‘HUMAN COMPUTATION’
Outsourcing to humans those tasks that machines find
difficult to solve
IN THIS TALK:
CROWDSOURCING DATA
CITATION
‘The USEWOD experiment’
• Goal: collect information about the usage of Linked Data sets in research papers
• Explore different crowdsourcing methods
• Online tool to link publications to data sets (and their versions)
• 1st feasibility study with 10 researchers in May 2014
http://prov.usewod.org/
9650 publications
DIMENSIONS OF CROWDSOURCING
WHAT IS
OUTSOURCED
Tasks based on
human skills not
easily replicable by
machines
• Visual recognition
• Language
understanding
• Knowledge acquisition
• Basic human
communication
• ...
WHO IS THE CROWD
• Open call (crowd accessible through a platform)
• Call may target specific skills and expertise (qualification tests)
• Requester typically knows less about the ‘workers’ than in other ‘work’ environments
See also [Quinn & Bederson, 2012]
DIMENSIONS OF CROWDSOURCING (2)
HOW IS THE TASK OUTSOURCED
• Explicit vs. implicit participation
• Tasks broken down into smaller units
undertaken in parallel by different people
• Coordination required to handle cases with
more complex workflows
• Partial or independent answers consolidated
and aggregated into complete solution
See also [Quinn & Bederson, 2012]
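To make the consolidation step above concrete, here is a minimal sketch (not from the talk) of majority-vote aggregation: each item is handled in parallel by several workers, and partial answers are consolidated into one solution per item.

```python
from collections import Counter

def aggregate(answers_per_item):
    """Consolidate redundant crowd answers: for each item, return
    the majority answer and its support ratio among all workers."""
    consolidated = {}
    for item, answers in answers_per_item.items():
        winner, votes = Counter(answers).most_common(1)[0]
        consolidated[item] = (winner, votes / len(answers))
    return consolidated

# Three annotators label the same two items in parallel.
votes = {
    "img-01": ["cat", "cat", "dog"],
    "img-02": ["tree", "tree", "tree"],
}
consolidated = aggregate(votes)
```

The support ratio doubles as a crude confidence score: low-support items can be routed to additional workers or to an expert crowd.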
EXAMPLE: CITIZEN SCIENCE
WHAT IS OUTSOURCED
• Object recognition, labeling,
categorization in media content
WHO IS THE CROWD
• Anyone
HOW IS THE TASK
OUTSOURCED
• Highly parallelizable tasks
• Every item is handled by multiple
annotators
• Every annotator provides an answer
• Consolidated answers solve scientific
problems
USEWOD EXPERIMENT: TASK
AND CROWD
WHAT IS
OUTSOURCED
Annotating research papers with data set information
• Alternative representations of the domain
• What if the paper is not available?
• What if the domain is not known in advance or is infinite?
• Do we know the list of potential answers?
• Is there only one correct solution to each atomic task?
• How many people would solve the same task?
WHO IS THE CROWD
• People who know the papers or the data sets
• Experts in the (broader) field
• Casual gamers
• Librarians
• Anyone (knowledgeable of English, with a computer/cell phone…)
• Combinations thereof…
USEWOD EXPERIMENT: TASK
DESIGN
HOW IS THE TASK OUTSOURCED:
ALTERNATIVE MODELS
• Use the data collected here to train an IE (information extraction) algorithm
• Use paid microtask workers to do a first screening, then an expert crowd to sort out challenging cases
• What if you have very long documents potentially mentioning different/unknown data sets?
• Competition via Twitter
• ‘Which version of DBpedia does this paper use?’
• One question a day, prizes
• Needs golden standard to bootstrap and redundancy
• Involve the authors
• Use crowdsourcing to find out Twitter accounts, then launch campaign on Twitter
• Write an email to the authors…
• Change the task
• Which papers use DBpedia 3.X?
• Competition to find all papers
DIMENSIONS OF CROWDSOURCING (3)
HOW ARE THE
RESULTS VALIDATED
• Solutions space closed vs. open
• Performance measurements/ground truth
• Statistical techniques employed to predict accurate solutions
• May take into account confidence values of algorithmically generated solutions
HOW CAN THE
PROCESS BE
OPTIMIZED
• Incentives and motivators
• Assigning tasks to people based on their skills and performance (as opposed to random assignments)
• Symbiotic combinations of human- and machine-driven computation, including combinations of different forms of crowdsourcing
See also [Quinn & Bederson, 2012]
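One of the validation techniques alluded to above can be sketched as follows: score each worker against a small set of gold (ground-truth) tasks and exclude low performers before answers are aggregated. A hypothetical illustration, not tied to any particular platform:

```python
def worker_accuracy(responses, gold):
    """Score each worker on the subset of tasks with a known
    (gold/ground-truth) answer; low scorers can be filtered out
    before their answers enter the aggregation step."""
    scores = {}
    for worker, answers in responses.items():
        graded = [t for t in answers if t in gold]
        if not graded:
            scores[worker] = None  # worker saw no gold tasks
            continue
        correct = sum(answers[t] == gold[t] for t in graded)
        scores[worker] = correct / len(graded)
    return scores
```

In practice the gold set is kept small and mixed invisibly into the regular task stream, so workers cannot tell which answers are being graded.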
USEWOD EXPERIMENT:
VALIDATION
• Domain is fairly restricted
• Spam and obvious wrong answers can be detected easily
• When are two answers the same? Can there be more
than one correct answer per question?
• Redundancy may not be the final answer
• Most people will be able to identify the data set, but
sometimes the actual version is not trivial to reproduce
• Make educated version guess based on time intervals
and other features
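The ‘educated version guess based on time intervals’ could look like this sketch. The cut-off dates below are illustrative placeholders only, not the official DBpedia release calendar:

```python
from datetime import date

# Illustrative cut-off dates, NOT the official release calendar:
# (release date, version label).
RELEASES = [
    (date(2011, 9, 1), "DBpedia 3.7"),
    (date(2012, 8, 1), "DBpedia 3.8"),
    (date(2013, 9, 1), "DBpedia 3.9"),
]

def guess_version(paper_date):
    """Return the most recent release published before the paper;
    None if the paper predates all known releases."""
    candidate = None
    for released, name in RELEASES:
        if released <= paper_date:
            candidate = name
    return candidate
```

Such a guess is only a prior: it narrows the answer space for the crowd, who then confirm or correct the version.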
ALIGNING INCENTIVES
IS ESSENTIAL
Successful volunteer crowdsourcing is difficult to predict or replicate
• Highly context-specific
• Not applicable to arbitrary tasks
Reward models often easier to study and control (if performance can be reliably measured)
• Different models: pay-per-time, pay-per-unit, winner-takes-all
• Not always easy to abstract from social aspects (free-riding, social pressure)
• May undermine intrinsic motivation
IT‘S NOT ALWAYS
JUST ABOUT MONEY
http://www.crowdsourcing.org/editorial/how-to-motivate-the-crowd-infographic/
http://www.oneskyapp.com/blog/tips-to-motivate-participants-of-crowdsourced-translation/
[Source: Kaufmann, Schulze, Veit, 2011]
[Source: Ipeirotis, 2008]
CROWDSOURCING
ANALYTICS
[Chart: percentage of active users per month since registration, months 1–31]
See also [Luczak-Rösch et al. 2014]
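The retention curve in the chart can be computed from registration logs along these lines (a hypothetical sketch; the log representation is an assumption):

```python
def retention(users):
    """Percentage of users still active N months after registration.

    users: list of (registration_month, last_active_month) pairs,
    with months counted as integers on one shared timeline."""
    horizon = max(last - reg for reg, last in users)
    curve = []
    for n in range(horizon + 1):
        active = sum(1 for reg, last in users if last - reg >= n)
        curve.append(100.0 * active / len(users))
    return curve
```

Curves like this make engagement decay visible early, so incentives can be adjusted before participation collapses.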
USEWOD EXPERIMENT:
OTHER INCENTIVES
MODELS
• Who benefits from the results
• Who owns the results
• Twitter-based contest
• ‘Which version of DBpedia does this paper use?’
• One question a day, prizes
• If question is not answered correctly, increase the prize
• If low participation, re-focus the audience or change the
incentive.
• Altruism: for every ten papers annotated, we send a
student to ESWC…
[Source: Nature.com]
DIFFERENT CROWDS FOR
DIFFERENT TASKS
Find stage: contest
• Linked Data experts
• Difficult task
• Final prize
• Tool: TripleCheckMate [Kontokostas et al., 2013]
Verify stage: microtasks
• Workers
• Easy task
• Micropayments
• Platform: MTurk http://mturk.com
‘Not sure’ verdicts from the Verify stage are fed back to the Find stage.
See also [Acosta et al., 2013]
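The Find/Verify split above can be sketched as a small pipeline (hypothetical function names; the real TripleCheckMate/MTurk setup is more involved):

```python
def find_verify(items, find, verify, threshold=0.5):
    """Two-stage crowd pipeline: an expert contest crowd *finds*
    candidate errors, a microtask crowd *verifies* them; verdicts
    below the confidence threshold are returned as 'unsure' so
    they can be routed back to the find stage.

    find:   item -> list of candidate findings (expert stage)
    verify: finding -> confidence in [0, 1]    (microtask stage)"""
    confirmed, unsure = [], []
    for item in items:
        for candidate in find(item):
            score = verify(candidate)
            (confirmed if score >= threshold else unsure).append(candidate)
    return confirmed, unsure
```

The design point is cost: cheap redundant micropayments filter the easy cases, so the expensive expert crowd only sees what is genuinely hard.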
COMBINING HUMAN AND
COMPUTATIONAL INTELLIGENCE
EXAMPLE: BIBLIOGRAPHIC DATA
INTEGRATION
Table A (paper, conf):
  Data integration | VLDB-01
  Data mining      | SIGMOD-02
Table B (title, author, email):
  OLAP         | Mike | mike@a
  Social media | Jane | jane@b
Generate plausible matches
– paper = title, paper = author, paper = email, paper = venue
– conf = title, conf = author, conf = email, conf = venue
Ask users to verify, e.g. once Table B gains a venue column:
Table B (title, author, email, venue):
  OLAP         | Mike | mike@a | ICDE-02
  Social media | Jane | jane@b | PODS-05
‘Does attribute paper match attribute author?’ [No | Yes]
See also [McCann, Shen, Doan, 2008]
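Generating the plausible matches for crowd verification amounts to enumerating attribute pairs; a minimal sketch (the question phrasing follows the slide, everything else is an assumption):

```python
from itertools import product

def candidate_matches(schema_a, schema_b):
    """Enumerate plausible attribute pairings between two schemas
    and phrase each pairing as a yes/no question for the crowd."""
    return [
        (a, b, f"Does attribute '{a}' match attribute '{b}'?")
        for a, b in product(schema_a, schema_b)
    ]

questions = candidate_matches(["paper", "conf"],
                              ["title", "author", "email", "venue"])
```

In a real system the machine side would first rank the pairs (e.g. by string or instance similarity), so the crowd only verifies the ambiguous ones.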
SUMMARY AND FINAL REMARKS
[Source: Dave de Roure]
SUMMARY
• There is crowdsourcing and crowdsourcing → pick your favourites and mix them
• Human intelligence is a valuable resource → experiment design is key
• Sustaining engagement is an art → crowdsourcing analytics may help
• Computers are sometimes better than humans → the age of ‘social machines’
THE AGE OF SOCIAL
MACHINES
@ESIMPERL
WWW.SOCIAM.ORG
WWW.PLANET-DATA.EU
THANKS TO MARIBEL ACOSTA, LAURA
DRAGAN, MARKUS LUCZAK-RÖSCH, RAMINE
TINATI, AND MANY OTHERS