
Parallel Task Routing for Crowdsourcing

Jonathan Bragg, University of Washington (CSE), Seattle, WA ([email protected])
Andrey Kolobov, Microsoft Research, Redmond, WA ([email protected])
Mausam, Indian Institute of Technology, Delhi, India ([email protected])
Daniel S. Weld, University of Washington (CSE), Seattle, WA ([email protected])

Abstract

An ideal crowdsourcing or citizen-science system would route tasks to the most appropriate workers, but the best assignment is unclear because workers have varying skill, tasks have varying difficulty, and assigning several workers to a single task may significantly improve output quality. This paper defines a space of task routing problems, proves that even the simplest is NP-hard, and develops several approximation algorithms for parallel routing problems. We show that an intuitive class of requesters' utility functions is submodular, which lets us provide iterative methods for dynamically allocating batches of tasks that make near-optimal use of available workers in each round. Experiments with live oDesk workers show that our task routing algorithm uses only 48% of the human labor compared to the commonly used round-robin strategy. Further, we provide versions of our task routing algorithm which enable it to scale to large numbers of workers and questions and to handle workers with variable response times while still providing significant benefit over common baselines.

Introduction

While there are millions of workers present and thousands of tasks available on crowdsourcing platforms today, the problem of effectively matching the two, known as task routing, remains a critical open question. Law & von Ahn (2011) describe two modes of solving this problem: the pull and the push modes. In the pull mode, such as on Amazon Mechanical Turk (AMT), workers themselves select tasks based on price, keywords, etc. In contrast, a push-oriented labor market, popular on volunteer crowdsourcing platforms (e.g., Zooniverse (Lintott et al. 2008)), directly allocates appropriate tasks to workers as they arrive. Increasing amounts of historical data about tasks and workers create the potential to greatly improve this assignment process. For example, a crowdsourcing system might give easy tasks to novice workers and route difficult problems to experts.

Unfortunately, this potential is hard to realize. Existing task routing algorithms (e.g., (Donmez, Carbonell, and Schneider 2009; Wauthier and Jordan 2011; Ho and Vaughan 2012; Chen, Lin, and Zhou 2013)) make overly restrictive assumptions to be applied to realistic crowdsourcing platforms. For example, they typically assume at least one of the following simplifications: (1) tasks can be allocated purely sequentially, (2) workers are willing to wait patiently for a task to be assigned, or (3) the quality of a worker's output can be evaluated instantaneously. In contrast, an ideal practical task router should be completely unsupervised, since labeling gold data is expensive. Furthermore, it must assign tasks in parallel to all available workers, since making engaged workers wait for assignments leads to an inefficient platform and frustrated workers. Finally, the task router should operate in real time so that workers need not wait. The two latter aspects are especially important in citizen science and other forms of volunteer (non-economically motivated) crowdsourcing.

In this paper we investigate algorithms for task routing that satisfy these desiderata. We study a setting, called JOCR (Joint Crowdsourcing (Kolobov, Mausam, and Weld 2013)), in which the tasks are questions, possibly with varying difficulties. We assume that the router has access to a (possibly changing) set of workers with different levels of skill. The system keeps each available worker busy as long as the worker stays online by assigning him or her questions from the pool. It observes responses from the workers that complete their tasks and can aggregate their votes using majority vote or more sophisticated Bayesian methods in order to estimate its confidence in answers to the assigned questions. The system's objective is to answer all questions from the pool as accurately as possible after a fixed number of allocations.

Even within this setting, various versions of the problem are possible depending upon the amount of information available to the system (Figure 1). In this paper we focus on the case where task difficulties and worker abilities are known a priori. Not only does this setting lay a foundation for the other cases, but it is of immediate practical use. Recently developed community-based aggregation can learn very accurate estimates of worker accuracy from limited interactions (Venanzi et al. 2014). Also, for a wide range of problems, there are domain-specific features that allow a system to roughly predict the difficulty of a task.

We develop algorithms for both offline and adaptive JOCR task routing problems. For the offline setting, we first prove that the problem is NP-hard. Our approximation algorithm, JUGGLEROFF, is based on the realization that a natural objective function for this problem, expected information gain, is submodular.


Figure 1: The space of allocation problems. (The dimensions are policy: offline vs. adaptive; concurrency: sequential vs. parallel; and worker skill / question difficulty: latent vs. known.)

This allows JUGGLEROFF to efficiently obtain a near-optimal allocation. We use similar ideas to devise an algorithm called JUGGLERAD for the adaptive problem, which achieves even better empirical performance. JUGGLERBIN is an approximate version of JUGGLERAD, which is able to scale to large numbers of workers and tasks in an efficient manner. Finally, we present JUGGLERRT, which handles workers who take different amounts of time to complete tasks.

We test our algorithms both in simulation and using real data. Experiments with live oDesk workers on a natural language named entity linking task show that JUGGLERAD's allocation approach achieves the same accuracy as the commonly used round-robin strategy, using only 48% of the labor. Additional experiments on simulated data show that (1) our adaptive method significantly outperforms offline routing, (2) the adaptivity's savings grow when worker skills and task difficulties are more varied, (3) our binned (approximate) adaptive algorithm provides scalability without compromising performance, and (4) our approach yields significant savings even when worker response times vary.

Problem Definition

In our JOCR model, we posit a set of workers W and a set of questions Q. Each question q ∈ Q has a (possibly latent) difficulty d_q ∈ [0, 1], and each worker w ∈ W has a (possibly latent) skill parameter γ_w ∈ (0, ∞). While more complex models are possible, ours has the advantage that it is learnable even with small amounts of data. We assume that the probability of a worker w providing a correct answer to a question q is a monotonically increasing function P(d_q, γ_w) of worker skill and a monotonically decreasing function of question difficulty. For example, one possible such function, adapted from (Dai, Mausam, and Weld 2010), is

P(d_q, \gamma_w) = \frac{1}{2}\left(1 + (1 - d_q)^{1/\gamma_w}\right).  (1)
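To make the model concrete, here is a minimal sketch of Equation 1 in Python; the function name is ours, and any monotone accuracy model could be substituted.

def p_correct(d_q, gamma_w):
    """Probability that a worker with skill gamma_w answers a question of
    difficulty d_q correctly (Equation 1): 1.0 when d_q = 0, 0.5 when d_q = 1."""
    return 0.5 * (1.0 + (1.0 - d_q) ** (1.0 / gamma_w))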

We assume that JOCR algorithms will have access to a pool of workers, and will be asking workers from the pool to provide answers to the set of questions submitted by a requester. This pool is a subset of W consisting of workers available at a given time. The pool need not be static; workers may come and go as they please. To formalize the problem, however, we assume that workers disappear or reappear at regularly spaced points in time, separated by a duration equal to the time it takes a worker to answer a question. We call each such decision point a round. At the beginning of every round t, our algorithms will assign questions to the pool P_t ⊆ W of workers available for that round, and collect their responses at the round's end.

The algorithms will attempt to assign questions to workers so as to maximize some utility over a fixed number of rounds, a time horizon, denoted as T. Since we focus on volunteer crowdsourcing platforms, in our model asking a worker a question incurs no monetary cost. To keep workers engaged and get the most questions answered, our techniques assign a question to every worker in pool P_t in every round t = 1, ..., T. An alternative model, useful on a shared citizen-science platform, would allow the routing algorithm a fixed budget of n worker requests over an arbitrary horizon.

Optimization Criteria

The JOCR framework allows the use of various utility functions U(S) to optimize for, where S ∈ 2^{Q×W} denotes an assignment of questions to workers. One natural utility function choice is the expected information gain from observing a set of worker responses. Specifically, let A = {A_1, A_2, ..., A_{|Q|}} denote the set of random variables over correct answers to questions, and let X = {X_{q,w} | q ∈ Q ∧ w ∈ W} be the set of random variables corresponding to possible worker responses. Further, let X_S denote the subset of X corresponding to assignment S. We can quantify the uncertainty in our predictions for A using the joint entropy

H(A) = -\sum_{\mathbf{a} \in \mathrm{dom}\,A} P(\mathbf{a}) \log P(\mathbf{a}),

where the domain of A consists of all possible assignments to the variables in A, and a denotes a vector representing one assignment. The conditional entropy

H(A \mid X_S) = -\sum_{\mathbf{a} \in \mathrm{dom}\,A,\; \mathbf{x} \in \mathrm{dom}\,X_S} P(\mathbf{a}, \mathbf{x}) \log P(\mathbf{a} \mid \mathbf{x})

represents the expected uncertainty in the predictions after observing worker votes corresponding to the variables in X_S. Following earlier work (Whitehill et al. 2009), we assume that questions are independent (assuming known parameters) and that worker votes are independent given the true answer to a question, i.e.,

P(\mathbf{a}, \mathbf{x}) = \prod_{q \in Q} P(a_q) \prod_{w \in W \text{ who answered } q} P(x_{q,w} \mid a_q),

where P(x_{q,w} | a_q) depends on worker skill and question difficulty as in Equation 1. We can now define the value of a task-to-worker assignment as the expected reduction in entropy

U_{IG}(S) = H(A) - H(A \mid X_S).  (2)
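As a concrete illustration, the following sketch computes the expected information gain of Equation 2 for a single binary question under the stated independence assumptions; the helper names are ours, and p_correct is an accuracy model such as Equation 1. Because questions are assumed independent, the utility of a full assignment is the sum of this quantity over questions.

import itertools
import math

def entropy(p):
    """Binary entropy in bits."""
    return -sum(pi * math.log2(pi) for pi in (p, 1 - p) if pi > 0)

def expected_info_gain(prior, assigned_workers, p_correct):
    """Expected reduction in entropy (Eq. 2) for one binary question.
    prior: P(A = True); assigned_workers: list of (d_q, gamma_w) pairs;
    p_correct(d, gamma): probability the worker votes for the true answer."""
    h_prior = entropy(prior)
    h_post = 0.0
    # Enumerate every joint vote outcome x for the assigned workers.
    for votes in itertools.product([True, False], repeat=len(assigned_workers)):
        joint = {}
        for a, p_a in ((True, prior), (False, 1 - prior)):
            p = p_a
            for vote, (d, g) in zip(votes, assigned_workers):
                acc = p_correct(d, g)
                p *= acc if vote == a else 1 - acc
            joint[a] = p
        p_x = joint[True] + joint[False]
        if p_x > 0:
            h_post += p_x * entropy(joint[True] / p_x)   # P(x) * H(A | x)
    return h_prior - h_post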


Problem Dimensions

Given the information gain maximization objective, we consider the problem of finding such an assignment under combinations of assumptions lying along three dimensions (Figure 1):

• Offline vs. online (adaptive) assignment construction. In the offline (or static) mode, a complete assignment of T questions to each worker is chosen once, before the beginning of the first round, and is not changed as workers provide their responses. Offline assignment construction is the only available option when workers' responses cannot be collected in real time. While we provide an algorithm for the offline case, JUGGLEROFF, it is not our main focus. Indeed, worker responses typically are available immediately after workers provide them, and taking them into account allows the router to resolve worker disagreements on the fly. Our experimental results confirm this, with the adaptive JUGGLERAD outperforming the offline algorithm as well as our baselines.

• Sequential vs. parallel hiring of workers. A central question in online allocation is the number of workers to hire in a given round. If one could only hire n workers in total, an optimal strategy would ask workers sequentially, one worker per round, because this allows the maximum use of information when building the rest of the assignment. However, this strategy is impractical in most contexts, especially in citizen science: using just a single worker wastes the time of the majority and demotivates them. Therefore, in this paper we consider the harder case of parallel allocation of questions to a set of workers. In our case, this set comprises all workers available at the moment of allocation.

• Known vs. latent worker skill and question difficulty. Worker skill and question difficulty are critical for matching tasks with workers. Intuitively, the optimal way to use skillful workers is to give them the most difficult tasks, leaving easier ones to the less proficient members of the pool. In this paper, we assume these quantities are known, a reasonable assumption in practice. Since workers and requesters often develop a long-term relationship, EM-based learning (Whitehill et al. 2009; Welinder et al. 2010) can estimate worker accuracy from a surprisingly short record using community-based aggregation (Venanzi et al. 2014). Similarly, domain-specific features often allow a system to estimate a task's difficulty. For example, the linguistic problem of determining whether two nouns corefer is typically easy in the case of apposition, but harder when a pronoun is distant from the target (Stoyanov et al. 2009; Raghunathan et al. 2010). In the citizen science domain, detecting planet transits (the focus of the Zooniverse project Planet Hunters) is more difficult for planets that are fast-moving, small, or in front of large stars (Mao et al. 2013). If skill or difficulty is not known, the router faces a challenging exploration/exploitation tradeoff. For example, it gains information about a worker's skill by asking a question for which it is confident of the answer, but of course this precludes asking a question whose answer it does not know.

Offline Allocation

We first address the simplest form of the allocation problem, since its solution (and complexity) forms a basis for subsequent, more powerful algorithms. Suppose that an agent is able to assign tasks to a pool of workers, but unable to observe worker responses until they have all been received, and hence has no way to reallocate questions based on worker disagreement. We assume that the agent has at its disposal an estimate of question difficulties and worker skills, as discussed above.

Theorem 1. Finding an assignment of workers to questions that maximizes an arbitrary utility function U(S) is NP-hard.

Proof sketch. We prove the result via a reduction from the Partition Problem to offline JOCR. An instance of this problem is specified by a set X containing n elements and a function s : X \to \mathbb{Z}^+. The Partition Problem asks whether we can divide X into two subsets X_1 and X_2 such that \sum_{x \in X_1} s(x) = \sum_{x \in X_2} s(x).

We construct an instance of offline JOCR to solve an instance of the Partition Problem as follows. For each element x_i ∈ X, we define a corresponding worker w_i ∈ W with skill γ_i = s(x_i) / max_{x_i ∈ X} s(x_i), and let the horizon equal 1. We also define two questions, q_1 and q_2, with the same difficulty d. Let S_{q,w} be an indicator variable that takes a value of 1 iff worker w has been assigned question q. Let f(S, q) = \log\left[\sum_{w} S_{q,w}\, \gamma_w + 1\right] and define the utility function as U(S) = \sum_{q} f(S, q).

The answer to a Partition Problem instance is true iff f(S, q_1) = f(S, q_2) for an optimal solution S to the corresponding JOCR problem constructed above.
(⇒) Suppose that there exist subsets X_1 and X_2 of X such that the sums of elements in the two sets are equal. Further, suppose for contradiction that the optimal solution S of the JOCR instance assigns sets of workers with different sums of skills to the two questions. Then we can improve U(S) by making the sums of skills equal (by monotonicity and submodularity of the logarithm), implying that S was suboptimal, a contradiction.
(⇐) Suppose that S is an optimal solution to the JOCR problem such that f(S, q_1) = f(S, q_2). Then the sums of worker skills (multiplied by the maximum worker skill) for the two questions are equal, and the answer to the Partition Problem is true.

Given the intractability of this problem, we next devise approximation methods. Before proceeding further, we review the combinatorial concept of submodularity. A function f : 2^N → R is submodular if for every A ⊆ B ⊆ N and e ∈ N: f(A ∪ {e}) − f(A) ≥ f(B ∪ {e}) − f(B). Further, f is monotone if for every A ⊆ B ⊆ N: f(A) ≤ f(B). Intuitively, this is a diminishing-returns property: a function is submodular if the marginal value of adding a particular element never increases as other elements are added to a set.

Theorem 2. The utility function U_IG(S) based on the value of information defined in the previous section is monotone submodular in S, provided that worker skill and question difficulty are known.

Proof sketch. Given worker skill, question difficulty, and the true answers, worker ballots are independent. We can use the methods of (Krause and Guestrin 2005) to show that the expected information gain U_IG(S) is submodular and nondecreasing.

Since the utility function U_IG(S) is monotone submodular, we naturally turn to approximation algorithms. Our optimization problem is constrained by the horizon T, meaning that we may not assign more than T questions to any worker. We encode this constraint as a partition matroid. Matroids capture the notion of linear independence in vector spaces and are specified as M = (N, I), where N is the ground set and I ⊆ 2^N is the family of subsets of N that are independent in the matroid. A partition matroid is specified with an additional constraint. If we partition N = Q × W into disjoint sets B_w = {w} × Q corresponding to the set of possible assignments for each worker w, we can construct the desired partition matroid by defining an independent set to include no more than T elements from each of these sets.

JUGGLEROFF uses the greedy selection process shown in Algorithm 1, which is guaranteed to provide a 1/2-approximation for monotone submodular optimization with a matroid constraint (Nemhauser, Wolsey, and Fisher 1978), yielding the following theorem:

Theorem 3. Let S* be a question-to-worker assignment with the highest information gain U_IG, and let S be the assignment found by maximizing U_IG greedily. Then U_IG(S) ≥ (1/2) U_IG(S*).

JUGGLEROFF improves efficiency by making assignments separately for each worker (exploiting the fact that Equation 1 is monotonically increasing in worker skill) and by using lazy submodular evaluation (Krause and Golovin 2014). Interestingly, while the greedy algorithm dictates selecting workers in order of descending skill in Algorithm 1, we observe drastic improvement in our empirical results from selecting workers in reverse order; our implementation sorts by increasing worker skill, which prioritizes assigning easier questions to the less-skilled workers. We have simplified the utility computation of Δ_X by observing that entropy decomposes by question in our model. We note that more involved algorithms can improve the theoretical guarantee to (1 − 1/e) (Vondrak 2008; Filmus and Ward 2012; Badanidiyuru and Vondrak 2014), but given the large positive effect of simply reversing the order of workers, it is unlikely that these methods will provide significant benefit.

Algorithm 1 The JUGGLEROFF algorithm
Input: Workers W, prior P(a) over answers, unobserved votes X_R, horizon T, initial assignment S
Output: Final assignment S of questions to workers
for w in sorted(W, key = γ_w) do
  for i = 1 to T do
    X_w ← {X_{q,w'} ∈ X_R | w' = w}
    for X_{q,w} ∈ X_w do
      Δ_X ← H(A_q | x) − H(A_q | X_{q,w}, S, x)
    end for
    X* ← arg max {Δ_X : X ∈ X_w}
    Set S ← S ∪ {X*} and X_R ← X_R \ {X*}
  end for
end for

Since we are motivated by the citizen science scenario, we have not included cost in the agent's utility function. However, we note that our utility function remains submodular (but no longer monotone) with the addition of a linear cost term. Thus, one can formulate the optimization problem as non-monotone submodular optimization, for which algorithms exist.
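For readers who prefer code to pseudocode, the following is a minimal sketch of the greedy selection in Algorithm 1 (without lazy evaluation). The marginal_gain helper is an assumption standing in for the per-question entropy reduction, e.g., computed with expected_info_gain above.

def juggler_off(workers, questions, skill, horizon, marginal_gain):
    """Greedy offline allocation (sketch of Algorithm 1). Returns a set of
    (question, worker) pairs with at most `horizon` questions per worker,
    i.e., an independent set of the partition matroid described above.
    marginal_gain(q, w, assignment) -> expected entropy reduction of adding
    the vote (q, w) given the current partial assignment."""
    assignment = set()
    remaining = {(q, w) for q in questions for w in workers}
    # Sorting by increasing skill gave large empirical gains (see text).
    for w in sorted(workers, key=lambda wk: skill[wk]):
        for _ in range(horizon):
            candidates = [qw for qw in remaining if qw[1] == w]
            if not candidates:
                break
            best = max(candidates,
                       key=lambda qw: marginal_gain(qw[0], qw[1], assignment))
            assignment.add(best)
            remaining.discard(best)
    return assignment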

Adaptive (Online) Allocation

JUGGLEROFF performs a complete static allocation and does not adapt based on worker responses. However, in most crowdsourcing platforms, we have access to intermediate output after each worker completes her attempt on a question; it makes sense to consider all previous responses during each round when assigning tasks. Intuitively, the router may choose to allocate more workers to a question, even an easy one, that has generated disagreement. In this section we describe our algorithms for the adaptive setting. We believe that these are the first adaptive task allocation algorithms for crowdsourcing to engage all available workers at each time step.

Unlike in the offline setting, the objective of an optimal algorithm for the online setting is to compute an adaptive way of constructing an assignment, not any fixed assignment per se. Formally, this problem can be seen as a Partially Observable Markov Decision Process (POMDP) with a single (hidden) state representing the set of answers to all questions q ∈ Q. In each round t, the available actions correspond to possible allocations S_t ∈ 2^{Q×P_t} of tasks to all workers in the pool P_t ⊆ W s.t. each worker gets assigned only one task, the observations are worker responses X_{S_t}, and the total planning horizon is T. The optimal solution to this POMDP is a policy π* that satisfies, for each round t and the set of observations \bigcup_{i=1}^{t-1} \mathbf{x}_i received in all rounds up to t,

\pi^*\Big(\bigcup_{i=1}^{t} \mathbf{x}_i\Big) = \operatorname{argmax}_{\pi}\; \mathbb{E}\left[ U\Big(\bigcup_{j=t}^{T} \Big\{ S_j \sim \pi\Big(\bigcup_{i=1}^{t-1} \mathbf{x}_i,\; \bigcup_{i=t}^{j-1} X_i\Big) \Big\}\Big) \right]

Existing methods for solving POMDPs apply almost exclusively to linear additive utility functions U and are intractable even in those cases, because the number of actions (allocations) in each round is exponential. Besides, since workers can come and go, the POMDP's action set keeps changing, and standard POMDP approaches do not handle this complication.


Algorithm 2 The JUGGLERAD algorithm
Input: Workers W, prior P(a) over answers, unobserved votes X_R, horizon T
x ← ∅
for t = 1 to T do
  S_t ← call JUGGLEROFF(T = 1, S = ∅)
  Observe x_{S_t} and set x ← x ∪ x_{S_t}
  X_R ← X_R \ S_t
  Update prior as P(a | x)
end for

Instead, we use a replanning approach, JUGGLERAD (Algorithm 2), that uses the information gain utility U_IG (Equation 2). JUGGLERAD allocates tasks by maximizing this utility in each round separately given the observations so far, modifies its beliefs based on worker responses, and repeats the process for subsequent rounds. In this case, the allocation for a single round is simply the offline problem from the previous section with a horizon of 1. Thus, JUGGLERAD uses JUGGLEROFF as a subcomponent at each allocation step.
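The replanning loop is simple enough to sketch directly; the helper names below (available_workers, allocate_round, collect_votes, update_beliefs) are illustrative stand-ins for the platform's worker pool, the one-round greedy allocator, vote collection, and the Bayesian posterior update.

def juggler_ad(horizon, available_workers, allocate_round, collect_votes,
               update_beliefs, beliefs):
    """Sketch of JUGGLERAD (Algorithm 2): in each round, re-run the one-round
    offline allocator on the current beliefs, observe the votes, and fold them
    into the posterior before the next round."""
    for t in range(horizon):
        pool = available_workers(t)                  # workers online this round
        assignment = allocate_round(beliefs, pool)   # e.g., juggler_off with horizon 1
        votes = collect_votes(assignment)            # blocks until the round ends
        beliefs = update_beliefs(beliefs, votes)
    return beliefs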

Although we can prove theoretical guarantees for the offline problem (and therefore for the problem of making an assignment in each round), it is more difficult to provide guarantees for the T-horizon adaptive allocation. A natural analysis approach is to use adaptive submodularity (Golovin and Krause 2011), which generalizes performance guarantees of the greedy algorithm for submodular optimization to adaptive planning. Intuitively, a function f is adaptive submodular with respect to a probability distribution p if the conditional expected marginal benefit of any fixed item does not increase as more items are selected and their states are observed. Unfortunately, we have the following negative result.

Theorem 4. The utility function U_IG(S) based on the value of information is not adaptive submodular even when question difficulties and worker abilities are known.

Proof. In order for U(S) to be adaptive submodular, the conditional expected marginal benefit of a worker response should never decrease upon making more observations. As a counterexample, suppose we have one binary question with difficulty d = 0.5 and two workers with skills γ_1 = 1 and γ_2 → ∞. If the prior probability of the answer is P(A = True) = 0.5, the expected information gain H(A) − H(A | X_2) of asking for a vote from the second worker is initially one bit, since she will always give the correct answer. However, if we observe a vote from the first worker before asking for a vote from the second worker, the expected information gain for the second worker will be less than one bit, since the posterior probability P(A = True) has changed.
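A quick numerical check of the quantities in this counterexample (a sketch, assuming the accuracy model of Equation 1, under which γ = 1 and d = 0.5 give accuracy 0.75):

import math

def h(p):  # binary entropy in bits
    return -sum(x * math.log2(x) for x in (p, 1 - p) if x > 0)

prior = 0.5
acc1 = 0.75          # worker 1: gamma = 1, d = 0.5 (Eq. 1)
# Worker 2 is perfect, so her vote fully resolves the answer:
gain_before = h(prior)          # 1.0 bit before any observation
# After observing, say, a "True" vote from worker 1, the posterior moves to 0.75,
# so worker 2's expected gain shrinks to the remaining entropy.
post = acc1 * prior / (acc1 * prior + (1 - acc1) * (1 - prior))
gain_after = h(post)            # ~0.811 bits
print(gain_before, gain_after)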

While we are unable to prove theoretical bounds on the quality of the adaptive policy in the absence of adaptive submodularity, our experiments in the next section show that JUGGLERAD uses dramatically less human labor than commonly used baseline algorithms.

Scalability

Assigning at Most One Worker at a Time to a Task

As an adaptive scheduler, JUGGLERAD must perform allocations quickly for all available workers, so that workers do not abandon the system due to high latency. Computing the incremental value of assigning a worker to a question in a given round is normally very fast, but can become computationally intensive if the system has already assigned many other workers to that question in that round (and has not yet observed their responses). Such a computation requires taking an expectation over a combinatorial number of possible outcomes for this set of votes.

This computational challenge may lead one to wonder whether it is beneficial to restrict the router to assigning each question to at most one worker in every round. With this restriction in place, finding a question-to-worker assignment in a given round amounts to constructing a maximum-weight matching in the bipartite graph of questions and available workers (where edge weights give the expected utility gain from the corresponding assignment). This problem looks simpler than what we have discussed above; in fact, theoretically, it is solvable optimally in polynomial time. Moreover, the aforementioned restriction should be moot in practice, because large crowdsourcing systems typically have many more tasks to complete than available workers (i.e., |W| ≪ |Q|). Intuitively, in such scenarios a scheduling algorithm has some flexibility with regard to the exact round in which it schedules any particular task, and hence is rarely forced to allocate a question to several workers in a round.

Experiments presented later in the paper confirm the intuition that the restriction of one worker per question realistically has little impact on the solution quality. We find that assigning multiple workers to a task in a round was beneficial only when the ratio of workers to tasks was at least 1/4. Nonetheless, we have discovered that optimally computing a maximum matching is still very intensive computationally. Fortunately, our tests also suggest a cheaper alternative: greedily constructing an approximation of a maximum matching. Doing so (1) does not hurt the solution quality of JUGGLERAD, implying that our greedy algorithm finds a solution that is close to optimal, and (2) despite being in the same theoretical complexity class as the algorithm for finding an exact optimal matching, is much faster in practice, making it more appropriate for the large-scale setting. For these reasons, we do not discuss the exact maximum matching approach further, concentrating instead on the greedy approximate strategy.
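As an illustration, a greedy approximate matching can be as simple as the following sketch; the edge_weight callable, standing in for the expected utility gain of a (question, worker) pair, is our assumption. Sorting all edges once and scanning them is a classic 1/2-approximation to the maximum-weight matching.

def greedy_matching(edge_weight, questions, workers):
    """Greedily build an approximate maximum-weight matching between questions
    and the workers available this round, using each question and worker at most once."""
    edges = sorted(((edge_weight(q, w), q, w) for q in questions for w in workers),
                   key=lambda e: e[0], reverse=True)
    used_q, used_w, matching = set(), set(), []
    for gain, q, w in edges:
        if q not in used_q and w not in used_w and gain > 0:
            matching.append((q, w))
            used_q.add(q)
            used_w.add(w)
    return matching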

Binning Questions for Faster Performance

When scaling to large numbers of workers and questions, even a greedy strategy that never assigns more than one worker to a task at a time can be prohibitively expensive, as it requires making O(|Q|) utility computations per worker in each round. However, since the utility of an assignment is a function of the current belief about the true answer, as well as of the question difficulty and worker skill, we can reduce the complexity by partitioning assignments into bins.

Algorithm 3 The JUGGLERBIN algorithm
Input: Workers W, prior P(a) over answers, unobserved votes X_R, horizon T

function LOAD(Q_in)
  for w in W do
    for q in Q_in do
      if x_{q,w} ∉ x then
        b ← TO_BIN(d_q, P(a_q | x))
        bins[w][b].add(q)
      end if
    end for
  end for
end function

# Initialize bins
bins ← {}
for w in W do
  bins[w] ← {}
  for b in ENUMERATE_BINS() do
    bins[w][b] ← ∅
    q' ← temp question with d, P(a) for bin b
    Δ_{X_{b,w}} ← H(A_{q'}) − H(A_{q'} | X_{q',w})
  end for
end for
LOAD(Q)

# Make assignments
x ← ∅
for t = 1 to T do
  S ← ∅
  for w in sorted(W, key = γ_w) do
    for b in sorted(bins[w], key = Δ_{X_{b,w}}) do
      if bins[w][b] ≠ ∅ then
        q ← bins[w][b].pop()
        S ← S ∪ {X_{q,w}}
        for w' in W \ {w} do
          bins[w'][b].remove(q)
        end for
        Break and move to next w
      end if
    end for
  end for
  Observe x_S and set x ← x ∪ x_S
  X_R ← X_R \ S
  Update prior as P(a | x)
  LOAD({q for x_{q,w} in x_S})
end for

JUGGLERBIN (Algorithm 3) is an approximate version of JUGGLERAD which makes use of binning in order to reduce the cost of making an assignment to O(|W|^2 + |W|C^2), where C is a user-specified parameter for the number of partitions. JUGGLERBIN uses C^2 bins representing the cross product of C question-difficulty partitions and C belief partitions (evenly spaced on the interval from 0 to 1).
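The bin index itself is cheap to compute; a minimal sketch of the TO_BIN step, assuming evenly spaced partitions, is:

def to_bin(difficulty, belief, C):
    """Map a question to one of C*C bins over (difficulty, current belief),
    both of which lie in [0, 1], using evenly spaced partitions."""
    i = min(int(difficulty * C), C - 1)
    j = min(int(belief * C), C - 1)
    return (i, j)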

Figure 2: Visualization of the IG-based utility of asking questions of (a) a low-skilled worker (γ = 0.71) and (b) a high-skilled worker (γ = 5.0), as a function of question difficulty and current belief. Darker color indicates higher expected utility. (Bin size corresponds to running JUGGLERBIN with C = 21; both axes run from 0 to 1.)

During initialization, the system sorts bins from highest to lowest utility for each worker (computed using the mean question difficulty and belief for the bin, assuming a uniform distribution of questions). For each worker, the system maintains a hash from each bin to a set of unanswered questions whose difficulty and current belief fall within the bin boundaries, and which are not currently being answered by another worker. The system makes an assignment by popping a question from the nonempty bin with highest utility. JUGGLERBIN approximates the performance of JUGGLERAD arbitrarily closely as the parameter C increases, and can scale independently of the total number of questions since it does not need to consider all questions in each round.

Note that the order in which JUGGLERBIN visits bins is not the same for all workers, and depends on a particular worker's skill. Figure 2 shows, for instance, that it may be more valuable to ask a high-skilled worker a difficult question we know little about (belief close to 0.5) than an easy question whose answer is quite certain (belief close to 0 or 1), but the same statement might not hold for a low-skilled worker.

Variable Worker Response Times

Up to this point, we have assumed that each worker completes his assignment in each round. In practice, however, workers take different amounts of time to respond to questions, and a task router must either force some workers to wait until others finish or make new assignments to subsets of workers as they finish. JUGGLERRT, described in Algorithm 4, performs the latter service by maintaining a record of outstanding assignments in order to route the most useful new tasks to available workers.

Experiments

We seek to answer the following questions: (1) How well does our adaptive approach perform in practice? We answer this question by asking oDesk workers to perform a challenging NLP task called named entity linking (NEL) (Min and Grishman 2012).

Algorithm 4 The JUGGLERRT algorithm
Input: Workers W, prior P(a) over answers, unobserved votes X_R, horizon T
x ← ∅
S_busy ← ∅
for t = 1 to T do
  S_t ← call JUGGLEROFF(T = 1, S = S_busy)
  S_new ← S_t \ S_busy and assign S_new
  S_done ← retrieve completed assignments
  S_busy ← (S_busy ∪ S_new) \ S_done
  Observe x_{S_done} and set x ← x ∪ x_{S_done}
  X_R ← X_R \ S_new
  Update prior as P(a | x)
end for

(2) How much benefit do we derive from an adaptive approach, compared to our offline allocation algorithm? We answer this question by looking at average performance over many experiments with simulated data. (3) How much does our adaptive approach benefit from variations in worker skill or problem difficulty? We answer this question through simulations using different distributions for skill and difficulty. (4) Finally, how well does our approach scale to many workers and tasks? We answer this question by first evaluating our strategy of limiting one worker per task per round, and then evaluating how speed gains from binning impact the quality of results.

Benchmarks

To evaluate the performance of our adaptive strategy, we compare its implementation in the information gain-based JUGGLERAD to a series of increasingly powerful alternatives, some of which are also variants of JUGGLER:

• Random (Ra). The random strategy simply assigns each worker a random question in each round.

• Round robin (RR). The round robin strategy orders the questions by the number of votes they have received so far, and in each round iteratively assigns each worker a random question with the fewest votes.

• Uncertainty-based (Unc). The uncertainty-based policy orders workers from the least to most skilled (for consistency with the other approaches; we did not observe a significant ordering effect) and in each round iteratively matches the least-skilled yet-unassigned worker to the unassigned question with the highest label uncertainty, measured by the entropy of the posterior distribution over the labels already received for that question. Note that this strategy differs from our implementation of the information gain-based JUGGLERAD in just two ways: (a) it never assigns two workers to the same question in the same round, and (b) it ignores question difficulty when performing the assignment, thereby computing a difficulty-oblivious version of information gain. Thus, the uncertainty-based strategy can be viewed as a simpler version of JUGGLERAD and is expected to perform similarly to JUGGLERAD if all questions have roughly the same difficulty.

• Accuracy gain-based (AG). This policy is identical to the information gain-based JUGGLERAD, but in every round greedily optimizes expected accuracy gain instead, i.e., seeks to find a single-round question-to-worker assignment S that maximizes

\mathbb{E}\left[ \sum_{q \in Q} \max_{a_q} \big( P(a_q \mid X_S),\; 1 - P(a_q \mid X_S) \big) \right].

Unlike information gain, accuracy gain is not submodular, so the worker allocation procedure that maximizes it greedily does not have a constant error bound even within a single allocation round. Nonetheless, intuitively it is a powerful heuristic that provides a reasonable alternative to information gain in the JUGGLER framework (a sketch of this objective appears after the list).
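The sketch below spells out the per-question expected accuracy term used by AG, mirroring the information gain sketch earlier (binary questions assumed; p_correct is an accuracy model such as Equation 1):

import itertools

def expected_accuracy(prior, assigned_workers, p_correct):
    """Expected post-vote accuracy for one binary question: the expectation,
    over joint vote outcomes x, of max(P(a | x), 1 - P(a | x))."""
    total = 0.0
    for votes in itertools.product([True, False], repeat=len(assigned_workers)):
        joint = {}
        for a, p_a in ((True, prior), (False, 1 - prior)):
            p = p_a
            for vote, (d, g) in zip(votes, assigned_workers):
                acc = p_correct(d, g)
                p *= acc if vote == a else 1 - acc
            joint[a] = p
        total += max(joint[True], joint[False])   # = P(x) * max_a P(a | x)
    return total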

In the next subsections, we denote the information gain-based JUGGLERAD as IG for ease of comparison with the other benchmarks. While the random and round robin strategies a priori look weaker than IG, the relative qualitative strength of the other two is not immediately clear and provides important insights into IG's practical operation.

In the experiments that follow, we initially force all benchmarks and IG itself to assign questions that have not yet been assigned, until each question has been asked once, before proceeding normally. This ensures that all predictions are founded on at least one observation and provides empirical benefits.

Comparison of Adaptive Strategies

In order to measure performance with real workers and tasks, we selected a binary named entity linking task (Lin, Mausam, and Weld 2012; Ling and Weld 2012). Since we wished to compare different policies on the same data, we controlled for variations in worker performance by coming up with a set of 198 questions and hiring 12 oDesk workers, each of whom completed the entire set. This ensured reproducibility. Our different policies could now request that any worker perform any problem as they would on a real citizen science platform. To avoid confounding factors, we randomized the order in which workers answered questions, as well as the order in which possible answers were shown on the screen.

In order to estimate worker skill and problem difficulty, we computed a maximum likelihood estimate of these quantities by running gradient descent using gold answers. Note that in general, these parameters can be estimated with little data by modeling worker communities and task features, as discussed previously.

Recall that in each round, each of our workers is assigned one of the 198 questions. After each round, for each policy we calculate the most likely answers to each question by running the same EM procedure on the worker responses. From the predictions, we can calculate the accuracy achieved by each policy after each round.

Figure 3 compares the performance of our adaptive policies, using the observed votes from the live experiment. Since simulating policy runs using all 12 workers would result in a deterministic procedure for IG and AG, we average the performance of many simulations that use 8 randomly chosen workers to produce a statistically significant result. One way to compare the performance of our policies is to compute the relative labor savings compared to the round robin policy. First, we compute a desired accuracy as a fraction of the maximum accuracy obtained by asking each worker to answer each question, and then compute the fraction of votes saved by running an adaptive policy compared to the round robin policy. Using this metric, all adaptive approaches achieve 95% of the total possible accuracy using fewer than 50% of the votes required by round robin to achieve that same accuracy.
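The labor-savings metric is straightforward to compute from the per-policy accuracy curves; the following is a small sketch under the assumption that each curve records accuracy after each additional vote:

def votes_to_reach(accuracy_curve, target):
    """Number of votes at which an accuracy curve first reaches the target."""
    for n, acc in enumerate(accuracy_curve, start=1):
        if acc >= target:
            return n
    return len(accuracy_curve)

def fraction_saved(adaptive_curve, round_robin_curve, max_accuracy, level=0.95):
    """Fraction of votes saved by an adaptive policy relative to round robin
    when both must reach level * max_accuracy (e.g., 95% of the accuracy
    obtained by asking every worker every question)."""
    target = level * max_accuracy
    return 1.0 - (votes_to_reach(adaptive_curve, target) /
                  votes_to_reach(round_robin_curve, target))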

In order to tease apart some of the relative differences between the adaptive policies, we also generated synthetic data for the 198 questions and 12 workers by randomly sampling worker quality and question difficulty using the parameters we estimated from the live experiment. Figure 4 shows the results of this experiment. IG significantly outperforms AG, which in turn significantly outperforms UNC, in terms of the fraction of votes saved compared to round robin in order to achieve 95% or 97% of the total possible accuracy (p < 0.0001 using two-tailed paired t-tests). These results demonstrate three important points:

• The largest savings are provided by JUGGLERAD (IG), followed by AG and UNC.

• All three adaptive algorithms significantly outperform the non-adaptive baselines.

• Although the information gain-based JUGGLERAD "wins" overall, the other modifications of JUGGLERAD (AG and UNC) perform very well too, despite providing no theoretical guarantees.

We also compared these policies in the context of JUGGLERRT, as shown in Figure 5. In order to simulate worker response times, we fit log-normal distributions to the observed response times and sampled from those distributions. Although the relative benefits are smaller, the ranking of policies remains the same as in the pure round-based setting, demonstrating the promise of our approach to generalize to the more realistic setting of variable worker response times.
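Simulating response times this way amounts to fitting a log-normal per worker and drawing from it; a tiny sketch (with made-up observed times and illustrative worker ids) is:

import numpy as np

rng = np.random.default_rng(0)
observed = {"w1": [4.2, 6.1, 5.0, 9.3], "w2": [2.0, 2.4, 3.1]}   # seconds, made-up
# Fit a log-normal to each worker's times and sample synthetic response times.
fits = {w: (np.mean(np.log(t)), np.std(np.log(t))) for w, t in observed.items()}
sampled = {w: rng.lognormal(mu, sigma, size=10) for w, (mu, sigma) in fits.items()}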

Figure 6 summarizes the relative savings of our policies in each of the three scenarios outlined above: fixed votes, simulated votes, and simulated votes with variable response times. Since there is a clear advantage to IG across scenarios, in the following sections we conduct additional simulations to further characterize its performance and answer our remaining empirical research questions.

Benefit of Adaptivity

In order to determine the benefit of adaptivity, we generated synthetic data for 50 questions and 12 workers by randomly sampling worker quality and question difficulty.

Figure 7 shows the relative labor savings of JUGGLERAD and JUGGLEROFF compared to the round robin policy. The desired accuracy is again computed as a fraction of the maximum accuracy obtained by asking each worker to answer each question.

Figure 3: In live experiments, JUGGLERAD (IG) reaches 95% of the maximum attainable accuracy with only 48% of the labor employed by the commonly used round robin policy (RR). (Plot of prediction accuracy versus number of votes, 0 to 2500, for the Ra, RR, Unc, AG, and IG policies.)

Figure 4: In experiments with simulated data using worker skills and question difficulties estimated from the live experiment, JUGGLERAD (IG) outperforms the accuracy gain (AG) and uncertainty-based (UNC) policies. (Plot of prediction accuracy versus number of votes for the Ra, RR, Unc, AG, and IG policies.)

Figure 5: In live experiments with variable worker response times, JUGGLERRT (IG) also outperforms the other policies, despite the need to make assignments in real time as workers complete tasks. (Plot of prediction accuracy versus number of votes for the Ra, RR, Unc, AG, and IG policies.)


Figure 6: Summary of the fraction of savings of IG, AG, and UNC compared to the round robin policy for reaching an accuracy of 95% of the total achievable accuracy (from asking each worker each question). (Bar chart of the fraction of votes saved, by policy, for three scenarios: real votes, simulated votes with real parameters, and variable response times.)

Figure 7: The "adaptivity gap." JUGGLERAD saves significantly more human labor than JUGGLEROFF by observing and responding to responses adaptively. (Bar chart of the fraction of votes saved compared to round robin at desired accuracies of 95% and 97%, for JUGGLEROFF and JUGGLERAD.)

Our simulation draws question difficulties uniformly and inverse worker skills 1/γ ∼ N(0.79, 0.29), a realistic distribution that fits estimates from the live experiment. JUGGLERAD saves significantly more labor than JUGGLEROFF, underscoring the value of adaptive routing compared to offline routing.
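For concreteness, this simulation setup can be generated as in the sketch below; the clipping of inverse skills to keep them positive is our assumption, since the normal fit places a small amount of mass below zero.

import numpy as np

rng = np.random.default_rng(1)
difficulties = rng.uniform(0.0, 1.0, size=50)   # 50 questions, uniform difficulty
inv_skill = rng.normal(0.79, 0.29, size=12)     # 12 workers, 1/gamma ~ N(0.79, 0.29)
skills = 1.0 / np.clip(inv_skill, 1e-3, None)   # clip to keep gamma positive (assumption)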

Varying Worker Skills and Task Difficulties

We now investigate how various parameter distributions affect adaptive performance. Again, we generated synthetic data for 50 questions and 12 workers. Figure 8 shows that the relative savings of JUGGLERAD over round robin increase as the pool of workers becomes more diverse. We control the experiment by again sampling difficulties uniformly and using our earlier worker skill distribution, but varying the standard deviation from σ = 0 to σ = 0.2. Like before, accuracies are computed as a fraction of the total achievable accuracy given a set of workers. Our algorithms perform better for larger values of σ because they are able to exploit differences between workers to make the best assignments.

Figure 8: The benefit provided by JUGGLERAD increases as the pool of workers becomes more diverse. (Plot of the fraction of votes saved compared to round robin versus the spread σ of the worker skill distribution, at desired accuracies of 95% and 97%.)

Increasing the diversity of question difficulties while keeping the skill distribution fixed produces similar savings. For example, sampling difficulties from a Beta distribution with a single peak produces relative savings over round robin that are significantly less than sampling uniformly or from a bimodal distribution.

Scaling to Many Tasks and Workers

In order to empirically validate the speed and allocation effectiveness of our proposed scaling strategies, we experimented with data simulating a larger number of workers and questions (100 and 2000, respectively) using the same question difficulty and worker skill settings as in the earlier adaptivity experiment. Without imposing a limit on the number of simultaneously operating workers per task, it is infeasible to run JUGGLERAD on problems of this size. We first checked to see if there was a drop in question-answering performance caused by limiting the number of workers assigned to a given task in each round. We found no statistically significant difference between limiting the number of workers per task to 1, 2, or 3 (at the 0.05 significance level, using one-tailed paired t-tests to measure the fraction of votes saved to reach various thresholds). This result matches our experience running smaller experiments.

Restricting the number of workers per task to a small number reduces JUGGLERAD's scheduling time to 10–15 seconds per round when scheduling 100 workers (all reported runtimes use an unoptimized implementation on a single machine with two 2.66 GHz processors and 32 GB of RAM), but this is still significantly longer than the fraction-of-a-second scheduling times for our earlier experiments with 12 workers and 198 questions. JUGGLERBIN, by contrast, is able to schedule workers in fewer than 3 seconds for any number of bins we tried (20^2, 40^2, 80^2, and 160^2). Indeed, JUGGLERBIN makes assignments using O(|W|^2) computations, which is drastically better than O(|W||Q|) for the unbinned version when there are many questions. Although our runtime analysis for JUGGLERBIN includes an additional term on the order of |W|C^2, in practice this cost is much smaller since we do not need to traverse all bins in order to find a valuable question to assign to a worker.


Figure 9: Increasing the parameter C governing the number of bins enables JUGGLERBIN to closely approximate JUGGLEROFF. (Plot of prediction accuracy versus number of votes, 0 to 50000, for unbinned IG and for JUGGLERBIN with C = 160, 80, 40, and 20.)

Finally, the performance of JUGGLERBIN approaches that of the unbinned algorithms as we increase the value of the parameter C, since smaller bins more closely approximate the true utility values. Figure 9 shows the results of this experiment. JUGGLERBIN asymptotes at a slightly lower accuracy than the unbinned approach, suggesting that more sophisticated approaches like adaptive binning may produce further gains.

Related Work

Previous work has considered task routing in a number of restricted settings. Some approaches (Karger, Oh, and Shah 2011a; 2011b; 2013) assume worker error is independent of problem difficulty and hence gain no benefit from adaptive allocation of tasks.

Most adaptive approaches, on the other hand, assign problems to one worker at a time, not recognizing the cost of leaving a worker idle (Donmez, Carbonell, and Schneider 2009; Yan et al. 2011; Wauthier and Jordan 2011; Chen, Lin, and Zhou 2013). Other adaptive approaches assume that a requester can immediately evaluate the quality of a worker's response (Ho and Vaughan 2012; Tran-Thanh et al. 2012), or wait for a worker to complete all her tasks before moving on to the next worker (Ho, Jabbari, and Vaughan 2013). Some, unlike ours, also assume that any given question can be assigned to just one worker (Tran-Thanh et al. 2012). Moreover, the adaptive approach that best models real workers optimizes annotation accuracy by simply seeking to discover the best workers and use them exclusively (Chen, Lin, and Zhou 2013).

The Generalized Task Markets (GTM) framework (Shahaf and Horvitz 2010) has much in common with our problem; they seek to find a coalition of workers whose multi-attribute skills meet the requirements of a job. However, the GTM approach assumes a binary utility model (tasks are completed or not); it does not reason about answer accuracy. Kamar et al. (2012) also focus on citizen science, but do not perform task routing.

Conclusions and Future Work

In citizen science and other types of volunteer crowdsourcing, it is important to send hard tasks to experts while still utilizing novices as they become proficient. Since workers are impatient, a task router should allocate questions to all available workers in parallel in order to keep workers engaged. Unfortunately, choosing the optimal set of tasks for workers is challenging, even if worker skill and problem difficulty are known.

This paper introduces the JOCR framework, characterizing the space of task routing problems based on adaptivity, concurrency, and the amount of information known. We prove that even offline task routing is NP-hard, but submodularity of the objective function (maximizing expected value of information) enables reasonable approximations. We present JUGGLEROFF and JUGGLERAD, two systems that perform parallel task allocation for the offline and adaptive settings respectively. Using live workers hired on oDesk, we show that our best routing algorithm uses just 48% of the labor required by the commonly used round-robin policy on a natural language named-entity linking task. Further, we present JUGGLERBIN, a system that can scale to many workers and questions, and JUGGLERRT, a system that handles variable worker response times and yields similar empirical savings.

Much remains to be done. While we have been motivated by volunteer labor from a citizen science scenario, we believe our methods extend naturally to paid crowdsourcing where the utility function includes a linear cost component. In the future we hope to relax the assumptions of perfect information about workers and tasks, using techniques from multi-armed bandits to learn these parameters during the routing process. We also wish to study the case where several different requesters are using the same platform and the routing algorithm needs to balance workers across problem types.

Acknowledgements

We thank Christopher Lin and Adish Singla for helpful discussions, as well as the anonymous reviewers for insightful comments. This work was supported by the WRF / TJ Cable Professorship, Office of Naval Research grant N00014-12-1-0211, and National Science Foundation grants IIS-1016713, IIS-1016465, and IIS-1420667.


References

Badanidiyuru, A., and Vondrak, J. 2014. Fast algorithms for maximizing submodular functions. In Proceedings of SODA '14.
Chen, X.; Lin, Q.; and Zhou, D. 2013. Optimistic knowledge gradient policy for optimal budget allocation in crowdsourcing. In Proceedings of the 30th International Conference on Machine Learning (ICML '13).
Dai, P.; Mausam; and Weld, D. S. 2010. Decision-theoretic control of crowd-sourced workflows. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI '10).
Donmez, P.; Carbonell, J. G.; and Schneider, J. 2009. Efficiently learning the accuracy of labeling sources for selective sampling. In KDD.
Filmus, Y., and Ward, J. 2012. A tight combinatorial algorithm for submodular maximization subject to a matroid constraint. In 53rd Annual IEEE Symposium on Foundations of Computer Science (FOCS '12).
Golovin, D., and Krause, A. 2011. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. Journal of Artificial Intelligence Research 42:427–486.
Ho, C.-J., and Vaughan, J. W. 2012. Online task assignment in crowdsourcing markets. In AAAI.
Ho, C.-J.; Jabbari, S.; and Vaughan, J. W. 2013. Adaptive task assignment for crowdsourced classification. In Proceedings of the 30th International Conference on Machine Learning (ICML '13), 534–542.
Kamar, E.; Hacker, S.; and Horvitz, E. 2012. Combining human and machine intelligence in large-scale crowdsourcing. In Proceedings of the Eleventh International Conference on Autonomous Agents and Multiagent Systems (AAMAS '12).
Karger, D. R.; Oh, S.; and Shah, D. 2011a. Budget-optimal crowdsourcing using low-rank matrix approximations. In Conference on Communication, Control, and Computing.
Karger, D. R.; Oh, S.; and Shah, D. 2011b. Iterative learning for reliable crowd-sourcing systems. In NIPS.
Karger, D. R.; Oh, S.; and Shah, D. 2013. Efficient crowdsourcing for multi-class labeling. In Proceedings of the ACM SIGMETRICS/International Conference on Measurement and Modeling of Computer Systems, 81–92.
Kolobov, A.; Mausam; and Weld, D. S. 2013. Joint crowdsourcing of multiple tasks. In Proceedings of the First AAAI Conference on Human Computation and Crowdsourcing (HCOMP '13).
Krause, A., and Golovin, D. 2014. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems. Cambridge University Press. To appear.
Krause, A., and Guestrin, C. 2005. Near-optimal nonmyopic value of information in graphical models. In Conference on Uncertainty in Artificial Intelligence (UAI).
Law, E., and von Ahn, L. 2011. Human Computation. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers.
Lin, C.; Mausam; and Weld, D. S. 2012. Dynamically switching between synergistic workflows for crowdsourcing. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI '12).
Ling, X., and Weld, D. S. 2012. Fine-grained entity resolution. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI '12).
Lintott, C.; Schawinski, K.; Slosar, A.; Land, K.; Bamford, S.; Thomas, D.; Raddick, M. J.; Nichol, R. C.; Szalay, A.; Andreescu, D.; Murray, P.; and VandenBerg, J. 2008. Galaxy Zoo: Morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey. MNRAS 389(3):1179–1189.
Mao, A.; Kamar, E.; Chen, Y.; Horvitz, E.; Schwamb, M. E.; Lintott, C. J.; and Smith, A. M. 2013. Volunteering versus work for pay: Incentives and tradeoffs in crowdsourcing. In Proceedings of the First AAAI Conference on Human Computation and Crowdsourcing (HCOMP '13).
Min, B., and Grishman, R. 2012. Challenges in the knowledge base population slot filling task. In LREC, 1137–1142.
Nemhauser, G. L.; Wolsey, L. A.; and Fisher, M. L. 1978. An analysis of approximations for maximizing submodular set functions II. Mathematical Programming Study 8:73–87.
Raghunathan, K.; Lee, H.; Rangarajan, S.; Chambers, N.; Surdeanu, M.; Jurafsky, D.; and Manning, C. 2010. A multi-pass sieve for coreference resolution. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP '10), 492–501. Stroudsburg, PA, USA: Association for Computational Linguistics.
Shahaf, D., and Horvitz, E. 2010. Generalized task markets for human and machine computation. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI '10).
Stoyanov, V.; Gilbert, N.; Cardie, C.; and Riloff, E. 2009. Conundrums in noun phrase coreference resolution: Making sense of the state-of-the-art. In Su, K.-Y.; Su, J.; and Wiebe, J., eds., ACL/IJCNLP, 656–664. The Association for Computational Linguistics.
Tran-Thanh, L.; Stein, S.; Rogers, A.; and Jennings, N. R. 2012. Efficient crowdsourcing of unknown experts using multi-armed bandits. In ECAI, 768–773.
Venanzi, M.; Guiver, J.; Kazai, G.; Kohli, P.; and Shokouhi, M. 2014. Community-based Bayesian aggregation models for crowdsourcing. In Proceedings of the World Wide Web Conference.
Vondrak, J. 2008. Optimal approximation for the submodular welfare problem in the value oracle model. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC '08).
Wauthier, F., and Jordan, M. 2011. Bayesian bias mitigation for crowdsourcing. In Advances in Neural Information Processing Systems 24 (NIPS '11), 1800–1808.
Welinder, P.; Branson, S.; Belongie, S.; and Perona, P. 2010. The multidimensional wisdom of crowds. In Advances in Neural Information Processing Systems 23 (NIPS '10), 2424–2432.
Whitehill, J.; Ruvolo, P.; Wu, T.; Bergsma, J.; and Movellan, J. 2009. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. In Advances in Neural Information Processing Systems 22 (NIPS '09), 2035–2043.
Yan, Y.; Rosales, R.; Fung, G.; and Dy, J. G. 2011. Active learning from crowds. In ICML.