Estimating the Completion Time of Crowdsourced Tasks using Survival Analysis. Jing Wang, New York University; Siamak Faridani, University of California, Berkeley; Panos Ipeirotis, New York University

Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models


DESCRIPTION

Paper presentation at the Crowdsourcing for Search and Data Mining workshop (http://ir.ischool.utexas.edu/csdm2011/proceedings.html), organized with the WSDM 2011 conference.

Abstract: In order to seamlessly integrate a human computation component (e.g., Amazon Mechanical Turk) within a larger production system, we need to have some basic understanding of how long it takes to complete a task posted for completion in a crowdsourcing platform. We present an analysis of the completion time of tasks posted on Amazon Mechanical Turk, based on a dataset containing 165,368 HIT groups, with a total of 6,701,406 HITs, from 9,436 requesters, posted over a period of 15 months. We model the completion time as a stochastic process and build a statistical method for predicting the expected time for task completion. We use a survival analysis model based on Cox proportional hazards regression. We present the preliminary results of our work, showing how time-independent variables of posted tasks (e.g., type of the task, price of the HIT, day posted, etc.) affect completion time. We consider this a first step towards building a comprehensive optimization module that provides recommendations for pricing and posting time, in order to satisfy the constraints of the requester.

Citation preview

Page 1: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Estimating the Completion Time of Crowdsourced Tasks using Survival Analysis

Jing Wang, New York University

Siamak Faridani, University of California, Berkeley

Panos Ipeirotis, New York University

Page 2: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Crowdsourcing: Pricing and Time to completion?

Many firms use crowdsourcing for a variety of tasks

Still unclear how to price

Prior results indicate that price does not affect quality (Mason and Watts, 2009)

…but it does affect completion time

Unclear how long it will take for a task to finish

Page 3: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Data Set: Mechanical Turk Tracker (http://www.mturk-tracker.com)

Crawled Amazon Mechanical Turk hourly (now every minute)
Captured full market state (content, position, and characteristics of all available HITs)

15 months of data (now >24 months)
165,368 HIT groups
6,701,406 HIT assignments from 9,436 requesters
Value of the HITs: $529,259 [guesstimate: ~10% of actual value]

Missing very short tasks (posted and disappeared in <1hr)

Do not observe HIT redundancy

Page 4: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Completion Times: Power‐laws


HIT completion time: Time_last_seen – Time_first_posted

Page 5: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Completion Times: Power‐laws and Censoring

Censoring Effects

Jumps/Outliers: Expiration

Different slope: Requesters taking down HITs

HIT completion time: Time_last_seen – Time_first_posted
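As a minimal sketch of how the definition above could be turned into survival-analysis records, the function below computes the duration and a censoring flag for one HIT group. The function name, its parameters, the `expires_at` field, and the example dates are all hypothetical; the slides do not show how the authors encoded censoring. Requester take-downs cannot be detected from timestamps alone, so they are not handled here.

```python
from datetime import datetime


def completion_record(first_posted, last_seen, crawl_end, expires_at=None):
    """Return (duration_hours, event_observed) for one HIT group.

    event_observed is False (right-censored) when the group was still
    listed at the final crawl, or when it disappeared at its expiration
    time rather than because workers finished it (an assumed rule).
    """
    duration_hours = (last_seen - first_posted).total_seconds() / 3600.0
    still_listed = last_seen >= crawl_end
    expired = expires_at is not None and last_seen >= expires_at
    return duration_hours, not (still_listed or expired)


# Hypothetical timestamps, for illustration only.
crawl_end = datetime(2010, 4, 1)
done = completion_record(datetime(2010, 3, 1, 9), datetime(2010, 3, 2, 9), crawl_end)
still_open = completion_record(datetime(2010, 3, 31), crawl_end, crawl_end)
print(done, still_open)
```

The second record has the same measured duration as the first, but it only tells us the task lasted at least that long, which is why the flag matters for the estimates that follow.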

Page 6: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Parameter estimation

Maximum Likelihood Estimation, controlling for censored data
Power-law parameter α ~ 1.5
Power-laws with α < 2 do not have a well-defined mean value
Sample average increases as sample size increases
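The estimation step can be sketched as follows, assuming a continuous power law with density proportional to t^(-α) for t ≥ t_min and right-censored observations. Under those assumptions the maximum likelihood estimate has the closed form α̂ = 1 + d / Σ ln(t_i / t_min), where d counts the observed completions and the sum runs over all records, censored or not. The function name, the synthetic data, and the cutoff are illustrative assumptions, not the authors' code.

```python
import math
import random


def censored_power_law_alpha(durations, observed, t_min=1.0):
    """MLE of alpha for a density proportional to t**(-alpha), t >= t_min.

    Censored records contribute their survival probability to the
    likelihood, so they add to the log-sum but not to the event count.
    """
    n_events = sum(1 for seen in observed if seen)
    log_sum = sum(math.log(t / t_min) for t in durations)
    return 1.0 + n_events / log_sum


# Synthetic data with alpha = 1.5, i.e. tail exponent alpha - 1 = 0.5.
rng = random.Random(0)
cutoff = 1000.0  # hypothetical censoring time
true_times = [rng.paretovariate(0.5) for _ in range(20000)]
durations = [min(t, cutoff) for t in true_times]
observed = [t <= cutoff for t in true_times]

alpha_hat = censored_power_law_alpha(durations, observed)

# Dropping the censored records instead of modelling them biases alpha upward.
completed = [t for t, seen in zip(durations, observed) if seen]
naive_alpha = censored_power_law_alpha(completed, [True] * len(completed))
print(alpha_hat, naive_alpha)
```

Discarding censored tasks makes the tail look lighter than it is, which is one reason to control for censoring as the slide describes.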

Page 7: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Why Power‐laws?

Queuing theory model (Cobham, 1954): If workers pick tasks from two priority queues, completion time follows a power-law with α = 1.5

Chilton et al., HCOMP 2010: workers rank tasks either by “most recently posted” or by “most HITs available”

Result: Inherent unpredictability of completion time
Real solution: Amazon should change the interface

But let’s see how other factors affect completion time

Page 8: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Survival Analysis

Examine and model the time it takes for events to occur
In our case: Event = HIT gets completed

Survival function S(t): Probability that a task will last longer than t

Used stratified Cox Proportional Hazards Model
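To make the survival function S(t) concrete, here is a minimal sketch of the Kaplan-Meier estimator, a standard nonparametric way to estimate S(t) from right-censored durations. The slides do not say the authors used it; it is shown only to illustrate what S(t) measures. The function name and the toy data are assumptions.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct time where a completion occurred."""
    data = sorted(zip(durations, observed))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group all records that share the same time t.
        while i < len(data) and data[i][0] == t:
            events += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if events:
            survival *= 1.0 - events / n_at_risk
            curve.append((t, survival))
        # Completed and censored records both leave the risk set.
        n_at_risk -= leaving
    return curve


# Toy data: five tasks, two of them censored (observed=False).
curve = kaplan_meier([1, 2, 2, 3, 5], [True, True, False, True, False])
for t, s in curve:
    print(t, round(s, 3))
```

Censored tasks never lower the curve themselves; they only shrink the number of tasks still at risk at later times.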

Page 9: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Covariates Examined

HIT Characteristics
  Monetary reward
  Number of HITs
  Length in characters
  HIT topic (based on Latent Dirichlet Allocation analysis)

Market Characteristics
  Day of the week (when HIT was first posted)
  Time of the day (when HIT was first posted)

Requester Characteristics
  Activities of requester until time of submission
  Existing lifetime of requester

Page 10: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Effect of Price: Mostly monotonic

h(t) = 1.035^price
40% speedup for 10x price

Half-life for $0.025 reward ~ 2 days
Half-life for $1 reward ~ 12 hours
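For reference, the standard form of the Cox proportional hazards model is given below. Under it, the slide's shorthand h(t) = 1.035^price can plausibly be read as a hazard ratio of about 1.035 per unit of the price covariate. The unit and scaling of that covariate are not stated in the slides, so this reading is an assumption. In the stratified variant, each stratum s has its own baseline hazard while the coefficients β are shared.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right),
\qquad
\frac{h(t \mid x_k + 1)}{h(t \mid x_k)} = e^{\beta_k},
\qquad
h_s(t \mid x) = h_{0s}(t)\,\exp\!\left(\beta^{\top} x\right)
```

A hazard ratio above 1 means the completion event arrives sooner on average, which matches the slide's description of higher prices as a speedup.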

Page 11: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Covariates Examined

HIT Characteristics
  Monetary reward
  Number of HITs
  Length in characters
  HIT topic (based on Latent Dirichlet Allocation analysis)

Market Characteristics
  Day of the week (when HIT was first posted)
  Time of the day (when HIT was first posted)

Requester Characteristics
  Activities of requester until time of submission
  Existing lifetime of requester

Page 12: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Effect of #HITs: Monotonic, but sublinear

h(t) = 0.998^#HITs

10 HITs: 2% slower than 1 HIT
100 HITs: 19% slower than 1 HIT
1000 HITs: 87% slower than 1 HIT
or, 1 group of 1000: 7 times faster than 1000 sequential groups of 1
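The percentages on this slide can be approximately reproduced by reading 0.998 as a per-HIT hazard ratio and raising it to the number of HITs in the group. This is a sketch under that reading; the helper name is hypothetical. With the rounded base 0.998 the reductions come out at roughly 2%, 18%, and 86%, close to the slide's 2%, 19%, and 87%; the small gaps presumably come from rounding of the fitted coefficient.

```python
def hazard_multiplier(per_unit_ratio, units):
    """Multiplicative change in the hazard after `units` one-unit increases."""
    return per_unit_ratio ** units


for n_hits in (10, 100, 1000):
    m = hazard_multiplier(0.998, n_hits)
    print(f"{n_hits} HITs: hazard x{m:.3f}, about {100 * (1 - m):.0f}% lower")
```

Because the effect compounds multiplicatively, each additional HIT slows the group by a slightly smaller absolute amount than the one before it.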

Page 13: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Covariates Examined

HIT Characteristics
  Monetary reward
  Number of HITs
  Length in characters (increases lifetime)
  HIT topic (based on Latent Dirichlet Allocation analysis)

Market Characteristics
  Day of the week (when HIT was first posted)
  Time of the day (when HIT was first posted)

Requester Characteristics
  Activities of requester until time of submission
  Existing lifetime of requester

Page 14: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

HIT Topics

topic 1: cw castingwords podcast transcribe english mp3 edit confirm snippet grade
topic 2: data collection search image entry listings website review survey opinion
topic 3: categorization product video page smartsheet web comment website opinion
topic 4: easy quick survey money research fast simple form answers link
topic 5: question answer nanonano dinkle article write writing review blog articles
topic 6: writing answer article question opinion short advice editing rewriting paul
topic 7: transcribe transcription improve retranscribe edit answerly voicemail query question answer

Page 15: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Effect of Topic: The CastingWords Effect

topic 1: cw castingwords podcast transcribe english mp3 edit confirm snippet grade
topic 2: data collection search image entry listings website review survey opinion
topic 3: categorization product video page smartsheet web comment website opinion
topic 4: easy quick survey money research fast simple form answers link
topic 5: question answer nanonano dinkle article write writing review blog articles
topic 6: writing answer article question opinion short advice editing rewriting paul
topic 7: transcribe transcription improve retranscribe edit answerly voicemail query question answer

Page 16: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Effect of Topic: Surveys=fast (even with redundancy!)

topic 1: cw castingwords podcast transcribe english mp3 edit confirm snippet grade
topic 2: data collection search image entry listings website review survey opinion
topic 3: categorization product video page smartsheet web comment website opinion
topic 4: easy quick survey money research fast simple form answers link
topic 5: question answer nanonano dinkle article write writing review blog articles
topic 6: writing answer article question opinion short advice editing rewriting paul
topic 7: transcribe transcription improve retranscribe edit answerly voicemail query question answer

Page 17: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Effect of Topic: Writing takes time

topic 1: cw castingwords podcast transcribe english mp3 edit confirm snippet grade
topic 2: data collection search image entry listings website review survey opinion
topic 3: categorization product video page smartsheet web comment website opinion
topic 4: easy quick survey money research fast simple form answers link
topic 5: question answer nanonano dinkle article write writing review blog articles
topic 6: writing answer article question opinion short advice editing rewriting paul
topic 7: transcribe transcription improve retranscribe edit answerly voicemail query question answer

Page 18: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Covariates Examined

HIT Characteristics
  Monetary reward
  Number of HITs
  Length in characters (increases lifetime)
  HIT topic (based on Latent Dirichlet Allocation analysis)

Market Characteristics: Not affecting
  Day of the week (when HIT was first posted)
  Time of the day (when HIT was first posted)

Requester Characteristics
  Activities of requester until time of submission
  Existing lifetime of requester (1yr ~ 50% speedup)

Page 19: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Covariates Examined

HIT Characteristics
  Monetary reward
  Number of HITs
  Length in characters (increases lifetime)
  HIT topic (based on Latent Dirichlet Allocation analysis)

Market Characteristics: Not affecting
  Day of the week (when HIT was first posted)
  Time of the day (when HIT was first posted)
  Why? We look at long-running HITs until completion…

Requester Characteristics
  Activities of requester until time of submission
  Existing lifetime of requester

Page 20: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Covariates Examined

HIT Characteristics
  Monetary reward
  Number of HITs
  Length in characters (increases lifetime)
  HIT topic (based on Latent Dirichlet Allocation analysis)

Market Characteristics: Not affecting
  Day of the week (when HIT was first posted)
  Time of the day (when HIT was first posted)

Requester Characteristics
  Activities of requester until time of submission
  Existing lifetime of requester (1yr ~ 50% speedup)

Page 21: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Conclusions

Completion times for tasks in Amazon Mechanical Turk follow a heavy tail distribution. (Paper studying MicroTasks.com has similar conclusions.)

Sample averages cannot be used to predict the expected completion time of a task.

By fitting a Cox proportional hazards regression model to the data collected from AMT, we showed the effect of various HIT parameters on the completion time of the task.

“Base survival function” is still a power-law, so completion time is still difficult to predict.

Page 22: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Lessons Learned and Future Work

Current survival analysis too naive:
  Ignores many interactions across variables
  Need time-dependent covariates (market changes over time)

More frequent crawling does not change the results

Important: Analysis ignores “refilling” of HITs

TODO:
  Better to model directly the HIT assignment disappearance rate (how many #HITs done per minute)
  Use queuing model theories
  Use hierarchical version of LDA and dynamic models (#topics and shifts in topics over time)

Page 23: Estimating Completion Time for Crowdsourced Tasks Using Survival Analysis Models

Any Questions?