Research Article

Labelling Training Samples Using Crowdsourcing Annotation for Recommendation

Qingren Wang,1 Min Zhang,1 Tao Tao,2 and Victor S. Sheng3

1 Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei 230601, China
2 School of Computer Science and Technology, Anhui University of Technology, Ma'anshan 243002, China
3 Department of Computer Science, Texas Tech University, Lubbock 79409, USA

Correspondence should be addressed to Tao Tao; [email protected]

Received 15 January 2020; Revised 5 March 2020; Accepted 10 March 2020; Published 5 May 2020

Guest Editor: Xuyun Zhang

Hindawi Complexity, Volume 2020, Article ID 1670483, 10 pages. https://doi.org/10.1155/2020/1670483

Copyright © 2020 Qingren Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Supervised learning-based recommendation models, whose infrastructures are sufficient training samples with high quality, have been widely applied in many domains. In the era of big data, with its explosive growth of data volume, training samples should be labelled in a timely and accurate manner to guarantee the excellent recommendation performance of supervised learning-based models. Machine annotation cannot label training samples with high quality because of limited machine intelligence. Although expert annotation can achieve a high accuracy, it requires a long time as well as more resources. As a new way for human intelligence to participate in machine computing, crowdsourcing annotation makes up for the shortages of machine annotation and expert annotation. Therefore, in this paper, we utilize crowdsourcing annotation to label training samples. First, a suitable crowdsourcing mechanism is designed to create crowdsourcing annotation-based tasks for training sample labelling, and then two entropy-based ground truth inference algorithms (i.e., HILED and HILI) are proposed to achieve quality improvement of noise labels provided by the crowd. In addition, the descending and random order manners in crowdsourcing annotation-based tasks are also explored. The experimental results demonstrate that crowdsourcing annotation significantly improves the performance of machine annotation. Among the ground truth inference algorithms, both HILED and HILI improve the performance of the baselines; meanwhile, HILED performs better than HILI.

1. Introduction

Recommendation systems have increasingly attracted attention, since they can significantly alleviate the problem of information overload on the Internet and help people find items of interest or make better decisions in their daily lives. Among recommendation models, the supervised learning-based ones have been widely applied in many domains, such as cloud/edge computing [1], complex systems [2, 3], and Quality of Service (QoS) prediction [4, 5]. There is no doubt that sufficient training samples with high quality guarantee the excellent recommendation performance of supervised learning-based recommendation systems. Thus, it is necessary to study how to label sufficient training samples in a timely and accurate manner in the era of big data, with its explosive growth of data volume. Although machine annotation can label enough training samples in time, the results do not meet the requirement of high quality because of limited machine intelligence. So, it is natural to think of utilizing the intelligence of human beings.

Indeed, expert annotation (i.e., hiring domain experts to label training samples) can achieve a high accuracy. However, it requires a long time as well as more resources. Research studies [6, 7] demonstrated that crowdsourcing brings machine learning (and its related research fields) great opportunities, because crowdsourcing can easily access the crowd via public or personal platforms [8, 9], such as MTurk [10], and efficiently deal with intelligent and computer-hard tasks by employing thousands of workers at a relatively low price. Therefore, as a new way for human

intelligence to participate in machine computing, crowdsourcing annotation makes up for the shortages of machine annotation and expert annotation. Crowdsourcing annotation has five steps:

(a) The requesters select a public or personal crowdsourcing platform and design crowdsourcing annotation tasks, including price setting, time constraints, and the required number of responses for each annotation task.

(b) The requesters publish the crowdsourcing annotation tasks on the selected crowdsourcing platform.

(c) The crowd logged in to the platform (also known as workers) select tasks that are suitable for them and complete the tasks (i.e., provide labels). Note that in this step the requesters do not know any information (such as expertise and credit standing) about the workers completing the annotation tasks.

(d) The requesters download the labels provided by the workers, together with a little additional information about the workers (i.e., completion times and the number of accepted tasks), from the crowdsourcing platform.

(e) The requesters utilize existing ground truth inference algorithms, or propose novel one(s), to infer the truth value(s) from all labels provided by the workers.

In this paper, we focus on labelling training samples for keyphrase extraction by utilizing crowdsourcing annotation, since extracting keyphrases from a text (especially a short text) is a complex process that requires abundant auxiliary information, such as the background of the entities discussed and the events involved. Machine annotation and expert annotation cannot effectively handle keyphrase extraction because of their shortages. For convenience, our entire approach is denoted as Crowdsourced Keyphrase Extraction (CKE) hereafter; meanwhile, a single task of crowdsourcing annotation generated by CKE is named an L-HIT.
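The five steps above can be sketched as a minimal task lifecycle. This is only an illustration, not the paper's implementation; every class and function name here is hypothetical, and the truth inference in step (e) is a trivial majority-vote stand-in for the algorithms introduced later.

```python
from dataclasses import dataclass, field

@dataclass
class AnnotationTask:
    """One crowdsourcing annotation task, as designed in step (a)."""
    content: str                  # the training sample to be labelled
    price_cents: int              # payment per response
    time_limit_s: int             # time constraint for a single worker
    required_responses: int       # how many workers must answer
    labels: list = field(default_factory=list)  # filled by workers in step (c)

def publish(tasks):               # step (b): push tasks to the platform
    return {i: t for i, t in enumerate(tasks)}

def collect(task, worker_label):  # steps (c)/(d): a worker's label arrives
    task.labels.append(worker_label)

def infer_truth(task):            # step (e): majority vote as a placeholder
    return max(set(task.labels), key=task.labels.count)

task = AnnotationTask("a short text ...", price_cents=5,
                      time_limit_s=300, required_responses=3)
for label in ["keyphrase A", "keyphrase A", "keyphrase B"]:
    collect(task, label)
truth = infer_truth(task)
```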

Extracting keyphrases from training samples in CKE includes labelling and ranking operations, and each single L-HIT contains three task types [9, 11]: multiple-choice, fill-in-blank, and rating. The first two are used to collect proper keyphrases for a training sample, and the last one is used for importance ranking assignment of the proper keyphrases collected. This is different from binary labelling and most multiclass labelling tasks, which usually have one single type. Besides, there are three important problems (i.e., quality control, cost control, and latency control) that are also required to be balanced in CKE [9]. Quality control focuses on labelling and ranking high-quality keyphrases; cost control aims to reduce the costs in terms of labour and money while keeping high-quality ground truth; and latency control studies how to cut down the cycle of a single task [11]. We utilize four ways to handle the trade-off among the three problems stated above in CKE.

In this paper, a pruning-based technique [9] is first adopted to prune the candidates provided by a machine-based algorithm; meanwhile, a complementary option is added to supplement the proper keyphrases that are lost for various reasons. The pruning-based technique and the complementary option can efficiently reduce labour cost and time cost. Then, for each single L-HIT, a time constraint is set, since time constraints can significantly reduce the latency of a single worker [11]. Thirdly, each individual worker is asked to select an importance ranking for each keyphrase labelled by himself, instead of sorting them. Finally, in order to overcome the possible low quality of some workers for keyphrase labelling and ranking, the designed crowdsourcing mechanism allows multiple workers [6] to complete a single L-HIT. The main contributions of this paper are summarized as follows:

(1) A suitable crowdsourcing mechanism is designed to create crowdsourcing annotation-based tasks for training sample labelling. In addition, four optimization methods (i.e., a pruning-based technique, a complementary option, a time constraint, and repeated labelling) are used to balance the quality, cost, and latency controls in CKE.

(2) Two entropy-based inference algorithms (i.e., HILED and HILI) are proposed to infer the ground truth based on labels collected by crowdsourcing annotation. In addition, two different order manners in L-HITs, namely the descending one and the random one, are also explored.

(3) We conduct multiple experiments on MTurk to verify the performance improvement of crowdsourcing annotation. The experimental results demonstrate that crowdsourcing annotation performs well. Among the inference algorithms, both HILED and HILI improved the performance of the baselines.

The remainder of the paper is organized as follows: Section 2 introduces the details of CKE; Section 3 reports the experimental results; the related work is discussed in Section 4; and we reach a conclusion in Section 5.

2. Crowdsourced Keyphrase Extraction

In this section, we first introduce the composition of a single L-HIT, and then we present the two proposed inference algorithms.

2.1. A Single L-HIT. Our multiple experiments are conducted on MTurk, which is a well-known crowdsourcing marketplace supporting crowdsourced execution of Human Intelligence Tasks (HITs) [12]. Since the structure of a single task published in our experiments is essentially inherited from a single HIT supported by MTurk, the ones published by us are called Labelling Human Intelligence Tasks (L-HITs). A single L-HIT, which corresponds to a single training sample, consists of five parts: guidance, content, candidate option, candidate supplement, and submission. As shown in Figure 1, the part of guidance (surrounded by a blue rectangle) helps workers complete the current task conveniently and efficiently. The part of content (surrounded by a black rectangle) shows workers the content of a single training sample. The part of submission (surrounded by a blue ellipse) is utilized to submit the completed L-HIT. These three parts are the basic elements of the current task.


The part of candidate option (surrounded by a red rectangle) shows workers the candidates. The candidates are keyphrases labelled by machine annotation. Note that this part holds 15 options at most. If a training sample has more than 15 keyphrases labelled by machine annotation, this part only shows the top 15 ones with the highest scores. In addition, for each candidate there is an independent drop-down box (providing importance rankings) above it. The importance ranking denotes how important the option is to the current training sample. It varies from -2 to 2, where 2 denotes importance at the highest level and -2 denotes importance at the lowest level. The part of candidate option has two task types, as follows:

(1) Multiple-Choice. When a worker has read the content of the training sample, he/she can directly select the proper option(s) from this part as the final keyphrase(s).

(2) Rating. Once an option is selected as a final keyphrase, the worker needs to select an importance ranking from the corresponding drop-down box. Our rating job is different from that in tasks of pairwise comparison (or rating), which ask workers to compare the selected items with each other [9]. It converts a comparison operation into an assignment one. That is, workers do not need to consider other selected options while assigning an importance ranking to a selected one, based on their understanding of the current training sample. Such conversion can reduce latency while obtaining an ordered keyphrase list.

Figure 1: The main user interface of a single L-HIT (expandable instructions, the document's title and text, up to 15 candidate keyphrases each with a ranking drop-down box, 15 supplement blanks, and a Submit button).

Some proper keyphrases may not be listed in the part of candidate option for various reasons, for instance, phrases with low appearance frequencies or ones with low scores assigned by machine annotation. Therefore, for each single L-HIT, there is a candidate supplement part (surrounded by a yellow rectangle) that lets workers supplement lost keyphrases, as well as the corresponding importance rankings. The part of candidate supplement also has two task types, which are fill-in-blank (i.e., supplementing lost keyphrase(s)) and rating (i.e., selecting importance rankings), respectively. Note that supplementing the lost keyphrase(s) is an optional job for workers.

2.2. Inference Algorithms. In this paper, inferring a truth keyphrase list is still viewed as a process of first integrating and then grading phrases. Although the algorithms IMLK, IMLK-I, and IMLK-ED [13] are suitable for inferring a truth keyphrase list from multiple lists of keyphrases, they neglect to calculate three inherent attributes of a keyphrase capturing a topic delivered by the training samples, which are meaningfulness, uncertainty, and uselessness [14]. Study [15] shows that calculating the information entropy [16] of a keyphrase is a significant way to measure these three inherent attributes. Therefore, we utilize the information entropy and the corresponding equations in [15] to measure the three inherent properties of a keyphrase capturing a topic. The symbols used in the ground truth inference algorithms are shown in Table 1.

The attribute meaningfulness of k in T denotes k's positive probability of capturing a topic expressed by T. Normally, it is measured by the distribution of k as an independent keyphrase, since the more times k_indie occurs, the bigger the positive probability that the topic is delivered by k. The attribute meaningfulness is defined as follows:

$$P_{pos} = \begin{cases} \dfrac{N_{KI}}{T_N}, & 0 < N_{KI} < T_N, \\[4pt] 0, & N_{KI} = 0, \end{cases} \tag{1}$$

where P_pos = 0 covers the case that k does not exist in the corpus.

As the name implies, the attribute uncertainty of k in T denotes k's unsteadiness of capturing a topic expressed by T, which is usually measured by the distribution of k as a sub-keyphrase. A sub-keyphrase is one that can be extended into another keyphrase with other words. Note that (a) different keyphrases may express the same point with different expression depth, and (b) different keyphrases may express totally different points. For example, although the keyphrase "topic model" is a sub-keyphrase of "topic aware propagation model", they express different points. Intuitively, the more times k_sub occurs, the more unsteadily the topic is delivered by k. The attribute uncertainty is defined as follows:

$$P_{sub} = \begin{cases} \dfrac{N_{KS}}{T_N}, & 0 < N_{KS} < T_N, \\[4pt] 0, & N_{KS} = 0. \end{cases} \tag{2}$$

The attribute uselessness of k in T denotes k's negative probability of capturing a topic expressed by T, which is defined as follows:

$$P_{neg} = 1 - P_{pos} - P_{sub}. \tag{3}$$

In conclusion, the information entropy of k can completely measure its three inherent attributes using equation (4), or equation (5) when the situation P_sub = 0 occurs:

$$H(k) = P_{pos}\log\left(\frac{1}{P_{pos}}\right) + P_{sub}\log\left(\frac{1}{P_{sub}}\right) + P_{neg}\log\left(\frac{1}{P_{neg}}\right), \tag{4}$$

$$H(k) = P_{pos}\log\left(\frac{1}{P_{pos}}\right) + P_{neg}\log\left(\frac{1}{P_{neg}}\right). \tag{5}$$
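As a check on equations (1)–(5), a minimal sketch of the entropy computation follows. The counts and corpus size are invented for illustration, and the logarithm base (natural log here) is an assumption, since the paper does not fix it:

```python
from math import log

def keyphrase_entropy(n_ki, n_ks, t_n):
    """Information entropy H(k) of a keyphrase, per equations (1)-(5).

    n_ki: occurrences of k as an independent keyphrase (N_KI)
    n_ks: occurrences of k as a sub-keyphrase (N_KS)
    t_n:  total number of keyphrases in the corpus (T_N)
    """
    p_pos = n_ki / t_n if 0 < n_ki < t_n else 0.0  # eq. (1), meaningfulness
    p_sub = n_ks / t_n if 0 < n_ks < t_n else 0.0  # eq. (2), uncertainty
    p_neg = 1.0 - p_pos - p_sub                    # eq. (3), uselessness
    # eq. (4); zero-probability terms are dropped, which also covers eq. (5)
    return sum(p * log(1.0 / p) for p in (p_pos, p_sub, p_neg) if p > 0)

# hypothetical counts: k occurs 30 times independently and 10 times as a
# sub-keyphrase, in a corpus of 100 keyphrases
h = keyphrase_entropy(30, 10, 100)
```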

Finally, by combining the information entropy, the algorithms HILED and HILI are proposed, based on the algorithms IMLK-ED and IMLK-I stated in [13], respectively, and the corresponding equations recalculating the keyphrases' grades are modified as follows:

$$G_{ij} = \frac{\sum_{j=1}^{m} \left(Q_j^{ED} \times RS_{ij}\right)^2 \times H\left(k_{ij}\right)}{m}, \qquad G_{ij} = \frac{\sum_{j=1}^{m} \left(Q_j^{I} \times RS_{ij}\right)^2 \times H\left(k_{ij}\right)}{m}, \tag{6}$$

where H(k_ij) denotes the information entropy of the ith keyphrase in the jth keyphrase list, RS_ij denotes the importance score provided by workers, Q_j^ED denotes the quality of the worker who provides the jth keyphrase list in the algorithm HILED, Q_j^I denotes the quality of the worker who provides the jth keyphrase list in the algorithm HILI, and m denotes the total number of keyphrase lists.
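The grade recalculation in equation (6) can be sketched as follows, assuming each worker contributes one keyphrase list. The worker qualities, importance scores, and entropy value below are invented for illustration; how Q_j is estimated is left to the underlying IMLK-ED/IMLK-I algorithms [13]:

```python
def grade(worker_qualities, importance_scores, entropy):
    """Grade G_i of one keyphrase over m keyphrase lists, per equation (6).

    worker_qualities:  Q_j for the worker providing list j (HILED or HILI variant)
    importance_scores: RS_ij, the importance score given to the keyphrase in list j
    entropy:           H(k_ij), the keyphrase's information entropy
    """
    m = len(worker_qualities)
    return sum((q * rs) ** 2 * entropy
               for q, rs in zip(worker_qualities, importance_scores)) / m

# three workers with qualities 0.9, 0.7, 0.8 rated the same keyphrase
# with importance scores 2, 1, and 2; entropy taken from the earlier example
g = grade([0.9, 0.7, 0.8], [2, 1, 2], entropy=0.8979)
```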

3. Experiments and Discussion

In this section, we first introduce the experiments with different order manners, which are the descending and the random ones, and then we discuss the factors influencing the performance improvement of crowdsourcing annotation.

3.1. Crowdsourcing Experiment with Descending Ranking. Since IMLK, IMLK-I, and IMLK-ED, proposed in [13], and KeyRank, proposed in [15], perform very well, we employed

Table 1: Symbols.

No.  Symbol   Meaning
1    k        A keyphrase
2    T        A training sample
3    k_indie  An independent keyphrase
4    P_pos    The attribute meaningfulness
5    N_KI     The number of times k_indie occurs
6    k_sub    A sub-keyphrase
7    P_sub    The attribute uncertainty
8    N_KS     The number of times k_sub occurs
9    T_N      The total number of keyphrases in the corpus
10   P_neg    The attribute uselessness
11   H(k)     The information entropy of k

them as baselines. KeyRank is one of the machine annotation methods, and its performance is evaluated in [15] on the dataset INSPEC [17], containing 2000 abstracts (1000 for training, 500 for development, and 500 for testing). Considering the cost and the latency of workers, we chose 100 abstracts from the 500 test ones in dataset INSPEC, on which KeyRank performs the best, as the data for our multiple crowdsourcing experiments. In addition, the gold standards of these 100 test abstracts are treated as the labels from expert annotation. As we said before, each single abstract corresponds to a single L-HIT. That is, we have 100 corresponding L-HITs. The part of candidate option in each L-HIT lists 15 (or fewer) candidates with descending ranking. These candidates are keyphrases labelled and weighted by KeyRank. Again, in order to overcome the shortage that the quality of an individual worker for keyphrase extraction is sometimes rather low, we request 10 responses for each L-HIT from 10 different workers. That is, the whole experiment has 1000 published L-HITs, since each one has to be published ten times on MTurk. Each L-HIT costs 5 cents, and the whole experiment costs 50 dollars in total. According to feedback from the crowdsourcing platform MTurk, more than four out of five workers completed the optional "candidate supplement" tasks. The minimum time that a single crowdsourcing task required was 50 seconds, and the maximum time was 5 minutes. The time required for most of the crowdsourcing tasks was between 90 and 200 seconds.

The precision (P), recall (R), and F1 score are employed as performance metrics. P, R, and F1 score are defined as follows:

$$P = \frac{correct}{labelled}, \qquad R = \frac{correct}{expert}, \qquad F_1 = \frac{2 \times P \times R}{P + R}, \tag{7}$$

where correct denotes the number of correct keyphrases obtained from crowdsourcing annotation, labelled denotes the number of keyphrases obtained from crowdsourcing annotation, and expert denotes the number of keyphrases obtained from expert annotation. Normally, expert for most abstracts varies from 3 to 5, so the value of labelled in our experiment varies from 3 to 5.
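Equation (7) can be checked with a small sketch; the keyphrase lists below are invented for illustration:

```python
def prf1(crowd_keyphrases, expert_keyphrases):
    """Precision, recall, and F1 score per equation (7)."""
    correct = len(set(crowd_keyphrases) & set(expert_keyphrases))
    p = correct / len(crowd_keyphrases)   # labelled = size of the crowd list
    r = correct / len(expert_keyphrases)  # expert   = size of the gold list
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

crowd = ["fault diagnosis", "bond graph", "genetic algorithm", "floating disc"]
gold = ["fault diagnosis", "bond graph", "qualitative equations"]
p, r, f1 = prf1(crowd, gold)
```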

After the 10 responses for each L-HIT are obtained from the 10 different workers, the algorithms IMLK, IMLK-I, IMLK-ED, HILED, and HILI are applied to infer a truth keyphrase list from these responses. The inferred results of IMLK, IMLK-I, IMLK-ED, HILED, and HILI are compared with those of KeyRank in terms of P, R, and F1 score. Besides, in order to evaluate the performance of KeyRank, IMLK, IMLK-I, IMLK-ED, HILED, and HILI clearly, the comparisons are divided into three different groups, i.e., Group-3, Group-4, and Group-5. For example, Group-4 is named as such because the number of labelled is 4 when it reports the comparisons among KeyRank, IMLK, IMLK-I, IMLK-ED, HILED, and HILI in terms of P, R, and F1 score, respectively.

In addition, the relations between the number of workers (denoted as WorkerNum) and the inferred results are also explored, by respectively conducting another seven comparisons in all groups. The values of WorkerNum are set to 3, 4, 5, 6, 7, 8, and 9, respectively. Since each abstract has 10 keyphrase lists provided by 10 different workers, in order to get rid of the impact of the workers' order, each algorithm is run on each abstract ten times under a certain WorkerNum, and the corresponding number of keyphrase lists is randomly selected from its 10 keyphrase lists each time. For example, when WorkerNum is 5, we randomly select 5 keyphrase lists from the 10 keyphrase lists. All comparisons of all groups among KeyRank, IMLK, IMLK-I, IMLK-ED, HILED, and HILI are shown in Figure 2.
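The repeated-sampling protocol above can be sketched as follows; `run_algorithm` is a hypothetical stand-in for any of the inference algorithms, and the toy scoring function exists only to make the sketch runnable:

```python
import random

def averaged_score(keyphrase_lists, worker_num, run_algorithm, runs=10):
    """Average an algorithm's score over `runs` random subsets of
    `worker_num` keyphrase lists, cancelling the effect of worker order."""
    total = 0.0
    for _ in range(runs):
        # sample without replacement, mirroring "randomly select 5 of the 10 lists"
        subset = random.sample(keyphrase_lists, worker_num)
        total += run_algorithm(subset)
    return total / runs

# 10 hypothetical keyphrase lists for one abstract
lists = [["a"], ["a", "b"], ["a", "b", "c"]] * 3 + [["a", "b", "c", "d"]]
score = averaged_score(lists, worker_num=5,
                       run_algorithm=lambda s: sum(map(len, s)) / len(s))
```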

From Figure 2, we notice that IMLK, IMLK-I, and IMLK-ED perform significantly better than KeyRank in all groups in terms of P, R, and F1 score. We also notice that both HILED and HILI perform significantly better than KeyRank, IMLK, IMLK-I, and IMLK-ED in all groups in terms of P, R, and F1 score. Between HILED and HILI, except for the comparisons in Group-3, Group-4, and Group-5 when the values of WorkerNum are 5, 6, and 7 (the situation of WorkerNum = 7 only occurs in Group-3), HILED always performs better than HILI in terms of P, R, and F1 score. Moreover, we notice that with the increment of WorkerNum, the performance of IMLK, IMLK-I, IMLK-ED, HILI, and HILED has a rising trend. Therefore, we can conclude that (1) both HILED and HILI perform better than IMLK, IMLK-I, and IMLK-ED; (2) HILED performs a little better than HILI; (3) WorkerNum does influence the inferred results; and (4) employing crowdsourcing annotation is a feasible and effective way for training sample labelling.

3.2. Crowdsourcing Experiment with Random Ranking. For each published L-HIT in the Crowdsourcing experiment with Descending Ranking (denoted as CDR) in Section 3.1, the 15 (or fewer) candidates listed in the part of candidate option are ordered according to their scores assigned by KeyRank, from high to low. Is there any relevancy between the order manner of the listed candidates and the improvement performance of crowdsourcing annotation?

In order to explore whether there is such a relevancy between them, we create another 100 L-HITs using the selected 100 representative abstracts mentioned in Section 3.1. Meanwhile, we also request 10 responses for each L-HIT from 10 different workers. For each L-HIT, the 15 (or fewer) candidates are randomly listed in the part of candidate option. We name the experiments conducted in this section the Crowdsourcing experiment with Random Ranking (denoted as CRR). To make a fair evaluation, all experimental parameters of CRR follow those of CDR. All comparisons among KeyRank, IMLK, HILED, and HILI in terms of P, R, and F1 score are shown in Figure 3.

From Figure 3, we can see that IMLK, HILED, and HILI in CRR always perform significantly better than KeyRank in terms of P, R, and F1 score. This proves once again that employing crowdsourcing annotation is a feasible and effective way for training sample labelling. However, we notice that the performance of IMLK, HILED, and HILI in CRR is worse than that of these algorithms in CDR, which proves that the order manner of the listed candidates does influence the improvement performance of crowdsourcing annotation, and the descending order manner is more effective than the random one.

3.3. Discussion

The Proper Number of Workers. Both CDR and CRR show us that, with an increment of WorkerNum, the improvement performance of crowdsourcing annotation has a rising trend. However, more workers do not mean more suitability. On the one hand, more workers may result in more latency. For instance, workers may be distracted, or tasks may not be appealing to enough workers. On the other hand, more workers mean more monetary cost, since crowdsourcing annotation is not free. It is just a cheaper way to label sufficient training samples in time. Hence, the trade-off among quality, latency, and cost controls needs to be considered and balanced. The experimental results show that the proper number of workers varies from 6 to 8, because the improvement performance of crowdsourcing annotation at these stages is relatively stable, and this quantity is appropriate to avoid high latency and cost.

The Descending and Random Ranking Manners. The experimental results demonstrate that the descending ranking manner performs better than the random one. The reason may be that workers have limited patience, since they are not trained. Normally, workers just focus on the top 5 (or fewer) candidates listed in the part of candidate option. If they do not find any proper one(s) from the top few candidates, they may lose the patience to read the remaining ones, so that they select randomly, or supplement option(s) in the part of candidate supplement, to complete the current L-HIT.

Figure 2: Comparisons among KeyRank, IMLK, IMLK-I, IMLK-ED, HILED, and HILI in all groups: (a) Group-3, (b) Group-4, (c) Group-5. Each panel plots precision, recall, and F1 value (%) against the number of workers (3 to 9).

However, the randomly selected one(s) may not be proper, and the supplementary one(s) may be repeated with the candidates listed in the part of candidate option. Therefore, the loss of accuracy happens.

4. Related Work

Recommendation models [18] have been widely applied in many domains, such as complex systems [19, 20], Quality of Service (QoS) prediction [21, 22], reliability detection for real-time systems [23], social networks [24–26], and others [27–29]. Among existing recommendation models, the supervised learning-based ones have increasingly attracted attention because of their effectiveness. However, it is well known that the supervised learning-based recommendation models suffer from the quality of training samples. Therefore, labelling sufficient training samples timely and accurately in the era of big data becomes an important foundation of supervised learning-based recommendation. Since this paper focuses on labelling training samples for keyphrase extraction by utilizing crowdsourcing annotation, the related work will be introduced in terms of keyphrase extraction and crowdsourcing annotation.

Most original works labelling keyphrases simply selected single or contiguous words with a high frequency, such as KEA [14]. Yet, these single or contiguous words do not always deliver the main points discussed in a text. Study [30] demonstrated that semantic relations in context can help extract high-quality keyphrases. Hence, some research studies employed knowledge bases and ontologies to obtain

Figure 3: Comparisons of precision, recall, and F1 value (%) against the number of workers (3 to 9) among KeyRank, IMLK, HILED, and HILI in CRR and CDR. IMLKr, HILEDr, and HILIr denote the performance of algorithms IMLK, HILED, and HILI in CRR; IMLKd, HILEDd, and HILId denote the performance of algorithms IMLK, HILED, and HILI in CDR: (a) Group-3; (b) Group-4; (c) Group-5.


semantic relations in context to improve the quality of extracted keyphrases [31]. It is obvious that the semantic relations obtained by these methods are restricted by the corresponding knowledge bases and ontologies. Studies [32, 33] utilized graph-based ranking methods to label keyphrases, in which a keyphrase's importance is determined by its semantic relatedness to others. As they just aggregate keyphrases from one single document, the corresponding semantic relatedness is not stable and could not accurately reveal the "relatedness" between keyphrases in general. Studies [34, 35] applied sequential pattern mining with wildcards to label keyphrases, since wildcards provide gap constraints with flexibility for capturing semantic relations in context. However, most of them are computationally expensive, as they need to repeatedly scan the whole document. In addition, they require users to explicitly specify appropriate gap constraints beforehand, which is time-consuming and not realistic. Based on the common sense that words do not repeatedly appear in an effective keyphrase, KeyRank [15] converted the repeated scanning operation into a calculating model and significantly reduced time consumption. However, it is also a frequency-based algorithm that may lose important entities with low frequencies. To sum up, machine annotation can label enough training samples timely, but the labels do not meet the requirement of high quality because of limited machine intelligence. Hiring domain experts can achieve a high accuracy; however, it requires a long time as well as more resources. Therefore, it is natural to think of utilizing crowdsourcing annotation, which is a new way for human intelligence to participate in machine computing at a relatively low price, to label sufficient training samples timely and accurately.

Studies [6–8] showed that crowdsourcing brings great opportunities to machine learning as well as its related research fields. With the appearance of crowdsourcing platforms such as MTurk [10] and CrowdFlower [36], crowdsourcing has taken off in a wide range of applications, for example, entity resolution [37] and sentiment analysis [38]. Despite the diversity of applications, they all employ crowdsourcing annotation at low cost to collect data (labels of training samples) to resolve the corresponding intelligent problems. In addition, many crowdsourcing annotation-based systems (frameworks) have been proposed to resolve computer-hard and intelligent tasks. By utilizing crowdsourcing annotation-based methods, CrowdCleaner [39] can detect and repair errors that usually cannot be solved by traditional data integration and cleaning techniques. CrowdPlanner [40] recommends the best route with respect to the knowledge of experienced drivers. AggNet [12] is a novel crowdsourcing annotation-based aggregation framework, which asks workers to detect mitosis in breast cancer histology images after training the crowd with a few examples.

Since some individuals in the crowd may yield relatively low-quality answers or even noise, much research focuses on how to infer the ground truth according to labels provided by workers [9]. Zheng et al. [41] employed a domain-sensitive worker model to accurately infer the ground truth based on two principles: (1) a label provided by a worker is trusted if the worker is a domain expert on the corresponding tasks, and (2) a worker is a domain expert if he often correctly completes tasks related to the specific domain. Zheng et al. [42] provided a detailed survey on ground truth inference in crowdsourcing annotation and performed an in-depth analysis of 17 existing methods. Zhang et al. tried to utilize active learning and label noise correction to improve the quality of truth inference [43–45]. One of our preliminary works [13] treated the ground truth inference of labelling keyphrases as an integrating and ranking process and proposed three novel algorithms: IMLK, IMLK-I, and IMLK-ED. However, these three algorithms ignore three inherent properties of a keyphrase capturing a point expressed by the text, which are meaningfulness, uncertainty, and uselessness.

5. Conclusions

This paper focuses on labelling training samples for keyphrase extraction by utilizing crowdsourcing annotation. We designed novel crowdsourcing mechanisms to create corresponding crowdsourcing annotation-based tasks for training sample labelling and proposed two entropy-based inference algorithms (HILED and HILI) to improve the quality of labelled training samples. The experimental results showed that crowdsourcing annotation achieves more effective performance improvement than the approach of machine annotation (i.e., KeyRank) does. In addition, we demonstrated that the ranking manners of candidates, which are listed in the part of candidate option, do influence the improvement performance of crowdsourcing annotation, and the descending ranking manner is more effective than the random one. In the future, we will keep focusing on inference algorithms improving the quality of labelled training samples.

Data Availability

The data used in this study can be accessed via https://github.com/snkim/AutomaticKeyphraseExtraction.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was partially supported by the National Key R&D Program of China (grant no. 2019YFB1704101), the National Natural Science Foundation of China (grant nos. U1936220 and 31771679), the Anhui Foundation for Science and Technology Major Project (grant nos. 18030901034 and 201904e01020006), the Key Laboratory of Agricultural Electronic Commerce, Ministry of Agriculture of China (grant nos. AEC2018003 and AEC2018006), the 2019 Anhui University Collaborative Innovation Project (GXXT-2019-013), and the Hefei Major Research Project of Key Technology (J2018G14).


References

[1] X. Xu, Q. Liu, Y. Luo et al., "A computation offloading method over big data for IoT-enabled cloud-edge computing," Future Generation Computer Systems, vol. 95, pp. 522–533, 2019.

[2] J. Zhou, J. Sun, P. Cong et al., "Security-critical energy-aware task scheduling for heterogeneous real-time MPSoCs in IoT," IEEE Transactions on Services Computing, 2019, In press.

[3] J. Zhou, X. S. Hu, Y. Ma, J. Sun, T. Wei, and S. Hu, "Improving availability of multicore real-time systems suffering both permanent and transient faults," IEEE Transactions on Computers, vol. 68, no. 12, pp. 1785–1801, 2019.

[4] Y. Zhang, K. Wang, Q. He et al., "Covering-based web service quality prediction via neighborhood-aware matrix factorization," IEEE Transactions on Services Computing, 2019, In press.

[5] Y. Zhang, G. Cui, S. Deng et al., "Efficient query of quality correlation for service composition," IEEE Transactions on Services Computing, 2019, In press.

[6] M. Lease, "On quality control and machine learning in crowdsourcing," in Proceedings of the Workshops at the 25th AAAI Conference on Artificial Intelligence, pp. 97–102, San Francisco, CA, USA, January 2011.

[7] J. Zhang, X. Wu, and V. S. Sheng, "Learning from crowdsourced labeled data: a survey," Artificial Intelligence Review, vol. 46, no. 4, pp. 543–576, 2016.

[8] V. S. Sheng, F. Provost, and P. G. Ipeirotis, "Get another label? Improving data quality and data mining using multiple, noisy labelers," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622, August 2008.

[9] G. Li, J. Wang, Y. Zheng, and M. J. Franklin, "Crowdsourced data management: a survey," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 9, pp. 2296–2319, 2016.

[10] MTurk, 2020, https://www.mturk.com.

[11] G. Li, Y. Zheng, J. Fan, J. Wang, and R. Cheng, "Crowdsourced data management: overview and challenges," in Proceedings of the 2017 ACM International Conference on Management of Data, pp. 1711–1716, Association for Computing Machinery, New York, NY, USA, 2017.

[12] S. Albarqouni, C. Baur, F. Achilles, V. Belagiannis, S. Demirci, and N. Navab, "AggNet: deep learning from crowds for mitosis detection in breast cancer histology images," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1313–1321, 2016.

[13] Q. Wang, V. S. Sheng, and Z. Liu, "Exploring methods of assessing influence relevance of news articles," in Cloud Computing and Security, pp. 525–536, Springer, Berlin, Germany, 2018.

[14] I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, and C. G. Nevill-Manning, "KEA: practical automatic keyphrase extraction," in Proceedings of the 4th ACM Conference on Digital Libraries, pp. 1–23, Berkeley, CA, USA, August 1999.

[15] Q. Wang, V. S. Sheng, and X. Wu, "Document-specific keyphrase candidate search and ranking," Expert Systems with Applications, vol. 97, pp. 163–176, 2018.

[16] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.

[17] INSPEC, 2020, https://github.com/snkim/AutomaticKeyphraseExtraction.

[18] A. Ramlatchan, M. Yang, Q. Liu, M. Li, J. Wang, and Y. Li, "A survey of matrix completion methods for recommendation systems," Big Data Mining and Analytics, vol. 1, no. 4, pp. 308–323, 2018.

[19] X. Xu, R. Mo, F. Dai, W. Lin, S. Wan, and W. Dou, "Dynamic resource provisioning with fault tolerance for data-intensive meteorological workflows in cloud," IEEE Transactions on Industrial Informatics, 2019.

[20] L. Qi, Y. Chen, Y. Yuan, S. Fu, X. Zhang, and X. Xu, "A QoS-aware virtual machine scheduling method for energy conservation in cloud-based cyber-physical systems," World Wide Web, vol. 23, no. 2, pp. 1275–1297, 2019.

[21] Y. Zhang, C. Yin, Q. Wu et al., "Location-aware deep collaborative filtering for service recommendation," IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019.

[22] L. Qi, Q. He, F. Chen et al., "Finding all you need: web APIs recommendation in web of things through keywords search," IEEE Transactions on Computational Social Systems, vol. 6, no. 5, pp. 1063–1072, 2019.

[23] J. Zhou, J. Sun, X. Zhou et al., "Resource management for improving soft-error and lifetime reliability of real-time MPSoCs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no. 12, pp. 2215–2228, 2019.

[24] G. Liu, Y. Wang, M. A. Orgun et al., "Finding the optimal social trust path for the selection of trustworthy service providers in complex social networks," IEEE Transactions on Services Computing, vol. 6, no. 2, pp. 152–167, 2011.

[25] G. Liu, Y. Wang, and M. A. Orgun, "Optimal social trust path selection in complex social networks," in Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, GA, USA, July 2010.

[26] G. Liu, K. Zheng, Y. Wang et al., "Multi-constrained graph pattern matching in large-scale contextual social graphs," in Proceedings of the 2015 IEEE 31st International Conference on Data Engineering, pp. 351–362, IEEE, Seoul, South Korea, April 2015.

[27] C. Zhang, M. Yang, J. Lv, and W. Yang, "An improved hybrid collaborative filtering algorithm based on tags and time factor," Big Data Mining and Analytics, vol. 1, no. 2, pp. 128–136, 2018.

[28] Y. Liu, S. Wang, M. S. Khan, and J. He, "A novel deep hybrid recommender system based on auto-encoder with neural collaborative filtering," Big Data Mining and Analytics, vol. 1, no. 3, pp. 211–221, 2018.

[29] H. Liu, H. Kou, C. Yan, and L. Qi, "Link prediction in paper citation network to construct paper correlation graph," EURASIP Journal on Wireless Communications and Networking, vol. 2019, no. 1, 2019.

[30] G. Ercan and I. Cicekli, "Using lexical chains for keyword extraction," Information Processing and Management, vol. 43, no. 6, pp. 1705–1714, 2007.

[31] S. Xu, S. Yang, and C. M. Lau, "Keyword extraction and headline generation using novel word feature," in Proceedings of the 24th AAAI Conference on Artificial Intelligence, pp. 1461–1466, Atlanta, GA, USA, 2010.

[32] R. Mihalcea and P. Tarau, "TextRank: bringing order into texts," UNT Scholarly Works, vol. 43, no. 6, pp. 404–411, 2004.

[33] K. S. Hasan and V. Ng, "Automatic keyphrase extraction: a survey of the state of the art," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 1262–1273, Baltimore, MD, USA, June 2014.

[34] F. Xie, X. Wu, and X. Zhu, "Document-specific keyphrase extraction using sequential patterns with wildcards," in Proceedings of the 2014 IEEE International Conference on Data Mining, pp. 1055–1060, Shenzhen, China, December 2014.

[35] J. Feng, F. Xie, X. Hu, P. Li, J. Cao, and X. Wu, "Keyword extraction based on sequential pattern mining," in Proceedings of the 3rd International Conference on Internet Multimedia Computing and Service, pp. 34–38, Chengdu, China, August 2011.

[36] CrowdFlower, 2020, http://www.crowdflower.com.

[37] S. Wang, X. Xiao, and C. Lee, "Crowd-based deduplication: an adaptive approach," in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1263–1277, Melbourne, Australia, June 2015.

[38] Y. Zheng, J. Wang, G. Li, R. Cheng, and J. Feng, "QASCA: a quality-aware task assignment system for crowdsourcing applications," in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1031–1046, Melbourne, Australia, June 2015.

[39] Y. Tong, C. C. Cao, C. J. Zhang, Y. Li, and L. Chen, "CrowdCleaner: data cleaning for multi-version data on the web via crowdsourcing," in Proceedings of the 2014 IEEE 30th International Conference on Data Engineering, pp. 1182–1185, Chicago, IL, USA, April 2014.

[40] H. Su, K. Zheng, J. Huang et al., "A crowd-based route recommendation system," in Proceedings of the 2014 IEEE 30th International Conference on Data Engineering, pp. 1144–1155, Chicago, IL, USA, May 2014.

[41] Y. Zheng, G. Li, and R. Cheng, "DOCS: domain-aware crowdsourcing system," Proceedings of the VLDB Endowment, vol. 10, no. 4, pp. 361–372, 2016.

[42] Y. Zheng, G. Li, Y. Li, C. Shan, and R. Cheng, "Truth inference in crowdsourcing: is the problem solved?" Proceedings of the VLDB Endowment, vol. 10, no. 5, pp. 541–552, 2017.

[43] J. Wu, S. Zhao, V. S. Sheng et al., "Weak-labeled active learning with conditional label dependence for multilabel image classification," IEEE Transactions on Multimedia, vol. 19, no. 6, pp. 1156–1169, 2017.

[44] B. Nicholson, J. Zhang, V. S. Sheng et al., "Label noise correction methods," in Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 1–9, IEEE, Paris, France, October 2015.

[45] J. Zhang, V. S. Sheng, Q. Li, J. Wu, and X. Wu, "Consensus algorithms for biased labeling in crowdsourcing," Information Sciences, vol. 382-383, pp. 254–273, 2017.


intelligence to participate in machine computing, crowdsourcing annotation makes up for the shortages of machine annotation and expert annotation. Crowdsourcing annotation has five steps: (a) The requesters select a public or personal crowdsourcing platform and design crowdsourcing annotation tasks, including price setting, time constraints, and the required responding number of each annotation task. (b) The requesters publish crowdsourcing annotation tasks on the selected crowdsourcing platform. (c) The crowd logged in the platform (also known as workers) select tasks that are suitable for themselves and complete the tasks (i.e., providing labels). Note that the requester does not know any information (such as expertise and credit standing) of the workers completing annotation tasks in this step. (d) The requesters download the labels provided by workers and a few additional pieces of information about the workers (i.e., the completing times and the number of accepted tasks) from the crowdsourcing platform. (e) The requesters utilize existing ground truth inference algorithms or propose novel one(s) to infer truth value(s) from all labels provided by workers. In this paper, we focus on labelling training samples for keyphrase extraction by utilizing crowdsourcing annotation, since extracting keyphrases from a text (especially a short text) is a complex process that requires abundant auxiliary information, such as the background of the entities discussed and the events involved. Machine annotation and expert annotation cannot effectively handle keyphrase extraction because of their shortages. For convenience, our entire approach is denoted as Crowdsourced Keyphrase Extraction (CKE) hereafter; meanwhile, a single task of crowdsourcing annotation generated by CKE is named an L-HIT.
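The five-step workflow above can be sketched from the requester's side. This is a minimal illustration, not the paper's implementation: the task fields and the majority-vote inference in step (e) are assumptions for the sketch (the entropy-based HILED/HILI algorithms proposed later replace the simple vote).

```python
from dataclasses import dataclass, field
from collections import Counter

@dataclass
class AnnotationTask:
    """Step (a): a task designed by the requester, with its price,
    time constraint, and required number of responses."""
    sample_id: str
    price_cents: int
    time_limit_s: int
    responses_needed: int
    labels: list = field(default_factory=list)   # collected in steps (c)-(d)

def infer_truth(task: AnnotationTask) -> str:
    """Step (e): a baseline ground truth inference (majority voting)."""
    return Counter(task.labels).most_common(1)[0][0]

task = AnnotationTask("abstract-001", price_cents=5, time_limit_s=300,
                      responses_needed=10)
task.labels = ["A", "A", "B", "A"]   # labels downloaded from the platform
print(infer_truth(task))             # prints "A"
```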

Extracting keyphrases from training samples in CKE includes labelling and ranking operations, and each single L-HIT contains three task types [9, 11]: multiple-choice, fill-in-blank, and rating. The first two are used to collect proper keyphrases for a training sample, and the last one is used for importance ranking assignment of the proper keyphrases collected. This is different from binary labelling and most multiclass labelling tasks, which usually have one single type. Besides, there are three important problems (i.e., quality control, cost control, and latency control) that are also required to be balanced in CKE [9]. Quality control focuses on labelling and ranking high-quality keyphrases, cost control aims to reduce the costs in terms of labour and money while keeping high-quality ground truth, and latency control studies how to cut down the cycle of a single task [11]. We utilize four ways to handle the trade-off among the three problems stated above in CKE.

In this paper, a pruning-based technique [9] is first adopted to prune the candidates provided by a machine-based algorithm; meanwhile, a complementary option is added to supplement the proper keyphrases that are lost because of various reasons. The pruning-based technique and the complementary option can efficiently reduce labour cost and time cost. Then, for each single L-HIT, there is a time constraint set, since time constraints can significantly reduce the latency of a single worker [11]. Thirdly, each individual worker is asked to select an importance ranking for each keyphrase labelled by himself, instead of sorting them. Finally, in order to conquer the possible low quality of some workers for keyphrase labelling and ranking, the designed crowdsourcing mechanism allows multiple workers [6] to complete a single L-HIT. The main contributions of this paper are summarized as follows:

(1) A suitable crowdsourcing mechanism is designed to create crowdsourcing annotation-based tasks for training sample labelling. In addition, four optimization methods (i.e., a pruning-based technique, a complementary option, a time constraint set, and repeated labelling) are used to balance the quality, cost, and latency controls in CKE.

(2) Two entropy-based inference algorithms (i.e., HILED and HILI) are proposed to infer the ground truth based on labels collected by crowdsourcing annotation. In addition, two different order manners in L-HITs, which are the descending one and the random one, are also explored.

(3) We conduct multiple experiments on MTurk to verify the performance improvement of crowdsourcing annotation. The experimental results demonstrate that crowdsourcing annotation performs well. Among the inference algorithms, both HILED and HILI improve the performance of the baselines.

The remainder of the paper is organized as follows: Section 2 will introduce the details of CKE, Section 3 will report the experimental results, the related works will be discussed in Section 4, and then we will reach a conclusion in Section 5.

2. Crowdsourced Keyphrase Extraction

In this section, we will first introduce the compositions of a single L-HIT, and then we will present the two proposed inference algorithms.

2.1. A Single L-HIT. Our multiple experiments are conducted on MTurk, which is a well-known crowdsourcing marketplace supporting crowdsourced execution of Human Intelligence Tasks (HITs) [12]. Since the structure of a single task published by our experiments is essentially inherited from a single HIT supported by MTurk, the ones published by us are called Labelling Human Intelligence Tasks (L-HITs). A single L-HIT, which corresponds to a single training sample, consists of five parts: guidance, content, candidate option, candidate supplement, and submission. As shown in Figure 1, the part of guidance (surrounded by a blue rectangle) helps workers complete the current task conveniently and efficiently. The part of content (surrounded by a black rectangle) shows workers the content of a single training sample. The part of submission (surrounded by a blue ellipse) is utilized to submit the completed L-HIT. These three parts are basic elements of the current task.


The part of candidate option (surrounded by a red rectangle) shows workers the candidates. The candidates are keyphrases labelled by machine annotation. Note that this part only holds 15 options at most. If a training sample has more than 15 keyphrases labelled by machine annotation, this part only shows the top 15 ones with the highest scores. In addition, for each candidate, there is an independent drop-down box (providing importance rankings) above it. The importance ranking denotes how important the option is to the current training sample. It varies from −2 to 2, where 2 denotes the importance with the highest level and −2 denotes the importance with the least level. The part of candidate option has two task types as follows:

(1) Multiple-Choice. When a worker has read the content of the training sample, he/she can directly select the proper option(s) from this part as the final keyphrase(s).

(2) Rating. Once an option is selected as a final keyphrase, the worker needs to select an importance ranking from the corresponding drop-down box. Our rating job is different from that in tasks of pairwise comparison (or rating) that ask workers to compare the selected items with each other [9]. It converts a comparison operation into an assignment one. That is, workers do not need to consider other selected options while assigning an importance ranking to a selected one, based on their understanding of the current training sample. Such conversion can reduce latency while obtaining an ordered keyphrase list.
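As a concrete illustration of how the candidate option part can be populated, the following sketch keeps only the top 15 machine-scored candidates and attaches the −2 to 2 ranking range to each; the function and field names are hypothetical, not part of the paper's system.

```python
def build_candidate_options(scored_keyphrases, max_options=15):
    """Keep at most the 15 highest-scored machine-labelled keyphrases,
    each paired with the importance rankings -2..2 a worker may assign."""
    top = sorted(scored_keyphrases, key=lambda kv: kv[1], reverse=True)[:max_options]
    return [{"phrase": phrase, "score": score, "rankings": [-2, -1, 0, 1, 2]}
            for phrase, score in top]

# 17 machine-labelled candidates: only the top 15 survive the pruning
scored = [(f"phrase-{i}", i * 0.1) for i in range(17)]
options = build_candidate_options(scored)
print(len(options), options[0]["phrase"])   # prints: 15 phrase-16
```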

Figure 1: The main user interface of a single L-HIT. The guidance part instructs workers to (i) read the title and text of the document, (ii) select proper keyphrase(s) from the listed candidates and rank them (all candidates are represented by their stems), (iii) optionally provide additional keyphrase(s) ordered from high to low importance, and (iv) submit the task. The example shown is the abstract "Fusion of qualitative bond graph and genetic algorithms: A fault diagnosis application" with 15 candidate options (e.g., "fault diagnosi", "qualit bond graph", "float disc"), each with a ranking drop-down box.


Some proper keyphrases may not be listed in the part of candidate option because of various reasons, for instance, phrases with low appearing frequencies or ones with low scores assigned by machine annotation. Therefore, for each single L-HIT, there is a candidate supplement part (surrounded by a yellow rectangle) that lets workers supplement lost keyphrases as well as the corresponding importance rankings. The part of candidate supplement also has two task types, which are fill-in-blank (i.e., supplementing lost keyphrase(s)) and rating (i.e., selecting importance rankings), respectively. Note that supplementing the lost keyphrase(s) is an optional job for workers.

2.2. Inference Algorithms. In this paper, inferring a truth keyphrase list is still viewed as a process of first integrating, then grading phrases. Although the algorithms IMLK, IMLK-I, and IMLK-ED [13] are suitable for inferring a truth keyphrase list from multiple lists of keyphrases, they neglect to calculate three inherent attributes of a keyphrase capturing a topic delivered by the training samples, which are meaningfulness, uncertainty, and uselessness [14]. Study [15] shows that calculating the information entropy [16] of a keyphrase is a significant way to measure these three inherent attributes. Therefore, we utilize the information entropy and the corresponding equations in [15] to measure the three inherent properties of a keyphrase capturing a topic. The symbols used in the ground truth inference algorithms are shown in Table 1.

The attribute meaningfulness of k in T denotes k's positive probability of capturing a topic expressed by T. Normally, it is measured by the distribution of k as an independent keyphrase, since the more times kindie occurs, the bigger the positive probability that the topic is delivered by k. The attribute meaningfulness is defined as follows:

    Ppos = NKI / TN,  if 0 < NKI < TN,
    Ppos = 0,         if NKI = 0,                                          (1)

where Ppos = 0 covers the case that k does not exist in the corpus.

As the name implies, the attribute uncertainty of k in T denotes k's unsteadiness of capturing a topic expressed by T, which is usually measured by the distribution of k as a sub-keyphrase. A sub-keyphrase means it can be extended into another keyphrase with other words. Note that (a) different keyphrases express a same point with different expression depth, and (b) different keyphrases express totally different points. For example, although the keyphrase "topic model" is a sub-keyphrase of "topic aware propagation model", they express different points. Intuitively, the more times ksub occurs, the more unsteady the topic delivered by k. The attribute uncertainty is defined as follows:

    Psub = NKS / TN,  if 0 < NKS < TN,
    Psub = 0,         if NKS = 0.                                          (2)

The attribute uselessness of k in T denotes k's negative probability of capturing a topic expressed by T, which is defined as follows:

    Pneg = 1 − Ppos − Psub.                                                (3)

In conclusion, the information entropy of k can completely measure its three inherent attributes using equation (4), or equation (5) when the situation Psub = 0 occurs:

    H(k) = Ppos log(1/Ppos) + Psub log(1/Psub) + Pneg log(1/Pneg),         (4)

    H(k) = Ppos log(1/Ppos) + Pneg log(1/Pneg).                            (5)

Finally by combining the information entropy algo-rithms HILED and HILI are proposed based on algorithmsIMLK-ED and IMLK-I stated in [13] respectively and thecorresponding equations recalculating the keyphrasesrsquogrades are modified as follows

Gij 1113936

mj1 QE D

j times RSij1113872 11138732

times H kij1113872 1113873

m

Gij 1113936

mj1 QI

j times RSij1113872 11138732

times H kij1113872 1113873

m

(6)

where H(k_{ij}) denotes the information entropy of the ith keyphrase in the jth keyphrase list, RS_{ij} denotes the importance score provided by workers, Q^{ED}_{j} denotes the quality of the worker who provides the jth keyphrase list in the algorithm HILED, Q^{I}_{j} denotes the quality of the worker who provides the jth keyphrase list in the algorithm HILI, and m denotes the total number of keyphrase lists provided by workers.
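The grade recalculation of equation (6) can be sketched as below. The function and parameter names are hypothetical; we assume the worker qualities, importance scores, and entropies for one keyphrase across the m lists are available as parallel sequences (HILED passes Q^{ED} qualities, HILI passes Q^{I} qualities).

```python
def recalculate_grade(qualities, scores, entropies):
    """Grade of one keyphrase across m keyphrase lists, per equation (6).

    qualities: worker qualities Q_j (Q^ED for HILED, Q^I for HILI)
    scores:    importance scores RS_ij assigned to the keyphrase
    entropies: information entropies H(k_ij) of the keyphrase
    """
    m = len(qualities)
    return sum((q * rs) ** 2 * h
               for q, rs, h in zip(qualities, scores, entropies)) / m
```

For example, with qualities (1.0, 0.5), scores (2, 2), and entropies (0.5, 0.5), the grade is ((2.0)^2·0.5 + (1.0)^2·0.5)/2 = 1.25.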

3. Experiments and Discussion

In this section, we first introduce experiments with different order manners, namely, the descending and the random ones, and then discuss the factors influencing the performance improvement of crowdsourcing annotation.

3.1. Crowdsourcing Experiment with Descending Ranking. Since IMLK, IMLK-I, and IMLK-ED proposed in [13] and KeyRank proposed in [15] perform very well, we employed

Table 1: Symbols.

No.  Symbol    Description
1    k         A keyphrase
2    T         A training sample
3    k_indie   An independent keyphrase
4    P_pos     The attribute meaningfulness
5    NKI       The number of times k_indie occurs
6    k_sub     A sub-keyphrase
7    P_sub     The attribute uncertainty
8    NKS       The number of times k_sub occurs
9    TN        The total number of keyphrases in the corpus
10   P_neg     The attribute uselessness
11   H(k)      The information entropy of k

4 Complexity

them as baselines. KeyRank is one of the machine annotation methods, and its performance is evaluated on the dataset INSPEC [17], containing 2000 abstracts (1000 for training, 500 for development, and 500 for testing), in [15]. Considering the cost and latency of workers, we chose 100 abstracts from the 500 test ones in the dataset INSPEC, where KeyRank performs the best, as the data for our multiple crowdsourcing experiments. In addition, the gold standards of these 100 test abstracts are treated as labelled ones from expert annotation. As we said before, each single abstract corresponds to a single L-HIT. That is, we have 100 corresponding L-HITs. The part of candidate option in each L-HIT lists 15 (or fewer) candidates with descending ranking. These candidates are keyphrases labelled and weighted by KeyRank. Again, in order to overcome the shortage that the quality of an individual worker for keyphrase extraction is sometimes rather low, we request 10 responses for each L-HIT from 10 different workers. That is, the whole experiment has 1000 published L-HITs, since each one has to be published ten times on MTurk. Each L-HIT costs 5 cents, and the whole experiment costs 50 dollars in total. According to feedback from the crowdsourcing platform MTurk, more than four out of five workers completed the optional "candidate supplement" tasks. The minimum time that a single crowdsourcing task required is 50 seconds, and the maximum time is 5 minutes. The time required for most of the crowdsourcing tasks was between 90 and 200 seconds.

The precision (P), recall (R), and F1 score are employed as performance metrics. P, R, and the F1 score are defined as follows:

P = \frac{correct}{labelled}, \quad R = \frac{correct}{expert}, \quad F1 = \frac{2 \times P \times R}{P + R}, \qquad (7)

where correct denotes the number of correct keyphrases obtained from crowdsourcing annotation, labelled denotes the number of keyphrases obtained from crowdsourcing annotation, and expert denotes the number of keyphrases obtained from expert annotation. Normally, expert for most abstracts varies from 3 to 5, so the value of labelled in our experiment varies from 3 to 5.
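The metrics of equation (7) translate directly into code; this is a minimal sketch with a hypothetical function name, guarding against the degenerate P + R = 0 case that equation (7) leaves implicit.

```python
def precision_recall_f1(correct, labelled, expert):
    """P, R, and F1 of equation (7).

    correct:  number of correct keyphrases from crowdsourcing annotation
    labelled: number of keyphrases from crowdsourcing annotation
    expert:   number of keyphrases from expert annotation
    """
    p = correct / labelled
    r = correct / expert
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1
```

For example, with 3 correct keyphrases out of 4 labelled against 5 expert keyphrases, P = 0.75, R = 0.6, and F1 = 2/3.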

After 10 responses for each L-HIT are obtained from 10 different workers, the algorithms IMLK, IMLK-I, IMLK-ED, HILED, and HILI are applied to infer a truth keyphrase list from these responses. The inferred results of IMLK, IMLK-I, IMLK-ED, HILED, and HILI are compared with those of KeyRank in terms of P, R, and F1 score. Besides, in order to evaluate the performance of KeyRank, IMLK, IMLK-I, IMLK-ED, HILED, and HILI clearly, the comparisons are divided into three different groups, i.e., Group-3, Group-4, and Group-5. For example, Group-4 is named as such because the number of labelled is 4 when it reports the comparisons among KeyRank, IMLK, IMLK-I, IMLK-ED, HILED, and HILI in terms of P, R, and F1 score, respectively.

In addition, the relations between the workers' numbers (denoted as WorkerNum) and the inferred results are also explored by respectively conducting another seven comparisons in all groups. The values of WorkerNum are set to 3, 4, 5, 6, 7, 8, and 9, respectively. Since each abstract has 10 keyphrase lists provided by 10 different workers, in order to get rid of the impact of workers' order, each algorithm on each abstract is run ten times under a certain WorkerNum, and the corresponding number of keyphrase lists is randomly selected from its 10 keyphrase lists each time. For example, when WorkerNum is 5, we randomly select 5 keyphrase lists from the 10 keyphrase lists. All comparisons of all groups among KeyRank, IMLK, IMLK-I, IMLK-ED, HILED, and HILI are shown in Figure 2.
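The repeated-sampling procedure just described can be sketched as follows. This is an illustrative outline only: `infer` stands for any of the ground truth inference algorithms (here reduced to a callback returning a score), and the function and parameter names are our own.

```python
import random

def average_over_runs(keyphrase_lists, worker_num, infer, runs=10, seed=0):
    """Average an inference algorithm's score over repeated random
    subsets of worker-provided keyphrase lists, as in Section 3.1.

    keyphrase_lists: the 10 lists collected for one abstract
    worker_num:      WorkerNum, the subset size (3 to 9)
    infer:           hypothetical inference callback returning a score
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    scores = []
    for _ in range(runs):
        # draw WorkerNum lists without replacement, ignoring worker order
        subset = rng.sample(keyphrase_lists, worker_num)
        scores.append(infer(subset))
    return sum(scores) / runs
```

Running this once per abstract and per WorkerNum value, then averaging, yields points like those plotted in Figure 2.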

From Figure 2, we notice that IMLK, IMLK-I, and IMLK-ED perform significantly better than KeyRank in all groups in terms of P, R, and F1 score. We also notice that both HILED and HILI perform significantly better than KeyRank, IMLK, IMLK-I, and IMLK-ED in all groups in terms of P, R, and F1 score. Between HILED and HILI, except for the comparisons in Group-3, Group-4, and Group-5 when the values of WorkerNum are 5, 6, and 7 (the situation of WorkerNum = 7 only occurs in Group-3), HILED always performs better than HILI in terms of P, R, and F1 score. Moreover, we notice that with the increment of WorkerNum, the performance of IMLK, IMLK-I, IMLK-ED, HILI, and HILED has a rising trend. Therefore, we can conclude that (1) both HILED and HILI perform better than IMLK, IMLK-I, and IMLK-ED; (2) HILED performs a little better than HILI; (3) WorkerNum does influence the inferred results; and (4) employing crowdsourcing annotation is a feasible and effective way for training sample labelling.

3.2. Crowdsourcing Experiment with Random Ranking. For each published L-HIT in the Crowdsourcing experiment with Descending Ranking (denoted as CDR) in Section 3.1, the 15 (or fewer) candidates listed in the part of candidate option are ordered according to their scores assigned by KeyRank, from high to low. Is there any relevancy between the order manners of the listed candidates and the performance improvement of crowdsourcing annotation?

In order to explore whether there is such a relevancy between them, we create another 100 L-HITs using the selected 100 representative abstracts mentioned in Section 3.1. Meanwhile, we also request 10 responses for each L-HIT from 10 different workers. For each L-HIT, the 15 (or fewer) candidates are randomly listed in the part of candidate option. We named the experiments conducted in this section the Crowdsourcing experiment with Random Ranking (denoted as CRR). To make a fair evaluation, all experimental parameters of CRR follow those of CDR. All comparisons among KeyRank, IMLK, HILED, and HILI in terms of P, R, and F1 scores are shown in Figure 3.

From Figure 3, we can see that IMLK, HILED, and HILI in CRR always perform significantly better than KeyRank in terms of P, R, and F1 score. It proves once again that employing crowdsourcing annotation is a feasible and


effective way for training sample labelling. However, we notice that the performance of IMLK, HILED, and HILI in CRR is worse than that of these algorithms in CDR, which proves that the order manners of the listed candidates do influence the performance improvement of crowdsourcing annotation, and the descending order manner is more effective than the random one.

3.3. Discussion

The Proper Number of Workers. Either CDR or CRR shows us that, with an increment of WorkerNum, the performance improvement of crowdsourcing annotation has a rising trend. However, more workers do not mean more suitability. On the one hand, more workers may result in more latency. For instance, workers may be distracted, or tasks may not be appealing to enough workers. On the other hand, more workers mean more monetary cost, since crowdsourcing annotation is not free. It is just a cheaper way to label sufficient training samples timely. Hence, the trade-off among quality, latency, and cost controls needs to be considered and balanced. The experimental results show that the proper number of workers varies from 6 to 8, because the performance improvement of crowdsourcing annotation at these stages is relatively stable and the quantity is appropriate to avoid high latency and cost.

The Descending and Random Ranking Manners. The experimental results demonstrate that the descending ranking manner performs better than the random one. The reason may be that workers have limited patience, since they are not trained. Normally, workers just focus on the top 5 (or fewer) candidates listed in the part of candidate option. If they do not find any proper one(s) from the top few candidates, they may lose patience to read the remaining ones, so that they would randomly select or supplement option(s) in the part of candidate supplement to complete the current L-HIT.

Figure 2: Comparisons among KeyRank, IMLK, IMLK-I, IMLK-ED, HILED, and HILI in all groups: (a) Group-3, (b) Group-4, and (c) Group-5. Each panel plots precision (%), recall (%), and F1 value (%) against the number of workers (3 to 9).


However, the randomly selected one(s) may not be proper, and the supplementary one(s) may duplicate the candidates listed in the part of candidate option. Therefore, the loss of accuracy happens.

4. Related Work

Recommendation models [18] have been widely applied in many domains, such as complex systems [19, 20], Quality of Service (QoS) prediction [21, 22], reliability detection for real-time systems [23], social networks [24–26], and others [27–29]. Among existing recommendation models, the supervised learning-based ones have increasingly attracted attention because of their effectiveness. However, it is well known that the supervised learning-based recommendation models suffer from the quality of training samples. Therefore, labelling sufficient training samples timely and accurately in the era of big data becomes an important foundation for supervised learning-based recommendation. Since this paper focuses on labelling training samples for keyphrase extraction by utilizing crowdsourcing annotation, the related work will be introduced in terms of keyphrase extraction and crowdsourcing annotation.

Most original works labelling keyphrases simply selected single or contiguous words with a high frequency, such as KEA [14]. Yet these single or contiguous words do not always deliver the main points discussed in a text. Study [30] demonstrated that semantic relations in context can help extract high-quality keyphrases. Hence, some research studies employed knowledge bases and ontologies to obtain

Figure 3: Comparisons among KeyRank, IMLK, HILED, and HILI in CRR and CDR: (a) Group-3, (b) Group-4, and (c) Group-5. IMLKr, HILEDr, and HILIr denote the performance of the algorithms IMLK, HILED, and HILI in CRR; IMLKd, HILEDd, and HILId denote the performance of the algorithms IMLK, HILED, and HILI in CDR. Each panel plots precision (%), recall (%), and F1 value (%) against the number of workers (3 to 9).


semantic relations in context to improve the quality of extracting keyphrases [31]. It is obvious that the semantic relations obtained by these methods are restricted by the corresponding knowledge bases and ontologies. Studies [32, 33] utilized graph-based ranking methods to label keyphrases, in which a keyphrase's importance is determined by its semantic relatedness to others. As they just aggregate keyphrases from one single document, the corresponding semantic relatedness is not stable and cannot accurately reveal the "relatedness" between keyphrases in general. Studies [34, 35] applied sequential pattern mining with wildcards to label keyphrases, since wildcards provide gap constraints with flexibility for capturing semantic relations in context. However, most of them are computationally expensive, as they need to repeatedly scan the whole document. In addition, they require users to explicitly specify appropriate gap constraints beforehand, which is time-consuming and not realistic. According to the common sense that words do not repeatedly appear in an effective keyphrase, KeyRank [15] converted the repeated scanning operation into a calculating model and significantly reduced time consumption. However, it is also a frequency-based algorithm that may lose important entities with low frequencies. To sum up, machine annotation can label enough training samples timely, but they do not meet the requirement of high quality because of limited machine intelligence. Hiring domain experts can achieve a high accuracy. However, it requires a long time as well as more resources. Therefore, it is natural to think of utilizing crowdsourcing annotation, which is a new way for human intelligence to participate in machine computing at a relatively low price, to label sufficient training samples timely and accurately.

Studies [6–8] showed that crowdsourcing brings great opportunities to machine learning as well as its related research fields. With the appearance of crowdsourcing platforms such as MTurk [10] and CrowdFlower [36], crowdsourcing has taken off in a wide range of applications, for example, entity resolution [37] and sentiment analysis [38]. Despite the diversity of applications, they all employ crowdsourcing annotation at low cost to collect data (labels of training samples) to resolve corresponding intelligent problems. In addition, many crowdsourcing annotation-based systems (frameworks) have been proposed to resolve computer-hard and intelligent tasks. By utilizing crowdsourcing annotation-based methods, CrowdCleaner [39] can detect and repair errors that usually cannot be solved by traditional data integration and cleaning techniques. CrowdPlanner [40] recommends the best route with respect to the knowledge of experienced drivers. AggNet [12] is a novel crowdsourcing annotation-based aggregation framework, which asks workers to detect the mitosis in breast cancer histology images after training the crowd with a few examples.

Since some individuals in the crowd may yield relatively low-quality answers or even noise, many researches focus on how to infer the ground truth according to labels provided by workers [9]. Zheng et al. [41] employed a domain-sensitive worker model to accurately infer the ground truth based on two principles: (1) a label provided by a worker is trusted if the worker is a domain expert on the corresponding tasks, and (2) a worker is a domain expert if he often correctly completes tasks related to the specific domain. Zheng et al. [42] provided a detailed survey on ground truth inference in crowdsourcing annotation and performed an in-depth analysis of 17 existing methods. Zhang et al. tried to utilize active learning and label noise correction to improve the quality of truth inference [43–45]. One of our preliminary works [13] treated the ground truth inference of labelling keyphrases as an integrating and ranking process and proposed three novel algorithms: IMLK, IMLK-I, and IMLK-ED. However, these three algorithms ignore three inherent properties of a keyphrase capturing a point expressed by the text, which are meaningfulness, uncertainty, and uselessness.

5. Conclusions

This paper focuses on labelling training samples for keyphrase extraction by utilizing crowdsourcing annotation. We designed novel crowdsourcing mechanisms to create corresponding crowdsourcing annotation-based tasks for training sample labelling and proposed two entropy-based inference algorithms (HILED and HILI) to improve the quality of labelled training samples. The experimental results showed that crowdsourcing annotation achieves a more effective performance improvement than the approach of machine annotation (i.e., KeyRank) does. In addition, we demonstrated that the ranking manners of candidates, which are listed in the part of candidate option, do influence the performance improvement of crowdsourcing annotation, and the descending ranking manner is more effective than the random one. In the future, we will keep focusing on inference algorithms that improve the quality of labelled training samples.

Data Availability

The data used in this study can be accessed via https://github.com/snkim/AutomaticKeyphraseExtraction.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was partially supported by the National Key R&D Program of China (grant no. 2019YFB1704101), the National Natural Science Foundation of China (grant nos. U1936220 and 31771679), the Anhui Foundation for Science and Technology Major Project (grant nos. 18030901034 and 201904e01020006), the Key Laboratory of Agricultural Electronic Commerce, Ministry of Agriculture of China (grant nos. AEC2018003 and AEC2018006), the 2019 Anhui University Collaborative Innovation Project (GXXT-2019-013), and the Hefei Major Research Project of Key Technology (J2018G14).


References

[1] X. Xu, Q. Liu, Y. Luo et al., "A computation offloading method over big data for IoT-enabled cloud-edge computing," Future Generation Computer Systems, vol. 95, pp. 522–533, 2019.

[2] J. Zhou, J. Sun, P. Cong et al., "Security-critical energy-aware task scheduling for heterogeneous real-time MPSoCs in IoT," IEEE Transactions on Services Computing, 2019, In press.

[3] J. Zhou, X. S. Hu, Y. Ma, J. Sun, T. Wei, and S. Hu, "Improving availability of multicore real-time systems suffering both permanent and transient faults," IEEE Transactions on Computers, vol. 68, no. 12, pp. 1785–1801, 2019.

[4] Y. Zhang, K. Wang, Q. He et al., "Covering-based web service quality prediction via neighborhood-aware matrix factorization," IEEE Transactions on Services Computing, 2019, In press.

[5] Y. Zhang, G. Cui, S. Deng et al., "Efficient query of quality correlation for service composition," IEEE Transactions on Services Computing, 2019, In press.

[6] M. Lease, "On quality control and machine learning in crowdsourcing," in Proceedings of the Workshops at the 25th AAAI Conference on Artificial Intelligence, pp. 97–102, San Francisco, CA, USA, January 2011.

[7] J. Zhang, X. Wu, and V. S. Sheng, "Learning from crowdsourced labeled data: a survey," Artificial Intelligence Review, vol. 46, no. 4, pp. 543–576, 2016.

[8] V. S. Sheng, F. Provost, and P. G. Ipeirotis, "Get another label? Improving data quality and data mining using multiple, noisy labelers," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622, August 2008.

[9] G. Li, J. Wang, Y. Zheng, and M. J. Franklin, "Crowdsourced data management: a survey," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 9, pp. 2296–2319, 2016.

[10] MTurk, 2020, https://www.mturk.com.

[11] G. Li, Y. Zheng, J. Fan, J. Wang, and R. Cheng, "Crowdsourced data management: overview and challenges," in Proceedings of the 2017 ACM International Conference on Management of Data, pp. 1711–1716, Association for Computing Machinery, New York, NY, USA, 2017.

[12] S. Albarqouni, C. Baur, F. Achilles, V. Belagiannis, S. Demirci, and N. Navab, "AggNet: deep learning from crowds for mitosis detection in breast cancer histology images," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1313–1321, 2016.

[13] Q. Wang, V. S. Sheng, and Z. Liu, "Exploring methods of assessing influence relevance of news articles," in Cloud Computing and Security, pp. 525–536, Springer, Berlin, Germany, 2018.

[14] I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, and C. G. Nevill-Manning, "KEA: practical automatic keyphrase extraction," in Proceedings of the 4th ACM Conference on Digital Libraries, pp. 1–23, Berkeley, CA, USA, August 1999.

[15] Q. Wang, V. S. Sheng, and X. Wu, "Document-specific keyphrase candidate search and ranking," Expert Systems with Applications, vol. 97, pp. 163–176, 2018.

[16] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.

[17] INSPEC, 2020, https://github.com/snkim/AutomaticKeyphraseExtraction.

[18] A. Ramlatchan, M. Yang, Q. Liu, M. Li, J. Wang, and Y. Li, "A survey of matrix completion methods for recommendation systems," Big Data Mining and Analytics, vol. 1, no. 4, pp. 308–323, 2018.

[19] X. Xu, R. Mo, F. Dai, W. Lin, S. Wan, and W. Dou, "Dynamic resource provisioning with fault tolerance for data-intensive meteorological workflows in cloud," IEEE Transactions on Industrial Informatics, 2019.

[20] L. Qi, Y. Chen, Y. Yuan, S. Fu, X. Zhang, and X. Xu, "A QoS-aware virtual machine scheduling method for energy conservation in cloud-based cyber-physical systems," World Wide Web, vol. 23, no. 2, pp. 1275–1297, 2019.

[21] Y. Zhang, C. Yin, Q. Wu et al., "Location-aware deep collaborative filtering for service recommendation," IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019.

[22] L. Qi, Q. He, F. Chen et al., "Finding all you need: web APIs recommendation in web of things through keywords search," IEEE Transactions on Computational Social Systems, vol. 6, no. 5, pp. 1063–1072, 2019.

[23] J. Zhou, J. Sun, X. Zhou et al., "Resource management for improving soft-error and lifetime reliability of real-time MPSoCs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no. 12, pp. 2215–2228, 2019.

[24] G. Liu, Y. Wang, M. A. Orgun et al., "Finding the optimal social trust path for the selection of trustworthy service providers in complex social networks," IEEE Transactions on Services Computing, vol. 6, no. 2, pp. 152–167, 2011.

[25] G. Liu, Y. Wang, and M. A. Orgun, "Optimal social trust path selection in complex social networks," in Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, GA, USA, July 2010.

[26] G. Liu, K. Zheng, Y. Wang et al., "Multi-constrained graph pattern matching in large-scale contextual social graphs," in Proceedings of the 2015 IEEE 31st International Conference on Data Engineering, pp. 351–362, IEEE, Seoul, South Korea, April 2015.

[27] C. Zhang, M. Yang, J. Lv, and W. Yang, "An improved hybrid collaborative filtering algorithm based on tags and time factor," Big Data Mining and Analytics, vol. 1, no. 2, pp. 128–136, 2018.

[28] Y. Liu, S. Wang, M. S. Khan, and J. He, "A novel deep hybrid recommender system based on auto-encoder with neural collaborative filtering," Big Data Mining and Analytics, vol. 1, no. 3, pp. 211–221, 2018.

[29] H. Liu, H. Kou, C. Yan, and L. Qi, "Link prediction in paper citation network to construct paper correlation graph," EURASIP Journal on Wireless Communications and Networking, vol. 2019, no. 1, 2019.

[30] G. Ercan and I. Cicekli, "Using lexical chains for keyword extraction," Information Processing and Management, vol. 43, no. 6, pp. 1705–1714, 2007.

[31] S. Xu, S. Yang, and C. M. Lau, "Keyword extraction and headline generation using novel word feature," in Proceedings of the 24th AAAI Conference on Artificial Intelligence, pp. 1461–1466, Atlanta, GA, USA, 2010.

[32] R. Mihalcea and P. Tarau, "TextRank: bringing order into texts," UNT Scholarly Works, vol. 43, no. 6, pp. 404–411, 2004.

[33] K. S. Hasan and V. Ng, "Automatic keyphrase extraction: a survey of the state of the art," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 1262–1273, Baltimore, MD, USA, June 2014.

[34] F. Xie, X. Wu, and X. Zhu, "Document-specific keyphrase extraction using sequential patterns with wildcards," in Proceedings of the 2014 IEEE International Conference on Data Mining, pp. 1055–1060, Shenzhen, China, December 2014.


[35] J. Feng, F. Xie, X. Hu, P. Li, J. Cao, and X. Wu, "Keyword extraction based on sequential pattern mining," in Proceedings of the 3rd International Conference on Internet Multimedia Computing and Service, pp. 34–38, Chengdu, China, August 2011.

[36] CrowdFlower, 2020, http://www.crowdflower.com.

[37] S. Wang, X. Xiao, and C. Lee, "Crowd-based deduplication: an adaptive approach," in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1263–1277, Melbourne, Australia, June 2015.

[38] Y. Zheng, J. Wang, G. Li, R. Cheng, and J. Feng, "QASCA: a quality-aware task assignment system for crowdsourcing applications," in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1031–1046, Melbourne, Australia, June 2015.

[39] Y. Tong, C. C. Cao, C. J. Zhang, Y. Li, and L. Chen, "CrowdCleaner: data cleaning for multi-version data on the web via crowdsourcing," in Proceedings of the 2014 IEEE 30th International Conference on Data Engineering, pp. 1182–1185, Chicago, IL, USA, April 2014.

[40] H. Su, K. Zheng, J. Huang et al., "A crowd-based route recommendation system," in Proceedings of the 2014 IEEE 30th International Conference on Data Engineering, pp. 1144–1155, Chicago, IL, USA, May 2014.

[41] Y. Zheng, G. Li, and R. Cheng, "DOCS: domain-aware crowdsourcing system," Proceedings of the VLDB Endowment, vol. 10, no. 4, pp. 361–372, 2016.

[42] Y. Zheng, G. Li, Y. Li, C. Shan, and R. Cheng, "Truth inference in crowdsourcing: is the problem solved?" Proceedings of the VLDB Endowment, vol. 10, no. 5, pp. 541–552, 2017.

[43] J. Wu, S. Zhao, V. S. Sheng et al., "Weak-labeled active learning with conditional label dependence for multilabel image classification," IEEE Transactions on Multimedia, vol. 19, no. 6, pp. 1156–1169, 2017.

[44] B. Nicholson, J. Zhang, V. S. Sheng et al., "Label noise correction methods," in Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 1–9, IEEE, Paris, France, October 2015.

[45] J. Zhang, V. S. Sheng, Q. Li, J. Wu, and X. Wu, "Consensus algorithms for biased labeling in crowdsourcing," Information Sciences, vol. 382-383, pp. 254–273, 2017.


Page 3: LabellingTrainingSamplesUsingCrowdsourcing ...downloads.hindawi.com/journals/complexity/2020/1670483.pdf · Crowdsourcing anno-tationhasfive steps:(a) ... m m m m KEEEEmEEEE-Figure

(1) Multiple-Choice When a worker has read the con-tent of the training sample heshe can directly selectthe proper option(s) from this part as the finalkeyphrase(s)

(2) Rating Once an option is selected as a final key-phrase the worker needs to select an importanceranking from the corresponding drop-down boxOur rating job is different from that in tasks ofpairwise comparison (or rating) that ask workers tocompare the selected items with each other [9] Itconverts a comparison operation into an assignmentone (at is workers do not need to consider otherselected options while assigning an importanceranking to a selected one based on their under-standing of the current training sample Such

conversion can reduce latency while obtaining anordered keyphrase list

(e part of candidate option (surrounded by a redrectangle) shows worker candidates (e candidates arekeyphrases labelled by machine annotation Note that thispart only holds 15 options at most If a training sample hasmore than 15 keyphrases labelled by machine annotationthis part only shows the top 15 ones with the highest scoresIn addition for each candidate there is an independentdrop-down box (providing importance rankings) above it(e importance ranking denotes how important the optionis to the current training sample It varies from minus2 to 2where 2 denotes the importance with the highest level andminus2 denotes the importance with the least level (e part ofcandidate option has two task types as follows

Fusion of qualitative bond graph and genetic algorithms A fault diagnosis applicationIn this paper the problem of fault diagnosis via integration of genetic algorithms (GAs) and qualitative bond graphs (QBGs) is addressed We suggest that GAs can be used tosearch for possible fault components among a system of qualitative equations e QBG is adopted as the modeling scheme to generate a set of qualitative equations e qualitative bond graph provides a unified approach for modeling engineering systems in particular mechatronic systems In order to

Title

demonstrate the performance of the proposed algorithm we have tested the proposed algorithm on an in-house designed and built floating disc experimental setupResults from fault diagnosis in the floating disc system are presented and discussed Additional measurements will be required to localize the fault when more thanone fault candidate is inferred Fault diagnosis is activated by a fault detection mechanism when a discrepancy between measured abnormal behaviorand predicted system behavior is observed e fault detection mechanism is not presented here

Text

THE DOCUMENT

Please select proper keyphrase(s)

Please select its ranking when its checked

fault diagnosi

Please select its ranking when its checked

qualit

Please select its ranking when its checked

fault

Please select its ranking when its checked

fault system

Please select its ranking when its checked

diagnosi

Please select its ranking when its checked

qualit bond graph

Please select its ranking when its checked

qualit bond

Please select its ranking when its checked

graph

Please select its ranking when its checked

qualit graph

Please select its ranking when its checked

garsquoqualit

Please select its ranking when its checked

float disc

Please select its ranking when its checked

bond

Please select its ranking when its checked

fault present

Please select its ranking when its checked

flault diagnosi system

Please select its ranking when its checked

qualit equat

Please provide additional proper keyphrase(s) from high to low according to their importance if it is necessary

1

4

7

10

13

2

5

8

11

14

3

6

9

12

15

Submit

Instructions (click to expand)

Step 1 Please read the following title and textStep 2 Please select proper keyphrase(s) from keyphrase candidates listed in the following table Please also rank the keyphases that you choose Note that all thekeyphrase candidates are represented using their corresponding stemsStep 3 (optional) Please provide additional proper keyphrase(s) from high to low according to their importance if it is necessary ese adding keyphrase(s) can berepresented using stem or word form(s)Step 4 Please submit the task if you have completed the above steps

(i)(ii)

(iii)

(iv)

Guidelines for selecting proper keyphrase(s) of the following document

Figure 1 (e main user interface of a single L-HIT

Complexity 3

Some proper keyphrases may not be listed in the part ofcandidate option because of various reasons for instancephrases with low appearing frequencies or ones with lowscores assigned by machine annotation (erefore for eachsingle L-HIT there is a candidate supplement part that letsworkers supplement lost keyphrases as well as the corre-sponding importance rankings (surrounded by a yellowrectangle) (e part of candidate supplement also has twotask types which are fill-in-blank (ie supplementing lostkeyphrase(s)) and rating (ie selecting importance rank-ings) respectively Note that supplementing the lost key-phrase(s) is an optional job for workers

2.2. Inference Algorithms. In this paper, inferring a truth keyphrase list is still viewed as a process of first integrating and then grading phrases. Although the algorithms IMLK, IMLK-I, and IMLK-ED [13] are suitable for inferring a truth keyphrase list from multiple lists of keyphrases, they neglect to calculate three inherent attributes of a keyphrase capturing a topic delivered by the training samples, which are meaningfulness, uncertainty, and uselessness [14]. Study [15] shows that calculating the information entropy [16] of a keyphrase is a significant way to measure these three inherent attributes. Therefore, we utilize the information entropy and the corresponding equations in [15] to measure the three inherent properties of a keyphrase capturing a topic. The symbols used for the ground truth inference algorithms are shown in Table 1.

The attribute meaningfulness of k in T denotes the k's positive probability of capturing a topic expressed by T. Normally, it is measured by the distribution of k as an independent keyphrase, since the more times k_indie occurs, the bigger the positive probability that the topic is delivered by k. The attribute meaningfulness is defined as follows:

$$P_{pos} = \begin{cases} \mathrm{NKI}/\mathrm{TN}, & 0 < \mathrm{NKI} < \mathrm{TN} \\ 0, & \mathrm{NKI} = 0 \end{cases} \qquad (1)$$

where $P_{pos} = 0$ covers the case that k does not exist in the corpus.

As the name implies, the attribute uncertainty of k in T denotes the k's unsteadiness of capturing a topic expressed by T, which is usually measured by the distribution of k as a sub-keyphrase. A sub-keyphrase means it can be extended into another keyphrase with other words. Note that (a) different keyphrases may express a same point with different expression depth, and (b) different keyphrases may express totally different points. For example, although keyphrase "topic model" is a sub-keyphrase of "topic aware propagation model", they express different points. Intuitively, the more times k_sub occurs, the more unsteady the topic delivered by k. The attribute uncertainty is defined as follows:

$$P_{sub} = \begin{cases} \mathrm{NKS}/\mathrm{TN}, & 0 < \mathrm{NKS} < \mathrm{TN} \\ 0, & \mathrm{NKS} = 0 \end{cases} \qquad (2)$$

The attribute uselessness of k in T denotes the k's negative probability of capturing a topic expressed by T, which is defined as follows:

$$P_{neg} = 1 - P_{pos} - P_{sub} \qquad (3)$$

In conclusion, the information entropy of k can completely measure its three inherent attributes using equation (4), or equation (5) when the situation $P_{sub} = 0$ occurs:

$$H(k) = P_{pos}\log\left(\frac{1}{P_{pos}}\right) + P_{sub}\log\left(\frac{1}{P_{sub}}\right) + P_{neg}\log\left(\frac{1}{P_{neg}}\right) \qquad (4)$$

$$H(k) = P_{pos}\log\left(\frac{1}{P_{pos}}\right) + P_{neg}\log\left(\frac{1}{P_{neg}}\right) \qquad (5)$$
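As an illustration, equations (1)-(5) can be sketched in a few lines of Python. The function below takes the raw counts NKI, NKS, and TN from Table 1 (the function and parameter names are ours) and returns H(k); the natural logarithm is used here, since the paper does not fix a log base and the choice only rescales H(k). Letting zero-probability terms contribute nothing makes equation (5) a natural special case of equation (4).

```python
import math

def keyphrase_entropy(nki: int, nks: int, tn: int) -> float:
    """Information entropy H(k) of a keyphrase per equations (1)-(5).

    nki: number of times k occurs as an independent keyphrase (NKI)
    nks: number of times k occurs as a sub-keyphrase (NKS)
    tn:  total number of keyphrases in the corpus (TN)
    """
    p_pos = nki / tn if 0 < nki < tn else 0.0   # meaningfulness, eq. (1)
    p_sub = nks / tn if 0 < nks < tn else 0.0   # uncertainty,    eq. (2)
    p_neg = 1.0 - p_pos - p_sub                 # uselessness,    eq. (3)

    # eq. (4); when p_sub == 0 this reduces to eq. (5), because a
    # zero-probability term contributes nothing to the entropy
    h = 0.0
    for p in (p_pos, p_sub, p_neg):
        if p > 0:
            h += p * math.log(1.0 / p)
    return h
```

For example, a keyphrase with NKI = NKS = 5 in a corpus of TN = 10 keyphrases has P_pos = P_sub = 0.5, P_neg = 0, and entropy ln 2.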

Finally, by incorporating the information entropy, the algorithms HILED and HILI are proposed based on the algorithms IMLK-ED and IMLK-I stated in [13], respectively, and the corresponding equations recalculating the keyphrases' grades are modified as follows:

$$G_{ij} = \frac{\sum_{j=1}^{m}\left(Q^{ED}_{j} \times RS_{ij}\right)^{2} \times H\left(k_{ij}\right)}{m},$$

$$G_{ij} = \frac{\sum_{j=1}^{m}\left(Q^{I}_{j} \times RS_{ij}\right)^{2} \times H\left(k_{ij}\right)}{m} \qquad (6)$$

where $H(k_{ij})$ denotes the information entropy of the $i$th keyphrase in the $j$th keyphrase list, $RS_{ij}$ denotes the importance score provided by workers, $Q^{ED}_{j}$ denotes the quality of the worker who provides the $j$th keyphrase list in the algorithm HILED, $Q^{I}_{j}$ denotes the quality of the worker who provides the $j$th keyphrase list in the algorithm HILI, and $m$ denotes the total number of keyphrase lists provided by workers.
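A minimal sketch of the grade recalculation in equation (6), assuming the worker qualities, importance scores, and entropies have already been computed (the function and argument names are ours); the same routine serves HILED and HILI, which differ only in how the worker quality Q_j is obtained:

```python
def keyphrase_grade(qualities, scores, entropies):
    """Grade of one keyphrase per equation (6).

    qualities: Q_j, the quality of the worker behind the j-th list
    scores:    RS_ij, the importance score the j-th list assigns the keyphrase
    entropies: H(k_ij), the entropy of the keyphrase as seen in the j-th list
    """
    m = len(qualities)
    # average of (quality x score)^2 weighted by the keyphrase's entropy
    return sum((q * rs) ** 2 * h
               for q, rs, h in zip(qualities, scores, entropies)) / m
```

With two lists of quality 1.0 each scoring the keyphrase 2 at entropy 1.0, the grade is (2^2 + 2^2) / 2 = 4.0.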

3. Experiments and Discussion

In this section, we first introduce experiments with different order manners, which are the descending and the random ones, and then discuss the factors influencing the performance improvement of crowdsourcing annotation.

3.1. Crowdsourcing Experiment with Descending Ranking. Since IMLK, IMLK-I, and IMLK-ED proposed in [13] and KeyRank proposed in [15] perform very well, we employed

Table 1: Symbols.

No. | Symbol  | Presentation
1   | k       | A keyphrase
2   | T       | A training sample
3   | k_indie | An independent keyphrase
4   | P_pos   | The attribute meaningfulness
5   | NKI     | The number of times k_indie occurs
6   | k_sub   | A sub-keyphrase
7   | P_sub   | The attribute uncertainty
8   | NKS     | The number of times k_sub occurs
9   | TN      | The total number of keyphrases in the corpus
10  | P_neg   | The attribute uselessness
11  | H(k)    | The information entropy of k


them as baselines. KeyRank is one of the machine annotation methods, and its performance is evaluated in [15] on dataset INSPEC [17], which contains 2000 abstracts (1000 for training, 500 for development, and 500 for testing). Considering the cost and latency of workers, we chose 100 abstracts from the 500 test ones in dataset INSPEC, where KeyRank performs the best, as the data for our multiple crowdsourcing experiments. In addition, the gold standards of these 100 test abstracts are treated as labelled ones from expert annotation. As we said before, each single abstract corresponds to a single L-HIT. That is, we have 100 corresponding L-HITs. The part of candidate option in each L-HIT lists 15 (or fewer) candidates with descending ranking. These candidates are keyphrases labelled and weighted by KeyRank. Again, in order to overcome the shortage that the quality of an individual worker for keyphrase extraction is sometimes rather low, we request 10 responses for each L-HIT from 10 different workers. That is, the whole experiment has 1000 published L-HITs, since each one has to be published ten times on MTurk. Each L-HIT costs 5 cents, and the whole experiment costs 50 dollars in total. According to feedback from the crowdsourcing platform MTurk, more than four out of five workers completed the optional "candidate supplement" tasks. The minimum time that a single crowdsourcing task required is 50 seconds, and the maximum time is 5 minutes. The time required for most of the crowdsourcing tasks was between 90 and 200 seconds.

The precision (P), recall (R), and F1 score are employed as performance metrics. P, R, and F1 score are defined as follows:

$$P = \frac{correct}{labelled}, \quad R = \frac{correct}{expert}, \quad F_1 = \frac{2 \times P \times R}{P + R} \qquad (7)$$

where correct denotes the number of correct keyphrases obtained from crowdsourcing annotation, labelled denotes the number of keyphrases obtained from crowdsourcing annotation, and expert denotes the number of keyphrases obtained from expert annotation. Normally, expert for most abstracts varies from 3 to 5, so the value of labelled in our experiment varies from 3 to 5.
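Equation (7) reduces to a straightforward set comparison per abstract; a minimal sketch (the function name is ours):

```python
def prf1(crowd_keyphrases, expert_keyphrases):
    """Precision, recall, and F1 per equation (7), comparing the keyphrases
    inferred from crowdsourcing annotation against the expert gold standard."""
    crowd, gold = set(crowd_keyphrases), set(expert_keyphrases)
    correct = len(crowd & gold)                   # correctly inferred keyphrases
    p = correct / len(crowd) if crowd else 0.0    # correct / labelled
    r = correct / len(gold) if gold else 0.0      # correct / expert
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For instance, inferring {a, b, c} against a gold standard {a, b, d, e} gives P = 2/3, R = 1/2, and F1 = 4/7.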

After 10 responses of each L-HIT are obtained from 10 different workers, the algorithms IMLK, IMLK-I, IMLK-ED, HILED, and HILI are applied to infer a truth keyphrase list from these responses. The inferred results of IMLK, IMLK-I, IMLK-ED, HILED, and HILI are compared with those of KeyRank in terms of P, R, and F1 score. Besides, in order to evaluate the performance of KeyRank, IMLK, IMLK-I, IMLK-ED, HILED, and HILI clearly, the comparisons are divided into three different groups, i.e., Group-3, Group-4, and Group-5. For example, Group-4 is named as such because the number of labelled is 4 when it reports the comparisons among KeyRank, IMLK, IMLK-I, IMLK-ED, HILED, and HILI in terms of P, R, and F1 score, respectively.

In addition, the relations between the workers' numbers (denoted as WorkerNum) and the inferred results are also explored by respectively conducting another seven comparisons in all groups. The values of WorkerNum are set to 3, 4, 5, 6, 7, 8, and 9, respectively. Since each abstract has 10 keyphrase lists provided by 10 different workers, in order to get rid of the impact of workers' order, each algorithm on each abstract is run ten times under a certain WorkerNum, and the corresponding number of keyphrase lists is randomly selected from its 10 keyphrase lists each time. For example, when the WorkerNum is 5, we randomly select 5 keyphrase lists from the 10 keyphrase lists. All comparisons of all groups among KeyRank, IMLK, IMLK-I, IMLK-ED, HILED, and HILI are shown in Figure 2.
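The repeated random subsampling protocol above can be sketched as follows, assuming each abstract carries its 10 keyphrase lists and `infer` stands in for any of the inference algorithms (the function and argument names are ours):

```python
import random
import statistics

def evaluate_worker_num(keyphrase_lists, worker_num, infer, runs=10, seed=0):
    """Average an inference algorithm over repeated random worker subsets.

    To remove the impact of workers' order, the algorithm is run `runs` times
    per abstract, each time on `worker_num` keyphrase lists drawn at random
    without replacement from the available ones; `infer` is a callable
    mapping a list of keyphrase lists to a score.
    """
    rng = random.Random(seed)  # fixed seed for reproducible subsampling
    results = []
    for _ in range(runs):
        sample = rng.sample(keyphrase_lists, worker_num)
        results.append(infer(sample))
    return statistics.mean(results)
```

Using `len` as a dummy inference routine, `evaluate_worker_num(lists, 5, len)` returns 5, confirming that exactly WorkerNum lists are drawn each run.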

From Figure 2, we notice that IMLK, IMLK-I, and IMLK-ED significantly perform better than KeyRank in all groups in terms of P, R, and F1 score. We also notice that both HILED and HILI significantly perform better than KeyRank, IMLK, IMLK-I, and IMLK-ED in all groups in terms of P, R, and F1 score. Between HILED and HILI, except the comparisons in Group-3, Group-4, and Group-5 when the values of WorkerNum are 5, 6, and 7 (the situation of WorkerNum = 7 only occurs in Group-3), in terms of P, R, and F1 score, HILED always performs better than HILI. Moreover, we notice that, with the increment of WorkerNum, the performance of IMLK, IMLK-I, IMLK-ED, HILI, and HILED has a rising trend. Therefore, we can conclude that (1) both HILED and HILI perform better than IMLK, IMLK-I, and IMLK-ED; (2) HILED performs a little better than HILI; (3) WorkerNum does influence the inferred results; and (4) employing crowdsourcing annotation is a feasible and effective way for training sample labelling.

3.2. Crowdsourcing Experiment with Random Ranking. For each published L-HIT in the Crowdsourcing experiment with Descending Ranking (denoted as CDR) in Section 3.1, the 15 (or fewer) candidates listed in the part of candidate option are ordered according to their scores assigned by KeyRank, from high to low. Is there any relevancy between the order manners of the listed candidates and the improvement performance of crowdsourcing annotation?

In order to explore whether there is such a relevancy between them, we create another 100 L-HITs using the selected 100 representative abstracts mentioned in Section 3.1. Meanwhile, we also request 10 responses for each L-HIT from 10 different workers. For each L-HIT, the 15 (or fewer) candidates are randomly listed in the part of candidate option. We name the experiments conducted in this section the Crowdsourcing experiment with Random Ranking (denoted as CRR). To make a fair evaluation, all experimental parameters of CRR follow those of CDR. All comparisons among KeyRank, IMLK, HILED, and HILI in terms of P, R, and F1 scores are shown in Figure 3.

From Figure 3, we can see that IMLK, HILED, and HILI in CRR always significantly perform better than KeyRank in terms of P, R, and F1 score. It proves once again that employing crowdsourcing annotation is a feasible and


effective way for training sample labelling. However, we notice that the performance of IMLK, HILED, and HILI in CRR is worse than that of these algorithms in CDR, which proves that the order manners of the listed candidates do influence the improvement performance of crowdsourcing annotation, and the descending order manner is more effective than the random one.

3.3. Discussion

The Proper Number of Workers. Either CDR or CRR shows us that, with an increment of WorkerNum, the improvement performance of crowdsourcing annotation has a rising trend. However, more workers do not mean more suitability. On the one hand, more workers may result in more latency; for instance, workers may be distracted, or tasks may not be appealing to enough workers. On the other hand, more workers mean more monetary cost, since crowdsourcing annotation is not free; it is just a cheaper way to label sufficient training samples timely. Hence, the trade-off among quality, latency, and cost controls needs to be considered and balanced. The experimental results show that the proper number of workers varies from 6 to 8, because the improvement performance of crowdsourcing annotation at these stages is relatively stable and the quantity is appropriate to avoid high latency and cost.

The Descending and Random Ranking Manners. The experimental results demonstrate that the descending ranking manner performs better than the random one. The reason may be that workers have limited patience, since they are not trained. Normally, workers just focus on the top 5 (or fewer) candidates listed in the part of candidate option. If they do not find any proper one(s) from the top few candidates, they may lose patience to read the remaining ones, so that they would select randomly or supplement option(s) in the part of candidate supplement to complete the current L-HIT.

[Figure 2 residue: three panels of line charts, each plotting Precision (%), Recall (%), and F1 value (%) against the number of workers (3-9) for KeyRank, IMLK, IMLK-I, IMLK-ED, HILED, and HILI.]

Figure 2: Comparisons among KeyRank, IMLK, IMLK-I, IMLK-ED, HILED, and HILI in all groups: (a) Group-3; (b) Group-4; (c) Group-5.


However, the randomly selected one(s) may not be proper, and the supplementary one(s) may repeat candidates listed in the part of candidate option. Therefore, the loss of accuracy happens.

4. Related Work

Recommendation models [18] have been widely applied in many domains, such as complex systems [19, 20], Quality of Service (QoS) prediction [21, 22], reliability detection for real-time systems [23], social networks [24-26], and others [27-29]. Among existing recommendation models, the supervised learning-based ones have increasingly attracted attention because of their effectiveness. However, it is well known that the supervised learning-based recommendation models suffer from the quality of training samples. Therefore, labelling sufficient training samples timely and accurately in the era of big data becomes an important foundation for supervised learning-based recommendation. Since this paper focuses on labelling training samples for keyphrase extraction by utilizing crowdsourcing annotation, the related work will be introduced in terms of keyphrase extraction and crowdsourcing annotation.

Most original works labelling keyphrases simply selected single or contiguous words with a high frequency, such as KEA [14]. Yet these single or contiguous words do not always deliver the main points discussed in a text. Study [30] demonstrated that semantic relations in context can help extract high-quality keyphrases. Hence, some research studies employed knowledge bases and ontologies to obtain

[Figure 3 residue: three panels of line charts, each plotting Precision (%), Recall (%), and F1 value (%) against the number of workers (3-9) for KeyRank and for IMLK, HILED, and HILI in both CRR and CDR.]

Figure 3: Comparisons among KeyRank, IMLK, HILED, and HILI in CRR and CDR. IMLKr, HILEDr, and HILIr denote the performance of algorithms IMLK, HILED, and HILI in CRR; IMLKd, HILEDd, and HILId denote the performance of algorithms IMLK, HILED, and HILI in CDR: (a) Group-3; (b) Group-4; (c) Group-5.


semantic relations in context to improve the quality of extracting keyphrases [31]. It is obvious that the semantic relations obtained by these methods are restricted by the corresponding knowledge bases and ontologies. Studies [32, 33] utilized graph-based ranking methods to label keyphrases, in which a keyphrase's importance is determined by its semantic relatedness to others. As they just aggregate keyphrases from one single document, the corresponding semantic relatedness is not stable and could not accurately reveal the "relatedness" between keyphrases in general. Studies [34, 35] applied sequential pattern mining with wildcards to label keyphrases, since wildcards provide gap constraints with flexibility for capturing semantic relations in context. However, most of them are computationally expensive, as they need to repeatedly scan the whole document. In addition, they require users to explicitly specify appropriate gap constraints beforehand, which is time-consuming and not realistic. According to the common sense that words do not repeatedly appear in an effective keyphrase, KeyRank [15] converted the repeated scanning operation into a calculating model and significantly reduced time consumption. However, it is also a frequency-based algorithm that may lose important entities with low frequencies. To sum up, machine annotation can label enough training samples timely, but the labels do not meet the requirement of high quality because of limited machine intelligence. Hiring domain experts can achieve a high accuracy; however, it requires a long time as well as more resources. Therefore, it is natural to think of utilizing crowdsourcing annotation, which is a new way of human intelligence to participate in machine computing at a relatively low price, to label sufficient training samples timely and accurately.

Studies [6-8] showed that crowdsourcing brings great opportunities to machine learning as well as its related research fields. With the appearance of crowdsourcing platforms such as MTurk [10] and CrowdFlower [36], crowdsourcing has taken off in a wide range of applications, for example, entity resolution [37] and sentiment analysis [38]. Despite the diversity of applications, they all employ crowdsourcing annotation at low cost to collect data (labels of training samples) to resolve corresponding intelligent problems. In addition, many crowdsourcing annotation-based systems (frameworks) have been proposed to resolve computer-hard and intelligent tasks. By utilizing crowdsourcing annotation-based methods, CrowdCleaner [39] can detect and repair errors that usually cannot be solved by traditional data integration and cleaning techniques. CrowdPlanner [40] recommends the best route with respect to the knowledge of experienced drivers. AggNet [12] is a novel crowdsourcing annotation-based aggregation framework, which asks workers to detect mitosis in breast cancer histology images after training the crowd with a few examples.

Since some individuals in the crowd may yield relatively low-quality answers or even noise, much research focuses on how to infer the ground truth according to labels provided by workers [9]. Zheng et al. [41] employed a domain-sensitive worker model to accurately infer the ground truth based on two principles: (1) a label provided by a worker is trusted if the worker is a domain expert on the corresponding tasks, and (2) a worker is a domain expert if he often correctly completes tasks related to the specific domain. Zheng et al. [42] provided a detailed survey on ground truth inference in crowdsourcing annotation and performed an in-depth analysis of 17 existing methods. Zhang et al. tried to utilize active learning and label noise correction to improve the quality of truth inference [43-45]. One of our preliminary works [13] treated the ground truth inference of labelling keyphrases as an integrating and ranking process and proposed three novel algorithms: IMLK, IMLK-I, and IMLK-ED. However, these three algorithms ignore three inherent properties of a keyphrase capturing a point expressed by the text, which are meaningfulness, uncertainty, and uselessness.

5. Conclusions

This paper focuses on labelling training samples for keyphrase extraction by utilizing crowdsourcing annotation. We designed novel crowdsourcing mechanisms to create corresponding crowdsourcing annotation-based tasks for training sample labelling and proposed two entropy-based inference algorithms (HILED and HILI) to improve the quality of labelled training samples. The experimental results showed that crowdsourcing annotation achieves more effective performance improvement than the approach of machine annotation (i.e., KeyRank) does. In addition, we demonstrated that the ranking manners of the candidates listed in the part of candidate option do influence the improvement performance of crowdsourcing annotation, and the descending ranking manner is more effective than the random one. In the future, we will keep focusing on inference algorithms improving the quality of labelled training samples.

Data Availability

The data used in this study can be accessed via https://github.com/snkim/AutomaticKeyphraseExtraction.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was partially supported by the National Key R&D Program of China (grant no. 2019YFB1704101), the National Natural Science Foundation of China (grant nos. U1936220 and 31771679), the Anhui Foundation for Science and Technology Major Project (grant nos. 18030901034 and 201904e01020006), the Key Laboratory of Agricultural Electronic Commerce, Ministry of Agriculture of China (grant nos. AEC2018003 and AEC2018006), the 2019 Anhui University Collaborative Innovation Project (GXXT-2019-013), and the Hefei Major Research Project of Key Technology (J2018G14).


References

[1] X. Xu, Q. Liu, Y. Luo et al., "A computation offloading method over big data for IoT-enabled cloud-edge computing," Future Generation Computer Systems, vol. 95, pp. 522-533, 2019.
[2] J. Zhou, J. Sun, P. Cong et al., "Security-critical energy-aware task scheduling for heterogeneous real-time MPSoCs in IoT," IEEE Transactions on Services Computing, 2019, in press.
[3] J. Zhou, X. S. Hu, Y. Ma, J. Sun, T. Wei, and S. Hu, "Improving availability of multicore real-time systems suffering both permanent and transient faults," IEEE Transactions on Computers, vol. 68, no. 12, pp. 1785-1801, 2019.
[4] Y. Zhang, K. Wang, Q. He et al., "Covering-based web service quality prediction via neighborhood-aware matrix factorization," IEEE Transactions on Services Computing, 2019, in press.
[5] Y. Zhang, G. Cui, S. Deng et al., "Efficient query of quality correlation for service composition," IEEE Transactions on Services Computing, 2019, in press.
[6] M. Lease, "On quality control and machine learning in crowdsourcing," in Proceedings of the Workshops at the 25th AAAI Conference on Artificial Intelligence, pp. 97-102, San Francisco, CA, USA, January 2011.
[7] J. Zhang, X. Wu, and V. S. Sheng, "Learning from crowdsourced labeled data: a survey," Artificial Intelligence Review, vol. 46, no. 4, pp. 543-576, 2016.
[8] V. S. Sheng, F. Provost, and P. G. Ipeirotis, "Get another label? Improving data quality and data mining using multiple, noisy labelers," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614-622, August 2008.
[9] G. Li, J. Wang, Y. Zheng, and M. J. Franklin, "Crowdsourced data management: a survey," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 9, pp. 2296-2319, 2016.
[10] MTurk, 2020, https://www.mturk.com.
[11] G. Li, Y. Zheng, J. Fan, J. Wang, and R. Cheng, "Crowdsourced data management: overview and challenges," in Proceedings of the 2017 ACM International Conference on Management of Data, pp. 1711-1716, Association for Computing Machinery, New York, NY, USA, 2017.
[12] S. Albarqouni, C. Baur, F. Achilles, V. Belagiannis, S. Demirci, and N. Navab, "AggNet: deep learning from crowds for mitosis detection in breast cancer histology images," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1313-1321, 2016.
[13] Q. Wang, V. S. Sheng, and Z. Liu, "Exploring methods of assessing influence relevance of news articles," in Cloud Computing and Security, pp. 525-536, Springer, Berlin, Germany, 2018.
[14] I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, and C. G. Nevill-Manning, "KEA: practical automatic keyphrase extraction," in Proceedings of the 4th ACM Conference on Digital Libraries, pp. 1-23, Berkeley, CA, USA, August 1999.
[15] Q. Wang, V. S. Sheng, and X. Wu, "Document-specific keyphrase candidate search and ranking," Expert Systems with Applications, vol. 97, pp. 163-176, 2018.
[16] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, no. 3, pp. 379-423, 1948.
[17] INSPEC, 2020, https://github.com/snkim/AutomaticKeyphraseExtraction.
[18] A. Ramlatchan, M. Yang, Q. Liu, M. Li, J. Wang, and Y. Li, "A survey of matrix completion methods for recommendation systems," Big Data Mining and Analytics, vol. 1, no. 4, pp. 308-323, 2018.
[19] X. Xu, R. Mo, F. Dai, W. Lin, S. Wan, and W. Dou, "Dynamic resource provisioning with fault tolerance for data-intensive meteorological workflows in cloud," IEEE Transactions on Industrial Informatics, 2019.
[20] L. Qi, Y. Chen, Y. Yuan, S. Fu, X. Zhang, and X. Xu, "A QoS-aware virtual machine scheduling method for energy conservation in cloud-based cyber-physical systems," World Wide Web, vol. 23, no. 2, pp. 1275-1297, 2019.
[21] Y. Zhang, C. Yin, Q. Wu et al., "Location-aware deep collaborative filtering for service recommendation," IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019.
[22] L. Qi, Q. He, F. Chen et al., "Finding all you need: web APIs recommendation in web of things through keywords search," IEEE Transactions on Computational Social Systems, vol. 6, no. 5, pp. 1063-1072, 2019.
[23] J. Zhou, J. Sun, X. Zhou et al., "Resource management for improving soft-error and lifetime reliability of real-time MPSoCs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no. 12, pp. 2215-2228, 2019.
[24] G. Liu, Y. Wang, M. A. Orgun et al., "Finding the optimal social trust path for the selection of trustworthy service providers in complex social networks," IEEE Transactions on Services Computing, vol. 6, no. 2, pp. 152-167, 2011.
[25] G. Liu, Y. Wang, and M. A. Orgun, "Optimal social trust path selection in complex social networks," in Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, GA, USA, July 2010.
[26] G. Liu, K. Zheng, Y. Wang et al., "Multi-constrained graph pattern matching in large-scale contextual social graphs," in Proceedings of the 2015 IEEE 31st International Conference on Data Engineering, pp. 351-362, IEEE, Seoul, South Korea, April 2015.
[27] C. Zhang, M. Yang, J. Lv, and W. Yang, "An improved hybrid collaborative filtering algorithm based on tags and time factor," Big Data Mining and Analytics, vol. 1, no. 2, pp. 128-136, 2018.
[28] Y. Liu, S. Wang, M. S. Khan, and J. He, "A novel deep hybrid recommender system based on auto-encoder with neural collaborative filtering," Big Data Mining and Analytics, vol. 1, no. 3, pp. 211-221, 2018.
[29] H. Liu, H. Kou, C. Yan, and L. Qi, "Link prediction in paper citation network to construct paper correlation graph," EURASIP Journal on Wireless Communications and Networking, vol. 2019, no. 1, 2019.
[30] G. Ercan and I. Cicekli, "Using lexical chains for keyword extraction," Information Processing and Management, vol. 43, no. 6, pp. 1705-1714, 2007.
[31] S. Xu, S. Yang, and C. M. Lau, "Keyword extraction and headline generation using novel word feature," in Proceedings of the 24th AAAI Conference on Artificial Intelligence, pp. 1461-1466, Atlanta, GA, USA, 2010.
[32] R. Mihalcea and P. Tarau, "TextRank: bringing order into texts," UNT Scholarly Works, vol. 43, no. 6, pp. 404-411, 2004.
[33] K. S. Hasan and V. Ng, "Automatic keyphrase extraction: a survey of the state of the art," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 1262-1273, Baltimore, MD, USA, June 2014.
[34] F. Xie, X. Wu, and X. Zhu, "Document-specific keyphrase extraction using sequential patterns with wildcards," in Proceedings of the 2014 IEEE International Conference on Data Mining, pp. 1055-1060, Shenzhen, China, December 2014.
[35] J. Feng, F. Xie, X. Hu, P. Li, J. Cao, and X. Wu, "Keyword extraction based on sequential pattern mining," in Proceedings of the 3rd International Conference on Internet Multimedia Computing and Service, pp. 34-38, Chengdu, China, August 2011.
[36] CrowdFlower, 2020, http://www.crowdflower.com.
[37] S. Wang, X. Xiao, and C. Lee, "Crowd-based deduplication: an adaptive approach," in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1263-1277, Melbourne, Australia, June 2015.
[38] Y. Zheng, J. Wang, G. Li, R. Cheng, and J. Feng, "QASCA: a quality-aware task assignment system for crowdsourcing applications," in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1031-1046, Melbourne, Australia, June 2015.
[39] Y. Tong, C. C. Cao, C. J. Zhang, Y. Li, and L. Chen, "CrowdCleaner: data cleaning for multi-version data on the web via crowdsourcing," in Proceedings of the 2014 IEEE 30th International Conference on Data Engineering, pp. 1182-1185, Chicago, IL, USA, April 2014.
[40] H. Su, K. Zheng, J. Huang et al., "A crowd-based route recommendation system," in Proceedings of the 2014 IEEE 30th International Conference on Data Engineering, pp. 1144-1155, Chicago, IL, USA, May 2014.
[41] Y. Zheng, G. Li, and R. Cheng, "DOCS: domain-aware crowdsourcing system," Proceedings of the VLDB Endowment, vol. 10, no. 4, pp. 361-372, 2016.
[42] Y. Zheng, G. Li, Y. Li, C. Shan, and R. Cheng, "Truth inference in crowdsourcing: is the problem solved?," Proceedings of the VLDB Endowment, vol. 10, no. 5, pp. 541-552, 2017.
[43] J. Wu, S. Zhao, V. S. Sheng et al., "Weak-labeled active learning with conditional label dependence for multilabel image classification," IEEE Transactions on Multimedia, vol. 19, no. 6, pp. 1156-1169, 2017.
[44] B. Nicholson, J. Zhang, V. S. Sheng et al., "Label noise correction methods," in Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 1-9, IEEE, Paris, France, October 2015.
[45] J. Zhang, V. S. Sheng, Q. Li, J. Wu, and X. Wu, "Consensus algorithms for biased labeling in crowdsourcing," Information Sciences, vol. 382-383, pp. 254-273, 2017.

Page 4: LabellingTrainingSamplesUsingCrowdsourcing ...downloads.hindawi.com/journals/complexity/2020/1670483.pdf · Crowdsourcing anno-tationhasfive steps:(a) ... m m m m KEEEEmEEEE-Figure

Some proper keyphrases may not be listed in the part ofcandidate option because of various reasons for instancephrases with low appearing frequencies or ones with lowscores assigned by machine annotation (erefore for eachsingle L-HIT there is a candidate supplement part that letsworkers supplement lost keyphrases as well as the corre-sponding importance rankings (surrounded by a yellowrectangle) (e part of candidate supplement also has twotask types which are fill-in-blank (ie supplementing lostkeyphrase(s)) and rating (ie selecting importance rank-ings) respectively Note that supplementing the lost key-phrase(s) is an optional job for workers

22 Inference Algorithms In this paper inferring a truthkeyphrase list is still viewed as a process of first-integratinglast-grading phrases Although algorithms IMLK IMLK-Iand IMLK-ED [13] are suitable for inferring a truth key-phrase list from multiple lists of keyphrases they neglect tocalculate three inherent attributes of a keyphrase capturing atopic delivered by the training samples which are mean-ingfulness uncertainty and uselessness [14] Study [15]shows that calculating the information entropy [16] of akeyphrase is a significant way to measure these three in-herent attributes of a keyphrase (erefore we utilize theinformation entropy and corresponding equations in [15] tomeasure the three inherent properties of a keyphrase cap-turing a topic (e symbols used for ground truth inferencealgorithms are shown in Table 1

The attribute meaningfulness of k in T denotes k's positive probability of capturing a topic expressed by T. Normally, it is measured by the distribution of k as an independent keyphrase, since the more times k_indie occurs, the bigger the positive probability that the topic is delivered by k. The attribute meaningfulness is defined as follows:

P_pos = NKI/TN, if 0 < NKI < TN; P_pos = 0, if NKI = 0, (1)

where P_pos = 0 covers the case that k does not exist in the corpus.

As the name implies, the attribute uncertainty of k in T denotes k's unsteadiness of capturing a topic expressed by T, which is usually measured by the distribution of k as a sub-keyphrase. A sub-keyphrase means it can be extended into another keyphrase with other words. Note that (a) different keyphrases may express a same point with different expression depth and (b) different keyphrases may express totally different points. For example, although the keyphrase "topic model" is a sub-keyphrase of "topic-aware propagation model," they express different points. Intuitively, the more times k_sub occurs, the more unsteadily the topic is delivered by k. The attribute uncertainty is defined as follows:

P_sub = NKS/TN, if 0 < NKS < TN; P_sub = 0, if NKS = 0. (2)

The attribute uselessness of k in T denotes k's negative probability of capturing a topic expressed by T, which is defined as follows:

P_neg = 1 - P_pos - P_sub. (3)

In conclusion, the information entropy of k can completely measure its three inherent attributes using equation (4), or equation (5) when the situation P_sub = 0 occurs:

H(k) = P_pos log(1/P_pos) + P_sub log(1/P_sub) + P_neg log(1/P_neg), (4)

H(k) = P_pos log(1/P_pos) + P_neg log(1/P_neg). (5)
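The attribute probabilities and the entropy of equations (1)-(5) can be sketched in Python as follows. The log base (base 2 here) and the convention of treating a zero-probability term as contributing 0, which subsumes equation (5)'s special case, are our assumptions, since the paper does not state them:

```python
import math

def keyphrase_entropy(nki: int, nks: int, tn: int) -> float:
    """Information entropy H(k) of a keyphrase, per equations (1)-(5).

    nki: number of times k occurs as an independent keyphrase (NKI)
    nks: number of times k occurs as a sub-keyphrase (NKS)
    tn:  total number of keyphrases in the corpus (TN)
    """
    p_pos = nki / tn if nki > 0 else 0.0   # meaningfulness, equation (1)
    p_sub = nks / tn if nks > 0 else 0.0   # uncertainty, equation (2)
    p_neg = 1.0 - p_pos - p_sub            # uselessness, equation (3)
    # Equation (5) drops the P_sub term when P_sub = 0; skipping every
    # zero probability handles that case uniformly.
    return sum(p * math.log2(1.0 / p) for p in (p_pos, p_sub, p_neg) if p > 0)
```

For instance, a keyphrase that never occurs has P_neg = 1 and therefore zero entropy.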

Finally, by combining the information entropy, the algorithms HILED and HILI are proposed based on the algorithms IMLK-ED and IMLK-I stated in [13], respectively, and the corresponding equations recalculating the keyphrases' grades are modified as follows:

G_ij = (Σ_{j=1..m} Q_j^ED × (RS_ij)^2 × H(k_ij)) / m,

G_ij = (Σ_{j=1..m} Q_j^I × (RS_ij)^2 × H(k_ij)) / m, (6)

where H(k_ij) denotes the information entropy of the ith keyphrase in the jth keyphrase list, RS_ij denotes the importance score provided by workers, Q_j^ED denotes the quality of the worker who provides the jth keyphrase list in the algorithm HILED, Q_j^I denotes the quality of the worker who provides the jth keyphrase list in the algorithm HILI, and m denotes the total number of keyphrase lists provided by workers.
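The grade recalculation of equation (6) can be sketched as follows; the function name and the representation of worker qualities, importance scores, and entropies as parallel lists are illustrative assumptions, with a score of 0 standing in for a list that omits the keyphrase:

```python
def recalculate_grade(worker_quality, importance_scores, entropies):
    """Grade of one candidate keyphrase across m worker lists, per equation (6).

    worker_quality[j]   : quality Q_j of the worker behind list j
                          (Q^ED for HILED, Q^I for HILI)
    importance_scores[j]: importance score RS_ij the jth list assigns
                          to this keyphrase (0 if the list omits it)
    entropies[j]        : information entropy H(k_ij) of the keyphrase
    """
    m = len(worker_quality)
    return sum(q * rs ** 2 * h
               for q, rs, h in zip(worker_quality, importance_scores, entropies)) / m
```

HILED and HILI then differ only in how the worker qualities fed into this sum are estimated.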

3. Experiments and Discussion

In this section, we first introduce experiments with different order manners, which are the descending and the random ones, and then we discuss the factors influencing the performance improvement of crowdsourcing annotation.

Table 1: Symbols.

No. | Symbol | Presentation
1 | k | A keyphrase
2 | T | A training sample
3 | k_indie | An independent keyphrase
4 | P_pos | The attribute meaningfulness
5 | NKI | The number of times k_indie occurs
6 | k_sub | A sub-keyphrase
7 | P_sub | The attribute uncertainty
8 | NKS | The number of times k_sub occurs
9 | TN | The total number of keyphrases in the corpus
10 | P_neg | The attribute uselessness
11 | H(k) | The information entropy of k

3.1. Crowdsourcing Experiment with Descending Ranking. Since IMLK, IMLK-I, and IMLK-ED proposed in [13] and KeyRank proposed in [15] perform very well, we employed them as baselines. KeyRank is one of the machine annotation methods, and its performance is evaluated in [15] on the dataset INSPEC [17], containing 2000 abstracts (1000 for training, 500 for development, and 500 for testing). Considering the cost and latency of workers, we chose 100 abstracts from the 500 test ones in the dataset INSPEC, where KeyRank performs the best, as the data for our multiple crowdsourcing experiments. In addition, the gold standards of these 100 test abstracts are treated as labelled ones from expert annotation. As we said before, each single abstract corresponds to a single L-HIT; that is, we have 100 corresponding L-HITs. The part of candidate option in each L-HIT lists 15 (or fewer) candidates with descending ranking. These candidates are keyphrases labelled and weighted by KeyRank. Again, in order to overcome the shortage that the quality of an individual worker for keyphrase extraction is sometimes rather low, we request 10 responses for each L-HIT from 10 different workers. That is, the whole experiment has 1000 published L-HITs, since each one has to be published ten times on MTurk. Each L-HIT costs 5 cents, and the whole experiment costs 50 dollars in total. According to feedback from the crowdsourcing platform MTurk, more than four out of five workers completed the optional "candidate supplement" tasks. The minimum time that a single crowdsourcing task required is 50 seconds, and the maximum time is 5 minutes. The time required for most of the crowdsourcing tasks was between 90 and 200 seconds.

The precision (P), recall (R), and F1 score are employed as performance metrics. P, R, and F1 score are defined as follows:

P = correct/labelled,
R = correct/expert,
F1 = (2 × P × R)/(P + R), (7)

where correct denotes the number of correct keyphrases obtained from crowdsourcing annotation, labelled denotes the number of keyphrases obtained from crowdsourcing annotation, and expert denotes the number of keyphrases obtained from expert annotation. Normally, expert for most abstracts varies from 3 to 5, so the value of labelled in our experiment varies from 3 to 5.
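Equation (7) can be sketched as a small Python function; exact string matching between crowdsourced and expert keyphrases is an assumption made here for illustration, as the paper does not specify the matching procedure:

```python
def evaluate(inferred, gold):
    """Precision, recall, and F1 per equation (7).

    inferred: keyphrases produced by crowdsourcing annotation (labelled)
    gold:     keyphrases from expert annotation (expert)
    """
    correct = len(set(inferred) & set(gold))  # correct keyphrases
    p = correct / len(inferred)               # P = correct / labelled
    r = correct / len(gold)                   # R = correct / expert
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1
```

For example, three inferred keyphrases of which two appear among four gold keyphrases give P = 2/3, R = 1/2, and F1 = 4/7.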

After the 10 responses to each L-HIT are obtained from 10 different workers, the algorithms IMLK, IMLK-I, IMLK-ED, HILED, and HILI are applied to infer a truth keyphrase list from these responses. The inferred results of IMLK, IMLK-I, IMLK-ED, HILED, and HILI are compared with those of KeyRank in terms of P, R, and F1 score. Besides, in order to evaluate the performance of KeyRank, IMLK, IMLK-I, IMLK-ED, HILED, and HILI clearly, the comparisons are divided into three different groups, i.e., Group-3, Group-4, and Group-5. For example, Group-4 is named as such because the number of labelled is 4 when it reports the comparisons among KeyRank, IMLK, IMLK-I, IMLK-ED, HILED, and HILI in terms of P, R, and F1 score, respectively.

In addition, the relations between the number of workers (denoted as WorkerNum) and the inferred results are also explored by conducting another seven comparisons in each group. The values of WorkerNum are set to 3, 4, 5, 6, 7, 8, and 9, respectively. Since each abstract has 10 keyphrase lists provided by 10 different workers, in order to get rid of the impact of workers' order, each algorithm on each abstract is run ten times under a certain WorkerNum, and the corresponding number of keyphrase lists is randomly selected from its 10 keyphrase lists each time. For example, when WorkerNum is 5, we randomly select 5 keyphrase lists from the 10 keyphrase lists. All comparisons of all groups among KeyRank, IMLK, IMLK-I, IMLK-ED, HILED, and HILI are shown in Figure 2.
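The repeated random subsampling described above can be sketched as follows; `infer_truth` is a hypothetical placeholder for any of the inference algorithms (IMLK, HILED, ...), and the fixed seed is only for reproducibility of the sketch:

```python
import random

def averaged_run(keyphrase_lists, worker_num, infer_truth, repeats=10, seed=0):
    """Run an inference algorithm over random WorkerNum-sized subsets.

    keyphrase_lists: the 10 keyphrase lists collected for one abstract
    worker_num:      WorkerNum, the number of lists drawn per run
    infer_truth:     maps a list of keyphrase lists to an inferred truth list
    Returns the per-repeat inferred results; P, R, and F1 are then
    averaged over these repeats downstream.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        subset = rng.sample(keyphrase_lists, worker_num)  # random worker order
        results.append(infer_truth(subset))
    return results
```

Averaging the metrics over the ten repeats removes the dependence on which particular workers happen to be drawn.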

From Figure 2, we notice that IMLK, IMLK-I, and IMLK-ED perform significantly better than KeyRank in all groups in terms of P, R, and F1 score. We also notice that both HILED and HILI perform significantly better than KeyRank, IMLK, IMLK-I, and IMLK-ED in all groups in terms of P, R, and F1 score. Between HILED and HILI, except for the comparisons in Group-3, Group-4, and Group-5 when the values of WorkerNum are 5, 6, and 7 (the situation of WorkerNum = 7 only occurs in Group-3), HILED always performs better than HILI in terms of P, R, and F1 score. Moreover, we notice that with the increment of WorkerNum, the performance of IMLK, IMLK-I, IMLK-ED, HILI, and HILED has a rising trend. Therefore, we can conclude that (1) both HILED and HILI perform better than IMLK, IMLK-I, and IMLK-ED; (2) HILED performs a little better than HILI; (3) WorkerNum does influence the inferred results; and (4) employing crowdsourcing annotation is a feasible and effective way for training sample labelling.

3.2. Crowdsourcing Experiment with Random Ranking. For each published L-HIT in the Crowdsourcing experiment with Descending Ranking (denoted as CDR) in Section 3.1, the 15 (or fewer) candidates listed in the part of candidate option are ordered according to their scores assigned by KeyRank, from high to low. Is there any relevancy between the order manner of the listed candidates and the improvement performance of crowdsourcing annotation?

In order to explore whether there is such a relevancy between them, we created another 100 L-HITs using the selected 100 representative abstracts mentioned in Section 3.1. Meanwhile, we also requested 10 responses for each L-HIT from 10 different workers. For each L-HIT, the 15 (or fewer) candidates are randomly listed in the part of candidate option. We name the experiments conducted in this section the Crowdsourcing experiment with Random Ranking (denoted as CRR). To make a fair evaluation, all experimental parameters of CRR follow those of CDR. All comparisons among KeyRank, IMLK, HILED, and HILI in terms of P, R, and F1 score are shown in Figure 3.

From Figure 3, we can see that IMLK, HILED, and HILI in CRR always perform significantly better than KeyRank in terms of P, R, and F1 score. This proves once again that employing crowdsourcing annotation is a feasible and effective way for training sample labelling. However, we notice that the performance of IMLK, HILED, and HILI in CRR is worse than that of these algorithms in CDR, which proves that the order manner of the listed candidates does influence the improvement performance of crowdsourcing annotation, and the descending order manner is more effective than the random one.

3.3. Discussion

The Proper Number of Workers. Either CDR or CRR shows us that, with an increment of WorkerNum, the improvement performance of crowdsourcing annotation has a rising trend. However, more workers do not mean more suitability. On the one hand, more workers may result in more latency; for instance, workers may be distracted, or tasks may not be appealing to enough workers. On the other hand, more workers mean more monetary cost, since crowdsourcing annotation is not free; it is just a cheaper way to label sufficient training samples timely. Hence, the trade-off among quality, latency, and cost controls needs to be considered and balanced. The experimental results show that the proper number of workers varies from 6 to 8, because the improvement performance of crowdsourcing annotation at these stages is relatively stable and the quantity is appropriate to avoid high latency and cost.

The Descending and Random Ranking Manners. The experimental results demonstrate that the descending ranking manner performs better than the random one. The reason may be that workers have limited patience, since they are not trained. Normally, workers just focus on the top 5 (or fewer) candidates listed in the part of candidate option. If they do not find any proper one(s) among the top few candidates, they may lose the patience to read the remaining ones, so that they would select randomly or supplement option(s) in the part of candidate supplement to complete the current L-HIT.

Figure 2: Comparisons among KeyRank, IMLK, IMLK-I, IMLK-ED, HILED, and HILI in all groups: (a) Group-3; (b) Group-4; (c) Group-5. [Figure: precision (%), recall (%), and F1 value (%) plotted against the number of workers (3 to 9), one row of three plots per group.]


However, the randomly selected one(s) may not be proper, and the supplementary one(s) may be repeated with the candidates listed in the part of candidate option. Therefore, the loss of accuracy happens.

4. Related Work

Recommendation models [18] have been widely applied in many domains, such as complex systems [19, 20], Quality of Service (QoS) prediction [21, 22], reliability detection for real-time systems [23], social networks [24-26], and others [27-29]. Among existing recommendation models, the supervised learning-based ones have increasingly attracted attention because of their effectiveness. However, it is well known that the supervised learning-based recommendation models suffer from the quality of training samples. Therefore, labelling sufficient training samples timely and accurately in the era of big data becomes an important foundation of supervised learning-based recommendation. Since this paper focuses on labelling training samples for keyphrase extraction by utilizing crowdsourcing annotation, the related work will be introduced in terms of keyphrase extraction and crowdsourcing annotation.

Most original works labelling keyphrases simply selected single or contiguous words with a high frequency, such as KEA [14]. Yet these single or contiguous words do not always deliver the main points discussed in a text. Study [30] demonstrated that semantic relations in context can help extract high-quality keyphrases. Hence, some research studies employed knowledge bases and ontologies to obtain semantic relations in context to improve the quality of extracted keyphrases [31]. It is obvious that the semantic relations obtained by these methods are restricted by the corresponding knowledge bases and ontologies. Studies [32, 33] utilized graph-based ranking methods to label keyphrases, in which a keyphrase's importance is determined by its semantic relatedness to others. As they just aggregate keyphrases from one single document, the corresponding semantic relatedness is not stable and cannot accurately reveal the "relatedness" between keyphrases in general. Studies [34, 35] applied sequential pattern mining with wildcards to label keyphrases, since wildcards provide gap constraints with flexibility for capturing semantic relations in context. However, most of them are computationally expensive, as they need to repeatedly scan the whole document. In addition, they require users to explicitly specify appropriate gap constraints beforehand, which is time-consuming and not realistic. Based on the common sense that words do not repeatedly appear in an effective keyphrase, KeyRank [15] converted the repeated scanning operation into a calculating model and significantly reduced time consumption. However, it is also a frequency-based algorithm that may lose important entities with low frequencies. To sum up, machine annotation can label enough training samples timely, but it does not meet the requirement of high quality because of limited machine intelligence. Hiring domain experts can achieve a high accuracy; however, it requires a long time as well as more resources. Therefore, it is natural to think of utilizing crowdsourcing annotation, a new way for human intelligence to participate in machine computing at a relatively low price, to label sufficient training samples timely and accurately.

Figure 3: Comparisons among KeyRank, IMLK, HILED, and HILI in CRR and CDR: (a) Group-3; (b) Group-4; (c) Group-5. IMLKr, HILEDr, and HILIr denote the performance of the algorithms IMLK, HILED, and HILI in CRR; IMLKd, HILEDd, and HILId denote the performance of the algorithms IMLK, HILED, and HILI in CDR. [Figure: precision (%), recall (%), and F1 value (%) plotted against the number of workers (3 to 9), one row of three plots per group.]

Studies [6-8] showed that crowdsourcing brings great opportunities to machine learning as well as its related research fields. With the appearance of crowdsourcing platforms such as MTurk [10] and CrowdFlower [36], crowdsourcing has taken off in a wide range of applications, for example, entity resolution [37] and sentiment analysis [38]. Despite the diversity of applications, they all employ crowdsourcing annotation at low cost to collect data (labels of training samples) to resolve the corresponding intelligent problems. In addition, many crowdsourcing annotation-based systems (frameworks) have been proposed to resolve computer-hard and intelligent tasks. By utilizing crowdsourcing annotation-based methods, CrowdCleaner [39] can detect and repair errors that usually cannot be solved by traditional data integration and cleaning techniques. CrowdPlanner [40] recommends the best route with respect to the knowledge of experienced drivers. AggNet [12] is a novel crowdsourcing annotation-based aggregation framework, which asks workers to detect mitosis in breast cancer histology images after training the crowd with a few examples.

Since some individuals in the crowd may yield relatively low-quality answers or even noise, many researches focus on how to infer the ground truth according to the labels provided by workers [9]. Zheng et al. [41] employed a domain-sensitive worker model to accurately infer the ground truth based on two principles: (1) a label provided by a worker is trusted if the worker is a domain expert on the corresponding tasks, and (2) a worker is a domain expert if he often correctly completes tasks related to the specific domain. Zheng et al. [42] provided a detailed survey on ground truth inference in crowdsourcing annotation and performed an in-depth analysis of 17 existing methods. Zhang et al. tried to utilize active learning and label noise correction to improve the quality of truth inference [43-45]. One of our preliminary works [13] treated the ground truth inference of labelling keyphrases as an integrating and ranking process and proposed three novel algorithms: IMLK, IMLK-I, and IMLK-ED. However, these three algorithms ignore three inherent properties of a keyphrase capturing a point expressed by the text, which are meaningfulness, uncertainty, and uselessness.

5. Conclusions

This paper focuses on labelling training samples for keyphrase extraction by utilizing crowdsourcing annotation. We designed novel crowdsourcing mechanisms to create corresponding crowdsourcing annotation-based tasks for training sample labelling and proposed two entropy-based inference algorithms (HILED and HILI) to improve the quality of labelled training samples. The experimental results showed that crowdsourcing annotation achieves a more effective performance improvement than the machine annotation approach (i.e., KeyRank) does. In addition, we demonstrated that the ranking manner of the candidates listed in the part of candidate option does influence the improvement performance of crowdsourcing annotation, and the descending ranking manner is more effective than the random one. In the future, we will keep focusing on inference algorithms improving the quality of labelled training samples.

Data Availability

The data used in this study can be accessed via https://github.com/snkim/AutomaticKeyphraseExtraction.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was partially supported by the National Key R&D Program of China (grant no. 2019YFB1704101), the National Natural Science Foundation of China (grant nos. U1936220 and 31771679), the Anhui Foundation for Science and Technology Major Project (grant nos. 18030901034 and 201904e01020006), the Key Laboratory of Agricultural Electronic Commerce, Ministry of Agriculture of China (grant nos. AEC2018003 and AEC2018006), the 2019 Anhui University Collaborative Innovation Project (GXXT-2019-013), and the Hefei Major Research Project of Key Technology (J2018G14).


References

[1] X. Xu, Q. Liu, Y. Luo et al., "A computation offloading method over big data for IoT-enabled cloud-edge computing," Future Generation Computer Systems, vol. 95, pp. 522-533, 2019.
[2] J. Zhou, J. Sun, P. Cong et al., "Security-critical energy-aware task scheduling for heterogeneous real-time MPSoCs in IoT," IEEE Transactions on Services Computing, 2019, in press.
[3] J. Zhou, X. S. Hu, Y. Ma, J. Sun, T. Wei, and S. Hu, "Improving availability of multicore real-time systems suffering both permanent and transient faults," IEEE Transactions on Computers, vol. 68, no. 12, pp. 1785-1801, 2019.
[4] Y. Zhang, K. Wang, Q. He et al., "Covering-based web service quality prediction via neighborhood-aware matrix factorization," IEEE Transactions on Services Computing, 2019, in press.
[5] Y. Zhang, G. Cui, S. Deng et al., "Efficient query of quality correlation for service composition," IEEE Transactions on Services Computing, 2019, in press.
[6] M. Lease, "On quality control and machine learning in crowdsourcing," in Proceedings of the Workshops at the 25th AAAI Conference on Artificial Intelligence, pp. 97-102, San Francisco, CA, USA, January 2011.
[7] J. Zhang, X. Wu, and V. S. Sheng, "Learning from crowdsourced labeled data: a survey," Artificial Intelligence Review, vol. 46, no. 4, pp. 543-576, 2016.
[8] V. S. Sheng, F. Provost, and P. G. Ipeirotis, "Get another label? Improving data quality and data mining using multiple, noisy labelers," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614-622, August 2008.
[9] G. Li, J. Wang, Y. Zheng, and M. J. Franklin, "Crowdsourced data management: a survey," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 9, pp. 2296-2319, 2016.
[10] MTurk, 2020, https://www.mturk.com.
[11] G. Li, Y. Zheng, J. Fan, J. Wang, and R. Cheng, "Crowdsourced data management: overview and challenges," in Proceedings of the 2017 ACM International Conference on Management of Data, pp. 1711-1716, Association for Computing Machinery, New York, NY, USA, 2017.
[12] S. Albarqouni, C. Baur, F. Achilles, V. Belagiannis, S. Demirci, and N. Navab, "AggNet: deep learning from crowds for mitosis detection in breast cancer histology images," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1313-1321, 2016.
[13] Q. Wang, V. S. Sheng, and Z. Liu, "Exploring methods of assessing influence relevance of news articles," in Cloud Computing and Security, pp. 525-536, Springer, Berlin, Germany, 2018.
[14] I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, and C. G. Nevill-Manning, "KEA: practical automatic keyphrase extraction," in Proceedings of the 4th ACM Conference on Digital Libraries, pp. 1-23, Berkeley, CA, USA, August 1999.
[15] Q. Wang, V. S. Sheng, and X. Wu, "Document-specific keyphrase candidate search and ranking," Expert Systems with Applications, vol. 97, pp. 163-176, 2018.
[16] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, no. 3, pp. 379-423, 1948.
[17] INSPEC, 2020, https://github.com/snkim/AutomaticKeyphraseExtraction.
[18] A. Ramlatchan, M. Yang, Q. Liu, M. Li, J. Wang, and Y. Li, "A survey of matrix completion methods for recommendation systems," Big Data Mining and Analytics, vol. 1, no. 4, pp. 308-323, 2018.
[19] X. Xu, R. Mo, F. Dai, W. Lin, S. Wan, and W. Dou, "Dynamic resource provisioning with fault tolerance for data-intensive meteorological workflows in cloud," IEEE Transactions on Industrial Informatics, 2019.
[20] L. Qi, Y. Chen, Y. Yuan, S. Fu, X. Zhang, and X. Xu, "A QoS-aware virtual machine scheduling method for energy conservation in cloud-based cyber-physical systems," World Wide Web, vol. 23, no. 2, pp. 1275-1297, 2019.
[21] Y. Zhang, C. Yin, Q. Wu et al., "Location-aware deep collaborative filtering for service recommendation," IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019.
[22] L. Qi, Q. He, F. Chen et al., "Finding all you need: web APIs recommendation in web of things through keywords search," IEEE Transactions on Computational Social Systems, vol. 6, no. 5, pp. 1063-1072, 2019.
[23] J. Zhou, J. Sun, X. Zhou et al., "Resource management for improving soft-error and lifetime reliability of real-time MPSoCs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no. 12, pp. 2215-2228, 2019.
[24] G. Liu, Y. Wang, M. A. Orgun et al., "Finding the optimal social trust path for the selection of trustworthy service providers in complex social networks," IEEE Transactions on Services Computing, vol. 6, no. 2, pp. 152-167, 2011.
[25] G. Liu, Y. Wang, and M. A. Orgun, "Optimal social trust path selection in complex social networks," in Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, GA, USA, July 2010.
[26] G. Liu, K. Zheng, Y. Wang et al., "Multi-constrained graph pattern matching in large-scale contextual social graphs," in Proceedings of the 2015 IEEE 31st International Conference on Data Engineering, pp. 351-362, IEEE, Seoul, South Korea, April 2015.
[27] C. Zhang, M. Yang, J. Lv, and W. Yang, "An improved hybrid collaborative filtering algorithm based on tags and time factor," Big Data Mining and Analytics, vol. 1, no. 2, pp. 128-136, 2018.
[28] Y. Liu, S. Wang, M. S. Khan, and J. He, "A novel deep hybrid recommender system based on auto-encoder with neural collaborative filtering," Big Data Mining and Analytics, vol. 1, no. 3, pp. 211-221, 2018.
[29] H. Liu, H. Kou, C. Yan, and L. Qi, "Link prediction in paper citation network to construct paper correlation graph," EURASIP Journal on Wireless Communications and Networking, vol. 2019, no. 1, 2019.
[30] G. Ercan and I. Cicekli, "Using lexical chains for keyword extraction," Information Processing and Management, vol. 43, no. 6, pp. 1705-1714, 2007.
[31] S. Xu, S. Yang, and C. M. Lau, "Keyword extraction and headline generation using novel word feature," in Proceedings of the 24th AAAI Conference on Artificial Intelligence, pp. 1461-1466, Atlanta, GA, USA, 2010.
[32] R. Mihalcea and P. Tarau, "TextRank: bringing order into texts," UNT Scholarly Works, vol. 43, no. 6, pp. 404-411, 2004.
[33] K. S. Hasan and V. Ng, "Automatic keyphrase extraction: a survey of the state of the art," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 1262-1273, Baltimore, MD, USA, June 2014.
[34] F. Xie, X. Wu, and X. Zhu, "Document-specific keyphrase extraction using sequential patterns with wildcards," in Proceedings of the 2014 IEEE International Conference on Data Mining, pp. 1055-1060, Shenzhen, China, December 2014.
[35] J. Feng, F. Xie, X. Hu, P. Li, J. Cao, and X. Wu, "Keyword extraction based on sequential pattern mining," in Proceedings of the 3rd International Conference on Internet Multimedia Computing and Service, pp. 34-38, Chengdu, China, August 2011.
[36] CrowdFlower, 2020, http://www.crowdflower.com.
[37] S. Wang, X. Xiao, and C. Lee, "Crowd-based deduplication: an adaptive approach," in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1263-1277, Melbourne, Australia, June 2015.
[38] Y. Zheng, J. Wang, G. Li, R. Cheng, and J. Feng, "QASCA: a quality-aware task assignment system for crowdsourcing applications," in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1031-1046, Melbourne, Australia, June 2015.
[39] Y. Tong, C. C. Cao, C. J. Zhang, Y. Li, and L. Chen, "CrowdCleaner: data cleaning for multi-version data on the web via crowdsourcing," in Proceedings of the 2014 IEEE 30th International Conference on Data Engineering, pp. 1182-1185, Chicago, IL, USA, April 2014.
[40] H. Su, K. Zheng, J. Huang et al., "A crowd-based route recommendation system," in Proceedings of the 2014 IEEE 30th International Conference on Data Engineering, pp. 1144-1155, Chicago, IL, USA, May 2014.
[41] Y. Zheng, G. Li, and R. Cheng, "DOCS: domain-aware crowdsourcing system," Proceedings of the VLDB Endowment, vol. 10, no. 4, pp. 361-372, 2016.
[42] Y. Zheng, G. Li, Y. Li, C. Shan, and R. Cheng, "Truth inference in crowdsourcing: is the problem solved?" Proceedings of the VLDB Endowment, vol. 10, no. 5, pp. 541-552, 2017.
[43] J. Wu, S. Zhao, V. S. Sheng et al., "Weak-labeled active learning with conditional label dependence for multilabel image classification," IEEE Transactions on Multimedia, vol. 19, no. 6, pp. 1156-1169, 2017.
[44] B. Nicholson, J. Zhang, V. S. Sheng et al., "Label noise correction methods," in Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 1-9, IEEE, Paris, France, October 2015.
[45] J. Zhang, V. S. Sheng, Q. Li, J. Wu, and X. Wu, "Consensus algorithms for biased labeling in crowdsourcing," Information Sciences, vol. 382-383, pp. 254-273, 2017.


Page 5: LabellingTrainingSamplesUsingCrowdsourcing ...downloads.hindawi.com/journals/complexity/2020/1670483.pdf · Crowdsourcing anno-tationhasfive steps:(a) ... m m m m KEEEEmEEEE-Figure

them as baselines KeyRank is one of themachine annotationmethods and its performance is evaluated on datasetINSPEC [17] containing 2000 abstracts (1000 for training500 for development and 500 for testing) in [15] Con-sidering the cost and latency of workers we chose 100abstracts from the 500 test ones in dataset INSPEC whereKeyRank performs the best as the data for our multiplecrowdsourcing experiments In addition the gold standardsof these 100 test abstracts are treated as labelled ones fromexpert annotation As we said before each single abstractcorresponds to a single L-HIT (at is we have 100 cor-responding L-HITs (e part of candidate option in eachL-HIT lists 15 (or fewer) candidates with descendingranking (ese candidates are keyphrases labelled andweighted by KeyRank Again in order to overcome theshortage that the quality of an individual worker for key-phrase extraction is sometimes rather low we request 10responses for each L-HIT from 10 different workers (at isthe whole experiment has 1000 published L-HITs since eachone has to be published ten times on MTurk Each L-HITcosts 5 cents and the whole experiment costs 50 dollarstotally According to feedback from crowdsourcing platformMTurk more than four out of five workers completed theoptional ldquocandidate supplementrdquo tasks (e minimum timethat a single crowdsourcing task required is 50 seconds andthe maximum time is 5 minutes (e time required for mostof the crowdsourcing tasks was between 90 and 200 seconds

(e precision (P) recall (R) and F1 score are employedas performance metrics P R and F1 score are defined asfollows

P correctlabelled

R correctexpert

F1 2 times P timesR

(P + R)

(7)

where correct denotes the number of correct keyphrasesobtained from crowdsourcing annotation labelled denotesthe number of keyphrases obtained from crowdsourcingannotation and expert denotes the number of keyphrasesobtained from expert annotation Normally expert formost abstracts varies from 3 to 5 so that the value of la-belled in our experiment varies from 3 to 5

After 10 responses of each L-HIT are obtained from 10different workers algorithms IMLK IMLK-I IMLK-EDHILED and HILI are applied to infer a truth keyphrase listfrom these responses (e inferred results of IMLK IMLK-IIMLK-ED HILED and HILI are compared with those ofKeyRank in terms of P R and F1 score Besides in order toevaluate the performance of KeyRank IMLK IMLK-IIMLK-ED HILED and HILI clearly the comparisons aredivided into three different groups ie Group-3 Group-4and Group-5 For example Group-4 is named as such be-cause the number of labelled is 4 when it reports thecomparisons among KeyRank IMLK IMLK-I IMLK-EDHILED and HILI in terms of P R and F1 score respectively

In addition the relations between the workersrsquo numbers(denoted as WorkerNum) and the inferred results are alsoexplored by respectively conducting another seven com-parisons in all groups (e values of WorkerNum are set to3 4 5 6 7 8 and 9 respectively Since each abstract has 10keyphrase lists provided by 10 different workers respec-tively in order to get rid of the impact of workersrsquo ordereach algorithm on each abstract is run ten times under acertain WorkerNum and the corresponding number ofkeyphrase lists are randomly selected from its 10 keyphraselists at each time For example when the WorkerNum is 5we randomly select 5 keyphrase lists from the 10 keyphraselists All comparisons of all groups among KeyRank IMLKIMLK-I IMLK-ED HILED and HILI are shown in Figure 2

From Figure 2, we notice that IMLK, IMLK-I, and IMLK-ED significantly outperform KeyRank in all groups in terms of P, R, and F1 score. We also notice that both HILED and HILI significantly outperform KeyRank, IMLK, IMLK-I, and IMLK-ED in all groups in terms of P, R, and F1 score. Between HILED and HILI, except for the comparisons in Group-3, Group-4, and Group-5 when the values of WorkerNum are 5, 6, and 7 (the situation of WorkerNum = 7 only occurs in Group-3), HILED always performs better than HILI in terms of P, R, and F1 score. Moreover, we notice that with the increment of WorkerNum, the performance of IMLK, IMLK-I, IMLK-ED, HILI, and HILED has a rising trend. Therefore, we can conclude that (1) both HILED and HILI perform better than IMLK, IMLK-I, and IMLK-ED; (2) HILED performs a little better than HILI; (3) WorkerNum does influence the inferred results; and (4) employing crowdsourcing annotation is a feasible and effective way for training sample labelling.

3.2. Crowdsourcing Experiment with Random Ranking. For each published L-HIT in the Crowdsourcing experiment with Descending Ranking (denoted as CDR) in Section 3.1, the 15 (or fewer) candidates listed in the part of candidate option are ordered according to their scores assigned by KeyRank, from high to low. Is there any relevance between the order manner of the listed candidates and the improvement performance of crowdsourcing annotation?

In order to explore whether such a relevance exists, we create another 100 L-HITs using the selected 100 representative abstracts mentioned in Section 3.1. Meanwhile, we also request 10 responses for each L-HIT from 10 different workers. For each L-HIT, the 15 (or fewer) candidates are randomly listed in the part of candidate option. We name the experiments conducted in this section the Crowdsourcing experiment with Random Ranking (denoted as CRR). To make a fair evaluation, all experimental parameters of CRR follow those of CDR. All comparisons among KeyRank, IMLK, HILED, and HILI in terms of P, R, and F1 score are shown in Figure 3.

From Figure 3, we can see that IMLK, HILED, and HILI in CRR always significantly outperform KeyRank in terms of P, R, and F1 score. This proves once again that employing crowdsourcing annotation is a feasible and effective way for training sample labelling. However, we notice that the performance of IMLK, HILED, and HILI in CRR is worse than that of these algorithms in CDR, which proves that the order manner of the listed candidates does influence the improvement performance of crowdsourcing annotation, and the descending order manner is more effective than the random one.

3.3. Discussion

The Proper Number of Workers. Both CDR and CRR show us that with an increment of WorkerNum, the improvement performance of crowdsourcing annotation has a rising trend. However, more workers do not mean more suitability. On the one hand, more workers may result in more latency; for instance, workers may be distracted, or tasks may not be appealing to enough workers. On the other hand, more workers mean more monetary cost, since crowdsourcing annotation is not free; it is just a cheaper way to label sufficient training samples timely. Hence, the trade-off among quality, latency, and cost controls needs to be considered and balanced. The experimental results show that the proper number of workers varies from 6 to 8, because the improvement performance of crowdsourcing annotation at these stages is relatively stable and the quantity is appropriate to avoid high latency and cost.

The Descending and Random Ranking Manners. The experimental results demonstrate that the descending ranking manner performs better than the random one. The reason may be that workers have limited patience, since they are not trained. Normally, workers just focus on the top 5 (or fewer) candidates listed in the part of candidate option. If they do not find any proper one(s) among the top few candidates, they may lose patience to read the remaining ones, so that they would select randomly or supplement option(s) in the part of candidate supplement to complete the current L-HIT.

Figure 2: Comparisons among KeyRank, IMLK, IMLK-I, IMLK-ED, HILED, and HILI in all groups, plotting precision (%), recall (%), and F1 value (%) against the number of workers (3 to 9). (a) Group-3. (b) Group-4. (c) Group-5.


However, the randomly selected one(s) may not be proper, and the supplementary one(s) may be repeated with the candidates listed in the part of candidate option. Therefore, the loss of accuracy happens.

4. Related Work

Recommendation models [18] have been widely applied in many domains, such as complex systems [19, 20], Quality of Service (QoS) prediction [21, 22], reliability detection for real-time systems [23], social networks [24–26], and others [27–29]. Among existing recommendation models, the supervised learning-based ones have increasingly attracted attention because of their effectiveness. However, it is well known that supervised learning-based recommendation models suffer from the quality of training samples. Therefore, labelling sufficient training samples timely and accurately in the era of big data becomes an important foundation for supervised learning-based recommendation. Since this paper focuses on labelling training samples for keyphrase extraction by utilizing crowdsourcing annotation, the related work is introduced in terms of keyphrase extraction and crowdsourcing annotation.

Most original works labelling keyphrases simply selected single or contiguous words with a high frequency, such as KEA [14]. Yet these single or contiguous words do not always deliver the main points discussed in a text. Study [30] demonstrated that semantic relations in context can help extract high-quality keyphrases. Hence, some research studies employed knowledge bases and ontologies to obtain

Figure 3: Comparisons among KeyRank, IMLK, HILED, and HILI in CRR and CDR, plotting precision (%), recall (%), and F1 value (%) against the number of workers (3 to 9). IMLKr, HILEDr, and HILIr denote the performance of algorithms IMLK, HILED, and HILI in CRR; IMLKd, HILEDd, and HILId denote the performance of algorithms IMLK, HILED, and HILI in CDR. (a) Group-3. (b) Group-4. (c) Group-5.


semantic relations in context to improve the quality of extracted keyphrases [31]. It is obvious that the semantic relations obtained by these methods are restricted by the corresponding knowledge bases and ontologies. Studies [32, 33] utilized graph-based ranking methods to label keyphrases, in which a keyphrase's importance is determined by its semantic relatedness to others. As they just aggregate keyphrases from one single document, the corresponding semantic relatedness is not stable and cannot accurately reveal the "relatedness" between keyphrases in general. Studies [34, 35] applied sequential pattern mining with wildcards to label keyphrases, since wildcards provide gap constraints with flexibility for capturing semantic relations in context. However, most of them are computationally expensive, as they need to repeatedly scan the whole document. In addition, they require users to explicitly specify appropriate gap constraints beforehand, which is time-consuming and not realistic. Based on the common sense that words do not repeatedly appear in an effective keyphrase, KeyRank [15] converted the repeated scanning operation into a calculating model and significantly reduced time consumption. However, it is also a frequency-based algorithm that may lose important entities with low frequencies. To sum up, machine annotation can label enough training samples timely, but it does not meet the requirement of high quality because of limited machine intelligence. Hiring domain experts can achieve a high accuracy; however, it requires a long time as well as more resources. Therefore, it is natural to think of utilizing crowdsourcing annotation, a new way of human intelligence to participate in machine computing at a relatively low price, to label sufficient training samples timely and accurately.

Studies [6–8] showed that crowdsourcing brings great opportunities to machine learning as well as its related research fields. With the appearance of crowdsourcing platforms such as MTurk [10] and CrowdFlower [36], crowdsourcing has taken off in a wide range of applications, for example, entity resolution [37] and sentiment analysis [38]. Despite the diversity of applications, they all employ crowdsourcing annotation at low cost to collect data (labels of training samples) to resolve the corresponding intelligent problems. In addition, many crowdsourcing annotation-based systems (frameworks) have been proposed to resolve computer-hard and intelligent tasks. By utilizing crowdsourcing annotation-based methods, CrowdCleaner [39] can detect and repair errors that usually cannot be solved by traditional data integration and cleaning techniques. CrowdPlanner [40] recommends the best route with respect to the knowledge of experienced drivers. AggNet [12] is a novel crowdsourcing annotation-based aggregation framework, which asks workers to detect mitosis in breast cancer histology images after training the crowd with a few examples.

Since some individuals in the crowd may yield relatively low-quality answers or even noise, much research focuses on how to infer the ground truth from the labels provided by workers [9]. Zheng et al. [41] employed a domain-sensitive worker model to accurately infer the ground truth based on two principles: (1) a label provided by a worker is trusted if the worker is a domain expert on the corresponding tasks, and (2) a worker is a domain expert if he often correctly completes tasks related to the specific domain. Zheng et al. [42] provided a detailed survey on ground truth inference in crowdsourcing annotation and performed an in-depth analysis of 17 existing methods. Zhang et al. tried to utilize active learning and label noise correction to improve the quality of truth inference [43–45]. One of our preliminary works [13] treated the ground truth inference of labelling keyphrases as an integrating and ranking process and proposed three novel algorithms: IMLK, IMLK-I, and IMLK-ED. However, these three algorithms ignore three inherent properties of a keyphrase capturing a point expressed by the text, which are meaningfulness, uncertainty, and uselessness.
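The entropy idea behind the proposed algorithms (cf. Shannon [16]) can be illustrated on a single candidate keyphrase's vote distribution. This is a hedged illustration of how entropy captures labelling uncertainty, not the exact formulation of HILED or HILI:

```python
import math

def label_entropy(votes, n_workers):
    """Shannon entropy of a candidate keyphrase's binary vote distribution.
    A candidate picked by all workers or by none has entropy 0 (certain);
    one picked by half of them has entropy 1 bit (maximally uncertain).
    Illustration only -- not the authors' exact HILED/HILI formulation."""
    p = votes / n_workers
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# 5 of 10 workers selected the candidate -> maximal uncertainty (1.0 bit)
uncertainty = label_entropy(5, 10)
```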

5. Conclusions

This paper focuses on labelling training samples for keyphrase extraction by utilizing crowdsourcing annotation. We designed novel crowdsourcing mechanisms to create corresponding crowdsourcing annotation-based tasks for training sample labelling, and proposed two entropy-based inference algorithms (HILED and HILI) to improve the quality of labelled training samples. The experimental results showed that crowdsourcing annotation achieves a more effective improvement performance than the machine annotation approach (i.e., KeyRank) does. In addition, we demonstrated that the ranking manners of candidates listed in the part of candidate option do influence the improvement performance of crowdsourcing annotation, and the descending ranking manner is more effective than the random one. In the future, we will keep focusing on inference algorithms improving the quality of labelled training samples.

Data Availability

The data used in this study can be accessed via https://github.com/snkim/AutomaticKeyphraseExtraction.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was partially supported by the National Key R&D Program of China (grant no. 2019YFB1704101), the National Natural Science Foundation of China (grant nos. U1936220 and 31771679), the Anhui Foundation for Science and Technology Major Project (grant nos. 18030901034 and 201904e01020006), the Key Laboratory of Agricultural Electronic Commerce, Ministry of Agriculture of China (grant nos. AEC2018003 and AEC2018006), the 2019 Anhui University Collaborative Innovation Project (GXXT-2019-013), and the Hefei Major Research Project of Key Technology (J2018G14).


References

[1] X. Xu, Q. Liu, Y. Luo et al., "A computation offloading method over big data for IoT-enabled cloud-edge computing," Future Generation Computer Systems, vol. 95, pp. 522–533, 2019.
[2] J. Zhou, J. Sun, P. Cong et al., "Security-critical energy-aware task scheduling for heterogeneous real-time MPSoCs in IoT," IEEE Transactions on Services Computing, 2019, in press.
[3] J. Zhou, X. S. Hu, Y. Ma, J. Sun, T. Wei, and S. Hu, "Improving availability of multicore real-time systems suffering both permanent and transient faults," IEEE Transactions on Computers, vol. 68, no. 12, pp. 1785–1801, 2019.
[4] Y. Zhang, K. Wang, Q. He et al., "Covering-based web service quality prediction via neighborhood-aware matrix factorization," IEEE Transactions on Services Computing, 2019, in press.
[5] Y. Zhang, G. Cui, S. Deng et al., "Efficient query of quality correlation for service composition," IEEE Transactions on Services Computing, 2019, in press.
[6] M. Lease, "On quality control and machine learning in crowdsourcing," in Proceedings of the Workshops at the 25th AAAI Conference on Artificial Intelligence, pp. 97–102, San Francisco, CA, USA, January 2011.
[7] J. Zhang, X. Wu, and V. S. Sheng, "Learning from crowdsourced labeled data: a survey," Artificial Intelligence Review, vol. 46, no. 4, pp. 543–576, 2016.
[8] V. S. Sheng, F. Provost, and P. G. Ipeirotis, "Get another label? Improving data quality and data mining using multiple, noisy labelers," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622, August 2008.
[9] G. Li, J. Wang, Y. Zheng, and M. J. Franklin, "Crowdsourced data management: a survey," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 9, pp. 2296–2319, 2016.
[10] MTurk, 2020, https://www.mturk.com.
[11] G. Li, Y. Zheng, J. Fan, J. Wang, and R. Cheng, "Crowdsourced data management: overview and challenges," in Proceedings of the 2017 ACM International Conference on Management of Data, pp. 1711–1716, Association for Computing Machinery, New York, NY, USA, 2017.
[12] S. Albarqouni, C. Baur, F. Achilles, V. Belagiannis, S. Demirci, and N. Navab, "AggNet: deep learning from crowds for mitosis detection in breast cancer histology images," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1313–1321, 2016.
[13] Q. Wang, V. S. Sheng, and Z. Liu, "Exploring methods of assessing influence relevance of news articles," in Cloud Computing and Security, pp. 525–536, Springer, Berlin, Germany, 2018.
[14] I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, and C. G. Nevill-Manning, "KEA: practical automatic keyphrase extraction," in Proceedings of the 4th ACM Conference on Digital Libraries, pp. 1–23, Berkeley, CA, USA, August 1999.
[15] Q. Wang, V. S. Sheng, and X. Wu, "Document-specific keyphrase candidate search and ranking," Expert Systems with Applications, vol. 97, pp. 163–176, 2018.
[16] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.
[17] INSPEC, 2020, https://github.com/snkim/AutomaticKeyphraseExtraction.
[18] A. Ramlatchan, M. Yang, Q. Liu, M. Li, J. Wang, and Y. Li, "A survey of matrix completion methods for recommendation systems," Big Data Mining and Analytics, vol. 1, no. 4, pp. 308–323, 2018.
[19] X. Xu, R. Mo, F. Dai, W. Lin, S. Wan, and W. Dou, "Dynamic resource provisioning with fault tolerance for data-intensive meteorological workflows in cloud," IEEE Transactions on Industrial Informatics, 2019.
[20] L. Qi, Y. Chen, Y. Yuan, S. Fu, X. Zhang, and X. Xu, "A QoS-aware virtual machine scheduling method for energy conservation in cloud-based cyber-physical systems," World Wide Web, vol. 23, no. 2, pp. 1275–1297, 2019.
[21] Y. Zhang, C. Yin, Q. Wu et al., "Location-aware deep collaborative filtering for service recommendation," IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019.
[22] L. Qi, Q. He, F. Chen et al., "Finding all you need: web APIs recommendation in web of things through keywords search," IEEE Transactions on Computational Social Systems, vol. 6, no. 5, pp. 1063–1072, 2019.
[23] J. Zhou, J. Sun, X. Zhou et al., "Resource management for improving soft-error and lifetime reliability of real-time MPSoCs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no. 12, pp. 2215–2228, 2019.
[24] G. Liu, Y. Wang, M. A. Orgun et al., "Finding the optimal social trust path for the selection of trustworthy service providers in complex social networks," IEEE Transactions on Services Computing, vol. 6, no. 2, pp. 152–167, 2011.
[25] G. Liu, Y. Wang, and M. A. Orgun, "Optimal social trust path selection in complex social networks," in Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, GA, USA, July 2010.
[26] G. Liu, K. Zheng, Y. Wang et al., "Multi-constrained graph pattern matching in large-scale contextual social graphs," in Proceedings of the 2015 IEEE 31st International Conference on Data Engineering, pp. 351–362, IEEE, Seoul, South Korea, April 2015.
[27] C. Zhang, M. Yang, J. Lv, and W. Yang, "An improved hybrid collaborative filtering algorithm based on tags and time factor," Big Data Mining and Analytics, vol. 1, no. 2, pp. 128–136, 2018.
[28] Y. Liu, S. Wang, M. S. Khan, and J. He, "A novel deep hybrid recommender system based on auto-encoder with neural collaborative filtering," Big Data Mining and Analytics, vol. 1, no. 3, pp. 211–221, 2018.
[29] H. Liu, H. Kou, C. Yan, and L. Qi, "Link prediction in paper citation network to construct paper correlation graph," EURASIP Journal on Wireless Communications and Networking, vol. 2019, no. 1, 2019.
[30] G. Ercan and I. Cicekli, "Using lexical chains for keyword extraction," Information Processing and Management, vol. 43, no. 6, pp. 1705–1714, 2007.
[31] S. Xu, S. Yang, and C. M. Lau, "Keyword extraction and headline generation using novel word feature," in Proceedings of the 24th AAAI Conference on Artificial Intelligence, pp. 1461–1466, Atlanta, GA, USA, 2010.
[32] R. Mihalcea and P. Tarau, "TextRank: bringing order into texts," UNT Scholarly Works, vol. 43, no. 6, pp. 404–411, 2004.
[33] K. S. Hasan and V. Ng, "Automatic keyphrase extraction: a survey of the state of the art," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 1262–1273, Baltimore, MD, USA, June 2014.
[34] F. Xie, X. Wu, and X. Zhu, "Document-specific keyphrase extraction using sequential patterns with wildcards," in Proceedings of the 2014 IEEE International Conference on Data Mining, pp. 1055–1060, Shenzhen, China, December 2014.
[35] J. Feng, F. Xie, X. Hu, P. Li, J. Cao, and X. Wu, "Keyword extraction based on sequential pattern mining," in Proceedings of the 3rd International Conference on Internet Multimedia Computing and Service, pp. 34–38, Chengdu, China, August 2011.
[36] CrowdFlower, 2020, http://www.crowdflower.com.
[37] S. Wang, X. Xiao, and C. Lee, "Crowd-based deduplication: an adaptive approach," in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1263–1277, Melbourne, Australia, June 2015.
[38] Y. Zheng, J. Wang, G. Li, R. Cheng, and J. Feng, "QASCA: a quality-aware task assignment system for crowdsourcing applications," in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1031–1046, Melbourne, Australia, June 2015.
[39] Y. Tong, C. C. Cao, C. J. Zhang, Y. Li, and L. Chen, "CrowdCleaner: data cleaning for multi-version data on the web via crowdsourcing," in Proceedings of the 2014 IEEE 30th International Conference on Data Engineering, pp. 1182–1185, Chicago, IL, USA, April 2014.
[40] H. Su, K. Zheng, J. Huang et al., "A crowd-based route recommendation system," in Proceedings of the 2014 IEEE 30th International Conference on Data Engineering, pp. 1144–1155, Chicago, IL, USA, May 2014.
[41] Y. Zheng, G. Li, and R. Cheng, "DOCS: domain-aware crowdsourcing system," Proceedings of the VLDB Endowment, vol. 10, no. 4, pp. 361–372, 2016.
[42] Y. Zheng, G. Li, Y. Li, C. Shan, and R. Cheng, "Truth inference in crowdsourcing: is the problem solved?" Proceedings of the VLDB Endowment, vol. 10, no. 5, pp. 541–552, 2017.
[43] J. Wu, S. Zhao, V. S. Sheng et al., "Weak-labeled active learning with conditional label dependence for multilabel image classification," IEEE Transactions on Multimedia, vol. 19, no. 6, pp. 1156–1169, 2017.
[44] B. Nicholson, J. Zhang, V. S. Sheng et al., "Label noise correction methods," in Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 1–9, IEEE, Paris, France, October 2015.
[45] J. Zhang, V. S. Sheng, Q. Li, J. Wu, and X. Wu, "Consensus algorithms for biased labeling in crowdsourcing," Information Sciences, vol. 382-383, pp. 254–273, 2017.



[3] J Zhou X S Hu Y Ma J Sun TWei and S Hu ldquoImprovingavailability of multicore real-time systems suffering bothpermanent and transient faultsrdquo IEEE Transactions onComputers vol 68 no 12 pp 1785ndash1801 2019

[4] Y Zhang K Wang Q He et al ldquoCovering-based web servicequality prediction via neighborhood-aware matrix factor-izationrdquo IEEE Transactions on Services Computing 2019 Inpress

[5] Y Zhang G Cui S Deng et al ldquoEfficient query of qualitycorrelation for service compositionrdquo IEEE Transactions onServices Computing 2019 In press

[6] M Lease ldquoOn quality control and machine learning incrowdsourcingrdquo in Proceedings of the Workshops at the 25thAAAI Conference on Artificial Intelligence pp 97ndash102San Francisco CA USA January 2011

[7] J Zhang X Wu and V S Sheng ldquoLearning from crowd-sourced labeled data a surveyrdquo Artificial Intelligence Reviewvol 46 no 4 pp 543ndash576 2016

[8] V S Sheng F Provost and P G Ipeirotis ldquoGet another labelImproving data quality and data mining using multiple noisylabelersrdquo in Proceedings of the 14th ACM SIGKDD Interna-tional Conference on Knowledge Discovery and Data Miningpp 614ndash622 August 2008

[9] G Li J Wang Y Zheng and M J Franklin ldquoCrowdsourceddata management a surveyrdquo IEEE Transactions on Knowledgeand Data Engineering vol 28 no 9 pp 2296ndash2319 2016

[10] Mturk 2020 httpswwwmturkcom[11] G Li Y Zheng J Fan J Wang and R Cheng ldquoCrowd-

sourced data management overview and challengesrdquo Pro-ceedings of the 2017 ACM International Conference onManagement of Data pp 1711ndash1716 Association for Com-puting Machinery New York NY USA 2017

[12] S Albarqouni C Baur F Achilles V Belagiannis S Demirciand N Navab ldquoAggNet deep learning from crowds formitosis detection in breast cancer histology imagesrdquo IEEETransactions onMedical Imaging vol 35 no 5 pp 1313ndash13212016

[13] Q Wang V S Sheng and Z Liu ldquoExploring methods ofassessing influence relevance of news articlesrdquo in CloudComputing and Security pp 525ndash536 Springer BerlinGermany 2018

[14] I H Witten G W Paynter E Frank C Gutwin andC G Nevill-Manning ldquoKEA Practical automatic keyphraseextractionrdquo in Proceedings of the 4th ACM Conference onDigital Libraries pp 1ndash23 Berkeley CA USA August 1999

[15] Q Wang V S Sheng and X Wu ldquoDocument-specifickeyphrase candidate search and rankingrdquo Expert Systems withApplications vol 97 pp 163ndash176 2018

[16] C E Shannon ldquoA mathematical theory of communicationrdquoBell System Technical Journal vol 27 no 3 pp 379ndash4231948

[17] INSPEC 2020 httpsgithubcomsnkimAutomaticKeyphraseExtraction

[18] A Ramlatchan M Yang Q Liu M Li J Wang and Y Li ldquoAsurvey of matrix completion methods for recommendation

systemsrdquo Big Data Mining and Analytics vol 1 no 4pp 308ndash323 2018

[19] X Xu R Mo F Dai W Lin S Wan and W Dou ldquoDynamicresource provisioning with fault tolerance for data-intensivemeteorological workflows in cloudrdquo IEEE Transactions onIndustrial Informatics 2019

[20] L Qi Y Chen Y Yuan S Fu X Zhang and X Xu ldquoA QoS-aware virtual machine scheduling method for energy con-servation in cloud-based cyber-physical systemsrdquo WorldWide Web vol 23 no 2 pp 1275ndash1297 2019

[21] Y Zhang C Yin Q Wu et al ldquoLocation-aware deep col-laborative filtering for service recommendationrdquo IEEETransactions on Systems Man and Cybernetics Systems 2019

[22] L Qi Q He F Chen et al ldquoFinding all you need web APIsrecommendation in web of things through keywords searchrdquoIEEE Transactions on Computational Social Systems vol 6no 5 pp 1063ndash1072 2019

[23] J Zhou J Sun X Zhou et al ldquoResource management forimproving soft-error and lifetime reliability of real-timeMPSoCsrdquo IEEE Transactions on Computer-Aided Design ofIntegrated Circuits and Systems vol 38 no 12 pp 2215ndash22282019

[24] G Liu Y Wang M A Orgun et al ldquoFinding the optimalsocial trust path for the selection of trustworthy serviceproviders in complex social networksrdquo IEEE Transactions onServices Computing vol 6 no 2 pp 152ndash167 2011

[25] G Liu Y Wang and M A Orgun ldquoOptimal social trust pathselection in complex social networksrdquo in Proceedings of theTwenty-Fourth AAAI Conference on Artificial IntelligenceAtlanta GA USA July 2010

[26] G Liu K Zheng Y Wang et al ldquoMulti-constrained graphpattern matching in large-scale contextual social graphsrdquo inProceedings of the 2015 IEEE 31st International Conference onData Engineering pp 351ndash362 IEEE Seoul South KoreaApril 2015

[27] C Zhang M Yang J Lv and W Yang ldquoAn improved hybridcollaborative filtering algorithm based on tags and timefactorrdquo Big Data Mining and Analytics vol 1 no 2pp 128ndash136 2018

[28] Y Liu S Wang M S Khan and J He ldquoA novel deep hybridrecommender system based on auto-encoder with neuralcollaborative filteringrdquo Big Data Mining and Analytics vol 1no 3 pp 211ndash221 2018

[29] H Liu H Kou C Yan and L Qi ldquoLink prediction in papercitation network to construct paper correlation graphrdquoEURASIP Journal on Wireless Communications and Net-working vol 2019 no 1 2019

[30] G Ercan and I Cicekli ldquoUsing lexical chains for keywordextractionrdquo Information Processing and Management vol 43no 6 pp 1705ndash1714 2007

[31] S Xu S Yang and C M Lau ldquoKeyword extraction andheadline generation using novel word featurerdquo in Proceedingsof the 24th AAAI Conference on Artificial Intelligencepp 1461ndash1466 Atlanta GA USA 2010

[32] R Mihalcea and P Tarau ldquoTextRank bringing order intotextsrdquoUNT Scholarly Works vol 43 no 6 pp 404ndash411 2004

[33] K S Hasan and V Ng ldquoAutomatic keyphrase extraction asurvey of the state of the artrdquo in Proceedings of the 52ndAnnual Meeting of the Association for Computational Lin-guistics pp 1262ndash1273 Baltimore MD USA June 2014

[34] F Xie X Wu and X Zhu ldquoDocument-specific keyphraseextraction using sequential patterns with wildcardsrdquo inProceedings of the 2014 IEEE International Conference on DataMining pp 1055ndash1060 Shenzhen China December 2014

Complexity 9

[35] J Feng F Xie X Hu P Li J Cao and X Wu ldquoKeywordextraction based on sequential patternminingrdquo in Proceedingsof the 3rd International Conference on Internet MultimediaComputing and Service pp 34ndash38 Chengdu China August2011

[36] Crowdflower 2020 httpwwwcrowdflowercom[37] S Wang X Xiao and C Lee ldquoCrowd-based deduplication an

adaptive approachrdquo in Proceedings of the 2015 ACM SIGMODInternational Conference on Management of Data pp 1263ndash1277 Melbourne Australia June 2015

[38] Y Zheng J Wang G Li R Cheng and J Feng ldquoQASCA aquality-aware task assignment system for crowdsourcingapplicationsrdquo in Proceedings of the 2015 ACM SIGMOD In-ternational Conference on Management of Data pp 1031ndash1046 Melbourne Australia June 2015

[39] Y Tong C C Cao C J Zhang Y Li and L ChenldquoCrowdCleaner data cleaning for multi-version data on theweb via crowdsourcingrdquo in Proceedings of the 2014 IEEE 30thInternational Conference on Data Engineering pp 1182ndash1185Chicago IL USA April 2014

[40] H Su K Zheng J Huang et al ldquoA crowd-based routerecommendation systemrdquo in Proceedings of the 2014 IEEE30th International Conference on Data Engineeringpp 1144ndash1155 Chicago IL USA May 2014

[41] Y Zheng G Li and R Cheng ldquoDOCS domain-awarecrowdsourcing systemrdquo Proceedings of the Vldb Endowmentvol 10 no 4 pp 361ndash372 2016

[42] Y Zheng G Li Y Li C Shan and R Cheng ldquoTruth inferencein crowdsourcing is the problem solvedrdquo Proceedings of theVldb Endowment vol 10 no 5 pp 541ndash552 2017

[43] J Wu S Zhao V S Sheng et al ldquoWeak-labeled activelearning with conditional label dependence for multilabelimage classificationrdquo IEEE Transactions on Multimediavol 19 no 6 pp 1156ndash1169 2017

[44] B Nicholson J Zhang V S Sheng et al ldquoLabel noise cor-rection methodsrdquo in Proceedings of the 2015 IEEE Interna-tional Conference on Data Science and Advanced Analytics(DSAA) pp 1ndash9 IEEE Paris France October 2015

[45] J Zhang V S Sheng Q Li J Wu and X Wu ldquoConsensusalgorithms for biased labeling in crowdsourcingrdquo InformationSciences vol 382-383 pp 254ndash273 2017

10 Complexity

Page 7: LabellingTrainingSamplesUsingCrowdsourcing ...downloads.hindawi.com/journals/complexity/2020/1670483.pdf · Crowdsourcing anno-tationhasfive steps:(a) ... m m m m KEEEEmEEEE-Figure

However, the randomly selected one(s) may not be proper, and the supplementary one(s) may duplicate candidates already listed in the candidate option part. Therefore, a loss of accuracy occurs.

4. Related Work

Recommendation models [18] have been widely applied in many domains, such as complex systems [19, 20], Quality of Service (QoS) prediction [21, 22], reliability detection for real-time systems [23], social networks [24–26], and others [27–29]. Among existing recommendation models, the supervised learning-based ones have increasingly attracted attention because of their effectiveness. However, it is well known that supervised learning-based recommendation models suffer from the quality of training samples. Therefore, labelling sufficient training samples timely and accurately in the era of big data becomes an important foundation for supervised learning-based recommendation. Since this paper focuses on labelling training samples for keyphrase extraction by utilizing crowdsourcing annotation, the related work is introduced in terms of keyphrase extraction and crowdsourcing annotation.

Most original works on labelling keyphrases simply selected single or contiguous words with a high frequency, such as KEA [14]. Yet these single or contiguous words do not always deliver the main points discussed in a text. Study [30] demonstrated that semantic relations in context can help extract high-quality keyphrases. Hence, some research studies employed knowledge bases and ontologies to obtain

Figure 3: Comparisons among KeyRank, IMLK, HILED, and HILI in CRR and CDR, plotting Precision (%), Recall (%), and F1 value (%) against the number of workers (3–9). IMLKr, HILEDr, and HILIr denote the performance of algorithms IMLK, HILED, and HILI in CRR; IMLKd, HILEDd, and HILId denote the performance of algorithms IMLK, HILED, and HILI in CDR. (a) Group-3. (b) Group-4. (c) Group-5.

semantic relations in context to improve the quality of extracted keyphrases [31]. It is obvious that the semantic relations obtained by these methods are restricted by the corresponding knowledge bases and ontologies. Studies [32, 33] utilized graph-based ranking methods to label keyphrases, in which a keyphrase's importance is determined by its semantic relatedness to others. As they just aggregate keyphrases from one single document, the corresponding semantic relatedness is not stable and cannot accurately reveal the "relatedness" between keyphrases in general. Studies [34, 35] applied sequential pattern mining with wildcards to label keyphrases, since wildcards provide gap constraints with flexibility for capturing semantic relations in context. However, most of them are computationally expensive, as they need to repeatedly scan the whole document. In addition, they require users to explicitly specify appropriate gap constraints beforehand, which is time-consuming and not realistic. Based on the common sense that words do not repeatedly appear in an effective keyphrase, KeyRank [15] converted the repeated scanning operation into a calculating model and significantly reduced time consumption. However, it is also a frequency-based algorithm that may lose important entities with low frequencies. To sum up, machine annotation can label enough training samples timely, but the labels do not meet the requirement of high quality because of limited machine intelligence. Hiring domain experts can achieve a high accuracy; however, it requires a long time as well as more resources. Therefore, it is natural to think of utilizing crowdsourcing annotation, a new way for human intelligence to participate in machine computing at a relatively low price, to label sufficient training samples timely and accurately.
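The weakness of purely frequency-based ranking noted above can be illustrated with a minimal sketch (the function and sample text are hypothetical, not the actual KeyRank model): scoring candidates by raw term frequency alone inevitably ranks rare but informative entities below common words.

```python
from collections import Counter
import re

def frequency_scores(text, top_k=5):
    """Rank candidate unigrams by raw term frequency.

    Illustrative only: it shows why purely frequency-based
    scoring (unlike KeyRank's richer candidate search and
    ranking) can lose low-frequency entities.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common(top_k)

doc = ("crowdsourcing improves labelling quality labelling quality "
       "depends on workers crowdsourcing aggregates worker labels")
top = frequency_scores(doc, top_k=3)
# rare but potentially informative words such as "aggregates"
# never surface among the top-ranked candidates
```

However informative a term is, a single occurrence cannot lift it past frequent function words without additional signals.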

Studies [6–8] showed that crowdsourcing brings great opportunities to machine learning as well as its related research fields. With the appearance of crowdsourcing platforms such as MTurk [10] and CrowdFlower [36], crowdsourcing has taken off in a wide range of applications, for example, entity resolution [37] and sentiment analysis [38]. Despite the diversity of applications, they all employ crowdsourcing annotation at low cost to collect data (labels of training samples) to resolve corresponding intelligent problems. In addition, many crowdsourcing annotation-based systems (frameworks) have been proposed to resolve computer-hard and intelligent tasks. By utilizing crowdsourcing annotation-based methods, CrowdCleaner [39] can detect and repair errors that usually cannot be solved by traditional data integration and cleaning techniques. CrowdPlanner [40] recommends the best route with respect to the knowledge of experienced drivers. AggNet [12] is a novel crowdsourcing annotation-based aggregation framework, which asks workers to detect mitosis in breast cancer histology images after training the crowd with a few examples.

Since some individuals in the crowd may yield relatively low-quality answers or even noise, much research focuses on how to infer the ground truth from the labels provided by workers [9]. Zheng et al. [41] employed a domain-sensitive worker model to accurately infer the ground truth based on two principles: (1) a label provided by a worker is trusted if the worker is a domain expert on the corresponding tasks, and (2) a worker is a domain expert if he often correctly completes tasks related to the specific domain. Zheng et al. [42] provided a detailed survey on ground truth inference in crowdsourcing annotation and performed an in-depth analysis of 17 existing methods. Zhang et al. tried to utilize active learning and label noise correction to improve the quality of truth inference [43–45]. One of our preliminary works [13] treated the ground truth inference of labelling keyphrases as an integrating and ranking process and proposed three novel algorithms: IMLK, IMLK-I, and IMLK-ED. However, these three algorithms ignore three inherent properties of a keyphrase capturing a point expressed by the text, which are meaningfulness, uncertainty, and uselessness.
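The general flavour of entropy-based ground truth inference can be sketched as follows: aggregate each item's noisy worker labels by majority vote and use the Shannon entropy [16] of the empirical label distribution as an uncertainty score. This is a simplified, generic illustration, not the HILED or HILI weighting schemes proposed in this paper.

```python
import math
from collections import Counter

def aggregate_label(worker_labels):
    """Infer a single label from noisy worker labels.

    Returns the majority label and the Shannon entropy of the
    empirical label distribution; high entropy flags items on
    which the crowd disagrees. Simplified sketch only, not the
    HILED/HILI algorithms themselves.
    """
    counts = Counter(worker_labels)
    total = len(worker_labels)
    majority, _ = counts.most_common(1)[0]
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in counts.values())
    return majority, entropy

label, uncertainty = aggregate_label(["yes", "yes", "yes", "no"])
# a unanimous crowd yields entropy 0; an even split yields the maximum
```

Items with high entropy are natural candidates for extra worker labels or expert review before being used as training samples.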

5. Conclusions

This paper focuses on labelling training samples for keyphrase extraction by utilizing crowdsourcing annotation. We designed novel crowdsourcing mechanisms to create corresponding crowdsourcing annotation-based tasks for training sample labelling and proposed two entropy-based inference algorithms (HILED and HILI) to improve the quality of labelled training samples. The experimental results showed that crowdsourcing annotation achieves more effective improvement performance than the approach of machine annotation (i.e., KeyRank) does. In addition, we demonstrated that the ranking manners of candidates listed in the candidate option part do influence the improvement performance of crowdsourcing annotation, and the descending ranking manner is more effective than the random one. In the future, we will keep focusing on inference algorithms that improve the quality of labelled training samples.

Data Availability

The data used in this study can be accessed via https://github.com/snkim/AutomaticKeyphraseExtraction.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was partially supported by the National Key R&D Program of China (grant no. 2019YFB1704101), the National Natural Science Foundation of China (grant nos. U1936220 and 31771679), the Anhui Foundation for Science and Technology Major Project (grant nos. 18030901034 and 201904e01020006), the Key Laboratory of Agricultural Electronic Commerce, Ministry of Agriculture of China (grant nos. AEC2018003 and AEC2018006), the 2019 Anhui University Collaborative Innovation Project (GXXT-2019-013), and the Hefei Major Research Project of Key Technology (J2018G14).


References

[1] X. Xu, Q. Liu, Y. Luo et al., "A computation offloading method over big data for IoT-enabled cloud-edge computing," Future Generation Computer Systems, vol. 95, pp. 522–533, 2019.

[2] J. Zhou, J. Sun, P. Cong et al., "Security-critical energy-aware task scheduling for heterogeneous real-time MPSoCs in IoT," IEEE Transactions on Services Computing, 2019, in press.

[3] J. Zhou, X. S. Hu, Y. Ma, J. Sun, T. Wei, and S. Hu, "Improving availability of multicore real-time systems suffering both permanent and transient faults," IEEE Transactions on Computers, vol. 68, no. 12, pp. 1785–1801, 2019.

[4] Y. Zhang, K. Wang, Q. He et al., "Covering-based web service quality prediction via neighborhood-aware matrix factorization," IEEE Transactions on Services Computing, 2019, in press.

[5] Y. Zhang, G. Cui, S. Deng et al., "Efficient query of quality correlation for service composition," IEEE Transactions on Services Computing, 2019, in press.

[6] M. Lease, "On quality control and machine learning in crowdsourcing," in Proceedings of the Workshops at the 25th AAAI Conference on Artificial Intelligence, pp. 97–102, San Francisco, CA, USA, January 2011.

[7] J. Zhang, X. Wu, and V. S. Sheng, "Learning from crowdsourced labeled data: a survey," Artificial Intelligence Review, vol. 46, no. 4, pp. 543–576, 2016.

[8] V. S. Sheng, F. Provost, and P. G. Ipeirotis, "Get another label? Improving data quality and data mining using multiple, noisy labelers," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622, August 2008.

[9] G. Li, J. Wang, Y. Zheng, and M. J. Franklin, "Crowdsourced data management: a survey," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 9, pp. 2296–2319, 2016.

[10] MTurk, 2020, https://www.mturk.com.

[11] G. Li, Y. Zheng, J. Fan, J. Wang, and R. Cheng, "Crowdsourced data management: overview and challenges," in Proceedings of the 2017 ACM International Conference on Management of Data, pp. 1711–1716, Association for Computing Machinery, New York, NY, USA, 2017.

[12] S. Albarqouni, C. Baur, F. Achilles, V. Belagiannis, S. Demirci, and N. Navab, "AggNet: deep learning from crowds for mitosis detection in breast cancer histology images," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1313–1321, 2016.

[13] Q. Wang, V. S. Sheng, and Z. Liu, "Exploring methods of assessing influence relevance of news articles," in Cloud Computing and Security, pp. 525–536, Springer, Berlin, Germany, 2018.

[14] I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, and C. G. Nevill-Manning, "KEA: practical automatic keyphrase extraction," in Proceedings of the 4th ACM Conference on Digital Libraries, pp. 1–23, Berkeley, CA, USA, August 1999.

[15] Q. Wang, V. S. Sheng, and X. Wu, "Document-specific keyphrase candidate search and ranking," Expert Systems with Applications, vol. 97, pp. 163–176, 2018.

[16] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.

[17] INSPEC, 2020, https://github.com/snkim/AutomaticKeyphraseExtraction.

[18] A. Ramlatchan, M. Yang, Q. Liu, M. Li, J. Wang, and Y. Li, "A survey of matrix completion methods for recommendation systems," Big Data Mining and Analytics, vol. 1, no. 4, pp. 308–323, 2018.

[19] X. Xu, R. Mo, F. Dai, W. Lin, S. Wan, and W. Dou, "Dynamic resource provisioning with fault tolerance for data-intensive meteorological workflows in cloud," IEEE Transactions on Industrial Informatics, 2019.

[20] L. Qi, Y. Chen, Y. Yuan, S. Fu, X. Zhang, and X. Xu, "A QoS-aware virtual machine scheduling method for energy conservation in cloud-based cyber-physical systems," World Wide Web, vol. 23, no. 2, pp. 1275–1297, 2019.

[21] Y. Zhang, C. Yin, Q. Wu et al., "Location-aware deep collaborative filtering for service recommendation," IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019.

[22] L. Qi, Q. He, F. Chen et al., "Finding all you need: web APIs recommendation in web of things through keywords search," IEEE Transactions on Computational Social Systems, vol. 6, no. 5, pp. 1063–1072, 2019.

[23] J. Zhou, J. Sun, X. Zhou et al., "Resource management for improving soft-error and lifetime reliability of real-time MPSoCs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no. 12, pp. 2215–2228, 2019.

[24] G. Liu, Y. Wang, M. A. Orgun et al., "Finding the optimal social trust path for the selection of trustworthy service providers in complex social networks," IEEE Transactions on Services Computing, vol. 6, no. 2, pp. 152–167, 2011.

[25] G. Liu, Y. Wang, and M. A. Orgun, "Optimal social trust path selection in complex social networks," in Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, GA, USA, July 2010.

[26] G. Liu, K. Zheng, Y. Wang et al., "Multi-constrained graph pattern matching in large-scale contextual social graphs," in Proceedings of the 2015 IEEE 31st International Conference on Data Engineering, pp. 351–362, IEEE, Seoul, South Korea, April 2015.

[27] C. Zhang, M. Yang, J. Lv, and W. Yang, "An improved hybrid collaborative filtering algorithm based on tags and time factor," Big Data Mining and Analytics, vol. 1, no. 2, pp. 128–136, 2018.

[28] Y. Liu, S. Wang, M. S. Khan, and J. He, "A novel deep hybrid recommender system based on auto-encoder with neural collaborative filtering," Big Data Mining and Analytics, vol. 1, no. 3, pp. 211–221, 2018.

[29] H. Liu, H. Kou, C. Yan, and L. Qi, "Link prediction in paper citation network to construct paper correlation graph," EURASIP Journal on Wireless Communications and Networking, vol. 2019, no. 1, 2019.

[30] G. Ercan and I. Cicekli, "Using lexical chains for keyword extraction," Information Processing and Management, vol. 43, no. 6, pp. 1705–1714, 2007.

[31] S. Xu, S. Yang, and C. M. Lau, "Keyword extraction and headline generation using novel word feature," in Proceedings of the 24th AAAI Conference on Artificial Intelligence, pp. 1461–1466, Atlanta, GA, USA, 2010.

[32] R. Mihalcea and P. Tarau, "TextRank: bringing order into texts," UNT Scholarly Works, vol. 43, no. 6, pp. 404–411, 2004.

[33] K. S. Hasan and V. Ng, "Automatic keyphrase extraction: a survey of the state of the art," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 1262–1273, Baltimore, MD, USA, June 2014.

[34] F. Xie, X. Wu, and X. Zhu, "Document-specific keyphrase extraction using sequential patterns with wildcards," in Proceedings of the 2014 IEEE International Conference on Data Mining, pp. 1055–1060, Shenzhen, China, December 2014.


[35] J. Feng, F. Xie, X. Hu, P. Li, J. Cao, and X. Wu, "Keyword extraction based on sequential pattern mining," in Proceedings of the 3rd International Conference on Internet Multimedia Computing and Service, pp. 34–38, Chengdu, China, August 2011.

[36] CrowdFlower, 2020, http://www.crowdflower.com.

[37] S. Wang, X. Xiao, and C. Lee, "Crowd-based deduplication: an adaptive approach," in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1263–1277, Melbourne, Australia, June 2015.

[38] Y. Zheng, J. Wang, G. Li, R. Cheng, and J. Feng, "QASCA: a quality-aware task assignment system for crowdsourcing applications," in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1031–1046, Melbourne, Australia, June 2015.

[39] Y. Tong, C. C. Cao, C. J. Zhang, Y. Li, and L. Chen, "CrowdCleaner: data cleaning for multi-version data on the web via crowdsourcing," in Proceedings of the 2014 IEEE 30th International Conference on Data Engineering, pp. 1182–1185, Chicago, IL, USA, April 2014.

[40] H. Su, K. Zheng, J. Huang et al., "A crowd-based route recommendation system," in Proceedings of the 2014 IEEE 30th International Conference on Data Engineering, pp. 1144–1155, Chicago, IL, USA, May 2014.

[41] Y. Zheng, G. Li, and R. Cheng, "DOCS: domain-aware crowdsourcing system," Proceedings of the VLDB Endowment, vol. 10, no. 4, pp. 361–372, 2016.

[42] Y. Zheng, G. Li, Y. Li, C. Shan, and R. Cheng, "Truth inference in crowdsourcing: is the problem solved?" Proceedings of the VLDB Endowment, vol. 10, no. 5, pp. 541–552, 2017.

[43] J. Wu, S. Zhao, V. S. Sheng et al., "Weak-labeled active learning with conditional label dependence for multilabel image classification," IEEE Transactions on Multimedia, vol. 19, no. 6, pp. 1156–1169, 2017.

[44] B. Nicholson, J. Zhang, V. S. Sheng et al., "Label noise correction methods," in Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 1–9, IEEE, Paris, France, October 2015.

[45] J. Zhang, V. S. Sheng, Q. Li, J. Wu, and X. Wu, "Consensus algorithms for biased labeling in crowdsourcing," Information Sciences, vol. 382-383, pp. 254–273, 2017.

10 Complexity

Page 8: LabellingTrainingSamplesUsingCrowdsourcing ...downloads.hindawi.com/journals/complexity/2020/1670483.pdf · Crowdsourcing anno-tationhasfive steps:(a) ... m m m m KEEEEmEEEE-Figure

semantic relations in context to improve qualities ofextracting keyphrases [31] It is obvious that the semanticrelations obtained by these methods are restricted by thecorresponding knowledge bases and ontologies Studies[32 33] utilized graph-based ranking methods to labelkeyphrases in which a keyphrasersquos importance is deter-mined by its semantic relatedness to others As they justaggregate keyphrases from one single document the cor-responding semantic relatedness is not stable and could notaccurately reveal the ldquorelatednessrdquo between keyphrases ingeneral Studies [34 35] applied sequential pattern miningwith wildcards to label keyphrases since wildcards providegap constraints with flexibility for capturing semantic re-lations in context However most of them are computa-tionally expensive as they need to repeatedly scan the wholedocument In addition they require users to explicitlyspecify appropriate gap constraints beforehand which istime-consuming and not realistic According to the commonsense that words do not repeatedly appear in an effectivekeyphrase KeyRank [15] converted the repeated scanningoperation into a calculating model and significantly reducedtime consumption However it is also frequency-based al-gorithm that may lose important entities with low fre-quencies To sum up machine annotation can label enoughtraining samples timely and they do not meet the re-quirement of high quality because of limited machine in-telligence Hiring domain experts can achieve a highaccuracy However it requires a long time as well more highresources (erefore it is natural to think of utilizingcrowdsourcing annotation which is a new way of humanintelligence to participate in machine computing at a rela-tively low price to label sufficient training samples timelyand accurately

Studies [6ndash8] showed that crowdsourcing brings greatopportunities to machine learning as well as its relatedresearch fields With the appearance of crowdsourcingplatforms such as MTurk [10] and CrowdFlower [36]crowdsourcing has taken off in a wide range of applicationsfor example entity resolution [37] and sentiment analysis[38] Despite the diversity of applications they all employcrowdsourcing annotation at low cost to collect data (labelsof training samples) to resolve corresponding intelligentproblems In addition many crowdsourcing annotation-based systems (frameworks) are proposed to resolve com-puter-hard and intelligent tasks By utilizing crowdsourcingannotation-based methods CrowdCleaner [39] can detectand repair errors that usually cannot be solved by traditionaldata integration and cleaning techniques CrowdPlanner[40] recommends the best route with respect to theknowledge of experienced drivers AggNet [12] is a novelcrowdsourcing annotation-based aggregation frameworkwhich asks workers to detect the mitosis in breast cancerhistology images after training the crowd with a fewexamples

Since some individuals in the crowd may yield relativelylow-quality answers or even noise many researches focus onhow to infer the ground truth according to labels providedby workers [9] Zheng et al [41] employed a domain-sen-sitive worker model to accurately infer the ground truth

based on two principles (1) a label provided by a worker istrusted if the worker is a domain expert on the corre-sponding tasks and (2) a worker is a domain expert if heoften correctly completes tasks related to the specific do-main Zheng et al [42] provided a detailed survey on groundtruth inference on crowdsourcing annotation and per-formed an in-depth analysis of 17 existing methods Zhanget al tried to utilize active learning and label noise correctionto improve the quality of truth inference [43ndash45] One of ourpreliminary works [13] treated the ground truth inference oflabelling keyphrases as an integrating and ranking processand proposed three novel algorithms IMLK IMLK-I andIMLK-ED However these three algorithms ignore threeinherent properties of a keyphrase capturing a pointexpressed by the text which are meaningfulness uncer-tainty and uselessness

5. Conclusions

This paper focuses on labelling training samples for keyphrase extraction by utilizing crowdsourcing annotation. We designed novel crowdsourcing mechanisms to create corresponding crowdsourcing annotation-based tasks for training sample labelling and proposed two entropy-based inference algorithms (HILED and HILI) to improve the quality of labelled training samples. The experimental results showed that crowdsourcing annotation achieves a more effective performance improvement than the machine annotation approach (i.e., KeyRank) does. In addition, we demonstrated that the ranking manners of candidates listed in the candidate option part do influence the performance improvement of crowdsourcing annotation, and the descending ranking manner is more effective than the random one. In the future, we will keep focusing on inference algorithms for improving the quality of labelled training samples.

Data Availability

The data used in this study can be accessed via https://github.com/snkim/AutomaticKeyphraseExtraction.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This study was partially supported by the National Key R&D Program of China (grant no. 2019YFB1704101), the National Natural Science Foundation of China (grant nos. U1936220 and 31771679), the Anhui Foundation for Science and Technology Major Project (grant nos. 18030901034 and 201904e01020006), the Key Laboratory of Agricultural Electronic Commerce, Ministry of Agriculture of China (grant nos. AEC2018003 and AEC2018006), the 2019 Anhui University Collaborative Innovation Project (GXXT-2019-013), and the Hefei Major Research Project of Key Technology (J2018G14).

8 Complexity

References

[1] X. Xu, Q. Liu, Y. Luo et al., "A computation offloading method over big data for IoT-enabled cloud-edge computing," Future Generation Computer Systems, vol. 95, pp. 522–533, 2019.

[2] J. Zhou, J. Sun, P. Cong et al., "Security-critical energy-aware task scheduling for heterogeneous real-time MPSoCs in IoT," IEEE Transactions on Services Computing, 2019, in press.

[3] J. Zhou, X. S. Hu, Y. Ma, J. Sun, T. Wei, and S. Hu, "Improving availability of multicore real-time systems suffering both permanent and transient faults," IEEE Transactions on Computers, vol. 68, no. 12, pp. 1785–1801, 2019.

[4] Y. Zhang, K. Wang, Q. He et al., "Covering-based web service quality prediction via neighborhood-aware matrix factorization," IEEE Transactions on Services Computing, 2019, in press.

[5] Y. Zhang, G. Cui, S. Deng et al., "Efficient query of quality correlation for service composition," IEEE Transactions on Services Computing, 2019, in press.

[6] M. Lease, "On quality control and machine learning in crowdsourcing," in Proceedings of the Workshops at the 25th AAAI Conference on Artificial Intelligence, pp. 97–102, San Francisco, CA, USA, January 2011.

[7] J. Zhang, X. Wu, and V. S. Sheng, "Learning from crowdsourced labeled data: a survey," Artificial Intelligence Review, vol. 46, no. 4, pp. 543–576, 2016.

[8] V. S. Sheng, F. Provost, and P. G. Ipeirotis, "Get another label? Improving data quality and data mining using multiple, noisy labelers," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622, August 2008.

[9] G. Li, J. Wang, Y. Zheng, and M. J. Franklin, "Crowdsourced data management: a survey," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 9, pp. 2296–2319, 2016.

[10] MTurk, 2020, https://www.mturk.com.

[11] G. Li, Y. Zheng, J. Fan, J. Wang, and R. Cheng, "Crowdsourced data management: overview and challenges," in Proceedings of the 2017 ACM International Conference on Management of Data, pp. 1711–1716, Association for Computing Machinery, New York, NY, USA, 2017.

[12] S. Albarqouni, C. Baur, F. Achilles, V. Belagiannis, S. Demirci, and N. Navab, "AggNet: deep learning from crowds for mitosis detection in breast cancer histology images," IEEE Transactions on Medical Imaging, vol. 35, no. 5, pp. 1313–1321, 2016.

[13] Q. Wang, V. S. Sheng, and Z. Liu, "Exploring methods of assessing influence relevance of news articles," in Cloud Computing and Security, pp. 525–536, Springer, Berlin, Germany, 2018.

[14] I. H. Witten, G. W. Paynter, E. Frank, C. Gutwin, and C. G. Nevill-Manning, "KEA: practical automatic keyphrase extraction," in Proceedings of the 4th ACM Conference on Digital Libraries, pp. 1–23, Berkeley, CA, USA, August 1999.

[15] Q. Wang, V. S. Sheng, and X. Wu, "Document-specific keyphrase candidate search and ranking," Expert Systems with Applications, vol. 97, pp. 163–176, 2018.

[16] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.

[17] INSPEC, 2020, https://github.com/snkim/AutomaticKeyphraseExtraction.

[18] A. Ramlatchan, M. Yang, Q. Liu, M. Li, J. Wang, and Y. Li, "A survey of matrix completion methods for recommendation systems," Big Data Mining and Analytics, vol. 1, no. 4, pp. 308–323, 2018.

[19] X. Xu, R. Mo, F. Dai, W. Lin, S. Wan, and W. Dou, "Dynamic resource provisioning with fault tolerance for data-intensive meteorological workflows in cloud," IEEE Transactions on Industrial Informatics, 2019.

[20] L. Qi, Y. Chen, Y. Yuan, S. Fu, X. Zhang, and X. Xu, "A QoS-aware virtual machine scheduling method for energy conservation in cloud-based cyber-physical systems," World Wide Web, vol. 23, no. 2, pp. 1275–1297, 2019.

[21] Y. Zhang, C. Yin, Q. Wu et al., "Location-aware deep collaborative filtering for service recommendation," IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019.

[22] L. Qi, Q. He, F. Chen et al., "Finding all you need: web APIs recommendation in web of things through keywords search," IEEE Transactions on Computational Social Systems, vol. 6, no. 5, pp. 1063–1072, 2019.

[23] J. Zhou, J. Sun, X. Zhou et al., "Resource management for improving soft-error and lifetime reliability of real-time MPSoCs," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no. 12, pp. 2215–2228, 2019.

[24] G. Liu, Y. Wang, M. A. Orgun et al., "Finding the optimal social trust path for the selection of trustworthy service providers in complex social networks," IEEE Transactions on Services Computing, vol. 6, no. 2, pp. 152–167, 2011.

[25] G. Liu, Y. Wang, and M. A. Orgun, "Optimal social trust path selection in complex social networks," in Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, GA, USA, July 2010.

[26] G. Liu, K. Zheng, Y. Wang et al., "Multi-constrained graph pattern matching in large-scale contextual social graphs," in Proceedings of the 2015 IEEE 31st International Conference on Data Engineering, pp. 351–362, IEEE, Seoul, South Korea, April 2015.

[27] C. Zhang, M. Yang, J. Lv, and W. Yang, "An improved hybrid collaborative filtering algorithm based on tags and time factor," Big Data Mining and Analytics, vol. 1, no. 2, pp. 128–136, 2018.

[28] Y. Liu, S. Wang, M. S. Khan, and J. He, "A novel deep hybrid recommender system based on auto-encoder with neural collaborative filtering," Big Data Mining and Analytics, vol. 1, no. 3, pp. 211–221, 2018.

[29] H. Liu, H. Kou, C. Yan, and L. Qi, "Link prediction in paper citation network to construct paper correlation graph," EURASIP Journal on Wireless Communications and Networking, vol. 2019, no. 1, 2019.

[30] G. Ercan and I. Cicekli, "Using lexical chains for keyword extraction," Information Processing and Management, vol. 43, no. 6, pp. 1705–1714, 2007.

[31] S. Xu, S. Yang, and C. M. Lau, "Keyword extraction and headline generation using novel word feature," in Proceedings of the 24th AAAI Conference on Artificial Intelligence, pp. 1461–1466, Atlanta, GA, USA, 2010.

[32] R. Mihalcea and P. Tarau, "TextRank: bringing order into texts," UNT Scholarly Works, vol. 43, no. 6, pp. 404–411, 2004.

[33] K. S. Hasan and V. Ng, "Automatic keyphrase extraction: a survey of the state of the art," in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 1262–1273, Baltimore, MD, USA, June 2014.

[34] F. Xie, X. Wu, and X. Zhu, "Document-specific keyphrase extraction using sequential patterns with wildcards," in Proceedings of the 2014 IEEE International Conference on Data Mining, pp. 1055–1060, Shenzhen, China, December 2014.

[35] J. Feng, F. Xie, X. Hu, P. Li, J. Cao, and X. Wu, "Keyword extraction based on sequential pattern mining," in Proceedings of the 3rd International Conference on Internet Multimedia Computing and Service, pp. 34–38, Chengdu, China, August 2011.

[36] CrowdFlower, 2020, http://www.crowdflower.com.

[37] S. Wang, X. Xiao, and C. Lee, "Crowd-based deduplication: an adaptive approach," in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1263–1277, Melbourne, Australia, June 2015.

[38] Y. Zheng, J. Wang, G. Li, R. Cheng, and J. Feng, "QASCA: a quality-aware task assignment system for crowdsourcing applications," in Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1031–1046, Melbourne, Australia, June 2015.

[39] Y. Tong, C. C. Cao, C. J. Zhang, Y. Li, and L. Chen, "CrowdCleaner: data cleaning for multi-version data on the web via crowdsourcing," in Proceedings of the 2014 IEEE 30th International Conference on Data Engineering, pp. 1182–1185, Chicago, IL, USA, April 2014.

[40] H. Su, K. Zheng, J. Huang et al., "A crowd-based route recommendation system," in Proceedings of the 2014 IEEE 30th International Conference on Data Engineering, pp. 1144–1155, Chicago, IL, USA, May 2014.

[41] Y. Zheng, G. Li, and R. Cheng, "DOCS: a domain-aware crowdsourcing system," Proceedings of the VLDB Endowment, vol. 10, no. 4, pp. 361–372, 2016.

[42] Y. Zheng, G. Li, Y. Li, C. Shan, and R. Cheng, "Truth inference in crowdsourcing: is the problem solved?" Proceedings of the VLDB Endowment, vol. 10, no. 5, pp. 541–552, 2017.

[43] J. Wu, S. Zhao, V. S. Sheng et al., "Weak-labeled active learning with conditional label dependence for multilabel image classification," IEEE Transactions on Multimedia, vol. 19, no. 6, pp. 1156–1169, 2017.

[44] B. Nicholson, J. Zhang, V. S. Sheng et al., "Label noise correction methods," in Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pp. 1–9, IEEE, Paris, France, October 2015.

[45] J. Zhang, V. S. Sheng, Q. Li, J. Wu, and X. Wu, "Consensus algorithms for biased labeling in crowdsourcing," Information Sciences, vol. 382-383, pp. 254–273, 2017.
