Automatic Detection of Plagiarized Spoken Responses

Copyright © 2014 by Educational Testing Service. All rights reserved.

Keelan Evanini and Xinhao Wang

Educational Testing Service

June 26, 2014

Automated Detection of Plagiarized Spoken Responses

• An increasingly important application, driven by the growing need for automated scoring of spontaneous speech

• Prevents one type of cheating strategy

– Test takers may prepare canned answers using test preparation materials prior to the examination.

June 18, 2014 Copyright © 2014 by Educational Testing Service. All rights reserved.

Plagiarized Spoken Response in TOEFL® iBT

• TOEFL® iBT: a large-scale, high-stakes assessment of English for non-native speakers.

– Independent speaking tasks ask test takers to draw upon their own ideas, opinions, and experiences in a 45-second spoken response.

• Plagiarized spoken responses

– Test takers may attempt to game the assessment by memorizing canned material from an external source and adapting it to a question.

– This type of plagiarism can affect the validity of a test taker’s speaking score.

One Source Material

Well, the place I enjoy the most is a small town located in France. I like this small town because it has very charming ocean view. I mean the sky there is so blue and the beach is always full of sunshine. You know how romantic it can ever be, just relax yourself on the beach, when the sun is setting down, when the ocean breeze is blowing and the seabirds are singing. Of course I like this small French town also because there are many great French restaurants. They offer the best seafood in the world like lobsters and tuna fishes. The most important, I have been benefited a lot from this trip to France because I made friends with some gorgeous French girls. One of them even gave me a little watch as a souvenir of our friendship.

One Plagiarized Response

family is a little trip to France when I was in primary school ten years ago I enjoy this activity first because we visited a small French town located by the beach the town has very charming ocean view and in the sky is so blue and the beach is always full of sunshine you know how romantic it can ever be just relax yourself on the beach when the sun is settling down the sea birds are singing of course I enjoy this activity with my family also because there are many great French restaurants they offer the best sea food in the world like lobsters and tuna fishes so I enjoy this activity with my family very much even it has passed several years

Canned Response Collection

• Step 1: Human raters flag potentially plagiarized spoken responses.

• Step 2: Rater supervisors review the flagged responses by comparing them to external source material.

• Step 3: If the presence of plagiarized material makes it impossible to provide a valid assessment of the test taker’s performance, a score of 0 is assigned.

Data Collection

• 719 potentially plagiarized responses; 239 canned responses with score 0

• 49 different source materials

• Approximately 300 control responses from each of the four most-frequent test questions.

Data Set      N      Words (Mean)   Words (Standard Deviation)
Sources       49     122.5          36.5
Plagiarized   239    109.1          18.9
Control       1196   84.9           24.1

The plagiarized responses are on average a little longer than the control responses.

Methodology (1)

• Each test response is compared with each of the 49 reference sources using 9 text-to-text similarity metrics:

– Word Error Rate (WER)

– TER and TER-Plus (Snover et al., 2006), (Snover et al., 2008)

– Four similarity metrics based on WordNet (Wu and Palmer, 1994), (Leacock and Chodorow, 1998)

– Latent Semantic Analysis

– BLEU (Papineni et al., 2002)
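As an illustration of the simplest metric in the list above, WER can be sketched as a word-level edit distance normalized by the reference length. This is a minimal sketch, not the authors' implementation: the function name `word_error_rate`, the lowercasing and whitespace tokenization, and the choice of the source text as the reference are all assumptions. The example phrases are taken from the source material shown earlier.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumption: lowercase + whitespace tokenization; real systems may normalize differently.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Assumption: the source text plays the role of the reference.
source = "the sky there is so blue and the beach is always full of sunshine"
response = "the sky is so blue and the beach is always full of sunshine"
wer = word_error_rate(source, response)  # one deleted word out of 14
```

Unlike the other metrics, a lower WER means the two texts are more similar, so a near-zero value against any source would be the suspicious case.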

Methodology (2)

• 4 different features for each similarity metric:

– Document-level similarity

– Single maximum similarity value from a sentence-by-sentence comparison

– Average of the similarity values for all sentence-by-sentence comparisons

– Average of the maximum similarity values for each sentence in the test response, where the maximum similarity of a sentence is obtained by comparing it with each sentence in the source text
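The four features listed above can be sketched for any pairwise similarity function. The code below is a rough illustration under stated assumptions: `jaccard` is a hypothetical stand-in similarity (it is not one of the nine metrics), the feature names are invented for readability, and both texts are assumed to be already split into sentences.

```python
from statistics import mean
from typing import Callable, Dict, List

Similarity = Callable[[str, str], float]


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity (assumption): word-set overlap in [0, 1]."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def similarity_features(
    response_sents: List[str], source_sents: List[str], sim: Similarity
) -> Dict[str, float]:
    """Compute the four features for one (response, source) pair."""
    if not response_sents or not source_sents:
        raise ValueError("both texts need at least one sentence")
    # pairwise[i][j] = similarity of response sentence i and source sentence j
    pairwise = [[sim(r, s) for s in source_sents] for r in response_sents]
    all_values = [value for row in pairwise for value in row]
    return {
        "document": sim(" ".join(response_sents), " ".join(source_sents)),
        "max_sentence": max(all_values),
        "mean_sentence": mean(all_values),
        "mean_of_row_max": mean(max(row) for row in pairwise),
    }


response = ["the beach is full of sunshine", "i enjoy this activity"]
source = ["the beach is always full of sunshine", "they offer the best seafood"]
features = similarity_features(response, source, jaccard)
```

With 9 metrics and 4 features per metric, this would yield 36 values per source; how the values for the 49 sources are combined into classifier inputs is not stated on the slide.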

Experimental Setup

• Experiments on both human transcriptions and automatic speech recognition (ASR) outputs

• ASR system

– Trained on approximately 800 hours of TOEFL® iBT responses

– WERs were 0.411 on the plagiarized set and 0.362 on the control set

• Maximum Entropy-based sentence boundary detection system (Chen and Yoon, 2011)

Results

Text             Features         Accuracy   Kappa
Transcriptions   ALL              0.903      0.807
Transcriptions   Document-level   0.920      0.839
Transcriptions   Sentence-level   0.847      0.693
ASR Outputs      ALL              0.852      0.703
ASR Outputs      Document-level   0.871      0.742
ASR Outputs      Sentence-level   0.735      0.470

Mean accuracy and kappa values for classification, using the 239 responses in the plagiarized set and 1,000 random subsets of 239 responses drawn from the control set.
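For readers less familiar with the two measures, both can be derived from a 2x2 confusion matrix. The sketch below assumes that the Kappa column reports Cohen's kappa (the slides do not name the variant), and the counts are hypothetical, chosen only to mimic a balanced 239 vs. 239 evaluation set.

```python
from typing import Tuple


def accuracy_and_kappa(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Return (accuracy, Cohen's kappa) for a binary confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # observed agreement, i.e. accuracy
    true_pos_rate = (tp + fn) / n  # share of truly plagiarized items
    pred_pos_rate = (tp + fp) / n  # share of items predicted as plagiarized
    # Agreement expected by chance, given the marginal distributions.
    expected = true_pos_rate * pred_pos_rate + (1 - true_pos_rate) * (1 - pred_pos_rate)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts (not from the experiments): 239 plagiarized, 239 control.
accuracy, kappa = accuracy_and_kappa(tp=212, fp=12, fn=27, tn=227)
```

On a balanced set the chance agreement is 0.5, so kappa reduces to 2 × accuracy − 1, which is consistent with the accuracy and kappa pairs in the table.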

Discussion and Future Work (1)

Precision was higher than recall: 0.948 vs. 0.888 on human transcriptions; 0.904 vs. 0.831 on ASR outputs.

In an operational system, it may be desirable to tune the classifier to increase the recall.
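One common way to trade precision for recall is to lower the decision threshold on the classifier's score. The sketch below is illustrative only: the scores, labels, and thresholds are hypothetical, and the slides do not say how the classifier would actually be tuned.

```python
from typing import List, Tuple


def precision_recall(
    scores: List[float], labels: List[bool], threshold: float
) -> Tuple[float, float]:
    """Precision and recall when items with score >= threshold are flagged."""
    predicted = [score >= threshold for score in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is flagged
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical classifier scores; True marks a plagiarized response.
scores = [0.95, 0.90, 0.80, 0.65, 0.55, 0.45, 0.30, 0.20]
labels = [True, True, True, False, True, False, False, False]

strict = precision_recall(scores, labels, threshold=0.7)   # fewer flags
lenient = precision_recall(scores, labels, threshold=0.5)  # more flags
```

In this toy data the lower threshold recovers the missed plagiarized response at the cost of one false alarm, which is the trade-off an operational system would need to weigh.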

The experiments used balanced sets of canned and control responses.

The distribution of actual responses is heavily unbalanced. Future experiments will use a much larger control set with ASR outputs.
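To see why the class balance matters, the following sketch estimates how precision would change if the same classifier were applied to a population with fewer plagiarized responses. It assumes that recall and the false positive rate stay fixed across populations and that the reported precision and recall describe a single balanced confusion matrix (they are in fact means over 1,000 subsets). The 1% prevalence is a hypothetical value, not a figure from the talk.

```python
def fpr_from_balanced(precision: float, recall: float) -> float:
    """False positive rate implied by precision and recall on a 50/50 set.

    On a balanced set, precision = recall / (recall + fpr).
    """
    return recall * (1 - precision) / precision


def precision_at_prevalence(recall: float, fpr: float, prevalence: float) -> float:
    """Expected precision when a fraction `prevalence` of responses is plagiarized."""
    true_positives = recall * prevalence
    false_positives = fpr * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Figures reported for human transcriptions on the balanced sets.
fpr = fpr_from_balanced(precision=0.948, recall=0.888)

balanced = precision_at_prevalence(0.888, fpr, prevalence=0.5)
# Hypothetical operational setting: 1% of responses are plagiarized.
rare = precision_at_prevalence(0.888, fpr, prevalence=0.01)
```

Under these assumptions, precision drops sharply as plagiarized responses become rarer, which is one reason to evaluate on a much larger control set.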

Discussion and Future Work (2)

Matching source texts are required: plagiarized responses based on unseen sources cannot be detected.

Obtain additional source texts; compare a test response with all previously collected spoken responses for a given population of test takers.

The above methods may lead to a high number of false positives, especially when based on ASR outputs.

Use N-best lists to compute similarity metrics; introduce additional sources of information, such as stylistic patterns and prosodic features.
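One possible reading of the N-best idea is to score every ASR hypothesis against a source and aggregate the results, so that a recognition error in the top hypothesis does not hide a match. The sketch below is an assumption about how this could work, not a description of the authors' plan: the aggregation by maximum, the `jaccard` stand-in similarity, and the example hypotheses are all hypothetical.

```python
from typing import Callable, List


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity (assumption): word-set overlap in [0, 1]."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def nbest_similarity(
    nbest: List[str], source: str, sim: Callable[[str, str], float]
) -> float:
    """Highest similarity between the source and any hypothesis in the N-best list."""
    if not nbest:
        raise ValueError("N-best list must not be empty")
    return max(sim(hypothesis, source) for hypothesis in nbest)


# Hypothetical 3-best list for one utterance; the top hypothesis has two errors.
nbest = [
    "the see birds are sinking",
    "the sea birds are singing",
    "the seabirds are singing",
]
source = "the sea birds are singing"

one_best_score = jaccard(nbest[0], source)
nbest_score = nbest_similarity(nbest, source, jaccard)
```

Taking the maximum favors recall; averaging over hypotheses or weighting by ASR confidence would be more conservative alternatives.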

Questions? Comments?

Xinhao Wang, [email protected], Research Scientist, NLP & Speech Group at ETS.

Keelan Evanini, [email protected], Research Scientist, NLP & Speech Group at ETS.

Thank You!