Computer-Aided Language Processing
Ruslan Mitkov, University of Wolverhampton
The rise and fall of Natural Language Processing (NLP)?
Automatic NLP: expectations fulfilled? Many practical applications, such as IR, shy away from NLP techniques.
Performance accurate? In many applications, such as word alignment, anaphora resolution and term extraction, accuracy can be well below 100%.
Dramatic improvements feasible in the foreseeable future?
Context
Promising NLP projects and results, but in the vast majority of real-world applications fully automatic NLP is still far from delivering reliable results.
Alternative: computer-aided language processing (CALP)
Computer-aided scenario:
Processing is not done entirely by computers.
Human intervention improves, post-edits or validates the output of the computer program.
Historical perspective
Martin Kay’s (1980) paper on machine-aided translation
Machine-Aided Translation
The translator sends the simple sentences for translation to the computer and translates the more difficult, complex ones him(her)self.
CALP examples
Machine-aided Translation
Summarisation (Orasan, Mitkov and Hasler 2003)
Generation of multiple-choice tests (Mitkov and Ha 2003; Mitkov, Ha and Karamanis 2006)
Information extraction (Cunningham et al. 2002)
Acquisition of semantic lexicons (Riloff and Schmelzenbach 1998)
Annotation of corpora (Orasan 2005)
Translation Memory
Translation Memory
A Translation Memory is a linguistic database that collects all your translations and their target language equivalents as you translate.
Example of a fuzzy match (87%): the new segment "A Translation Memory is a database that collects all your translations and their target language equivalents as you translate." differs from the stored segment only in the word "linguistic" (German: "linguistische").
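The fuzzy-match lookup can be illustrated with a minimal sketch. It assumes a toy in-memory list of (source, target) pairs and uses word-level similarity from Python's `difflib` as a stand-in for a match score. Real Translation Memory tools use their own scoring, so the percentage computed here need not equal the 87% shown above. The names `similarity`, `best_match` and `memory`, the 0.7 threshold and the placeholder target text are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1] between two segments."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def best_match(segment, memory, threshold=0.7):
    """Return (score, source, target) for the closest stored segment, or None.

    `memory` is a list of (source, target) pairs; `threshold` is an assumed
    cut-off below which a stored segment is not offered to the translator.
    """
    scored = [(similarity(segment, src), src, tgt) for src, tgt in memory]
    if not scored:
        return None
    best = max(scored, key=lambda item: item[0])
    return best if best[0] >= threshold else None


# Toy memory with one stored segment; the target is a placeholder, not a real translation.
memory = [
    ("A Translation Memory is a linguistic database that collects all your "
     "translations and their target language equivalents as you translate.",
     "<stored target-language segment>"),
]

query = ("A Translation Memory is a database that collects all your "
         "translations and their target language equivalents as you translate.")

match = best_match(query, memory)
if match is not None:
    score, source, target = match
    print(f"Match {score:.0%}: {target}")
```

The translator would then post-edit the retrieved target segment rather than translate the new sentence from scratch.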
CALP applications in focus
Machine-aided Translation
Translation Memory
Annotation tools
Computer-aided Summarisation
Computer-aided Generation of Multiple-Choice Tests
MAT: the Penang experiment
Books/manuals averaging about 250 pages were translated manually by a translation bureau and with a Machine-Aided Translation program (SISKEP).
Manual translation took 360 hours on average
Translation by a Machine-Aided Translation program needed 200 hours on average
Efficiency rate: 1.8
Translation Memory
A case study (Webb 1998)
Client saves 40% in cost and 70% in time
Translator / translation agency saves 69% in cost and 70% in time
Efficiency rate: 3.3
PALinkA: multi-task annotation tool
Employed in a number of corpora annotation tasks
(Semi-automatic) mark-up of coreference
(Semi-automatic) mark-up of centering
(Semi-automatic) mark-up of events
The noun “the storm” is marked coreferential with the noun “the cyclone”. WordNet is consulted to find out the relation between them
The user can override the information retrieved from WordNet
PALinkA: multi-task annotation tool (II)
Webpage: http://clg.wlv.ac.uk/projects/PALinkA
Old version: over 500 downloads; used in several projects
New version: supports plugins (not available for download yet)
Further CALP experiments (evaluations) at the University of Wolverhampton
Computer-aided summarisation
Computer-aided generation of multiple-choice tests
Efficiency and quality evaluated in both cases
Computer-aided summarisation
CAST: computer-aided summarisation tool (Orasan, Mitkov and Hasler 2003)
Combines automatic methods with human input
Relies on automatic methods to identify the important information
Humans can decide to include this information and/or additional information
Humans post-edit the information to produce a coherent summary
Evaluation (Orasan and Hasler 2007)
Time for producing summaries with and without CAST
Consistent familiarity-effect-extinguished model: the same texts were summarised manually and with the help of the program at an interval of one year
Human judges had to choose the better summary when presented with a pair of summaries
Experiment 1
Used one professional summariser
69 texts from the CAST corpus were used
Summaries were produced with and without the tool, one year apart
                     Without CAST   With CAST   Reduction
Newswire texts       498 secs       382 secs    23.29%
New Scientist texts  771 secs       623 secs    19.19%
Efficiency rate: 1.25
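The reduction figures can be checked with a short calculation. This is a sketch only: the function names are hypothetical, and the per-genre speed-up ratio (time without the tool divided by time with it) is one plausible reading of "efficiency rate", since the slides do not state how 1.25 was derived.

```python
def time_reduction(without_tool: float, with_tool: float) -> float:
    """Percentage of time saved when the tool is used."""
    return 100.0 * (without_tool - with_tool) / without_tool


def speed_up(without_tool: float, with_tool: float) -> float:
    """Ratio of unaided to aided time (assumed reading of 'efficiency rate')."""
    return without_tool / with_tool


# Average seconds per summary, taken from the table above.
timings = {
    "Newswire texts": (498, 382),
    "New Scientist texts": (771, 623),
}

for genre, (manual, aided) in timings.items():
    print(f"{genre}: {time_reduction(manual, aided):.2f}% reduction, "
          f"speed-up {speed_up(manual, aided):.2f}")
```

The per-genre ratios come out at roughly 1.30 and 1.24, in line with the reported overall rate of 1.25.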
Experiment 1
The term-based summariser used in the process was evaluated
Correlation between the success of the automatic summariser and the time reduction
Experiment 2
Turing-like experiment where humans were asked to pick the better summary in a pair
Each pair contained one summary produced with CAST and one without CAST
17 judges were shown 4 randomly selected pairs
Experiment 2
In 41 pairs the summary produced with CAST was preferred
In 27 pairs the summary produced without CAST was preferred
A chi-square test shows no statistically significant difference at the 0.05 significance level
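The chi-square result can be reproduced with a minimal sketch. It assumes a goodness-of-fit test against the null hypothesis that both kinds of summary are equally likely to be preferred, and it treats the 68 judgements (17 judges, 4 pairs each) as independent; the slides do not say exactly how the test was set up. The function names are hypothetical.

```python
import math


def chi_square_equal_preference(counts):
    """Goodness-of-fit statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def p_value_df1(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


CRITICAL_VALUE_DF1_ALPHA_005 = 3.841  # chi-square critical value, 1 d.f., alpha = 0.05

stat = chi_square_equal_preference([41, 27])  # with CAST vs. without CAST
p = p_value_df1(stat)

print(f"chi-square = {stat:.3f}, p = {p:.3f}")
print("significant" if stat > CRITICAL_VALUE_DF1_ALPHA_005 else "not significant")
```

Under these assumptions the statistic is about 2.88, below the 3.841 critical value, so the preference for CAST summaries is not significant at the 0.05 level.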
Discussion
Computer-aided summarisation works for professional summarisers
Reduces the time necessary to produce summaries by about 20%
Quality of summaries not compromised
Computer-aided generation of multiple-choice tests (Mitkov and Ha 2003)
Multiple-choice test: an effective way to measure student achievements.
Fact: development of multiple-choice tests is a time-consuming and labour-intensive task
Alternative: computer-aided multiple-choice test generation based on a novel NLP methodology
How does it work?
Methodology
The system identifies the important concepts in text
Generates questions focusing on these concepts
Chooses semantically closest distractors
NLP-based methodology
[Diagram. Components: narrative texts, term extraction, terms (key concepts), question generation, transformational rules, distractor selection, WordNet, distractors, test items]
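The distractor-selection step can be sketched as follows. This is not the system's implementation: it assumes that "semantically closest" can be approximated by terms sharing the same hypernym, and it replaces WordNet with a tiny hand-made taxonomy. The taxonomy entries, the example question and the names `select_distractors` and `make_item` are hypothetical; term extraction and question generation are not modelled.

```python
# Hypothetical mini-taxonomy standing in for WordNet: term -> hypernym.
HYPERNYMS = {
    "noun": "part of speech",
    "verb": "part of speech",
    "adjective": "part of speech",
    "adverb": "part of speech",
    "syllable": "phonological unit",
}


def select_distractors(term, taxonomy, limit=3):
    """Pick up to `limit` terms that share the key term's hypernym."""
    parent = taxonomy.get(term)
    if parent is None:
        return []
    siblings = sorted(t for t, p in taxonomy.items() if p == parent and t != term)
    return siblings[:limit]


def make_item(question, answer, taxonomy):
    """Bundle a question, its correct answer and candidate distractors."""
    return {
        "question": question,
        "answer": answer,
        "distractors": select_distractors(answer, taxonomy),
    }


item = make_item(
    "Which part of speech names a person, place or thing?",
    "noun",
    HYPERNYMS,
)
print(item["distractors"])
```

The candidates produced this way are only suggestions; in the computer-aided setting the test developer still reviews and post-edits them.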
Test developer’s post-editing environment
First version of system: 3 distractors generated to be post-edited
Current version of system: a long list of distractors is generated, from which the user chooses 3
Test developer’s post-editing environment (2)
Post-editing Automatic generation
Test items classed as “worthy” (57%) or “unworthy” (43%)
About 9% of the automatically generated items did not need any revision
Of the revisions needed: minor (17%), fair (36%) and major (47%)
In-class experiments
Controlled set of test items administered
First experiment: 24 items constructed with the help of the first version of the system
Second experiment: another 18 items constructed with the help of the current version of the system
A further 12 manually produced items included
113 undergraduate students took the test
45 in the first experiment
78 in the second experiment
A subset of the second group (30) took the manually produced test
Evaluation
Efficiency of the procedure
Quality of the test items
Evaluation: efficiency of the procedure
                 items produced   time    average time per item
manual           65               450'    6' 55''
computer-aided   300              540'    1' 48''
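The averages in the table, and the ratio between them, follow directly from the totals. The sketch below recomputes them; the function names are hypothetical, and taking the efficiency rate to be the ratio of the two per-item averages is an assumption, although it agrees with the figure of 3.8 reported for this task.

```python
def average_seconds_per_item(total_minutes: float, items: int) -> float:
    """Average production time per test item, in seconds."""
    return total_minutes * 60 / items


def format_min_sec(seconds: float) -> str:
    """Render a duration in the slide's minutes' seconds'' notation."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}' {secs:02d}''"


manual = average_seconds_per_item(450, 65)    # manual construction
aided = average_seconds_per_item(540, 300)    # computer-aided construction

print("manual:        ", format_min_sec(manual))
print("computer-aided:", format_min_sec(aided))
print("efficiency rate:", round(manual / aided, 1))
```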
Evaluation (A): quality of the test items
Item analysis
Item Difficulty (= C/T)
Discriminating Power (= (CU - CL) / (T/2))
Usefulness of the distractors (comparing no. of students in upper and lower groups who selected each incorrect alternative)
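The item-analysis measures can be written out as a short sketch. The slides do not define the symbols, so the following readings are assumptions based on classical item analysis: C is the number of students answering the item correctly, T is the total number of students, and CU and CL are the numbers answering correctly in the upper and lower halves of the class ranked by overall test score. The usefulness criterion (a distractor attracts more lower-group than upper-group students) and the example counts are likewise hypothetical.

```python
def item_difficulty(correct: int, total: int) -> float:
    """C / T: proportion of students who answered the item correctly."""
    return correct / total


def discriminating_power(correct_upper: int, correct_lower: int, total: int) -> float:
    """(CU - CL) / (T / 2): how well the item separates strong from weak students."""
    return (correct_upper - correct_lower) / (total / 2)


def distractor_is_useful(chosen_upper: int, chosen_lower: int) -> bool:
    """Assumed criterion: more lower-group than upper-group students chose it."""
    return chosen_lower > chosen_upper


# Hypothetical item: 30 students, 18 correct answers (12 upper group, 6 lower group).
difficulty = item_difficulty(18, 30)
power = discriminating_power(12, 6, 30)

print(f"item difficulty = {difficulty:.2f}")
print(f"discriminating power = {power:.2f}")
print("distractor useful:", distractor_is_useful(chosen_upper=1, chosen_lower=5))
```

A negative discriminating power means that more lower-group than upper-group students answered the item correctly, which usually signals a flawed item.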
Evaluation: results
                                         ITEM DIFFICULTY                  ITEM DISCRIMINATING POWER   USEFULNESS OF DISTRACTORS
TEST                  #items  #students  avg.   too easy  too difficult   avg.   neg.                 poor  not useful  total  avg. difference
computer-aided (new)  18      78         0.58   0         0               0.36   0                    3     1           54     2.94
computer-aided (old)  24      45         0.75   3         0               0.4    1                    6     3           65     1.92
manual                12      30         0.56   2         0               0.26   0                    5     8           33     1.18
Discussion
Computer-aided construction of multiple-choice test items is much more efficient than purely manual construction (efficiency rate 3.8)
Quality of test items produced with the help of the program is not compromised in exchange for time and labour savings
Efficiency rates summary
CALP: summarisation 1.25
CALP: MAT 1.8
CALP: TM 3.3
CALP: generation of multiple-choice tests 3.8
Conclusions
CALP: attractive alternative to automatic NLP
CALP: significant efficiency gains (time and labour)
CALP: no compromise of quality
Further information
My web page: www.wlv.ac.uk/~le1825
The Research Group in Computational Linguistics: clg.wlv.ac.uk