Page 1: The Link between Controlled Language and Post-Editing:

The Link between Controlled Language and Post-Editing:

An Empirical Investigation of Technical, Temporal and Cognitive Effort

Sharon O’Brien, CTTS/SALIS

Page 2: The Link between Controlled Language and Post-Editing:

Overview

• Research Parameters

• Temporal Effort

• Technical Effort

• Cognitive Effort

• Conclusions

Page 3: The Link between Controlled Language and Post-Editing:

Definition

– Controlled Language (CL): an explicitly defined restriction of a natural language that specifies constraints on lexicon, grammar, and style.

(Huijsen, 1998: 2)

Page 4: The Link between Controlled Language and Post-Editing:

Motivation – In a Nutshell

• Can the introduction of CL rules really improve MT output such that post-editing effort is reduced?

Page 5: The Link between Controlled Language and Post-Editing:

Machine “Translatability”

• One of the main “goals” of CL

• The notion of translatability is based on so-called "translatability indicators" where the occurrence of such an indicator in the text is considered to have a negative effect on the quality of machine translation. The fewer translatability indicators, the better suited the text is to translation using MT.

(Underwood and Jongejan 2001: 363)

Page 6: The Link between Controlled Language and Post-Editing:

Machine “Translatability”

• “Negative” Translatability Indicators
  – “NTIs” for short
  – Examples (for English as SL)
    • Long noun phrases
    • Passive voice
    • Ungrammatical constructs
    • Use of slang…
  – Use of NTI list (Bernth/Gdaniec 2001)
  – Use of term “minimal NTI”

Page 7: The Link between Controlled Language and Post-Editing:

Research Design

• SL: English; TL: German

• Text Type: User Manual (1 777 words)

• Users: 12 Professional Translators

• Tools: IBM WebSphere, Translog, IBM’s EasyEnglishAnalyzer, Sun Microsystems’ Sunproof

• Place of Data Capture: IBM Stuttgart

Page 8: The Link between Controlled Language and Post-Editing:

Methodology

• Edit SL text to create two sentence types:
  – S(nti) = sentences with known negative translatability indicators
  – S(min-nti) = sentences where all listed NTIs had been removed

• 9 subjects: post-editing (P1-P9)

• 3 subjects: translating (T1-T3)

• First pass exercise, no QA

Page 9: The Link between Controlled Language and Post-Editing:

Temporal Effort

• Post-Editing vs. Translation
  – median words per minute

Subject Type     Median words per minute
Post-Editors     17.59
Translators      13.63

Page 10: The Link between Controlled Language and Post-Editing:

Temporal Effort (2)

• Post-Editing vs. Translation
  – median processing speed

• Processing speed is the total number of source words in each segment divided by the total processing time for that segment
  – i.e. words processed per second
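The definition of processing speed above can be sketched in a few lines of Python. This is only an illustration: the function name and the (word count, seconds) pairs are hypothetical and are not data from the study.

```python
from statistics import median


def processing_speed(source_words: int, seconds: float) -> float:
    """Source words processed per second for a single segment."""
    if seconds <= 0:
        raise ValueError("processing time must be positive")
    return source_words / seconds


# Hypothetical (source word count, processing time in seconds) pairs,
# one per segment. These are NOT measurements from the study.
segments = [(12, 30.0), (8, 16.0), (20, 80.0)]

speeds = [processing_speed(words, secs) for words, secs in segments]
median_speed = median(speeds)
print(f"median processing speed: {median_speed:.3f} words per second")
```

The median rather than the mean is used here because the slides report medians throughout.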

Page 11: The Link between Controlled Language and Post-Editing:

Median Processing Speed

• S(nti) vs. S(min-nti)

Segment Type     Median Processing Speed
S(nti)           .350
S(min-nti)       .435

Page 12: The Link between Controlled Language and Post-Editing:

Temporal Effort: Conclusions

• The post-editing task was completed faster than the translation task.
  – First-pass exercise/No QA

• The median processing speeds for S(min-nti) segments were significantly higher than those for S(nti) segments

• So, from a temporal point of view, it seems that the introduction of CL benefits turnaround times

Page 13: The Link between Controlled Language and Post-Editing:

Technical Effort

• Measured using Translog:
  – Keyboarding
    • Deletions, insertions, cuts, pastes
  – Dictionary Look-Up Activity
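A rough sketch of how per-segment keyboarding counts could be tallied and summarised by their medians. The event labels and the sample logs are hypothetical; the actual Translog log format is not shown in these slides, so this does not parse real Translog output.

```python
from collections import Counter
from statistics import median

# Hypothetical event labels standing in for logged keyboard activity.
KEYBOARD_EVENTS = ("deletion", "insertion", "cut", "paste")


def count_events(events):
    """Count the four keyboarding event types in one segment's log."""
    counts = Counter(e for e in events if e in KEYBOARD_EVENTS)
    return {name: counts[name] for name in KEYBOARD_EVENTS}


def median_per_type(per_segment_counts):
    """Median count of each event type across segments."""
    return {
        name: median(c[name] for c in per_segment_counts)
        for name in KEYBOARD_EVENTS
    }


# Invented logs for three segments (not study data). Events outside
# KEYBOARD_EVENTS, such as "lookup", are ignored by count_events.
segment_logs = [
    ["insertion", "insertion", "deletion", "lookup"],
    ["deletion", "deletion", "deletion", "insertion"],
    ["insertion", "paste"],
]

per_segment = [count_events(log) for log in segment_logs]
medians = median_per_type(per_segment)
print(medians)
```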

Page 14: The Link between Controlled Language and Post-Editing:

Translog

Page 15: The Link between Controlled Language and Post-Editing:

Sample Linear Representation File

Page 16: The Link between Controlled Language and Post-Editing:

Keyboarding Median Measurements

Segment Type   Median Deletions   Median Insertions   Median Cuts   Median Pastes
S(nti)         4.00               4.00                0.00          0.00
S(min-nti)     3.00               3.00                0.00          0.00

Page 17: The Link between Controlled Language and Post-Editing:

Keyboarding Median Measurements

• Small difference between the two segment types, but statistically significant for insertions/deletions

• Cutting and pasting: very limited even though post-editors recycled whole chunks of text

Page 18: The Link between Controlled Language and Post-Editing:

Use of the Translog Dictionary

• Training and practice prior to task

• All users reported being comfortable with the feature

Page 19: The Link between Controlled Language and Post-Editing:

Data on Dictionary Usage

Subject   Successful Dictionary Look-Up   Unsuccessful Dictionary Look-Up
P1        0                               0
P2        0                               1
P3        0                               5
P4        0                               0
P5        0                               1
P6        1                               0
P7        0                               1
P8        0                               5
P9        0                               1

Page 20: The Link between Controlled Language and Post-Editing:

Possible Explanations?

• Subjects not as familiar with feature as they reported

• Subjects felt it was unnecessary to use dictionary

• Subjects used to having terms suggested on-screen with TM/Terminology tool

• Subjects lost faith in the feature when they encountered problems

Page 21: The Link between Controlled Language and Post-Editing:

Conclusions on Technical Effort

• S(min-nti) segments require significantly fewer deletions and insertions than S(nti) segments.

• Cutting and pasting was a very rare activity for both segment types.

• Dictionary searches were uncommon during this study. When they were carried out, the search facility was frequently used incorrectly.

Page 22: The Link between Controlled Language and Post-Editing:

Technical/Temporal Combined

• Results on technical post-editing effort add to the evidence presented above on temporal post-editing effort and further support the claim that eliminating NTIs from a segment can reduce post-editing effort.

Page 23: The Link between Controlled Language and Post-Editing:

Cognitive Effort

• Potential Methodologies
  – TAP (rejected)
  – Pause Analysis
  – Choice Network Analysis
  – Eye tracking (unavailable at the time)

Page 24: The Link between Controlled Language and Post-Editing:

Pause Behaviour

• No discernible correlations between pause behaviour and post-editing activity
  – Pause analysis rejected

Page 25: The Link between Controlled Language and Post-Editing:

Cognitive Effort

• Choice Network Analysis

Page 26: The Link between Controlled Language and Post-Editing:

Choice Network Analysis

• …Choice Network Analysis compares the renditions of a single string of translation by multiple translators in order to propose a network of choices that theoretically represents the cognitive model available to any translator for translating that string. The technique is favoured over the think-aloud method, which is acknowledged as not being able to access automaticized processes.

(Campbell, 2000: 215)

Page 27: The Link between Controlled Language and Post-Editing:

Example – Sentence with NTIs

• ST:
  – “Save the document(s).”

• Raw MT output:
  – „Sichern Sie das Dokument(s).“

• NTIs for this sentence:
  – Short segment
  – Use of “(s)” for plural

Page 28: The Link between Controlled Language and Post-Editing:

MT Sichern Sie das Dokument (s.)

P1 Sichern Sie das Dokument/die Dokumente.

P2 Sichern Sie das bzw. die Dokumente.

P3 Sichern Sie das/die Dokument/e.

P4 Sichern Sie das/die Dokument (e).

P6 Sichern Sie das Dokument/die Dokumente

P7 Sichern Sie das Dokument.

P8 Speichern Sie das/dieDokument(e).

P9 Sichern Sie das Dokument.
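As a rough illustration of the comparison behind Choice Network Analysis, the renditions listed above can be grouped to see how many distinct choices the post-editors made for this one string. This is a simplified sketch and not Campbell's full method: the normalisation step (dropping a trailing full stop, collapsing whitespace) is an assumption made here, and the MT string is taken as given on the preceding slide.

```python
from collections import Counter

# Raw MT output as quoted on the preceding slide.
mt_output = "Sichern Sie das Dokument(s)."

# Post-edited renditions as listed above (no entry is listed for P5).
renditions = {
    "P1": "Sichern Sie das Dokument/die Dokumente.",
    "P2": "Sichern Sie das bzw. die Dokumente.",
    "P3": "Sichern Sie das/die Dokument/e.",
    "P4": "Sichern Sie das/die Dokument (e).",
    "P6": "Sichern Sie das Dokument/die Dokumente",
    "P7": "Sichern Sie das Dokument.",
    "P8": "Speichern Sie das/dieDokument(e).",
    "P9": "Sichern Sie das Dokument.",
}


def normalise(text: str) -> str:
    """Assumed normalisation: drop a trailing full stop, collapse spaces."""
    return " ".join(text.rstrip(".").split())


choice_counts = Counter(normalise(r) for r in renditions.values())
distinct_choices = len(choice_counts)
editors_who_changed = sum(
    1 for r in renditions.values() if normalise(r) != normalise(mt_output)
)

for choice, n in choice_counts.most_common():
    print(n, choice)
```

A larger number of distinct renditions for one string is read here as a sign that the post-editors faced more choices, which is the intuition the slides use to link NTIs to cognitive effort.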

Page 29: The Link between Controlled Language and Post-Editing:

Example – Sentence with minimal NTIs

• ST:
  – “The editor contains a menu and a toolbar.”

• Raw MT output:
  – „Der Editor enthält ein Menü und eine Symbolleiste.“

Page 30: The Link between Controlled Language and Post-Editing:

MT Der Editor enthält ein Menü und eine Symbolleiste.

P1 Der Editor enthält ein Menü und eine Symbolleiste.

P2 Der Editor enthält ein Menü und eine Symbolleiste.

P3 Der Editor enthält ein Menü und eine Symbolleiste.

P4 Der Editor enthält ein Menü und eine Symbolleiste.

P5 Der Editor enthält ein Menü und eine Symbolleiste.

P6 Der Editor enthält ein Menü und eine Symbolleiste.

P7 Der Editor enthält ein Menü und eine Symbolleiste.

P8 Der Editor enthält ein Menü und eine Symbolleiste.

P9 Der Editor enthält ein Menü und eine Symbolleiste.

Page 31: The Link between Controlled Language and Post-Editing:

NTIs and Cognitive Effort

• Using CNA as a guide, NTIs categorised into:
  • High impact on post-editing effort
    – 50% or more of the occurrences of the NTI resulted in post-editing by two or more post-editors
  • Moderate impact on post-editing effort
    – Between 31% and 49% of occurrences
  • Low impact on post-editing effort
    – 30% or fewer of occurrences
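The three impact bands can be written as a small classification function. This is a sketch under one stated assumption: the slide leaves the ranges between 30% and 31% and between 49% and 50% undefined, so any share strictly between 30% and 50% is treated as moderate here. The function and argument names are hypothetical.

```python
def classify_impact(edited_occurrences: int, total_occurrences: int) -> str:
    """Classify an NTI by the share of its occurrences that were
    post-edited by two or more post-editors.

    Assumption: shares strictly between 30% and 50% count as moderate.
    """
    if total_occurrences <= 0:
        raise ValueError("total_occurrences must be positive")
    if not 0 <= edited_occurrences <= total_occurrences:
        raise ValueError("edited_occurrences must be between 0 and total")
    share = 100 * edited_occurrences / total_occurrences
    if share >= 50:
        return "high"
    if share > 30:
        return "moderate"
    return "low"


# Invented counts for illustration only (not figures from the study).
for edited, total in [(5, 10), (4, 10), (3, 10)]:
    print(edited, total, classify_impact(edited, total))
```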

Page 32: The Link between Controlled Language and Post-Editing:

Correlating Measurements

• By combining data on temporal, technical and cognitive effort: High Impact NTIs
  – Use of the gerund
  – Proper nouns
  – Problematic punctuation
  – Ungrammatical constructs
  – Use of (s) for plural
  – Non-finite verbs
  – Incomplete syntactic unit
  – Long NP
  – Short segment

Page 33: The Link between Controlled Language and Post-Editing:

Correlating Measurements

• Moderate impact NTIs:
  – Multiple coordinators
  – Passive voice
  – Personal pronouns
  – Use of a slash as a separator
  – Ambiguous scope in coordination
  – Parentheses

Page 34: The Link between Controlled Language and Post-Editing:

Correlating Measurements

• Low impact NTIs:
  – Abbreviations
  – Demonstrative pronouns
  – Missing “in order to”
  – Contractions

Page 35: The Link between Controlled Language and Post-Editing:

Conclusion

• Within the limited scope of this research, we now have empirical evidence to support the assertion that controlling the input to MT leads to lower post-editing effort.

• The elimination of some NTIs can have a higher impact than the elimination of others
  – Is it worth having a relatively high number of CL rules?

• Even if we remove known NTIs, MT engines are still likely to produce some errors and post-editors are still likely to post-edit.

Page 36: The Link between Controlled Language and Post-Editing:

Questions?