Two sides of the same coin: assessing translation quality through adequacy and acceptability error analysis

Joke Daems [email protected]

www.lt3.ugent.be/en/projects/robot

Supervised by: Lieve Macken, Sonia Vandepitte, Robert Hartsuiker


What makes error analysis so complicated?

“There are some errors for all types of distinctions, but the most problematic distinctions were for adequacy/fluency and seriousness.”
- Stymne & Ahrenberg, 2012

Does a problem concern adequacy, fluency, both, neither?

How do we determine the seriousness of an error?

Two types of quality

“Whereas adherence to source norms determines a translation's adequacy as compared to the source text, subscription to norms originating in the target culture determines its acceptability.”
- Toury, 1995

Why mix?

2-step TQA approach

Acceptability = target norms

Adequacy = target vs. source

Quality Assessment

Subcategories

• Acceptability: Grammar & Syntax, Lexicon, Spelling & typos, Style & register, Coherence

• Adequacy: Contradiction, Deletion, Addition, Word sense, Meaning shift

Acceptability: fine-grained

• Grammar & Syntax: article, comparative/superlative, singular/plural, verb form, article-noun agreement, noun-adj agreement, subject-verb agreement, reference, missing, superfluous, word order, structure, grammar - other

• Lexicon: wrong preposition, wrong collocation, word nonexistent

• Spelling & Typos: capitalization, spelling mistake, compound, punctuation, typo

• Style & Register: register, untranslated, repetition, disfluent, short sentences, long sentence, text type, style - other

• Coherence: conjunction, missing info, logical problem, paragraph, inconsistency, coherence - other

Adequacy: fine-grained

Meaning shift

• contradiction
• word sense disambiguation
• hyponymy
• hyperonymy
• terminology
• quantity
• time
• meaning shift caused by punctuation
• meaning shift caused by misplaced word
• deletion
• addition
• explicitation
• coherence
• inconsistent terminology
• other

How serious is an error?

“Different thresholds exist for major, minor and critical errors. These should be flexible, depending on the content type, end-user profile and perishability of the content.”
- TAUS, error typology guidelines, 2013

Give different weights to error categories depending on text type & translation brief
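
A minimal sketch of how flexible error weights could be applied, assuming annotations are stored as a flat list of category labels. The weight values, the `WEIGHTS` table and the function name `weighted_error_score` are hypothetical and only illustrate the idea; they are not taken from the slides.

```python
from collections import Counter

# Hypothetical weight profiles per text type. The numbers are illustrative
# assumptions, not weights used in the study.
WEIGHTS = {
    "newspaper": {"spelling mistake": 1.0, "terminology": 1.0, "register": 2.0},
    "technical": {"spelling mistake": 1.0, "terminology": 3.0, "register": 0.5},
}
DEFAULT_WEIGHT = 1.0  # assumed weight for categories without an explicit entry


def weighted_error_score(annotations, text_type):
    """Sum the annotated errors, each multiplied by its text-type weight."""
    weights = WEIGHTS[text_type]
    counts = Counter(annotations)
    return sum(weights.get(cat, DEFAULT_WEIGHT) * n for cat, n in counts.items())


annotations = ["terminology", "terminology", "spelling mistake", "word order"]
print(weighted_error_score(annotations, "newspaper"))  # 4.0
print(weighted_error_score(annotations, "technical"))  # 8.0
```

The same set of annotations yields a different score per text type, which is the point of making weights depend on the translation brief.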

Reducing subjectivity

• Flexible error weights

• More than one annotator

• Consolidation phase

TQA: Annotation (brat)

1) Acceptability

2) Adequacy

Application example: comparative analysis

[Four bar charts; horizontal axis in percent. Bar values are not recoverable; the categories shown in each chart are listed below.]

• Top HT problems, newspaper articles: wrong collocation, word sense, deletion, punctuation, other meaning shift

• Top PE problems, newspaper articles: punctuation, other meaning shift, compound, typo, word sense, wrong collocation

• Top HT problems, technical texts: wrong collocation, untranslated, other meaning shift, compound, logical problem, terminology

• Top PE problems, technical texts: other meaning shift, untranslated, article, logical problem, terminology, compound

Next step: diagnostic & comparative evaluation

• What makes a ST-passage problematic?
• How problematic is this passage really? (i.e.: how many translators make errors)
• Which PE errors are caused by MT?
• Which MT errors are hardest to solve?

Link all errors to corresponding ST-passage
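
As a rough sketch of what linking errors to ST-passages makes possible, the snippet below counts how many distinct translators made at least one error in each passage. The tuple layout `(passage_id, translator, category)` and the passage identifiers are assumptions made for illustration, not the project's actual data format.

```python
from collections import defaultdict


def translators_with_errors(errors):
    """Map each ST-passage id to the number of distinct translators who erred there."""
    by_passage = defaultdict(set)
    for passage_id, translator, _category in errors:
        by_passage[passage_id].add(translator)
    return {passage: len(names) for passage, names in by_passage.items()}


# Hypothetical annotations: (ST-passage id, translator, error category)
errors = [
    ("s1", "PE1", "other meaning shift"),
    ("s1", "PE2", "wrong collocation"),
    ("s1", "PE2", "spelling mistake"),
    ("s2", "PE1", "typo"),
]
print(translators_with_errors(errors))  # {'s1': 2, 's2': 1}
```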

Source text-related error sets

• ST: Changes in the environment that are sweeping the planet...

• MT: Veranderingen in de omgeving die het vegen van de planeet tot stand brengen... (wrong word sense) "Changes in the environment that bring about the brushing of the planet..."

• PE1: Veranderingen in de omgeving die het evenwicht op de planeet verstoren... (other type of meaning shift) "Changes in the environment that disturb the balance on the planet..."

• PE2: Veranderingen in de omgeving die over de planeet rasen... (wrong collocation + spelling mistake) "Changes in the environment that raige over the planet..."

Application example: impact of MT errors on PE

[Two bar charts, vertical axis 0 to 30: "Top 10 MT errors newspaper articles" and "Top 10 MT errors technical texts". Recoverable category labels: compound, terminology, article, logical problem, other meaning shift, deletion, structure, verb form, missing constituent, word order.]

Summary

• Improve error analysis by:

– judging acceptability and adequacy separately

– making error weights depend on translation brief

– having more than one annotator

– introducing consolidation phase

• Improve diagnostic and comparative evaluation by:

– linking errors to ST-passages

– taking number of translators into account

Open questions

• How can we reduce annotation time?

– Ways of automating (part of) the process?

– Limit annotation to subset of errors?

• How to better implement ST-related error sets?

– Ways of automatically aligning ST, MT, and various TTs at word-level?

Thank you for listening

For more information, contact: [email protected]

Suggestions? Questions?

Quantification of ST-related error sets

ST
    MT (1)
        MT1 (0.5): wrong word sense (0.5)
        MT2 (0.5)
    PE (1)
        PE1 (0.5): other meaning shift (0.5)
        PE2 (0.5): wrong collocation (0.25), spelling mistake (0.25)
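
The numbers in this tree are consistent with one simple reading, sketched below: each condition (MT, PE) carries a total weight of 1, that weight is split equally over the translations in the condition, and each translation's share is split equally over its errors. This reading is an assumption inferred from the figures shown; the function name `error_weights` is hypothetical.

```python
def error_weights(translations, condition_weight=1.0):
    """Split a condition's weight over translations, then over each one's errors."""
    share = condition_weight / len(translations)
    weights = {}
    for name, errors in translations.items():
        for error in errors:
            # A translation without errors contributes no entries.
            weights[(name, error)] = share / len(errors)
    return weights


mt = {"MT1": ["wrong word sense"], "MT2": []}
pe = {"PE1": ["other meaning shift"],
      "PE2": ["wrong collocation", "spelling mistake"]}

print(error_weights(mt))
print(error_weights(pe))
```

Under this assumed scheme, an error made by one of two post-editors weighs 0.5 when it is that post-editor's only error, and 0.25 when it shares the passage with a second error.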

Inter-annotator agreement

Task                 Exp.  Initial agreement  Agreement after consolidation  Correlation between annotators  Agreement on categories
HT&PE acceptability  Exp1  39% (κ=0.32)       67% (κ=0.65)                   r=0.67, n=38, p<0.001           90% (κ=0.89)
HT&PE acceptability  Exp2  50% (κ=0.44)       81% (κ=0.80)                   r=0.95, n=34, p<0.001           89% (κ=0.88)
HT&PE adequacy       Exp1  42% (κ=0.31)       82% (κ=0.79)                   r=0.87, n=38, p<0.001           89% (κ=0.87)
HT&PE adequacy       Exp2  46% (κ=0.30)       94% (κ=0.92)                   r=0.86, n=34, p<0.001           88% (κ=0.83)
MT acceptability     Exp1  53% (κ=0.49)       84% (κ=0.83)                   n/a                             83% (κ=0.81)
MT acceptability     Exp2  79% (κ=0.77)       95% (κ=0.94)                   n/a                             93% (κ=0.93)
MT adequacy          Exp1  57% (κ=0.46)       94% (κ=0.92)                   n/a                             86% (κ=0.79)
MT adequacy          Exp2  51% (κ=0.41)       86% (κ=0.83)                   n/a                             86% (κ=0.82)
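
For readers unfamiliar with κ, the following is a minimal sketch of Cohen's kappa for two annotators who label the same items. The slides do not state which kappa variant or which unit of comparison was used, so this is an illustration under that assumption; the example labels are invented.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labelled independently
    # according to their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented example: two annotators, four annotated items.
annotator_1 = ["typo", "typo", "article", "article"]
annotator_2 = ["typo", "article", "article", "article"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

Here the raw agreement is 75%, but half of that would be expected by chance given the label distributions, which is why κ values in the table are consistently lower than the percentages next to them.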