31
Spoken Language Communications Research Laboratories Spoken Language Communication Group 2007 NICT/ATR -1- TMI 2007 Michael Paul, Andrew Finch, Eiichiro Sumita Reducing Human Assessment of Machine Translation Quality to Binary Classifiers Reducing Human Assessment of Machine Translation Quality to Binary Classifiers September 8, 2007 NICT Spoken Language Communication Group, ATR Spoken Language Communication Research Laboratories Kyoto, Japan

Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 1 -TMI 2007

Michael Paul, Andrew Finch, Eiichiro Sumita

Reducing Human Assessment of

Machine Translation Quality

to Binary Classifiers

Reducing Human Assessment of

Machine Translation Quality

to Binary Classifiers

September 8, 2007

NICT Spoken Language Communication Group,

ATR Spoken Language Communication Research Laboratories

Kyoto, Japan

Page 2: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 2 -TMI 2007

Assessment of

Machine Translation Quality

Assessment of

Machine Translation Quality

Sentence

• translation of a single input

Document

• set of sentence translations

Human

• average of sentence-level

grades

• discrete evaluation grade

• median score of multiple

human grades

metrics: fluency, adequacy, …

• confidence estimation

• machine learning approach to

predict human grades

classifiers: SVM, DT, …metrics: BLEU, METEOR, …

Machine

• comparison to (multiple)

reference translations

• assign single numerical score

Page 3: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 3 -TMI 2007

Assessment of

Machine Translation Quality

Assessment of

Machine Translation Quality

Document Sentence

Usage

• evaluation of MT system

development progress

•MT system comparison

(NIST, IWSLT, …)

• usability of given translation

in a real-world application

(post-editing, dialog trans-

lation, …)

Problems • complexity of evaluation task

(multi-class classification)

• granularity of evaluation

grades

• quality/coverage of

reference translations

• “meaning” of (numerical)

automatic evaluation scores

Page 4: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 4 -TMI 2007

Outline of TalkOutline of Talk

2. Experimental Results:

• large-scale human-annotated evaluation corpus

•••• coding matrix optimization

•••• classification accuracy

•••• correlation to human assessments

1. Prediction of Sentence-Level Translation Quality:

•••• decompose multi-class to binary classification° a coding matrix

•••• learn set of binary classifiers° feature selection, standard machine learning techniques

•••• predict multi-class label° compare binary classification results to coding matrix

Page 5: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 5 -TMI 2007

Prediction of Sentence-LevelTranslation Quality

Prediction of Sentence-LevelTranslation Quality

54321

Grade = fluency

Flawless EnglishGood EnglishNon-native EnglishDisfluent EnglishIncomprehensible

Human Assessment

Training Corpus

Test Set

Annotated

Evaluation Corpus

Evaluation(classification accuracy)

5, 4, 3, 2, 1

Binary

Classifier

+1, −−−−1

Machine Learning(SVM, DT, …)

ID | Grade | F1 | F2 | …

Feature Extraction

Multi-class

Classifier

ID | F1 | F2 | …

Feature Extraction

ID | Binary-Class

ID | Multi-Class

Page 6: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 6 -TMI 2007

Classification TaskClassification Task

Goal: predict human evaluation grade (fluency,

adequacy, …) for a given translation

→ multi-class label

Multi-Class Classification:

☺☺☺☺ direct prediction of multi-class label

���� classification accuracy is low

Binary-Class Classification:

☺☺☺☺ classification accuracy is high

���� multi-class label cannot be derived reliably

Page 7: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 7 -TMI 2007

Proposed SolutionProposed Solution

Reduction of Classification Complexity:

• decompose multi-class task into a set of binary

classification problems

• apply standard learning algorithm to train

binary classifiers

• combine results of binary classifiers

using a “coding matrix” to predict multi-class label

→→→→ increase in classification accuracy

→→→→ independent from learning algorithm

Page 8: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 8 -TMI 2007

Proposed SolutionProposed Solution

Feature Selection for Translation Quality Prediction:

• multiple automatic evaluation metric scores

° BLEU °WER ° GTM° NIST ° PER°METEOR ° TER

• metric-internal features

° ngram-prec ° length ratio °…

→→→→ takes into account different aspects of MT quality

→→→→ independent from target language and MT system

Page 9: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 9 -TMI 2007

Binary

Classifier

Binary

Classifier

+1, −−−−1

+1, −−−−1:

Combination

of Binary

Classification

Results

:

Multi-Class To Binary

Class Decomposition

Prediction of Sentence-LevelTranslation Quality

Prediction of Sentence-LevelTranslation Quality

54321

Grade = fluency

Flawless EnglishGood EnglishNon-native EnglishDisfluent EnglishIncomprehensible

Human Assessment

Training Corpus

Test Set

Annotated

Evaluation Corpus

ID | Grade | F1 | F2 | …

Feature Extraction

Machine Learning(SVM, DT, …)

Evaluation(classification accuracy)ID | F1 | F2 | …

Feature Extraction

ID | Multi-Class

Page 10: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 10 -TMI 2007

Proposed MethodProposed Method

1. Decomposition Phase:

• decompose multi-class into set of binary classification tasks:

° one-against-all (5, 4, 3, 2, 1):

Example:

5 : +1 → all training examples tagged with grade 5

−1 → all training examples tagged with grade 4 or 3 or 2 or 1)

° boundary (54_321 , 543_21):

Example:

54_321 : +1 → all training examples tagged with grade 5 or 4

−1 → all training examples tagged with grade 3 or 2 or 1

° all-pairs (5_4 , 5_3 , 5_2 , 5_1 , 4_3 , 4_2 , 4_1 , 3_2 , 3_1 , 2_1):

Example:

5_4 : +1 → all training examples tagged with grade 5

−1 → all training examples tagged with grade 4

Page 11: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 11 -TMI 2007

Proposed MethodProposed Method

2. Learning Phase:

• learn binary classifier for each decomposition task

° feature set selection/extraction

(exp): + 54 features (7 autom. eval. scores + metric-internal features)

° classifier training

(exp): + fluency/adequacy, DT classifier (+ SVM classifier)

• identify optimal subset of binary classifiers

• create coding matrix° column: class of pos./neg. training examples (for given binary classifier)

° row: correct binary classification result (for a given multi-class label)

3. Application Phase:

• apply all binary classifiers to given input → classification vector v

• match v against coding matrix rows to identify multi-class label

Page 12: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 12 -TMI 2007

Outline of Proposed MethodOutline of Proposed Method

Source Text

Evaluation Grade

MT system

Translated Text

Feature Extraction

BinaryClassifiers

Binary Class Vector

Compare

AnnotatedEvaluation Corpus

Coding Matrix

Optimization

Optimized

Coding Matrix

Class De-

composition

Machine

Learning

BinaryClassifiers

Coding

Matrix

ApplicationDecomposition/Learning

Page 13: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 13 -TMI 2007

Coding MatrixCoding Matrix

( )

{ }0,1,1,

,...,1;,...,1,

−+∈

===

ji

ljkiji

m

one-against-all all-pairs

( k=3 , l=3 )

−1−10c3+1−1−1c3

+1

0

c2•c3

0

+1

c1•c3

−1

−1

c3 • c1 c2

+1

−1

c2 • c1 c3 c1•c2c1 • c2 c3

c2

c1

−1

+1

c2

c1

−1

+1

Page 14: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 14 -TMI 2007

all-pairs

−1−10c3

+1

0

c2 • c3

0

+1

c1 • c3c1 • c2

c2

c1

−1

+1

Combination of Binary

Classifiers using a Coding Matrix

Combination of Binary

Classifiers using a Coding Matrix

−1

c2 • c3

+1

c1 • c3c1 • c2

v +1

input

Hamming

Distance(number of

positions

that differ)

= 3

Page 15: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 15 -TMI 2007

2

3

1

distance

−1

+1

0

c2 • c3

−10c3

c1

select

0

+1

c1 • c3c1 • c2

c2

c1

all-pairs

−1

+1

Combination of Binary

Classifiers using a Coding Matrix

Combination of Binary

Classifiers using a Coding Matrix

−1

c2 • c3

+1

c1 • c3c1 • c2

v +1

input

Page 16: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 16 -TMI 2007

Basic Travel Expression Corpus (BTEC):

• 36K English translations of 4K Japanese/Chinese inputs

• human assessments and automatic evaluation scores

Evaluation CorpusEvaluation Corpus

0%

10%

20%

30%

40%

5 4 3 2 1

fluency adequacy

7,590 (15 MT x 506) test

2,024 ( 4 MT x 506)develop

25,988training

fluency/ adequacysentence count

Page 17: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 17 -TMI 2007

40

50

60

70

80

90

100

mul ti

class

54_3

2154

3_21 5_

4

5_3

5_2

5_1

4_3

4_2

4_1

3_2

3_1

2_1 5 4 3 2 1

(%)Fluency

Coding Matrix Optimization(classification accuracy on DEV set)

Coding Matrix Optimization(classification accuracy on DEV set)

classification accuracy

Page 18: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 18 -TMI 2007

Coding Matrix Optimization(classification accuracy on DEV set)

Coding Matrix Optimization(classification accuracy on DEV set)

−1

−1

−1

+1

+1

54_

321

−1

−1

+1

+1

+1

543

_21

0

0

0

−1

+1

5_4

0

0

−1

0

+1

5_3

0

−1

0

0

+1

5_2

−1

0

0

0

+1

5_1

0

0

−1

+1

0

4_3

0

−1

0

+1

0

4_2

−1

0

0

+1

0

4_1

0

−1

+1

0

0

3_2

−1

0

+1

0

0

3_1

−1

+1

0

0

0

2_1

−1

−1

−1

+1

−1

4

−1

−1

+1

−1

−1

3

−1−1−14

−1−1−13

+1−1−11

−1

−1

1

+1

−1

25

2

5

−1

+1

Coding Matrix

Page 19: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 19 -TMI 2007

0

5

10

15

20

25

30

35

40

45

ALL

Fluency

Coding Matrix Optimization(omission of worst-performing classifier)

Coding Matrix Optimization(omission of worst-performing classifier)

classification accuracy

(%)

omitted binary classifier

Page 20: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 20 -TMI 2007

40

50

60

70

80

90

100

multi

c las

s

54_3

2154

3_21 5_

4

5_3

5_2

5_1

4_3

4_2

4_1

3_2

3_1

2_1 5 4 3 2 1

(%)Fluency

Coding Matrix Optimization(classification accuracy on DEV set)

Coding Matrix Optimization(classification accuracy on DEV set)

classification accuracy

Page 21: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 21 -TMI 2007

Coding Matrix Optimization(classification accuracy on DEV set)

Coding Matrix Optimization(classification accuracy on DEV set)

−1

−1

−1

+1

+1

54_

321

−1

−1

+1

+1

+1

543

_21

0

0

0

−1

+1

5_4

0

0

−1

−1

+1

5_3

0

−1

0

0

+1

5_2

−1

0

0

0

+1

5_1

0

0

−1

+1

0

4_3

0

−1

0

+1

0

4_2

−1

0

0

+1

0

4_1

0

−1

+1

0

0

3_2

0

−1

+1

0

0

3_1

−1

+1

0

0

0

2_1

−1

−1

−1

+1

−1

4

−1

−1

+1

−1

−1

3

−1−1−14

−1−1−13

+1−1−11

−1

−1

1

+1

−1

25

2

5

−1

+1

Coding Matrix

Page 22: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 22 -TMI 2007

0

5

10

15

20

25

30

35

40

45

ALL 4_

3

Fluency

Coding Matrix Optimization(omission of worst-performing classifier)

Coding Matrix Optimization(omission of worst-performing classifier)

classification accuracy

(%)

omitted binary classifier

Page 23: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 23 -TMI 2007

40

50

60

70

80

90

100

multi

c las

s

54_3

2154

3_21 5_

4

5_3

5_2

5_1

4_3

4_2

4_1

3_2

3_1

2_1 5 4 3 2 1

(%)Fluency

Coding Matrix Optimization(classification accuracy on DEV set)

Coding Matrix Optimization(classification accuracy on DEV set)

classification accuracy

Page 24: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 24 -TMI 2007

Coding Matrix Optimization(classification accuracy on DEV set)

Coding Matrix Optimization(classification accuracy on DEV set)

−1

−1

−1

+1

+1

54_

321

−1

−1

+1

+1

+1

543

_21

0

0

0

−1

+1

5_4

0

0

−1

−1

+1

5_3

0

−1

0

0

+1

5_2

−1

0

0

0

+1

5_1

0

0

−1

+1

0

4_3

0

−1

0

+1

0

4_2

−1

0

0

+1

0

4_1

0

−1

+1

0

0

3_2

0

−1

+1

0

0

3_1

−1

+1

0

0

0

2_1

−1

−1

−1

+1

−1

4

−1

−1

+1

−1

−1

3

−1−1−14

−1−1−13

+1−1−11

−1

−1

1

+1

−1

25

2

5

−1

+1

Coding Matrix

Page 25: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 25 -TMI 2007

0

10

20

30

40

50

60

ALL 4_

3

3_2

2_1

4_2

3_1

54_3

21

Fluency

Coding Matrix Optimization(omission of worst-performing classifier)

Coding Matrix Optimization(omission of worst-performing classifier)

classification accuracy

(%)

omitted binary classifier

Page 26: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 26 -TMI 2007

0

10

20

30

40

50

60

ALL 4_

3

3_2

2_1

4_2

3_1

54_3

2154

3_21 4_1 5 1

5_4

5_3

5_2

5_1 2 3

Fluency

Coding Matrix Optimization(omission of worst-performing classifier)

Coding Matrix Optimization(omission of worst-performing classifier)

classification accuracy

(%)

omitted binary classifier

Page 27: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 27 -TMI 2007

Coding Matrix Optimization(classification accuracy on DEV set)

Coding Matrix Optimization(classification accuracy on DEV set)

−1

−1

−1

+1

+1

54_

321

−1

−1

+1

+1

+1

543

_21

0

0

0

−1

+1

5_4

0

0

−1

−1

+1

5_3

0

−1

0

0

+1

5_2

−1

0

0

0

+1

5_1

0

0

−1

+1

0

4_3

0

−1

0

+1

0

4_2

−1

0

0

+1

0

4_1

0

−1

+1

0

0

3_2

0

−1

+1

0

0

3_1

−1

+1

0

0

0

2_1

−1

−1

−1

+1

−1

4

−1

−1

+1

−1

−1

3

−1−1−14

−1−1−13

+1−1−11

−1

−1

1

+1

−1

25

2

5

−1

+1

Optimized Coding Matrix

Page 28: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 28 -TMI 2007

202530354045505560657075

multi-class coding matrix

fluency adequacy

Evaluation(classification accuracy on TEST set)

Evaluation(classification accuracy on TEST set)

49.2 / 56.0 55.2 / 62.2

+6.0 / +6.6

(%)

classification accuracy

Page 29: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 29 -TMI 2007

Correlation to Human Assessment on Sentence-Level

Correlation to Human Assessment on Sentence-Level

Fluency

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

codi

ng m

atrix

mul

ticla

ssM

ETEO

R

WER TE

R

BLEU PE

R

GTM

NIST

Page 30: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 30 -TMI 2007

Correlation to Human Assessment on Sentence-Level

Correlation to Human Assessment on Sentence-Level

Adequacy

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

codi

ng m

atrix

MET

EOR

mul

ticla

ss

GTM WER PE

R

NIST

BLEU TE

R

Page 31: Spoken Language Communication Group Reducing Human ...away/TMI-07/Slides/Paul_slides.pdf · Spoken Language Communication Group TMI 2007 -2- 2007 NICT/ATR Assessment of Machine Translation

Spoken Language Communications

Research Laboratories

Spoken Language Communication Group

2007 NICT/ATR- 31 -TMI 2007

SummarySummary

Multiclass reduction to binary:

• robust and reliable method to predict human assessments

on sentence-level

• high correlation to human judges outperforming commonly used

automatic evaluation metrics

• outperforms standard classification methods→ gains: +6.0 (fluency) and +6.6 (adequacy) in classification accuracy

Extension of proposed method:

• apply learning method to select features used to build

the coding matrix

• investigate in the use of additional features that increase

binary classification accuracy and thus boost overall multi-class

prediction accuracy