Technology as Translation Strategy

American Translators Association
Scholarly Monograph Series
Volume II 1988

Edited by Muriel Vasconcellos

State University of New York at Binghamton (SUNY)
Copyright © State University of New York at Binghamton (SUNY) 1988
ISSN 0890-4111
Printed in the United States of America
Factors in the Evaluation of MT: Formal vs. Functional Approaches

MURIEL VASCONCELLOS
Early evaluations of MT focused largely on analysis of the output and on cost-effectiveness of the throughput. More recently there has been awareness that a system's value depends on the use to which it will be put. The focus has shifted to its suitability for the application in question. Consideration must be given to a broad range of factors in the system's environment before a judgment can be made about its overall effectiveness. These factors are emphasized differently by developers, managers, translators, and users of the product. The author, a translator of many years' experience, has guided the development and implementation of MT at the Pan American Health Organization since 1977. She also worked on the original MT project at Georgetown University.
The question of whether or not an MT system does an effective job tends to be viewed, depending on the party who asks it, from different perspectives. The developer is concerned with how well the syntactic structures and semantic representations have been handled. Dictionary errors, recognized instantly for what they are, will be irrelevant to him. The manager, on the other hand, wants to know if MT can save him some money. He will calculate the "break-even" point, i.e., the volume of translation that needs to be processed before there begins to be a return on the investment. The translator, in turn, will focus on how much work is left for him to do. The independent translator will give the system good marks to the extent that it saves him time and effort and therefore enhances his productivity.
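The manager's break-even reasoning reduces to simple arithmetic. The sketch below is purely illustrative: all figures (the fixed system cost and the per-word rates) are invented for the example, not data from this chapter.

```python
# Illustrative break-even estimate for an MT installation.
# All figures are hypothetical; they are not drawn from the article.

def break_even_words(fixed_cost, human_cost_per_word, mt_cost_per_word):
    """Volume of translation (in words) at which cumulative savings
    from MT offset the fixed cost of acquiring and installing it."""
    savings_per_word = human_cost_per_word - mt_cost_per_word
    if savings_per_word <= 0:
        raise ValueError("MT must cost less per word than human translation")
    return fixed_cost / savings_per_word

# Example: a $50,000 system, $0.12/word human translation,
# $0.07/word MT including postediting.
volume = break_even_words(50_000, 0.12, 0.07)
print(f"Break-even at {volume:,.0f} words")  # Break-even at 1,000,000 words
```

Below that volume the investment has not yet paid for itself; above it, every word translated adds to the return.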
The in-house translator, on the other hand, looks not only at productivity but also at the implications for his personal role within the working environment: how he will be able to adapt to the new procedures, and the impact that MT might have (but rarely does) on the security of his job. And finally, the end-user will want to have a text that he can use. The consumer of information, for example, wants mainly to know that the facts can be discerned from the text without distortion. He may also be interested in quick turnaround, since some information has value only in relation to its timeliness. The message "We attack at dawn" is stale and useless the following afternoon. The publisher of product manuals, on the other hand, is prepared to factor some degree of editing into his equation, and he will look at how easily the system fits into his production chain, especially if the same text is being translated into several languages.
Some evaluations of MT have combined more than one of these perspectives. This idea in itself makes sense, because it is important to take into account the interests of all who are directly concerned. But evaluators should be careful not to generalize about MT based only on the interests of one group of principals involved, be they developers or managers or translators or even the end-users. Since in the final analysis the success of MT depends on the cooperation of everyone, no one's concerns can afford to be subsumed.
Formal evaluations of MT deal with parameters that are (or appear to be) susceptible to measurement. Specific, isolated aspects are under scrutiny, usually only one or two variables. Most commonly they focus on the machine output. Although there is no way that these studies can give a total picture of how MT works in a given environment, still they can be meaningful when undertaken in perspective. For example, studies of output quality have informative value for those end-users who will be dealing with the raw product directly, and they should not be discounted for such purposes when combined with other criteria. Formal results alone, however, do not provide an adequate basis on which to draw conclusions about the effectiveness of MT. Far more useful is a functional approach, in which account is taken of all the tasks and other factors involved in meeting the purpose that the system is intended to serve in the particular environment. It is necessarily a complex undertaking.
Whatever the approach, the desire to come up with a "yes" or "no" answer, and hence to reconcile the different perspectives, should not be allowed to obscure the diversity of interests at stake. In actual practice, it is to be expected that most MT evaluations will include elements from both the formal and the functional approaches.
Formal Approaches
ALPAC
The most famous evaluation of MT was the one that led to the "ALPAC Report," published in 1966 by the U.S. National Academy of Sciences. The Report's findings were based on a study initiated in April 1964 by an ad hoc Automatic Language Processing Advisory Committee (ALPAC), which had been appointed at the request of the National Science Foundation to advise the Department of Defense, the Central Intelligence Agency, and the Foundation itself on research and development in the field of machine translation.

As almost everyone knows who knows anything about MT, the results of this study were to cast a shadow on MT development in the United States that persists until the present day. The effect of the Report was to cut off almost all U.S. Government funding of MT development. MT got a bad name from which it is only now beginning to recover.
The tremendous impact of this study is totally out of proportion to the quality of the data that went into it. Many of the Report's conclusions were based on an evaluation of MT output solely from the point of view of one who would have to rely on the raw, unpostedited product (the "consumer of information" mentioned above). The machine's output was subjected to a direct comparison with human translation, and the (somewhat irrelevant) conclusion was reached that fully automatic high-quality machine translation (FAHQMT) was impossible.

Today, now that we know more about what happens when people actually use MT, it is interesting to look back and examine the methodology behind this study. The approach was basically formal.
The ALPAC Committee focused on the raw machine output and undertook to evaluate the product in terms of "the two major characteristics of a translation": (1) intelligibility, and (2) fidelity to the sense of the original text (67).
A testing methodology was developed by ALPAC in order to accomplish this aim. In it, sentences were selected randomly from six different Russian-into-English translations of the same text, three of them human translations and three of them done by machine.¹ These sentences, which came to a total of 144, were "interspersed in random order among other sentences from the same translation and also among sentences selected at random from other translations of varying quality" (67-68). In addition to the sentences being scrambled in relation to their order of occurrence in the original text, steps were taken to ensure that "no rater evaluated more than one translation of a given sentence" (71).
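The two constraints just described (sentences presented out of document order, and no rater ever seeing two translations of the same source sentence) can be sketched as a simple assignment procedure. The sketch below is a hypothetical illustration of the constraint only; it is not a reconstruction of ALPAC's actual materials or protocol.

```python
# Sketch of an ALPAC-style presentation constraint: sentences are
# shuffled out of document order, and each rater is assigned at most
# one translation of any given source sentence. Data are invented.
import random

def assign_to_raters(translations, n_raters, seed=0):
    """translations: dict mapping translation_id -> list of sentences,
    where index i in each list renders the same source sentence i.
    Requires n_raters >= number of translations.
    Returns rater -> shuffled list of (translation_id, sentence)."""
    rng = random.Random(seed)
    ids = list(translations)
    n_sentences = len(next(iter(translations.values())))
    buckets = {r: [] for r in range(n_raters)}
    for i in range(n_sentences):
        # Rotate which rater receives which version of sentence i, so
        # no rater ever sees two translations of the same sentence.
        offset = rng.randrange(n_raters)
        for k, tid in enumerate(ids):
            rater = (offset + k) % n_raters
            buckets[rater].append((tid, translations[tid][i]))
    for items in buckets.values():
        rng.shuffle(items)  # destroy the original sentence order
    return buckets
```

With six translations and eighteen raters, as in the ALPAC study, the rotation guarantees the six versions of each sentence land with six different raters.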
Using a nine-point scale of "intelligibility" and a similar one of "informativeness," subjective ratings of these sentences were obtained from two sets of raters: 18 native speakers of English, Harvard undergraduates, who had no knowledge of the language of the original text ("monolinguals"), and 18 native speakers of English with a high degree of competence in the comprehension of scientific Russian ("bilinguals"), who were able to make direct comparisons between the translated texts and the original input. To assess informativeness (for determining fidelity), the monolinguals were provided with "carefully translated" sentences and asked to measure their informativeness vis-à-vis the sentences being tested, while the bilinguals were asked to make a judgment based on direct comparison with the original.
It quickly becomes evident that the methodology used by ALPAC was flawed. To begin with, the study design flies in the face of what we know about discourse today. Only isolated sentences were judged. The sentences from a given text were intentionally presented in random order, effectively breaking up any cohesive ties or overall coherence that might have contributed to interpretation of the intended meaning. The scrambled order not only deprived the respondents of the original natural context but may also have created false contexts that were misleading.
Other flaws in the methodology had to do with the way in which differences between respondents were neutralized and with the size and characteristics of the experimental population. In addition, the use of so many levels in the two rating scales (nine each) might be questioned. The subjects may have had difficulty being consistent when such fine-grained discriminations were required. Quite possibly the time charged to the "processing of information" may have included time spent in mulling over the possible choices at the nine different levels.² Furthermore, the criteria in the scale were vague, judgmental, and semantically loaded. They included such terms as:
From the scale of intelligibility: "reads like ordinary text," "masquerades as an intelligible sentence," "stylistic infelicities," "noise," "grotesque sentence arrangement," "bizarre," "nonsensical," "hopelessly unintelligible," etc.

From the scale of fidelity: "makes all the difference in the world," "puts the reader on the right track," "gives a slightly different twist," etc.
Finally, the procedure of having a monolingual rater measure the fidelity of the test translation by comparing it with a "carefully prepared" translation is fraught with uncontrolled variables, not the least of them being the fidelity of the "carefully prepared" translation itself.

Statistical results can only be meaningful to the extent that the original premises and the study design are solid. It is not unreasonable to conclude that the ALPAC results were tainted by mistaken premises, lack of knowledge about discourse and translation, and faulty procedures. And ALPAC focused its evaluation on the quality of machine output alone, which is only a small part of the total picture.
Summary of Formal Approaches
As a reaction to ALPAC, other approaches began to be sought for the evaluation of MT. A conference on the subject was held at the University of Texas in 1971,³ and in 1978 experts from around the world were convened for an international workshop in Luxembourg.⁴ By that time Systran was running at the U.S. Air Force from Russian into English, and other Systran language pairs had been installed at the Commission of the European Communities in Luxembourg. The meeting reviewed all the major approaches that had been applied to the evaluation of MT up to then.
Probably the earliest experiment was that of Miller and Beebe-Center. Sinaiko's survey mentioned a study of the Air Force's Mark II system done in the early 1960s, as well as further testing of Mark II output later in the decade (Orr and Small). There was also a formal study of output done by Pfafflin in 1965. Other studies reviewed at the meeting included the evaluation of Systran by Van Slype, subsequently published, in which both the quality of the output and the cost of throughput were assessed. In the Van Slype study, quality was determined by asking professional revisers to make their corrections and then weighting the changes according to their importance as perceived by the investigator.
Such error analysis of MT output was then, and continues to be, a fairly frequent exercise, and understandably so, since it is the first possibility that comes to mind when one is initially confronted with translations produced by machine. But direct assessments of quality will always vary greatly between raters, as will the relative weights assigned.
The more important point about direct error analysis is that strangers to a system may not be able to distinguish between a problem created by a single dictionary error, fixable in a twinkling, and a major deficiency in the conceptualization of the system. Also, some errors, such as those associated with anaphora resolution (e.g., choices between 'its', 'his', 'her', 'their', 'your' as a translation for Spanish su) may look silly at first sight but are easy to correct in the postedit and prohibitively expensive to deal with at the level of the algorithm. In other words, the developer knows that they are there but has deliberately assigned them low priority. Thus the important thing about MT errors is not their manifestation in the output but rather how easily they can be fixed, both on an ad hoc basis in the postedit and on a long-term basis in the dictionary or the algorithm.
The rating technique used by ALPAC was a notable improvement over the direct analysis of errors. There was no attempt to identify or assign values to specific words or constructions; the sentences were judged in their entirety as messages. In addition, the use of many subjects gave statistical validity to the results. The problems with ALPAC stem not from the rating concept as such but rather from features of the study design that were introduced later in the process.
There are other means of measuring the quality of MT output that are also more objective than error analysis. In the comprehension test the respondents' understanding of a text is ascertained through a series of questions that they answer after reading it through. The performance test, which might be applied to the translation of manuals, for example, requires that the subject act out a set of instructions as understood from the machine output.
Still another method, which has been suggested by Brislin, is back-translation. The respondent, an experienced translator, manually back-translates the text to the original language (which presumably is his native tongue). It has been suggested, also, that machine translation be used in both directions. The difficulty in either case, and particularly the latter, is that the second translation must be assumed to be of the highest quality, which creates a vicious circle.
Of all the approaches, the most promising appears to be the Cloze procedure: in a sample of text, every nth word (typically every fifth word) is deleted, and the reader-subject is asked to fill in the blanks. This method has been used with MT by Sinaiko in a series of carefully constructed experiments (Sinaiko 4-5; Sinaiko and Klare).
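The Cloze procedure just described is mechanical enough to sketch directly. The example below is a minimal illustration of the general technique (delete every nth word, then score the reader's restorations); it does not reproduce Sinaiko's materials or scoring rules.

```python
# A minimal sketch of the Cloze procedure: every nth word of a sample
# text is blanked, and the reader's restorations are scored against
# the deleted words. Purely illustrative.

def make_cloze(text, n=5):
    """Delete every nth word; return the gapped text and the answer key."""
    words = text.split()
    gapped, key = [], []
    for i, w in enumerate(words, start=1):
        if i % n == 0:
            gapped.append("_____")
            key.append(w)
        else:
            gapped.append(w)
    return " ".join(gapped), key

def cloze_score(responses, key):
    """Fraction of blanks filled with the exact deleted word."""
    hits = sum(r == k for r, k in zip(responses, key))
    return hits / len(key) if key else 0.0

text = "the quick brown fox jumps over the lazy dog again"
gapped, key = make_cloze(text, n=5)
print(gapped)  # the quick brown fox _____ over the lazy dog _____
print(key)     # ['jumps', 'again']
```

The more intelligible the (machine-translated) text, the more of its deleted words a reader can restore, which is what makes the procedure attractive as an indirect quality measure.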
In addition to quality of the output, there are other parameters that can be treated formally. A paper by Hofstetter proposed, as the basic measurement, the time required to bring the machine-translated text up to the desired level of quality. Sager also proposed speed of revision as a parameter, combining it with a larger set of criteria that included intelligibility and acceptability of the output. And, of course, the Van Slype studies and others have measured the total cost of throughput.
Functional Approaches

End-User Evaluation
None of the foregoing methods addresses the question of whether MT is serving its intended purpose. The first investigator to do so was Henisz-Dostert. She takes as the theme for her work a statement made by Bar-Hillel in 1971:

    Every program for machine translation should be immediately tested as to its effects on the human user. He is the first and final judge, and it is he who will have to tell whether he is ready to trade quality for speed, and to what degree (76).
Starting from this premise, she conducted a survey among users of unedited machine translations from Russian to English (the Georgetown Machine Translation System, installed at the Euratom Research Center in Ispra, Italy, and at the Atomic Energy Commission in Oak Ridge, Tennessee). Responses to a questionnaire were received from 58 scientists and engineers who had used raw output from the system during the period 1963-1973.
The major finding was that 90% of the respondents judged the quality of the translations to be "good" and "acceptable" for their purposes; moreover, 93% found them to be informative, 81% felt that they were complete, and 59% said that they were readable. Getting used to reading MT style did not present a problem. (Still, MT output was considered to take nearly 100% longer to read than natural text, and human translations were also reported to take about one-third more time to read.) Only 19% found that the machine translations could give misinformation, while 80% had not had this experience.
Despite delays in throughput (due mainly to problems of keyboarding the input text), 96% of the respondents had either already recommended MT to their colleagues or would not hesitate to do so; moreover, 87% of them actually preferred MT to human translation.
The respondents considered that semantic factors (vocabulary, linguistic context) were more important for understanding the MT output than syntax, but they felt that the most important factor in understanding the texts was their own familiarity with the subject matter (extralinguistic context).

Here, then, is the proof of the pudding. The technical personnel who were end-users of raw MT output were apparently not very concerned about the type of error being produced by the machine as long as they could understand the text and glean the information that they were looking for.
The Broad Functional View
A fully functional evaluation of an MT system will take into account not only end-user satisfaction but also the concerns of management (i.e., return on the investment, productivity, service to the organization) and of the postediting translators (manipulability of the output, ease of building the dictionaries, etc.). Since a dynamic environment includes interaction with the developer, consideration should also be given to the needs, priorities, and constraints of the latter.
But more than anything else, what must be determined is whether or not the MT system has the capacity to grow and whether the human principals who will be interacting with it will be able to make effective contributions to that growth.
An MT system's potential hinges on four main factors: the dictionary structure, the daily working environment, the translators who use it, and the ongoing support provided by the developer.
The dictionaries should be large to begin with: 20,000 stem entries of general vocabulary at the very least, with the possibility of adding subject-specific and user-defined subdictionaries that override the default translations. If the system is to make lexical choices based on context, the dictionary entries must necessarily have syntactic and semantic codes that can be supplied by the user as well as the developer. The system should also offer several different ways of coding for contextual associations. Updating should be quick and mechanically easy to perform, and, without sacrificing the linguistic specificity just described, it should be easy to learn.
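The override behavior described above, with user-defined entries taking precedence over subject-specific ones, which in turn take precedence over the general defaults, amounts to a layered lookup. The sketch below is a minimal illustration; the sample entries and the use of Python's ChainMap are invented for the example, not taken from any actual MT system.

```python
# Sketch of subdictionary override precedence: user-defined beats
# subject-specific, which beats the general default dictionary.
# Entries are invented illustrations.
from collections import ChainMap

general   = {"banco": "bank", "planta": "plant"}
medical   = {"planta": "sole (of the foot)"}   # subject-specific layer
user_defs = {"banco": "bench"}                 # user-defined layer

# ChainMap searches its maps left to right, so earlier maps win.
dictionary = ChainMap(user_defs, medical, general)

print(dictionary["banco"])   # bench (user-defined override)
print(dictionary["planta"])  # sole (of the foot) (subject override)
```

Dropping a layer restores the defaults beneath it, which is exactly the behavior one wants when switching a text out of a specialized subject area.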
The working environment should include, at the front end, effective means of capturing the input text in machine-readable form. If manual keying is the sole source of input, the operation will be costly. In addition, the tools should be user-friendly. Submission of the texts for translation should be a simple operation, and the output should be available in word-processing form such that the translator can work with it readily on-screen.
From the standpoint of management, it makes sense that the MT system be part of a larger production chain. If the text is to be published, it should go from postediting to photocomposition with a minimum of steps in between.
The translators should be prepared to make a long-term commitment to building the system. In a typical installation, as opposed to the few that deliver raw MT directly to end-users, the translators make their contribution at three levels (see Ryan and Santangelo in this volume). First, through their everyday postediting they exercise the system, they sharpen their own skills, and they generate feedback both for updating the dictionaries and for improving the algorithm. Second, they participate directly in dictionary-building. It is only through their contribution, fertilized by daily contact with text, that the dictionaries can become tuned to the types of discourse being translated. And third, in some settings at least, translators interact directly with system developers, providing useful suggestions about recurring patterns.
The system will grow faster and better to the extent that translators have positive attitudes, innovative responses, and a can-do spirit; to the extent that they are willing to use the word processor for long periods; and to the extent that they postedit resourcefully and come up with creative solutions at every point in the production chain.
It is important to have ongoing support from the developer, or vendor, in the form of regular software upgrades and adjustments that may be required by the particular application.
The functional evaluation should be designed to determine the existence of the conditions just described. If these can be assured, then the system can be counted on to flourish in its new environment.
Conclusion
There is no single ideal model for the evaluation of MT. Always, the need to be met is the bottom line. It is crucially important to look at a system from the standpoint of all parties concerned, since MT is necessarily a common endeavor in which everyone contributes to the fullest.

But even though there is no single "right" way to go about an evaluation, it is safe to say that formal data should be eyed from their inherently limited perspective and that priority should always be given to the functional factors that shape the future of the system.
NOTES
1. The provenance of the three machine translations, at that time in the history of MT, is not documented. In its only reference to the texts used for the study, the Report says:

    The measurement procedure was tested by applying it to six varied English translations, three human and three mechanical, of a Russian work entitled Mashina i Mysl' [Machine and Thought], by Z. Rovenskij, A. Uemov, and E. Uemova (Moscow 1960). These translations were of five passages varying considerably in type of content. (All the passages selected for this experiment, with the original Russian versions, have now been published by the Office of Technical Services, U.S. Department of Commerce, Technical Translation TT 65-60307.) The materials associated with one of these passages were used for pilot studies and rater practice sessions; the experiment proper used the remaining four passages.
2. In addition to collecting ratings for each of the sentences, ALPAC also asked the raters to clock, with stopwatches, the time they took to come up with their responses. Processing time, all other factors being equal, would seem to be a measurement of interest to consumers of information. It has the advantage of claiming no more than exactly what it is, i.e., the time required to grasp the sense of the text, which for intelligence-gathering purposes translates into budgetary considerations.
3. December 1971. Proceedings published by Lehmann and Stachowitz.
4. Workshop on Evaluation of Machine Translation Systems, Commission of the European Communities (Luxembourg, 28 February 1978).
REFERENCES
Automatic Language Processing Advisory Committee. Language and Machines: Computers in Translation and Linguistics; A Report. National Research Council Publication 1416. Washington, D.C.: National Academy of Sciences, Division of Behavioral Sciences, 1966.

Bar-Hillel, Yehoshua. "Some Reflections on the Present Outlook for High-Quality Machine Translation." Feasibility Study of Fully Automatic High-Quality Translation. Ed. Lehmann and Stachowitz. Austin: University of Texas at Austin Linguistic Research Center, 1971. Cited in Dostert 1979.

Brislin, Richard W. "Introduction." Translation: Applications and Research. New York: Gardner Press (John Wiley & Sons), 1976. 1-45.

Henisz-Dostert, Bożena. "Users' Evaluation of Machine Translation: Georgetown MT System, 1963-1973." Paper presented at the Workshop on Evaluation of Machine Translation Systems (Luxembourg, February 1978). Typescript. Luxembourg: Commission of the European Communities, 1978. Results subsequently published in extenso under the same title as Part III of Machine Translation. The Hague, Paris, New York: Mouton, 1979. 147-244.

Hofstetter, A. "Methodological Concept for the Evaluation of Translation Quality." Paper presented at the Workshop on Evaluation of Machine Translation Systems (Luxembourg, February 1978). Typescript. Luxembourg: Commission of the European Communities, 1978.

Lehmann, Winfred P., and Rolf Stachowitz. Feasibility Study of Fully Automatic High-Quality Translation. Austin: University of Texas at Austin Linguistic Research Center, 1971. Report No. RADC-TR-71-295.

Miller, G.A., and J.G. Beebe-Center. Mechanical Translation 3 (1958): 73. Cited in ALPAC 1966.

Orr, D.B., and V.H. Small. Comprehensibility of Machine-Aided Translations of Russian Scientific Documents. Washington, D.C.: American Institutes of Research, 1966. Cited in Sinaiko 1978.

Pfafflin, S.M. Mechanical Translation 8 (1965): 2. Cited in ALPAC 1966.

Sinaiko, H.W. Some Thoughts about Evaluating Language Translation. Paper presented at the Workshop on Evaluation of Machine Translation Systems (Luxembourg, February 1978). Typescript. Luxembourg: Commission of the European Communities, 1978.

Sinaiko, H.W., and G.R. Klare. "Further Experiments in Language Translation: Readability of Computer Translations." ITL 15 (1972). Cited in Sinaiko 1978.

Sinaiko, H.W., and G.R. Klare. "Further Experiments in Language Translation: A Second Evaluation of the Readability of Computer Translations." ITL 19 (1973). Cited in Sinaiko 1978.

Van Slype, G. Critical Study of Methods for Evaluating the Quality of Machine Translation. Final Report. Brussels, Luxembourg: Commission of the European Communities, 1979. See also his summary of the Second Evaluation of Systran, published as: "Systran: Evaluation of the 1978 Version of the Systran English-French System of the Commission of the European Communities." Incorporated Linguist 18.3 (1980): 86-89.