Anabela Barreiro [email protected] FLUP & CLUP-Linguateca New York University New Tools and Resources to Support Machine Translation Mestrado em Tradução Jurídica e Empresarial Anabela Barreiro Lisboa, 8 January 2008

New Tools and Resources to Support Machine Translation


Page 1: New Tools and Resources to Support Machine Translation

Anabela Barreiro [email protected]

FLUP & CLUP-Linguateca, New York University

New Tools and Resources to Support Machine Translation

Mestrado em Tradução Jurídica e Empresarial, Anabela Barreiro, Lisboa, 8 January 2008

Page 2: New Tools and Resources to Support Machine Translation

Outline


Page 3: New Tools and Resources to Support Machine Translation

Human Translation vs Machine Translation

A distinction in objective and purpose must be drawn between human translation and machine translation!

• They use different methods

• They apply to different types of texts

• They serve different purposes

• They face different barriers

• They are NOT in competition!


Page 4: New Tools and Resources to Support Machine Translation

Human Translation

Professional translation requires:

• a profound knowledge of the source language and native proficiency in the target language

• above-average writing skills

• an insightful knowledge of the social-cultural aspects of the source and target languages

• knowledge of the grammar of the two languages, their writing conventions, and the situational and cultural context

• in the case of scientific and technical translation, subject matter knowledge, including the terminology of the field or knowledge domain


Page 5: New Tools and Resources to Support Machine Translation

Human Translation

Theory of translation has been dealing with controversial issues:

• problems related to privileging meaning over form

• visibility or invisibility of the translator

• being faithful to the author or trying to make the text accessible to the reader (and which kind of reader)

• giving value to the source language culture (foreignise) or making the text suitable for the target language culture (domesticate)

• allowing languages/cultures with more impact to predominate over languages/cultures with less impact, or being creative, etc.


Page 6: New Tools and Resources to Support Machine Translation

Human Translation

The most relevant aspect in translation is to define the purpose of each translation, which is related to the characteristics of each text.

… And to define paraphrasing capabilities.


Page 7: New Tools and Resources to Support Machine Translation

Human Translation: Types of Texts

A certain subjectivity and distance from the source language text are allowed in the translation of literary texts, for the sake of maintaining the artistic and aesthetic aspects of the target language text [Hermans, 1985] [Landers, 2001].

Literary translation may be considered an ART [Leighton, 1990] [Weaver, 2002], where the translator has more freedom of expression.


Page 8: New Tools and Resources to Support Machine Translation

Human Translation: Types of Texts

Technical, commercial, and legal translators, like the authors of the original texts, are more restrained in their use of language, and they need to be precise and convey the exact meaning of the original text.

Technical texts are not meant to be beautiful but rather to be informative, instructive and explanatory. Their main function is to be clear, so the easier they are to read, the better they are understood.

Technical translation may be regarded as a CRAFT [Newmark, 1988] [Biguenet & Schulte, 1989], for which both technical and linguistic competence are essential, but creativity and vagueness are prohibited.


Page 9: New Tools and Resources to Support Machine Translation

Machine Translation

With more translation being performed by machines, new challenges are imposed on the field, theoretical traditions are shaken, and the need to rethink the status of translation becomes more evident. Of all automated applications, machine translation compels us to reconsider the nature of translation.

ART and CRAFT are NOT appropriate concepts for machine translation, because it necessarily relies on linguistics and computer science.


Page 10: New Tools and Resources to Support Machine Translation

Machine Translation

1- Automated translation of text or speech from one natural language into another

2- An important tool that assists human translators

3- It has become available to the general public in the last few years due to:

• sophisticated computers

• continuous development of computer software capabilities

• internet boom


Page 11: New Tools and Resources to Support Machine Translation

Machine Translation (cont.)


Page 12: New Tools and Resources to Support Machine Translation

Machine Translation Bottlenecks

1. Complexity of language

2. Ambiguity of language

3. Wordiness (related to text quality)


Page 13: New Tools and Resources to Support Machine Translation

Machine Translation: Limitations

• Delivering high-quality machine translation of certain types of texts and complex linguistic phenomena is difficult

• It is difficult to grasp humour, sarcasm, and other human feelings expressed by means of sophisticated linguistic expression

• Extra-sentential, extra-textual and extra-linguistic information (problems of culture or context) is difficult to handle, because knowledge of the world cannot be assumed

• Anaphora resolution is difficult to deal with


Page 14: New Tools and Resources to Support Machine Translation

Machine Translation Linguistic Challenges

1. Homography

2. Cross-language phenomena (lexical divergences, idioms, and cross-language syntactic transformations, such as passives)

3. Identification of named entities

4. Capacity to deal with long sentences and wordiness

5. Unusual alterations to the order of words in the target language

6. Enhanced dictionaries and grammars to recognize and translate multiword expressions


Page 15: New Tools and Resources to Support Machine Translation

Machine Translation Linguistic Challenges: Examples

• Handling of ellipsis: advanced ambiguity problems related to anaphora

O João visitou muitos países do mundo. A Maria não visitou nenhum.

=> João has visited many countries in the world. Maria hasn’t visited any.


Page 16: New Tools and Resources to Support Machine Translation

Machine Translation Linguistic Challenges: Examples

• Common-noun nuance resolution / homography

(1) ele não quis tomar partido de ninguém

(2) ele é um bom partido

(3) ele tirou partido da situação

(4) ele pertence a esse partido (político)

(5) o copo está partido

(6) já esteve em melhor partido


Page 17: New Tools and Resources to Support Machine Translation

Machine Translation Linguistic Challenges: Examples

Source sentence [CdP]: Francisco Vieira adianta ainda que está a fazer um esforço no sentido de tomar uma decisão ainda esta semana, definindo se avança ou não para uma candidatura à RTLRS.

Translation Engine: Translation Results

FreeTranslation: Francisco Scallop advances even if is it do an effort in the sense of take a decision still this week, defined advances or not for a candidacy to the RTLRS.

WorldLingo: advances despite he is to make an effort in the direction to still take a decision this week, defining if he advances or he does not stop a candidacy to the RTLRS.

Source sentence [Compara]: I can't make a decision about anything these days.

Translation Engine: Translation Results

Google: Eu não posso fazer a uma decisão sobre qualquer coisa estes dias.

Amikai: que eu não posso fazer para uma decisão sobre qualquer coisa estes dias.

FreeTranslation: Eu não posso tomar uma decisão sobre algo estes dias.

Babelfish: Eu não posso fazer a uma decisão sobre qualquer coisa estes dias.

WorldLingo: Eu no posso fazer a uma deciso sobre qualquer coisa estes dias.

E-Translation Server: Não posso tomar uma decisão sobre qualquer coisa estes dias.


Page 18: New Tools and Resources to Support Machine Translation

Multiword Expressions: Support Verb Constructions

A support verb construction (= predicate noun construction) is a multiword expression containing a verb with weak semantic value and a noun that is the predicate of the sentence.

Predicate nouns can be:

• morphologically related to a verb

fazer uma apresentação de = apresentar

pay a visit to = to visit

• autonomous

fazer um mestrado - *mestrar

have fun - *to fun


Page 19: New Tools and Resources to Support Machine Translation

Main Objectives

1. Build a body of lexical, syntactic and semantic knowledge around support verb constructions

2. Apply this linguistic knowledge to paraphrasing

3. Improve machine translation


Page 20: New Tools and Resources to Support Machine Translation

Outcome: Resources

Port4NooJ

• an open-source, ontology-driven Portuguese linguistic system, which integrates a bilingual extension for Portuguese-English machine translation

DicTUM

• Dicionário de Termos e Unidades Multipalavra

• a Dictionary of Multiword Expressions


Page 21: New Tools and Resources to Support Machine Translation

Outcome: Tools

ReWriter

• a monolingual paraphraser to pre-edit texts, using paraphrasing capabilities

• Portuguese version: ReEscreve

ParaMT

• a bilingual/multilingual paraphraser to be integrated in machine translation systems


Page 22: New Tools and Resources to Support Machine Translation

Resources

Port4NooJ - Publicly available at:

http://www.nooj4nlp.net

http://www.linguateca.pt/Repositorio/Port4Nooj/

Based on:

•NooJ linguistic environment (http://www.nooj4nlp.net/)

•OpenLogos English-Portuguese dictionary (http://logos-os.dfki.de/)

OpenLogos is an open-source derivative of the Logos Machine Translation System

Data Used

•COMPARA (http://www.linguateca.pt/COMPARA)

•METRA (http://www.linguateca.pt/metra)

•Other corpora


Page 23: New Tools and Resources to Support Machine Translation

Port4NooJ Dictionaries

Entry fields, in order: Lemma, Part of Speech, Inflectional Paradigm (FLX=), Syntactic-Semantic Attributes, English Transfer (EN=)

General dictionary sample representing all PoS, variable and invariable forms

mesa,N+FLX=CASA+CO+surf+EN=table
cair,V+FLX=ATRAIR+INMO+IntoType+EN=fall
holandês,A+FLX=INGLÊS+AN+lang+EN=Dutch
actualmente,ADV+FLX=FACILMENTE+TEMP+punc+pres+EN=nowadays
alguém,PRO+IMPERS+INDEF+EN=somebody
porque,RELINT+why+EN=why
e,CONJ+JOIN+EN=and
durante,PREP+TEMP+EN=during
cada,DET+IMPERS+INDEF+SG+EN=each
terceiro+NUM+ord+EN=one third

Sample of invariable compounds in the general dictionary

a curto prazo,ADV+TEMP+EN=in the short run
a favor de,PREP+CAUS+EN=in favor of
cada um,PRO+INDEF+SG+EN=each one
de quem,INT+ThatType+EN=whose
quem quer que seja,REL+WhateverType+EN=whoever
além disso,CONJ+COOR+EN=besides
um quarto,NUM+frac+EN=one fourth

Sample of the dictionary of Terms and Multiword Expressions (DicTUM)

adro da igreja,N+FLX=MENINO+PL+encl+EN=churchyard
cabo de vassoura,N+FLX=MENINO+COtool+EN=broomstick
bebida alcoólica,N+FLX=CASA+MA+liqu+EN=alcoholic drink+UNAMB
bebida alcoólica,N+FLX=CASA+MA+liqu+EN=booze+slang
cor de laranja,A+NAV+Apred+EN=orange
sul-americano,A+FLX=ALTO+AN+des+EN=South American
a curto prazo,ADV+LocTime+TEMP+EN=in the short run
fora de serviço,ADV+STAT+phr+EN=out of order
há muito tempo,ADV+LocTime+TEMP+puncpast+EN=a long time ago
isto é,CONJ+COOR+EN=i.e.
já não,CONJ+COOR+EN=no longer
mesmo assim,CONJ+SUB+EN=even so
juntamente com,PREP+ASSOC+EN=along with
à direita de,PREP+Loc+AT+EN=at the right of
em conformidade com,PREP+ALOG+EN=in congruence with

Sample of the dictionary of Biomedical Terms

HIV,N+FLX=PORTUGAL+AB+state+IMMUN+EN=HIV
doença maníaco-depressiva,N+FLX=CASA+AB+state+MH+EN=manic-depressive disorder
doença bipolar,N+FLX=CASA+AB+state+MH+EN=bipolar disorder
asma,N+FLX=CASA+AB+state+PULM+EN=asthma

Sample of the dictionary of Proper Names

Amesterdão,N+PL+city+EN=Amsterdam
Estados Unidos da América,N+PL+coun+EN=United States of America
África,N+PL+cont+EN=Africa
Extremo Oriente,N+PL+othprop+EN=Far East
Mediterrâneo,N+FLX=ANO+PL+water+EN=Mediterranean
Alpes Peninos,N+FLX=ALPES+PL+othprop+EN=Pennine Alps
ONU,N+AN+org+EN=UN
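The entry layout shown on this slide (lemma, then a comma, then `+`-separated properties) can be read mechanically. The following is a minimal sketch, not part of Port4NooJ or NooJ: the names `Entry` and `parse_entry` are hypothetical, and it assumes that the first property is always the part of speech and that the lemma itself contains no comma.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    lemma: str
    pos: str
    features: dict = field(default_factory=dict)  # KEY=value properties, e.g. FLX, EN
    flags: list = field(default_factory=list)     # bare attributes, e.g. AB, state


def parse_entry(line: str) -> Entry:
    """Split one dictionary line into lemma, part of speech and properties."""
    # Assumption: everything before the first comma is the lemma.
    lemma, _, info = line.partition(",")
    parts = info.split("+")
    entry = Entry(lemma=lemma.strip(), pos=parts[0].strip())
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:  # KEY=value property
            entry.features[key.strip()] = value.strip()
        else:    # bare syntactic-semantic attribute
            entry.flags.append(part.strip())
    return entry


entry = parse_entry("doença bipolar,N+FLX=CASA+AB+state+MH+EN=bipolar disorder")
print(entry.lemma, entry.pos, entry.features["EN"], entry.flags)
```

Keeping `KEY=value` properties apart from bare attributes mirrors the distinction the slide draws between the inflectional paradigm and English transfer on one side and the syntactic-semantic attributes on the other.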


Page 24: New Tools and Resources to Support Machine Translation

Port4NooJ Dictionaries

Sample of terms classified as Information + Instructional/legal


Page 25: New Tools and Resources to Support Machine Translation

Syntactic-Semantic Ontology  

Representation abstract language

Hierarchical taxonomy (sets, supersets and (sometimes) subsets)

Based on Logos SAL ontology

Integrated in the dictionary

It represents both meaning (semantics), and structure (syntax)

Over 1,000 categories


Page 26: New Tools and Resources to Support Machine Translation

Syntactic-Semantic Ontology  

Noun Supersets: concrete, mass, animate, place, information, abstract, process (intr), process (tr), measure, time, aspective

Sets and Subsets of the CONCRETE Noun Superset

• functionals: receptacles, bearing surfaces, links/bridges, thresholds/focal points/barriers, conduits, fasteners, devices/tools, cloth things, structural elements, concretizations of verbals, concretizations of mass nouns, undifferentiated functionals, product/brand names

• agentives: software, vehicles, meters, machines/systems, communication agents, concrete chemical agents, undifferentiated agentives

• natural things: minute flora, plants, trees, trees/wood, miscellaneous natural things

• other concrete sets*: impulses/lights, blemishes/marks, edibles (non-mass), edibles/color, classifiers, amorphous, atomistic, undifferentiated concrete things

*With one exception, these sets have no subsets


Page 27: New Tools and Resources to Support Machine Translation

Syntactic-Semantic Ontology

Categories of CONCRETE nouns

Category | Mnemonic | Examples in English | Examples in Portuguese
agentives | CO+undagt | See subsets | See subsets
software | CO+soft | routine | rotina, ficheiro
concrete chemical agents | CO+chem | catalyst, warhead | ácido sulfúrico
machines/systems | CO+mach | battery, camera | máquina fotográfica
vehicles | CO+vehic | truck, ship | automóvel
meters | CO+meter | clock, gauge | manómetro
communication agents | CO+comm | radio, radar | rádio
functionals | CO+undfunc | trinket, ornament | ornamento
devices/tools | CO+tool | pliers | alicate
fasteners | CO+fast | nail, tendon | prego
bearing surfaces | CO+surf | table, shelf | mesa
receptacles | CO+recp | bottle, barrel | garrafa
conduits | CO+cond | chute, artery | artéria
thresholds/focal points/barriers | CO+barr | wall, door | porta
links/bridges | CO+link | circuit, nerve | circuito
cloth things | CO+cloth | shirt, blanket | camisola
structural elements | CO+struc | spar, bone | osso
concretizations of verbals | CO+verb | threading |
concretizations of mass nouns | CO+mass | acid lining |
product/brand names | CO+brand | Windows NT | Windows NT
natural things | CO+nat | See subsets | See subsets
minute flora | CO+flora | algae, spore | alga
plants | CO+plant | rose, weed | erva
trees | CO+tree | apple, willow | macieira
trees/wood | CO+trwd | oak, maple | carvalho
misc. natural things | CO+mnat | pebble, iceberg | iceberg
edibles (non-mass) | CO+ednm | pork chop | costoleta
edibles/color | CO+edcol | orange, cherry | laranja
impulses/lights | CO+light | lamp, beam | lâmpada
blemishes/marks | CO+blem | scratch, freckle | sarda
classifiers | CO+class | element | elemento
amorphous | CO+amor | breeze, tide | brisa
atomistic | CO+atom | electron, atom | átomo
undifferentiated | CO+obj | trifle, curio |


Page 28: New Tools and Resources to Support Machine Translation

ME - MEASURE Noun Sets and Subsets

Sets and Subsets Mnemonics (= SynSem) Examples

abstract concepts measured by unit ME+abs humidity, length

discrete measurable concepts ME+dis sum, increment

units of measure ME+unit See subsets

units of weight ME+unit+wt ounce, pound

units of velocity ME+unit+vel mph, megahertz

units of volume measure ME+unit+vol gallon, liter

units of temperature ME+unit+temp degrees celsius

units of energy/force ME+unit+ener watt, horsepower

measurement systems ME+unit+sys fahrenheit, kelvin

units of duration ME+unit+dur hour, minute, year

specialized units of measure ME+unit+spec oersted, ohm, phon

units of money/value ME+unit+value dollar, euro, forint

units of linear/area measure ME+unit+lin inch, yard, mile

general undifferentiated measure ME+undif degree, gross, share

Syntactic-Semantic Ontology  

Categories of MEASURE nouns
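Because the mnemonics spell out the path from superset to set to subset (for example ME, then ME+unit, then ME+unit+wt), the hierarchy can be recovered from the mnemonic string alone. The sketch below illustrates this with a few rows copied from the MEASURE table; the names `MEASURE_SETS`, `lineage` and `is_a` are hypothetical helpers and do not come from the Logos SAL ontology or from NooJ.

```python
# A few rows copied from the MEASURE table; the dictionary is only an illustration.
MEASURE_SETS = {
    "ME": "MEASURE noun superset",
    "ME+abs": "abstract concepts measured by unit",
    "ME+dis": "discrete measurable concepts",
    "ME+unit": "units of measure",
    "ME+unit+wt": "units of weight",
    "ME+unit+dur": "units of duration",
    "ME+undif": "general undifferentiated measure",
}


def lineage(mnemonic: str) -> list:
    """Return the mnemonic preceded by its ancestors, most general first."""
    parts = mnemonic.split("+")
    return ["+".join(parts[:i]) for i in range(1, len(parts) + 1)]


def is_a(mnemonic: str, ancestor: str) -> bool:
    """True if `ancestor` is the category itself or one of its supersets."""
    return ancestor in lineage(mnemonic)


for level in lineage("ME+unit+wt"):
    print(level, "=", MEASURE_SETS.get(level, "(not in this sample)"))
```

A check such as `is_a("ME+unit+wt", "ME+unit")` is the kind of test a grammar rule could use to accept any unit of measure without listing each subset.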


Page 29: New Tools and Resources to Support Machine Translation

  Inflectional and Derivational Description

Noun Inflectional Paradigm

Adjective Inflectional Paradigm

Pronoun Inflectional Paradigm

Verb Inflectional Paradigm

Adverb Inflectional Paradigm

Determiner Inflectional Paradigm

Interrogative Pronoun Inflectional Paradigm

Nominalization Derivational Paradigm


Page 30: New Tools and Resources to Support Machine Translation

Paraphrasing and Translation Grammars

Translation and bilingual paraphrasing of simple sentences

Graph to translate simple sentences


Page 31: New Tools and Resources to Support Machine Translation

Verb entries:

• Identification of derivational paradigms for nominalizations (annotation NDRV) and predicate adjectives (annotation ADRV)

• Link to the derived noun’s support verbs and to the adjective’s copula verbs (annotation VSUP and annotation VCOP)

adaptar,V+FLX=FALAR+Aux=1+INOP57+Subset=132+EN=adapt+VSUP=fazer+DRV=NDRV00:CANÇÃO

azedar,V+FLX=LIMPAR+Aux=1+OBJTRundif98+Subset=740+EN=sour+VCOP=estar+DRV=ADRV00:ALTO

Explicit Marking of Derivation and Support Verb


Page 32: New Tools and Resources to Support Machine Translation

Adjective entries:

• Identification of derivational paradigms for adverbializations (annotation AVDRV)

literal,A+FLX=PRINCIPAL+IN+symb+EN=literal+DRV=AVDRV00:LITERALMENTE

Autonomous predicate nouns:

• Identification of autonomous predicate nouns (annotation Npred)

• Identification of a semantically related verb

curso,N+FLX=ANO+Npred+IN+inst+EN=course+VSUP=tirar+VRB=estudar+NPrep=de+Det=um

Explicit Marking of Derivation and Semantic Verb Association
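The annotations on the `curso` entry suggest how a support verb construction and its related verb could be assembled from the dictionary. The sketch below is an illustration only: it assumes that the construction is formed as VSUP + Det + lemma + NPrep and that VRB names the equivalent verb, it works on infinitive forms only, and the helper names `entry_features`, `svc_pair` and `svc_to_verb` are hypothetical. The actual system relies on NooJ grammars, which also handle inflection.

```python
CURSO = ("curso,N+FLX=ANO+Npred+IN+inst+EN=course"
         "+VSUP=tirar+VRB=estudar+NPrep=de+Det=um")


def entry_features(entry_line: str):
    """Return the lemma and the KEY=value properties of a dictionary line."""
    lemma, _, info = entry_line.partition(",")
    features = {}
    for part in info.split("+")[1:]:  # skip the part of speech
        key, sep, value = part.partition("=")
        if sep:
            features[key] = value
    return lemma, features


def svc_pair(entry_line: str):
    """Build (support verb construction, related verb) from one noun entry."""
    lemma, f = entry_features(entry_line)
    # Assumed word order: support verb, determiner, predicate noun, preposition.
    svc = " ".join([f["VSUP"], f["Det"], lemma, f["NPrep"]])
    return svc, f["VRB"]


def svc_to_verb(text: str, entry_line: str) -> str:
    """Replace the uninflected construction with its related verb."""
    svc, verb = svc_pair(entry_line)
    return text.replace(svc, verb)


print(svc_pair(CURSO))
print(svc_to_verb("tirar um curso de medicina", CURSO))
```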


Page 33: New Tools and Resources to Support Machine Translation

ReWriter: a Monolingual Standalone Paraphraser

Recognition and monolingual paraphrasing of support verb constructions

(support verb construction / morphologically related lexical verb)


Page 34: New Tools and Resources to Support Machine Translation

ReWriter: Examples

Recognition and paraphrasing of elementary support verb constructions co-occurring with predicate nouns of the biomedical field

(support verb construction / lexical verb or stylistic variant / non-elementary support verb construction)

Elementary SVC > Lexical Verb

Elementary SVC > non-elementary SVC: realizar/efectuar

Elementary SVC > sujeitar-se a / submeter-se a ONLY if the SUBJECT is a patient


Page 35: New Tools and Resources to Support Machine Translation

ReWriter: Application - Interface

Interactive ReWriter for word processing applications such as text editing


Page 36: New Tools and Resources to Support Machine Translation

ReWriter: Application - Interface

Interactive ReWriter for word processing applications such as text editing


Page 37: New Tools and Resources to Support Machine Translation

ReWriter: Application - Interface


Page 38: New Tools and Resources to Support Machine Translation

ReWriter: Application - Interface


Page 39: New Tools and Resources to Support Machine Translation

ReWriter: Application - Interface


Page 40: New Tools and Resources to Support Machine Translation

ReWriter: Extensibility

1. Applications to General Language

2. Applications to Technical Language


Page 41: New Tools and Resources to Support Machine Translation

ReWriter: Extensibility - Examples

[Paraphrasing adverbials]

à volta da órbita ≡ periorbital (popular versus technical)

around the orbit of the eye ≡ periorbital

[Paraphrasing relative clauses - into adjectival past participles]

N0 que têm sido escritos ≡ N0 que foram descritos ≡ N0 escritos

N0 that have been written ≡ N0 that were described ≡ N0 written

[Paraphrasing if clauses]

se for necessário ≡ se necessário

if it is necessary ≡ if necessary
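Fixed equivalences such as the if-clause and adverbial examples on this slide can be pictured as a small table of rewrite rules. The sketch below is only a toy illustration using plain regular expressions; ReWriter itself is built on NooJ dictionaries and grammars, and the names `RULES` and `paraphrase` are hypothetical.

```python
import re

# (pattern, replacement) pairs copied from the examples on this slide.
RULES = [
    (r"\bse for necessário\b", "se necessário"),
    (r"\bif it is necessary\b", "if necessary"),
    (r"\bà volta da órbita\b", "periorbital"),
]


def paraphrase(text: str) -> str:
    """Apply every rewrite rule in order and return the rewritten text."""
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text


print(paraphrase("Contacte-nos se for necessário."))
print(paraphrase("Call us if it is necessary."))
```

Literal patterns of this kind cover only invariable expressions; constructions with variable parts, such as the relative clauses with N0 above, need grammar rules rather than string matching.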

Page 42: New Tools and Resources to Support Machine Translation

ReWriter: Extensibility - Examples

[Paraphrasing coordinated noun phrases - conjoining or disjoining]

recursos linguísticos para o ensino e para a investigação Ŧ ?linguistic resources for teaching and for research

≡ recursos linguísticos para o ensino e a investigação Ŧ linguistic resources for teaching and research

[Paraphrasing subjunctive clauses - into infinitives]

pedimos o favor que confirme a sua participação Ŧ *we ask the favor that you confirm your attendance

≡ pedimos o favor de confirmar a sua participação Ŧ *we ask the favor of confirming your attendance


Page 43: New Tools and Resources to Support Machine Translation

ReWriter: Extensibility - Examples

[Paraphrasing marked-up constructions]

se a necessidade do utilizador é criar um texto em linguagem controlada Ŧ ?if the end-user need is to create controlled language text

≡ se o utilizador necessita de criar um texto em linguagem controlada Ŧ if the end-user needs to create controlled language text

[Paraphrasing of vague and undefined or null subject sentences] (whenever the real subject/actor is known)

[-] houve um grito na rua ≡ [N-PRON]/alguém gritou na rua Ŧ there was shouting in the street ≡ [N-PRON]/someone shouted in the street


Page 44: New Tools and Resources to Support Machine Translation

ReWriter: Extensibility - Examples

[Paraphrasing passives - whenever suitable]

Esse livro foi escrito por Saramago em 2008 ≡ Saramago escreveu esse livro em 2008

That book was written by Saramago in 2008 ≡ Saramago wrote that book in 2008

Florida foi atingida por um tornado ≡ Um tornado atingiu a Florida

Florida was hit by a tornado ≡ A tornado hit Florida

O carro foi roubado ≡ Alguém roubou o carro

The car was stolen ≡ Someone stole the car


Page 45: New Tools and Resources to Support Machine Translation

ParaMT: a Bilingual/Multilingual Paraphraser for MT

Recognition and bilingual paraphrasing of support verb constructions (Portuguese support verb construction / corresponding English verb)


Page 46: New Tools and Resources to Support Machine Translation

Preliminary Quantitative Results  

Support verb | SVC Recognition Precision | SVC Recognition Recall | SVC Paraphrasing Precision
Pôr | 73/73 - 100% | 73/100 - 73% | 72/73 - 98.6%
Tomar | 75/75 - 100% | 75/100 - 75% | 68/73 - 93.1%
Ter | 65/65 - 100% | 65/100 - 65% | 59/65 - 90.7%
Dar | 57/60 - 95% | 57/100 - 57% | 46/51 - 90.1%
Fazer | 43/45 - 95.5% | 43/100 - 43% | 40/45 - 88.8%
Average | 62.6/63.6 - 98.4% | 62.6/100 - 62.6% | 57/61 - 93.4%

Evaluation of recognition and paraphrasing of support verb constructions: 500 sentences, 100 for each elementary support verb
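The recognition figures above follow from the usual definitions: precision is the number of correctly recognized constructions divided by the number recognized, and recall is the number correctly recognized divided by the number that should have been found. The sketch below recomputes them from the counts reported on this slide; it assumes that the denominator of 100 in the recall column is the number of support verb constructions per verb, and the names `COUNTS`, `precision` and `recall` are illustrative.

```python
# verb: (correctly recognized, recognized, constructions in the sample)
COUNTS = {
    "pôr": (73, 73, 100),
    "tomar": (75, 75, 100),
    "ter": (65, 65, 100),
    "dar": (57, 60, 100),
    "fazer": (43, 45, 100),
}


def precision(correct: int, retrieved: int) -> float:
    return correct / retrieved


def recall(correct: int, relevant: int) -> float:
    return correct / relevant


for verb, (correct, retrieved, relevant) in COUNTS.items():
    print(f"{verb}: precision {precision(correct, retrieved):.1%}, "
          f"recall {recall(correct, relevant):.1%}")

# Pooled totals over the five verbs (313 correct out of 318 recognized, 500 expected).
total_correct = sum(c for c, _, _ in COUNTS.values())
total_retrieved = sum(r for _, r, _ in COUNTS.values())
total_relevant = sum(n for _, _, n in COUNTS.values())
print(f"overall: precision {precision(total_correct, total_retrieved):.1%}, "
      f"recall {recall(total_correct, total_relevant):.1%}")
```

Dividing the pooled totals by five gives the 62.6/63.6 and 62.6/100 averages shown in the table.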


Page 47: New Tools and Resources to Support Machine Translation

Conclusions

Linguistic knowledge applied to a machine translation system improves its output quality.

Effective results from linguistically based research on paraphrases can save substantial effort and resources employed by machine translation systems.


Page 48: New Tools and Resources to Support Machine Translation

Thank you for your attention!

Acknowledgements

This work was partly supported by grant SFRH/BD/14076/2003 from Fundação para a Ciência e a Tecnologia, co-financed by POSI and partly by Fundação para a Computação Científica Nacional.
