USING THE INTERNET FOR SPECIALISED TRANSLATION


Translation Technology

“much translation work is carried out in a computer-assisted translation (CAT) environment, which may vary from a standard desktop equipped with word processing software and a browser to a full-blown translator workstation consisting of a multiplicity of tools specifically created for translators of technical texts and localizers.”

“Translation agencies organize their workflow around project management systems that distribute translation tasks, memories and terminologies to and around individual translators.”

(F. Zanettin 2014, “Corpora in Translation”)

Translation technologies

• Electronic dictionaries and terminological databases, the arrival of the Internet with its numerous possibilities for research, documentation and communication, and the emergence of computer-assisted translation tools.

Alcina A. (2008) «Translation technologies - Scope, tools and resources». Target 20:1, 79–102

Degrees of translation automation

• The term traditional human translation is understood to refer to translation without any kind of automation
• Fully automatic high quality translation (FAHQT) means translation that is performed wholly by the computer, without any kind of human involvement, and is of “high quality”
• Human-aided machine translation (HAMT) refers to systems in which the translation is essentially carried out by the program itself, but some aid is required from humans
• Machine-aided human translation (MAHT) comprises any process or degree of automation in the translation process, provided that the mechanical intervention provides some kind of linguistic support.

Tools vs. Resources

• The word tool refers to computer programs that enable translators to carry out a series of functions or tasks with a set of data that they have prepared and, at the same time, allow a particular kind of result to be obtained.
  • Internet search engines
  • Word processors
  • Trados, Wordfast, Déjà Vu, Across, OmegaT, …
  • AntConc, WordSmith, …
• By resources we refer to all sets of previously gathered linguistic data which are organized in a particular manner and made available in some electronic format so that they can be looked up or used by translators in the course of some phase of processing.
  • Terminological databases (e.g. IATE), glossaries, …
  • (online) dictionaries
  • British National Corpus, …

Why and how can we mine the web?

Extended units of meaning

Words must be studied in context rather than in isolation

• the study of words “by presenting them in the company they usually keep - that is to say, an element of their meaning is indicated when their habitual word accompaniments are shown”
• “Extended units of meaning” at work in language (Sinclair, 1996)
  • collocation
  • colligation
  • semantic preference
  • semantic prosody

Extended units of meaning: collocation

Words must be studied in context rather than in isolation

• Differences in Italian depending on word order (from Taylor, 1998: 61):
  ◦ “pressione alta” = “high (blood) pressure” [medical]
  ◦ “alta pressione” = “(banks of) high pressure” [meteorological]

Collocation

• “Tendency of certain words to co-occur regularly in a given language” (Mona Baker, 1992: 47)
• As observed in actual texts (vs. intuition)
• Key features of collocations
  o language-specific (collocations vary from language to language)
• Collocations are not stable or fixed
  o they may change diachronically (over time) in general language
  o they may change in LSP vs. general language
  o they may change across LSP domains

• “Provide” tends to occur with “information”, “service(s)”, “support”, “help”, “money”, “protection”, “food”, “care”
• “Cause” tends to occur with “pain”, “damage”, “harm”
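Co-occurrence tendencies like these can be checked on actual text by counting which words follow a node word. The sketch below is only an illustration under stated assumptions: the function name `collocates`, the one-word right-hand window and the sample sentences are all invented for this example, and no lemmatisation is applied.

```python
import re
from collections import Counter

def collocates(text, node, window=1):
    """Count the words found within `window` tokens to the right of `node`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[i + 1 : i + 1 + window])
    return counts

# Invented sample text (not taken from any corpus).
sample = (
    "They provide information to users. The storm may cause damage. "
    "Volunteers provide support and provide information. "
    "Smoking can cause harm and cause damage."
)

provide_right = collocates(sample, "provide")
cause_right = collocates(sample, "cause")
print(provide_right.most_common())
print(cause_right.most_common())
```

On a real corpus the window would normally be wider and the counts would be compared with overall word frequencies before drawing conclusions.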

Semantic prosody

• “A consistent aura of meaning with which a form is imbued by its collocates” (Louw 1993)
• “Feeling” or “aura” that is evoked by using certain words (reinforced by collocates, due to co-selectional implications and restrictions)
• Usually this feeling is “positive or negative”
• “Provide” tends to occur with words denoting things which are desirable, necessary or good, such as “information”, “service(s)”, “support”, “help”, “money”, “protection”, “food”, “care”
  • cf. Italian “fornire” and “elargire”
• “Cause” tends to occur with words denoting negative repercussions/consequences, such as “pain”, “damage”, “harm”
  • cf. Italian “causare”

• “Commit” is used with e.g. “murder”, “crime”, “suicide”
• “Revoke” is used with e.g. “licence”, “permit”, “authorization”

Semantic preference

• Relation between a lemma and a set of semantically related words (Stubbs, 2001: 65)
  • Lemma: dictionary entry of a word
• “Commit” is used with a group of semantically similar words, e.g. “murder”, “crime”, “suicide” (cf. Italian “commettere”)
• “Revoke” is used with e.g. “licence”, “permit”, “authorization”
• Semantic prosody → positive/negative evaluation
• Semantic preference → relation to words belonging to a particular, definable semantic field

Colligation

• Relation between a pair of grammatical categories or a pairing of lexis and grammar (Stubbs, 2001: 65)

hear, notice, see, watch enter into colligation with the sequence of object + either the bare infinitive or the -ing form; e.g.

• We heard the visitors leave/leaving.
• We noticed him walk away/walking away.
• We heard Pavarotti sing/singing.
• We saw it fall/falling.

Conclusion on using the Web for specialised translation – Main advantages

• a massive amount of text and multi-source information can be searched
• content is constantly “refreshed” (i.e. updated and extended)

[Figure: example Google result snippet, “How to friend and unfriend someone on Facebook - ComputerHope”, dated 24 Jan 2018]

• a lot of sources, text types and domains/topics are represented
• many languages (English is dominant, good presence of Italian)
• replicable search techniques across (your working/target) languages
• it is available at any time, at virtually no cost!

Main disadvantages and problems

o need to differentiate good/reliable sources from questionable information
  ▪ for facts (limited control over user-generated content like Wikipedia)
  ▪ for linguistic usage (badly translated, non-native texts, poor authors)
  ▪ it may be difficult to identify differences between expert/non-expert use
o data/results still need to be interpreted
o Google focuses on content/information, rather than linguistic forms
  • the ranking and sorting of results are performed according to criteria like “popularity” of the websites, or geographic relevance
  • the same search can yield different numbers of hits, depending on unpredictable and uncontrollable factors such as the time of day or the location from which the query is made: word counts are not reliable, and it is difficult to compare frequencies to verify translation hypotheses
  • the data on which searches are performed is unstable/changes

Main disadvantages and problems

Particularly relevant to linguists/translators:

▪ no possible/meaningful sorting of hits/results (esp. L/R-hand collocates)
  - e.g. alphabetical sorting of collocates, from least to most frequent, etc.
  - think of e.g. the “a * range/array of”, “on the verge of” exercises
▪ punctuation and uppercase (capitals) are ignored, e.g. “aids” vs. “AIDS”
▪ impossible to search parts of words, e.g. start with “geo…”, end in “-itis”
▪ no lemmatised searches
  - hard to calculate frequencies of specific word combinations
  - e.g. to calculate how frequent the combination “tirare l’acqua al proprio mulino” is, all inflected forms must be searched for
▪ no POS-sensitive searches
  - e.g. to search for ‘spot’ as a noun vs. adverb
▪ no possibility to specify the span occurring between two search terms
  - i.e. the * wildcard can include zero to n words

«Googleology is bad science» (Kilgarriff 2007)
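Several of these limitations disappear once the texts are stored locally and searched with pattern matching. The following sketch uses Python's standard `re` module on an invented sample string (an assumption standing in for a real corpus) to show partial-word, case-sensitive and bounded-span searches. It does not cover lemmatised or POS-sensitive queries, which need annotated data.

```python
import re

# Invented sample text standing in for a locally stored corpus.
corpus = (
    "Geology and geopolitics differ. Arthritis and bronchitis are conditions. "
    "AIDS research continues, and she aids the team. "
    "We offer a range of tools, a wide range of options "
    "and a surprisingly broad and deep range of views."
)

# Parts of words: words starting with "geo" and words ending in "-itis".
geo_words = re.findall(r"\bgeo\w*", corpus, flags=re.IGNORECASE)
itis_words = re.findall(r"\b\w+itis\b", corpus, flags=re.IGNORECASE)

# A case-sensitive search keeps "AIDS" apart from "aids".
aids_upper = re.findall(r"\bAIDS\b", corpus)

# Bounded span: "a", then at most two words, then "range of".
bounded = re.findall(r"\ba (?:\w+ ){0,2}range of\b", corpus)

print(geo_words, itis_words, aids_upper, bounded)
```

The `{0,2}` quantifier is the point of the last pattern: unlike the search-engine wildcard, the number of intervening words is stated explicitly.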

MACHINE TRANSLATION (MT)

Machine translation (MT): definition and key terms

• Definition of machine translation: “computerised systems responsible for the production of translations from one natural language into another, with or without human assistance” (Hutchins & Somers, 1992: 3)
• Some key terms:
  o MT system / engine / service = the software that produces the translation
  o input = the source text
  o [raw] output = [unedited] target text

“L'inglese di Expo non sembra Google Translate, è Google Translate” (“Expo's English doesn't look like Google Translate, it is Google Translate”)

From http://www.linkiesta.it/it/blog-post/2015/02/12/linglese-di-expo-non-sembra-google-translate-e-google-translate/22476/

MT – popular conceptions

Probably the translation technology that attracts the most public attention, esp. among non-translators.

Machine translation (MT): main architectures of MT systems

[Diagram labels: Texts in SL; Texts in TL]

Parallel corpora: a collection of original texts in language L1 and their translations into a given L2

Machine translation (MT): why is MT so difficult? Or why is translation difficult for computers?

• So why is translation difficult for computers?
  o Some blame the computer’s lack of “real-world knowledge”

IT: legno, bosco, foresta
EN: wood, forest
FR: bois, forêt

• Partly because the translation often depends on the context / situation, which the computer is not able to take into account

“The ball is in your court”
  • “Il pallone è nella vostra metà campo” (the manager to the players)
  • “Il ballo è nella vostra corte” (the chamberlain to the king)

• Lexical ambiguities (gramm. category <-> meaning <-> translation), for example, in EN: round
  j) My team was eliminated in the first round (Noun: girone)
  k) The cowboy started to round up the cattle (Verb: radunare)
  l) We can use the round table for dinner (Adjective: rotondo)
  m) Maggie is going on a cruise round the world (Preposition: intorno al)

Machine translation (MT): some linguistic phenomena that are particularly difficult for MT

• The case / example of pronominal anaphora (resolution), difficult for MT
  1) The chimp eats the banana because it is greedy.
  2) The chimp eats the banana because it is ripe.
  3) The chimp eats the banana because it is lunchtime.

MT post-editing

• The aim of post-editing is to make the revised output usable or understandable
• The priority is to save time and money
• The extent and the accuracy of post-editing are negotiated/specified on a case by case basis, depending on the needs and requirements
• Different “types” and levels of post-editing (in companies, organisations):
  • no post-editing
    • internal circulation, almost never external publication
  • minimum post-editing
    • internal circulation, rarely external publication
  • full/complete post-editing (but… is it worth it?)
    • very rarely internal circulation, mostly external publication

MT post-editing: introduction

• new skill that is acquired with experience, different from translation
• in this scenario one has to balance and optimise quality-speed-cost, in relation to the intended use/duration of the translation
  • length of use of the document
  • needs and expectations of the end user(s)
  • ability of the readers/addressees to make use of the doc.
  • type, length and “visibility” of the document
  • available and viable options

Aims and level of PE (vs. translation/proofreading!)

• the aims and level of PE (minimum/full-complete) are decided specifically, case by case
• Factors to be considered (prioritised)
  • save time and money (quality is less relevant)
  • understandability and correctness of general meaning are key
• Factors to be ignored (irrelevant in PE)
  • any detail or nuance
  • elegance, fluency, naturalness of expression, etc.
• on average PE is paid roughly 50% of the rate for “real/proper” translation

MT pre-editing

Limit input domain / topic

• There are two possibilities to limit the texts / language in / for MT:
  • adopt a controlled language (restricted input)
  • use the sublanguage approach
• Common aims with both options (to the advantage of MT):
  • limited vocabulary
  • more certainty on interpretation
  • reduced syntactic variation

Controlled language

• Prescriptive rules aimed at normalising the style of the input (ST), e.g.
  • do not write sentences with more than 20 words (general, language-neutral)
  • avoid passive constructions, use only active verb forms
  • avoid anaphoras, make all subjects and pronominal references explicit
  • in EN: do not omit “that” in relative clauses (language-specific)
  • in IT: do not use “solo” as an adverb, but use “soltanto/solamente”
  • in IT: use the word “minuto” only as a noun (i.e. to mean 60 seconds); for the adjectival meaning, use only “piccolo”
  • etc.
• The result of controlled language is restricted input
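Rules of this kind lend themselves to automatic checking before a text is sent to MT. The sketch below covers only two of the example rules (the 20-word limit and a discouraged-word list). The function names and sample sentences are hypothetical, and flagging every occurrence of “solo” is a deliberate simplification, since telling the adverb from other uses would require POS tagging.

```python
import re

MAX_WORDS = 20  # sentence-length limit taken from the rule above

def check_sentence_length(text, max_words=MAX_WORDS):
    """Return the sentences in `text` that exceed `max_words` words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if len(s.split()) > max_words]

def check_discouraged_words(text, discouraged):
    """Return the discouraged words found in `text`, mapped to the suggested alternative."""
    words = set(re.findall(r"\w+", text.lower()))
    return {w: alt for w, alt in discouraged.items() if w in words}

# Invented test input: one short sentence and one sentence of 21 words.
long_sentence = " ".join(["word"] * 21) + "."
too_long = check_sentence_length("Press the green button. " + long_sentence)

# Simplification: every "solo" is flagged, whatever its part of speech.
flagged = check_discouraged_words(
    "Premere solo il tasto verde.", {"solo": "soltanto/solamente"}
)

print(too_long)
print(flagged)
```

A checker like this only reports violations; rewriting the sentence remains the author's or pre-editor's job.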

Sublanguage (1/2)

• Natural/normal behaviour of language within a well-defined domain (~ LSP, specialised language, jargon, etc.)
  • “sub-” in the mathematical sense as in “subset”, not derogatory!
  • refers to very well-defined, enclosed, limited domains and texts
• A sublanguage exists and is used regardless of MT, but one can design an MT system that takes advantage of this sublanguage
  • vocabulary
    • limited (relatively few concepts to be covered/expressed)
    • finite/closed (innovation/deviation tend to be avoided)
    • a few homographs, in general limited use of synonyms and coreferences
  • syntax
    • limited range of structures and constructions (regularity + repetitiveness)
    • usually sublanguages are very similar cross-linguistically between SL/TL(s)

Machine translation (MT): restrictions to the use of MT

• Input must be in (or converted into) electronic format
• Correct formatting and layout of the input are very important
  o the word “e r r o r” (spaced letters) would not be recognised / translated
  o spelling and typos are crucial: THEYBOOKS AROOM … (anybody would understand banal mistakes, but not an MT system!)
• Limited availability of language combinations (improving with SMT)
  o coverage mostly limited to “usual” big languages with commercial interest

COMPUTER-ASSISTED TRANSLATION (CAT) TOOLS

Computer-assisted translation (CAT) tools

• Computer-assisted translation or computer(machine)-aided translation (CAT) refers to a variety of tools, a family of software products designed to support professional translators in their work.
• CAT is a “recent” development, derived from MT over the last 20 years
• The actual development of commercial CAT tools started in the 1990s: the so-called “translator’s workstation / workbench”, which includes
  • terminology management packages
  • translation memory (TM) software (+ text alignment software, etc.)
• CAT tools are pieces of software designed to enhance the work of translators:
  • maximise speed → higher productivity
  • improve coherence and precision → higher quality

CAT tools, example 1: terminology management packages

• Used to create, store, retrieve and manipulate bi-/multilingual termbases/glossaries
• As searching for terminology can be highly time-consuming (even up to 75% of translators’ time), setting up a database which gathers the terminology you come across is vital.
• Lists in word processors / spreadsheets (e.g. Excel) → limited options for presenting and sorting data
• The terminology covered is usually that of a given (sub-)discipline or the terms needed for a specific translation project.
• Terminology records consist of a number of flexible fields
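To make the idea of a terminology record with flexible fields more concrete, here is a minimal sketch of a termbase in Python. The field names (`domain`, `definition`, `context`, `source`, `equivalents`) and the `Termbase` class are assumptions chosen for illustration, not the data model of any particular terminology management package. The two sample entries reuse the Italian examples quoted earlier from Taylor (1998).

```python
from dataclasses import dataclass, field

@dataclass
class TermRecord:
    """One terminology record; every field except the term and its language is optional."""
    term: str
    language: str
    domain: str = ""
    definition: str = ""
    context: str = ""
    source: str = ""
    equivalents: dict = field(default_factory=dict)  # language code -> equivalent term

class Termbase:
    """A very small in-memory termbase (hypothetical, for illustration only)."""

    def __init__(self):
        self._records = []

    def add(self, record):
        self._records.append(record)

    def lookup(self, term, target_language):
        """Return (domain, equivalent) pairs for every record matching `term`."""
        wanted = term.lower()
        return [
            (r.domain, r.equivalents[target_language])
            for r in self._records
            if r.term.lower() == wanted and target_language in r.equivalents
        ]

termbase = Termbase()
termbase.add(TermRecord("high pressure", "en", domain="meteorology",
                        equivalents={"it": "alta pressione"}))
termbase.add(TermRecord("high blood pressure", "en", domain="medicine",
                        equivalents={"it": "pressione alta"}))

result = termbase.lookup("High pressure", "it")
print(result)
```

Keeping the domain alongside each equivalent is what a flat word list in a spreadsheet makes awkward: the same lookup can return different answers for different subject fields.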

CAT tools, example 2: translation memory (TM) software

• Translation memory (TM): “multilingual text archive containing […] multilingual texts, allowing storage and retrieval of aligned text segments against various search conditions” (EAGLES 1995, Evaluation of Natural Language Processing Systems)
• This roughly means: a “filing cabinet” (i.e. a database) of old translations whose bits can be retrieved and used when / as needed by the translator
  • essentially a textual database that can be searched
  • pairs of source-text and target-text segments

Note: Translation memory indicates both the software tool and the contents of the database, i.e. the whole set of aligned text segments that it includes

Translation memory (TM) software

• Key idea: recycle similar past translations, never translate the same (or a similar) text twice
• How it works:
  • TM tools divide the source text – which must be in (or turned into, e.g. with OCR) electronic/digital format – into segments, which translators can translate one-by-one in the traditional way.
  • These segments (usually sentences, or even phrases) are then sent to a built-in database. When there is a new source segment equal or similar to one already translated, the memory retrieves the previous translation from the database.
• When is this most useful:
  • for the translation of any text that has a high degree of repeated terms and phrases which must be translated consistently, as is the case with e.g. user manuals, computer products and subsequent versions of the same document (e.g. website updates).
  • mostly relevant to technical/specialised translation (not literature)

Using translation memory (TM) software

• Scenario
  ◦ you have to translate the user manual of a printer (new model) from English into Italian
    ◦ a lot of repetition within the document itself
    ◦ overlap and repetitions across updated (old-new) versions of the documentation
  ◦ you have a relevant TM (similar topic / domain / texts / clients)
    ◦ you translated the previous manual(s)
    ◦ TM provided by client / translation agency / colleague

Using translation memory (TM) software

• Translation of a printer manual English (A) → Italian (B)

Source text (in language A)
ST: “There are 4 ways to change print settings for this printer”

Exact/Perfect match (everything in the segment is exactly the same)
A: “There are 4 ways to change print settings for this printer”
B: “Ci sono 4 modi per cambiare le impostazioni di stampa di questa stampante”

Full match (only figures, dates and similar small details are different)
A: “There are 2 ways to change print settings for this printer”
B: “Ci sono 2 modi per cambiare le impostazioni di stampa di questa stampante”

Fuzzy match 85% similar (a few words in translation unit are different)
A: “There are several ways to change print settings for the printer”
B: “Ci sono vari modi per cambiare le impostazioni di stampa alla stampante”

Fuzzy match 60% similar (some words in translation unit are different)
A: “There are several ways to modify the default setting of your printer”
B: “Ci sono vari modi per modificare l’impostazione standard della tua stampante”

• With the acceptability threshold of the TM tool set at 75%, no candidate translation unit under that level of similarity is retrieved and shown to the translator!
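The retrieval logic with an acceptability threshold can be sketched in a few lines. This is an illustration only: it scores similarity with `difflib.SequenceMatcher` from the Python standard library, which is an assumption standing in for the proprietary matching of real TM tools, so its scores will not reproduce the percentages quoted in the slides. The function name `tm_lookup` and the second memory entry are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Character-based similarity between two segments, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()

def tm_lookup(segment, memory, threshold=0.75):
    """Return (score, source, target) for the best stored unit, or None if below threshold."""
    best = None
    for source, target in memory:
        score = similarity(segment, source)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best

memory = [
    ("There are 4 ways to change print settings for this printer",
     "Ci sono 4 modi per cambiare le impostazioni di stampa di questa stampante"),
    # Hypothetical extra translation unit, added so the memory has more than one entry.
    ("Load paper into the tray", "Caricare la carta nel vassoio"),
]

exact = tm_lookup("There are 4 ways to change print settings for this printer", memory)
near = tm_lookup("There are 2 ways to change print settings for this printer", memory)
nothing = tm_lookup("Turn off the device before cleaning", memory)

print(exact)
print(near)
print(nothing)
```

In the second call the retrieved translation still says “4 modi”: the tool proposes the stored target segment and the translator (or a number-substitution feature) has to adapt it.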

• CAT tools - Advantages
  • can speed up the translation process and increase productivity
  • can improve translation quality (by enhancing terminological and phraseological coherence)
  • can help translators provide quotations
  • allow for collaboration over large projects
    • TMs/termbases can be shared by several translators and updated in real time
• Useless for some text types (e.g. literature)
• Essential for many specialized/technical domains
  • Translation agencies require translators to use (specific types of) CAT tools

Some issues about TMs

• Technical / practical issues
  • different approaches: some CAT tools have a proprietary, stand-alone text editor, others are «integrated» (e.g. into a word processor), some recent ones are fully online
  • proprietary vs. interchange formats
  • no matches calculated below sentence-level (e.g. at phrase level)
    • but Concordance function is becoming standard
  • criteria used to define similarity / matches
    • matching is calculated not on the basis of sentence or word meaning, but on the basis of character-string similarity

TP: I bambini giocano in gruppo con il pallone (“The children play in a group with the ball”)
FM1: I pampini giovano il grullo con il tallone (94% match; nearly the same characters, unrelated meaning)
FM2: I bimbi si divertono giocando a calcio insieme (42% match; “The kids have fun playing football together”)
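The effect of character-string matching can be reproduced with a plain edit-distance measure. The sketch below uses Levenshtein distance normalised by the length of the longer string. This is an assumed measure chosen for illustration, so it will not yield the exact percentages shown above, but it ranks the two candidates the same way.

```python
def levenshtein(a, b):
    """Minimum number of character insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute (or keep) the character
        previous = current
    return previous[-1]

def char_similarity(a, b):
    """Similarity from 0.0 to 1.0 based only on characters, not on meaning."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest

tp = "I bambini giocano in gruppo con il pallone"
fm1 = "I pampini giovano il grullo con il tallone"
fm2 = "I bimbi si divertono giocando a calcio insieme"

sim1 = char_similarity(tp, fm1)
sim2 = char_similarity(tp, fm2)
print(f"FM1: {sim1:.0%}  FM2: {sim2:.0%}")
```

The nonsensical FM1 scores well above a 75% threshold while the genuine paraphrase FM2 falls below it, which is exactly the weakness the slide points out.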

Some issues about TMs

• Language / translation issues
  • segmentation implies that overall perception of the ST/TT is lost → ST structure tends to be reproduced in TT
  • cross-linguistic differences in e.g. cohesive patterns might be overlooked
  • using TMs limits the translator’s creativity, as s/he is usually expected to use the terminology and phraseology included in the TM
  • TMs can sometimes be reversed, as if translation direction did not matter…
  • need to control the reliability of translations within TM

CORPORA AND TRANSLATION

What is a corpus? Some (authoritative) definitions

• “a collection of naturally-occurring language text, chosen to characterize a state or variety of a language” (Sinclair, 1991: 171)
• “a collection of texts assumed to be representative of a given language, dialect, or other subset of a language, to be used for linguistic analysis” (Francis, 1992: 7)
• “a closed set of texts in machine-readable form established for general or specific purposes by previously defined criteria” (Engwall, 1992: 167)
• “a finite-sized body of machine-readable text, sampled in order to be maximally representative of the language variety under consideration” (McEnery & Wilson, 1996: 23)
• “a collection of (1) machine-readable (2) authentic texts […] which is (3) sampled to be (4) representative of a particular language or language variety” (McEnery et al., 2006: 5)

What is / is not a corpus…?

• A newspaper archive on CD-ROM?
• An online glossary?
• A digital library (e.g. Project Gutenberg)?
• All RAI 1 programmes (e.g. for spoken TV language)?

The answer is always “NO” (see definition)

Corpora vs. web

• Corpora:
  – Usually stable
    • searches can be replicated
  – Control over contents
    • we can select the texts to be included, or have control over selection strategies
  – Ad-hoc linguistically-aware software to investigate them
    • concordancers can sort / organise concordance lines
• Web (as accessed via Google or other search engines):
  – Very unstable
    • results can change at any time for reasons beyond our control
  – No control over contents
    • what/how many texts are indexed by Google’s robots?
  – Limited control over search results
    • cannot sort or organise hits meaningfully; their order follows the search engine's ranking, not linguistic criteria

What types of corpora exist? A brief overview

• A corpus is a principled collection of naturally occurring electronic texts designed to be a representative sample of language in actual use
• Some of the main features and criteria used to describe and classify corpora:
  • general vs. specialised
  • written vs. spoken (transcribed) vs. multimodal (audio/video)
  • balanced (sample) vs. opportunistic
  • synchronic vs. diachronic
  • static vs. dynamic
  • closed / finite vs. open-ended (monitor)
  • raw (pre-corpus) vs. augmented: marked-up, POS-tagged, annotated
  • monolingual vs. bi- / multilingual
  • parallel vs. comparable

An example of planned balance: the British National Corpus

• 100 million words of contemporary spoken and written British English
• Representative of British English “as a whole”
• Designed to be appropriate for a variety of uses: lexicography, education, research, commercial applications (computational tools)
• Balanced with regard to genre, subject matter and style
• Sampling and representativeness very difficult to ensure

Dynamic (Monitor) vs static (Finite)

• A static corpus will give a snapshot of language use at a given time
  • Easier to control balance of content
  • May limit usefulness, esp. as time passes
• A dynamic corpus is ever-changing
  • Called “monitor” corpus because it allows us to monitor language change over time

[Figure: concordance for node word “eyes” (sorted 1L) generated from the BNC]
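A concordance sorted on the first word to the left (1L) can be generated with very little code. The following sketch is a toy concordancer under stated assumptions: the function names are made up for this example and the sample text is invented rather than taken from the BNC.

```python
import re

def kwic(text, node, width=3):
    """Return (left context, node, right context) for each occurrence of `node`."""
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, tok, right))
    return lines

def sort_1l(lines):
    """Sort concordance lines alphabetically by the first word to the left of the node."""
    return sorted(lines, key=lambda line: line[0][-1] if line[0] else "")

# Invented sample text (not from the BNC).
sample = ("She closed her eyes. His eyes were blue. "
          "He rubbed his eyes and opened her eyes wide.")

concordance = sort_1l(kwic(sample, "eyes"))
for left, node, right in concordance:
    print(f"{' '.join(left):>25}  {node}  {' '.join(right)}")
```

Sorting on 1L groups identical left-hand collocates together (here “her” before “his”), which is what makes recurrent patterns visible at a glance.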

Parallel vs. comparable multilingual corpora

Parallel (translational) corpora
• contain translationally “equivalent” texts: STs and their corresponding TTs
• need to be aligned, usually at the sentence level, i.e. SL sentence X matched to TL sentence X’
• context is provided to account for “equivalence” and “translation shifts” between ST and TT
• translation direction needs to be clear, i.e. which are SL and TL components of the corpus

Comparable corpora
• texts originally produced (not translated) in the respective languages
• consist of independent texts which are “similar” according to some pre-determined criteria
• the various language components share a set of common features, e.g. text type, genre, publication span, domain, topic
• parameters defining this similarity vary widely
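Once a parallel corpus is aligned at sentence level, querying it amounts to searching the source side and returning the matched target sentences. The sketch below assumes the alignment is already available as a list of (source, target) pairs. The three English-Italian pairs are invented for illustration, and the simple substring search is a stand-in for the richer queries a real corpus interface offers.

```python
def search_parallel(pairs, term):
    """Return the aligned (source, target) pairs whose source sentence contains `term`."""
    wanted = term.lower()
    return [(src, tgt) for src, tgt in pairs if wanted in src.lower()]

# Invented, hand-aligned EN -> IT sentence pairs (hypothetical data).
aligned = [
    ("The licence was revoked.", "La licenza è stata revocata."),
    ("He committed a crime.", "Ha commesso un reato."),
    ("The permit was revoked last year.", "Il permesso è stato revocato l'anno scorso."),
]

hits = search_parallel(aligned, "revoked")
for src, tgt in hits:
    print(f"{src}  |  {tgt}")
```

Because each hit keeps the whole target sentence, the translator sees candidate equivalents in context rather than as isolated dictionary entries.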

Bilingual parallel corpora on the web

• OPUS corpus, opus.lingfil.uu.se
  • A variety of multilingual parallel corpora
  • European Parliament debates (EuroParl corpus)
  • European Central Bank corpus
  • UN documents
  • Subtitles (open subtitle project)
  • Software manuals (PHP, OO)
  • …

[Figure: http://opus.lingfil.uu.se/ → EuroParl v7 search interface, with callouts: Query; Choose SL; Choose TL(s); Sort; Launch the query; help; Other useful functions]

Comparable Eng/Ita corpus on botany

Summing up: corpus use in translation

Main uses:
• Test/generate hypotheses as to interpretation of the source text, and as to appropriate translations
  • helpful when you’re dealing with little known text-types / domains
  • helpful when you’re dealing with a little known language
• Improve quality: capture subtleties of source text, produce translations which read like native speaker texts

More precisely,
• Reference corpora provide insights on phraseological regularities in discourse
• Comparable corpora (automatic and manual) can be used for (contrastive) specialised/genre-controlled text analysis
• Parallel corpora provide equivalents in context/evidence of translation strategies (and are more versatile than TMs)

Question on MT

Discuss the most important milestones in the history of Machine Translation, also with regard to the evolution of MT approaches.

Sample answer on MT

• Important points to mention: ☺
  – “Birth” of MT, idea by W. Weaver and Memorandum (1949)
  – Innovative idea of using computers to translate
  – First demonstration: Georgetown-IBM (1954)
  – US government funding, 1st generation direct rule-based approaches
  – FAHQMT ideal (with short explanation)
  – ALPAC report (1966): gist of contents and negative repercussions in the US

• Important points to mention [continued]: ☺
  – ’80s: PCs more and more widespread, commercial systems in US and Europe
  – 2nd generation, rule-based transfer approach
  – rule-based interlingua approach (complex, disappointing)
  – Excessive ideal of FAHQMT gradually replaced by more realistic alternatives
  – Statistical approach replaces rule-based approach (advantages)
  – Increase in the number of texts in digital format to be translated (Internet, email)
  – End ’90s: MT systems available online (even for free)
  – MT integration with CAT tools
  – Latest developments: neural MT

Sample answer on MT

• Less relevant, non-essential points:
  – It is not necessary to provide definitions (for MT, input, output, etc.), you can take them for granted
  – Civil applications of MT (e.g. for war cryptography)
  – Individuals (apart from W. Weaver)
  – Conferences and journals, specific research centres
  – Not important to differentiate between countries (apart from US beginning)
  – Not necessary to mention single MT tools or firms

Sample answer on MT

• FOR THIS QUESTION, avoid:
  – Explaining pre-editing vs post-editing
  – Detailed explanation of sublanguage and controlled language
  – Detailed explanation of different rule-based approaches
  – Detailed explanation of statistical MT approaches, the resources needed to develop them and their advantages
  – Details on the ALPAC committee
  – Review of available online MT systems
  – Detailed explanation of how MT is integrated into CAT tools