A presentation given by Juan Alberto Alonso about the co-official languages of Spain and the special role that Machine Translation (MT) plays among them.
Bridging the Gap between Iberian Languages: MT to the Rescue
Juan Alberto Alonso
04.05.2012
Agenda
The Use of MT
MT and the Iberian languages
A success case with Catalan
Basque and Portuguese: two special cases
© Lucy Software Ibérica SL / 2
The Use of MT
When is MT useful?
When is MT Useful?
When it is adapted to the user’s specific needs:
Terminology
Document format
Linguistic peculiarities
When is MT useful?
When it is properly used according to:
The translation quality delivered by the language pair in question
The type of documents to be translated
The user environment where it has to be integrated
When is MT Useful?
When it is well integrated into the user’s document flow:
CMS, proxies, etc.
Press agencies and newspapers
Translation agencies
The Uses of MT
Assimilation (Information)
Translation quality does not need to be very high
For linguistically more distant languages
Useful to break language barriers (from 0% to X%)
Dissemination (Production)
Very high MT quality
For closely related languages
Can be integrated into very complex user environments
Languages: Multilingualism, projection and role in the world
Official policies: language policy
Spanish is the official language of Spain, alongside four co-official languages: Basque, Catalan/Valencian, Galician
Portuguese: new language standard aimed at the international unification of the language.
MT with the Iberian Languages: A Unique Case
Assimilation
Translation quality does not need to be very high
For linguistically more distant languages
Useful to break language barriers (from 0% to X%)
Dissemination
Very high MT quality
For closely related languages
Can be integrated into very complex user environments
Political Factors
The promotion of minority languages is a political issue and is supported by local Governments
Need for huge translation volumes
Castilian, Catalan and Galician: An Ideal Scenario for MT
The translation quality yielded by MT among Castilian, Catalan and Galician is very high (above 95%)
Through a ramp-up phase where the MT system is adapted to the user’s needs, this quality can become even better.
The daily “normal” use of Catalan and Galician is officially encouraged and supported by the corresponding local Governments
Castilian, Catalan and Galician: An Ideal Scenario for MT
There is a real and constant need to translate huge volumes of documentation between Castilian and Catalan (less so for Galician).
MT has been used for years in complex production environments for Castilian-Catalan (newspapers, translation agencies, Public Administrations, etc.), with millions of words MT-translated and post-edited on a daily basis.
Therefore, there is a years-long culture of productive MT use, with users and post-editors trained to use these systems. This is probably a unique case in the world.
Castilian, Catalan and Galician: An Ideal Scenario for MT
Dissemination
Very high MT quality
For closely related languages
Can be integrated into very complex user environments
Political Factors
The promotion of minority languages is a political issue and is supported by local Governments
Need for huge translation volumes
A Success Case for Spanish-Catalan: La Vanguardia
La Vanguardia is the leading newspaper in Catalonia, and one of the main newspapers in the rest of Spain, with an average daily circulation of over 200,000 copies. It is widely recognized as a quality newspaper.
Since May 3rd, 2011, La Vanguardia has had two parallel editions, one in Spanish and another in Catalan.
La Vanguardia: The Challenge. 3 Options
Given the task of making bilingual daily editions of a newspaper, three possible options could be considered:
The “MT-less” option: Using no MT at all
The “full-MT” option: “Only” using MT
The “sensible-MT” option: Using MT + customization + human post-editors
La Vanguardia: The Challenge. The “MT-less” Option
Duplicate the whole editorial team and/or hire a team of N human translators to translate the entire newspaper content on time, in order to keep both editions synchronized for publishing.
Duplicate most of the IT infrastructure
Given all these factors, the question arises whether it would be feasible to produce bilingual editions of a newspaper this way, because of:
Dramatic increase in costs
Very tight time constraints
La Vanguardia: The Challenge. The “full-MT” Option
Run all the contents of the base edition through an MT system.
Publish the raw MT output of the original contents in the other-language edition.
Obviously, this is not an option: even for language pairs for which MT quality is very high (as is the case for Spanish-Catalan, > 95%), the mistakes in the output (proper nouns being translated, homographs, etc.) would be unacceptable for publishing, and the resulting Catalan style would not always sound “natural” to Catalan speakers.
La Vanguardia: The Challenge. The “sensible-MT” Option
Customize the MT system to the specific linguistic needs of the newspaper (style guide, corporate terminology, proper nouns, etc.)
Integrate the MT flow into the newspaper's editorial flow (document and character formats, connection to a post-editing environment, feedback processing, etc.)
Incorporate into the editorial flow a post-editing environment to be used by a team of human post-editors.
Here we have a compromise between the MT-use (time and effort saving) and the translation quality.
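One part of the customization listed above, keeping proper nouns from being translated, can be illustrated with a small placeholder-masking sketch. The slides do not describe how the Lucy system handles this internally, so everything here is an assumption for illustration: the function names, the placeholder format, the protected-term list and the stand-in `fake_mt` function are all hypothetical.

```python
import re


def protect_terms(text, protected_terms):
    """Replace each protected term found in `text` with a numbered placeholder.

    Returns the masked text and a placeholder -> term mapping.
    (Hypothetical helper; not part of any real MT product API.)
    """
    mapping = {}
    # Handle longer terms first so they win over shorter overlapping ones.
    for i, term in enumerate(sorted(protected_terms, key=len, reverse=True)):
        placeholder = f"__TERM{i}__"
        pattern = re.compile(r"\b" + re.escape(term) + r"\b")
        if pattern.search(text):
            text = pattern.sub(placeholder, text)
            mapping[placeholder] = term
    return text, mapping


def restore_terms(text, mapping):
    """Put the original protected terms back after translation."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text


def fake_mt(text):
    """Stand-in for a real MT engine (assumption: it leaves placeholders alone)."""
    return text.replace("publica dos ediciones", "publica dues edicions")


source = "La Vanguardia publica dos ediciones"
masked, mapping = protect_terms(source, ["La Vanguardia"])
print(restore_terms(fake_mt(masked), mapping))
```

A real system would more likely handle such terms in its lexicons, as the deck describes; masking is just one simple way to show the idea.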
Requirements from La Vanguardia
One daily copy of La Vanguardia includes over 60,000 words, all of them to be translated, revised and post-edited.
The Catalan edition should comply with the linguistic requirements stated in the Style Guide of La Vanguardia.
Both editions should be ready for printing every day by 23:30 at the latest.
Currently, most journalists at La Vanguardia write in Spanish, which is now the base edition from which the Catalan edition is created.
In the short to mid term, however, every journalist will be free to write in the language of his/her choice (Catalan or Spanish), so that there will effectively be no base edition.
Both the MT-system and the post-edition environment should be completely integrated into their editorial flow (both IT-integration and human team integration).
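To get a feel for what these requirements mean per person, the arithmetic can be sketched as follows. The 60,000 words per day come from the requirements above, and the team of around 15 post-editors is the figure given later in the deck. The even split across the team and the 8-hour working window are assumptions made purely for illustration; the slides state neither.

```python
def words_per_post_editor(daily_words, team_size):
    """Average daily load per post-editor, assuming an even split."""
    return daily_words / team_size


def required_rate(daily_words, team_size, hours_available):
    """Words per hour each post-editor must handle to finish in time."""
    return words_per_post_editor(daily_words, team_size) / hours_available


# 60,000 words and ~15 post-editors are from the slides;
# the 8-hour window is a hypothetical value.
print(words_per_post_editor(60_000, 15))
print(required_rate(60_000, 15, hours_available=8))
```

Under these assumptions each post-editor handles about 4,000 words a day, or 500 words per hour. The real distribution would depend on shifts, sections and the time at which each article becomes available.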
How the MT-System was Customized for La Vanguardia
Computational linguists, post-edition experts, and La Vanguardia editorial team worked together for six months in order to:
Customize the MT-system to their linguistic requirements (as far as possible)
Over 20,000 lexical entries added/changed in the MT-system lexicons
Around 440 rules adapted in the MT-system grammars
Integrate the MT-system into their IT editorial environment
Integration with HERMES CMS
La Vanguardia specific character format and XML tag handling
Inclusion of markups specifically designed for post-editors
Translation performance to meet the translation load and peak requirements
A team of around 15 persons has been trained on post-editing the MT-output before publishing.
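The XML tag handling mentioned above can be illustrated with a minimal sketch that translates only the text content of a document and leaves the markup untouched. The slides do not describe the actual La Vanguardia or HERMES format, so the tag names, the `translate_xml` function and the lookup-table "translator" below are all hypothetical.

```python
import xml.etree.ElementTree as ET


def translate_xml(xml_string, translate):
    """Apply `translate` to every non-empty text node, keeping all tags as-is."""
    root = ET.fromstring(xml_string)
    for element in root.iter():
        # Text directly inside the element, before its first child.
        if element.text and element.text.strip():
            element.text = translate(element.text)
        # Text following the element's closing tag, inside its parent.
        if element.tail and element.tail.strip():
            element.tail = translate(element.tail)
    return ET.tostring(root, encoding="unicode")


# Stand-in for a real MT engine: a tiny Spanish -> Catalan lookup table.
stub = {"Buenas noches": "Bona nit", "Buenos días": "Bon dia"}
article = "<article><title>Buenas noches</title><p>Buenos días</p></article>"
print(translate_xml(article, lambda s: stub.get(s, s)))
```

Translating each text node separately is the simplest approach, but it splits sentences that contain inline tags. A production integration would need to pass such sentences to the MT engine as a whole and re-insert the tags afterwards.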
La Vanguardia: Conclusions
Producing two parallel bilingual editions of a daily newspaper only seems to be feasible if:
MT is used
MT is properly customized, adapted and integrated according to the newspaper's linguistic and IT requirements.
There is a team of trained, specialized human post-editors who correct MT mistakes and “give the human flavor” to the output.
Portuguese: A Different Scenario
Portuguese is one of the Iberian languages with high business potential (both in Portugal and in Brazil/South America).
The translation quality given by MT systems between Portuguese and Spanish is very high (similar to that among Castilian, Catalan and Galician).
However, in the case of Portuguese, the key factor is business needs and opportunities rather than political drive.
Portuguese: A Different Scenario
Dissemination
Very high MT quality
For closely related languages
Can be integrated into very complex user environments
Market Needs
There is a wide market asking for quality translation between Portuguese and Spanish
Need for huge translation volumes
Basque: yet Another Different Scenario
CA: El basc és un cas particular entre les llengües de la Península Ibèrica
ES: El vasco es un caso particular entre las lenguas de la Península Ibérica
GL: O vasco é un caso particular entre as linguas da Península Ibérica
EU: Iberiar Penintsulako hizkuntzen artean euskara kasu berezia da
PT: O basco é um caso particular entre as línguas da Península Ibérica
EN: Basque is a special case among the languages of the Iberian Peninsula
Basque: yet Another Different Scenario
Political Factors
The promotion of minority languages is a political issue and is supported by local Governments
Need for huge translation volumes
Assimilation
Translation quality does not need to be very high
For linguistically more distant languages
Useful to break language barriers (from 0% to X%)
Dissemination
Enough MT quality for restricted domains
For closely related languages
Can be integrated into very complex user environments
Basque: yet Another Different Scenario
Basque is a special case among the Iberian languages: it is not an Indo-European language. It is linguistically very different from the rest of the Iberian languages (and, incidentally, also from any other known language).
The MT quality between Basque and Castilian, Portuguese, Galician or Catalan will be lower than that obtained among the latter four.
Adapted for restricted domains, the MT quality can be sufficient for productive use.
Its daily “normal” use is being encouraged and supported by the Basque Government.
The use of MT to translate from Basque into Castilian, Catalan, Portuguese, Galician or English is a good example of assimilation use (breaking language barriers).
Lucy Basque MT Portal
The first MT systems with Basque already exist, and new ones will be developed in the short to mid term
Questions?
Thank you for your attention!
Juan A. Alonso
Lucy Software Ibérica