Catalan daily goes Catalan
LocWord 2012, A4
Magí Camps (La Vanguardia), Blanca Vidal (Lucy Software)

[1] Introduction, background

Figure: Newspapers in Catalan, net circulation (bar values: 79,239; 45,309; 31,762; 15,662; 6,779)

Source: Estudi General de Mitjans (EGM), 2012

Introduction, background: results

• Increase: +4% in copies, +7% in readers
• Distribution: 57% Spanish, 43% Catalan

Introduction, background

Why a Catalan version?
• Celebration of La Vanguardia's (LV's) 130th anniversary
• Normalization of the use of Catalan
• Investment to face the crisis
• Opportunity to consolidate LV's hegemony

[2] Customer goals

• To publish two language editions of the same newspaper daily (supplements included).
• Journalists should be able to write in either of the two languages.
• Neither quality nor distribution timeframes should be affected.

Customer requirements (MT)

• Tailor-made system
• Compliance with LV's style guide
• Seamless integration into journalists' workflow
• Translation of Hermes XML and InDesign formats
• Reliability, high availability
• High performance

[3] Ramp-up phase: project set-up

Work areas
• MT linguistic improvement/tuning
• Post-editing preparation
• MT system set-up and integration
• MT lexicon training

Duration: 8 months (+3 months)

Staff
• LV: 10-12 in-house journalists
• Lucy: 3 computational linguists/lexicographers, 1 software developer
• Incyta: 2 professional post-editors

Important! On-site support

Subphases

TASKS   Phase 1   Phase 2   Phase 3   Phase 4

Linguistic improvement/tuning

- Language-type definition x

- Creation of a corpus of real texts x x x x

- Analysis of the translation quality x x x x

- Error reporting (lexicon and grammar errors) x x x x

- Linguistic implementation (lexicon and grammar) x x x x

- Pre- and post-editing filters x x x x

Post-editing preparation

- Gathering of MT post-editing guidelines x

- Evaluation of post-editing effort x x

- Creation and training of the post-editing team x

Technical set-up

- System set-up and integration x

- Preparation of XML converters x

Maintenance

- Lexicon maintenance training x

Duration   2 months   3 months   3 months   3 months
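The slides do not say how the pre- and post-editing filters work. One common technique, assumed here purely for illustration, is to mask known proper names with placeholder tokens before MT and restore them afterwards, so the engine can neither translate nor inflect them. `PROPER_NAMES`, `pre_edit` and `post_edit` are hypothetical names, and the three-entry list stands in for the project's proper-names files.

```python
import re

# Hypothetical stand-in for the proper-names files used by the project.
PROPER_NAMES = ["La Vanguardia", "Sagrada Família", "Camp Nou"]


def pre_edit(text, names):
    """Pre-editing filter: replace each known proper name with a placeholder token.

    Returns the masked text and a token -> name mapping for the post-editing filter.
    """
    mapping = {}
    # Longest names first, so a long name is masked before any shorter name inside it.
    for i, name in enumerate(sorted(names, key=len, reverse=True)):
        token = f"__PN{i}__"
        pattern = re.compile(r"\b" + re.escape(name) + r"\b")
        if pattern.search(text):
            text = pattern.sub(token, text)
            mapping[token] = name
    return text, mapping


def post_edit(text, mapping):
    """Post-editing filter: put the original proper names back after MT."""
    for token, name in mapping.items():
        text = text.replace(token, name)
    return text


masked, mapping = pre_edit("El Camp Nou i La Vanguardia", PROPER_NAMES)
print(masked)                      # El __PN2__ i __PN1__
print(post_edit(masked, mapping))  # El Camp Nou i La Vanguardia
```

In a real pipeline the MT engine would run between the two calls and must be configured to leave the placeholder tokens untouched.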

[a] Linguistic tuning

Language model

Corpus

Translation quality (TQ)

Analysis and error-reporting

Implementation

Accomplished improvement data

Linguistic tuning

Catalan language model
• no exclusion
• compliant with standards
• innovative in terminology
• dynamic in syntactic structures

Corpus
• ES: 500,000 translation units (8,300,000 words)
• CA: 250,000 translation units (3,000,000 words)

Linguistic tuning: translation quality

• Perfect: 74%
• Minimal post-editing: 24%
• Medium post-editing: 2%

Conclusions
• No specific domains (except Sports)
• Culture: proper names
• Opinion: idioms, plays on words
• Errors not repetitive
• A percentage of style issues still to be post-edited
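The slides do not state how each translation unit was assigned to a quality category. Purely as an assumption, one way to produce such a breakdown is to compare the raw MT output with its post-edited version and bucket each segment by the share of words the post-editor changed. The function names and the 20% threshold are hypothetical, not the project's criterion.

```python
from collections import Counter


def word_edit_distance(a, b):
    """Levenshtein distance between two token lists (insert, delete, substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i]
        for j, wb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete wa
                           cur[j - 1] + 1,             # insert wb
                           prev[j - 1] + (wa != wb)))  # substitute, or keep if equal
        prev = cur
    return prev[-1]


def classify(mt_output, post_edited, minimal_max=0.2):
    """Bucket one segment by how much of it the post-editor changed.

    minimal_max is a hypothetical threshold on the changed-word ratio.
    """
    mt_words, pe_words = mt_output.split(), post_edited.split()
    distance = word_edit_distance(mt_words, pe_words)
    if distance == 0:
        return "perfect"
    ratio = distance / max(len(pe_words), 1)
    return "minimal post-editing" if ratio <= minimal_max else "medium post-editing"


def tq_breakdown(pairs):
    """Percentage of (mt_output, post_edited) pairs falling in each category."""
    counts = Counter(classify(mt, pe) for mt, pe in pairs)
    return {category: 100 * n / len(pairs) for category, n in counts.items()}
```

With real data, `pairs` would hold every translation unit of a sample edition together with its post-edited version.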

Linguistic tuning

Analysis and error reporting
• Semi-automatic detection of missing words
• Terminology lists
• New and different translations, error reporting

Implementation
• Proper names (44.5% of the TUs)
• Idioms
• Alternatives
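Semi-automatic detection of missing words can be approximated by scanning the corpus for tokens the MT lexicon does not know and ranking them by frequency for human review. This is a sketch under assumptions: the lexicon is modeled as a plain set of lower-cased words, and `missing_word_candidates` is a hypothetical helper, not part of the Lucy system.

```python
import re
from collections import Counter

# Words, keeping hyphenated forms and the Catalan middle dot ("l·l") together;
# apostrophes split elisions such as "l'Ajuntament" into "l" and "Ajuntament".
TOKEN = re.compile(r"\w+(?:[-·]\w+)*")


def missing_word_candidates(texts, lexicon, min_freq=1):
    """List (token, frequency) pairs absent from the lexicon, most frequent first.

    A lexicographer reviews this list, which is what keeps the process semi-automatic.
    """
    counts = Counter()
    for text in texts:
        for token in TOKEN.findall(text):
            if not token.isdigit() and token.lower() not in lexicon:
                counts[token] += 1
    return [(word, n) for word, n in counts.most_common() if n >= min_freq]


texts = ["El Barça guanya.", "El Barça perd la lliga.", "La col·lecció és nova."]
lexicon = {"el", "guanya", "perd", "la", "lliga", "és", "nova"}
print(missing_word_candidates(texts, lexicon))  # [('Barça', 2), ('col·lecció', 1)]
```

Capitalized hits such as "Barça" are natural candidates for the proper-names files rather than the general lexicon.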

Linguistic tuning

Accomplished improvement data

• Work in figures
  - 40,000 lexicon entries (20,000 for each translation direction)
  - Around 440 grammar rules
  - Around 7,200 words in the proper-names files (each translation direction)

• Non-measurable work
  - Understanding of the MT system
  - Understanding of the newspaper's specificities
  - Support for the style guide, taking MT into account

• Improvement
  - ES>CA: 41% of translations changed (35% better, 4% similar, 2% worse)
  - CA>ES: 36% of translations changed (32% better, 3% similar, 1% worse)
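In both directions the three sub-figures add up to the change figure (35 + 4 + 2 = 41; 32 + 3 + 1 = 36), so they appear to be percentages of all segments rather than of the changed ones. A minimal sketch of that tally, assuming MT outputs from before and after tuning for the same source segments and a human judgement for each changed pair; `improvement_report` and the `judge` callable are hypothetical.

```python
from collections import Counter


def improvement_report(old_outputs, new_outputs, judge):
    """Compare MT output before and after tuning, segment by segment.

    judge(old, new) returns "better", "similar" or "worse" (a human rating in practice).
    Every percentage is relative to the total number of segments.
    """
    if len(old_outputs) != len(new_outputs):
        raise ValueError("both runs must cover the same segments")
    total = len(old_outputs)
    verdicts = Counter(judge(old, new)
                       for old, new in zip(old_outputs, new_outputs) if old != new)
    report = {"diff": 100 * sum(verdicts.values()) / total}
    for label in ("better", "similar", "worse"):
        report[label] = 100 * verdicts[label] / total
    return report
```

Because unchanged segments are never judged, `better + similar + worse` always equals `diff`, matching the pattern in the slide's figures.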

[b] Post-editing

Post-editing

• Metrics on translation volume
• Metrics on post-editing effort
• Specificities of the text
• Post-editors' workspace
• Post-editing resources
• Error reporting process and tools
• Post-editing team and profile

Post-editing: metrics

File: LV_2010-10-27
• Total translation units: 2,474 (= 42,512 words)
• Lexicon/grammar post-editing: 464 units (18.79%)
• Style post-editing: 394 units (15.96%)

Conclusions
• Different sections had different levels of post-editing
• What style corrections could be avoided?
• Post-editing speed: 1,000-1,500 words/h
• Daily volume: 75,000 words
• New post-editing team: 20 post-editors/12 editors
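The speed, volume and team-size figures in these conclusions can be related with simple arithmetic: 75,000 / 1,500 = 50 hours and 75,000 / 1,000 = 75 hours of post-editing per day, i.e. 2.5 to 3.75 hours for each of 20 post-editors. The sketch below assumes the load is split evenly and leaves out the editors' separate review time; `team_hours` is a hypothetical helper.

```python
DAILY_WORDS = 75_000          # daily volume (slide figure)
SPEED_RANGE = (1_000, 1_500)  # post-editing speed in words per hour (slide figure)
POST_EDITORS = 20             # new post-editing team (slide figure)


def team_hours(words, words_per_hour):
    """Total post-editing hours needed for one day's volume."""
    return words / words_per_hour


for speed in SPEED_RANGE:
    total = team_hours(DAILY_WORDS, speed)
    # Assumes an even split across post-editors; the 12 editors are not counted.
    per_person = total / POST_EDITORS
    print(f"{speed} words/h: {total:.1f} h in total, {per_person:.2f} h per post-editor")
# 1000 words/h: 75.0 h in total, 3.75 h per post-editor
# 1500 words/h: 50.0 h in total, 2.50 h per post-editor
```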

Post-editing: resources, workspace

Post-editors should be proficient in their skills BUT also:
• Be trained in MT post-editing
• Have an integrated workspace
• Have resources at a click

Post-editing guide
- Classified frequent MT errors
- Reference document for training

Adapt CMS to new workflow
- New processing status
- New mark-ups

Resources on intranet language portal
- Bilingual style guide
- Links to all reference dictionaries
- MT portal for any journalist

Post-editing: resources, workspace

La Vanguardia’s intranet: linguistic portal

Post-editing: error reporting, team

Error reporting

• Crucial for continuous improvement
• Not automated (yet)
• Provide better support for error reporting

Definition of post-editing profile and team

• Proficient in Catalan
• Journalist background
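Error reporting is described as crucial but not yet automated. As a hypothetical sketch of what better support could look like, each correction can be captured as a structured record and grouped so that the linguists see the largest problem areas first. The `ErrorReport` fields and the categories (mirroring the lexicon/grammar versus style split used in the post-editing metrics) are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorReport:
    """One post-editor correction fed back to the MT linguists (hypothetical fields)."""
    direction: str   # "ES>CA" or "CA>ES"
    source: str      # source-language segment
    mt_output: str   # raw MT output
    correction: str  # post-edited version
    category: str    # "lexicon", "grammar" or "style"


def triage(reports):
    """Group reports by direction and category, largest group first."""
    groups = defaultdict(list)
    for report in reports:
        groups[(report.direction, report.category)].append(report)
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)
```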

[c] System integration

During phase 1: pre-production
• Pre-production set-up and installation
• Hermes XML converter
• Changes in the LT engine to translate InDesign files

During phase 3: production
• Production installation
• Tests (load, performance and stress)
• Performance: 500-1,200 words/s
• Definition of the final installation size
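The slides do not describe the Hermes XML schema. Assuming, hypothetically, that translatable text sits in elements such as `headline` and `p`, a converter of the kind listed for phase 1 could translate element text in place and leave the markup untouched, using Python's standard `xml.etree.ElementTree`. `TRANSLATABLE_TAGS` and `translate_xml` are hypothetical names, and `translate` stands in for the call to the MT engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the real Hermes schema is not given in the slides.
TRANSLATABLE_TAGS = {"headline", "subtitle", "p"}


def translate_xml(xml_text, translate):
    """Translate the text of selected elements and return the re-serialized XML.

    Only each element's leading text is translated; text after inline child
    elements (their .tail) is left alone in this simplified sketch.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in TRANSLATABLE_TAGS and elem.text and elem.text.strip():
            elem.text = translate(elem.text)
    return ET.tostring(root, encoding="unicode")


sample = "<article><headline>Bon dia</headline><meta>id-42</meta></article>"
print(translate_xml(sample, str.upper))
# <article><headline>BON DIA</headline><meta>id-42</meta></article>
```

A production converter would also have to handle inline markup (bold, links) that splits a sentence across `.text` and `.tail`, which this sketch deliberately skips.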

System integration

• Production: balanced high-performance (HP) and high-availability (HA) configuration
• System requirements: standard Windows Server, hence a low hardware footprint (e.g. dual- or quad-core 2.5-3 GHz, 2-4 GB RAM, running Windows Server 2003/2008)

Diagram: Pre-production (maintenance) and Production environments, with Hermes, InDesign, the language portal and web services
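The production set-up is described only as a balanced high-performance and high-availability configuration reached through a web service. A minimal client-side sketch of that idea, assuming several interchangeable MT servers, rotates requests round-robin and skips a server that fails. `TranslationPool` is hypothetical, and the server callables stand in for web-service endpoints.

```python
import itertools


class TranslationPool:
    """Round-robin load balancing with failover across interchangeable MT servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        if not self.servers:
            raise ValueError("at least one server is required")
        self._next_index = itertools.cycle(range(len(self.servers)))

    def translate(self, text):
        last_error = None
        # Try each server at most once per request, continuing the rotation.
        for _ in range(len(self.servers)):
            server = self.servers[next(self._next_index)]
            try:
                return server(text)
            except ConnectionError as exc:  # treat as "server down", move on
                last_error = exc
        raise RuntimeError("all MT servers failed") from last_error


def server_down(text):
    raise ConnectionError("no response")


def server_up(text):
    return f"[CA] {text}"


pool = TranslationPool([server_down, server_up])
print(pool.translate("Hola"))  # [CA] Hola (after skipping the failed server)
```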

[4] Operation: production process

Staff
• 20 post-editors
• 12 editors

Effort
• 30 min linguistic review
• 10 min journalistic review
• 70,000 words/day + supplements

Timeline
• Start: 5 p.m.
• First edition: 11:30 p.m.
• Second edition: 2:30 a.m.
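Combining this timeline with the engine throughput measured in phase 3 (500-1,200 words/s) shows the proportions: the window from 5 p.m. to the first edition is 6.5 hours, while machine-translating 70,000 words takes roughly 1 to 2.5 minutes, so raw MT time is negligible against the window. The sketch assumes the throughput is sustained over the whole volume; the calendar date is arbitrary and `mt_seconds` is a hypothetical helper.

```python
from datetime import datetime, timedelta

# The date is arbitrary; only the clock times come from the slide.
start = datetime(2012, 1, 1, 17, 0)           # 5 p.m.
first_edition = datetime(2012, 1, 1, 23, 30)  # 11:30 p.m.
second_edition = datetime(2012, 1, 2, 2, 30)  # 2:30 a.m. the next day

DAILY_WORDS = 70_000                          # words/day, supplements excluded


def mt_seconds(words, words_per_second):
    """Raw machine-translation time at a given engine throughput."""
    return words / words_per_second


print(first_edition - start)   # 6:30:00
print(second_edition - start)  # 9:30:00
for rate in (500, 1_200):      # measured performance range, words per second
    print(f"{rate} words/s: {mt_seconds(DAILY_WORDS, rate):.0f} s")
```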

Operation: production process

Challenge accomplished!

[5] Next goals

Success! Yes. Thanks to:
• Close work and cooperation
• Three parties involved
• Time and effort investment
• Customisation

Next!
• How to reduce post-editing effort
• How to re-use post-edited text
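One way to re-use post-edited text, offered here only as an assumption about a possible next step, is a translation-memory-style lookup: store every post-edited segment and return it for an identical source segment before falling back to MT. `PostEditMemory` is a hypothetical class; real translation memories also do fuzzy matching.

```python
class PostEditMemory:
    """Exact-match store of post-edited segments, consulted before calling MT."""

    def __init__(self):
        self._segments = {}

    @staticmethod
    def _key(direction, source):
        # Normalize whitespace so trivial spacing differences still match.
        return direction, " ".join(source.split())

    def add(self, direction, source, post_edited):
        self._segments[self._key(direction, source)] = post_edited

    def translate(self, direction, source, machine_translate):
        """Return (text, reused): the stored post-edit if present, otherwise MT output."""
        key = self._key(direction, source)
        if key in self._segments:
            return self._segments[key], True
        return machine_translate(source), False


memory = PostEditMemory()
memory.add("ES>CA", "Buenos días", "Bon dia")
print(memory.translate("ES>CA", "Buenos  días", lambda s: "MT: " + s))  # ('Bon dia', True)
print(memory.translate("ES>CA", "Hola", lambda s: "MT: " + s))          # ('MT: Hola', False)
```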

Thank you for your attention

Magí Camps, La Vanguardia

Blanca Vidal, Lucy Software Ibérica

Ignasi