Beyond Data: Delivering Machine Translation with Subject Matter Expertise. John Tinsley, Director / Co-Founder. TAUS MT Showcase, 4th June 2014, Dublin

Beyond Data: Delivering Machine Translation with Subject Matter Expertise

Page 1: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

Beyond Data: Delivering Machine Translation with Subject Matter Expertise

John Tinsley, Director / Co-Founder

TAUS MT Showcase. 4th June 2014, Dublin

Page 2: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

We provide Machine Translation solutions with Subject Matter Expertise

Page 3: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

We do this using Linguistic Engineering

Page 4: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

An “ensemble” MT architecture

Page 5: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

The world’s first and only patent specific MT system that’s ready to go

Page 6: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

What is Linguistic Engineering?

[Diagram labels: Data Engineering; Training Data; Input; Pre-processing; Post-processing; Output]

Page 7: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

Patents: an MT nightmare

Technical constructions

L is an organic group selected from -CH2-(OCH2CH2)n-, -CO-NR'-, with R'=H or C1-C4 alkyl group; n=0-8; Y=F, CF3 …

maximum stress of 1.2 to 3.5 N/mm<2> and a maximum elongation of 700 to 1,300% at 0[deg.] C.

Long sentences

Largest single document: 249,322 words

Longest sentence: 1,417 words
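One common way to keep constructions like these intact through an MT engine is to mask them with placeholders before translation and restore them afterwards. The snippet below is a minimal sketch of that idea only. The patterns, the `__TECHn__` placeholder format, and the function names `mask` and `unmask` are all assumptions made for illustration, not a description of the system presented in these slides.

```python
import re

# Hypothetical patterns covering only the examples quoted on this slide;
# a production system would need a far richer inventory.
TECHNICAL = re.compile(
    r"N/mm<\d+>"                   # units with bracketed superscripts, e.g. N/mm<2>
    r"|\d+\[deg\.\]\s*C\."         # temperatures, e.g. 0[deg.] C.
    r"|-[A-Z][A-Za-z0-9'()-]*-"    # chemical group fragments, e.g. -CO-NR'-
)


def mask(text):
    """Replace each technical span with a numbered placeholder."""
    spans = []

    def _swap(match):
        spans.append(match.group(0))
        return f"__TECH{len(spans) - 1}__"

    return TECHNICAL.sub(_swap, text), spans


def unmask(text, spans):
    """Put the original technical spans back in place of the placeholders."""
    for i, span in enumerate(spans):
        text = text.replace(f"__TECH{i}__", span)
    return text


source = ("maximum stress of 1.2 to 3.5 N/mm<2> and a maximum "
          "elongation of 700 to 1,300% at 0[deg.] C.")
masked, spans = mask(source)
# An MT engine would translate `masked` here; this sketch skips that step.
restored = unmask(masked, spans)
```

The engine then only sees opaque tokens where the notation used to be, so it cannot reorder or split them, and `unmask` reinserts the original strings unchanged.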

Page 8: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

What is Linguistic Engineering?

[Diagram labels: Data Engineering; Training Data; Input; Pre-processing; Post-processing; Output]

Page 9: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

Data Engineering + Linguistic Engineering: an “ensemble” architecture

[Diagram labels: Input; Training Data; Output]

Components shown:

•  Patent input classifier
•  Chinese pre-ordering rules
•  Japanese script normalisation
•  Korean pharma tokenizer
•  German compounding rules
•  Spanish med-device entity recognizer
•  Client TM/terminology (optional)
•  Multi-output combination
•  Statistical post-editing

MT engines shown: Moses, RBMT, Moses, Moses
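The way such components might fit together can be sketched as a small pipeline: pre-processing steps, several engines run side by side, a combination step, and post-processing. This is an illustrative sketch only. The `Ensemble` class, its field names, and the toy engines are hypothetical stand-ins, and nothing here calls Moses or any RBMT system.

```python
from dataclasses import dataclass
from typing import Callable, List

Step = Callable[[str], str]


@dataclass
class Ensemble:
    """Hypothetical ensemble: pre-process, run every engine, combine, post-process."""
    pre: List[Step]
    engines: List[Step]
    combine: Callable[[List[str]], str]
    post: List[Step]

    def translate(self, text: str) -> str:
        for step in self.pre:
            text = step(text)
        # Each engine produces its own candidate from the same prepared input.
        candidates = [engine(text) for engine in self.engines]
        output = self.combine(candidates)
        for step in self.post:
            output = step(output)
        return output


# Toy stand-ins (assumptions): real engines would be statistical or rule-based
# MT systems, and real combination would score candidates rather than
# simply picking the longest one.
pipeline = Ensemble(
    pre=[lambda s: " ".join(s.split())],           # whitespace normalisation
    engines=[lambda s: s.upper(), lambda s: s.lower() + " (alt)"],
    combine=lambda cands: max(cands, key=len),     # placeholder for multi-output combination
    post=[str.strip],
)

result = pipeline.translate("  Ein   Beispiel ")
```

Keeping each stage as a plain function makes it straightforward to swap in a language-specific step, such as a tokenizer or a script normaliser, without touching the rest of the pipeline.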

Page 10: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

If you don’t understand it, you can’t translate it

MT with Subject Matter Expertise

“Allopurinol-induced serious cutaneous adverse reactions (SCAR), including Steven Johnson’s syndrome (SJS) and toxic epidermal necrolysis (TEN), are associated with a genetic marker, the HLA-B*5801 allele.”

“IPTranslator is perfect for someone who needs to search [patents] across multiple languages and is useful in the case of both patentability and infringement searches.”

– Aalt van de Kuilen, Global Head of Patent Information, Abbott

Machine Translation for Patents

Page 11: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

What is the value for users?

Specialist solutions deliver more useable outcomes for the user

Post-editing = Increased productivity

For information purposes = Extract more meaning

Multilingual search = Retrieve more relevant results

Page 12: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

De-risking the machine translation proposition

What is the value for users?

Typical Prerequisites: + Data + Time + €€€ = ???

Our Prerequisites: + No data needed + Systems are ready to go + No upfront cost = Evaluate immediately

Customisation. Refinement.

» Incorporation of user feedback
» Incremental training with post-edits
» Tuning for specific input types

Page 13: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

Iconic in practice

client case study

Page 14: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

Iconic in practice

Business Need: Machine Translation technology for the legal industry

Iconic had a domain-specific MT solution for that industry

Page 15: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

Iconic in practice

Process (1): Translation samples required for initial evaluation

Delivered immediately and initial results were positive

Page 16: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

Iconic in practice

Process (2): Integrate Iconic with GlobalSight for productivity pilot

“The complexities and unforeseen but inevitable surprises of MT integration in large scale production processes were handled both competently and efficiently.”

Page 17: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

Iconic in practice

Performance: >20% productivity increase for translators post-editing Iconic output

“Iconic delivered measurable productivity gains from the outset”

Page 18: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

Iconic in practice

Looking forward

•  Ongoing improvement through feedback from translators
•  Ongoing improvement through the incorporation of post-edits
•  More than 4 million words translated to date for Asian languages
•  Periodic roll-out of new languages over time

Page 19: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

Iconic in practice

Need: short-term solution to provide on-demand translation through a web search interface

Process: integrate directly through Iconic API and evaluate quality and throughput concurrently

Outcomes: in 5 months of production for English-Portuguese alone, we processed:

•  15,526 translation requests
•  14,606,374 words
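To put these totals in perspective, the averages can be worked out directly from the two figures above and the stated five-month period. The variable names are illustrative, and the per-month figure assumes volume was spread evenly, which the slide does not state.

```python
# Figures taken from the slide: English-Portuguese, 5 months of production.
requests = 15_526
words = 14_606_374
months = 5

words_per_request = words / requests   # average size of one translation request
words_per_month = words / months       # average monthly volume (assumes even spread)

print(f"{words_per_request:.0f} words per request on average")
print(f"{words_per_month:,.0f} words per month on average")
```

That works out to roughly 940 words per request and about 2.9 million words per month.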

Page 20: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

Take home messages…

All content is not created equal

We cannot afford to be dogmatic when it comes to MT

Domain specific MT is about more than just data

Know your subject matter!

Page 21: Beyond Data: Delivering Machine Translation with Subject Matter Expertise

Thank You!

@IconicTrans