View
8
Download
0
Category
Preview:
Citation preview
19/09/2013
Machine Translation at the EPO
Removing language barriers from patent documentation
Paul Schwander
European Patent Office
The 5th Workshop on Patent Translation, MT Summit 2013
Roadmap
The context: why is MT strategic?
Machine Translation @ the EPO:
state of play and future plans
Why Machine Translation?
- Reducing the language
barrier in the European
context: Unitary Patent
system supported by MT.
- Access to global patent
information for prior art
searches.
Global patent filings rising continuously, especially Chinese applications
IP5= Europe, USA, Japan, China and Korea
Accessing Asian languages patent for EPO
examiners
Addressing the Chinese language wall
MT Full-text acquired, ca. 5 milion documents.
An on demand manual translation service offered to examiner
5 million patents: manual translation -> 1 day a patent -> 22 years of work for a team of 1000 translators.
Search in
MTed text
Relevant patents
Order a manual
translation
Detect
Understand
Patent Translate launched on 29 February 2012
System integrated in Espacenet,
the EPO Publication Server and EPOQUE
Patent Translate: how does it work?
Result of a collaboration between the EPO and Google
Patent data represent a huge source of corpora.
Patent documents and their translation/corresponding documents are prepared and stored in a corpora repository.
Translation system is trained using this corpora.
Translation quality assessed before launch: test fit for purpose level.
Translation
Memory
Translation
(Google)
Patent Translate: Architecture
Patent Translate Service
EPO
API (REST)
Corpora
Repository
Espacenet Bulk GPI Examiner tools Publication Server
EPO
«Functionblock»
Business Services::ApplicationBE
EPO«Functionblock»
SysManag.::Monitoring
«Functionblock»
SysManag.::Installation
«file»
ConfigurationFile
«Component»
Superinstaller
uses/provides
«file»
Monitoring::Checks
«document»
InstallationGuideline
Installation-GUI
Actors::ITAdministartor
«Component»
LiveUpdate
Update Plugin
«Component»
EPO::EPO update server
«file»
ExternalSoftware
«file»
OLF Software build
«file»
Installation Packages
provides
provides
Update-GUI
Actors::PatentApplicant
soap via http
«file»
OLF & NO Plugins
CLI
«Component»
ApplicationBE::FileManagerBE
provides
install
National
Patent
Offices
Quality level ranking
Current achievements
Patent Translate now covers translations between English and
21 other languages: Bulgarian, Chinese, Czech, Danish, Dutch,
Finnish, French, German, Greek, Hungarian, Icelandic, Italian,
Japanese, Norwegian, Polish, Portuguese, Romanian, Slovak,
Slovenian, Spanish and Swedish
305 million different machine translations of complete patent
documents can be accessed 'on the fly', using the current
language pairs offered = 1500 years of work for 1000 translators
if done manually
Patent Translate usage
Plans
Project to be completed end of 2014:
32 languages European and Asian.
2013-2014
Turkish, Estonian, Croatian, Latvian,
Lithuanian, Albanian, Macedonian, Serbian,
Russian and Korean
Patent Translate : illustrative example (Description)
MT Example
MT Example
Topics around Patent MT
CLIR or not?
OCR combined with MT for
non-digitised collections
Fit for purpose quality
assessment
Perception of the quality
Conclusion
MT is more than ever a must in the context of
the global patent documentation:
– The size of patent collections to search is
increasing and systematic manual translation
is not an option.
– MT has proven to be fit for purpose
– Quality will continue improving
Recommended