© Copyright 2013 ABBYY NLP PLATFORM FOR EU-LINGUAL DIGITAL SINGLE MARKET Alexander Rylov LTi Summit 2013 Confidential


© Copyright 2013 ABBYY. Confidential

NLP PLATFORM FOR EU-LINGUAL DIGITAL SINGLE MARKET

Alexander Rylov

LTi Summit 2013


Market fragmentation

● By domains

● By languages


WHY SHOULD LT VENDORS SHARE THEIR RESOURCES?

● Many LT vendors have their own language technologies (LTs)

● LTs are focused on a particular domain or language(s)

● Resources are critical for enabling such technologies

● If vendors share their resources, they may lose their competitive advantage


Technology capabilities and restrictions

● Language specific = language centric = limited by language

● Difficulties:
  - Controlled links
  - Anaphora
  - Long-distance links
  - Ellipsis

● Ontologies, dictionaries, statistics = trained on a limited set of data = cover only a limited variety of meaning representations = 40% recall is sometimes considered a good result (NER, US DoD track)


WHAT IS BIGDATA…

● Multilingual

● Covers more than one domain

● 85–90% is in unstructured text documents

● The same meaning can be expressed in language in countless ways


A FUNDAMENTAL NATURAL LANGUAGE TECHNOLOGY IS REQUIRED, SCALABLE BY DOMAINS AND LANGUAGES


ABBYY Compreno as a proposal

● Interlingua approach:
  - The semantic model is based on a universal, language-independent representation of both lexis and grammar

● Working languages:
  - Russian, English: at the stage of terminological and collocation expansion
  - German: full prototype (lexis, syntax) is completed; at the stage of main lexis expansion (from core to periphery)
  - French: full prototype is completed (tested on a controlled MT task)
  - Chinese: lexical system prototype is completed (a challenging task never carried out before)

● Compreno has been shown to be a scalable technology that can be used for any language

[Diagram labels: Universal Semantic Hierarchy; Statistics and machine learning; Syntactic and semantic analysis]


Complete syntactic and semantic analysis

The bank was located at the bank of the river; it was closed.

The complete analysis helps resolve linguistic problems in the text, such as the two senses of "bank" and the reference of "it" in the sentence above.


Compreno current achievements

Russian syntax analysis 2011

System      Precision   Recall   F
Compreno    0.95        0.98     0.97
System 2    0.93        0.98     0.96
System 3    0.90        0.98     0.94
System 4    0.89        0.95     0.92
System 5    0.86        0.98     0.92
System 6    0.86        0.86     0.86
System 7    0.79        0.98     0.87

Fact Extraction 2013

                   Compreno   System 1   Compreno   System 2   Compreno   System 3
Precision          0.95       0.95       0.96       0.98       0.92       0.92
Recall             0.93       0.70       0.84       0.44       0.92       0.74
F-measure          0.94       0.81       0.90       0.61       0.92       0.82
ABBYY advantage    14%                   32%                   10%
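The slides do not say how the F-measure is computed. A minimal sketch, assuming the balanced F1 score (the harmonic mean of precision and recall), reproduces the F-measure values of the Fact Extraction 2013 comparison when rounded to two decimals. The function name and the dictionary layout are illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (assumed, not stated on the slide)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the Fact Extraction 2013 figures.
fact_extraction = {
    "Compreno vs System 1": [(0.95, 0.93), (0.95, 0.70)],
    "Compreno vs System 2": [(0.96, 0.84), (0.98, 0.44)],
    "Compreno vs System 3": [(0.92, 0.92), (0.92, 0.74)],
}

for comparison, ((p_a, r_a), (p_b, r_b)) in fact_extraction.items():
    # Print the recomputed F-measure for Compreno and for the compared system.
    print(comparison, round(f_measure(p_a, r_a), 2), round(f_measure(p_b, r_b), 2))
```

Because F1 is a harmonic mean, a system with high precision but low recall (for example 0.98 and 0.44) still ends up with a low F-measure.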


Applications

● BigData analytics – analysis of facts, extraction of objects

● Intelligence, eDiscovery (any kind)

● Search by meaning rather than by concepts

● Natural-language dialogue systems

● Translation


A few facts about Compreno

● 18 years of development

● About 350 people involved

● More than 2000 man-years


Barriers to wide implementation

● At least 3 years per language

● At least 30 linguists per language

● At least €12M per language

● Then support and improvement


EU project idea

● Describe ALL EU languages

● Describe major domains: healthcare, law, government, major industries

● ABBYY commitment:
  - Methodology, management, instruments


EU BENEFITS – CREATE SINGLE DIGITAL LT MARKET

● Operate not with a language but with a universal model of it (interlingual approach)

● Describe one domain in one language, then apply it in all other languages

● A platform for LT vendors to create solutions and products that are easily scalable by languages and domains
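The "describe once, apply in all languages" idea can be illustrated with a toy sketch. It is not Compreno's architecture or data: the concept IDs, the tiny lexicons and the function names below are all hypothetical, and matching is reduced to a bag of concepts with no syntax. The point is only that a domain rule written over language-independent concepts needs no per-language rewrite.

```python
from typing import Dict, Set

# Hypothetical per-language lexicons: surface word -> language-independent concept ID.
LEXICONS: Dict[str, Dict[str, str]] = {
    "en": {"doctor": "DOCTOR", "physician": "DOCTOR",
           "prescribes": "PRESCRIBE", "medicine": "DRUG"},
    "de": {"arzt": "DOCTOR", "verschreibt": "PRESCRIBE", "medikament": "DRUG"},
    "fr": {"médecin": "DOCTOR", "prescrit": "PRESCRIBE", "médicament": "DRUG"},
}

# A hypothetical healthcare rule, described once over concepts rather than words.
PRESCRIPTION_FACT: Set[str] = {"DOCTOR", "PRESCRIBE", "DRUG"}


def to_concepts(text: str, lang: str) -> Set[str]:
    """Map the words of a sentence to the concept IDs known for that language."""
    lexicon = LEXICONS[lang]
    tokens = text.lower().replace(".", " ").split()
    return {lexicon[token] for token in tokens if token in lexicon}


def mentions_prescription(text: str, lang: str) -> bool:
    """True if every concept required by the rule occurs in the sentence."""
    return PRESCRIPTION_FACT <= to_concepts(text, lang)


print(mentions_prescription("The doctor prescribes a medicine.", "en"))
print(mentions_prescription("Der Arzt verschreibt ein Medikament.", "de"))
print(mentions_prescription("Le médecin prescrit un médicament.", "fr"))
```

In this sketch, adding a language means adding a lexicon only; the rule itself stays unchanged.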