The European Language Resources Coordination – ELRC 2nd Conference

Prof. Josef van Genabith, Dr. Andrea Lösch

German Research Center for Artificial Intelligence (DFKI)



• How do we bridge language gaps?

• How do we make sure that information does not stay in silos?

• How do we make sure that nobody is discriminated against because of language?

• How do we treat all languages in the same way?

• Translation!

• Human translation

• Machine translation

• To support HT and sometimes also MT on its own

Translation


Human languages are complex!


• Human languages are:

– Elegant

– Efficient

– Flexible

– Complex

• One word/sentence may mean many things

• Many ways of saying the same thing

• Meaning depends on context

• Literal and figurative language (metaphor)

• Language and culture (different ways of conceptualising the same thing)

• Language is complex

• We cannot compute it exactly

• We tried: rule-based LT …

• What do we do?

• Machine Learning

– Learns from data

– Approximate solution, not perfect

• Robust

• Scalable


Data

Data for MT


• Translation everywhere

• Industry

• Culture

• Travel

• Education

• The EC is a prime producer and consumer of translation

• One of the largest translation operations on the planet

• Long-term and expert user of MT in public service

• State-of-the-art SMT based on EU-funded research: the Moses SMT system

Translation, the EC and Beyond


Why ELRC…?


• The EC has decided to expand translation to the needs of the public services in the member states

• Automated Translation platform of the Connecting Europe Facility (CEF AT) to facilitate multilingual communication and exchange of documents in key public administration scenarios:

Consumer rights, health, public procurement, social security, culture, justice.

Public online services: Open Data Portal, Europeana, Online Dispute Resolution, eJustice etc. (DSIs of CEF …)

• The EC has good data for its own needs: EU parliamentary debates, EU laws etc. …

• It doesn’t have the right kind of data for the needs of national public services and the DSIs

• Training on Harry Potter and translating weather reports …!

• It needs the right kind of data!

• Who has the best data for their needs?

• The national public services of the member states!

• ELRC: working for the EC with the national public services to obtain this data for the EC to provide MT services back to national public services and CEF DSIs


Who is ELRC?


• The ELRC Consortium

– German Research Center for Artificial Intelligence (DFKI) – Josef van Genabith, Andrea Lösch

– Evaluations and Language Resources Distribution Agency (ELDA) – Khalid Choukri

– TILDE – Andrejs Vasiljevs

– ILSP (Institute for Language and Speech Processing) – Stelios Piperidis

• PLUS:

– 30 ELRC Technological NAPs (one per CEF affiliated country)

– 30 ELRC Public Services NAPs (one per CEF affiliated country)

– Legal advisors (e.g. iRights)

ELRC Workshops


• Past workshops:

– 24.09.15 Greece

– 29.09.15 Germany

– 05.08.15 Latvia

– 23.11.15 Hungary

– 01.12.15 Cyprus

– 08.12.15 Slovenia

– 15.12.15 Czech Republic

– 26.01.16 Spain

– 28.01.16 Ireland

– 11.02.16 Estonia

– 19.02.16 Finland

– 24.02.16 Lithuania

– 26.02.16 Malta

– 01.03.16 Portugal

– 07.03.16 Copenhagen

– 09.03.16 Poland

– 10.03.16 Sweden

– 15.03.16 Italy

– 18.03.16 Bulgaria

– 23.03.16 Romania

– 14.04.16 Slovakia

– 15.04.16 Austria

– 19.04.16 Netherlands

– 21.04.16 Croatia

– 11.05.16 France

– 08.06.16 Norway

– 14.06.16 Luxembourg


• Localized workshops in each of the 30 participating countries

• Target audience: National public service administrations

• Goals:

– To raise awareness about the importance of language data held by public administrations for public administrations

– To understand the needs of national public service administrations with regard to automated translation

– To jointly identify relevant sources of multilingual language resources

– To discuss any technical and legal issues involved in the use of data for automated translation


Contribute and Share your Data for a better Europe

ELRC Helpdesk


• Continuous support for data contributors

• Accessible through

– ELRC website: http://www.lr-coordination.eu/helpdesk

– Phone, Skype, Email

• Services:

– Answering any technical questions related to the use, production, collection, processing, and sharing of language resources.

– Answering any legal questions related to the use, production, collection, processing, and sharing of language resources.

• Response times:

– 24 hours (simple query)

– Up to 5 days (complex query)


Supporting our languages is supporting Europe, and supporting Europe is supporting our languages!