Enabling Language Resources to Expose Translations as Linked Data on the Web

Jorge Gracia, Elena Montiel-Ponsoda, Daniel Vila-Suero, Guadalupe Aguado-de-Cea. Ontology Engineering Group (OEG), Universidad Politécnica de Madrid (UPM), [email protected]. Acknowledgments: LIDER and BabeLData projects. 9th Language Resources and Evaluation Conference, LREC 2014, Reykjavik (Iceland), 28/05/2014

DESCRIPTION

Language resources, such as multilingual lexica and multilingual electronic dictionaries, contain collections of lexical entries in several languages. Access to the explicit or implicit translation relations between such entries may be of great interest for many NLP-based applications. Using Semantic Web techniques, translations can be made available on the Web for direct consumption by other (semantics-enabled) resources, without relying on application-specific formats. To that end, in this paper we propose a model for representing translations as linked data, as an extension of the lemon model. Our translation module represents some core information associated with term translations and does not commit to specific views or translation theories. As a proof of concept, we have extracted the translations of the terms contained in Terminesp, a multilingual terminological database, and represented them as linked data. We have made them accessible on the Web both for humans (via a Web interface) and for software agents (via a SPARQL endpoint).

Page 1: Enabling Language Resources to Expose Translations as Linked Data on the Web

Enabling Language Resources to Expose Translations as Linked Data on the Web

Jorge Gracia, Elena Montiel-Ponsoda, Daniel Vila-Suero, Guadalupe Aguado-de-Cea

Ontology Engineering Group (OEG)

Universidad Politécnica de Madrid (UPM)

[email protected]

Acknowledgments: LIDER and BabeLData projects

9th Language Resources and Evaluation Conference, LREC 2014

Reykjavik (Iceland) 28/05/2014

Page 2

Outline

Motivation

The translation model

Terminesp: a validating example

Conclusions

Page 3

Motivation and goals

Page 4

Motivation

Current multilingual lexica and electronic dictionaries

• Proprietary formats

• Non-standard APIs

• Disconnected from other resources

Page 5

Motivation

GOAL: to allow language resources to expose translations as Linked Data on the Web, for direct consumption by semantics-enabled applications without relying on application-specific formats

Page 6

Motivation

Objectives:

• To define a model for representing translations in RDF

• As a proof of concept:

1. Extract translations from the Terminesp terminological database

2. Represent them in RDF with our model

3. Make them accessible for both human and machine consumption

Page 7

The translation model

Page 8

The translation model

Page 9

The translation model

Page 10

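The translation model

Translation (direct equivalent)

[Figure: The English lexicon (LEXICON EN) contains the lexical entry “payment method” and the Spanish lexicon (LEXICON ES) contains “medio de pago”. Each entry has a lexical sense, both senses refer to the same ontology entity http://purl.org/goodrelations/v1#PaymentMethods, and the translation relates the two senses.]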

Page 11

The translation model

Translation (cultural equivalence)

[Figure: The English entry “Prime Minister” (LEXICON EN) has a sense referring to http://dbpedia.org/ontology/PrimeMinister, while the Spanish entry “Presidente del Gobierno” (LEXICON ES) has a sense referring to http://es.dbpedia.org/resource/Presidente_del_Gobierno in a different ontology. The translation relates the two senses even though their references differ.]

Page 12

The translation model

Characteristics of the model

• Translation as a relation between senses

• The translation relation is reified, so additional information can be attached to it

• Support for a variety of translation categories

• Translation categories are kept separate from the model, so there is no commitment to specific views or translation theories

• Translation sets group translations, for instance those coming from the same language resource or belonging to the same organization

• Reuse of well-established vocabularies (DC, DCAT, etc.) for provenance and additional information
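The characteristics above can be illustrated with a small Python sketch (standard library only). It treats a translation as a first-class node, so a category and a confidence value attach to the relation itself, and it keeps categories as IRIs from a separate vocabulary rather than hard-coding them into the model. The namespace IRIs, the use of `dct:source` for provenance and the `urn:example:` identifiers are assumptions; the property linking a set to its translations is not legible in the slides, so the sketch does not emit it.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# ASSUMED namespace IRIs, for illustration only; check the published
# translation module and categories vocabulary before reusing them.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TR = "http://purl.org/net/translation#"
TRCAT = "http://purl.org/net/translation-categories#"
DCT = "http://purl.org/dc/terms/"

Triple = Tuple[str, str, object]


@dataclass
class Translation:
    """A reified translation: a node of its own linking two lexical senses."""
    iri: str
    source_sense: str
    target_sense: str
    category: Optional[str] = None      # local name in the categories vocabulary
    confidence: Optional[float] = None  # emitted as tr:translationConfidence

    def triples(self) -> List[Triple]:
        out: List[Triple] = [
            (self.iri, RDF_TYPE, TR + "Translation"),
            (self.iri, TR + "translationSource", self.source_sense),
            (self.iri, TR + "translationTarget", self.target_sense),
        ]
        # Because the relation is a node, extra statements can hang off it.
        if self.category is not None:
            # The category is an IRI from a separate vocabulary, not part of the model.
            out.append((self.iri, TR + "translationCategory", TRCAT + self.category))
        if self.confidence is not None:
            out.append((self.iri, TR + "translationConfidence", float(self.confidence)))
        return out


@dataclass
class TranslationSet:
    """Groups translations, e.g. all those taken from one language resource."""
    iri: str
    source: Optional[str] = None  # provenance via dct:source (illustrative choice)
    translations: List[Translation] = field(default_factory=list)

    def triples(self) -> List[Triple]:
        out: List[Triple] = [(self.iri, RDF_TYPE, TR + "TranslationSet")]
        if self.source is not None:
            out.append((self.iri, DCT + "source", self.source))
        # The set-to-translation membership property is not emitted here.
        for t in self.translations:
            out.extend(t.triples())
        return out


tr = Translation("urn:example:tr1", "urn:example:sense-es", "urn:example:sense-en",
                 category="directEquivalent", confidence=0.9)
ts = TranslationSet("urn:example:set-es-en", source="urn:example:resource",
                    translations=[tr])
for triple in ts.triples():
    print(triple)
```

Keeping the category as an IRI rather than a Python enum mirrors the separation of model and categories: new categories can be added to the external vocabulary without touching the `Translation` class.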

Page 13

The translation model

[Figure: The Translation Module (http://purl.org/net/translation.owl). A Translation links a source LexicalSense (translationSource) to a target LexicalSense (translationTarget) and can carry a translationConfidence (double), a context and a translationCategory. The categories (directEquivalent, culturalEquivalent, lexicalEquivalent) are defined in a separate Translation Categories vocabulary (http://purl.org/net/translation-categories). A TranslationSet groups Translations.]

Page 14

Terminesp, a validating example

Page 15

Terminesp, a validating example

TERMINESP

• Multilingual terminological database

• Terms and definitions from Spanish technological standards

• More than 30K terms in Spanish, with translations into English, German, French, Italian, …

Page 16

[Figure: RDF graph for Terminesp concept 38756 (legend: class / instance). The lemon:Lexicon terminesp:lexiconES has the lemon:entry terminesp:38756es (lemon:LexicalEntry), whose lemon:form (lemon:LexicalForm) has lemon:writtenRep “red”@es; likewise, terminesp:lexiconEN has the entry terminesp:38756en with lemon:writtenRep “network”@en. Each entry has a lemon:sense (terminesp:38756es-sense and terminesp:38756en-sense, both lemon:LexicalSense) whose lemon:reference is the skos:Concept terminesp:38756. The tr:Translation terminesp:38756es-en-TR has tr:translationSource terminesp:38756es-sense and tr:translationTarget terminesp:38756en-sense.]

Terminesp, a validating example

Page 17

[Figure: The tr:TranslationSet terminesp:es-en-transet is linked through tr:tran to the tr:Translation terminesp:38756es-en-TR, which has tr:translationSource terminesp:38756es-sense, tr:translationTarget terminesp:38756en-sense (both lemon:LexicalSense) and tr:translationCategory trcat:directEquivalent. Legend: class / instance.]

Terminesp, a validating example
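The two Terminesp slides above can be reproduced as N-Triples with a short, standard-library Python sketch. Entry, sense, translation and translation-set names follow the slides, and the `terminesp:` base IRI matches the lexicon URIs returned by the SPARQL endpoint; the `lemon:`, `tr:` and `trcat:` namespace IRIs and the form URIs (suffixed `-form` here) are assumptions. The membership link from the translation set to the translation is omitted because its property name is cut off in the slide.

```python
# ASSUMED namespaces: the terminesp: base matches the lexicon URIs returned by
# the endpoint; the lemon:, tr: and trcat: IRIs are illustrative.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
LEMON = "http://www.monnet-project.eu/lemon#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
TR = "http://purl.org/net/translation#"
TRCAT = "http://purl.org/net/translation-categories#"
TERMINESP = "http://linguistic.linkeddata.es/data/terminesp/"


def term(value):
    """Render an IRI as <...> or a (text, lang) pair as a language-tagged literal."""
    if isinstance(value, tuple):
        text, lang = value
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"@{lang}'
    return f"<{value}>"


def entry_triples(concept_id, lang, written_rep):
    """Lexicon, entry, form and sense for one language, referring to a shared concept."""
    entry = f"{TERMINESP}{concept_id}{lang}"
    form = entry + "-form"  # HYPOTHETICAL URI: the slide does not name the form node
    sense = entry + "-sense"
    lexicon = f"{TERMINESP}lexicon{lang.upper()}"
    concept = f"{TERMINESP}{concept_id}"
    return [
        (lexicon, RDF_TYPE, LEMON + "Lexicon"),
        (lexicon, LEMON + "entry", entry),
        (entry, RDF_TYPE, LEMON + "LexicalEntry"),
        (entry, LEMON + "form", form),
        (form, RDF_TYPE, LEMON + "LexicalForm"),
        (form, LEMON + "writtenRep", (written_rep, lang)),
        (entry, LEMON + "sense", sense),
        (sense, RDF_TYPE, LEMON + "LexicalSense"),
        (sense, LEMON + "reference", concept),
        (concept, RDF_TYPE, SKOS + "Concept"),
    ]


def translation_triples(concept_id, src, tgt, category):
    """The reified translation between the source and target senses."""
    translation = f"{TERMINESP}{concept_id}{src}-{tgt}-TR"
    return [
        (translation, RDF_TYPE, TR + "Translation"),
        (translation, TR + "translationSource", f"{TERMINESP}{concept_id}{src}-sense"),
        (translation, TR + "translationTarget", f"{TERMINESP}{concept_id}{tgt}-sense"),
        (translation, TR + "translationCategory", TRCAT + category),
    ]


# dict.fromkeys drops the duplicated skos:Concept typing while keeping order.
triples = list(dict.fromkeys(
    entry_triples("38756", "es", "red")
    + entry_triples("38756", "en", "network")
    + translation_triples("38756", "es", "en", "directEquivalent")
    + [(TERMINESP + "es-en-transet", RDF_TYPE, TR + "TranslationSet")]
))
ntriples = "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)
print(ntriples)
```

Both senses point to the same `skos:Concept`, which is why the concept's typing triple is produced twice and deduplicated; the translation itself only links senses, as in the model.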

Page 18

Before

• MS Access database and a Web search interface

• Non-standard formats and vocabularies

• Data “invisible” to software agents

• Translations implicit, not explicit

Terminesp, a validating example

Page 19

Now

• Published on the Web as Linked Data

• Modelled using lemon and well-established vocabularies

• Dereferenceable URIs

• Data “visible” to software agents

• Translations made explicit

• Web search interface for human consumption

• SPARQL endpoint for machine consumption

Terminesp, a validating example
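The “dereferenceable URIs” and “SPARQL endpoint” points can be made concrete with a hedged client-side sketch. It builds, but does not send, two standard-library `urllib.request.Request` objects: one that dereferences a Terminesp lexicon URI and asks for Turtle through HTTP content negotiation, and one that issues a query in the style of the SPARQL 1.1 Protocol (GET with a `query` parameter). The endpoint address is hypothetical, since the slides give the SPARQL editor page rather than the protocol endpoint, and whether the server honours these `Accept` headers is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request

# A Terminesp lexicon URI, as returned by the SPARQL endpoint.
LEXICON_EN = "http://linguistic.linkeddata.es/data/terminesp/lexiconEN"
# HYPOTHETICAL endpoint address: the slides only give the SPARQL editor page.
ENDPOINT = "http://linguistic.linkeddata.es/terminesp/sparql"


def dereference_request(uri, media_type="text/turtle"):
    """Build (without sending) a GET that asks for an RDF serialization of `uri`."""
    return Request(uri, headers={"Accept": media_type})


def sparql_get_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol query-via-GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


deref = dereference_request(LEXICON_EN)
select = sparql_get_request(ENDPOINT, f"SELECT ?p ?o WHERE {{ <{LEXICON_EN}> ?p ?o }} LIMIT 10")
print(deref.get_method(), deref.full_url, deref.get_header("Accept"))
print(select.full_url)
```

To actually fetch data, either request can be passed to `urllib.request.urlopen`; the human-facing Web interface and the machine-facing endpoint then serve the same underlying data.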

Page 20

Terminesp for machine consumption – SPARQL endpoint

http://linguistic.linkeddata.es/terminesp/sparql-editor/

Terminesp, a validating example

Page 21

Terminesp for machine consumption – SPARQL endpoint

http://linguistic.linkeddata.es/terminesp/sparql-editor/

Target written representation | Target lexicon
network | http://linguistic.linkeddata.es/data/terminesp/lexiconEN
Netzwerk (in der Netzwerktopologie) | http://linguistic.linkeddata.es/data/terminesp/lexiconDE

Terminesp, a validating example
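The result table above can be reproduced with a small sketch. The commented SPARQL query is one plausible way to ask for the translations of the Spanish term “red” together with the lexicon of each target entry; its lemon:form / lemon:writtenRep path follows the Terminesp graph shown earlier, but the namespace IRIs and the exact query behind the screenshot are assumptions. The Python part evaluates the same triple patterns over an in-memory graph with a naive nested-loop join, using prefixed names as stand-ins for full IRIs and hypothetical URIs for the German entry, so the navigation from source term to translations can be followed without querying the live endpoint.

```python
# The SPARQL query this sketch mimics (prefixes ASSUMED):
#
#   PREFIX lemon: <http://www.monnet-project.eu/lemon#>
#   PREFIX tr:    <http://purl.org/net/translation#>
#   SELECT ?writtenRep ?lexicon WHERE {
#     ?srcForm  lemon:writtenRep "red"@es .
#     ?srcEntry lemon:form ?srcForm ; lemon:sense ?srcSense .
#     ?t tr:translationSource ?srcSense ; tr:translationTarget ?tgtSense .
#     ?tgtEntry lemon:sense ?tgtSense ; lemon:form ?tgtForm .
#     ?tgtForm  lemon:writtenRep ?writtenRep .
#     ?lexicon  lemon:entry ?tgtEntry .
#   }


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None if it cannot."""
    new = dict(binding)
    for p, value in zip(pattern, triple):
        if isinstance(p, str) and p.startswith("?"):
            if p in new and new[p] != value:
                return None
            new[p] = value
        elif p != value:
            return None
    return new


def evaluate(patterns, graph):
    """Naive basic-graph-pattern evaluation: join the patterns left to right."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


def entry(lang, text):
    # HYPOTHETICAL URIs following the slides' naming; literals are (text, lang).
    e = f"terminesp:38756{lang}"
    return [
        (f"terminesp:lexicon{lang.upper()}", "lemon:entry", e),
        (e, "lemon:form", e + "-form"),
        (e + "-form", "lemon:writtenRep", (text, lang)),
        (e, "lemon:sense", e + "-sense"),
    ]


def translation(src, tgt):
    t = f"terminesp:38756{src}-{tgt}-TR"
    return [
        (t, "tr:translationSource", f"terminesp:38756{src}-sense"),
        (t, "tr:translationTarget", f"terminesp:38756{tgt}-sense"),
    ]


GRAPH = (entry("es", "red") + entry("en", "network")
         + entry("de", "Netzwerk (in der Netzwerktopologie)")
         + translation("es", "en") + translation("es", "de"))

QUERY = [
    ("?srcForm", "lemon:writtenRep", ("red", "es")),
    ("?srcEntry", "lemon:form", "?srcForm"),
    ("?srcEntry", "lemon:sense", "?srcSense"),
    ("?t", "tr:translationSource", "?srcSense"),
    ("?t", "tr:translationTarget", "?tgtSense"),
    ("?tgtEntry", "lemon:sense", "?tgtSense"),
    ("?tgtEntry", "lemon:form", "?tgtForm"),
    ("?tgtForm", "lemon:writtenRep", "?writtenRep"),
    ("?lexicon", "lemon:entry", "?tgtEntry"),
]

rows = sorted((b["?writtenRep"][0], b["?lexicon"]) for b in evaluate(QUERY, GRAPH))
for written, lexicon in rows:
    print(written, lexicon)
```

A real SPARQL engine would index and reorder these joins; the left-to-right loop is kept only so that each step of the path (form, entry, sense, translation, target sense, target form, lexicon) stays visible.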

Page 22

Terminesp for human consumption – Web interface

http://linguistic.linkeddata.es/terminesp/search/

Terminesp, a validating example

Page 23

Conclusions

Page 24

Conclusions

Our proposal

• Model to represent translations as Linked Data on the Web

• Terminesp as a validating example

Next steps

• Standardization through the W3C OntoLex Community Group

• Study of the possible reuse of ITS 2.0 elements

• Linking Terminesp to external resources (e.g., BabelNet)

Page 25

Thanks for your attention!
