Language resources, such as multilingual lexica and multilingual electronic dictionaries, contain collections of lexical entries in several languages. Access to the explicit or implicit translation relations between such entries can be of great interest for many NLP-based applications. With Semantic Web techniques, translations can be made available on the Web and consumed directly by other (semantically enabled) resources, without relying on application-specific formats. To that end, in this paper we propose a model for representing translations as linked data, as an extension of the lemon model. Our translation module represents core information associated with term translations and does not commit to specific views or translation theories. As a proof of concept, we have extracted the translations of the terms contained in Terminesp, a multilingual terminological database, and represented them as linked data. We have made them accessible on the Web both for humans (via a Web interface) and for software agents (via a SPARQL endpoint).
Enabling Language Resources to
Expose Translations as
Linked Data on the Web
Jorge Gracia, Elena Montiel-Ponsoda,
Daniel Vila-Suero, Guadalupe Aguado-de-Cea
Ontology Engineering Group (OEG)
Universidad Politécnica de Madrid (UPM)
Acknowledgments: LIDER and BabeLData projects
9th Language Resources and Evaluation
Conference, LREC 2014
Reykjavik (Iceland) 28/05/2014
Outline
Motivation
The translation model
Terminesp: a validating example
Conclusions
Motivation and goals
Motivation
Current multilingual lexica and electronic dictionaries
• Proprietary formats
• Non-standard APIs
• Disconnected from other resources
GOAL: to allow language resources to expose translations as Linked Data on the Web, so that semantically enabled applications can consume them directly, without relying on application-specific formats
Objectives:
• To define a model for representing translations in RDF
• As a proof of concept:
  1. Extract translations from the Terminesp terminological database
  2. Represent them in RDF with our model
  3. Make them accessible for both human and machine consumption
The translation model
Translation (direct equivalent)
[Figure: the Spanish lexicon (LEXICON ES) and the English lexicon (LEXICON EN) each contain a LexicalEntry with a LexicalSense, for “medio de pago” and “payment method” respectively. The ontology entity shown is http://purl.org/goodrelations/v1#PaymentMethods.]
Translation (cultural equivalence)
[Figure: the Spanish lexicon (LEXICON ES) and the English lexicon (LEXICON EN) each contain a LexicalEntry with a LexicalSense, for “Presidente del Gobierno” and “Prime Minister” respectively. Two ontology entities are shown: http://es.dbpedia.org/resource/Presidente_del_Gobierno and http://dbpedia.org/ontology/PrimeMinister.]
Characteristics of the model
• Translation as a relation between senses
• Translation relation is reified, so additional information can be attached to it
• Support for a variety of translation categories
• Translation categories clearly separated from the model: no commitment to specific views or translation theories
• Translation sets group translations that come from the same language resource or belong to the same organization, for instance
• Reuse of well-established vocabularies (DC, DCAT, etc.) for provenance and additional information
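[Figure: the translation model, in two parts.
Translation Module (http://purl.org/net/translation.owl): a Translation points to two LexicalSense nodes through translationSource and translationTarget, and carries translationConfidence: double, context and translationCategory. A TranslationSet groups Translation instances (the linking property is truncated in the slide as “tran”). A Resource node also appears next to the context label.
Translation Categories (http://purl.org/net/translation-categories): directEquivalent, culturalEquivalent, lexicalEquivalent.]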
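The structure of the model can be mirrored in a few lines of Python. This is only a minimal sketch of the shape of the data, not the OWL module itself: the class and field names follow the labels on the slide, while the `#` separator in the category namespace, the `by_category` helper and the `example.org` URIs are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Base URI taken from the slide; the trailing '#' is an assumption.
TRCAT = "http://purl.org/net/translation-categories#"


@dataclass
class Translation:
    """A reified translation relation between two lexical senses."""
    source_sense: str                    # URI of the sense at translationSource
    target_sense: str                    # URI of the sense at translationTarget
    category: Optional[str] = None       # URI of a translationCategory
    confidence: Optional[float] = None   # translationConfidence (double)
    context: Optional[str] = None        # URI of a context resource


@dataclass
class TranslationSet:
    """Groups translations, e.g. those coming from one language resource."""
    uri: str
    translations: List[Translation] = field(default_factory=list)

    def by_category(self, category: str) -> List[Translation]:
        # Hypothetical helper: filter the set by translation category.
        return [t for t in self.translations if t.category == category]


# Usage with made-up URIs.
example_set = TranslationSet(uri="http://example.org/es-en-transet")
example_set.translations.append(
    Translation(
        source_sense="http://example.org/medio_de_pago-sense",
        target_sense="http://example.org/payment_method-sense",
        category=TRCAT + "directEquivalent",
    )
)
```

Because the category is just a URI attached to the reified relation, new categories can be added without touching the `Translation` structure, which is the separation the slide describes.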
Terminesp, a validating example
TERMINESP
• Multilingual terminological database
• Terms and definitions from Spanish technological standards
• More than 30K terms in Spanish, with translations into English, German, French, Italian, …
[Figure: Terminesp example for the concept terminesp:38756 (skos:Concept). Legend: class, instance.
• terminesp:lexiconES (lemon:Lexicon) has lemon:entry terminesp:38756es (lemon:LexicalEntry), with a lemon:form (lemon:LexicalForm) whose lemon:writtenRep is “red”@es, and a lemon:sense terminesp:38756es-sense (lemon:LexicalSense)
• terminesp:lexiconEN (lemon:Lexicon) has lemon:entry terminesp:38756en (lemon:LexicalEntry), with a lemon:form (lemon:LexicalForm) whose lemon:writtenRep is “network”@en, and a lemon:sense terminesp:38756en-sense (lemon:LexicalSense)
• Both senses point to terminesp:38756 through lemon:reference
• terminesp:38756es-en-TR (tr:Translation) links the two senses through tr:translationSource and tr:translationTarget]
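[Figure: translation set for the same example. Legend: class, instance.
• terminesp:es-en-transet (tr:TranslationSet) is linked to terminesp:38756es-en-TR (tr:Translation) through a property shown truncated in the slide as tr:tran
• terminesp:38756es-en-TR links terminesp:38756es-sense and terminesp:38756en-sense (both lemon:LexicalSense) through tr:translationSource and tr:translationTarget
• terminesp:38756es-en-TR has tr:translationCategory trcat:directEquivalent]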
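The example translation can be written out as RDF triples with plain Python, as a rough sketch. The identifiers follow the naming pattern visible in the figure (`38756es-sense`, `38756es-en-TR`). The `#` separators in the `tr` and `trcat` namespaces, the reading of "es-en" as source Spanish and target English, and the helper names are assumptions, not something the slides state.

```python
# Namespaces: bases come from the slides; the '#' separators are assumptions.
TR = "http://purl.org/net/translation.owl#"
TRCAT = "http://purl.org/net/translation-categories#"
TERMINESP = "http://linguistic.linkeddata.es/data/terminesp/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def translation_triples(term_id, src_lang, tgt_lang, category):
    """Build the triples describing one reified translation.

    Assumes the first language code in the translation URI is the source.
    """
    src_sense = f"{TERMINESP}{term_id}{src_lang}-sense"
    tgt_sense = f"{TERMINESP}{term_id}{tgt_lang}-sense"
    translation = f"{TERMINESP}{term_id}{src_lang}-{tgt_lang}-TR"
    return [
        (translation, RDF_TYPE, TR + "Translation"),
        (translation, TR + "translationSource", src_sense),
        (translation, TR + "translationTarget", tgt_sense),
        (translation, TR + "translationCategory", TRCAT + category),
    ]


def to_ntriples(triples):
    """Serialize URI-only triples in N-Triples syntax, one per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(translation_triples("38756", "es", "en", "directEquivalent")))
```

Membership in the translation set is left out on purpose, since the name of that property is truncated in the slide.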
Before
• MS Access database and a Web search interface
• Non-standard formats and vocabularies
• Data “invisible” to software agents
• Translations implicit, not explicit
Now
• Published on the Web as Linked Data
• Modelled using lemon and well-established vocabularies
• Dereferenceable URIs
• Data “visible” to software agents
• Translations were made explicit
• Web search interface for human consumption
• SPARQL endpoint for machine consumption
Terminesp for machine consumption – SPARQL endpoint
http://linguistic.linkeddata.es/terminesp/sparql-editor/
Written representation target | Lexicon target
network | http://linguistic.linkeddata.es/data/terminesp/lexiconEN
Netzwerk (in der Netzwerktopologie) | http://linguistic.linkeddata.es/data/terminesp/lexiconDE
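A query producing a result table of this shape might look like the sketch below. The query on the original slide is not preserved here, so this is a reconstruction under assumptions: the property names are the ones shown in the model figures (`lemon:entry`, `lemon:form`, `lemon:sense`, `lemon:writtenRep`, `tr:translationSource`, `tr:translationTarget`), the lemon namespace URI and the `#` in the `tr` namespace are assumed, and `http://example.org/sparql` is a placeholder rather than the real endpoint address. The code only builds the query and the request URL and does not contact any server.

```python
from urllib.parse import urlencode

# Assumed namespace URIs (see the lead-in).
LEMON = "http://lemon-model.net/lemon#"
TR = "http://purl.org/net/translation.owl#"


def build_translation_query(written_rep: str, lang: str) -> str:
    """Return a SPARQL query listing translations of a written form."""
    # Escape backslashes and double quotes for the SPARQL string literal.
    literal = written_rep.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX lemon: <{LEMON}>
PREFIX tr: <{TR}>
SELECT DISTINCT ?written_rep_target ?lexicon_target
WHERE {{
  ?form_source lemon:writtenRep "{literal}"@{lang} .
  ?entry_source lemon:form ?form_source ;
                lemon:sense ?sense_source .
  ?translation tr:translationSource ?sense_source ;
               tr:translationTarget ?sense_target .
  ?entry_target lemon:sense ?sense_target ;
                lemon:form ?form_target .
  ?form_target lemon:writtenRep ?written_rep_target .
  ?lexicon_target lemon:entry ?entry_target .
}}"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as a GET request URL (SPARQL protocol 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    query = build_translation_query("red", "es")
    print(query)
    # Placeholder endpoint, not the actual Terminesp service URL.
    print(build_request_url("http://example.org/sparql", query))
```

The query walks from the source written form to its sense, crosses the reified translation, and then walks back out to the target written form and the lexicon that contains it.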
Terminesp for human consumption – Web interface
http://linguistic.linkeddata.es/terminesp/search/
Conclusions
Our proposal
• Model to represent translations as Linked Data on the Web
• Terminesp as a validating example
Next steps
• Standardization through the W3C OntoLex Community Group
• Study possible reuse of ITS 2.0 elements
• Links of Terminesp to external resources (e.g., BabelNet)
Thanks for your attention!