25
(2) Dipartimento di Scienze dell’Informazione, Università di Bologna (1) Semantic Technology Laboratory ISTC-CNR Gathering Lexical Linked Data and Knowledge Patterns from FrameNet Andrea Giovanni Nuzzolese (1,2) [email protected] Aldo Gangemi (1) [email protected] Valentina Presutti (1) [email protected] K-CAP 2011 Banff, AL, Canada 27 June 2011

Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

Embed Size (px)

Citation preview

Page 1: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

(2) Dipartimento di Scienze dell’Informazione, Università di Bologna(1) Semantic Technology Laboratory ISTC-CNR

Gathering Lexical Linked Data and Knowledge Patterns from FrameNetAndrea Giovanni Nuzzolese (1,2)

[email protected] Gangemi (1)

[email protected] Presutti (1)

[email protected]

K-CAP 2011Banff, AL, Canada 27 June 2011

Page 2: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

Outline

• Motivations

• Semantic issues

• Transformation method

• Ongoing work

• Conclusions

Page 3: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

Premise

• Work after request from Berkeley FrameNet group for a Semantic Web version of FrameNet 1.5

• Previous work had various limitations, mainly data incompleteness and implicit semantics– E.g. Scheczyk et al., Narayanan et al.

• Decided to go for a dual transformation– RDF for a complete porting to Linked Open Data,

similarly to W3C WordNet RDF porting– (customizable) OWL for a focused porting to knowledge

patterns reusable for ontology design or for creating views over linked data

Page 4: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

Motivations• The web of data is exploding and NLP techniques accompany this

explosion

• Hybridizing natural language processing and semantic web techniques shows to be a promising approach

• Part of the exploitation of LOD data, is carried out by means of lexical resources that are represented directly as linked data

• Bring lexical resource on linked data (favor hybridization)– benefit from linking all lexical resources and have an homogenous more

powerful one

• Link lexical knowledge to domain knowledge– linked data ground to lexical knowledge and textual documents

Page 5: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

DBpedia

Yago

Lexvo

lingvoj

RDFWordNet

2.0

RDFWordNet

3.0

RDFFrameNet

1.5RDF

VerbNet 3.1

RDF Italian

MultiWordNet

WordNetDomains

WordNetSupersenses

WordNetFormal Glosses

OpenCyc

VerbOcean

Page 6: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

Several semantic issues in reusable linguistic data

• Semantics induced by the data structure, e.g. RDB, XML, etc.

• Semantics from the linguistic model adopted

• Semantics of the corpus (e.g. sentences)

• Semantics needed for querying

• Semantics needed for reasoning

Page 7: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

FrameNet

• A lexical knowledge base – cognitive soundness– grounded in a large corpus

• Consists of a set of frames, which have– frame elements – lexical units, which pair words (lexemes) to frames– relations to corpus elements

• Each frame can be interpreted as a class of situations

Page 8: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

An example of frame

Page 9: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

FrameNet as LOD

Page 10: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

FrameNet as LOD

Page 11: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

FrameNet as ontologies

Page 12: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

StructuralSchema

LinguisticSchema

LinguisticData

CorpusData

ReferentialData

Linguistic transformation

architecture

Page 13: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

Transformation approach

• We pulled out the semantics of FrameNet and its data by using Semion,

• Semion is a tool grounded on a method with two main steps– a syntactic and completely automatic transformation of the data source

to RDF datasets according to an OWL ontology that represents the data source structure

– a semantic rule-based refactoring that allows to express the RDF triples according to specific domain ontologies e.g. SKOS, DOLCE, FOAF, LMM, or anything indicated by the user.

Page 14: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

Reengineering

Syntactic transformation to RDF triples

<frame name="Abounding_with" ... ID="262">...

<frameRelation type="Inherits from"><relatedFrame>

Locative_relation</relatedFrame>

</frameRelation>...</frame>

Page 15: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

Refactoring

• aims to add semantics to data

• is performed by means of set of rules– i.e. SPARQL CONSTRUCT

Page 16: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

ABox Refactoring

The ABox refactoring is the process of gathering RDF data (Abox)

Rule-based

Customizable or based on recipes

Page 17: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

ABox Refactoring (data)

Page 18: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

TBox Refactoring

• The TBox refactoring is the process of gathering a new ontology schema (a TBox) from data (ABox)

Page 19: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

TBox Refactoring

Page 20: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

Ongoing work

• Linking – WordNet, WN Domains, MultiWordNet, VerbNet,

FrameNet, VerbOcean (P. Pantel)

• Basic linking uses SKOS– exactMatch, closeMatch– links partly present in Colorado bank, partly in

WordNet mappings, part are newly created

• More reasoning requires some expressivity– semiotics.owl knowledge pattern, D&S– property chains

Page 21: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

Conclusion

• issues related to the conversion of lexical resources – more specifically to semantic issued of FrameNet

conversion

• a method to solve those issues (supported by a tool)

• a conversion of FrameNet to RDF published as a dataset in the LOD

• a method to convert FrameNet data into knowledge patterns

Page 22: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

Thank youAndrea Nuzzolese

-STLab, ISTC-CNR

&Dipartimento di Scienze dell’Informazione

University of BolognaItaly

Page 23: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

23

Semantic issues: objects• Semantic frames/verb classes as twofold creatures

– intensional polymorphic relations (aka descriptions) + situation types– Desiring(?experiencer, ?theme, ?time, ?loc, ?...)

• Frame elements/VN arguments as complex creatures– (semantic) roles + concepts

• Semantic types are a mixture– concepts, grammatical types, etc.

• Lexical units/VN class members as hybrid creatures– lexically-oriented semantic frames– bridges between semantic frames and word senses– FN lex units belong to diverse parts of speech

• Annotated sentences contain syntactical realizations of semantic frames (“exemplifications”)

– syntactic frames in VN, valences in FN

23

Page 24: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

24

Semantic issues: relations• Inheritance in FN and VN is classic, can hold for situation types safely

– needs to be treated jointly with semantic role representation– subFe also classic

• Subframes in FN are conceptual compositions (“parts of descriptions” in D&S), intensional in nature

– similarly for “excludes” and “requires” holding for FE• Frame “usage” in FN is partial inheritance, hard to digest for situation

types• Selectional restrictions in VN maybe too tough for situation types• Selectional preferences absent in resources, but probability would be

an added value• Core vs. peripheral vs. unexpressed are interesting but tough:

“characteristic”, hidden optionality, etc.

24

Page 25: Gathering Lexical Linked Data and Knowledge Patterns from FrameNet

Why a KP?

– a multidimensional context model able to capture descriptive, informational, situational, social, and formal characters of knowledge.