52
Semantic Web Machine Reading with FRED STLab, ISTC-CNR (Rome/Catania, Italy)

Fred sw jpaper2017

Embed Size (px)

Citation preview

Page 1: Fred sw jpaper2017

Semantic Web Machine Reading with FRED

STLab, ISTC-CNR (Rome/Catania, Italy)

Page 2: Fred sw jpaper2017

STLab team

Semantic Web machine reading with FRED. Aldo Gangemi, Valentina Presutti, Diego Reforgiato Recupero, Andrea Nuzzolese, Francesco Draicchio, Misael Mongiovì. Semantic Web, vol. Preprint, no. Preprint, pp. 1-21, 2016.http://content.iospress.com/articles/semantic-web/sw240DOI: 10.3233/SW-160240

Page 3: Fred sw jpaper2017

Outline

• Overview of FRED

• Main mappings and heuristics (the method behind)

• FRED pipeline (implementation)

• Quality, importance, impact (evaluation)

• Conclusion

Page 4: Fred sw jpaper2017

Outline

• Overview of FRED

Page 5: Fred sw jpaper2017

FRED: machine reader

http://wit.istc.cnr.it/stlab-tools/fred

http://wit.istc.cnr.it/stlab-tools/fred/fredlib

Page 6: Fred sw jpaper2017

FRED: machine reader

John Coltrane played with Miles Davis in Kind of Blue

domain and task independentsemantic middleware

Page 7: Fred sw jpaper2017

Formal knowledge graph from text

OWL semantics+

OWL n-ary relation pattern

Valentina gave Aldo a book by Charlie Mingus

Page 8: Fred sw jpaper2017

FRED graph for

Valentina gave Aldo a book by Charlie Mingus

Page 9: Fred sw jpaper2017

(ROOT(S

(NP (NNP John) (NNP Coltrane))(VP (VBD played)

(PP (IN with)(NP (NNP Miles) (NNP Davis)))

(PP (IN in)(NP

(NP (NNP Kind))(PP (IN of)

(NP (NNP Blue))))))))

root(ROOT-0, played-3)nn(Coltrane-2, John-1)nsubj(played-3, Coltrane-2)nn(Davis-6, Miles-5)prep_with(played-3, Davis-6)prep_in(played-3, Kind-8)prep_of(Kind-8, Blue-10)

___________________________ _____________ |x0 x1 x2 x3 | |e4 | |...........................| |.............| (|named(x0,john_coltrane,per)|A|play(e4) |)|named(x1,kind,loc) | |Actor1(e4,x0)| |named(x2,blue,loc) | |of(x1,x2) | |named(x3,miles_davis,loc) | |in(e4,x1) | |___________________________| |Actor2(e4,x3)|

|_____________|

Boxer’s DRT boxingStanford’s dependency parsing,

coreference resolution

Tagme’s linkingBabelfy’s (or UKB’s) linking

Language recogniser +BING off the shelf translator

Multiple NLP tools

Page 10: Fred sw jpaper2017

Semantic web resources

• DBpedia

• schema.org

• WordNet-RDF

• BabelNet

• VerbNet-RDF

• FrameNet-OWL

• OntoWordNet

• DOLCE-Zero

• Ontology Design Patterns

• NIF

• Earmark

Page 11: Fred sw jpaper2017

Main contributions of FRED

• A novel form of machine reading: from text to knowledge graph

• A method for integrating multiple NLP task outputs into a knowledge graph

• Formal mappings between different theories (implemented as a set of heuristics)

• Ontology design patterns for the Semantic Web• Reference cognitive semantics for interpretation

i.e. Frame Semantics

Page 12: Fred sw jpaper2017

Outline

• Main mappings and heuristics (the method behind)

Page 13: Fred sw jpaper2017

Main mappings and heuristics

• From DRS to RDF/OWL

• Tense, modality and negation

• Compositional semantics, taxonomy induction and quality representation

• Periphrastic relations

Page 14: Fred sw jpaper2017

From DRS to RDF/OWL

• The core of FRED takes Discourse Representation Structures as input (based on Hans Kamp’s DRT)

• DRS informally called boxes (graphical representation)

• Discourse referents and conditions (their interpretation)

• Subset of FOL, arbitrary entities, Neo-Davidsonian events

• DRSs can become very complex as natural lanuguageexpression are rarely simple

• Boxer

People love movies

Page 15: Fred sw jpaper2017

DRS basic patterns

1. Boxes only have a syntactic role in Boxer’s result, meaning that FRED only needs to focus on representing their content and linking them to the rest

2. Boxes provide a unified relation to a complex state of affair (usually expressed in the text by a copula), meaning that they have their own semantics to be represented

Page 16: Fred sw jpaper2017

Box patterns 1)

People love movies

Page 17: Fred sw jpaper2017

Box patterns 1

Valentina is happy

Valentina is a researcher

Page 18: Fred sw jpaper2017

Box patterns 2)

Valentina is Gianni’s fifth daughter, from his second marriage

Page 19: Fred sw jpaper2017

Complex boxes

• Boxes can be composed based on and, or, entails

• They can be nested, negated, contain several events or none

• Vocabularies: • Boxer taxonomy of types (boxer:)• Situations, formal relations and other properties

(boxing:)

Page 20: Fred sw jpaper2017

Tense, modality and negationTense: Time intervals + Allen’s algebra

Modality and negation: lightweight RDF semantics

Rahm Emanuel says he won’t resign over police shooting

Page 21: Fred sw jpaper2017

Lightweight negation, why?

• Ambiguous negation scope, same syntactic structure

John did not go to school by car

¬∃e(go(e, John, s, c)⋀ Event(e)⋀ School(s) ⋀Car (c))

∃e(go(e, John, s, c) ⋀ Event(e)⋀School(s)⋀¬Car(c)

Page 22: Fred sw jpaper2017

Negation: lightweight representation

• FRED only annotates the referenced event with the information that its truth value is false

• boxing:hasTrueValue

• No assumption is made on the impact of the negation over logical quantification and scope

Page 23: Fred sw jpaper2017

Lightweight negation representation: limits and benefits

• Limit: no triggering of any automatic reasoning

• Benefit: hook to the negated event for client that can apply their desired interpretation

Page 24: Fred sw jpaper2017

Negation

Rahm Emanuel says he won’t resign over police shooting

Page 25: Fred sw jpaper2017

Modality: lightweight representation

• No formal constructs in OWL for modality

• boxing:Modality• boxing:Necessary (will, must, should, etc.)• boxing:Possible (may, might, etc.)

:resign_1 boxing:hasTruthValue boxing:False ;

boxing:hasModality boxing:Necessary .

“will not resign”

Page 26: Fred sw jpaper2017

Negation

Rahm Emanuel says he won’t resign over police shooting

Page 27: Fred sw jpaper2017

Compositional semantics, taxonomy induction, and quality representation

• Compound terms: FRED recognises two main patterns

• Dependency relation• Co-referent predicates

• FRED creates a new class e.g. PoliceShooting

• The modifier ”police” of the main concept (“shooting) provides a distinguished feature to the more specific class (differentia specifica)

Rahm Emanuel says he won’t resign over police shooting

Page 28: Fred sw jpaper2017

Compositional semantics: nouns as differentia specifica

Rahm Emanuel says he won’t resign over police shooting

fred:PoliceShootingrdfs:subClassOf fred:Shooting ;dul:associatedWith fred:Police .

Page 29: Fred sw jpaper2017

Compositional semantics: adjectives as differentia specifica

• Adjectives have an unpredictable and unstable semantics (Morzycki’s sectivity)

• Four main representation patterns implemented based on an extension of Morzycky’s theory

Aldo Gangemi, Andrea G. Nuzzolese, Valentina Presutti, and Diego Reforgiato Recupero. Adjective semantics in open knowledge extraction. In FOIS 2016, pp.167-180.http://ebooks.iospress.nl/volumearticle/44244. DOI: 10.3233/978-1-61499-660-6-167

Page 30: Fred sw jpaper2017

• Canadian surgeon (intersective):surgeon_1 a :CanadianSurgeon .:CanadianSurgeon rdfs:subClassOf :Surgeon .:surgeon_1 dul:hasQuality :Canadian .:CanadianSurgeon hasQuality :Canadian .

• Skilful surgeon (subsective):surgeon_1 a :SkilfulSurgeon .:SkilfulSurgeon rdfs:subClassOf :Surgeon .:SkilfulSurgeon hasQuality :Skilful .

• Alleged surgeon (modal):surgeon_1 a :AllegedSurgeon .:AllegedSurgeon dul:AssociatedWith :Surgeon .:AllegedSurgeon hasModality :Alleged .

• Fake surgeon (privative):surgeon_1 a :FakeSurgeon .:AllegedSurgeon dul:AssociatedWith :Surgeon .:AllegedSurgeon hasQuality :Fake .

Compositional semantics: adjectives as differentia specifica

Page 31: Fred sw jpaper2017

Periphrastic relations

• Relations expressed with prepositions (of, with, for, in, etc.)

• Naming object properties with preposition is meaningless• sister of, survivor of

• FRED identifies the noun to be assciated with the preposition and puts it before it to form a meaningful label

fred:survivor_1fred:survivorOf fred:expedition_1;rdf:type fred:Survivor .

fred:expedition_1 rdf:type fred:Expedition .

Page 32: Fred sw jpaper2017

Additional NLP integrations

• Named Entity Recognition• Used for identifying elements that should have a

corresponding OWL individual in the knowledge graph• Entity linking

• owl:sameAs axioms (DBpedia)• Co-reference resolution and role propagation

• Used for merging nodes• Word sense disambiguation

• owl:equivalentClass, rdfs:subClassOF• WordNet, Babelnet, Dolce+DnS Ultra Lite (DUL),

Wordnet Supersenses

Page 33: Fred sw jpaper2017

Other capabilities

• Multilingualism • 48 lanugages (translates through English) • Bing Translation APIs

• Textual annotation grounding• Annotations of text fragment to corresponding

graph elements• Earmark and NIF

Page 34: Fred sw jpaper2017

Outline

• FRED pipeline (implementation)

Page 35: Fred sw jpaper2017

FRED pipeline

BoxerTAGME

CoreNLP

REST

REST

TAGME

UKB

Babelfy

Page 36: Fred sw jpaper2017

Outline

• Quality, importance, impact (evaluation)

Page 37: Fred sw jpaper2017

Quality and impact

• Ability to perform specific knowledge extraction tasks

• Performance as a semantic middleware, by evaluating applications built on top of it

• Popularity in terms of reuse and citations by communities of developers

Page 38: Fred sw jpaper2017

Frame detection

Tool Precision Recall F-ScoreFRED 75.320 57.519 65.227

Semafor 75.325 74.797 75.060

Valentina Presutti, Francesco Draicchio, and Aldo Gangemi. Knowledge extraction based on discourse representation theory and linguistic frames. EKAW 2012. https://link.springer.com/chapter/10.1007%2F978-3-642-33876-2_12DOI:10.1007/978-3-642-33876-2_12.

Page 39: Fred sw jpaper2017

Knowledge extraction tasks: tool comparisonTool/Task Topics NER NE-RS TE TE-RS Senses Taxo Rel Roles Events Frames

+SRLAIDA – + + – – + – – – – –

Alchemy + + – + – + – + – – –Apache Stanbol – + + – – + – – – – –

CiceroLite – + + + + + – + + + +DB Spotlight – + + – – + – – – – –

FOX + + + + + + – – – – –FRED – + + + + + + + + + +NERD – + + – – + – – – – –Ollie – – – – – – – + – – –

Open Calais + + – – – + – – – + –PoolParty KD + – – – – – – – – – –

ReVerb – – – – – – – + – – –Semiosearch – – + – + – – – – – –

Tagme – + + + + – – – – – –Wikimeta – + – + + + – – – – –Zemanta – + – – – + – – – – –Aldo Gangemi: A Comparison of Knowledge Extraction Tools for the Semantic Web. ESWC 2013: 351-366. https://link.springer.com/chapter/10.1007/978-3-642-38288-8_24. DOI: 10.1007/978-3-642-38288-8_24

Page 40: Fred sw jpaper2017

FRED as a semantic middleware

• Sentiment analysis

Diego Reforgiato Recupero, Valentina Presutti, Sergio Consoli, Aldo Gangemi, Andrea Giovanni Nuzzolese: Sentilo: Frame-Based Sentiment Analysis. Cognitive Computation 7(2): 211-225 (2015). https://link.springer.com/article/10.1007/s12559-014-9302-z. DOI: 10.1007/s12559-014-9302-z

Aldo Gangemi, Valentina Presutti, Diego Reforgiato Recupero:Frame-Based Detection of Opinion Holders and Topics: A Model and a Tool. IEEE Comp. Int. Mag. 9(1): 20-30 (2014). http://ieeexplore.ieee.org/document/6710244/. DOI: 10.1109/MCI.2013.2291688

Page 41: Fred sw jpaper2017

• Extraction of link semantics

Valentina Presutti, Andrea Giovanni Nuzzolese, Sergio Consoli, Aldo Gangemi, Diego ReforgiatoRecupero: From hyperlinks to Semantic Web properties using Open Knowledge Extraction. Semantic Web 7(4): 351-378 (2016). http://content.iospress.com/articles/semantic-web/sw221.DOI: 10.3233/SW-160221

Page 42: Fred sw jpaper2017

• Automatic entity typing

Aldo Gangemi, Andrea Giovanni Nuzzolese, Valentina Presutti, Francesco Draicchio, Alberto Musetti, Paolo Ciancarini: Automatic Typing of DBpedia Entities. International Semantic Web Conference (1) 2012: 65-81. https://link.springer.com/chapter/10.1007/978-3-642-35176-1_5.DOI: 10.1007/978-3-642-35176-1_5

Page 43: Fred sw jpaper2017

• Knowledge reconciliation

Misael Mongiovì, Diego Reforgiato Recupero, Aldo Gangemi, Valentina Presutti, Sergio Consoli: Merging open knowledge extracted from text with MERGILO. Knowl.-Based Syst. 108: 155-167 (2016). http://www.sciencedirect.com/science/article/pii/S0950705116301034

Page 44: Fred sw jpaper2017

Community feedback (some)

• stackoverflow.com: http://tinyurl.com/qa2dyfj, http://tinyurl.com/o993scy

• answers.semanticweb.com:http://tinyurl.com/n6pzpot, http://tinyurl.com/kb8564w

• Wikipedia entry for Knowledge Extraction:http://en.wikipedia.org/wiki/Knowledge_extraction

• A study about dealing with Big Data by UN Economic Commission for Europe

http://tinyurl.com/ml6ystn

Page 45: Fred sw jpaper2017

Outline

• Conclusion

Page 46: Fred sw jpaper2017

Conclusion

• The final goal of FRED is to support natural language understanding

• Currently, FRED responds to a very important challenge: to leverage existing NLP approaches to obtain a unified, formal knowledge graph including both facts and concepts, expressed by a natural language text.

Page 47: Fred sw jpaper2017

Issues and Open challenges

• Partial accuracy of NLP components may lead to global errors

N/V POS tagging

Complex multiword extraction

Coordination

Plural co-reference

David Moyes shares Manchester United fans' frustration

Myeloid hepatosplenomegaly is an enlargement of liver and kidney due to myelofibrosis

Aristotle was a Greek philosopher, a student of Plato and teacher of Alexander the Great

When Carol helps Bob and Bob helps Carol, they can accomplish any task.

Page 48: Fred sw jpaper2017

Issues and Open challenges

• Semantics besides literal interpretation• The path descended abruptly• The road runs along the coast for two hours• The fence zigzags from the plateau to the valley• The highway crawls through the city• The road leads us to Bordeaux

• Need for type coercion to satisfy hidden frame• Sometimes an inversion of roles: the path descends

because it can be descended• Road is an object that “can be followed as an indication”

to our destination• Fence is an object whose shape “can be followed by

zigzaging”• Highway is a path that “can be crawled”, therefore the

crawling frame here is descriptive of a state, not of an action

Page 49: Fred sw jpaper2017

Issues and Open challenges

• The recognition of cultural framing out of real world facts as in political discourse

• requires the extraction, representation, and reasoning over high-level frames (attitudes, values, metaphors)

Page 50: Fred sw jpaper2017

Issues and Open challenges

• The accurate extraction of implicit discourse relations and conventional implicatures• requires background knowledge and reasoning

on that knowledge in a way close to the appropriate discourse level

Page 51: Fred sw jpaper2017

References

In academic publication, for reference to FRED please cite:

Aldo Gangemi, Valentina Presutti, Diego Reforgiato Recupero, Andrea Nuzzolese, Francesco Draicchio, Misael Mongiovì. Semantic Web, vol. Preprint, no. Preprint, pp. 1-21, 2016. http://content.iospress.com/articles/semantic-web/sw240. DOI: 10.3233/SW-160240

Other relevant references related to the FRED project:

Aldo Gangemi, Andrea G. Nuzzolese, Valentina Presutti, and Diego Reforgiato Recupero. Adjective semantics in open knowledge extraction. In FOIS 2016, pp.167-180. http://ebooks.iospress.nl/volumearticle/44244. DOI: 10.3233/978-1-61499-660-6-167

Aldo Gangemi: A Comparison of Knowledge Extraction Tools for the Semantic Web. ESWC 2013: 351-366. https://link.springer.com/chapter/10.1007/978-3-642-38288-8_24. DOI: 10.1007/978-3-642-38288-8_24

Valentina Presutti, Francesco Draicchio, and Aldo Gangemi. Knowledge extraction based on discourse representation theory and linguistic frames. EKAW 2012. https://link.springer.com/chapter/10.1007%2F978-3-642-33876-2_12.DOI:10.1007/978-3-642-33876-2_12 .

Page 52: Fred sw jpaper2017

References

Diego Reforgiato Recupero, Valentina Presutti, Sergio Consoli, Aldo Gangemi, Andrea Giovanni Nuzzolese: Sentilo: Frame-Based Sentiment Analysis. Cognitive Computation 7(2): 211-225 (2015). https://link.springer.com/article/10.1007/s12559-014-9302-z. DOI: 10.1007/s12559-014-9302-z

Aldo Gangemi, Valentina Presutti, Diego Reforgiato Recupero: Frame-Based Detection of Opinion Holders and Topics: A Model and a Tool. IEEE Comp. Int. Mag. 9(1): 20-30 (2014). http://ieeexplore.ieee.org/document/6710244/. DOI: 10.1109/MCI.2013.2291688

Valentina Presutti, Andrea Giovanni Nuzzolese, Sergio Consoli, Aldo Gangemi, Diego ReforgiatoRecupero: From hyperlinks to Semantic Web properties using Open Knowledge Extraction. Semantic Web 7(4): 351-378 (2016). http://content.iospress.com/articles/semantic-web/sw221. DOI: 10.3233/SW-160221

Aldo Gangemi, Andrea Giovanni Nuzzolese, Valentina Presutti, Francesco Draicchio, Alberto Musetti, Paolo Ciancarini: Automatic Typing of DBpedia Entities. International Semantic Web Conference (1) 2012: 65-81. https://link.springer.com/chapter/10.1007/978-3-642-35176-1_5. DOI: 10.1007/978-3-642-35176-1_5

Misael Mongiovì, Diego Reforgiato Recupero, Aldo Gangemi, Valentina Presutti, Sergio Consoli: Merging open knowledge extracted from text with MERGILO. Knowl.-Based Syst. 108: 155-167 (2016). http://www.sciencedirect.com/science/article/pii/S0950705116301034