Upload
valentina-presutti
View
100
Download
0
Embed Size (px)
Citation preview
Semantic Web Machine Reading with FRED
STLab, ISTC-CNR (Rome/Catania, Italy)
STLab team
Semantic Web machine reading with FRED. Aldo Gangemi, Valentina Presutti, Diego Reforgiato Recupero, Andrea Nuzzolese, Francesco Draicchio, Misael Mongiovì. Semantic Web, vol. Preprint, no. Preprint, pp. 1-21, 2016.http://content.iospress.com/articles/semantic-web/sw240DOI: 10.3233/SW-160240
Outline
• Overview of FRED
• Main mappings and heuristics (the method behind)
• FRED pipeline (implementation)
• Quality, importance, impact (evaluation)
• Conclusion
Outline
• Overview of FRED
FRED: machine reader
http://wit.istc.cnr.it/stlab-tools/fred
http://wit.istc.cnr.it/stlab-tools/fred/fredlib
FRED: machine reader
John Coltrane played with Miles Davis in Kind of Blue
domain and task independentsemantic middleware
Formal knowledge graph from text
OWL semantics+
OWL n-ary relation pattern
Valentina gave Aldo a book by Charlie Mingus
FRED graph for
Valentina gave Aldo a book by Charlie Mingus
(ROOT(S
(NP (NNP John) (NNP Coltrane))(VP (VBD played)
(PP (IN with)(NP (NNP Miles) (NNP Davis)))
(PP (IN in)(NP
(NP (NNP Kind))(PP (IN of)
(NP (NNP Blue))))))))
root(ROOT-0, played-3)nn(Coltrane-2, John-1)nsubj(played-3, Coltrane-2)nn(Davis-6, Miles-5)prep_with(played-3, Davis-6)prep_in(played-3, Kind-8)prep_of(Kind-8, Blue-10)
___________________________ _____________ |x0 x1 x2 x3 | |e4 | |...........................| |.............| (|named(x0,john_coltrane,per)|A|play(e4) |)|named(x1,kind,loc) | |Actor1(e4,x0)| |named(x2,blue,loc) | |of(x1,x2) | |named(x3,miles_davis,loc) | |in(e4,x1) | |___________________________| |Actor2(e4,x3)|
|_____________|
Boxer’s DRT boxingStanford’s dependency parsing,
coreference resolution
Tagme’s linkingBabelfy’s (or UKB’s) linking
Language recogniser +BING off the shelf translator
Multiple NLP tools
Semantic web resources
• DBpedia
• schema.org
• WordNet-RDF
• BabelNet
• VerbNet-RDF
• FrameNet-OWL
• OntoWordNet
• DOLCE-Zero
• Ontology Design Patterns
• NIF
• Earmark
Main contributions of FRED
• A novel form of machine reading: from text to knowledge graph
• A method for integrating multiple NLP task outputs into a knowledge graph
• Formal mappings between different theories (implemented as a set of heuristics)
• Ontology design patterns for the Semantic Web• Reference cognitive semantics for interpretation
i.e. Frame Semantics
Outline
• Main mappings and heuristics (the method behind)
Main mappings and heuristics
• From DRS to RDF/OWL
• Tense, modality and negation
• Compositional semantics, taxonomy induction and quality representation
• Periphrastic relations
From DRS to RDF/OWL
• The core of FRED takes Discourse Representation Structures as input (based on Hans Kamp’s DRT)
• DRS informally called boxes (graphical representation)
• Discourse referents and conditions (their interpretation)
• Subset of FOL, arbitrary entities, Neo-Davidsonian events
• DRSs can become very complex as natural lanuguageexpression are rarely simple
• Boxer
People love movies
DRS basic patterns
1. Boxes only have a syntactic role in Boxer’s result, meaning that FRED only needs to focus on representing their content and linking them to the rest
2. Boxes provide a unified relation to a complex state of affair (usually expressed in the text by a copula), meaning that they have their own semantics to be represented
Box patterns 1)
People love movies
Box patterns 1
Valentina is happy
Valentina is a researcher
Box patterns 2)
Valentina is Gianni’s fifth daughter, from his second marriage
Complex boxes
• Boxes can be composed based on and, or, entails
• They can be nested, negated, contain several events or none
• Vocabularies: • Boxer taxonomy of types (boxer:)• Situations, formal relations and other properties
(boxing:)
Tense, modality and negationTense: Time intervals + Allen’s algebra
Modality and negation: lightweight RDF semantics
Rahm Emanuel says he won’t resign over police shooting
Lightweight negation, why?
• Ambiguous negation scope, same syntactic structure
John did not go to school by car
¬∃e(go(e, John, s, c)⋀ Event(e)⋀ School(s) ⋀Car (c))
∃e(go(e, John, s, c) ⋀ Event(e)⋀School(s)⋀¬Car(c)
Negation: lightweight representation
• FRED only annotates the referenced event with the information that its truth value is false
• boxing:hasTrueValue
• No assumption is made on the impact of the negation over logical quantification and scope
Lightweight negation representation: limits and benefits
• Limit: no triggering of any automatic reasoning
• Benefit: hook to the negated event for client that can apply their desired interpretation
Negation
Rahm Emanuel says he won’t resign over police shooting
Modality: lightweight representation
• No formal constructs in OWL for modality
• boxing:Modality• boxing:Necessary (will, must, should, etc.)• boxing:Possible (may, might, etc.)
:resign_1 boxing:hasTruthValue boxing:False ;
boxing:hasModality boxing:Necessary .
“will not resign”
Negation
Rahm Emanuel says he won’t resign over police shooting
Compositional semantics, taxonomy induction, and quality representation
• Compound terms: FRED recognises two main patterns
• Dependency relation• Co-referent predicates
• FRED creates a new class e.g. PoliceShooting
• The modifier ”police” of the main concept (“shooting) provides a distinguished feature to the more specific class (differentia specifica)
Rahm Emanuel says he won’t resign over police shooting
Compositional semantics: nouns as differentia specifica
Rahm Emanuel says he won’t resign over police shooting
fred:PoliceShootingrdfs:subClassOf fred:Shooting ;dul:associatedWith fred:Police .
Compositional semantics: adjectives as differentia specifica
• Adjectives have an unpredictable and unstable semantics (Morzycki’s sectivity)
• Four main representation patterns implemented based on an extension of Morzycky’s theory
Aldo Gangemi, Andrea G. Nuzzolese, Valentina Presutti, and Diego Reforgiato Recupero. Adjective semantics in open knowledge extraction. In FOIS 2016, pp.167-180.http://ebooks.iospress.nl/volumearticle/44244. DOI: 10.3233/978-1-61499-660-6-167
• Canadian surgeon (intersective):surgeon_1 a :CanadianSurgeon .:CanadianSurgeon rdfs:subClassOf :Surgeon .:surgeon_1 dul:hasQuality :Canadian .:CanadianSurgeon hasQuality :Canadian .
• Skilful surgeon (subsective):surgeon_1 a :SkilfulSurgeon .:SkilfulSurgeon rdfs:subClassOf :Surgeon .:SkilfulSurgeon hasQuality :Skilful .
• Alleged surgeon (modal):surgeon_1 a :AllegedSurgeon .:AllegedSurgeon dul:AssociatedWith :Surgeon .:AllegedSurgeon hasModality :Alleged .
• Fake surgeon (privative):surgeon_1 a :FakeSurgeon .:AllegedSurgeon dul:AssociatedWith :Surgeon .:AllegedSurgeon hasQuality :Fake .
Compositional semantics: adjectives as differentia specifica
Periphrastic relations
• Relations expressed with prepositions (of, with, for, in, etc.)
• Naming object properties with preposition is meaningless• sister of, survivor of
• FRED identifies the noun to be assciated with the preposition and puts it before it to form a meaningful label
fred:survivor_1fred:survivorOf fred:expedition_1;rdf:type fred:Survivor .
fred:expedition_1 rdf:type fred:Expedition .
Additional NLP integrations
• Named Entity Recognition• Used for identifying elements that should have a
corresponding OWL individual in the knowledge graph• Entity linking
• owl:sameAs axioms (DBpedia)• Co-reference resolution and role propagation
• Used for merging nodes• Word sense disambiguation
• owl:equivalentClass, rdfs:subClassOF• WordNet, Babelnet, Dolce+DnS Ultra Lite (DUL),
Wordnet Supersenses
Other capabilities
• Multilingualism • 48 lanugages (translates through English) • Bing Translation APIs
• Textual annotation grounding• Annotations of text fragment to corresponding
graph elements• Earmark and NIF
Outline
• FRED pipeline (implementation)
FRED pipeline
BoxerTAGME
CoreNLP
REST
REST
TAGME
UKB
Babelfy
Outline
• Quality, importance, impact (evaluation)
Quality and impact
• Ability to perform specific knowledge extraction tasks
• Performance as a semantic middleware, by evaluating applications built on top of it
• Popularity in terms of reuse and citations by communities of developers
Frame detection
Tool Precision Recall F-ScoreFRED 75.320 57.519 65.227
Semafor 75.325 74.797 75.060
Valentina Presutti, Francesco Draicchio, and Aldo Gangemi. Knowledge extraction based on discourse representation theory and linguistic frames. EKAW 2012. https://link.springer.com/chapter/10.1007%2F978-3-642-33876-2_12DOI:10.1007/978-3-642-33876-2_12.
Knowledge extraction tasks: tool comparisonTool/Task Topics NER NE-RS TE TE-RS Senses Taxo Rel Roles Events Frames
+SRLAIDA – + + – – + – – – – –
Alchemy + + – + – + – + – – –Apache Stanbol – + + – – + – – – – –
CiceroLite – + + + + + – + + + +DB Spotlight – + + – – + – – – – –
FOX + + + + + + – – – – –FRED – + + + + + + + + + +NERD – + + – – + – – – – –Ollie – – – – – – – + – – –
Open Calais + + – – – + – – – + –PoolParty KD + – – – – – – – – – –
ReVerb – – – – – – – + – – –Semiosearch – – + – + – – – – – –
Tagme – + + + + – – – – – –Wikimeta – + – + + + – – – – –Zemanta – + – – – + – – – – –Aldo Gangemi: A Comparison of Knowledge Extraction Tools for the Semantic Web. ESWC 2013: 351-366. https://link.springer.com/chapter/10.1007/978-3-642-38288-8_24. DOI: 10.1007/978-3-642-38288-8_24
FRED as a semantic middleware
• Sentiment analysis
Diego Reforgiato Recupero, Valentina Presutti, Sergio Consoli, Aldo Gangemi, Andrea Giovanni Nuzzolese: Sentilo: Frame-Based Sentiment Analysis. Cognitive Computation 7(2): 211-225 (2015). https://link.springer.com/article/10.1007/s12559-014-9302-z. DOI: 10.1007/s12559-014-9302-z
Aldo Gangemi, Valentina Presutti, Diego Reforgiato Recupero:Frame-Based Detection of Opinion Holders and Topics: A Model and a Tool. IEEE Comp. Int. Mag. 9(1): 20-30 (2014). http://ieeexplore.ieee.org/document/6710244/. DOI: 10.1109/MCI.2013.2291688
• Extraction of link semantics
Valentina Presutti, Andrea Giovanni Nuzzolese, Sergio Consoli, Aldo Gangemi, Diego ReforgiatoRecupero: From hyperlinks to Semantic Web properties using Open Knowledge Extraction. Semantic Web 7(4): 351-378 (2016). http://content.iospress.com/articles/semantic-web/sw221.DOI: 10.3233/SW-160221
• Automatic entity typing
Aldo Gangemi, Andrea Giovanni Nuzzolese, Valentina Presutti, Francesco Draicchio, Alberto Musetti, Paolo Ciancarini: Automatic Typing of DBpedia Entities. International Semantic Web Conference (1) 2012: 65-81. https://link.springer.com/chapter/10.1007/978-3-642-35176-1_5.DOI: 10.1007/978-3-642-35176-1_5
• Knowledge reconciliation
Misael Mongiovì, Diego Reforgiato Recupero, Aldo Gangemi, Valentina Presutti, Sergio Consoli: Merging open knowledge extracted from text with MERGILO. Knowl.-Based Syst. 108: 155-167 (2016). http://www.sciencedirect.com/science/article/pii/S0950705116301034
Community feedback (some)
• stackoverflow.com: http://tinyurl.com/qa2dyfj, http://tinyurl.com/o993scy
• answers.semanticweb.com:http://tinyurl.com/n6pzpot, http://tinyurl.com/kb8564w
• Wikipedia entry for Knowledge Extraction:http://en.wikipedia.org/wiki/Knowledge_extraction
• A study about dealing with Big Data by UN Economic Commission for Europe
http://tinyurl.com/ml6ystn
Outline
• Conclusion
Conclusion
• The final goal of FRED is to support natural language understanding
• Currently, FRED responds to a very important challenge: to leverage existing NLP approaches to obtain a unified, formal knowledge graph including both facts and concepts, expressed by a natural language text.
Issues and Open challenges
• Partial accuracy of NLP components may lead to global errors
N/V POS tagging
Complex multiword extraction
Coordination
Plural co-reference
David Moyes shares Manchester United fans' frustration
Myeloid hepatosplenomegaly is an enlargement of liver and kidney due to myelofibrosis
Aristotle was a Greek philosopher, a student of Plato and teacher of Alexander the Great
When Carol helps Bob and Bob helps Carol, they can accomplish any task.
Issues and Open challenges
• Semantics besides literal interpretation• The path descended abruptly• The road runs along the coast for two hours• The fence zigzags from the plateau to the valley• The highway crawls through the city• The road leads us to Bordeaux
• Need for type coercion to satisfy hidden frame• Sometimes an inversion of roles: the path descends
because it can be descended• Road is an object that “can be followed as an indication”
to our destination• Fence is an object whose shape “can be followed by
zigzaging”• Highway is a path that “can be crawled”, therefore the
crawling frame here is descriptive of a state, not of an action
Issues and Open challenges
• The recognition of cultural framing out of real world facts as in political discourse
• requires the extraction, representation, and reasoning over high-level frames (attitudes, values, metaphors)
Issues and Open challenges
• The accurate extraction of implicit discourse relations and conventional implicatures• requires background knowledge and reasoning
on that knowledge in a way close to the appropriate discourse level
References
In academic publication, for reference to FRED please cite:
Aldo Gangemi, Valentina Presutti, Diego Reforgiato Recupero, Andrea Nuzzolese, Francesco Draicchio, Misael Mongiovì. Semantic Web, vol. Preprint, no. Preprint, pp. 1-21, 2016. http://content.iospress.com/articles/semantic-web/sw240. DOI: 10.3233/SW-160240
Other relevant references related to the FRED project:
Aldo Gangemi, Andrea G. Nuzzolese, Valentina Presutti, and Diego Reforgiato Recupero. Adjective semantics in open knowledge extraction. In FOIS 2016, pp.167-180. http://ebooks.iospress.nl/volumearticle/44244. DOI: 10.3233/978-1-61499-660-6-167
Aldo Gangemi: A Comparison of Knowledge Extraction Tools for the Semantic Web. ESWC 2013: 351-366. https://link.springer.com/chapter/10.1007/978-3-642-38288-8_24. DOI: 10.1007/978-3-642-38288-8_24
Valentina Presutti, Francesco Draicchio, and Aldo Gangemi. Knowledge extraction based on discourse representation theory and linguistic frames. EKAW 2012. https://link.springer.com/chapter/10.1007%2F978-3-642-33876-2_12.DOI:10.1007/978-3-642-33876-2_12 .
References
Diego Reforgiato Recupero, Valentina Presutti, Sergio Consoli, Aldo Gangemi, Andrea Giovanni Nuzzolese: Sentilo: Frame-Based Sentiment Analysis. Cognitive Computation 7(2): 211-225 (2015). https://link.springer.com/article/10.1007/s12559-014-9302-z. DOI: 10.1007/s12559-014-9302-z
Aldo Gangemi, Valentina Presutti, Diego Reforgiato Recupero: Frame-Based Detection of Opinion Holders and Topics: A Model and a Tool. IEEE Comp. Int. Mag. 9(1): 20-30 (2014). http://ieeexplore.ieee.org/document/6710244/. DOI: 10.1109/MCI.2013.2291688
Valentina Presutti, Andrea Giovanni Nuzzolese, Sergio Consoli, Aldo Gangemi, Diego ReforgiatoRecupero: From hyperlinks to Semantic Web properties using Open Knowledge Extraction. Semantic Web 7(4): 351-378 (2016). http://content.iospress.com/articles/semantic-web/sw221. DOI: 10.3233/SW-160221
Aldo Gangemi, Andrea Giovanni Nuzzolese, Valentina Presutti, Francesco Draicchio, Alberto Musetti, Paolo Ciancarini: Automatic Typing of DBpedia Entities. International Semantic Web Conference (1) 2012: 65-81. https://link.springer.com/chapter/10.1007/978-3-642-35176-1_5. DOI: 10.1007/978-3-642-35176-1_5
Misael Mongiovì, Diego Reforgiato Recupero, Aldo Gangemi, Valentina Presutti, Sergio Consoli: Merging open knowledge extracted from text with MERGILO. Knowl.-Based Syst. 108: 155-167 (2016). http://www.sciencedirect.com/science/article/pii/S0950705116301034