© Paul Buitelaar: KnowledgeWeb Summer School, Spain - July 2004
Human Language Technology in Ontology Engineering
Ontology Learning from Text
Paul Buitelaar DFKI GmbH
Language Technology Lab
DFKI Competence Center Semantic Web
Saarbrücken, Germany
Overview
HLT and Ontology Engineering
Automated Linguistic Analysis
Ontology Learning from Text
Further Issues: Evaluation
Conclusions
Ontology Lifecycle
Creating
Populating
Validating
Evolving
Maintaining
Deploying
HLT in the Ontology Lifecycle
HLT for Ontology Learning and Population from Text
Human Language Technology = Automated Linguistic Analysis
Documents (Text)
Ontology Learning (Development & Evolution): Linguistic Analysis to Extract Classes / Relations
Ontology Population (Knowledge Base Generation): Linguistic Analysis to Extract Instances
Ontology (Knowledge): Classes, Relations/Properties
Instances
Linguistic Analysis: Example
The Dell computer with a flat screen had to be rejected because of a failure in the motherboard.
Entities: Dell computer, flat screen, motherboard, failure, animate-entity
Relations: has-a, has-a, reject, location-of
Levels of Linguistic Analysis
Lexical Analysis
  Word Class: Part-of-Speech (also Semantic Class)
  Word Structure: Morphology
Phrase Analysis
  Sentence Structure: Phrases (if ‘shallow’: Chunks)
  Semantic Units
Dependency Structure Analysis
  Sentence Meaning: Predicate Argument Structure (Clause)
  Semantic Structure
Part-of-Speech, Morphology
Part-of-Speech
  e.g.: noun, verb, adjective, preposition, …
  PoS tag sets may have between 10 and 50 (or more) tags
Morphology
  Most languages have inflection and declension, e.g.:
    Singular/Plural: computer, computers
    Present/Past: reject, rejected
  Many languages also have complex (de)composition, e.g.:
    Flachbildschirm (flat screen) > flach + Bildschirm > flach + Bild + Schirm
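The (de)composition step can be illustrated with a minimal sketch. It assumes a tiny hand-made stem lexicon (`STEMS`) and a hypothetical helper `decompose`; real compound analysis also has to handle linking elements and ambiguous splits, which this sketch ignores.

```python
# Hypothetical stem lexicon, for illustration only.
STEMS = {"flach", "bild", "schirm"}

def decompose(word, lexicon):
    """Split `word` into a sequence of lexicon stems, or return None.

    Longer prefixes are tried first; the function backtracks when the
    remainder of the word cannot be split.
    """
    word = word.lower()
    if not word:
        return []
    for end in range(len(word), 0, -1):
        prefix = word[:end]
        if prefix in lexicon:
            rest = decompose(word[end:], lexicon)
            if rest is not None:
                return [prefix] + rest
    return None

parts = decompose("Flachbildschirm", STEMS)
print(parts)
```

Words that cannot be covered entirely by lexicon entries yield `None`, so the caller can fall back to treating them as unanalyzed tokens.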
Phrases, Terms, Named Entities
Semantic Units
Phrases (e.g. nominal - NP, prepositional - PP)
  NP: a flat screen
  PP: with a flat screen
  NP (recursive): the Dell computer with a flat screen; a failure in the motherboard
Terms (domain-specific phrases)
  Dell computer
  Dell computer with a flat screen
Named Entities (phrases corresponding to dates, names, …)
  COMPANY: Dell
  COMPANY: Dell Computer Corporation
  PERSON: Michael Dell
Dependency Structure (I)
Semantic Structure: Dependencies between Predicates and Arguments
the Dell computer with a flat screen had to be rejected
PRED: reject
ARG1: ENTITY
ARG2: ‘the Dell computer with a flat screen’
‘Logical Form’ : reject(x,y) & animate-entity(x) & computer(y) & …
Dependency Structure Analysis is based on:
Sub-categorization Frames
reject :: Subj:NP, Obj:NP
Selection Restrictions
reject :: Subj:NP:ANIMATE-ENTITY, Obj:NP:ENTITY
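A minimal sketch of how the sub-categorization frame and selection restrictions for `reject` could be checked. The type hierarchy `ISA`, the `FRAMES` table, and the function names are assumptions made for illustration; they are not taken from any particular parser.

```python
# Hypothetical semantic type hierarchy: type -> parent type.
ISA = {
    "ANIMATE-ENTITY": "ENTITY",
    "PERSON": "ANIMATE-ENTITY",
    "COMPUTER": "ENTITY",
}

# Frame for "reject" as on the slide: role -> (phrase type, semantic type).
FRAMES = {
    "reject": {"Subj": ("NP", "ANIMATE-ENTITY"), "Obj": ("NP", "ENTITY")},
}

def is_a(sem_type, required):
    """True if `sem_type` equals `required` or is one of its descendants."""
    while sem_type is not None:
        if sem_type == required:
            return True
        sem_type = ISA.get(sem_type)
    return False

def satisfies(pred, args):
    """Check `args` (role -> (phrase type, semantic type)) against the frame."""
    frame = FRAMES[pred]
    if set(args) != set(frame):
        return False  # sub-categorization: wrong set of roles
    return all(
        args[role][0] == cat and is_a(args[role][1], sem)
        for role, (cat, sem) in frame.items()
    )

ok = satisfies("reject", {"Subj": ("NP", "PERSON"), "Obj": ("NP", "COMPUTER")})
bad = satisfies("reject", {"Subj": ("NP", "COMPUTER"), "Obj": ("NP", "COMPUTER")})
print(ok, bad)
```

The second call fails because a computer is an ENTITY but not an ANIMATE-ENTITY, so it cannot fill the subject role of `reject`.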
Dependency Structure (II)
The Dell computer that has been rejected was claimed to have suffered from handling.
reject(e1,x1,y1) & animate-entity(x1) & Dell_computer(y1) & claim(e2,x2,e3) & animate-entity(x2) & suffer_from(e3,y1,y2) & handling(y2)
Lexical Functional Grammar (LFG) analysis:
PRED claim < NULL, XCOMP >
SUBJ y1
  PRED computer
  MOD Dell
  ADJUNCT
    PRED reject < NULL, SUBJ >
    SUBJ y1
XCOMP
  PRED suffer < SUBJ, OBL-from >
  SUBJ y1
  OBL-from handling
[Figure: the same structure as a dependency graph over claim, suffer, reject, Dell, handling and y1 : computer, with edges labeled SUBJ, XCOMP, MOD, ADJUNCT, OBL-from]
Some History
Lexical Knowledge Extraction
  Extraction of lexical semantic representations (word meaning) from Machine Readable Dictionaries – 70's/80's
  Extraction of semantic lexicons from corpora for Information Extraction systems – 80's/90's, e.g. CRYSTAL (Soderland)
  Answer extraction in Question Answering, e.g. Webclopedia (Hovy)
Thesaurus Extraction
  Similar work, (complex, multilingual) term extraction, e.g. Sextant (Grefenstette); DR-Link (Liddy)
Ontology Learning from Text
  Similar work, (domain-specific) term / relation extraction, e.g. TextToOnto (Maedche & Staab), OntoLearn (Velardi et al.)
  Discussed here: OntoLT (Buitelaar, Olejnik & Sintek)
OntoLearn
Domain-Specific WordNet Tuning and Extension
OntoLT: Some Background
Ontology Learning from Text
  Taxonomy Extraction, Document Clustering: String-based, Document Level
  “Unnamed” Relation Extraction, Word Clustering: Stemming & Part-of-Speech, Token Level
  Extraction of Terms, “Named” Relations: Pred-Arg & Head-Mod Structure, Term Level
  TextToOnto
  OntoLearn
Text in Ontology Engineering
  Textual Grounding of Concepts: Retain Linguistic Contexts and Realizations
  Text-based Ontology Monitoring: Compare Language Use over Time
OntoLT
What is it?
OntoLT provides a middleware solution in ontology development that enables the ontology engineer to bootstrap or extend a domain-specific ontology from a relevant text collection.
How does it work?
1. automatic linguistic annotation
2. automatic statistical preprocessing
3. interactive definition of mapping rules
4. interactive user validation of candidates
5. automatic integration into an ontology
OntoLT: Architecture
Corpus
Linguistic Annotation
Annotated Corpus (XML)
Definition of Mappings
Mappings: XML (Linguistic Structure) <=> Protégé (Classes, Slots)
Extraction (OntoLT)
Extracted Ontology
Protégé: Edit Extracted Ontology
Linguistic Annotation
An 40 Kniegelenkpräparaten wurden mittlere Patellarsehnendrittel mit einer neuen Knochenverblockungstechnik in einem zweistufigen Bohrkanal femoral fixiert.
(In 40 knee joint specimens, mid patellar ligament thirds were fixed femorally in a two-step drill tunnel using a new bone-blocking technique.)
mittlere Patellarsehnendrittel (mid patellar ligament third)

<sentence … >
  …
  <text> … </text>
  <phrases> … </phrases>
  <clauses> … </clauses>
</sentence>

<text>
  …
  <token id="t5" pos="ADJA" str="mittlere">
    <lemma id="t5.l1">mittler</lemma>
  </token>
  <token id="t6" pos="NN" str="Patellarsehnendrittel">
    <lemma id="t6.l1">patellar</lemma>
    <lemma id="t6.l2">Sehne</lemma>
    <lemma id="t6.l3">Drittel</lemma>
  </token>
  …
</text>

<phrases>
  …
  <phrase id="p2" from="t5" to="t6" type="NP">
    <mod from="t5" to="t5" />
    <head from="t6" to="t6" />
  </phrase>
  …
</phrases>

<clauses>
  <clause id="cl1" from="p1" to="p5" pred="p5" type="pass">
    <arg id="a1" type="SUBJ" phrase="none" />
    <arg id="a2" type="IOBJ" phrase="p1" />
    <arg id="a3" type="DOBJ" phrase="p2" />
  </clause>
</clauses>
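The annotation format lends itself to simple programmatic access. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the embedded `SENTENCE` fragment only reproduces the tokens and the phrase shown on this slide, and the helper name `head_mod_pairs` is an assumption. It also assumes that head and modifier each span a single token, as in the example.

```python
import xml.etree.ElementTree as ET

# Fragment assembled from the token and phrase annotation on the slide.
SENTENCE = """
<sentence>
  <text>
    <token id="t5" pos="ADJA" str="mittlere">
      <lemma id="t5.l1">mittler</lemma>
    </token>
    <token id="t6" pos="NN" str="Patellarsehnendrittel">
      <lemma id="t6.l1">patellar</lemma>
      <lemma id="t6.l2">Sehne</lemma>
      <lemma id="t6.l3">Drittel</lemma>
    </token>
  </text>
  <phrases>
    <phrase id="p2" from="t5" to="t6" type="NP">
      <mod from="t5" to="t5" />
      <head from="t6" to="t6" />
    </phrase>
  </phrases>
</sentence>
"""

def head_mod_pairs(xml_text):
    """Return (modifier, head) surface strings for every NP that has both."""
    root = ET.fromstring(xml_text.strip())
    surface = {tok.get("id"): tok.get("str") for tok in root.iter("token")}
    pairs = []
    for phrase in root.iter("phrase"):
        if phrase.get("type") != "NP":
            continue
        mod, head = phrase.find("mod"), phrase.find("head")
        if mod is None or head is None:
            continue
        # Single-token spans assumed: "from" and "to" point at the same token.
        pairs.append((surface[mod.get("from")], surface[head.get("from")]))
    return pairs

print(head_mod_pairs(SENTENCE))
```

Pairs like these are the kind of linguistic entities that the mapping rules on the next slide operate on.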
Mapping Rules
Precondition Language
  Var (Y, XPath (Y)): Get all occurrences of element Y, e.g. HeadNoun, Modifier, Subject, …
  Concat
  ConcatList
  combined through AND, OR, NOT, EQUAL
Operators
  CreateCls: create a new class with super-class
  AddSlot: add a slot with range to a new or existing class
  CreateInst: introduce an instance for a new or existing class
  FillSlot: set the value of a slot of an instance
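To make the operator semantics concrete, here is a minimal sketch of two mapping rules over a toy in-memory ontology. The `Ontology` class, its method names, and both rule functions are hypothetical stand-ins; they mimic what CreateCls and AddSlot are described as doing and are not the OntoLT or Protégé API.

```python
class Ontology:
    """Toy ontology: classes with one super-class each, plus slots."""

    def __init__(self):
        self.classes = {"Thing": None}  # class name -> super-class name
        self.slots = {}                 # class name -> {slot name: range class}

    def create_cls(self, name, super_cls="Thing"):
        """CreateCls: create a new class with a super-class."""
        if super_cls not in self.classes:
            self.classes[super_cls] = "Thing"
        self.classes.setdefault(name, super_cls)

    def add_slot(self, cls, slot, range_cls):
        """AddSlot: add a slot with a range to a new or existing class."""
        self.create_cls(cls)
        self.create_cls(range_cls)
        self.slots.setdefault(cls, {})[slot] = range_cls

def head_noun_modifier_rule(ontology, phrases):
    """For each (modifier, head) pair: class for head, subclass for mod+head."""
    for mod, head in phrases:
        ontology.create_cls(head)
        ontology.create_cls(f"{mod}_{head}", super_cls=head)

def subject_predicate_object_rule(ontology, clauses):
    """For each (subject, predicate, object): slot named after the predicate."""
    for subj, pred, obj in clauses:
        ontology.add_slot(subj, pred, obj)

onto = Ontology()
head_noun_modifier_rule(onto, [("flat", "screen")])
subject_predicate_object_rule(onto, [("computer", "has", "screen")])
print(onto.classes)
print(onto.slots)
```

In the interactive workflow described earlier, the classes and slots produced this way would be candidates that the user validates before integration.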
Example Experiment
Ontology Extraction for Neurology
  Neurology Section of a Medical Corpus: Medical Scientific Journal Abstracts – MuchMore Project
XML-based Linguistic Annotation
  PoS, Lemmatization, Phrases, Pred-Arg Structure
Statistical Preprocessing (chi-square)
  Select Domain-Relevant Linguistic Entities
Definition of Mapping Rules
  Define Operators for Selected Linguistic Entities
Generate & Validate Class/Slot Candidates
  Select Candidates for Integration in Neurology Ontology
Generate “Ontology Fragments” for Neurology
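The statistical preprocessing step can be sketched as a chi-square test on a 2x2 contingency table that compares a term's frequency in the domain corpus with its frequency in a reference corpus. The slide only names chi-square; the table layout, the function names, and the threshold of 3.84 (the 0.05 critical value at one degree of freedom) are assumptions made for this illustration.

```python
from collections import Counter

def chi_square(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]].

    a: term count in the domain corpus, b: all other domain tokens,
    c: term count in the reference corpus, d: all other reference tokens.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

def domain_relevant(domain_tokens, reference_tokens, threshold=3.84):
    """Return terms that are significantly over-represented in the domain."""
    dom, ref = Counter(domain_tokens), Counter(reference_tokens)
    n_dom, n_ref = sum(dom.values()), sum(ref.values())
    selected = {}
    for term, a in dom.items():
        c = ref[term]
        score = chi_square(a, n_dom - a, c, n_ref - c)
        # Keep only terms whose relative frequency is higher in the domain.
        if score >= threshold and a * n_ref > c * n_dom:
            selected[term] = score
    return selected

# Toy corpora, for illustration only.
domain = ["neuron"] * 30 + ["the"] * 970
reference = ["neuron"] * 10 + ["the"] * 9990
print(sorted(domain_relevant(domain, reference)))
```

The over-representation check matters because the chi-square statistic is symmetric: it is equally large for terms that are unusually rare in the domain corpus.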
Further Issues
Future Development
  Organization of Class/Slot Candidate List
    Inference & Clustering - “Graph Restructuring”
  Extend Statistical Preprocessing
    Multiple Reference Corpora
    Extended Frequency Information
  Include Machine Learning Approach
    Semi-Automatic Definition of Mapping Rules
Performance Evaluation
  Guidelines
    ECAI04 Workshop on OLP
  Benchmark
    Challenge within PASCAL NoE
Evaluation: What? -- Subtasks
Classes
  (Multilingual) Term Extraction
  Named-Entity Recognition
  Similarity Thesaurus
  Term, Document Clustering
Class-Hierarchy (Taxonomy)
  Thesaurus Extraction
  Term, Document Clustering
Class-Properties (Relations)
  Relation Extraction
  ? Formal Properties of Relations (Properties)
Class-Instances (Individuals)
  (Multilingual) Term Extraction
  Named-Entity Recognition
  Term, Document Classification
Evaluation: How?
By Sub-Task – Evaluation of:
  Classes – Term, NE Extraction, Clustering
  Class-Hierarchy – Thesaurus Extraction
  Class-Properties – Relation Extraction
  Class-Instances – Term, NE Extraction, Classification
By Application – Evaluation of:
  Ontology Learning and Population – Gold Standard
  IR, QA – Precision/Recall Increase with Ontology?
  Interactive QA – Increased User Satisfaction?
  Information Access – Increased User Performance?
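Evaluation against a gold standard can be sketched as a set comparison between extracted and reference items (classes, relations, or instances). The function name and the toy class sets are assumptions for illustration; exact string matching is the simplest possible criterion and a real benchmark may define matches differently.

```python
def precision_recall_f1(extracted, gold):
    """Compare extracted items with a gold standard by exact match."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy data, for illustration only.
extracted_classes = {"neuron", "synapse", "journal"}
gold_classes = {"neuron", "synapse", "axon", "dendrite"}
p, r, f = precision_recall_f1(extracted_classes, gold_classes)
print(p, r, f)
```

The same function applies per sub-task, for example once for classes and once for class-instances, each with its own gold standard.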
Conclusions
Stay Tuned
OntoLT Release
To be Announced on Protégé-Discussion List
http://protege.stanford.edu/mailing-lists
Evaluation
Ontology Learning & Population (OLP) Challenge
Within PASCAL NoE - First Task Spring 2005
ECAI04 Workshop: Evaluation of Text-based OLP
http://olp.dfki.de/ECAI04/cfp.htm