A Domain Ontology Engineering Tool with General Ontologies and Text Corpus
Naoki Sugiura, Masaki Kurematsu, Naoki Fukuta, Noriaki Izumi, & Takahira Yamaguchi

Page 1: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

Naoki Sugiura, Masaki Kurematsu, Naoki Fukuta, Noriaki Izumi, & Takahira Yamaguchi

Page 2: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

DODDLE and DODDLE II

Domain Ontology rapiD DeveLopment Environment

Builds taxonomic and non-taxonomic relationships

Uses a machine-readable dictionary (MRD) and a domain-specific text corpus to build relationships

Page 3: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

DODDLE & DODDLE II

Large Ontologies are difficult to build by hand

Locates relationships between words based on context similarity, even when the words are separated in the text

Disadvantages: human interaction is still required; low success rate

Page 4: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

DODDLE vs DODDLE II

DODDLE only works on taxonomic relationships

DODDLE II: an extension of DODDLE that also finds non-taxonomic relationships

Page 5: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

Outline

Overview

Taxonomic Relationships

Non-Taxonomic Relationships

Case Studies

Problems/Future Work

Conclusion

Assessment

Page 6: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

Overview

[Diagram labels: Domain Terms; Domain Specific Text Corpus; Concept Extraction Module; TRA Module; NTRL Module]

Page 7: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

Overview: TRA Module

[Diagram labels: MRD (WordNet); Matched Result Analysis; Trimmed Result Analysis; Modification using syntactic strategies; Taxonomic Relationship]

Page 8: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

Overview: NTRL Module

[Diagram labels: Domain Specific Text Corpus; Extraction of frequent words; WordSpace creation; Extraction of similar concept pairs; Concept specification templates; Non-Taxonomic Relationship]

Page 9: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

Overview

[Diagram labels: Taxonomic Relationship; Non-Taxonomic Relationship; Interaction Module]

Page 10: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

TRA Module

[Diagram labels: MRD (WordNet); Matched Result Analysis; Trimmed Result Analysis; Modification using syntactic strategies; Taxonomic Relationship]

Page 11: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

TRA

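Matched Result Analysis: constructs PABs (paths including only best-matched nodes) and STMs (subtrees that include best-matched nodes and other nodes, and so can be moved)

Trimmed Result Analysis: removes unnecessary internal nodes

Modification using statistical strategies: allows for human input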
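
The trimming step can be illustrated with a small sketch. It assumes that an "unnecessary" node is an unmatched node that is not needed to preserve a parent-child or sibling relationship among matched nodes; that rule, the function name `trim`, and the toy hierarchy are assumptions made for illustration, not details taken from the slides.

```python
def trim(children, root, matched):
    """Return a trimmed copy of a hierarchy given as {node: [child, ...]}.

    Assumed rule: keep the root, every matched node, and every node that
    still has two or more surviving branches below it; drop the rest.
    """
    trimmed = {}

    def visit(node):
        # Collect the surviving nodes from each child subtree.
        kept = []
        for child in children.get(node, []):
            kept.extend(visit(child))
        if node == root or node in matched or len(kept) >= 2:
            trimmed[node] = kept
            return [node]
        # Unmatched node with at most one surviving branch: splice it out.
        return kept

    visit(root)
    return trimmed


# Hypothetical fragment of a general hierarchy; only two terms matched.
hierarchy = {
    "entity": ["object"],
    "object": ["artifact"],
    "artifact": ["instrument", "structure"],
    "instrument": ["device"],
    "structure": ["building", "bridge"],
}
result = trim(hierarchy, "entity", matched={"device", "building"})
print(result)
```

In this toy case "artifact" survives because it is the point where the paths to the two matched terms diverge, while "object", "instrument", "structure", and "bridge" are spliced out.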

Page 12: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

PAB and STM

Page 13: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

TRA

Page 14: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

NTRL Module

[Diagram labels: Domain Specific Text Corpus; Extraction of frequent words; WordSpace creation; Extraction of similar concept pairs; Concept specification templates; Non-Taxonomic Relationship]

Page 15: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

NTRL

Extraction of key words

Primitive: 4-grams (sequences of 4 words)

Collocation matrix: $a_{i,j}$ = number of times 4-gram $f_i$ appears before 4-gram $f_j$

Example 4-gram sequence: … f8 f4 f3 f7 f8 f4 f1 f3 f4 f9 f2 f5 f1 f7 f1 f5 …
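
As a minimal sketch of the collocation count, the snippet below tallies adjacent pairs in the 4-gram sequence shown on this slide. Treating "before" as "immediately before" is an assumption, and `collocation_matrix` is a hypothetical helper name.

```python
from collections import defaultdict


def collocation_matrix(sequence):
    """Count how often 4-gram f_i appears immediately before 4-gram f_j.

    The result is a sparse matrix stored as {(f_i, f_j): count}.
    """
    counts = defaultdict(int)
    for left, right in zip(sequence, sequence[1:]):
        counts[(left, right)] += 1
    return dict(counts)


# The 4-gram sequence from the slide, with the leading and trailing ellipses dropped.
seq = "f8 f4 f3 f7 f8 f4 f1 f3 f4 f9 f2 f5 f1 f7 f1 f5".split()
a = collocation_matrix(seq)
print(a[("f8", "f4")])  # f8 is followed by f4 twice in this sequence
```

Row $i$ of this matrix can then serve as the 4-gram vector for $f_i$.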

Page 16: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

NTRL

WordSpace Creation

Context vectors and word vectors

Word vector as a sum of context vectors:

$$\tau(w) = \sum_{i \in C(w)} \Bigl( \sum_{f \text{ close to } i} \varphi(f) \Bigr)$$

$\tau(w)$: a vector representation of a word or phrase $w$

$\varphi(f)$: the 4-gram vector of a 4-gram $f$

$C(w)$: the appearance places of a word or phrase $w$

WordSpace is a collection of $\tau(w)$
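
A small sketch of the word-vector sum follows. It assumes that "close to" means within a fixed window of positions around each appearance place (including the place itself) and that 4-gram vectors are plain lists of numbers; the window size, the names `word_vector` and `phi`, and the toy data are all illustrative assumptions.

```python
def word_vector(positions, sequence, phi, window=1):
    """Sum the 4-gram vectors found near every appearance place of a word.

    positions: indices into `sequence` where the word or phrase appears.
    sequence:  the corpus as a list of 4-gram ids.
    phi:       {4-gram id: vector}, all vectors of equal length.
    window:    assumed meaning of "close to": +/- this many positions.
    """
    dim = len(next(iter(phi.values())))
    tau = [0] * dim
    for i in positions:
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for f in sequence[lo:hi]:
            # Add this 4-gram's vector component by component.
            tau = [t + x for t, x in zip(tau, phi[f])]
    return tau


# Toy data: three 4-grams with 2-dimensional vectors.
sequence = ["f1", "f2", "f3", "f1"]
phi = {"f1": [1, 0], "f2": [0, 1], "f3": [1, 1]}
tau_w = word_vector([0, 3], sequence, phi, window=1)
print(tau_w)
```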

Page 17: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

NTRL

Extraction of Concept Pairs

Each input term has a best-matched “synset”

Synset: a collection of word vectors

The sum of the word vectors in the synset is assigned to the concept that corresponds to each input term

The inner product is computed for all combinations of concept pairs

A match is determined by a user-set threshold

Case study threshold: 0.87
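
The pair-extraction step could look roughly like the sketch below. It assumes the concept vectors are length-normalized before the inner product is taken (that is, cosine similarity), since a fixed threshold such as 0.87 only makes sense on a bounded scale; the concept names and vectors are invented for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Inner product of u and v after normalizing both to unit length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def similar_pairs(concept_vectors, threshold=0.87):
    """Return (concept_a, concept_b, score) for every pair at or above the threshold."""
    pairs = []
    for (a, u), (b, v) in combinations(sorted(concept_vectors.items()), 2):
        score = cosine(u, v)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


# Hypothetical concept vectors (sums of word vectors per concept).
vectors = {"goods": [3, 1, 0], "seller": [2, 1, 0], "time": [0, 0, 5]}
pairs = similar_pairs(vectors, threshold=0.87)
print([(a, b) for a, b, _ in pairs])
```

Because the threshold is a parameter, the same function can be rerun with different values when moving to another domain.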

Page 18: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

NTRL

Finding Association Rules

Locates rules of the form:

Page 19: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

NTRL

Constructing Concept Specification Templates

Input: the set of similar concept pairs and the association rules

DODDLE sets priorities between concept pairs, based on the TRA Module and co-occurrence information

Page 20: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

Case Study

Law: “Contract for International Sale of Goods”

Business: “XML Common Business Library”

Support: 0.4%; Confidence: 80%
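
To show how support and confidence thresholds like these filter candidate rules, here is a minimal sketch using the standard definitions: support is the fraction of transactions containing both terms, and confidence is that count divided by the number of transactions containing the left-hand term. Treating each sentence as a transaction and each input term as an item is an assumption, and the sentences are invented.

```python
from itertools import permutations


def association_rules(transactions, min_support, min_confidence):
    """Return (x, y, support, confidence) for single-term rules x => y."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(items, 2):
        both = sum(1 for t in transactions if x in t and y in t)
        has_x = sum(1 for t in transactions if x in t)
        support = both / n
        confidence = both / has_x if has_x else 0.0
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return rules


# Hypothetical sentences, each reduced to the set of input terms it contains.
sentences = [
    {"buyer", "goods"},
    {"buyer", "goods", "price"},
    {"seller", "goods"},
    {"buyer"},
    {"price"},
]
rules = association_rules(sentences, min_support=0.004, min_confidence=0.8)
print([(x, y) for x, y, _, _ in rules])
```

With a support threshold as low as 0.4%, the confidence threshold does most of the filtering in a small example like this one.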

Page 21: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

Law Case Study

Given: 46 concepts

WordSpace: 77 concept pairs

Association between input terms: 55 pairs of terms

Templates

Page 22: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

Business Case Study

Input: 57 terms

WordSpace: 40 pairs

Association between input terms: 39

Page 23: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

Taxonomic Results

Business         Precision   Recall per path   Recall per subtree
Matched Result   0.2         0.29              0.71
Trimmed Result   0.22        0.13              0.5

Law              Precision   Recall per path   Recall per subtree
Matched Result   0.25        0.23              0.19
Trimmed Result   0.3         0.3               0.15

Page 24: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

Non-taxonomic Results

WS = WordSpace, AR = association rules

Law                         WS     AR     Join of WS and AR
# Extracted Concept Pairs   77     55     117
# Accepted Concept Pairs    18     13     27
Precision                   0.23   0.24   0.23
Recall                      0.38   0.27   0.56

Business                    WS     AR     Join of WS and AR
# Extracted Concept Pairs   40     39     66
# Accepted Concept Pairs    30     20     39
Precision                   0.75   0.51   0.59
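
The precision figures are consistent with dividing accepted pairs by extracted pairs. The short check below reproduces them from the counts on this slide; the function name and dictionary layout are illustrative only.

```python
def precision(accepted, extracted):
    """Fraction of extracted concept pairs that were accepted."""
    return accepted / extracted


# (accepted, extracted) counts copied from the slide.
counts = {
    ("Law", "WS"): (18, 77),
    ("Law", "AR"): (13, 55),
    ("Law", "Join"): (27, 117),
    ("Business", "WS"): (30, 40),
    ("Business", "AR"): (20, 39),
    ("Business", "Join"): (39, 66),
}
table = {key: round(precision(acc, ext), 2) for key, (acc, ext) in counts.items()}
for key, value in table.items():
    print(key, value)
```

Recall cannot be rechecked the same way because the slide does not give the number of correct pairs it was measured against.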

Page 25: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

Problems/ Future Work

Threshold: changes with each domain

Specification of a concept relation: relationships still need to be specified by the user

Ambiguity of multiple terminology (e.g., “transmission”): semantic specialization of multi-definition words is needed

DODDLE-R: uses RDF tags

Page 26: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

Conclusion

Uses an MRD and a text corpus

Two strategies for taxonomic relationships: matched result analysis and trimmed result analysis

Non-taxonomic relationships: extracted from co-occurrence information in the text corpus

Concept specification: a way to eliminate concept pairs to build an ontology

Page 27: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

Assessment

Designed to be a tool

No timing results reported

Determining thresholds is plug-and-guess

Page 28: A Domain Ontology Engineering Tool with General Ontologies and Text Corpus

Questions?