Generic Ontology Learners on Application Domains

Francesca Fallucchi, Maria Teresa Pazienza, Fabio Massimo Zanzotto
DISP, University of Rome Tor Vergata
Rome, Italy
{fallucchi,pazienza,zanzotto}@info.uniroma2.it
LREC 2010, Malta, May 2010
Motivations Probabilistic Ontology Learning Experimental Evaluation Conclusions
Motivation
Learning methods require large general corpora and knowledge repositories
In specific domains, ontologies are extremely poor
Manually building ontologies is a very time-consuming and expensive task
Automatically creating or extending ontologies needs large corpora and existing structured knowledge to achieve reasonable performance
Motivation

Problems
Scarcity of domains covered by existing ontologies
No relevant existing ontologies to expand for the target domain

⇓

Solution
We propose a model that can be used in different specific knowledge domains with a small adaptation effort
Our model is learned from a generic domain and can be exploited to extract new information in a specific domain
1 Motivations
2 Probabilistic Ontology Learning
  Corpus Analysis
  A Probabilistic Model
  Logistic Regression
3 Experimental Evaluation
  Experimental Set-Up
  Agreement
  Results
4 Conclusions and Future Work
Our Learner Model

The model exploits the information learned in a background domain for extracting information in an adaptation domain
The model is based on a probabilistic formulation
The model takes into consideration corpus-extracted evidence over a list of training pairs
The model is used to estimate the probabilities of new instances by computing a new feature space
Corpus Analysis

Figure: from a corpus context such as ... "dog" , as "animal" ... , the features ",", "as" are extracted for the instance pair (dog, animal) and collected in a feature space.
Corpus Analysis

Figure: the corpus-extracted evidence is arranged in a matrix whose columns are the instance pairs (X1,Y1), (X2,Y2), ..., (Xn,Yn) and whose rows are the features f1, f2, ..., fm; a cell is marked when the feature occurs in a context of the pair.
Evidence Matrix

Figure: the feature-by-instance matrix defines the evidence matrix E = (e1 ... en), where each column ei collects the corpus-extracted features of the instance pair (Xi,Yi).
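A toy sketch of the feature extraction behind this matrix; the context strings and the pattern-matching helper are invented for illustration and are not the authors' implementation.

```python
import re

def extract_feature(context, x, y):
    """Return the token span linking x to y in a context, used as one feature."""
    m = re.search(re.escape(x) + r"(.*?)" + re.escape(y), context)
    return m.group(1).strip() if m else None

# Build a tiny evidence table: features observed for each instance pair
contexts = {("dog", "animal"): ['... "dog" , as "animal" ...']}
evidence = {pair: [extract_feature(c, f'"{pair[0]}"', f'"{pair[1]}"') for c in ctxs]
            for pair, ctxs in contexts.items()}
# evidence[("dog", "animal")] holds the linking feature found in the context
```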
A Probabilistic Model

A probabilistic model for learning ontologies from corpora
An ontology is seen as a set O of relations R over pairs: Ri,j
If Ri,j is in O, then i is a concept and j is one of its generalizations

Goal: estimate the posterior probability

P(Ri,j ∈ O | E)

where E is the set of evidence extracted from the corpus
Logistic Regression

Logit
Given two variables Y and X, the probability p of Y being 1 given X = x is p = P(Y = 1|X = x), with Y ~ Bernoulli(p)

logit(p) = ln(p / (1 - p))

logit(p) = β0 + β1x1 + ... + βkxk

Given the regression coefficients, the probability is

p(x) = exp(β0 + β1x1 + ... + βkxk) / (1 + exp(β0 + β1x1 + ... + βkxk))
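The logit model and its inverse above can be sketched in a few lines; the coefficient and input values below are invented for illustration.

```python
import numpy as np

def logistic_probability(beta, x):
    """P(Y = 1 | X = x) from the logit model: logit(p) = beta0 + beta1*x1 + ... + betak*xk."""
    z = beta[0] + np.dot(beta[1:], x)   # the logit
    return np.exp(z) / (1.0 + np.exp(z))

# With beta = (0, 1) and x = (0,), the logit is 0, so p = 0.5
p = logistic_probability(np.array([0.0, 1.0]), np.array([0.0]))
```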
Estimating Regression Coefficients

We estimate the regressors β0, β1, ..., βk of x1, ..., xk with maximum likelihood estimation

logit(p) = β0 + β1x1 + ... + βkxk

solving the linear problem

l = E β

where l is the vector of logits and

E = ( 1  e11  e12  · · ·  e1n
      1  e21  e22  · · ·  e2n
      ⋮    ⋮    ⋮    ⋱    ⋮
      1  em1  em2  · · ·  emn )
Background Ontology Learner

Using a logistic regressor based on the Moore-Penrose pseudo-inverse matrix (Fallucchi and Zanzotto, RANLP 2009):

β̂ = X_CB^+ l

where:
X_CB^+ is the pseudo-inverse of the evidence matrix X_CB obtained from a generic corpus CB
l is the logit vector
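A minimal numpy sketch of this estimator, with a toy evidence matrix standing in for the corpus-derived X_CB:

```python
import numpy as np

def fit_pseudo_inverse(X_CB, p, eps=1e-6):
    """Estimate beta_hat = X_CB^+ l, where l is the logit vector of p."""
    p = np.clip(p, eps, 1 - eps)       # keep the logit finite at p = 0 or 1
    l = np.log(p / (1 - p))            # logit vector
    return np.linalg.pinv(X_CB) @ l    # Moore-Penrose pseudo-inverse solution

# Toy evidence matrix (3 training pairs x 2 features) and target probabilities
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
p = np.array([0.9, 0.1, 0.5])
beta_hat = fit_pseudo_inverse(X, p)    # least-squares fit of the logits
```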
Estimator for the Application Domain

The logit vector of the testing pairs is

l′ = α X_CA β̂

where:
α is a parameter used to adapt the model defined by the β vector to the new domain
X_CA is the evidence matrix obtained from an adaptation-domain corpus CA
β̂ is the vector of regressors

Then the probability of each testing pair is

p_i = exp(l′_i) / (1 + exp(l′_i))
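Continuing the sketch, the background-domain regressors are applied to an adaptation-domain evidence matrix; the matrices are toy values and α = 1 for simplicity.

```python
import numpy as np

def predict_adapted(X_CA, beta_hat, alpha=1.0):
    """p_i = exp(l'_i) / (1 + exp(l'_i)) with l' = alpha * X_CA @ beta_hat."""
    l = alpha * (X_CA @ beta_hat)
    return np.exp(l) / (1.0 + np.exp(l))

X_CA = np.array([[1.0, 0.0], [0.0, 1.0]])     # toy adaptation-domain evidence
beta_hat = np.array([np.log(9), -np.log(9)])  # regressors from the background step
p = predict_adapted(X_CA, beta_hat)           # probabilities of the testing pairs
```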
1 Motivations
2 Probabilistic Ontology Learning
  Corpus Analysis
  A Probabilistic Model
  Logistic Regression
3 Experimental Evaluation
  Experimental Set-Up
  Agreement
  Results
4 Conclusions and Future Work
Experimental Set-Up

1 Target Ontologies
Training: pairs that are in a hyperonym relation in WordNet ==> about 600,000 word pairs
Testing: pairs in the Earth Observation Domain ==> about 404 word pairs

2 Corpus
Training: English Web as Corpus, ukWaC (Ferraresi, 2008) ==> about 2,700,000 web pages
Testing: corpus related to the Earth Observation Domain ==> about 8,300 web pages

3 Feature Spaces
bag-of-words and n-grams
windows of length 3 tokens ==> about 280,000 features
Annotators for Testing Pairs

Three annotators (A1, A2, and A3) built three different ontologies
Two annotators are experts in the domain (A1 and A2); the third one is not (A3)
A1 and A2 have different levels of expertise: A1 is a young expert in the domain and A2 a more experienced one
Each annotator made a binary classification of 641 word pairs in the Earth Observation Domain
Only 404 pairs are found in the Earth Observation Corpus
Evaluating the Quality of Annotations

The quality of the annotation procedure is measured by inter-annotator agreement

Pairwise Agreement
Inter-annotator agreement for each pair of annotators
Contingency table

Multi-π Agreement
Inter-annotator agreement for all annotators together
Agreement table
Pairwise Agreement (404 annotations)

                 A1
                 yes   no
A2    yes         40   32    72
      no          35  297   332
                  75  329   404
pair1 = (A1, A2)

                 A1
                 yes   no
A3    yes         65   54   119
      no          10  275   285
                  75  329   404
pair2 = (A1, A3)

                 A2
                 yes   no
A3    yes         53   66   119
      no          19  266   285
                  72  332   404
pair3 = (A2, A3)

Table: Contingency tables for pairwise annotator agreement

                    Ao         Ae         kappa
pair1 = (A1, A2)    0.8341584  0.7023086  0.4429077
pair2 = (A1, A3)    0.8415842  0.6291663  0.5728117
pair3 = (A2, A3)    0.7896040  0.6322174  0.4279336

Table: Pairwise agreement
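Cohen's kappa for a 2x2 contingency table can be checked with a short script; the counts below are pair1 = (A1, A2) from the table above.

```python
def cohen_kappa(table):
    """Cohen's kappa from a 2x2 contingency table (rows: one annotator, cols: the other)."""
    n = sum(sum(row) for row in table)
    a_o = sum(table[i][i] for i in range(2)) / n                    # observed agreement Ao
    rows = [sum(table[i]) for i in range(2)]
    cols = [sum(table[i][j] for i in range(2)) for j in range(2)]
    a_e = sum(rows[i] * cols[i] for i in range(2)) / n ** 2         # expected agreement Ae
    return (a_o - a_e) / (1 - a_e)

pair1 = [[40, 32], [35, 297]]   # yes/yes, yes/no, no/yes, no/no counts for (A1, A2)
kappa = cohen_kappa(pair1)      # → approximately 0.4429077, matching the table
```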
Multi-π Agreement (404 annotations)

pairs of words             A1   A2   A3   Yes   No
(forest, terra firma)       1    1    1     3    0
(wind, process)             0    0    0     0    3
(forest, object)            0    0    0     0    3
(cloud, state)              0    1    0     1    2
(soil, object)              0    1    1     2    1
(wind, breath)              0    0    0     0    3
(wind, act)                 0    0    0     0    3
(topography, geography)     1    1    1     3    0
...                        ...  ...  ...   ...  ...
TOTAL                      75   72  119   266 (0.22)   946 (0.78)

Table: Agreement table

Multi-π agreement: Ao = 0.82382, Ae = 0.65739, kappa = 0.48577
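The multi-π (Fleiss-style) coefficient can be sketched as follows; the three rows are toy examples taken from the agreement table, not the full 404-pair set, so the resulting value differs from the reported 0.48577.

```python
def multi_pi(rows):
    """Multi-annotator pi: each row is (yes_count, no_count) over the annotators for one pair."""
    n_items = len(rows)
    n_raters = sum(rows[0])
    # observed agreement: fraction of agreeing annotator pairs, averaged over items
    a_o = sum(sum(c * (c - 1) for c in row) for row in rows) / (n_items * n_raters * (n_raters - 1))
    # expected agreement from the pooled category proportions
    total = n_items * n_raters
    props = [sum(row[k] for row in rows) / total for k in range(len(rows[0]))]
    a_e = sum(q * q for q in props)
    return (a_o - a_e) / (1 - a_e)

rows = [(3, 0), (0, 3), (1, 2)]   # e.g. (forest, terra firma), (wind, process), (cloud, state)
pi = multi_pi(rows)               # agreement over these three toy rows only
```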
Experiments

Objective
To verify whether a model computed using both a background domain and an existing ontology can be positively used to learn the isa relation in the Earth Observation Domain

We compare two systems:
WN-System: existing hyperonym links in WordNet
Our-System: our learner model

measuring their performance in replicating the three target ontologies produced by the three annotators
Results

annotators   recall     precision   f-measure
A1           0.36       0.184932    0.244344
A2           0.305556   0.150685    0.201836
A3           0.470588   0.383562    0.422642

Table: WN-System against the 3 annotators

annotators   recall      precision   f-measure
A1           0.493333    0.253425    0.334842
A2           0.4305556   0.212329    0.284404
A3           0.4369748   0.356164    0.392453

Table: Our-System against the 3 annotators
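The f-measure column is the harmonic mean of recall and precision; checking the Our-System row for A1:

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision (F1)."""
    return 2 * recall * precision / (recall + precision)

f1 = f_measure(0.493333, 0.253425)   # → approximately 0.334842, matching the table
```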
1 Motivations
2 Probabilistic Ontology Learning
  Corpus Analysis
  A Probabilistic Model
  Logistic Regression
3 Experimental Evaluation
  Experimental Set-Up
  Agreement
  Results
4 Conclusions and Future Work
Conclusions

We propose a model adaptation strategy that uses a background domain to learn isa relations in a specific domain
Experiments show that using a model identified in a background domain helps to learn the isa relation in the Earth Observation Domain
We will try to learn ontologies in other target domains