Learning domain-specific Framenets from texts Roberto Basili Cristina Giannone Diego De Cao DISP University of Rome Tor Vergata, Rome, Italy {basili,giannone,decao}@info.uniroma2.it Ontology Learning and Population Workshop ECAI 2008, Patras, July 22nd 2008




Overview Motivation The OL approach The FOLIE system Conclusions and Future Works References

Outline

Motivations
Reuse: Frames as ontology design patterns

An Unsupervised Ontology Learning model based on Framenet

Semantic Spaces (Pado and Lapata, CL 2007) and LSA
Lexical Semantics for LU and Frame modeling
Semantic disambiguation through Wordnet

Lexical Unit Classification and semantic browsing
Leave-One-Out tests
Syntactic Pattern Acquisition and Interpretation
Acquisition of Domain-Specific Frames
A Short Demo

Conclusions
Current Empirical Evidence and Future Problems


Reuse: Frames as Ontology Design Patterns

CODePs (Conceptual (or Content) Ontology Design Patterns) are general rules for coding domain knowledge. Their reuse is ensured through specialization or composition operations in the modeling of new domains.

Assuming Frames as CODePs is interesting for their tight connection with language constraints at the lexical, syntactic and semantic levels.

Most relations in (useful) ontologies can be seen as specializations of some frame, with a cleaner design effect.


Frames as Conceptual Patterns

Frames (Fillmore, 1985) are conceptual structures modeling prototypical situations. A frame is evoked in texts through the occurrence of its lexical units.

Frames and knowledge constraints

Conceptual constraints: Frames are characterized by roles, expressed as Frame Elements.
Lexical constraints: (Predicate) words evoke frames.
Semantic constraints: Predicate arguments are selectionally constrained by a system of semantic types.


An example: the KILLING frame

Frame: KILLING

A KILLER or CAUSE causes the death of the VICTIM.

Frame Elements

KILLER: John drowned Martha.
VICTIM: John drowned Martha.
MEANS: The flood exterminated the rats by cutting off access to food.
CAUSE: The rockslide killed nearly half of the climbers.
INSTRUMENT: It’s difficult to suicide with only a pocketknife.

Predicates

annihilate.v, annihilation.n, asphyxiate.v, assassin.n, assassinate.v, assassination.n, behead.v, beheading.n, bloodbath.n, butcher.v, butchery.n, carnage.n, crucifixion.n, crucify.v, deadly.a, decapitate.v, decapitation.n, destroy.v, dispatch.v, drown.v, eliminate.v, euthanasia.n, euthanize.v, . . .


Main assumptions

Ontological Assumptions

Frames as backbones of ontologically relevant predicates in a domain.

We want to automate the learning of domain-specific frames through the specialization of general Framenet predicates.

The latter act as constraints during the OL process.

Geometrical models (i.e. semantic spaces) can be used to represent frame properties.

Advantages in an inductive perspective

Semantic spaces are useful for most of the involved lexical induction steps, such as LU and sentence classification, ...

The extracted information captures the textual realisations of predicates, where (semantic) ambiguity and data sparseness are less pervasive.


Harvesting ontologies in semantic fields

Semantic Spaces: a definition

A Semantic Space for a set of N targets is a 4-tuple ⟨B, A, S, V⟩ where

B is the set of basic features (e.g. words co-occurring with the targets)

A is a lexical association function that weights the correlations between b ∈ B and the targets

S is a similarity function between targets (i.e. defined over ℜ^{|B|} × ℜ^{|B|})

V is a linear transformation over the original N × |B| matrix

Examples

In IR systems targets are documents, B is the term vocabulary, and A is the tf · idf score. The S function is usually the cosine similarity, i.e.

$$\mathrm{sim}(\vec{t}_1, \vec{t}_2) = \frac{\sum_i t_{1i} \cdot t_{2i}}{\|\vec{t}_1\| \cdot \|\vec{t}_2\|}$$

In Latent Semantic Analysis (Berry et al. 94) targets are documents (or words), and the SVD transformation is used as V
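The IR example of a semantic space can be made concrete with a minimal sketch. It assumes whitespace tokenization, idf computed as log(N / document frequency), and the identity as V (no transformation). The function names `build_space` and `cosine` and the three toy sentences are illustrative assumptions, not part of the original slides.

```python
import math
from collections import Counter


def build_space(documents):
    """Return (B, matrix): the feature set B and one tf-idf row per target."""
    tokenized = [doc.lower().split() for doc in documents]
    features = sorted({word for tokens in tokenized for word in tokens})  # B
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    n_docs = len(tokenized)
    idf = {word: math.log(n_docs / doc_freq[word]) for word in features}
    matrix = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # A: the association score of each feature with this target (tf * idf)
        matrix.append([tf[word] * idf[word] for word in features])
    return features, matrix


def cosine(u, v):
    """S: cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy targets (hypothetical sentences used as tiny "documents").
documents = [
    "the flood killed the rats",
    "the rockslide killed the climbers",
    "the police arrested the suspect",
]
features, matrix = build_space(documents)
print(cosine(matrix[0], matrix[1]))  # positive: both share the weighted feature "killed"
print(cosine(matrix[0], matrix[2]))  # zero: they share only "the", whose idf is 0
```

In this sketch a word occurring in every target gets weight 0, so similarity is driven only by the discriminating features.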


Semantic Spaces and Frame semantics

These lexicalized models correspond to useful generalizations regarding synonymy, class membership or topical similarity

As frames are rich linguistic structures, it is clear that more than one of such properties holds among members (i.e. LUs) of the same frame

Topical similarity plays a role as frames evoke events in very similar topical situations (e.g. KILLING vs. ARREST)

Synonymy is also informative as LUs in a frame can be synonyms (such as kid, child), quasi-synonyms (such as mother vs. father) and co-hyponyms

Which feature models and metrics correspond to a suitable geometrical notion of framehood?


Latent Semantic Spaces

LSA and Frame semantics

In our approach SVD is applied to source co-occurrence matrices in order to

Reduce the original dimensionality

Capture topical similarity latent in the original documents, i.e. second-order relations among targets
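A small sketch may help to see what a second-order relation looks like after the reduction. It is only an illustration under stated assumptions: a pure-Python power iteration on MᵀM stands in for a library SVD routine, and the 3 × 2 co-occurrence matrix, the target names and all function names are hypothetical. The first two targets share no feature, yet both co-occur with features of the third one.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def top_right_singular_vectors(rows, k, iterations=200):
    """Approximate the k leading right singular vectors of `rows` by
    power iteration on M^T M, re-orthogonalising against earlier vectors."""
    n = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    found = []
    for j in range(k):
        x = [1.0 / (i + j + 1) for i in range(n)]  # deterministic start vector
        for _ in range(iterations):
            for v in found:  # stay orthogonal to the vectors already found
                c = dot(x, v)
                x = [a - c * b for a, b in zip(x, v)]
            y = [dot(gram_row, x) for gram_row in gram]
            norm = math.sqrt(dot(y, y))
            if norm == 0.0:
                raise ValueError("matrix rank is lower than k")
            x = [a / norm for a in y]
        found.append(x)
    return found


def project(rows, basis):
    """Coordinates of every target in the reduced (latent) space."""
    return [[dot(row, v) for v in basis] for row in rows]


# Hypothetical co-occurrence counts: rows are targets, columns are two features.
targets = ["drown.v", "behead.v", "kill.v"]
matrix = [
    [1.0, 0.0],  # drown.v occurs only with feature 0
    [0.0, 1.0],  # behead.v occurs only with feature 1
    [1.0, 1.0],  # kill.v occurs with both
]
basis = top_right_singular_vectors(matrix, k=1)
reduced = project(matrix, basis)

print(cosine(matrix[0], matrix[1]))    # similarity in the original space
print(cosine(reduced[0], reduced[1]))  # similarity in the 1-dimensional latent space
```

In the original space the first two targets are orthogonal. In the reduced space they fall on the same latent dimension because both are linked through the third target.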


LSA: semantic interpretation

LSA and PCA

SVD lets the principal components of the distribution emerge

Principal components are linear combinations of the original dimensions, i.e. pseudo concepts


Harvesting ontologies in semantic fields

Framehood in a semantic space

Frames are rich polymorphic classes and clustering is applied to obtain multiple centroids

Regions of the space where LUs manifest are also useful for capturing sentences including the intended predicate semantics
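One possible reading of the multiple-centroid idea is sketched below. The slides do not say which clustering algorithm is used, so plain k-means with a deterministic initialisation is assumed here. The 3-dimensional LU vectors, the number of clusters per frame and the names `kmeans` and `classify` are all hypothetical. A new vector is assigned to the frame that owns the most similar centroid.

```python
import math


def mean(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def kmeans(points, k, iterations=20):
    """Plain k-means; initial centroids are evenly spaced input points."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(cl) if cl else centroids[c] for c, cl in enumerate(clusters)]
    return centroids


def classify(vector, frame_centroids):
    """Return the frame whose best-matching centroid is most similar to `vector`."""
    return max(
        frame_centroids,
        key=lambda frame: max(cosine(vector, c) for c in frame_centroids[frame]),
    )


# Hypothetical LU vectors: KILLING has two distinct regions, ARREST has one.
killing_lus = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 1.0, 0.0), (0.1, 0.9, 0.0)]
arrest_lus = [(0.0, 0.0, 1.0), (0.1, 0.0, 0.9)]

frames = {
    "KILLING": kmeans(killing_lus, k=2),
    "ARREST": kmeans(arrest_lus, k=1),
}

print(classify((0.2, 0.8, 0.1), frames))
print(classify((0.0, 0.1, 1.0), frames))
```

With a single centroid, the two regions of KILLING would be averaged into a point that lies in neither of them. Keeping one centroid per region preserves the polymorphic structure of the frame.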


Challenges: Semantic ambiguity

Sources of Semantic Ambiguity

Word occurrences in texts

Multiply classified LUs in Framenet

Argument semantics (e.g. to kill the head vs. the leader)

Role interpretation

Solutions

Latent Semantic spaces let the most significant topical similarity emerge in the model

Semantic similarity can be computed in Wordnet given a proper notion of context


Generalizing Syntactic Arguments through Wordnet

Task Definition

Given a lexical unit lu ∈ F, and one of its syntactic relations r

Determine the suitable generalizations α in WN able to subsume most of the fillers F_r of r

Lexical Fillers for the Subj of kill

Synset ID | Generalization                                        | CD Score | Cluster of Lexical Fillers
----------+-------------------------------------------------------+----------+------------------------------
12718325  | fire, flame, flaming                                  | 1.00     | blaze, fire
6859884   | explosion, detonation, blowup                         | 0.25     | explosion, blast
3945064   | rocket                                                | 0.13     | rocket, missile
5921623   | grammatical category, syntactic category              | 0.05     | agent, number
914938    | attack, onslaught, onset, onrush                      | 0.05     | shelling, attack, fire
7701234   | military unit, military force, military group, force  | 0.04     | force, troop, army unit, troop
9770195   | policeman, police officer, officer                    | 0.04     | police officer
9424359   | executive, executive director                         | 0.04     | president, minister


Generalizing Arguments through Wordnet

Solution

For each dependency relation r, and the corresponding set of lexical fillers F_r, the semantic similarity in F_r is computed according to the conceptual density metric (Basili et al., 2004).

Given F_r, a synset α in Wordnet used to generalize n different nouns w ∈ F_r, the conceptual density, cd_{F_r}(α), of α with respect to F_r is defined as:

$$cd_{F_r}(\alpha) = \frac{\sum_{i=0}^{h} \mu^{i}}{area(\alpha)}$$

where h is the estimated depth of a tree able to generalize the n nouns, i.e.

$$h = \begin{cases} \lfloor \log_{\mu} n \rfloor & \text{iff } \mu \neq 1 \\ n & \text{otherwise} \end{cases}$$

µ is the average branching factor in the Wordnet subhierarchy dominated by α, and area(α) is the number of nodes in the α subhierarchy.
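The formula can be checked numerically with a short sketch. It takes µ, n and area(α) as already known numbers, so no Wordnet access is involved. The function names, the restriction to µ ≥ 1 and the candidate values are assumptions made for illustration. The floor of the logarithm is computed with an integer loop to avoid floating-point rounding at exact powers.

```python
def estimated_depth(mu, n):
    """h = floor(log_mu(n)) when mu != 1, and n otherwise."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if mu < 1:
        raise ValueError("this sketch assumes a branching factor mu >= 1")
    if mu == 1:
        return n
    h = 0
    while mu ** (h + 1) <= n:  # largest h such that mu**h <= n
        h += 1
    return h


def conceptual_density(mu, n, area):
    """cd(alpha) = (sum of mu**i for i = 0..h) / area(alpha)."""
    if area <= 0:
        raise ValueError("area must be positive")
    h = estimated_depth(mu, n)
    return sum(mu ** i for i in range(h + 1)) / area


# Hypothetical candidate generalizations: (mu, n covered nouns, area).
candidates = {
    "compact_synset": (2.0, 8, 15),
    "broad_synset": (2.0, 3, 30),
}
ranked = sorted(
    candidates,
    key=lambda name: conceptual_density(*candidates[name]),
    reverse=True,
)
for name in ranked:
    print(name, conceptual_density(*candidates[name]))
```

A synset that covers many fillers with a small subhierarchy obtains a higher density than one that covers few fillers with a large subhierarchy, which is the ordering used to rank generalizations.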


An example of CD estimation


Semantic disambiguation and LU classification

Distributional information and Wordnet

The algorithm that decides which frames best characterize a candidate (unknown) lexical unit works as follows:

(Candidate) LUs are first modeled in a semantic space

The set of known LUs for a frame F is the input of a clustering process, resulting in a set of clusters for each F

Vectors of the candidate LU l are compared with the clusters C_i, and the best K frames are retained

CD is computed over the set l ∪ C_i and the corresponding frames are ranked accordingly

The best k frames of this ranking are suggested as possible frames for l
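The two-stage ranking described above could be sketched as follows. This is an illustration under assumptions, not the FOLIE implementation: clusters are represented by centroid vectors, similarity is cosine, and `cd_score` is a hypothetical callable standing in for the conceptual density computed over l ∪ C_i.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0


def rank_frames(lu_vector, frame_centroids, cd_score, K=3, k=2):
    """frame_centroids: dict frame -> list of cluster centroid vectors.
    cd_score: hypothetical callable frame -> float (CD over l and C_i)."""
    # Stage 1: distributional shortlist, best cluster similarity per frame.
    sims = {
        frame: max(cosine(lu_vector, c) for c in centroids)
        for frame, centroids in frame_centroids.items()
    }
    shortlist = sorted(sims, key=sims.get, reverse=True)[:K]
    # Stage 2: rerank the K retained frames by conceptual density.
    return sorted(shortlist, key=cd_score, reverse=True)[:k]


# Toy data: three frames with one centroid each, made-up CD scores.
centroids = {
    "A": [np.array([1.0, 0.0])],
    "B": [np.array([1.0, 1.0])],
    "C": [np.array([0.0, 1.0])],
}
toy_cd = {"A": 0.1, "B": 0.9, "C": 1.0}
print(rank_frames(np.array([1.0, 0.0]), centroids, toy_cd.get, K=2, k=1))
```

In the toy run frame C has the highest CD score but is never considered, because it does not survive the distributional shortlist; this is the intended effect of applying the Wordnet-based score only to the K retained frames.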


The FOLIE approach

The processing cascade

LU classification

Geometrical modeling in the word-based LSA space

Clustering applied as a model of polymorphic frames

Synonymy-based distance

Classification (in a k-NN perspective)

Extraction of Syntactic Collocations and Acquisition of Syntactic Patterns

Argument Generalization through WN

Argument Interpretation via FE’s

Compilation of Domain-specific Frames


The overall architecture


Current experimental set-up

The Corpus

TREC 2005 vol. 2

# of docs: about 230,000

# of tokens: about 110,000,000 (more than 70,000 types)

Source Dimensionality: 230,000 × 49,000

LSA Dimensionality reduction: 7,700 × 100

Syntactic and Semantic Analysis

Parsing: Minipar (Lin, 1998)

Synonymy, hyponymy info: Wordnet 1.7

Semantic Similarity Estimation: CD library (Basili et al., 2004)

Framenet: 2.0 version
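As a reminder of what the LSA dimensionality reduction step does, here is a minimal NumPy sketch based on a truncated SVD. It is an assumption about the general technique only: the slide does not say how the 7,700 × 100 matrix was obtained or which items its rows stand for, and the function name `lsa_reduce` is hypothetical.

```python
import numpy as np


def lsa_reduce(matrix, k):
    """Project the rows of a co-occurrence matrix onto the top-k
    singular directions (truncated SVD, the core of LSA)."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    # Each row becomes a k-dimensional vector scaled by the singular values.
    return u[:, :k] * s[:k]


# Toy 4 x 3 count matrix (rows: items to embed, columns: contexts).
counts = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],
    [0.0, 3.0, 0.0],
    [0.0, 1.0, 0.0],
])
print(lsa_reduce(counts, 2).shape)
```

A dense SVD like this is only practical for small matrices; a source matrix of the size reported above would presumably need a sparse or iterative solver.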


LU Classification: the LOO Test

The experiment set-up

English

Number of frames: 701

Number of LUs: 8462 (nouns: 3524; verbs: 3591; adjectives: 1347)

Most likely frames: Self_Motion (p=0.015), Clothing (p=0.014)
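A leave-one-out (LOO) test of this kind can be sketched as below. The harness and the majority-frame baseline are illustrative assumptions, not the evaluation code used for the slides: `classify` is a hypothetical callable that returns a ranked list of frames for a held-out LU given the remaining LUs.

```python
from collections import Counter


def leave_one_out_accuracy(lu_to_frames, classify, k=1):
    """Hold out each LU in turn and check whether one of its gold frames
    is among the top-k frames proposed from the remaining LUs."""
    if not lu_to_frames:
        raise ValueError("no lexical units to evaluate")
    hits = 0
    for lu, gold_frames in lu_to_frames.items():
        training = {o: f for o, f in lu_to_frames.items() if o != lu}
        predicted = classify(lu, training)[:k]
        if gold_frames.intersection(predicted):
            hits += 1
    return hits / len(lu_to_frames)


def majority_baseline(lu, training):
    """Ignore the LU and propose frames by frequency in the training set."""
    counts = Counter(f for frames in training.values() for f in frames)
    return [frame for frame, _ in counts.most_common()]


# Toy lexicon (made up for illustration, not Framenet statistics).
toy_lexicon = {
    "kill.v": {"Killing"},
    "murder.v": {"Killing"},
    "slay.v": {"Killing"},
    "walk.v": {"Self_motion"},
}
print(leave_one_out_accuracy(toy_lexicon, majority_baseline, k=1))
```

A frequency baseline of this sort is what the "most likely frames" probabilities suggest as a point of comparison; any real classifier plugged in as `classify` should beat it.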


LOO Test: Results


LU Classification Results

Outcomes

A geometrical model capturing topical similarity provides significant evidence for classifying novel LUs not yet covered by Framenet

Lexical synonymy helps improve the distributional model at the expense of a slight reduction in coverage

The flexibility of the model allows it to be applied to non-English Framenets as well, such as the Italian one, since Wordnet covers more languages

Future Work

Test different distributional models, e.g. syntactic ones (Pado & Lapata, 2007), for more robust classification

Devise more complex linear transformation methods, better suited for locality properties (e.g. LPP in (Basili et al., TIR 08))


Generalizing Arguments through Wordnet

Task Definition

Given a lexical unit lu ∈ F and one of its syntactic relations r, determine the suitable generalizations α in WN able to subsume most of the fillers F_r of r


Argument interpretation

Task Definition

Given a lexical unit lu ∈ F, one of its syntactic relations r and a generalization α in WN, determine the suitable frame element FE for (lu, r, α) in F.

Solution

For each pair (r, α) in a pattern, its semantic similarity with respect to every FE is computed.

The similarity is established between the set of fillers F(r,α) and the lexical representatives Ω_FE of FE, such as its label, its semantic type constraint or the nominal head of its definition.

For all FEs, conceptual density scores over Ω_FE are computed and the argmax is returned, i.e.

\[ FE(r,\alpha) = \arg\max_{FE} \; \max_{l \in \Omega_{FE}} \; cd\big(l \cup F(r,\alpha)\big) \]
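The argmax above can be written compactly in Python. This is a sketch under assumptions: `cd` is a hypothetical callable that scores a set of nouns (in FOLIE it would be the Wordnet-based conceptual density), and the toy scoring function and word lists below are invented purely to make the example runnable.

```python
def interpret_argument(fillers, fe_representatives, cd):
    """Return the FE whose best lexical representative l maximizes
    cd({l} | fillers), following the argmax/max formula.

    fillers: set of nouns F(r, alpha)
    fe_representatives: dict FE name -> non-empty list of representatives
    cd: hypothetical callable set-of-nouns -> float
    """
    def best_score(fe):
        return max(cd({l} | fillers) for l in fe_representatives[fe])

    return max(fe_representatives, key=best_score)


def toy_cd(words):
    """Stand-in for conceptual density: rewards sets that mix a
    person-like representative with person-like fillers."""
    return 1.0 if {"person", "soldier"} <= words else 0.1


fillers = {"soldier", "civilian"}
representatives = {
    "VICTIM": ["victim", "person"],
    "CAUSE": ["cause", "event"],
}
print(interpret_argument(fillers, representatives, toy_cd))
```

The inner max means a frame element is judged by its single best representative, so adding a poor representative to Ω_FE cannot lower that element's score.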


Frames as Conceptual Patterns

An example: the killing frame

Frame: KILLING

A KILLER or CAUSE causes the death of the VICTIM.

Frame Elements

CAUSE        The rockslide killed nearly half of the climbers.
INSTRUMENT   It’s difficult to suicide with only a pocketknife.
KILLER       John drowned Martha.
MEANS        The flood exterminated the rats by cutting off access to food.
MEDIUM       John drowned Martha.

Predicates

annihilate.v, annihilation.n, asphyxiate.v, assassin.n, assassinate.v, assassination.n, behead.v, beheading.n, bloodbath.n, butcher.v, butchery.n, carnage.n, crucifixion.n, crucify.v, deadly.a, decapitate.v, decapitation.n, destroy.v, dispatch.v, drown.v, eliminate.v, euthanasia.n, euthanize.v, . . .


Argument interpretation: direct objects vs. VICTIM

Argument interpretation: direct objects vs. CAUSE


Acquisition of Semantic Patterns

The local detection of FE mappings for every dependency relation in a pattern supports the induction of its global interpretation

Semantic Patterns Induction: a definition

Given a lexical unit lu ∈ F and a pattern made of all the syntactic relations r, the corresponding generalizations α in WN, and the FEs, determine the suitable frame structure, made of roles and semantic type constraints.

Solution

Solve the joint model over the preferences for dependencies (r), WN synsets (α) and FE assignments (f_α) of a given syntactic pattern p, i.e.

\[ SemPatt(lu,p) = \arg\max_{(r,\alpha)} \prod_{r \in p} \sigma_1(\alpha \mid r) \cdot \sigma_2\big(f_\alpha \mid (r,\alpha)\big) \]

where the σ_i are confidence measures over the individual inductive steps
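One way to read the joint model is as a search over combinations of per-relation choices, scored by the product of the two confidences. The sketch below enumerates all combinations exhaustively; the data layout, the function name `sem_patt` and the σ values are assumptions made for illustration (only the CD-like figures 1.0 and 0.04 echo the subject table for kill).

```python
from itertools import product
from math import prod


def sem_patt(candidates):
    """candidates: dict relation r -> list of (alpha, fe, sigma1, sigma2).
    Returns the assignment {r: (alpha, fe)} maximizing the product of
    sigma1 * sigma2 over all relations, together with its score."""
    relations = list(candidates)
    best, best_score = None, -1.0
    for combo in product(*(candidates[r] for r in relations)):
        score = prod(s1 * s2 for _, _, s1, s2 in combo)
        if score > best_score:
            best_score = score
            best = {r: (a, fe) for r, (a, fe, _, _) in zip(relations, combo)}
    return best, best_score


# Hypothetical candidates for a pattern of kill with subject and object.
toy_pattern = {
    "subj": [("fire", "CAUSE", 1.0, 0.8), ("military unit", "KILLER", 0.04, 0.9)],
    "obj": [("person", "VICTIM", 0.5, 0.9)],
}
print(sem_patt(toy_pattern))
```

Because the score is a plain product of independent factors, picking the best candidate separately for each relation would give the same result; the exhaustive enumeration is kept here only because it leaves room for cross-relation constraints, such as forbidding two relations from sharing one frame element.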


An example of multiple semantic interpretations


Compiling Domain-specific events in OWL


The FOLIE system on-line

... a quick tour in the demo


Conclusions

FOLIE is a system for the acquisition of large-scale framenets from domain corpora

It embodies and makes usable several ideas from (a large body of) machine learning and LA literature

Advanced kernels (i.e. Latent Semantic spaces) for predicate detection and classification

Sentence collection/retrieval through the duality properties of the LSA modeling

Lexical Pattern acquisition (dates back to (Grishman & Sterling, COLING 92) or (Basili et al., ANLP 92))

Unsupervised Semantic disambiguation over Wordnet through syntactic constraints and topological (n-ary) measures


Conclusions

Novel Issues in FOLIE

Lexical Semantics is used in FOLIE to drive the LU classification task, as in Detour

Semantic disambiguation of syntactic arguments is driven by the Wordnet topology, via the CD estimates (Basili et al., 2004)

Conceptual density is also used to suggest the interpretations of predicate arguments in terms of semantically similar Frame Elements

The process thus relies on a combination of distributional and paradigmatic information that tries to optimize the overall evidence available without manual tagging

Lexicalized processes are more readable and usable by the user

Missing things

Evaluation: any suggestion?

Application to Question Answering

Add some supervision through kernel-based classifiers


Future Work

Modeling

Modeling predicate argument interpretation also via distributional models, as suggested in the ISP research of Hovy and colleagues on paraphrase patterns (see Basili et al., RANLP 07)
Improving the robustness of the model via probabilistic interpretations of the joint model for semantic pattern acquisition
Modeling in OWL the grammatical knowledge acquired during the process, as a side effect of the acquisition process

Applications to non-English languages

Apply FOLIE in the development of non-English Framenets
Exploit the multilingual settings available for Wordnet to compile large-scale LU repositories
Use non-English corpora by parsing and inducing lexicalized domain-specific patterns (see Basili et al., STEP 2008, forthcoming)


References

Semantic Spaces

R. Basili, P. Marocco and D. Milizia, Semantically rich spaces for document clustering, Proceedings of the DEXA Workshop on Text-based Information Retrieval, Torino, September 2008.
Marco Pennacchiotti, Diego De Cao, Paolo Marocco, Roberto Basili, Towards a Vector Space Model for FrameNet-like Resources, Proceedings of the LREC Conference 2008, May 2008, Marrakesh, Morocco.
Diego De Cao, Danilo Croce, Marco Pennacchiotti, Roberto Basili, Combining word sense and usage for modeling frame semantics, Proceedings of the Symposium On Semantics In Systems For Text Processing (STEP 08), September 22-24, 2008, Venice, Italy.
Marco Pennacchiotti, Diego De Cao, Roberto Basili, Danilo Croce, Michael Roth, Automatic induction of FrameNet lexical units, Proceedings of the Int. Conference on EMNLP, Hawaii, USA, October 2008.


References (2)

Development of a Framenet-based Ontology

Roberto Basili, Cristina Giannone, Diego De Cao, Learning domain-specific framenets from texts, Proceedings of the ECAI WS on Ontology Learning and Population, Patras, Greece, 2008.
Roberto Basili, Cristina Giannone, Diego De Cao, Language-driven ontology learning (System Demo Paper), in Proceedings of EKAW 2008, Aci Trezza, Sicily, September 2008.

Kernel-based CoNLL

Roberto Basili, Marco Cammisa, Alessandro Moschitti, A Semantic Kernel to classify texts with very few training examples, Informatica, 2006. ISSN: 0350-5596.
Alessandro Moschitti, Daniele Pighin and Roberto Basili, Tree Kernels for Semantic Role Labeling, Special Issue on Semantic Role Labeling, Computational Linguistics Journal, MIT Press for ACL, June 2008, Vol. 34, No. 2, Pages 145-159.