AAAI 2002 WS
Peppering knowledge sources with SALT
(Boosting conceptual content for ontology generation)
Deryle Lonsdale, Yihong Ding, David W. Embley, Alan Melby
Brigham Young University
lonz@byu.edu, {ding,embley}@cs.byu.edu, [email protected]


Acknowledgements
  • Co-authors (Embley, Ding)
  • EU Fifth Framework IST/HLT 3.4.1
  • NSF Information and Intelligent Systems grant IIS-0083127
  • Gerhard Budin (Eurodicautom data)
  • Sergei Nirenburg (Mikrokosmos ontology)

Outline
  • Termbases and lexicons: (re)use(s)
  • The SALT and TIDIE projects
  • Data modeling and data resources
  • Termbase conversion
  • Ontology generation
  • Results and evaluation
  • Conclusions

Termbases
  • Terminology databases for humans in the multilingual documentation industry
  • Several models and formats; often concept-oriented in nature
  • Termium, Eurodicautom, etc.

Lexicons
  • NLP applications: IR, MT, NLU, speech understanding
  • Widely varying data formats
  • Description at various levels of linguistic theory

Sharing resources
  • Integration is the trend
      • Lexicons (OLIF for MT system lexicons)
      • Termbases (MARTIF for human termbases)
      • Lexicons and termbases
  • Needed: principled data-modeling approach
      • Wide variety of information to be treated
      • Wide range of formats currently in use

The SALT project
  • SALT: Standards-based Access service to multilingual Lexicons and Terminologies (www.ttt.org/salt)
  • International cooperation, standards for coding and interchange of linguistic data, and the combining of technologies
  • Several partners (BYU TRG, KSU, etc.)
  • Data modeling approach to address the problem of interchange among diverse collections of such data, including their ontological substructure

The SALT approach
  • Goal: provide
      1) Modularity: differentiate core structure vs. data category specification
      2) Coherence: use a meta-model
      3) Flexibility: support interoperable alternative representations
  • Modular meta-model approach
  • Implemented in various settings
  • Ongoing refinement: model’s coverage

The TIDIE project
  • TIDIE: Target-based Independent-of-Document Information Extraction (www.deg.byu.edu)
  • Ontology-based data extraction
  • Conceptual modeling of real-world applications
  • Narrow, data-rich domains
  • Leverage (or build) custom ontologies for target-based extraction

Information exchange
[Figure: information exchange between a Source and a Target, involving Information Extraction and Schema Matching; annotations read "Leverage this …" and "… to do this".]

Information Extraction
  • Examine/retrieve information from documents to fill in information from a user-supplied template
  • Requires some user-oriented specification of information
  • Our approach: finding, extracting, structuring, and synthesizing information is easier given a conceptual-model-based ontology


Extracting pertinent information from documents

A Conceptual Modeling Solution
[Figure: conceptual model for car ads. Car "has" Year, Make, Model, Mileage, Price, and Feature; PhoneNr "is for" Car; PhoneNr "has" Extension. Edges carry participation constraints (0..1, 0..*, 1..*).]

Car-Ads Ontology

    Car [->object];
    Car [0..1] has Year [1..*];
    Car [0..1] has Make [1..*];
    Car [0..1] has Model [1..*];
    Car [0..1] has Mileage [1..*];
    Car [0..*] has Feature [1..*];
    Car [0..1] has Price [1..*];
    PhoneNr [1..*] is for Car [0..*];
    PhoneNr [0..1] has Extension [1..*];
    Year matches [4]
      constant {extract "\d{2}";
                context "([^\$\d]|^)[4-9]\d,[^\d]";
                substitute "^" -> "19"; },
      …
    End;
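As a rough illustration of how the Year data frame above might behave, the following Python sketch applies the slide's two regular expressions and then prefixes "19". The function name `extract_years` and the way the three clauses are chained are assumptions made for this example, not the extraction engine's actual implementation.

```python
import re

# Patterns copied from the Year data frame on the slide.
CONTEXT = re.compile(r"([^\$\d]|^)[4-9]\d,[^\d]")  # two-digit year in context
EXTRACT = re.compile(r"\d{2}")                      # the two digits themselves


def extract_years(text):
    """Return four-digit years recognized by the Year data frame (hypothetical helper)."""
    years = []
    for ctx in CONTEXT.finditer(text):
        two_digits = EXTRACT.search(ctx.group(0))
        if two_digits:
            # substitute "^" -> "19": prepend the century to the extracted digits
            years.append("19" + two_digits.group(0))
    return years


print(extract_years("Subaru SW '89, auto, AC"))
```

The context pattern rejects digits preceded by a dollar sign or another digit, so prices and four-digit numbers are not mistaken for two-digit years.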

Recognition and Extraction

    Car   Year Make Model Mileage Price PhoneNr
    0001  1989 Subaru SW $1900 (363)835-8597
    0002  1998 Elandra (336)526-5444
    0003  1994 HONDA ACCORD EX 100K (336)526-1081

    Car   Feature
    0001  Auto
    0001  AC
    0002  Black
    0002  4 door
    0002  tinted windows
    0002  Auto
    0002  pb
    0002  ps
    0002  cruise
    0002  am/fm
    0002  cassette stero
    0002  a/c
    0003  Auto
    0003  jade green
    0003  gold

Lexical resources for data modeling
  • Information extraction also requires knowledge representations with terminological and conceptual content.
  • Extraction ontology knowledge sources must:
      • be of a general nature
      • contain meaningful relationships
      • already exist in machine-readable form
      • have a straightforward conversion into XML
  • This paper: create, leverage large-scale termbase
      • some ontological structure
      • reformatted according to the SALT standard
      • converted into μK-compliant XML for use by the ontology generator

Eurodicautom
  • Well-known, widely-used termbase
      • > 1 million concept entries
      • Wide range of topics
      • Entries are multilingual
  • Entry information: sources cited, input/approval dates, …
  • Single-word terms (e.g. “generator”) or multi-word expressions (e.g. “black humus”)
  • Entries each have Lenoch subject-area code
      • Hierarchical representation for classifying terms (and by extension their related concepts)

Partial Eurodicautom entry

    %%CM AG4 CH6 GO6
    %%DA
    %%VE lavmosetørv
    %%RF A.Klougart
    %%EN
    %%VE black humus
    %%RF CILF,Dict.Agriculture,ACCT,1977
    %%IT
    %%VE humus nero
    %%RF BTB
    %%ES
    %%VE humus negro
    %%RF CILF,Dict.Agriculture,ACCT,1977
    %%SV
    %%VE sumpjord
    %%RF Mats Olsson,SLU(1997)
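A native entry like the one above could be read with a few lines of Python. This is only a sketch: it assumes that `CM` carries the subject codes (as a later slide states), that `VE` is the term and `RF` its cited source, and that any other two-letter tag opens a language section. The function name `parse_entry` is hypothetical.

```python
def parse_entry(raw):
    """Split a %%-delimited entry into subject codes and per-language terms."""
    entry = {"codes": [], "terms": {}}
    lang = None
    for field in raw.split("%%"):
        field = field.strip()
        if not field:
            continue
        tag, _, value = field.partition(" ")
        if tag == "CM":            # subject-area (Lenoch) codes
            entry["codes"] = value.split()
        elif tag == "VE":          # assumed: the term itself
            entry["terms"].setdefault(lang, {})["term"] = value
        elif tag == "RF":          # assumed: the cited source
            entry["terms"].setdefault(lang, {})["source"] = value
        else:                      # assumed: a language tag such as DA, EN, IT
            lang = tag
    return entry


sample = ("%%CM AG4 CH6 GO6%%DA%%VE lavmosetørv%%RF A.Klougart"
          "%%EN%%VE black humus%%RF CILF,Dict.Agriculture,ACCT,1977")
print(parse_entry(sample)["terms"]["EN"]["term"])
```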

Sample Lenoch codes

    AD      Public Administration - Private Administration - Offices
    AD1     general aspects of the subject field
    AD2     public and private organisations
    AD3     publications & documentary search
    AD31    documentation and information systems
    AD4     administrative staff
    AD5     public procurement
    AD51    expropriation in the public interest

    TEH     testing methods
    TEH1    general aspects of testing methods
    TEH2    non-destructive testing
    TEH21   chemical tests
    TEH22   photometrical testing
    TEH221  X-ray spectrometrical testing
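The sample codes suggest that each trailing digit adds one level to the hierarchy (TEH221 under TEH22 under TEH2 under TEH). Under that assumption, which is inferred from the sample rather than taken from a Lenoch specification, parent/child pairs can be derived as in this sketch; `parent_code` and `is_a_pairs` are hypothetical names.

```python
def parent_code(code):
    """Return the parent of a Lenoch-style code, or None for a top-level code."""
    # Assumption: letters name the top-level field; each trailing digit is one level.
    return code[:-1] if code[-1].isdigit() else None


def is_a_pairs(labels):
    """Yield (child label, parent label) pairs for codes whose parent is known."""
    pairs = []
    for code, label in labels.items():
        parent = parent_code(code)
        if parent in labels:
            pairs.append((label, labels[parent]))
    return pairs


labels = {
    "TEH": "testing methods",
    "TEH2": "non-destructive testing",
    "TEH22": "photometrical testing",
    "TEH221": "X-ray spectrometrical testing",
}
for child, parent in is_a_pairs(labels):
    print(child, "IS-A", parent)
```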

Converting the termbase
  • Use several thousand English terms and their subject codes
  • %%CM line lists three Lenoch codes:
      • AG4 (representing the subclass AGRONOMY)
      • CH6 (representing ANALYTICAL-CHEMISTRY)
      • GO6 (representing GEOMORPHOLOGY)
  • Convert termbase entries via the SALT-developed TBX termbase exchange framework
      • XML-based refinement of MARTIF
  • Convert to μK XML format used by ontology engine
  • Result: TBX-mediated conversion from native Eurodicautom terms to the final XML-specified ontology (μK)
  • Lenoch codes re-interpreted as typical hierarchical relations (e.g. IS-A and SUBCLASS)

Conversion process
[Figure: Eurodicautom (native) → Eurodicautom (TBX) → Eurodicautom (μK); the diagram is also labeled "Lenoch" and "SALT".]

Eurodicautom-TBX encoding

    <?xml version="1.0" encoding="UTF-8"?>
    <martif xmlns="x-schema:XLTcsV04.xml" lang="en" type="DXLT">
      <martifHeader>
        <fileDesc>
          <sourceDesc>
            <p>sample Eurodicautom entry</p>
          </sourceDesc>
        </fileDesc>
        <encodingDesc>
          <p type="DCSName">DXLTdv04.xml</p>
        </encodingDesc>
      </martifHeader>
      <text>
        <body>
          <termEntry id="eid-EDIC-BTB-DAG77-63">
            <admin type="originatingInstitution">BTB</admin>
            <admin type="projectSubset">DAG77</admin>
            <descrip type="reliabilityCode">4</descrip>
            <langSet lang="pt">
              <ntig>
                <termGrp>
                  <term>souto</term>
                  <termNote type="termType">fullForm</termNote>
                </termGrp>
                <admin type="conceptID">BTB-DAG77-63</admin>
                <admin target="bib-Mock" type="sourceIdentifier">V.Correia,Engº Agrónomo,PDR Vale do Lima</admin>
              </ntig>
              <ntig>
                <termGrp>
                  <term>minifúndio</term>
                  <termNote type="termType">fullForm</termNote>
                </termGrp>
                <admin type="conceptID">BTB-DAG77-63</admin>
                <admin target="bib-Mock" type="sourceIdentifier">V.Correia,Engº Agrónomo,PDR Vale do Lima</admin>
              </ntig>
            </langSet>
          </termEntry>

Derived XML ontology

    <RECORD>
      <CONCEPT>xenobiotic substances</CONCEPT>
      <SLOT>SUBCLASSES</SLOT>
      <FACET>VALUE</FACET>
      <FILLER>hazardous raw materials</FILLER>
      <UID>0</UID>
    </RECORD>
    <RECORD>
      <CONCEPT>physical nuisances</CONCEPT>
      <SLOT>SUBCLASSES</SLOT>
      <FACET>VALUE</FACET>
      <FILLER>ambient light</FILLER>
      <UID>0</UID>
    </RECORD>
    <RECORD>
      <CONCEPT>financial statistics</CONCEPT>
      <SLOT>IS-A</SLOT>
      <FACET>VALUE</FACET>
      <FILLER>economic statistics</FILLER>
      <UID>0</UID>
    </RECORD>
    ….
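Records in the shape shown above can be produced with Python's standard `xml.etree.ElementTree`. The sketch below is illustrative only: the helper names `make_record` and `hierarchy_records` are hypothetical, and emitting both an IS-A record and its inverse SUBCLASSES record for each parent/child pair is an assumption about how the hierarchy was serialized.

```python
import xml.etree.ElementTree as ET


def make_record(concept, slot, filler, uid="0"):
    """Build one <RECORD> element with the five child elements seen on the slide."""
    record = ET.Element("RECORD")
    for tag, text in (("CONCEPT", concept), ("SLOT", slot),
                      ("FACET", "VALUE"), ("FILLER", filler), ("UID", uid)):
        ET.SubElement(record, tag).text = text
    return record


def hierarchy_records(child, parent):
    """Assumed convention: one IS-A record plus the inverse SUBCLASSES record."""
    return [make_record(child, "IS-A", parent),
            make_record(parent, "SUBCLASSES", child)]


for rec in hierarchy_records("financial statistics", "economic statistics"):
    print(ET.tostring(rec, encoding="unicode"))
```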

Ontology generation
  • Goal: specify an ontology for information extraction purposes
  • Problem: complex, tedious, costly
  • Ideally: automatically generate schemas, ontologies
  • Source: natural-language text, tables, etc.


Ontology generation overview

Knowledge sources
  • Mikrokosmos (μK) ontology
      • About 5,000 hierarchically-arranged concepts
      • Fairly high connectivity (about 14 inter-concept links per node)
      • Fairly general content, inheritance of properties
  • Data frame library
      • regular-expression templates for matching structured low-level lexical items (e.g. measurements, dates, currency expressions, and phone numbers)
      • provide information for conceptual matching via inheritance
  • Lexicons (e.g. onomastica, WordNet synsets)
  • Domain-specific training documents


Knowledge integration

Methodology
  • Preprocess input knowledge sources:
      • Integrate: map lexicon content and data frame templates to nodes in the merged ontology
      • Extract: match information from the training document collection
      • Parse, tokenize, regularize lexical content
  • Generate the ontology: four-stage generation process
      • concept selection
      • relationship retrieval
      • constraint discovery
      • refinement of the output ontology


Processing input documents

Concept selection
  • Finding which subset of the ontology’s concepts is of interest to a user
  • Concepts are selected via string matches between textual content and the ontological data.
  • Three different selection heuristics
      • concept-name matching
      • concept-value matching
      • data-frame pattern matching
  • String matching plus:
      • word synonym matching: WordNet synonym sets
      • multi-word term matching: bag-of-words (CAPITAL-CITY is considered a synonym of capital and city)
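The concept-name heuristic with bag-of-words matching can be sketched in a few lines of Python. This is a simplified illustration, not the system's code: `select_concepts` is a hypothetical name, and the optional `synonyms` dictionary merely stands in for WordNet synonym sets.

```python
import re


def concept_words(name):
    """Split a concept name into its bag of words: CAPITAL-CITY -> {capital, city}."""
    return {w.lower() for w in re.split(r"[-_\s]+", name) if w}


def select_concepts(text, concept_names, synonyms=None):
    """Select every concept with at least one name word (or synonym) in the text."""
    synonyms = synonyms or {}   # hypothetical stand-in for WordNet synsets
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    selected = []
    for name in concept_names:
        words = concept_words(name)
        for word in list(words):
            words |= set(synonyms.get(word, ()))
        if words & tokens:
            selected.append(name)
    return selected


names = ["CAPITAL-CITY", "FINANCIAL-CAPITAL", "GEOGRAPHICAL-AREA", "POPULATION"]
print(select_concepts("Area: 648,000 sq. km. Capital--Kabul", names))
```

Because any single shared word triggers a match, the word "Capital" selects both CAPITAL-CITY and FINANCIAL-CAPITAL, which is why a later conflict-resolution step is needed.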

Concept selection algorithm

    PROCEDURE ConceptSelection(Tdoc, Kbase)
      SourceDoc = Parse(Tdoc);
      PrimarySelectedConceptsList = MikroSelection(M-Ontology);
      SecondarySelectedConceptsList = DataFrameSelection(DF-Library);
      ConflictHandling();
      SelectedSubgraphGeneration();

Basic Selection Strategy
  • Select from Mikrokosmos Ontology

    Afghanistan smaller than Texas.
    Area: 648,000 sq. km.
    Capital--Kabul, Other cities--Kandahar Mazar-e-Sharif Konduz
    Terrain: Landlocked; mostly mountains and desert.
    Climate: Dry, with cold winters and hot summers.
    Population: 17.7 million.
    Agriculture: Wheat, corn, barley, rice, cotton, fruit, nuts, karakul pelts, wool, mutton.

Basic Selection Strategy
  • Select from Mikrokosmos Ontology
      • concept names and their synonyms

    Afghanistan smaller than Texas.
    Area<GeographicalArea>: 648,000 sq. km.
    Capital<CapitalCity><FinancialCapital>--Kabul, Other cities--Kandahar Mazar-e-Sharif Konduz
    Terrain: Landlocked; mostly mountains and desert.
    Climate: Dry, with cold winters and hot summers.
    Population<Population>: 17.7 million.
    Agriculture: Wheat, corn, barley, rice, cotton, fruit, nuts, karakul pelts, wool, mutton.

Basic Selection Strategy
  • Select from Mikrokosmos Ontology
      • concept names and their synonyms
      • concept values and their synonyms

    Afghanistan<Nation> smaller than Texas<USState>.
    Area<GeographicalArea>: 648,000 sq. km.
    Capital<CapitalCity><FinancialCapital>--Kabul<CapitalCity>, Other cities--Kandahar Mazar-e-Sharif Konduz
    Terrain: Landlocked; mostly mountains and desert.
    Climate: Dry, with cold winters and hot summers.
    Population<Population>: 17.7 million.
    Agriculture: Wheat<FoodStuff><AgriculturalProduct>, corn, barley, rice, cotton, fruit, nuts, karakul pelts, wool, mutton.

Basic Selection Strategy
  • Select from Mikrokosmos Ontology
      • concept names and their synonyms
      • concept values and their synonyms
  • Select from Data Frame Libraries

    Afghanistan smaller than Texas.
    Area: 648,000 sq. km.
    Capital--Kabul, Other cities--Kandahar Mazar-e-Sharif Konduz
    Terrain: Landlocked; mostly mountains and desert.
    Climate: Dry, with cold winters and hot summers.
    Population: 17.7 million.
    Agriculture: Wheat, corn, barley, rice, cotton, fruit, nuts, karakul pelts, wool, mutton.

Basic Selection Strategy
  • Select from Mikrokosmos Ontology
      • concept names and their synonyms
      • concept values and their synonyms
  • Select from Data Frame Libraries
      • extract result based on the data frames

    Afghanistan smaller than Texas.
    Area: 648,000<Area><Mileage> sq. km.
    Capital--Kabul, Other cities--Kandahar Mazar-e-Sharif Konduz
    Terrain: Landlocked; mostly mountains and desert.
    Climate: Dry, with cold winters and hot summers.
    Population: 17.7<Time> million<Population><Price>.
    Agriculture: Wheat, corn, barley, rice, cotton, fruit, nuts, karakul pelts, wool, mutton.
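The ambiguous labels in this example (648,000 tagged as both Area and Mileage) arise because several data frames can match the same string. The sketch below reproduces that effect with made-up patterns: the three regular expressions in `DATA_FRAMES` are purely illustrative assumptions and are not taken from the actual data frame library.

```python
import re

# Hypothetical data frames; the real library's patterns are not shown on the slides.
DATA_FRAMES = {
    "Area": re.compile(r"\b\d{1,3}(?:,\d{3})+\b"),
    "Mileage": re.compile(r"\b\d{1,3}(?:,\d{3})+\b|\b\d+K\b"),
    "Time": re.compile(r"\b\d{1,2}\.\d{1,2}\b"),
}


def data_frame_matches(text):
    """Map each matched string to the list of data-frame concepts that claim it."""
    hits = {}
    for concept, pattern in DATA_FRAMES.items():
        for match in pattern.finditer(text):
            hits.setdefault(match.group(0), []).append(concept)
    return hits


print(data_frame_matches("Area: 648,000 sq. km. Population:17.7 million."))
```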

Concept conflict resolution
  • Arrive at an internally consistent set of selected concepts.
  • Two levels of resolution
      • Document-level resolution
      • Knowledge-source resolution
  • Criteria: lexical occurrence, proximity, length and distribution of words and terms
  • Preferences from among knowledge sources specifying matches
  • Other default strategies
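One of the listed criteria, proximity, could be applied roughly as follows: among the concepts competing for the same string, prefer the one whose name occurs closest to it in the text. This is a minimal sketch of that single criterion under simplifying assumptions (only the first occurrence of each name is considered, and ties fall back to candidate order); `resolve_by_proximity` is a hypothetical name and the system's actual resolution rules are not specified on the slides.

```python
def resolve_by_proximity(text, position, candidates):
    """Pick the candidate concept whose name appears nearest to `position` in `text`."""
    lowered = text.lower()

    def distance(concept):
        # Simplification: only the first occurrence of the concept name is checked.
        index = lowered.find(concept.lower())
        return abs(index - position) if index >= 0 else float("inf")

    # min() keeps the earliest candidate on ties, acting as a default preference.
    return min(candidates, key=distance)


text = "Area: 648,000 sq. km."
print(resolve_by_proximity(text, text.index("648,000"), ["Mileage", "Area"]))
```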

Document-Level Conflict

    Afghanistan smaller than Texas.
    Area: 648,000<Area><Mileage> sq. km.
    Capital<CapitalCity><FinancialCapital>--Kabul<CapitalCity>, Other cities--Kandahar Mazar-e-Sharif Konduz
    Terrain: Landlocked; mostly mountains and desert.
    Climate: Dry, with cold winters and hot summers.
    Population: 17.7<Time> million<Population><Price>.
    Agriculture: Wheat, corn, barley, rice, cotton, fruit, nuts, karakul pelts, wool, mutton.

Concept-Level Conflict

    Afghanistan smaller than Texas.
    Area<GeographicalArea>: 648,000<Area> sq. km.
    Capital--Kabul, Other cities--Kandahar Mazar-e-Sharif Konduz
    Terrain: Landlocked; mostly mountains and desert.
    Climate: Dry, with cold winters and hot summers.
    Population<Population>: 17.7 million<Population>.
    Agriculture: Wheat<FoodStuff><AgriculturalProduct>, corn, barley, rice, cotton, fruit, nuts, karakul pelts, wool, mutton.

Relationship retrieval
  • Ontology structure: directed graph, nodes are concepts
  • Conceptual relationship: all paths connecting concepts generated at given stage
  • Theoretical solution: find all the paths in the graph (NP-complete)
  • When multiple paths do exist, take the shortest path between 2 concepts (Cf. μK Onto-Search algorithm)
  • Dijkstra’s (polynomial) algorithm to compute the most salient relationships between concepts
  • Distance threshold on path length to prune weak relationships
  • Construct schemas, or linked conceptual configurations, from the relationships posited in the previous step.
      • Primary concept selected (or posited): highest connectivity
      • Cardinalities inferred from observed relationships
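The shortest-path step with a distance threshold can be sketched with a standard Dijkstra implementation. The graph encoding (adjacency lists of `(neighbor, relation, weight)` triples), the unit edge weights, and the name `shortest_paths` are assumptions made for this illustration; the slides do not say how edges are weighted.

```python
import heapq


def shortest_paths(graph, source, max_distance):
    """Dijkstra from `source`, discarding any path longer than `max_distance`.

    graph: {node: [(neighbor, relation, weight), ...]}
    Returns {node: (distance, [node, relation, node, ...])}.
    """
    best = {source: (0, [source])}
    queue = [(0, source, [source])]
    while queue:
        dist, node, path = heapq.heappop(queue)
        if dist > best[node][0]:
            continue  # stale queue entry
        for neighbor, relation, weight in graph.get(node, ()):
            new_dist = dist + weight
            if new_dist > max_distance:
                continue  # prune weak (distant) relationships
            if neighbor not in best or new_dist < best[neighbor][0]:
                new_path = path + [relation, neighbor]
                best[neighbor] = (new_dist, new_path)
                heapq.heappush(queue, (new_dist, neighbor, new_path))
    return best


graph = {
    "CapitalCity": [("City", "IsA", 1)],
    "City": [("Nation", "PartOf", 1)],
    "Nation": [],
}
print(shortest_paths(graph, "CapitalCity", max_distance=2)["Nation"])
```

Lowering `max_distance` to 1 drops the CapitalCity-to-Nation relationship, which is the pruning effect the threshold is meant to have.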

Participation Constraints

    Afghanistan<Nation> smaller than Texas.
    Area: 648,000 sq. km.
    Capital--Kabul<CapitalCity>, Other cities--Kandahar Mazar-e-Sharif Konduz
    Terrain: Landlocked; mostly mountains and desert.
    Climate: Dry, with cold winters and hot summers.
    Population: 17.7 million.
    Agriculture: Wheat, corn, barley, rice, cotton, fruit, nuts, karakul pelts, wool, mutton.

    CapitalCity [1:1] IsA.CITY.PartOf Nation [1:1]

Participation Constraints (2)

    Afghanistan<Nation> smaller than Texas.
    Area: 648,000 sq. km.
    Capital--Kabul<City>, Other cities<City>--Kandahar<City> Mazar-e-Sharif<City> Konduz<City>
    Terrain: Landlocked; mostly mountains and desert.
    Climate: Dry, with cold winters and hot summers.
    Population: 17.7 million.
    Agriculture: Wheat, corn, barley, rice, cotton, fruit, nuts, karakul pelts, wool, mutton.

    City [1:1] PartOf Nation [1:*]
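One plausible reading of these two examples is that a participation constraint is inferred from how many related instances are observed: one capital per nation gives [1:1], several cities per nation give [1:*]. The sketch below encodes that reading only; the function name `participation` and the exact inference rule are assumptions, not the system's documented behavior.

```python
def participation(counts):
    """Infer a [min:max] constraint from observed counts of related instances.

    counts: for each observed object, how many related objects were found
            (e.g. the number of cities recognized for each nation).
    """
    low = "1" if min(counts) >= 1 else "0"    # optional if some object had none
    high = "1" if max(counts) <= 1 else "*"   # many if any object had several
    return f"[{low}:{high}]"


print(participation([1]))  # one capital city observed for the nation
print(participation([4]))  # Kabul, Kandahar, Mazar-e-Sharif, Konduz
```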

Refining results
  • Output ontology: may require hand-crafting
      • can be done in a text editor (flat ASCII ontology)
  • Considerable expertise required:
      • markup syntax
      • specification of conceptual relations
      • familiarity with regular-expression writing
  • Possible solution: ontology editors for typical end-users
  • With rich enough knowledge sources and a good set of training documents, however, we believe that the generation of extraction ontologies can be fully automatic.

Testing the system
  • Input: various U.S. Department of Energy abstracts
  • Knowledge base:
      • μK ontology
      • Energy sub-hierarchy of Eurodicautom terms (300)

Sample application document

The trend in supply and demand of fuel and the fuels for electric power generation, iron manufacturing and transportation were reviewed from the literature published in Japan and abroad in 1986. FY 1986 was a turning point in the supply and demand of energy and also a serious year for them because the world crude oil price dropped drastically and the exchange rate of yen rose rapidly since the end of 1985 in Japan as well. The fuel consumption for steam power generation in FY 1986 shows the negative growth for two successive years as much as 98.1%, or 65,730,000 kl in heavy oil equivalent, to that in the previous year. The total energy consumption in the iron and steel industry in 1986 was 586 trillion kcal (626 trillion kcal in the previous year). The total sales amount of fuel in 1986 was 184,040,000 kl showing a 1.5% increase from that in the previous year. The concept Best Mix was proposed as the ideal way in the energy industry. (21 figs, 2 tabs, 29 refs)

Sample output

    -- energy2 Information Ontology
    energy2 [-> object];
    energy2 [0:*] has Alloy [1:*];
    energy2 [0:*] has Consumption [1:*];
    energy2 [0:*] has CrudeOil [1:*];
    energy2 [0:*] has ForProfitCorporation [1:*];
    energy2 [0:*] has FossilRawMaterials [1:*];
    energy2 [0:*] has Gas [1:*];
    energy2 [0:*] has Increase [1:*];
    energy2 [0:*] has LinseedOil [1:*];
    energy2 [0:*] has MetallicSolidElement [1:*];
    energy2 [0:*] has Ores [1:*];
    energy2 [0:*] has Produce [1:*];
    energy2 [0:*] has RawMaterials [1:*];
    energy2 [0:*] has RawMaterialsSupply [1:*];
    Alloy [0:*] MadeOf.SOLIDELEMENT.Subclasses MetallicSolidElement [0:*];
    Alloy [0:*] IsA.METAL.StateOfMatter.SOLID.Subclasses CrudeOil [0:*];
    Alloy [0:*] IsA.PHYSICALOBJECT.ThemeOf.PHYSICALEVENT.Subclasses Produce [0:*];
    AmountAttribute [0:*] IsA.SCALARATTRIBUTE.MeasuredIn.MEASURINGUNIT
    Consumption [0:*] IsA.FINANCIALEVENT.Agent Human [0:*];
    ControlEvent [0:*] IsA.SOCIALEVENT.Agent Human [0:*];
    ControlEvent [0:*] IsA.SOCIALEVENT.Location.PLACE.Subclasses Nation [0:*];
    CountryName [0:*] NameOf Nation [0:*];
    CountryName [0:*] IsA.REPRESENTATIONALOBJECT.OwnedBy Human [0:*];
    CrudeOil [0:*] IsA.PHYSICALOBJECT.Location.PLACE.Subclasses Nation [0:*];
    CrudeOil [0:*] IsA.PHYSICALOBJECT.OwnedBy Human [0:*];
    CrudeOil [0:*] IsA.PHYSICALOBJECT.ThemeOf.GROW.Subclasses GrowAnimate [0:*];
    CrudeOil [0:*] IsA.PHYSICALOBJECT.ThemeOf.PHYSICALEVENT.Subclasses Increase [0:*];
    CrudeOil [0:*] IsA.PHYSICALOBJECT.ThemeOf.PHYSICALEVENT.Subclasses Combine [0:*];
    CrudeOil [0:*] IsA.PHYSICALOBJECT.ThemeOf.PHYSICALEVENT.Subclasses Display [0:*];
    CrudeOil [0:*] IsA.PHYSICALOBJECT.ThemeOf.PHYSICALEVENT.Subclasses Produce [0:*];
    Custom [0:*] IsA.ABSTRACTOBJECT.ThemeOf.MENTALEVENT.Subclasses AddUp [0:*];
    Display [0:*] IsA.PHYSICALEVENT.Theme.PHYSICALOBJECT.Subclasses Gas [0:*];
    Display [0:*] IsA.PHYSICALEVENT.Theme.PHYSICALOBJECT.OwnedBy Human [0:*];
    ForProfitCorporation [0:*] OwnedBy Human [0:*];
    ForProfitCorporation [0:*] IsA.CORPORATION.HasNationality Nation [0:*];
    Gas [0:*] IsA.PHYSICALOBJECT.Location.PLACE.Subclasses Nation [0:*];
    Gas [0:*] IsA.PHYSICALOBJECT.ThemeOf.GROW.Subclasses GrowAnimate [0:*];
    LinseedOil [0:*] IsA.PHYSICALOBJECT.ThemeOf.PHYSICALEVENT.Subclasses Increase [0:*];

Evaluation
  • Several dozen relationships are generated
      • Correct: relationship is posited between the concept CRUDE-OIL and the action PRODUCE; the role is Theme, meaning that one can PRODUCE CRUDE-OIL
      • Incorrect: relationship between GAS and GROW
  • Precision: relatively low (around 75%) due to high number of matches
  • Recall: better (around 90%)
  • Note: it’s easier for a human to refine the system’s output by rejecting spurious relationships (i.e. deleting false positives) than to specify relationships that the system has missed.
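For reference, precision and recall over generated relationships can be computed as below. The two-relationship example only mirrors the correct and incorrect cases named on the slide; it is a toy input, not the evaluation data, and `precision_recall` is a hypothetical helper name.

```python
def precision_recall(generated, reference):
    """Precision and recall of generated relationships against a reference set."""
    generated, reference = set(generated), set(reference)
    correct = generated & reference
    precision = len(correct) / len(generated) if generated else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Toy input only: one relationship judged correct, one judged spurious.
generated = {("CrudeOil", "Produce"), ("Gas", "Grow")}
reference = {("CrudeOil", "Produce")}
print(precision_recall(generated, reference))
```

A spurious relationship lowers precision but leaves recall untouched, which matches the observation that deleting false positives is the cheaper manual fix.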

How to improve results
  • Less general, more focused ontologies
  • Richer ontological structure
      • More types of hierarchical relationships (beyond IS-A and its inverse, SUB-CLASSES)
      • Deeper hierarchies (maximum 4 in Lenoch)
  • Note: TBX supports several data types for conceptual encoding

Related work
  • Lexical chaining in NLP
      • extracting and associating chains of word-based relationships from text
      • relating words and terms to resources like WordNet
  • Widely used in text categorization, automatic summarization, and topic detection and tracking
  • Our contributions:
      • integrating disparate knowledge sources for similar tasks
      • discovering and generating a compatible set of ontological relationships

Conclusions
  • The knowledge acquisition bottleneck impacts ontology construction for information extraction.
  • Terminographers and lexicographers codify information that can be advantageous for work in semantic-based processing.
  • Integrating these two disparate areas, it is possible to leverage large-scale terminological and conceptual information with relationship-rich semantic resources in order to reformulate, match, and merge retrieved information of interest to a user.
  • Possible future applications:
      • Knowledge-focused personal agents
      • Customized search, filtering, and extraction tools
      • Individually tailored views of data via integration, organization, and summarization
  • Lots of work still to be done…