From General Ontology to Specific Ontology:
Study of Shu-Shi Poems
Ru-Yng Chang, Sue-ming Chang,
Feng-ju Luo*, Chu-Ren Huang
Academia Sinica, Yuan-Ze University*
Knowledge and Knowledge Structure Variation
Knowledge is Structured Information
Most salient factors dictating variations in knowledge structures are time, space, and domain
Language is both the product and conduit of the conceptual structure of its speakers
Accessing Knowledge Structure
In order to become sharable and reusable knowledge, all extracted information must first be correctly situated in a knowledge structure
The situated information must be allowed to transfer from knowledge structure to knowledge structure without losing its meaningful content
Research Goal
Knowledge Structure Discovery
– Knowledge as situated information
– Language endows information with structure
Text-based and Lexicon-driven Knowledge Structure Discovery
– General Ontology: the upper ontology shared by all domains (such as SUMO)
– Specific Ontology: an ontology specific to a domain, a historical period, an author, etc.
Research Methodology
The Mental Lexicon Approach
The Shakespearean-garden Approach
The Ontology-merging as Ontology-discovery Approach
The Mental Lexicon Approach
Concepts are stored in the mental lexicon
The basic unit of mental lexicon organization and access is the lexical entry
A complete list of lexical entries covers the complete list of conceptual atoms
Lexical semantic relations mirror conceptual relations
Each Word is a Conceptual Atom
The Shakespearean-garden Approach
A Shakespearean garden collects all the plants referred to in Shakespearean texts.
– The garden is used to illustrate the flora of Shakespearean England and gives scholars a context in which to interpret his work.
There is a knowledge structure behind each corpus (i.e. a collection of texts with design criteria)
Lexicon as a Structured Inventory of Conceptual Atoms
– For instance, the complete set of texts by an author, from a certain period, or in a certain domain
The Ontology-merging as Ontology-discovery Approach I
Ontology provides a structure for knowledge to be situated
However, there is a dilemma for the construction of a new ontology
– If no existing ontology is referred to: reinventing the wheel, difficult to start a structure from scratch without rules
– If an existing ontology is referred to: misled by the existing structure, which may be mismatched or erroneous
The Ontology-merging as Ontology-discovery Approach II
The Solution
Map conceptual atoms to two (or more) reference ontologies
Merge the two resultant ontologies
– Matched Mapping: Confirmation of knowledge structure
– Mismatched Mapping: Only one or neither is correct. Possibly leads to discovery of new knowledge structure
– Complementary Mapping: Increases coverage
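The three mapping outcomes can be sketched as a small comparison. This is a minimal illustration, assuming each reference ontology assigns at most one concept to a conceptual atom and that the two sets of concept labels have already been made comparable. The function name `classify_mapping` and the sample mappings are hypothetical, not taken from the slides.

```python
from typing import Dict, Optional


def classify_mapping(concept_a: Optional[str], concept_b: Optional[str]) -> str:
    """Classify how two reference ontologies map the same conceptual atom."""
    if concept_a is None and concept_b is None:
        return "unmapped"        # neither reference ontology covers the atom
    if concept_a is None or concept_b is None:
        return "complementary"   # only one covers it, so merging adds coverage
    if concept_a == concept_b:
        return "matched"         # both agree, which confirms the structure
    return "mismatched"          # they disagree: one or neither is correct


# Hypothetical mappings of three atoms through two reference ontologies.
via_first: Dict[str, Optional[str]] = {
    "whale": "aquatic mammal", "kudzu": "flowering plant", "crab": None,
}
via_second: Dict[str, Optional[str]] = {
    "whale": "aquatic mammal", "kudzu": "fabric", "crab": "crustacean",
}

report = {atom: classify_mapping(via_first[atom], via_second[atom]) for atom in via_first}
print(report)
```

Mismatched atoms are the interesting ones for ontology discovery, so a real workflow would route them to manual review rather than resolve them automatically.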
Resources used
WordNet: http://www.cogsci.princeton.edu/~wn/
SUMO Ontology: http://www.ontologyportal.org
Academia Sinica Bilingual Ontological Wordnet (Sinica BOW), SUMO + WordNet: http://bow.sinica.edu.tw
Segmentation Program etc.: http://LingAnchor.sinica.edu.tw/
Domain Lexicon Management System: Segmentation, New Word Detection, Lexical Database
The information of Sinica BOW
EX: fish
Sense
Domain
POS
Definition
Translation
Semantic relation
SUMO
Example
SUMO: Suggested Upper Merged Ontology
SUMO Atoms
Concepts: around 1000
Note that concepts are not necessarily linguistically realized
Relations (ISA): See SUMO Graph
Axioms: for inference
Open resource created under an initiative from IEEE Standard Upper Ontology Working Group
Methodology
From lexicon to ontology (from items to structure)
Ontology discovery through ontology merging
WHY?
We do not have the knowledge structure (ontology) of a new domain (historical period, field etc.)
But typical ontology discovery needs a framework to be mapped to
To solve the dilemma we map the conceptual atoms to both SUMO and WN (as a linguistic ontology)
How to build a domain ontology
1. Word segmentation
2. Match WordNet synset and SUMO concept automatically (resources: WordNet, SUMO)
3. Use WordNet information to check results and extend concept
4. Transform into ontology browser format
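The first two steps can be sketched in a few lines. The slides do not describe the segmentation program, so forward maximum matching is swapped in here as a simple stand-in technique. The word list is hypothetical, while the 杜鵑 senses and the sample line come from the deck's own 杜鵑 example. The names `segment` and `match_concepts` are also assumptions.

```python
from typing import Dict, List, Set, Tuple


def segment(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: take the longest lexicon word at each position."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no lexicon word matches.
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical word list for segmentation.
lexicon = {"南漪", "杜鵑", "天下"}
# Candidate WordNet synsets per word, and the SUMO concept of each synset.
word_to_synsets: Dict[str, List[str]] = {"杜鵑": ["cuckoo", "azalea"]}
synset_to_sumo: Dict[str, str] = {"cuckoo": "bird", "azalea": "flowering plant"}


def match_concepts(words: List[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Attach every candidate (synset, SUMO concept) pair to each known word."""
    matches: Dict[str, List[Tuple[str, str]]] = {}
    for word in words:
        for synset in word_to_synsets.get(word, []):
            matches.setdefault(word, []).append((synset, synset_to_sumo[synset]))
    return matches


words = segment("南漪杜鵑天下無", lexicon)
matches = match_concepts(words)
print(words)
print(matches)
```

The matcher deliberately keeps both candidates for 杜鵑. Choosing between them is the job of the later checking step, not of automatic matching.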
Distribution of Shu Shi lexicon
98,430 words in volumes 1-45
Distribution of concepts in Shu Shi's poems
plant (1.7393%)
animal (1.4294%)
artifact (1.4467%)
other (95.3845%)
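As a quick arithmetic check, the percentages can be converted back into word counts. This sketch assumes the shares are taken over the 98,430-word total, which the slide does not state explicitly.

```python
TOTAL_WORDS = 98_430  # words in volumes 1-45, from the slide

# Percentage shares as given in the chart legend.
shares = {"plant": 1.7393, "animal": 1.4294, "artifact": 1.4467, "other": 95.3845}

# Round to whole words, since the legend percentages are themselves rounded.
counts = {name: round(TOTAL_WORDS * pct / 100) for name, pct in shares.items()}
print(counts)                 # {'plant': 1712, 'animal': 1407, 'artifact': 1424, 'other': 93887}
print(sum(counts.values()))   # 98430
```

Under that assumption the rounded counts add up exactly to the corpus size, which suggests the three categories together cover roughly 4,500 of the words.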
The distribution of animal, plant, and artifact concepts in Shu Shi's poems

Distribution of plant concepts in Shu Shi's poems
body part (3.2126%)
fungus (0.0584%)
uncertain (2.278%)
alga (0.1168%)
moss (1.1682%)
reproductive body (0.6425%)
fern (0.2336%)
flowering plant (92.2897%)

Distribution of animal concepts in Shu Shi's poems
arachnid (0.1421%)
invertebrate (0.2132%)
myriapod (0.0711%)
larval (1.2082%)
feline (2.7008%)
uncertain (0.2843%)
canine (2.5586%)
carnivore (0.0711%)
amphibian (1.6347%)
crustacean (1.2793%)
insect (8.742%)
reptile (5.4726%)
hoofed mammal (22.6724%)
mammal (1.0661%)
worm (0.924%)
fish (7.8181%)
rodent (2.5586%)
bird (37.1002%)
aquatic mammal (0.6397%)
mollusk (0.7107%)
monkey (2.1322%)

Distribution of artifact concepts in Shu Shi's poems
transportation device (16.6433%)
artifact (32.5140%)
fabric (6.1096%)
uncertain (6.1798%)
icon (0.4916%)
mineral (0.9831%)
musical instrument (0.6320%)
device (10.6039%)
engineering component (0.2107%)
clothing (20.1545%)
currency measure (1.7556%)
substance (0.4916%)
ordering (0.0702%)
weapon (2.5983%)
art work (0.4916%)
law (0.0702%)
Concepts found in Shu Shi's poems but not in Tang 300
aquatic mammal (whale 鯨 *)
amphibian (frog 蛙 *, toad 蟾蜍 *, salamander 鯢)
mollusk (clam 蛤 *, gastropod 螺 *, oyster 蠔, snail 蝸牛 *, earthworm 蚯蚓 *)
crustacean (crab 蟹 *, shrimp 蝦)
Guangdong and Hainan Island
Words stand for multiple concepts in the Shu Shi Poems
Source | Word | SUMO | English | Example sentence
Shu Shi | 杜鵑 | bird | cuckoo | 子規>和孔郎中荊林馬上見寄>春山聞子規。啼鴃>次韻劉景文登介亭>朝先啼鴃起,
Shu Shi | 杜鵑 | flowering plant | azalea | 杜鵑>菩提寺南漪堂杜鵑花>南漪杜鵑天下無,
Shu Shi | 葛 | flowering plant | kudzu vine | 白葛>和文與可洋川園池三十首:湖橋>白葛烏紗曳履行。
Shu Shi | 葛 | fabric | kudzu vine | 白葛>病中遊祖塔院>烏紗白葛道衣涼。
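The rows listed above can be grouped by word to find lexical entries that map to more than one SUMO concept. This is a minimal sketch; the variable names are illustrative, and only the word, concept, and gloss columns are used.

```python
from collections import defaultdict

# (word, SUMO concept, English gloss), copied from the rows above.
rows = [
    ("杜鵑", "bird", "cuckoo"),
    ("杜鵑", "flowering plant", "azalea"),
    ("葛", "flowering plant", "kudzu vine"),
    ("葛", "fabric", "kudzu vine"),
]

# Collect the distinct SUMO concepts attached to each word.
concepts_by_word = defaultdict(set)
for word, sumo_concept, _gloss in rows:
    concepts_by_word[word].add(sumo_concept)

# Keep only the words that stand for more than one concept.
ambiguous = {word: sorted(concepts) for word, concepts in concepts_by_word.items()
             if len(concepts) > 1}
print(ambiguous)
```

Note that 葛 is ambiguous at the SUMO level even though its English gloss is the same in both rows: the plant and the cloth made from it fall under different concepts.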
What We Learned about Specific Ontology
Constructing ontology from a larger corpus and comparison of two specific ontologies
Local information can be effectively mapped
Global information offers deeper insights into the knowledge structure
– Human conceptualization of animals and plants has been relatively stable, but that of artifacts has NOT.
• Regardless of the criteria for classification, genetically determined features (behaviors, appearances etc.) do not vary greatly
• However, human technology is highly fluid. Our conceptualization of artifacts is highly dependent on the development of engineering and on our varying societal needs.
http://bow.sinica.edu.tw/ont/ShuShi_ont.html
Example of SUMO concept
Axiom in SUMO
(instance GeorgeBush Human) – GeorgeBush is an instance of the class of humans
(exists (?X) (parent ?X GeorgeBush)) – there exists something of which George Bush is the parent
(instance parent BinaryPredicate) – the relation of parent is a binary relation
(domain parent 1 Organism) – the first argument to the parent relation must be an instance of the class Organism
(domain parent 2 Organism) – similarly for the second argument
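The way the `domain` axioms constrain a relation can be illustrated with a toy checker. This is only a sketch of the idea, not how SUMO's inference works. The direct Human-to-Organism subclass link is a simplifying assumption, and `UndeclaredThing` is a hypothetical individual with no declared class.

```python
from typing import Dict, Optional, Tuple

# Facts mirroring the axioms above.
instance_of: Dict[str, str] = {"GeorgeBush": "Human"}
domains: Dict[Tuple[str, int], str] = {
    ("parent", 1): "Organism",
    ("parent", 2): "Organism",
}
# Assumed shortcut: intermediate classes between Human and Organism are omitted.
subclass_of: Dict[str, str] = {"Human": "Organism"}


def is_instance(individual: str, cls: str) -> bool:
    """True if the individual's class is cls or a (transitive) subclass of it."""
    current: Optional[str] = instance_of.get(individual)
    while current is not None:
        if current == cls:
            return True
        current = subclass_of.get(current)
    return False


def satisfies_domains(relation: str, *args: str) -> bool:
    """Check every argument against the domain declared for its position."""
    return all(
        is_instance(arg, domains[(relation, position)])
        for position, arg in enumerate(args, start=1)
    )


print(satisfies_domains("parent", "GeorgeBush", "GeorgeBush"))       # True (type check only)
print(satisfies_domains("parent", "UndeclaredThing", "GeorgeBush"))  # False
```

The first call passes because the check is purely about argument types. It says nothing about whether the parent relation actually holds between the two arguments.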
Example of WordNet lexical relation
[Figure: WordNet relations for the two senses of 杜鵑 (DuJuan). Bird sense: cuckoo, under cuculiform_bird, under bird, shown alongside ani, roadrunner, and coucal (with pheasant_coucal and Centropus_sinensis). Plant sense: azalea, under rhododendron, under shrub / bush.]
SUMO + WordNet
[Figure: the WordNet relations for 杜鵑 (DuJuan) from the previous slide (cuckoo, cuculiform_bird, ani, roadrunner, coucal, pheasant_coucal, Centropus_sinensis, bird; azalea, rhododendron, shrub, bush) linked to SUMO concepts: organism, animal, plant, vertebrate, invertebrate, warm blooded vertebrate, bird, mammal, flowering plant.]
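One way to read the combined figure is that a WordNet synset inherits the SUMO concept of its nearest linked ancestor. The sketch below walks up a hand-copied hypernym chain until it reaches a linked synset. The hypernym edges follow the figure, but the exact anchor points in `sumo_link` are assumptions, as is the function name.

```python
from typing import Dict, Optional

# Hypernym edges read off the WordNet part of the figure.
hypernym: Dict[str, str] = {
    "cuckoo": "cuculiform_bird",
    "cuculiform_bird": "bird",
    "azalea": "rhododendron",
    "rhododendron": "shrub",
}
# Assumed anchor points where a WordNet synset is linked to a SUMO concept.
sumo_link: Dict[str, str] = {"bird": "bird", "shrub": "flowering plant"}


def nearest_sumo_concept(synset: str) -> Optional[str]:
    """Climb the hypernym chain and return the first linked SUMO concept."""
    current: Optional[str] = synset
    while current is not None:
        if current in sumo_link:
            return sumo_link[current]
        current = hypernym.get(current)
    return None  # no ancestor is linked to SUMO


print(nearest_sumo_concept("cuckoo"))  # bird
print(nearest_sumo_concept("azalea"))  # flowering plant
```

Both senses of 杜鵑 resolve this way to the two SUMO concepts given for the word in the multiple-concept table.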
Summary and Future Work
Ontologies represent the knowledge structure of a domain or historical period
We have provided an online interface to browse ontologies and lexica
In the future, we will complete the online ontology editor and browser, which will
– Map lexicon, WordNet and SUMO.
– Integrate ontologies based on different texts.
– Facilitate comparative studies of various domain ontologies.
Towards a Workbench for Specific Ontology: Browser and Editor
User login
Function menu (Personal ontologies list)
– Browse an ontology
  1. SUMO
  2. SUMO + WordNet + concept map with lexicon
– Edit an ontology
  1. Update lexical concepts
  2. Update mapping between WordNet synset and lexicon
  3. Edit other information in lexicon
– Add an ontology
  Import text or import lexicon
  Word segmentation
  Match concept and synset automatically
  1. Suggestion list
  2. Missing list
Logout
Constructing a Specific Ontology
Import text, or domain lexicon
– Select style of writing
– Select category of word list for word segmentation
– Select reference ontologies to match SUMO and lexicon
Information of suggestion list
– Candidate synset
– Candidate synset synonyms
– Explanation of candidate synset
– Concept of candidate synset
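The suggestion list and the missing list from the workbench flow could be represented as below. This is a sketch under assumptions: the class and field names are hypothetical, modeled on the four items of information listed above, and the candidate entries are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Suggestion:
    """One row of the suggestion list for a segmented word."""
    word: str
    candidate_synset: str
    synonyms: List[str] = field(default_factory=list)  # candidate synset synonyms
    explanation: str = ""                              # explanation of the candidate synset
    concept: str = ""                                  # concept of the candidate synset


def build_lists(
    words: List[str], candidates: Dict[str, List[Suggestion]]
) -> Tuple[List[Suggestion], List[str]]:
    """Split words into those with candidate synsets and those with none."""
    suggestions: List[Suggestion] = []
    missing: List[str] = []
    for word in words:
        found = candidates.get(word, [])
        if found:
            suggestions.extend(found)
        else:
            missing.append(word)
    return suggestions, missing


# Illustrative candidates; explanations are left empty rather than invented.
candidates = {
    "杜鵑": [
        Suggestion("杜鵑", "cuckoo", concept="bird"),
        Suggestion("杜鵑", "azalea", concept="flowering plant"),
    ]
}
suggestions, missing = build_lists(["杜鵑", "南漪"], candidates)
print([s.candidate_synset for s in suggestions])
print(missing)
```

Words that end up in the missing list are the ones a user would need to map by hand in the editor.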