A KNOWLEDGE BASED INFORMATION STORAGE AND
RETRIEVAL SYSTEM FOR NATURAL LANGUAGES
A Thesis
Presented to
The Faculty of Graduate Studies
of
The University of Guelph
In partial fulfilment of requirements
for the degree of
Master of Science
September, 1999
© Deyan Xu, 1999
National Library of Canada
Acquisitions and Bibliographic Services
395 Wellington Street, Ottawa ON K1A 0N4, Canada
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
ABSTRACT
A KNOWLEDGE BASED INFORMATION STORAGE AND RETRIEVAL
SYSTEM FOR NATURAL LANGUAGES
Deyan Xu
University of Guelph, 1999
Advisor:
Professor J. G. Linders
Natural languages are expressive languages used for communication by humans.
However, from an information processing viewpoint, natural languages are basically
unstructured, which means they are not readily suitable for machine processing.
Conceptual graphs, first introduced by John Sowa in 1984, provide a rich knowledge
representation schema intended to structure and encode natural languages.
This thesis is concerned with the building of a knowledge based management system that
is able to effectively store and retrieve knowledge from natural languages. It discusses
three points that are essential for natural language processing:
1. a conceptual language to structure and encode natural language
2. a repository to store the information that is represented in the conceptual language
3. a matcher that determines whether one statement in the conceptual language is the
same as or an instance of a second statement.
Other aspects such as indexing and querying techniques are also presented.
DEDICATED TO MY PARENTS AND MY WIFE
Who have made this possible
Acknowledgements
I would like to thank my supervisor Dr. Linders for his guidance and continued support
throughout my graduate program. Throughout the planning and implementation of the
thesis, Dr. Linders gave me tremendously valuable suggestions and help. Gratitude is
also extended to Dr. Wilson and Dr. Wang, members of my thesis exam committee, for
their constructive criticism of the thesis. I also wish to thank my fellow graduate student
Finnegan Southey for his great support and help.
My sincere appreciation is also extended to my wife and my parents whose support and
confidence have carried me through the hard days and helped me to complete this work
successfully.
I also wish to thank my sisters and my parents-in-law for their great support throughout
my university degree career.
Finally, I would like to express my gratitude to the University of Guelph and to all those
mentioned above.
Table of Contents
Acknowledgements
Table of Contents
List of Figures
Chapter 1 Introduction
1.1 The Motivation
1.2 Analysis of the Problems
1.3 Essential Points in Building a Knowledge Based System
1.4 Overview of the Thesis
Chapter 2 Review of Current Approaches to Information Processing
2.1 Inverted File Methods
2.2 Adaptive Methods
2.3 Similarity Measures, Clustering
2.4 Issues in Representation of Knowledge
2.5 Syntactic Methods
2.6 Semantic Methods
2.7 Frame-Based Methods
2.8 Graph-Based Methods
Chapter 3 Conceptual Graph as Knowledge Encoding Schema
3.1 Percepts
3.2 Concepts and Conceptual Relations
3.3 Referent
3.4 Conceptual Graph
3.5 Canonical Graphs and Canonical Formation Rules
3.6 Contexts
3.7 The CGIF Representation of Conceptual Graphs
3.8 Why Use Conceptual Graphs
3.9 The Notio Java Package for Modelling Conceptual Graphs
Chapter 4 An Object-Oriented Knowledge base for Conceptual Graphs
4.1 Why Do We Need a Knowledge base for Conceptual Graphs?
4.2 Relational Database for Conceptual Graphs
4.3 Problems with Relational Database for Conceptual Graphs
4.4 Object-Oriented Technology ---- A Better Solution
4.4.1 Basic Features of Object-Oriented Modelling (OOM)
4.4.2 Abstract Data Typing Maps Concept Type
4.4.3 Object Identity Maps Individual Marker
4.4.4 Inheritance Maps Canonical Formation Rule
4.4.5 Encapsulation Maps Context
4.4.6 Representing Object and Object Class
Chapter 5 Design and Implementation of the System
5.1 Overview of the System
5.2 Design of Conceptual Graph Object
5.3 Design of Concept Object
5.4 Design of Relation Object
5.5 Indexing
5.6 Query of Conceptual Graphs
5.7 Graph Matching Mechanism of the System
5.8 Design of the System
Chapter 6 An Example
6.1 The Example
6.2 Building of Conceptual Graphs from the Example
6.3 Conceptual Graphs in CGIF Format
6.4 Searching of Conceptual Graphs
6.4.1 Loose Match
6.4.2 Exact Match
6.5 Conclusion
Chapter 7 Summary
7.1 Summary
7.2 Conclusion
7.3 Future Work
Appendix
A. Object Diagrams of the System
B. Detailed Example Test Script
C. References
List of Figures
Figure 2.1 An Example of Inverted Files
Figure 2.2 AND/OR tree for text classification
Figure 3.1 A conceptual graph
Figure 3.2 Two canonical graphs
Figure 3.3 Join of the two canonical graphs
Figure 3.4 The simplification of figure 3.3
Figure 3.5 A conceptual graph containing a context
Figure 4.1 An example of conceptual graph for storage in a RDB
Figure 4.2 the conceptual graph for a meeting
Figure 4.3 expanded view of the meeting context
Figure 4.4 class definition for Car
Figure 4.5 an instance of car CA 163 1998
Figure 5.1 the Selection Manual
Figure 5.2 the conceptual graph object in memory
Figure 5.3 the concept object in memory
Figure 5.4 the relation graph object in memory
Figure 5.5 a 'BTreeNode' object in disk
Figure 5.6 the system class diagram
Figure 6.1 to Figure 6.7
CHAPTER 1
Introduction
1.1 The Motivation
Today we live in the information era, with so much news, so many conference proceedings and
research articles to be read, not to mention all of the information available on the WWW.
No matter how many articles we read and how many web sites we visit, we still
lack the information that we need, while so much time is wasted on reading unwanted
material. Without an effective information or knowledge storage and retrieval facility,
we will be deluged in the information flood. This problem exists today and will get worse
if no new effective information or knowledge retrieval systems are developed. Thus, an
effective knowledge retrieval system that is able to retrieve knowledge from natural
language text, like English text, is urgently needed.
1.2 Analysis of the Problems
To cope with this problem, many kinds of information management systems and tools
have been developed to store and retrieve information. However, most of them are only
able to store entities and relations between these entities, or objects and associations
between these objects. The retrieval methods they use are basically keyword matching.
Keyword searching and frequency distributions do not capture the meaning behind the
words. Thus, they are inherently limited in distinguishing relevant from irrelevant
information. All languages have many complicated, unsystematic features that confound
and confuse simple, word-based information retrieval systems. The same object may be
described in more than one way, and one word may have different meanings in
different phrases. These features limit the ability of traditional information retrieval
systems to retrieve information. In fact, very few of them are able to store and retrieve
information based on the meaning of the natural language text that expresses knowledge.
Natural languages are expressive languages that all people can understand. Natural languages are
used daily to communicate and acquire information, and are also the basis of reasoning.
Natural languages are more understandable by humans than any other kind of expression.
However, natural languages are basically unstructured, which means that they are not
readily suitable for machine processing and hence can be computationally intractable.
These observations lead to the objective of this thesis, which is concerned with the building of a
knowledge based management system that is able to effectively store and retrieve
representations of knowledge using conceptual graphs. However, there are some problems
in building such a system. Since the information in news, articles and on the
WWW has limited structure, these basically unstructured text files have no fixed
keys for searching. Indexing of unstructured data also presents a major problem: it
requires someone to read the document and provide keys manually. Most importantly, we
must determine how to let a computer system know the "meaning" of the text. If a
computer system does not know the "meaning" of the text, the system is basically just
another traditional information retrieval system and will inherit all the drawbacks a typical
information retrieval system has.
1.3 Essential Points in Building a Knowledge Based System
In order to break through the keyword barrier, a modelling technique that is able to
convert natural language into a knowledge representation and structure the language is
needed. In this way a computer based system can be used to understand the meaning of
the texts it processes. Secondly, in order to retrieve the information in a knowledge
based system, a method that is able to determine whether one statement in the knowledge
representation is the same as, or an instance of, a second statement in the knowledge
representation must be developed.
The modelling technique presented in this thesis is conceptual graphs, introduced
by John Sowa in 1984. Conceptual graphs form a knowledge representation language
based on linguistics and on the semantic networks used in artificial intelligence. A conceptual
graph is a finite, connected, bipartite graph with two kinds of nodes: "concepts"
and "conceptual relations". The purpose of the conceptual graph is to form a bridge
between natural language and a knowledge expression format that is readable by a
computer. With conceptual graphs, the literal meaning of a natural language sentence
can be mapped to a diagram that is computationally tractable.
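The bipartite structure can be made concrete with a small data structure. The following is a minimal Python sketch, not the thesis's implementation and not the Notio API: the class and field names are hypothetical, and the sentence "A cat is on a mat" is an illustrative example that does not appear in the thesis. It stores concept nodes and relation nodes separately and checks the two properties named above. Every arc must join a relation to a concept, and the graph must be connected.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptualGraph:
    """Hypothetical minimal conceptual graph (not the Notio API)."""
    concepts: dict = field(default_factory=dict)   # concept id -> concept type
    relations: dict = field(default_factory=dict)  # relation id -> relation type
    arcs: list = field(default_factory=list)       # (relation id, concept id, arc number)

    def is_bipartite(self) -> bool:
        # Every arc must join a relation node to a concept node, never two nodes of one kind.
        return all(r in self.relations and c in self.concepts for r, c, _ in self.arcs)

    def is_connected(self) -> bool:
        # Depth-first search over the undirected version of the graph;
        # assumes is_bipartite() holds, so every arc endpoint is a known node.
        nodes = set(self.concepts) | set(self.relations)
        if not nodes:
            return True
        neighbours = {n: set() for n in nodes}
        for r, c, _ in self.arcs:
            neighbours[r].add(c)
            neighbours[c].add(r)
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            for n in neighbours[stack.pop()] - seen:
                seen.add(n)
                stack.append(n)
        return seen == nodes


# [Cat] -> (On) -> [Mat]: arc 1 enters the relation, arc 2 leaves it.
g = ConceptualGraph()
g.concepts.update({"c1": "Cat", "c2": "Mat"})
g.relations["r1"] = "On"
g.arcs += [("r1", "c1", 1), ("r1", "c2", 2)]
print(g.is_bipartite(), g.is_connected())  # True True
```

Keeping concepts and relations in separate dictionaries makes the bipartite constraint a simple membership check on each arc.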
The retrieval method presented here is conceptual graph matching. In order for this
matching process to be feasible, the underlying knowledge representation must be
canonical. This means that all sentences with the same basic meaning must be parsed into
the same knowledge representation. In this way, the matcher can determine whether the
conceptual statements are representing the same knowledge.
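To illustrate why a canonical encoding makes matching tractable, here is a deliberately simplified Python sketch. It is not the matcher described in this thesis. Each graph is flattened into a set of (relation, source type, target type) triples, and `SUPERTYPE` is a hypothetical type hierarchy. "Same as" becomes set equality. "Instance of" requires every triple of the general graph to be covered by a triple of the specific graph whose concept types are equal or more specialized. Because the sketch ignores graph shape and co-reference, it is only an approximation of real graph matching.

```python
# Hypothetical type hierarchy: each type maps to its immediate supertype.
SUPERTYPE = {"Cat": "Animal", "Dog": "Animal", "Animal": "Entity", "Mat": "Entity"}


def is_subtype(t, ancestor):
    """True if t equals ancestor or lies below it in the hierarchy."""
    while t is not None:
        if t == ancestor:
            return True
        t = SUPERTYPE.get(t)
    return False


def same_as(g1, g2):
    # With a canonical encoding, equal meaning means equal triple sets.
    return set(g1) == set(g2)


def instance_of(specific, general):
    """Every (relation, source type, target type) triple of `general` must be
    matched by a triple of `specific` with the same relation and subtypes at both ends."""
    return all(
        any(rel == r and is_subtype(s, src) and is_subtype(t, dst)
            for r, s, t in specific)
        for rel, src, dst in general
    )


stored = {("On", "Cat", "Mat")}
query = {("On", "Animal", "Entity")}
print(same_as(stored, query), instance_of(stored, query))  # False True
```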
1.4 Overview of the Thesis
The goal of the thesis is to build a knowledge based management system that is able to
store and retrieve knowledge. The system uses conceptual graphs to encode "knowledge"
as representations of texts. The system also demonstrates a prototype search engine that is
able to perform very precise searches of the knowledge stored in the system.
Chapter 2 is a review of current approaches to information processing. It
first introduces several successful traditional information retrieval methods, including
inverted file methods, adaptive methods, similarity measures and clustering; then it
presents several knowledge representations, such as syntactic methods, semantic methods,
frame-based methods and graph-based methods.
Chapter 3 briefly discusses conceptual graphs as a knowledge encoding schema. It
presents some basic features of conceptual graphs and the advantages of using conceptual
graphs to model natural languages.
Chapter 4 first presents a relational model that handles conceptual graphs, then discusses
some key points of the object-oriented modelling technique. Issues of applying an
object-oriented technique in the management of conceptual graphs are thoroughly discussed.
The advantages of the object-oriented modelling approach to conceptual graphs over the
relational modelling approach are also presented.
Chapter 5 demonstrates the design and implementation of the conceptual graph
knowledge base system. Some key issues, such as the representation of conceptual graph
objects both in memory and on disk, indexing methods and querying of the knowledge base,
are also discussed.
Chapter 6 shows an example and illustrates how the conceptual graph knowledge base
system works. Factors that affect the search result are also discussed. A conclusion based
on the example is given at the end of the chapter.
Chapter 7 is a summary. It contains conclusions and a discussion of future research
directions.
CHAPTER 2
Review of Current Approaches to Information Processing
A few decades ago, before the invention of the computer, the method used to locate
individual texts was to read large collections of newspapers, books, reports and articles,
and make notes in an attempt to remember the contents for later retrieval.
Obviously, when the size of the collections exceeds the capacity of manual methods and
the limits of human memory, this method will not work. With the invention of the
computer, the storage and retrieval of such information was achieved with the help of
various information retrieval systems. In this way, it was possible to extend the limits of
information storage and retrieval by orders of magnitude.
The underlying method that these information retrieval systems use is searching through
the entire collection of information to find words and phrases that identify a text
containing the information being searched for. The crucial problem of this traditional
information retrieval technology is that the system relies solely on the presence or
absence of a word. Often the search system does not know the "meaning" of the words and
phrases it is searching for. This limits its ability to distinguish relevant from irrelevant
texts. Some research efforts have attempted to improve retrieval performance by indexing
on phrases rather than on words, by adding synonym information, and by using the frequency
of words. However, the gains from these refinements have been limited. The limits of
word-based retrieval systems have been previously explored in [Lesk 85] and [Metzler et
al. 84]¹.
This problem exists because there is no perfect correlation between matching words and
matching meaning. Hence, a possible solution to the problem of improving retrieval
performance is to give up on matching words and to match concepts instead.
Furthermore, in traditional information retrieval systems, the connections between information
objects are mostly boolean operators such as AND, OR and NOT. However, the relations
between information objects are much richer than these boolean operators. In knowledge
base systems such as those based on conceptual graphs, the information objects are
modelled as concepts and the connections between these concepts are modelled as
conceptual relations. Thus, a knowledge base system should be able to effectively model
the logical expressions of natural languages. In order to effectively store and retrieve natural
languages, three things are required:
1. a conceptual language to structure and encode natural languages
2. a knowledge base to store the information that is represented in the conceptual
language
3. a matcher that determines whether one statement in the conceptual language is the
same as or an instance of a second statement in the conceptual language
¹ M. E. Lesk, "SIGIR 85," ACM SIGIR Forum, Vol. 18, No. 2-4, Fall 1985, pp. 10-15.
D. P. Metzler, T. Noreault, L. Richey and B. Heidorn, "Dependency Parsing for Information Retrieval," Research and Development in Information Retrieval, C. J. van Rijsbergen, ed., Cambridge University Press, July 1984, pp. 313-324.
This chapter is divided into two parts. In the first part, we briefly review the traditional
methods of information retrieval systems. In the second part, we present some knowledge
representations of natural languages. With the help of these knowledge encoding schemas,
we are able to process natural language texts intelligently.
2.1 Inverted File Methods
The most successful and relevant traditional information retrieval systems are based on
inverted files. The basic idea is to trade storage space for retrieval time. The database is
viewed as a collection of files. An alphabetized list of words is created. For each
occurrence of a word in a file, an entry is created on the list for that word with a pointer
back to the file. In some systems, the pointer indicates the position in the file where the
word occurs, while in other systems the pointer merely indicates that the word appears in
the file one or more times. Common words such as "the" and "of" are excluded from the
indexing. Figure 2.1 is an example of an inverted file.
At retrieval time, the words in the query are looked up in the inverted file. Then the lists
of documents containing the words are intersected to produce the list of texts matching
the query. Since the list lookup can be done in constant time using hash tables, or
logarithmic time using sorted lists or trees, the time required to process a query depends
mainly on the number of documents containing each search term.
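The build-and-intersect procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than any of the commercial systems mentioned in this section. It indexes the three files of Figure 2.1 with 1-based word positions. Stop words keep their position but are not indexed. The stop-word list is an illustrative assumption, so this index also contains some words that the figure does not show.

```python
import re
from collections import defaultdict

FILES = {
    1: "The quick brown fox jumped over the lazy dog.",
    2: "My brown Volkswagen Fox is no dog when it comes to performance.",
    3: "Lazy evaluation can improve system performance, and reduce the number of jumps in your code.",
}
STOP_WORDS = {"the", "of", "a", "and", "in", "to", "is", "it", "no", "my"}  # illustrative list


def build_index(files):
    """Map each word to a list of (file number, word position), positions counted from 1."""
    index = defaultdict(list)
    for file_no, text in files.items():
        for pos, word in enumerate(re.findall(r"[a-z]+", text.lower()), start=1):
            if word not in STOP_WORDS:  # stop words keep their position but are not indexed
                index[word].append((file_no, pos))
    return index


def and_query(index, *terms):
    """Intersect the sets of files that contain each query term."""
    file_sets = [{f for f, _ in index.get(t, [])} for t in terms]
    return set.intersection(*file_sets) if file_sets else set()


index = build_index(FILES)
print(index["brown"])                   # [(1, 3), (2, 2)]
print(and_query(index, "lazy", "dog"))  # {1}
```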
File 1: The quick brown fox jumped over the lazy dog.
File 2: My brown Volkswagen Fox is no dog when it comes to performance.
File 3: Lazy evaluation can improve system performance, and reduce the number of jumps in your code.

Inverted Index (File#, Position)
brown         (1, 3) (2, 2)
code          (3, 15)
dog           (1, 9) (2, 7)
evaluation    (3, 2)
fox           (1, 4) (2, 4)
improve       (3, 4)
jumped        (1, 5)
jumps         (3, 12)
lazy          (1, 8) (3, 1)
performance   (2, 12) (3, 6)
quick         (1, 2)
reduce        (3, 8)
volkswagen    (2, 3)

Figure 2.1 An Example of Inverted Files
There are some variations of this method. One variation is to add the notion of proximity,
usually implemented as an adjacency operator or a within operator. For systems that
store the position of each occurrence of the index terms, the adjacency operator can be
implemented by checking that the position of the second term is exactly one more than
the position of the first term. For example, if we have the query:
[ brown ADJ fox ]
File 1 in Figure 2.1 will pass the adjacency test, but File 2 fails since the words are at
positions 2 and 4. The same idea can be used to implement a WITHIN operator that is
true only if the first and second terms are within some specified number of words of
each other.
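Both proximity operators reduce to arithmetic on stored positions. The following minimal Python sketch hard-codes the postings for "brown" and "fox" from Figure 2.1. The function names are hypothetical. ADJ is treated as ordered (the second term exactly one position after the first), while WITHIN is treated as unordered, following the wording "of each other".

```python
# Positional postings copied from Figure 2.1: term -> [(file, position)]
POSTINGS = {
    "brown": [(1, 3), (2, 2)],
    "fox": [(1, 4), (2, 4)],
}


def adj(postings, first, second):
    """[first ADJ second]: second must sit exactly one position after first."""
    return {f1 for f1, p1 in postings.get(first, [])
            for f2, p2 in postings.get(second, [])
            if f1 == f2 and p2 - p1 == 1}


def within(postings, first, second, k):
    """[first WITHIN k second]: the terms are at most k words apart, in either order."""
    return {f1 for f1, p1 in postings.get(first, [])
            for f2, p2 in postings.get(second, [])
            if f1 == f2 and 0 < abs(p2 - p1) <= k}


print(adj(POSTINGS, "brown", "fox"))        # {1}: File 2 fails (positions 2 and 4)
print(within(POSTINGS, "brown", "fox", 2))  # {1, 2}
```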
The other variation is to add "boolean keyword queries". That means that the sets of texts
matching each term in the query are combined using the set operations of intersection,
union, and complementation to produce a final set of retrieved documents. For example,
if we perform the query:
[ (jump$ OR perform$) AND (NOT code) ]
on the three files in Figure 2.1, Files 1 and 2 will be retrieved, but not File 3. The "$"
operator indicates that any word with the given prefix should be matched.
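The set operations behind a boolean keyword query map directly onto Python sets. The following minimal sketch evaluates the example query against the three files of Figure 2.1. The helper `term` is hypothetical. A trailing "$" is read as a prefix match, and NOT is computed as the complement with respect to all files.

```python
import re

FILES = {
    1: "The quick brown fox jumped over the lazy dog.",
    2: "My brown Volkswagen Fox is no dog when it comes to performance.",
    3: "Lazy evaluation can improve system performance, and reduce the number of jumps in your code.",
}
DOCS = {n: set(re.findall(r"[a-z]+", text.lower())) for n, text in FILES.items()}
ALL_FILES = set(DOCS)


def term(pattern):
    """Files matching one query term; a trailing '$' matches any word with that prefix."""
    if pattern.endswith("$"):
        prefix = pattern[:-1]
        return {n for n, words in DOCS.items() if any(w.startswith(prefix) for w in words)}
    return {n for n, words in DOCS.items() if pattern in words}


# [ (jump$ OR perform$) AND (NOT code) ]: union, then intersect with the complement.
result = (term("jump$") | term("perform$")) & (ALL_FILES - term("code"))
print(sorted(result))  # [1, 2]
```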
The main advantages of this method are fast retrieval and easy implementation. Thus,
inverted file methods are used in many commercial systems, such as Stairs, Dialog, Lexis
and Status.
For a detailed discussion of this technique, please see Salton and McGill's Introduction to Modern
Information Retrieval [Salton & McGill 83].
2.2 Adaptive Methods
Another word-based technique is called "adaptive methods". The main idea of this
method is that, after an initial query, the user selects the relevant articles from the retrieved set
and similarity measures are recalculated. The revised measures are used to query the
database again, resulting in a new set of documents, and the process continues until the
search converges or the user is satisfied with the documents retrieved up to that point.
The query reformulation process is thus based on the following two operations:
1. Terms that occur in documents previously identified as relevant by the user are added
to the original query vectors, or alternatively the weight of such terms is increased by
an appropriate factor in constructing the new query statements.
2. At the same time, terms occurring in documents previously identified as irrelevant by
the users are deleted from the original query statements, or the weight of such terms is
appropriately reduced.
The effect of such a query alteration process is to move the query in the direction of the
relevant items and away from the irrelevant ones. Thus the user is able to retrieve more
wanted and fewer unwanted items in later searches.
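One standard way to formalize these two operations is Rocchio's relevance-feedback formula. The thesis does not name it, so treat the following as an illustrative sketch rather than the method of any particular system. Query and document vectors are dictionaries from term to weight. The coefficients `alpha`, `beta` and `gamma` are conventional default values chosen here as an assumption.

```python
from collections import Counter


def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Reformulate a query vector (term -> weight) from relevance judgements.

    Operation 1: terms of relevant documents are added or boosted (beta).
    Operation 2: terms of irrelevant documents are reduced (gamma); terms whose
    weight drops to zero or below are deleted from the query.
    """
    new_query = Counter({t: alpha * w for t, w in query.items()})
    for docs, sign, coeff in ((relevant, 1, beta), (irrelevant, -1, gamma)):
        for doc in docs:
            for t, w in doc.items():
                # Average over the judged documents, as in Rocchio's formula.
                new_query[t] += sign * coeff * w / len(docs)
    return {t: w for t, w in new_query.items() if w > 0}


query = {"war": 1.0}
relevant = [{"war": 1.0, "battle": 1.0}]
irrelevant = [{"war": 1.0, "drugs": 1.0}]
new_q = rocchio(query, relevant, irrelevant)
# "battle" enters the query with weight 0.75; "drugs" would go negative and is dropped.
```

Dropping non-positive weights corresponds to "deleted from the original query statements" in operation 2. Keeping small positive weights corresponds to "appropriately reduced".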
2.3 Similarity Measures, Clustering
This method uses a similarity measure based on word frequency to determine whether a
document is similar to other documents known to be relevant to the user's query. To find
the degree of similarity between two documents, the method is to:
1. determine the frequency of each index term in the document collection;
2. for any two documents, view the frequency lists for those documents as vectors in
multidimensional space, and calculate the cosine of the angle between the two
vectors.
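The two steps above translate directly into code. The following minimal Python sketch counts raw word frequencies and then takes the cosine of the angle between the two frequency vectors. It performs no stemming and no stop-word removal, which is a simplifying assumption. The two sample phrases are illustrative only.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Step 1: frequency of each word in a document (no stemming or stop words here)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Step 2: cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


d1 = term_frequencies("the lazy dog")
d2 = term_frequencies("a lazy cat")
print(round(cosine(d1, d2), 3))  # 0.333: one shared word out of three in each
```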
The problem with this approach is that a typical user query will not have enough words to
give a statistically meaningful frequency vector. Thus, this method only works well for
measuring the differences between two documents. The SMART information retrieval
system from Cornell [Salton & McGill 83] uses this method. It incorporates some suffix
removal rules to calculate frequencies based on the stem rather than the whole word.
Word frequency systems have been proposed as alternatives to boolean keyword search,
and in some cases have demonstrated improved recall and precision performance.
2.4 Issues in Representation of Knowledge
The existence of the NOT operator in boolean keyword queries is a clue to problems with
keyword based queries: it is a partially successful attempt to deal with output overload.
For example, to find texts about fighting and war, but ignore texts about crime, one might
try the following query:
[ war AND (NOT drugs) ]
to avoid seeing stories like:
Our government vows to fight a war against drugs.
But then, the following would be missed:
Many people died during the second world war because of the lack of drugs.
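The failure is easy to reproduce. The following minimal Python sketch applies the query word by word to the two example stories. The helper name is hypothetical. The filter rejects both stories, because it can only test whether the word "drugs" is present, not what role the word plays in the sentence.

```python
import re

STORIES = [
    "Our government vows to fight a war against drugs.",
    "Many people died during the second world war because of the lack of drugs.",
]


def matches_war_not_drugs(text):
    """Evaluate [ war AND (NOT drugs) ] on the set of words in a text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return "war" in words and "drugs" not in words


print([matches_war_not_drugs(s) for s in STORIES])  # [False, False]
```

The first rejection is intended. The second excludes a story that is genuinely about war.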
The complexities of natural language, including ambiguity, synonymy and metaphor,
combine to reduce the effectiveness of today's keyword-based retrieval systems. Thus, in
order to effectively retrieve information in natural languages, a knowledge representation
method must be used. As Goldstein and Papert² said in the article "Artificial Intelligence,
Language, and the Study of Knowledge":
The fundamental difficulties facing researchers in the field today are not
limitations due to hardware, but rather questions about how to represent large
amounts of knowledge in ways that still allow the effective use of individual facts.
No consensus has yet been reached on the best method for representing knowledge, and
research is continuing in order to develop more efficient ways to store information and to
conserve memory and processing. In the following sections, we will review some
knowledge representation methods that are used to extract knowledge from natural
language.
² Ira Goldstein and Seymour Papert, "Artificial Intelligence, Language, and the Study of Knowledge,"
Cognitive Science, Vol. 1, No. 1 (1977).
2.5 Syntactic Methods
Syntactic analysis focuses on the relationships between linguistic expressions and is
concerned with the rules (grammars) for the interaction between various natural language
units like words, phrases, etc. Early artificial intelligence efforts to produce "question
answering" systems used some knowledge of English syntax and semantics to retrieve
information from databases in response to queries in natural language.
Raphael's SIR program for "Semantic Information Retrieval" [Raphael 68] used an
internal model based on words and word associations linked in a "general manner so that
no particular relations are more significant than others." Relations used were:
Set-inclusion
Part-whole relationship
Numeric quantity associated with the part-whole relation
Set membership
Left-to-right spatial relations
Ownership
One advantage of using a syntactic method is that we can preserve the syntactic relations
between words. For example, if we wish to retrieve documents about computer science
with the previously mentioned inverted file method, we might try this query:
[ comput$ AND science ]
which would match at least the following phrases:
the computer science department
the discipline of computing science
the science of computing
but would also match across phrases, such as:
the use of computers in materials science
Just using the query:
[ comput$ ADJ science ]
will exclude materials science, but miss science of computing. What we really want to
say is that the word "compute" must modify the word "science", so that we restrict the
sciences the query should match.
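The requirement that "compute" must modify "science" can be expressed as a test on modifier-head pairs instead of word positions. In the following minimal Python sketch, the (modifier, head) pairs for each phrase are written by hand. They stand in for the output of a parser and are illustrative assumptions, not the output of PEG or any other real grammar.

```python
# Hand-written (modifier, head) pairs for each phrase. They stand in for the
# output of a parser and are illustrative assumptions, not real parser output.
PHRASES = {
    "the computer science department": [("computer", "science"), ("science", "department")],
    "the discipline of computing science": [("computing", "science"), ("science", "discipline")],
    "the science of computing": [("computing", "science")],
    "the use of computers in materials science": [
        ("computers", "use"), ("materials", "science"), ("science", "use")],
}


def modifies(pairs, modifier_prefix, head):
    """True if a word beginning with modifier_prefix directly modifies head."""
    return any(m.startswith(modifier_prefix) and h == head for m, h in pairs)


hits = [p for p, pairs in PHRASES.items() if modifies(pairs, "comput", "science")]
for phrase in hits:
    print(phrase)
# The first three phrases match; "materials science" is excluded because
# "computers" modifies "use", not "science".
```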
Fagan³ has investigated the use of syntactic information to identify phrases for indexing.
He compared statistically derived indexing phrases with phrases derived using the PEG
English grammar and the PLNLP programming language [Jensen 86]. He concluded that
although phrases selected by using frequency and co-occurrence methods did not
consistently improve retrieval performance, syntax-based selection methods can generate
more useful phrases that do improve retrieval performance, by improving recall and
precision by a small amount.
³ J. L. Fagan, "Automatic Phrase Indexing for Document Retrieval: An Examination of Syntactic and Non-Syntactic Methods," Proceedings of the Tenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Association for Computing Machinery, New York, 1987, pp. 91-101.
J. L. Fagan, "Automatic Phrase Indexing for Document Retrieval: A Comparison of Syntactic and Non-Syntactic Methods," PhD dissertation, Cornell, September 1987.
2.6 Semantic Methods
The idea of semantic rules is to include "semantic markers" in the definitions of each
sense of the words in the dictionary. The semantic markers attached to a word would be
used to restrict the ways in which it could combine with other words. The "RUBRIC"
retrieval system is a "semantics only" method for deciding whether a document is
relevant to a given query.⁴ RUBRIC uses document/query pairs. The approach is to use
rules that provide evidence for relevance or irrelevance. Such a system can deal with
constructions that confuse syntax-only systems.
For example, consider a query about terrorists where one does not want information
about war to be retrieved. Figure 2.2 shows an AND/OR tree for a rule one might find in
a semantics-only classification system for the concept of "terrorist". The branches of the
tree are labeled with certainty factors between 0 and 1 that indicate how strongly the
subtrees are related to the root concept. These certainty factors are assigned by the system
designer. Using the figure shown, if a text contains the word "terrorist" it gets a 0.8 score
for being about terrorists. For the word "hijack" it gets a 0.6. If it contained both words,
the score would be 0.92 = 0.8 + 0.6 - 0.8 × 0.6.
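The scoring rule in the last sentence combines two certainty factors a and b as a + b - ab. The following minimal Python sketch folds that rule over all the words found in a text. It uses only the two weights given above and treats them as independent OR-branches. The AND branches of the tree in Figure 2.2 are not modelled, and the names are hypothetical.

```python
from functools import reduce

# Certainty factors from the example above; in RUBRIC they are assigned by the designer.
EVIDENCE = {"terrorist": 0.8, "hijack": 0.6}


def combine(cf_a, cf_b):
    """Combine two positive certainty factors: a + b - a*b."""
    return cf_a + cf_b - cf_a * cf_b


def score(text_words):
    """Fold the combination rule over every evidence word present in the text."""
    factors = [cf for word, cf in EVIDENCE.items() if word in text_words]
    return reduce(combine, factors, 0.0)


print(round(score({"terrorist", "hijack"}), 2))  # 0.92
```

Because a + b - ab is commutative and associative, the order in which the evidence words are found does not affect the final score.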
⁴ R. M. Tong, L. A. Appelbaum, V. N. Askman and J. F. Cunningham, "RUBRIC III: An Object-Oriented Expert System for Information Retrieval," Second Annual Conference on Expert Systems in Government, McLean, VA, October 1986.
Semantic markers have proved to be very useful in natural language processing. They
have been widely applied and, in many cases, they select appropriate word senses
successfully.
Figure 2.2 AND/OR tree for text classification
2.7 Frame-Based Methods
The notion of frames [Minsky 75]⁵ is a method for understanding vision, natural language
and other areas of AI. Frames provide a convenient structure for representing objects that
are typical of a given situation, such as stereotypes. The basic characteristic of a frame is
that it represents related knowledge about a narrow subject that has much default
knowledge.
⁵ M. Minsky, "A Framework for Representing Knowledge," in The Psychology of Computer Vision, P. Winston, ed., McGraw-Hill, New York, 1975.
In frame theory, the knowledge base is decomposed into pieces of knowledge, which are
the data structures that represent stereotypical situations. The basic components of a
frame-based representation facility include:
1. Structure. The frame can capture the basic organizational principles. It contains the
hierarchies of objects (components) and the attributes of objects, which can be
inherited from other frames. It incorporates sets of attribute descriptions called slots.
These structures can be used to unify and denote a loose collection of objects, related
ideas, concepts, facts and experiences.
2. Processing Feature. When we process natural language text, we "understand it" by
filling in the appropriate slots in the frames. The slot provides space for computation.
It can contain a default value, a restriction on the value to be added, a procedure activated
to compute a needed value, or a rule activated when certain conditions are met.
Properties, relationships, and events can be fitted into slots of an object from
conditions and situations; restrictions can be attached to the slots to trigger a sequence
of actions by the program.
3. Reasoning Services. The frame-based representation can perform inferences as part of
its assertion and retrieval operations.
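Several of the components listed above can be shown together in a few lines: slots, default values inherited from a parent frame, slots filled in for a specific object, and a procedure that computes a needed value. The following minimal Python sketch is hypothetical and does not represent any particular frame language. Restrictions and condition-triggered rules are omitted for brevity.

```python
class Frame:
    """Hypothetical minimal frame: named slots, a parent frame to inherit from,
    and 'if-needed' procedures that compute a slot value on demand."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = dict(slots)   # filled-in or default values
        self.if_needed = {}        # slot name -> procedure(frame) -> value

    def get(self, slot):
        # Look in this frame, then up the parent chain (inheritance).
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            if slot in frame.if_needed:
                return frame.if_needed[slot](self)  # computed for the asking frame
            frame = frame.parent
        raise KeyError(slot)


vehicle = Frame("Vehicle", wheels=4)  # default value
vehicle.if_needed["description"] = lambda f: f"{f.name} with {f.get('wheels')} wheels"
car = Frame("Car", parent=vehicle, fuel="gasoline")
my_car = Frame("MyCar", parent=car, colour="red")  # filling in a slot
print(my_car.get("wheels"), my_car.get("fuel"))    # 4 gasoline
print(my_car.get("description"))                   # MyCar with 4 wheels
```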
One of the more widely used frame-based systems is Schank's⁶ conceptual dependency
theory, often abbreviated "CD". A conceptual dependency graph is a relation between
primitive objects that are either actions, states, or noun-like "picture producers."
⁶ R. C. Schank, N. M. Goldman, C. J. Rieger and C. K. Riesbeck, Conceptual Information Processing, North-Holland, Amsterdam, Fundamental Studies in Computer Science, Vol. 3, 1975.
CD theory⁷ is based on two principles:
1. CD representations should allow effective inference, by associating a fixed set of
inference rules with each CD primitive;
2. CD representations should be independent of any particular human language.
To illustrate the first principle, consider the following sentence:
John flew from Toronto to Shanghai
One would want to be able to tell from the representation that before the event John was
in Toronto and afterward he was in Shanghai, and that the same was true of the airplane.
To fulfill the second principle, Schank outlined a short list of primitive actions and
showed how to represent sentences as graphs built from these primitives. For example,
the CD primitive ptrans, which stands for "physical transfer," indicates motion of a
physical object from one place to another. Using the case frame representation, the
various components of the CD graph are related using the following cases or "slots":
ACTOR the initiating agent of an action
OBJECT the thing affected by the action
INST the instrument or means by which the action is effected
FROM the source of the action
DEST the destination of the action
⁷ E. Rich, Artificial Intelligence, McGraw-Hill, New York, McGraw-Hill Series in Artificial Intelligence, 1983.
So the above sample sentence would be represented by the following CD graph:
john <=> ptrans
    O: john
    I: airplane
    D: Toronto -> Shanghai
This literally means "John physically transferred himself from Toronto to Shanghai
using an airplane as conveyance." Here <=> denotes the relation between actor and
action; O indicates the object of an action; I indicates the instrumental
conceptualization for an action; and D indicates the direction of an object
within an action.
No one actually draws such graphs any more, since the same CD graph can be represented as a case frame as follows:
(cd (actor (john))
    (<=> (ptrans))
    (object (john))
    (inst (airplane))
    (from (Toronto))
    (dest (Shanghai)))
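To make the first principle concrete, here is a minimal Java sketch of a case frame with one fixed inference rule attached to ptrans. The class `CdFrame`, its methods and the wording of the inferred facts are hypothetical illustrations, not code from any CD system. Treating the INST filler as something that moves with the object is an extra assumption that suits conveyances such as the airplane, but not instruments in general.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// A minimal CD case frame: one primitive action plus its filled slots.
class CdFrame {
    final String primitive;
    final Map<Slot, String> slots = new EnumMap<>(Slot.class);

    CdFrame(String primitive) {
        this.primitive = primitive;
    }

    // Fill one slot and return this frame so calls can be chained.
    CdFrame with(Slot slot, String filler) {
        slots.put(slot, filler);
        return this;
    }

    // Inference rule attached to ptrans: whatever is moved is at FROM before
    // the event and at DEST after it. Moving the INST filler too is an
    // assumption that only fits conveyances such as airplanes.
    List<String> infer() {
        List<String> facts = new ArrayList<>();
        if (!primitive.equals("ptrans")) {
            return facts;
        }
        for (Slot moved : List.of(Slot.OBJECT, Slot.INST)) {
            String thing = slots.get(moved);
            if (thing == null) {
                continue;
            }
            facts.add("before: " + thing + " in " + slots.get(Slot.FROM));
            facts.add("after: " + thing + " in " + slots.get(Slot.DEST));
        }
        return facts;
    }

    public static void main(String[] args) {
        CdFrame flight = new CdFrame("ptrans")
                .with(Slot.ACTOR, "john")
                .with(Slot.OBJECT, "john")
                .with(Slot.INST, "airplane")
                .with(Slot.FROM, "Toronto")
                .with(Slot.DEST, "Shanghai");
        flight.infer().forEach(System.out::println);
    }
}

// Slot names taken from the case-frame slot list above.
enum Slot { ACTOR, OBJECT, INST, FROM, DEST }
```

Because the rule belongs to the primitive rather than to any sentence, every ptrans frame gets the same before/after facts for free, which is the point of the first principle.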
The details are beyond the scope of this thesis. For more information on CD theory and the frame-based method, please refer to: [R. C. Schank, N. M. Goldman, C. J. Rieger and C. K. Riesbeck, Conceptual Information Processing, Fundamental Studies in Computer Science, Elsevier Press].
2.7 Graph-Based Methods
Another approach to representing knowledge is through the use of graph structures. There are many varieties of graph-based representations, such as semantic nets and conceptual graphs.
Semantic nets were first developed for AI as a way of representing human memory and language understanding [Quillian 68]⁸. Quillian used semantic nets to analyze the meaning of words in sentences. Since then, semantic nets have been applied to many problems involving knowledge representation.
The structure of a semantic net is shown graphically in terms of nodes and the arcs connecting them. Nodes are often referred to as objects, and the arcs as links or edges. The links of a semantic net are used to express relationships. Nodes are generally used to represent physical objects, concepts, or situations. For a detailed discussion of semantic nets, refer to [Quillian 68].
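As an illustration of nodes joined by named links, the following Java sketch stores a semantic net as a list of (node, link name, node) triples and answers "is-a" questions by following links upward. The class `SemanticNet`, the link names and the canary/bird nodes are hypothetical examples chosen for this sketch, not Quillian's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// A tiny semantic net: nodes are strings and every link carries a name.
class SemanticNet {
    record Link(String from, String name, String to) {}

    private final List<Link> links = new ArrayList<>();

    void link(String from, String name, String to) {
        links.add(new Link(from, name, to));
    }

    // All nodes reached from `node` through links called `name`.
    List<String> follow(String node, String name) {
        List<String> targets = new ArrayList<>();
        for (Link l : links) {
            if (l.from().equals(node) && l.name().equals(name)) {
                targets.add(l.to());
            }
        }
        return targets;
    }

    // Walk "is-a" links upward; assumes the is-a links contain no cycle.
    boolean isA(String node, String category) {
        if (node.equals(category)) {
            return true;
        }
        for (String parent : follow(node, "is-a")) {
            if (isA(parent, category)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SemanticNet net = new SemanticNet();
        net.link("canary", "is-a", "bird");
        net.link("bird", "is-a", "animal");
        net.link("bird", "has-part", "wings");
        System.out.println(net.isA("canary", "animal"));   // true
        System.out.println(net.follow("bird", "has-part")); // [wings]
    }
}
```

Because link names here are free-form strings, nothing stops one author writing "is-a" and another "isa", which is exactly the lack of link name standards discussed in the next paragraph.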
⁸ M. R. Quillian, "Semantic Memory", Semantic Information Processing, ed. by Marvin Minsky, The MIT Press, pp. 227-270, 1968.
Although semantic nets can be very useful in representing knowledge, they have limitations such as the lack of link name standards. This makes it difficult to understand what the net is really designed for and whether it was designed in a consistent manner. For a semantic net to represent definitive knowledge, that is, knowledge that can be defined, the link and node names must be rigorously defined.
The last, but not least, knowledge representation method we are going to present is Conceptual Graphs [Sowa 84]⁹. This method is widely used in natural language encoding, where it has many advantages. In this thesis we will use it to extract the meaning of natural language texts, and we will discuss it in more detail in the following chapter.
⁹ J. F. Sowa, Conceptual Structures: Information Processing in Mind and Machine, Addison-Wesley, Reading, MA, 1984.
CHAPTER 3
Conceptual Graphs as Knowledge Encoding Schema
Conceptual structures, as developed by John Sowa (1984), provide a rich knowledge representation schema intended to incorporate many concepts found in natural and formal languages. A conceptual graph is an abstract representation for logic with nodes called concepts and conceptual relations, linked together by arcs. The direction of the arcs determines the relation between the two objects they connect. Within the graphs, concept nodes represent entities, attributes, states, and events, while relation nodes show how the concepts are interconnected. We will explain some of the notation used in conceptual graphs as follows.
3.1 Percepts
Perception is the process of building a working model that represents and interprets sensory input. The model has two components: a sensory part formed from a mosaic of percepts, each of which matches some aspect of the input; and a more abstract part called a conceptual graph, which describes how the percepts fit together to form the mosaic. Percepts are fragments of images that fit together like the pieces of a jigsaw puzzle. A conceptual graph describes the way percepts are assembled, and conceptual relations specify the role that each percept plays: one percept may match a part of an icon to the right or left of another percept; a percept for a color may be combined with a percept of a shape to form a graph that represents a colored shape.
3.2 Concepts and Conceptual Relations
The term "concept" is defined as "a node in a conceptual graph that refers to an entity, a set of entities, or a range of entities". A concept is the basic unit for representing an entity or state. Every concept has a concept type t and a referent r. Concept types are organized in a hierarchy according to levels of generality. The referent is the entity or entities that a concept refers to. For example, "person", "country" and "city" are concept types, while "John", "Canada" and "Guelph" are referents. If we pair concept types with referents, we get specific concepts: [person: 'John'], [country: 'Canada'] and [city: 'Guelph'].
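A concept can be sketched in Java as a pair of fields, a type label and a referent, rendered in the linear notation used above. The class `Concept` and its null-referent convention are assumptions made for this illustration; a null referent stands for a generic concept such as [READING], described in the next section.

```java
// A concept node: a type label plus an optional referent.
class Concept {
    final String type;
    final String referent; // null means a generic concept

    Concept(String type, String referent) {
        this.type = type;
        this.referent = referent;
    }

    Concept(String type) {
        this(type, null);
    }

    boolean isGeneric() {
        return referent == null;
    }

    // Linear-form rendering in the style used above: [type: 'referent'].
    @Override
    public String toString() {
        return isGeneric() ? "[" + type + "]" : "[" + type + ": '" + referent + "']";
    }

    public static void main(String[] args) {
        System.out.println(new Concept("person", "John"));    // [person: 'John']
        System.out.println(new Concept("country", "Canada")); // [country: 'Canada']
        System.out.println(new Concept("READING"));           // [READING]
    }
}
```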
A conceptual relation typically connects two concepts; it shows that some relationship holds between their referents. For example,
[PERSON: 'John'] ← (AGNT) ← [READING] → (OBJECT) → [NEWSPAPER: 'Toronto Star'].
This linear expression of a conceptual graph represents the sentence John is reading the "Toronto Star". The relations in this conceptual graph are AGNT and OBJECT. The AGNT relation shows that John is the agent of READING, and OBJECT shows that the "Toronto Star" is the object of reading.
3.3 Referent
The concept box is divided into two parts: a type field on the left and a referent field on the right. The concept [PERSON: 'John'] is an individual concept with type PERSON and referent John. The concept [READING] is called a generic concept because it does not identify a particular individual; it specifies only the type, not the individual. The referent r of a concept c is a pair <q, d>, where q is called the quantifier of c and d is called the designator of c.
3.4 Conceptual Graph
A conceptual graph is a bipartite graph that has two kinds of nodes, called concepts and conceptual relations. A conceptual relation specifies the role that each percept plays. Figure 3.1 shows a conceptual graph that describes the sentence John is reading the "Toronto Star" with a microfiche reader.
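[Figure 3.1 diagram: concept boxes Person: John, Reading, Newspaper: Toronto Star and Microfiche Reader, linked through relation circles]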
Figure 3.1 is the graphic representation of a conceptual graph. It uses boxes to represent concepts and circles to represent conceptual relations. The advantage of this representation is readability, but it is hard to type, difficult for a computer to process, and takes up a lot of space. So more often we use the linear notation, which uses square brackets for concepts and parentheses for conceptual relations.
A conceptual graph g with n conceptual relations can be constructed from n star graphs, one for each conceptual relation in g. Since Figure 3.1 has three conceptual relations, it can be constructed from the following three star graphs, written in linear form (LF):
[Person: John] ← (Agent) ← [Reading]
[Reading] → (Object) → [Newspaper: Toronto Star]
[Reading] → (Inst) → [Microfiche Reader].
These three star graphs constitute a disconnected conceptual graph. To form a connected CG, they can be joined by overlaying the three identical concepts of type [Reading], which yields the conceptual graph of Figure 3.1:
[Reading] -
    (Agent) → [Person: John],
    (Object) → [Newspaper: Toronto Star],
    (Inst) → [Microfiche Reader].
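The join just described can be sketched in Java. This sketch assumes a hypothetical `StarJoin` class in which each star graph is reduced to a (concept, relation, concept) triple and concepts are identified by their printed labels. It overlays every pair of identical labels at once, which is a simplification: in Sowa's theory a join is applied to chosen pairs of concepts. It does not use the Notio library described later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class StarJoin {
    // One star graph: [from] -> (relation) -> [to], concepts given by label.
    record Star(String from, String relation, String to) {}

    // Join: concepts with identical labels are overlaid into one node, so
    // every relation that touched either copy now hangs off the merged node.
    static Map<String, List<String>> join(List<Star> stars) {
        Map<String, List<String>> arcs = new LinkedHashMap<>();
        for (Star s : stars) {
            arcs.computeIfAbsent(s.from(), k -> new ArrayList<>())
                .add("(" + s.relation() + ") -> " + s.to());
        }
        return arcs;
    }

    // Distinct concept nodes that remain after overlaying identical labels.
    static Set<String> nodes(List<Star> stars) {
        Set<String> result = new LinkedHashSet<>();
        for (Star s : stars) {
            result.add(s.from());
            result.add(s.to());
        }
        return result;
    }

    // Print the joined graph in linear form around one head concept.
    static String linear(String head, Map<String, List<String>> arcs) {
        return head + " -\n    "
            + String.join(",\n    ", arcs.getOrDefault(head, List.of())) + ".";
    }

    public static void main(String[] args) {
        List<Star> stars = List.of(
            new Star("[Reading]", "Agent", "[Person: John]"),
            new Star("[Reading]", "Object", "[Newspaper: Toronto Star]"),
            new Star("[Reading]", "Inst", "[Microfiche Reader]"));
        System.out.println(nodes(stars).size()); // 4 nodes instead of 6 occurrences
        System.out.println(linear("[Reading]", join(stars)));
    }
}
```

The three stars mention six concept occurrences, but after overlaying the three copies of [Reading] only four nodes remain, matching the connected graph above.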
The arrows on the arcs indicate the expected direction for reading the graph. For conceptual relations whose names are nouns or abbreviations of nouns, the following conventions are commonly used:
1. When a graph is read in the direction of the arrows, the arc pointing towards the circle is read "has a", and the one pointing away from the circle is read "which is".
2. When a graph is read against the flow of the arrows, the arc pointing away from the circle is read "is a", and the one pointing towards the circle is read "of".
So, according to these conventions, the above conceptual graph can be read:
1. In the direction of the arrows:
Reading has an agent, which is John;
Reading has an object, which is "Toronto Star";
Reading has an instrument, which is Microfiche Reader.
2. Against the flow of the arrows:
John is an agent of Reading;
"Toronto Star" is an object of Reading;
Microfiche Reader is an instrument of Reading.
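The two reading conventions are mechanical enough to express as string templates. In this hypothetical Java sketch, the caller supplies the noun reading of each relation (for example "instrument" for Inst), and the choice between "a" and "an" is a crude first-letter rule assumed only for the examples above.

```java
// Reads one dyadic arc [source] -> (relation) -> [target] in English,
// following the two conventions listed above.
class ArcReader {
    // With the arrows: "<source> has a(n) <relation>, which is <target>".
    static String withArrows(String source, String relation, String target) {
        return source + " has " + article(relation) + " " + relation + ", which is " + target;
    }

    // Against the arrows: "<target> is a(n) <relation> of <source>".
    static String againstArrows(String source, String relation, String target) {
        return target + " is " + article(relation) + " " + relation + " of " + source;
    }

    // Crude article choice that looks only at the first letter.
    static String article(String noun) {
        return "aeiouAEIOU".indexOf(noun.charAt(0)) >= 0 ? "an" : "a";
    }

    public static void main(String[] args) {
        System.out.println(withArrows("Reading", "agent", "John"));    // Reading has an agent, which is John
        System.out.println(againstArrows("Reading", "agent", "John")); // John is an agent of Reading
    }
}
```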
Conceptual graphs are independent of the surface language. No matter how a sentence is phrased or what language is used, it should be represented by the same conceptual graph. For instance, the conceptual graph shown in Figure 3.1 also represents the sentence: John is using a Microfiche Reader to read the "Toronto Star".
3.5 Canonical Graphs and Canonical Formation Rules
A conceptual graph is a combination of concept nodes and relation nodes in which every arc of every conceptual relation is linked to a concept. But not all such combinations make sense; some of them are absurd, like the following:
[CAR] → (STATE) → [LAUGHING].
This odd, unusual, or perhaps meaningless graph may be read "A car has a state of laughing". To rule out such sentences, Katz and Fodor (1963) developed a theory of semantics that imposes selectional constraints on permissible combinations of words.
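A selectional constraint can be sketched as a type check against a concept type hierarchy. In this Java sketch the hierarchy, the constraint table and the type names (MACHINE, ANIMATE, PHYSICAL_OBJECT, BROKEN) are all hypothetical; they are not Katz and Fodor's formulation, only an illustration of how [CAR] → (STATE) → [LAUGHING] could be rejected.

```java
import java.util.Map;

// Selectional constraints in the spirit of Katz and Fodor: a STATE arc is
// accepted only if its bearer's type falls under the type the state demands.
class SelectionalCheck {
    // Hypothetical type hierarchy: each type maps to its immediate supertype.
    static final Map<String, String> SUPERTYPE = Map.of(
            "CAR", "MACHINE",
            "MACHINE", "PHYSICAL_OBJECT",
            "PERSON", "ANIMATE",
            "ANIMATE", "PHYSICAL_OBJECT",
            "PHYSICAL_OBJECT", "ENTITY");

    // Hypothetical constraint table: state type -> required bearer type.
    static final Map<String, String> STATE_BEARER = Map.of(
            "LAUGHING", "ANIMATE",
            "BROKEN", "PHYSICAL_OBJECT");

    // Climb the hierarchy; the walk ends at ENTITY, which has no supertype.
    static boolean isSubtype(String type, String ancestor) {
        for (String t = type; t != null; t = SUPERTYPE.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // Accept [bearer] -> (STATE) -> [state] only when the constraint holds.
    static boolean acceptsState(String bearer, String state) {
        String required = STATE_BEARER.get(state);
        return required != null && isSubtype(bearer, required);
    }

    public static void main(String[] args) {
        System.out.println(acceptsState("CAR", "LAUGHING"));    // false
        System.out.println(acceptsState("PERSON", "LAUGHING")); // true
        System.out.println(acceptsState("CAR", "BROKEN"));      // true
    }
}
```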
To distinguish the meaningful graphs that represent real or possible situations in the external world, certain graphs are declared to be canonical. Through experience, each person develops a world view represented in canonical graphs. One source of these graphs is observation: the assembler may combine certain concepts in perception. Since that combination is true of a real situation, it must be canonical. Another source is the derivation of new canonical graphs from other canonical graphs by formation rules. The formation rules are the rules of copy, restrict, unrestrict, join, simplify and detach.
The join rule merges identical concepts. Two graphs may be joined by overlaying one graph on top of the other, so that the two identical concepts merge into a single concept. As a result, all the conceptual relations that had been linked to either concept are linked to the single merged concept.
When two concepts are joined, some relations in the resulting graph may become redundant. One of each pair of duplicates can then be deleted by the rule of simplification: when two relations of the same type are linked to the same concepts in the same order, they assert the same information; one of them may therefore be erased.
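The rule of simplification amounts to removing duplicate arcs, where two arcs are duplicates if they have the same relation type and link the same concepts in the same order. This Java sketch identifies concepts by label strings for brevity, whereas a real graph would compare node identities. The relation names AGNT, MANR and OBJ and the arcs in `main` are assumed for illustration; they correspond to "a man is eating fast" joined with "John is eating an apple", on the assumption that MAN: John results from joining MAN with PERSON: John.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

class Simplify {
    // A dyadic arc: (relation) from the first concept to the second.
    record Arc(String relation, String from, String to) {}

    // Rule of simplification: arcs with the same relation type linking the
    // same concepts in the same order assert the same thing, so keep one.
    // Records compare by value, so a LinkedHashSet drops the duplicates
    // while keeping the first-seen order.
    static List<Arc> simplify(List<Arc> arcs) {
        return new ArrayList<>(new LinkedHashSet<>(arcs));
    }

    public static void main(String[] args) {
        // Hypothetical arcs left after the EAT concepts and the MAN /
        // PERSON: John concepts of the two graphs have been joined.
        List<Arc> joined = List.of(
                new Arc("AGNT", "EAT", "MAN: John"),
                new Arc("MANR", "EAT", "FAST"),
                new Arc("AGNT", "EAT", "MAN: John"),
                new Arc("OBJ", "EAT", "APPLE"));
        simplify(joined).forEach(System.out::println);
    }
}
```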
For example, consider the following two canonical graphs:
[Figure 3.2 diagram: one graph with the concepts MAN, EAT and FAST; the other with PERSON: John, EAT and APPLE]
Figure 3.2 Two canonical graphs
Figure 3.2 shows two canonical graphs. The first may be read "A man is eating fast", and the second "A person, John, is eating an apple". We can then use the join rule to join the two graphs, as shown in Figure 3.3:
[Figure 3.3 diagram: the joined graph, with the concepts PERSON: John, MAN: John, EAT, FAST and APPLE]
Figure 3.3 Join of the two canonical graphs in Figure 3.2
After simplifying the joined graph according to the simplification rule, we obtain a new canonical graph:
[Figure 3.4 diagram: a single graph linking MAN: John, EAT, FAST and APPLE]
Figure 3.4 The simplification of Figure 3.3
3.6 Contexts
A context is a concept that contains one or more nested conceptual graphs describing its referent. A concept of type "Situation" is an example of a context. Figure 3.5 shows the conceptual graph that expresses the sentence "I suggest that you take the exam."
[Figure 3.5 diagram: a context of type Proposition containing Person: You, Take and Exam]
Figure 3.5 A conceptual graph containing a context
3.7 The CGIF Representation of Conceptual Graphs
A conceptual graph can be represented in several ways. The representations mentioned above are good for humans because they are readable; however, they are not easily read by computers. Another representation of conceptual graphs, the CGIF, was introduced to solve this problem.
CGIF, which stands for Conceptual Graph Interchange Form, is a representation for conceptual graphs intended for transmitting them across networks and between IT systems that use different internal representations. All features in the formal CG definitions are represented in CGIF, and the comment fields permit informal extensions, such as formatting information for graphical displays. The primary design goal for CGIF is high-speed generation, transmission, and parsing of conceptual graphs sent between computer systems. The CGIF syntax ensures that all necessary syntactic and semantic information about a symbol is available before the symbol is used; therefore, all translations can be performed during a single pass through the input stream. When a conceptual graph is represented in CGIF, the grammar rules permit several different options, all of which are logically equivalent. For example, the CG in Figure 3.1, John is reading the "Toronto Star" with a microfiche reader, can be represented in CGIF as:
[Reading *x] (Agnt ?x [Person 'John']) (Object ?x [Newspaper 'Toronto Star']) (Inst ?x [Microfiche_Reader])
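A small Java sketch shows why CGIF can be produced in one pass: the head concept is written first with its defining label *x, and every relation that follows refers back to it with the bound label ?x. The class `CgifWriter` is hypothetical and handles only star-shaped graphs whose relations all attach to one head concept, using the quoted-referent style of the example above; it is not a full CGIF generator and does not cover contexts or the rest of the grammar.

```java
import java.util.ArrayList;
import java.util.List;

class CgifWriter {
    // One relation hanging off the head concept: (relation ?x [type 'referent']).
    record Arc(String relation, String type, String referent) {}

    // Render a concept; a null referent gives a generic concept.
    static String concept(String type, String referent) {
        return referent == null ? "[" + type + "]" : "[" + type + " '" + referent + "']";
    }

    // The defining label *x is emitted before any bound label ?x, so a
    // reader never meets a label it has not yet seen defined.
    static String write(String headType, List<Arc> arcs) {
        List<String> parts = new ArrayList<>();
        parts.add("[" + headType + " *x]");
        for (Arc a : arcs) {
            parts.add("(" + a.relation() + " ?x " + concept(a.type(), a.referent()) + ")");
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        System.out.println(write("Reading", List.of(
                new Arc("Agnt", "Person", "John"),
                new Arc("Object", "Newspaper", "Toronto Star"),
                new Arc("Inst", "Microfiche_Reader", null))));
    }
}
```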
3.8 Why Use Conceptual Graphs
In the previous sections, we briefly overviewed some of the main aspects of conceptual graphs. As we can see, the conceptual graph is a knowledge-rich representation of natural language. It has sufficient expressive power to encode any fact or concept that can be encoded in other formal symbolic systems. This means that conceptual graphs may serve as a common medium of representation for diverse kinds of knowledge. The conceptual structures that encode information may themselves serve as a guide for information retrieval: from a given node, nodes representing related entities are found simply by following pointers from the node to its neighbors. In this way, a conceptual graph provides its own meaning-bearing indexing system; that is, the indexes are no longer based on keywords but on concepts or on relations between concept nodes. Labels on arcs and nodes are meaningful to graph-manipulating procedures; they provide guidance for traversing the conceptual graph in search of information relevant to a task.
The ease and expressiveness with which conceptual graphs handle both general and specific concepts is a major attraction over other formalisms, such as rules and logic, for building knowledge bases. Several efforts have demonstrated the power of conceptual graphs to perform natural language processing and to build domain-focused knowledge-based systems [Fargues et al. 86; Sowa & Way 86; Garner & Tsui 86; Morton & Baldwin 85]. A conceptual graph-based knowledge system is believed to be a flexible and powerful approach to building a foundational knowledge base of general concepts [Berg-Cross & Price 89]. Because conceptual graphs have several representations to meet different needs, they are convenient both for human reading and for computer processing. For these reasons, we chose conceptual graphs as our natural language encoding schema and built a knowledge base system that is able to store and retrieve conceptual graphs.
3.9 The 'Notio' Java Package for Modelling Conceptual Graphs
Notio¹⁰ is a Java class library for constructing and manipulating conceptual graphs. It was designed and implemented by Finnegan Southey at the University of Guelph. Currently, the package provides facilities for operations on single graphs or pairs of graphs. It supports the management of individual graphs, not large groups of graphs; most of its operations act on only one or two graphs. As such, it is ideal as a basis for CG editors or as a representation for data retrieved from a large-scale system. This thesis makes use of Notio in building the conceptual graph knowledge base system.
¹⁰ Finnegan Southey, ICCS 1999, "Notio - A Java API for developing CG tools", University of Guelph, Computing and Information Science, Guelph ON, Canada.
CHAPTER 4
An Object-Oriented Knowledge base for Conceptual Graphs
In Chapter 3, we discussed the basic concepts of conceptual graphs. Conceptual graphs are a graphic representation of logic used to structure the knowledge embedded in natural languages so that it can be processed by a computer. With the help of this knowledge encoding system, we are able to extract and encode meanings from natural language text. Having this knowledge representation tool is not enough, however; we still need a knowledge base to store and manage the conceptual graphs.
4.1 Why Do We Need a Knowledge base for Conceptual Graphs?
It is obvious that financial institutions such as banks need a database system to store information such as account number, encrypted password, customer name, address, telephone number and account balance. For retrieval purposes, there is also a need to build indexes based on account number, or on name and address combined. A traditional relational database system does a very good job for this type of data management.
However, for research purposes, it is desirable to have a database (i.e., a repository) to store research articles written in natural language. Can we also use a traditional relational database to do the job? The answer is both yes and no. Yes, because we may employ the methods discussed in Chapter 2, such as inverted file methods, adaptive methods and similarity measures. No, because the retrieval mechanisms of these methods are all basically "keyword" matching, so the retrieval results are usually unsatisfactory: either irrelevant information is retrieved or useful and relevant information is missed.
To overcome the "keyword barrier", we employ knowledge representations, such as conceptual graphs, to model and structure natural language texts. This lets us extract and encode meaning from the texts and makes intelligent search possible. So instead of storing natural language texts directly in the database, we may store conceptual graphs that contain the meaning of the texts. Thus we need a knowledge base system to store the conceptual graphs and an indexing method to index the conceptual graphs for later efficient retrieval.
Can a normal relational database do the job? Some research has been done on how to apply a normal relational database to conceptual graphs. Brian Bowen and Pavel Kocura¹¹ have shown that conceptual graphs can be stored in a relational database and managed by the relational database system. It is worth looking at their research, since we are going to do similar work with a different approach. The following is a synopsis of their work.
¹¹ B. A. Bowen and P. Kocura, "Implementing Conceptual Graphs in a RDBMS", ICCS 19%, Loughborough University, Department of Computer Studies, UK.
4.2 Relational Database for Conceptual Graphs
The way in which conceptual graphs are physically stored is obviously important for the efficiency of their retrieval, as well as for the efficiency of operations on them. Conceptual graphs can consist of many concepts and relations and vary in size, which creates problems when attempting to store them in fixed-field relational tables. To solve these problems, Brian Bowen and Pavel Kocura first fragment the graph into concepts and relations, then store the concepts in one table and the relations in another.
All graph-string tables in their system are bi-tables and have a number of domains in common. In the concept tables, the domains common to all uses are:
TYPE A type label;
INDVMARK A marker, either generic or individual;
In the relation table, they use FROMPOS (FROM POSITION) and TOPOS (TO POSITION) to represent the direction of the arrow in the conceptual graph. The common domains in the relation table are:
FROMPOS The marker in the From position;
RELATION The relation type label;
TOPOS The marker in the To position;
The above are the core domains that allow simple dyadically connected graphs to be stored. The following example shows how their method stores a conceptual graph in a relational database. Consider the conceptual graph in Figure 4.1:
[Figure 4.1 diagram: a conceptual graph with the concepts MAN: Brian, WOMAN: Petra, PERSON: Clovis, INTELLIGENT and BEAUTIFUL]
Figure 4.1 An example of a conceptual graph for storage in an RDB
which would be fragmented into two relational tables, as shown below:
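[Tables: a concept table (TYPE, INDVMARK) with rows such as MAN | Brian and entries for Petra, Clovis and INTELLIGENT; and a relation table (FROMPOS, RELATION, TOPOS) with FROMPOS entries including Brian and Petra, two CHILD OF rows whose TOPOS is Clovis, and two ATTR rows]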
The numbered asterisks represent internal markers assigned by the system to ensure that the graph remains logically connected when fragmented.
These core domains constitute the core graph database, which is used to store general information; most of the tables in their system also have extra domains for storing additional information needed by the data being stored. The extra information may include TYPEDEFN and RELDEFN, which store type and relation definitions respectively. Each tuple has an extra domain recording the definition that the entry is part of.
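The fragmentation idea can be sketched in Java with one list of rows per table. This is a hypothetical illustration, not Bowen and Kocura's code: the record fields mirror the domain names TYPE, INDVMARK, FROMPOS, RELATION and TOPOS; individual names are assumed to be unique so they can serve directly as markers; and generic concepts receive numbered internal markers (*1, *2, ...) so that relation rows can still point at them. The example uses the reading graph from Chapter 3 rather than Figure 4.1.

```java
import java.util.ArrayList;
import java.util.List;

class GraphTables {
    record ConceptRow(String type, String indvMark) {}
    record RelationRow(String fromPos, String relation, String toPos) {}

    final List<ConceptRow> concepts = new ArrayList<>();
    final List<RelationRow> relations = new ArrayList<>();
    private int nextMarker = 1;

    // Store a concept and return its marker. Individual concepts use their
    // (assumed unique) name; generic ones get a numbered internal marker.
    String addConcept(String type, String name) {
        String mark = (name != null) ? name : "*" + nextMarker++;
        concepts.add(new ConceptRow(type, mark));
        return mark;
    }

    // Store one dyadic relation as a row of markers.
    void addRelation(String fromPos, String relation, String toPos) {
        relations.add(new RelationRow(fromPos, relation, toPos));
    }

    public static void main(String[] args) {
        GraphTables db = new GraphTables();
        String reading = db.addConcept("READING", null); // generic, becomes *1
        String john = db.addConcept("PERSON", "John");
        String paper = db.addConcept("NEWSPAPER", "Toronto Star");
        db.addRelation(reading, "AGNT", john);
        db.addRelation(reading, "OBJECT", paper);
        db.concepts.forEach(System.out::println);
        db.relations.forEach(System.out::println);
    }
}
```

Reassembling the graph means joining the two tables on the marker columns, which is where the relational approach starts to strain once graphs are not simple dyadic ones.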
4.3 Problems with Relational Database for Conceptual Graphs
Brian Bowen and Pavel Kocura did propose a way to solve some of the problems of applying relational database technology to conceptual graphs. However, they do not solve all the problems. By nature, a normal relational database has difficulty modelling the data domains of conceptual graphs. Such domains share some common features:
1. they contain complex objects that are very difficult to represent in a relational database (RDB);
2. they require more manipulative power than the relational model can provide. The DML (data manipulation language) is primarily concerned with efficient querying and maintenance of the database but has little expressive power; the standard operations of the relational model just do not have the expressive power required. For example, the RDBMS is unable to perform matches based on graphs, concepts and relations;
3. such domains often require the modelling of complicated interrelationships and constraints associated with the objects being modelled; the constraint mechanisms of RDBs are completely unable to cope with such requirements.
Most importantly, the method they proposed is not able to store all kinds of conceptual graphs. It is restricted to simple dyadically connected graphs.
4.4 Object-Oriented Technology - A Better Solution
Although relational databases can be applied to conceptual graphs, they do not fit naturally, and some restrictions have to be imposed. Now we turn to the other alternative: the object-oriented technique. Is an object-oriented knowledge base better for conceptual graphs? The answer is "yes". In the following sections, we show why this approach is better.
4.4.1 Basic Features of Object-Oriented Modelling (OOM)
There are four basic features in object-oriented modelling: Abstract Data Typing, Object Identity, Inheritance, and Encapsulation. We present these four features below because, as we will see later, they are very similar to the features of CGs. Thus OOM can be applied very nicely and naturally to CG issues.
1. Abstract Data Typing (ADT) models the various classes in object-oriented knowledge base applications, where each class instance has a protocol: a set of messages to which it can respond. With abstract data types there is a clear separation between the external interface of a data type and its internal implementation. The implementation of an abstract data type is hidden; hence, alternative implementations can be used for the same abstract data type without changing its interface. This provides a rich mechanism for recording design information about related data and behavior. We can use the notion of an abstract data type to encapsulate data and behavior so that we export only external services while hiding the implementation details of these services. Abstract data typing allows complex software systems to be constructed from reusable components: the classes. Through abstract data typing, programming thus becomes modular and extensible. Abstract data typing also supports a much more natural representation of real-world problems, in which the dominant components are the objects rather than the procedures. It allows objects of the same structure and behavior to share representation (instance variables) and code (methods).
Abstract data typing is a useful feature when we use the object-oriented method to model conceptual graphs: since conceptual graphs have concept types, the ADT feature of an OODB lets any concept type be modelled with a corresponding ADT.
2. Object Identity is the property of an object that distinguishes it from all other objects. With object identity, objects can contain or refer to other objects, and the same object can be referenced through attributes of many other objects. This is called referential sharing. In programming languages, identity is usually realized through memory addresses; in databases, it is realized through identifier keys. User-specified names are used in both languages and databases to give unique names to objects. Each of these schemes compromises identity. Object identity clarifies, enhances, and extends the notions of pointers in conventional programming languages, foreign keys in databases, and file names in operating systems. Using object identity, programmers can dynamically construct arbitrary graph-structured composite or complex objects, that is, objects constructed from sub-objects. Objects can be created and disposed of at run time. In some cases objects can even become persistent and be reaccessed in subsequent programs.
3. Inheritance is a technique that lets us specify some parts of a system incrementally. It means that subclasses can inherit the instance variables and methods of their superclasses; it captures an "is a" relationship. Through inheritance, new software modules (e.g., classes) can be built on top of an existing hierarchy of modules. Inheriting behavior enables code sharing and reusability. Most existing object-oriented systems allow developers to extend an application by specializing existing components (in most cases, classes) of the application. Specialization is a top-down approach to the development of object-oriented database applications. Generalization is the complement of specialization: it takes a bottom-up approach by creating classes that are generalizations (or superclasses) of existing subclasses. Three facets of inheritance characterize most of the approaches used by object-oriented languages:
a) Visibility of inherited variables and methods: some object-oriented languages allow the direct manipulation of instance variables. Other languages distinguish between public and private instance variables. With inheritance, there is a third alternative called subclass-visible.
b) Method overriding: a subclass can override an inherited method. In other words, a method called "M" in class "C" can be overridden by a totally different method, also called "M", in a subclass of "C".
c) Multiple inheritance: multiple inheritance is a mechanism that allows a class to inherit from more than one immediate parent. When a class can have more than one immediate predecessor, the class inheritance hierarchy becomes a directed acyclic graph (whereas with single inheritance the class inheritance hierarchy is a tree).
Inheritance is also a useful feature when we use object-oriented methods to model conceptual graphs. In conceptual graphs, all concept types are organized in a hierarchy according to levels of generality, and the inheritance feature can be used to model this concept type hierarchy.
4. Encapsulation protects object integrity because it limits access to only those services explicitly exported for an object. Encapsulation relies on implementation independence to hide the implementation of an object, which allows implementation details to change without requiring any change to the programs that access objects through the exported services. The same principle also allows objects within an object set to have different implementations. This leads to appropriate uses of overriding, overloading, dynamic binding and polymorphism in inheritance hierarchies, in which more specialized objects can have more efficient implementations of operations. Encapsulation also serves as an interface, letting the encapsulated object control which services are available and when they are available.
These fundamental features of the object-oriented modelling technique happen to map very well onto the features of conceptual graphs. We discuss these similarities in the following sections.
4.4.2 Abstract Data Typing Maps Concept Type
The first coincidence is that abstract data typing in object-oriented modelling maps naturally to the concept type of conceptual graphs. For example, to model the concept type "person", we may create an abstract data type, the class "person". In that class, we may include attributes that describe a "person", such as name, gender, age, height, weight, etc.
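A minimal Java sketch of the concept type "person" as an abstract data type might look as follows. Only three of the attributes listed above are included; the validation rule, the method names and the sample values are assumptions made for this illustration. The fields are private, so callers see only the exported operations, which is the separation of interface and implementation described for ADTs.

```java
// The concept type "person" modelled as an abstract data type: the fields
// are hidden and only the operations below are exported.
class Person {
    private final String name;
    private final String gender;
    private final int age;

    Person(String name, String gender, int age) {
        // Hypothetical integrity rule enforced behind the interface.
        if (age < 0) {
            throw new IllegalArgumentException("age must be non-negative");
        }
        this.name = name;
        this.gender = gender;
        this.age = age;
    }

    String name() { return name; }
    String gender() { return gender; }
    int age() { return age; }

    // Render this instance as an individual concept of type Person.
    String toConcept() {
        return "[Person: '" + name + "']";
    }

    public static void main(String[] args) {
        Person john = new Person("John", "male", 30); // hypothetical values
        System.out.println(john.toConcept());         // [Person: 'John']
    }
}
```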
4.4.3 Object Identity Maps Individual Marker
The second coincidence is that object identity in object-oriented modelling maps exactly to the individual marker. Both are unique identifiers of an individual or an object in a system. Both are generated internally by the system, and they are not usable outside the system. For externally printable references, an individual may also have a name or serial number that appears after the type label in the concept box. For example, in Java we introduce a new object into a system using the keyword "new":
Car myFirstCar = new Car(serialNumber); // assume we have already defined the class "Car"
In this way, a new object of type "Car" is created and the system assigns it a unique identifier. The variable "myFirstCar" is an external reference to the object.
The above car can also be represented in a conceptual graph like this:
[Car: serialNumber]
The serial number is an external printable reference that refers to the particular car in the system.
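The difference between system-generated identity and an external name can be seen directly in Java: `==` compares object identity, not field values. In this sketch the classes `IdentityDemo` and `Car` and the serial number "SN-0001" are hypothetical. Two objects built with the same serial number are still distinct objects, just as two individual markers are distinct even if someone gives both individuals the same printable name.

```java
// Object identity versus an external, printable name.
class IdentityDemo {
    public static void main(String[] args) {
        Car myFirstCar = new Car("SN-0001"); // hypothetical serial number
        Car alias = myFirstCar;              // second reference, same object
        Car lookalike = new Car("SN-0001");  // same serial, new object

        System.out.println(myFirstCar == alias);     // true
        System.out.println(myFirstCar == lookalike); // false
        System.out.println(myFirstCar.toConcept());  // [Car: SN-0001]
    }
}

class Car {
    private final String serialNumber;

    Car(String serialNumber) {
        this.serialNumber = serialNumber;
    }

    // External reference written the way a concept box shows it.
    String toConcept() {
        return "[Car: " + serialNumber + "]";
    }
}
```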
4.4.4 Inheritance Maps Canonical Formation Rule
The third coincidence is that the inheritance mechanism of object-oriented modelling is already a fundamental feature of conceptual graphs, supported by the canonical formation rules. A canonical graph can be derived from another canonical graph by formation rules. The formation rules are the rules of copy, restrict, unrestrict, join, simplify, and detach.
For example, a concept of type "tiger" can be derived from the concept "animal" if we add restrictions on "animal" such as "lives on land", "has four legs", "eats meat", etc. In object-oriented modelling, the object class "tiger" can be derived from the class "animal". The restrictions in the conceptual graphs may become the attributes of the derived class.
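The tiger example can be sketched in Java with a subclass whose extra fields play the role of the restrictions. The class and field names are hypothetical, and the sketch only illustrates the analogy: `isAssignableFrom` answers the same question as "is TIGER a subtype of ANIMAL?" in the concept type hierarchy.

```java
class TypeHierarchyDemo {
    public static void main(String[] args) {
        Animal a = new Tiger();
        System.out.println(a.describe());
        System.out.println(Animal.class.isAssignableFrom(Tiger.class)); // true
        System.out.println(Tiger.class.isAssignableFrom(Animal.class)); // false
    }
}

// The general concept type ANIMAL.
class Animal {
    String describe() {
        return "an animal";
    }
}

// TIGER is ANIMAL plus restrictions; in Java those restrictions become
// attributes fixed by the subclass.
class Tiger extends Animal {
    final boolean livesOnLand = true;
    final int legs = 4;
    final boolean eatsMeat = true;

    @Override
    String describe() {
        return "an animal that lives on land, has " + legs + " legs and eats meat";
    }
}
```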
4.4.5 Encapsulation Maps Context
The fourth coincidence is that the encapsulation mechanism of object-oriented modelling can be matched to the contexts of conceptual graphs. Contexts encapsulate object descriptions in a way that exactly reflects the structure of object-oriented modelling. We may already be familiar with encapsulation in object-oriented modelling but be new to contexts, so let us look at an example that shows how contexts encapsulate an object. Figure 4.2 is a conceptual graph indicating that a meeting was held on 1 Oct. 1998.
[MEETING] → (OCCUR) → [DATE: 1 Oct. 1998]
Figure 4.2 The conceptual graph for a meeting
The concept box labelled MEETING says that there exists a meeting, but it does not specify any details of what happened. The OCCUR relation indicates that it occurred on the date 1 Oct. 1998. To see the details of the meeting, it is necessary to open the box and look inside. The box may be expanded as shown in Figure 4.3.
The expended box says that there are 20 attendants in the meeting and the chairman of
the meeting is John. He gives a speech to al1 the attendants.
[Figure: the MEETING context opened up; inside the box are CHAIRMAN: John, ATTENDANT: {*}@20 and SPEECH, and outside it the context is linked to DATE: 1 Oct. 1998]
Figure 4.3 expanded view of the meeting context
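The encapsulation provided by a context can be pictured in Java as below. This is a minimal sketch under our own assumptions: ContextConcept and its methods are hypothetical, and the nested graph is held as plain strings rather than as real concept and relation objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: a context concept such as [MEETING] whose nested graph
// is hidden behind the box, as private fields are hidden inside an object.
class ContextConcept {
    private final String type;                              // e.g. "MEETING"
    private final List<String> nested = new ArrayList<>();  // statements inside the box

    ContextConcept(String type) {
        this.type = type;
    }

    // All that the enclosing graph needs to see: the type on the box.
    String getType() {
        return type;
    }

    // Adds one statement, e.g. "[CHAIRMAN: John]", to the nested description.
    void describe(String statement) {
        nested.add(statement);
    }

    // "Opening the box": returns a read-only view of the nested description.
    List<String> expand() {
        return Collections.unmodifiableList(nested);
    }
}
```

The outer graph of Figure 4.2 only refers to the MEETING box; the chairman, attendants and speech of Figure 4.3 become visible through expand(), and the view it returns cannot be modified from outside.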
4.4.6 Representing Objects and Object Classes
Other aspects of object-oriented modelling can also be matched to those of conceptual
graphs. One of these aspects is the distinction between an
object class and the instances of that class. This distinction can also be found in conceptual
graphs. For example, in Java, an object class is defined like this:
public class Car {
    private String model;           // the model of the car
    private String engineNumber;    // engine has a serial number
    private int wheelSize;          // the size of the wheels
    private String chassisNumber;   // chassis has a serial number

    public Car(String model, String engineNumber, int wheelSize, String chassisNumber) {
        this.model = model;
        this.engineNumber = engineNumber;
        this.wheelSize = wheelSize;
        this.chassisNumber = chassisNumber;
    }

    // ... other methods
}
In conceptual graphs, the definition of a class "Car" may look like this:
[Figure: the context [Car: ∀ *c] containing the generic concepts MODEL: *m, ENGINE: *e and WHEELSIZE: {*}@16]
Figure 4.4 class definition for Car
Figure 4.4 shows a sample definition for the object class Car. The definition has a
universal quantifier ∀ to show that it applies to every car *c. Inside the definition, the car
*c is of some model *m, and it has as parts an engine *e, a set of 16-inch wheels *w
and a body *b. The concepts in the class definition are generic concepts that say that
some engine or body must exist for each car, but they do not specify their names or other
identifiers.
An instance of the object class Car can be created in Java by using the keyword "new".
For example:
Car aCar = new Car("Mustang", "V6", 4, "JKL333");
The same instance can also be specified in conceptual graphs as in the following diagram:
[Figure: Car: CA 163 1998 *c, with MODEL: Mustang, ENGINE: 728EClZS *e, WHEEL: {*}@16 and CHASSIS: JKL333 *b]
Figure 4.5 an instance of car CA 163 1998
From the above discussion we conclude that an object-oriented approach fits
conceptual graphs naturally, and thus it is better to store conceptual graphs in an
object-oriented database than in a relational database.
CHAPTER 5
Design and Implementation of the System
In chapters 3 and 4, we reviewed some basic concepts concerning conceptual
graphs and object-oriented databases. In this chapter, we discuss the design and
implementation issues of the object-oriented knowledge retrieval system. We use the Object
Modelling Technique (OMT) to design the system and Java to implement it.
It has been tested on JDK 1.1.6. The executable program is named 'CGBase', an
abbreviation for "Conceptual Graph Knowledge Base". We chose
Java because:
1. it is an object-oriented language; in chapter 4 we discussed the advantages of
using object-oriented modelling techniques for conceptual graphs
2. it is portable to any platform
3. currently most CG tools are implemented in Java
The guidelines for designing the system are:
1. The system should provide efficient means of storing and searching conceptual
graphs. The search should accurately retrieve knowledge based on user inputs.
2. A user-friendly interface should be used, which enables both experienced and
inexperienced users to use the system.
The following detailed system design reflects these
guidelines.
5.1 Overview of the System
The system is divided into three components:
1. an interface between the user and the system; this interface contains a menu that
prints the options available for the user to select, as shown in Figure 5.1.
    1. Create a new conceptual graph knowledge base
    2. Open an existing conceptual graph knowledge base
    3. Load a conceptual graph knowledge base
    4. Search the conceptual graph knowledge base
    5. Update a conceptual graph
    6. Show all conceptual graphs
    7. Close a knowledge base
    8. Close all knowledge bases
    0. Exit
Figure 5.1 the selection menu
2. a conceptual graph management kernel; the kernel consists of four object classes:
(a) a conceptual graph loader that loads conceptual graphs into the knowledge
base; (b) an index builder that extracts useful information from the graphs and builds
indexes from this information; (c) a graph searcher that searches the knowledge base
for matching graphs with the help of the indexes; (d) a graph update editor that
updates a conceptual graph.
3. a B-tree management system that holds and manages indexes.
In this project, each conceptual graph knowledge base consists of two files. One file is
the knowledge base file, with the file name extension '.cgd'. This file contains all conceptual
graphs written in CGIF format. The other file is the knowledge base index file, with the file
name extension '.ind'. This is the file that maintains all the index information about the
knowledge base.
In our approach, we first create a knowledge base for conceptual graphs in
CGIF format, then populate the knowledge base with the input conceptual graphs. While
the conceptual graphs are being loaded into the knowledge base, we parse each graph,
extract the useful information and build indexes with this information.
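The extraction step can be illustrated with regular expressions over the CGIF text. The class IndexTermExtractor below is a hypothetical simplification, not the system's parser (Appendix A lists a parseGraph method for that purpose). It assumes that type labels start with a letter and contain only letters, digits, '_' and '-', and that name designators are single-quoted strings; it does not handle nested contexts or the full CGIF grammar.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extractor for the three kinds of index terms. It is NOT a
// full CGIF parser; it only recognises simple graphs such as
// [DOG*x'Molly'][MAN*y'Arthur'](LOVES?x?y).
class IndexTermExtractor {
    // A concept type follows an opening bracket.
    private static final Pattern CONCEPT_TYPE = Pattern.compile("\\[\\s*([A-Za-z][A-Za-z0-9_-]*)");
    // A name designator is a single-quoted string.
    private static final Pattern NAME = Pattern.compile("'([^']*)'");
    // A relation type follows an opening parenthesis.
    private static final Pattern RELATION_TYPE = Pattern.compile("\\(\\s*([A-Za-z][A-Za-z0-9_-]*)");

    static List<String> conceptTypes(String cgif) { return collect(CONCEPT_TYPE, cgif); }
    static List<String> names(String cgif) { return collect(NAME, cgif); }
    static List<String> relationTypes(String cgif) { return collect(RELATION_TYPE, cgif); }

    // Returns the first capture group of every match, in textual order.
    private static List<String> collect(Pattern p, String cgif) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(cgif);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }
}
```

For graph (1) below, this yields the concept types DOG and MAN, the names Molly and Arthur, and the relation type LOVES; each of these terms would then be inserted into the index together with the graph's address.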
The system also provides two ways to retrieve the conceptual graphs stored in the
knowledge base, namely:
1. "terms" based search: the user may enter any concepts or relations or a combination of
concepts and relations. The system will search for any conceptual graphs that
contain this information. The search is based on matching individual concepts and conceptual
relations.
2. knowledge based search: the user may enter a conceptual graph. The system will search
for any conceptual graphs that match the graph the user entered. The search is based on
graph matching.
For example, suppose the knowledge base contains three conceptual graphs:
(1) [DOG*x'Molly'][MAN*y'Arthur'](LOVES?x?y)
(2) [DOG*x'Molly'][MAN*y'Arthur'][BONE*z](LOVES?x?y)(THROWS?y?z)(CATCHES?x?z)
(3) [DOG*x](LOVES?x[BONE])
Now, if the user just enters: dog
the system will find all conceptual graphs that contain the concept 'dog'. In this case all
three conceptual graphs will be printed. The user, however, may narrow down the search by
providing more information. For example, the user may enter: dog + loves + arthur. In
this case only the first two conceptual graphs will be found and printed.
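The terms based search can be sketched as an inverted index from terms to graph numbers, where a query with several terms is answered by intersecting the graph sets of its terms. In the sketch below, TermIndex is hypothetical: a TreeMap stands in for the on-disk B-tree, and graph numbers stand in for the file addresses that the real index stores.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the term index: each term maps to the
// numbers of the graphs that contain it. Terms are compared case-insensitively.
class TermIndex {
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Records that the given graph contains each of the given terms.
    void add(int graphId, String... terms) {
        for (String t : terms) {
            postings.computeIfAbsent(t.toLowerCase(), k -> new TreeSet<>()).add(graphId);
        }
    }

    // A query such as "dog + loves + arthur" returns the graphs containing every term.
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String raw : query.split("\\+")) {
            String term = raw.trim().toLowerCase();
            Set<Integer> hits = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(hits);   // first term: start from its graphs
            } else {
                result.retainAll(hits);         // later terms: keep common graphs only
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```

If graphs (1) to (3) are indexed by their concept types, relation types and names, the query dog returns graphs 1, 2 and 3, and dog + loves + arthur returns graphs 1 and 2, as in the example above.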
For the exact search, the user must enter a conceptual graph in CGIF format. The system
will take the input graph and search the knowledge base using graph matching.
We may also want to update an existing conceptual graph. In this case, we first search for the
conceptual graph that we want to update. When we enter the new conceptual graph, the
system will replace the old graph with the new one.
5.2 Design of a Conceptual Graph Object
Objects in memory and objects in a database differ semantically. Historically, object-oriented
systems and languages assumed that all the objects reside in a large virtual
memory and, as such, never bothered to develop concepts for managing objects in a
database (Stefik and Bobrow, 1986). As in a relational database, integrity constraints such as
uniqueness of objects, admissibility of null values, domain types of attributes and
relationships between objects also have to be applied to an object-oriented database. The
design of conceptual graphs in memory and on disk is different. In memory, a conceptual
graph is an object that consists of concepts, relations and comments, which are also
objects. Figure 5.2 shows the conceptual graph in memory.
[Figure: Conceptual Graph, composed of Concepts, Relations and Comments]
Figure 5.2 the conceptual graph object in memory
The above diagram shows that a conceptual graph consists of zero or more concepts, zero
or more conceptual relations and zero or more comments.
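A minimal Java sketch of this in-memory structure follows. Concept, Relation and ConceptualGraph are hypothetical simplifications written for illustration; the system itself appears to build on the Notio API cited in the references, whose classes carry more information (quantifiers, comments on concepts, nested contexts).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified object model; not the classes used by CGBase.
class Concept {
    final String type;        // e.g. "DOG"
    final String referent;    // e.g. "Molly", or null for a generic concept

    Concept(String type, String referent) {
        this.type = type;
        this.referent = referent;
    }
}

class Relation {
    final String type;              // e.g. "LOVES"
    final List<Concept> arguments;  // ordered arcs: first argument, second argument, ...

    Relation(String type, Concept... arguments) {
        this.type = type;
        this.arguments = Arrays.asList(arguments);
    }
}

class ConceptualGraph {
    // Each list may be empty: a graph has zero or more of each part.
    final List<Concept> concepts = new ArrayList<>();
    final List<Relation> relations = new ArrayList<>();
    final List<String> comments = new ArrayList<>();
}
```

Graph (1) of section 5.1 would be built from two Concept objects, [DOG: 'Molly'] and [MAN: 'Arthur'], and one Relation of type LOVES whose argument list holds references to those same two objects.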
At first glance, storing objects on a disk is easy, since Java supports writing an
object directly to a file. However, objects written in this way cannot be retrieved randomly, since
Java does not support random access to objects in such a file. Random access to the information
stored on a disk is basic and is a must for any knowledge base system. Thus, we
have to find another way to store objects (conceptual graphs) on a disk.
The solution is as follows. Since everything on the disk is in the form of byte streams, it is
impossible to retain in-memory data structures and pointers on disk. Thus, we have to store
conceptual graphs as byte streams. The length of each conceptual graph byte stream is
variable, in sharp contrast with a relational database, in which all
records consist of a fixed number of fixed-length fields, so that a waste of disk space is
inevitable. In some situations, the waste of disk space can be very significant, since the
sizes of conceptual graphs can differ significantly. This is one of the major drawbacks
of the relational database solution.
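Variable-length storage with random access can be sketched with java.io.RandomAccessFile, which can seek to any byte offset. GraphFile is a hypothetical helper, not the class used in CGBase. It relies on writeUTF, which prefixes each string with a two-byte length, so in this sketch a single record is limited to 65,535 encoded bytes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical helper: stores each graph as a length-prefixed byte stream and
// reads it back by its file offset, so records need not have a fixed size.
class GraphFile implements AutoCloseable {
    private final RandomAccessFile file;

    GraphFile(String path) {
        try {
            file = new RandomAccessFile(path, "rw");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Appends one CGIF string and returns the offset at which it starts;
    // this offset is what an index entry would record.
    long append(String cgif) {
        try {
            long offset = file.length();
            file.seek(offset);
            file.writeUTF(cgif);   // 2-byte length followed by modified UTF-8 bytes
            return offset;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Random access: jumps straight to a record without reading earlier ones.
    String read(long offset) {
        try {
            file.seek(offset);
            return file.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            file.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A short graph occupies only its own length plus two bytes, so no space is reserved for the longest possible graph.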
5.3 Design of a Concept Object
Concepts consist of four components: a type, a quantifier, a designator and a
comment. The latter two components constitute the referent of the concept. Quantifiers
are represented by a class that implements the Macro interface. Macros can either
use a simple placeholder object with a name, or actually provide an executable
operation that 'executes' the macro and changes the graph accordingly. Figure 5.3 is the
graphical representation of a concept object.
[Figure: Concept, composed of Type, Quantifier, Designator and Comment]
Figure 5.3 the concept object in memory
5.4 Design of a Relation Object
Relations consist of a type and arguments (arcs), which are an ordered list of concepts.
Figure 5.4 shows the relation object.
[Figure: Relation, composed of a Type and an ordered list of Arguments (Concepts)]
Figure 5.4 the relation object in memory
5.5 Indexing
The process of constructing document surrogates/tags by assigning identifiers to text
items is known as indexing [Salton 83]. Indexing provides a means to organize and
facilitate the retrieval of information. Some systems use a single index. The drawback
of this method is obvious: users have to provide information on that
particular index in order to retrieve information. For example, if a database stores
information about all employees of a company, it can use a single index on names. With
such an index, a user must know an employee's name in order to retrieve
information about him/her; no other information will help. It is nice to have more
than one index; however, the trade-off is disk space.
Indexes for this knowledge base system are deliberately designed to provide users with as
many ways to retrieve conceptual graphs as possible. The system uses a triple index
system. It constructs the indexes based on "concept type", "name designator of a
referent" and "relation type". The indexes are organized into a B-tree structure. Each
node in the B-tree consists of three objects: "data", "pointer" and "addrVector".
The "data" determines the position at which the node should be inserted. The "pointer" is
an array of pointers that point to its sub-nodes. The "addrVector" is a vector that contains
the addresses at which the searched information is stored in the knowledge base. The disk
representation of a B-tree node is shown in Figure 5.5.
The number of addresses (integer)
Length of the value of the 'data' (integer)
The value of the 'data' (String)
address value (long integer)
address value (long integer)
...
Figure 5.5 a 'BTreeNode' object on disk
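The node layout of Figure 5.5 can be written and read back with Java's DataOutputStream and DataInputStream, as in the sketch below. NodeRecord is hypothetical. Encoding the 'data' string as UTF-8 and using the big-endian integers that these streams produce by default are our assumptions; the thesis does not state the exact byte encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical codec for the layout of Figure 5.5:
// [number of addresses: int][length of data: int][data bytes][address: long]...
class NodeRecord {
    final String data;
    final List<Long> addresses;

    NodeRecord(String data, List<Long> addresses) {
        this.data = data;
        this.addresses = addresses;
    }

    byte[] toBytes() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            byte[] value = data.getBytes(StandardCharsets.UTF_8);
            out.writeInt(addresses.size());  // number of addresses
            out.writeInt(value.length);      // length of the 'data' value
            out.write(value);                // the 'data' value itself
            for (long a : addresses) {
                out.writeLong(a);            // one long per address
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static NodeRecord fromBytes(byte[] raw) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
            int count = in.readInt();
            byte[] value = new byte[in.readInt()];
            in.readFully(value);
            List<Long> addresses = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                addresses.add(in.readLong());
            }
            return new NodeRecord(new String(value, StandardCharsets.UTF_8), addresses);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Storing the two counts first lets a reader know exactly how many bytes follow, so nodes with long keys or many addresses need no fixed-size slot.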
5.6 Query of Conceptual Graphs
Query is an important part of a knowledge base system. It is actually the interface
between the user and the system. In designing the query format for this system, we
took the following fact into consideration: the user can be either inexperienced or
experienced with conceptual graphs. Inexperienced users may not know
conceptual graph terms such as concepts, relations, referents etc. For them, natural
language is the primary means of communication, so they want to use natural
language to query the conceptual graph knowledge base. Experienced users, however,
may know conceptual graphs quite well. They may wish to use their own "language",
that is, conceptual graphs, to do the query. Such users tend to believe the results are
more accurate if they use conceptual graph terminology in the query language.
Based on the above discussion, the system implements two kinds of query:
1. query with natural language, which provides a fuzzy match
2. query with a conceptual graph, which provides a more exact graph match
For the first kind of query, a user may enter any information about the conceptual graphs
they want to retrieve. The information may include concept types, relation types and name
designators. The system takes the query, searches the indexes to find
matching elements, and finally reads the conceptual graphs from the knowledge base and
prints them on the screen.
For the second kind of query, a user may enter the conceptual graph they want to retrieve.
The graph may be entered either from the keyboard or from a file. The system takes
two steps to finish the retrieval process. The first step is to take the graph, extract
useful information from it, and search the indexes to find all the matching elements. In
this case, many graphs may be found. In the second step, the system refines the results
by performing graph matching. Thus, only those graphs that represent the same meaning
as the input one will be found.
5.7 Graph Matching Mechanism of the System
Graph matching between conceptual graphs is an essential feature. Different systems
provide different types of matching capabilities, often depending on the way that they
store the graphs. Applications may require many different forms of matching in order to
extract the information required.
Since the main focus of this thesis is to build a conceptual graph knowledge base system,
the topic of graph matching is not deeply explored here. This conceptual graph
knowledge base system provides users with one form of matching. The matching scheme
the system uses matches concept types, relation types and name designators of
concepts. Under this scheme, if a user queries the knowledge base system with a
conceptual graph, only those graphs that match on all three components will be
retrieved. We chose this matching scheme because it captures the core meaning of most
sentences we use. For example:
"Man Arthur loves his dog Molly."
In this sentence, two concepts and one relation are involved: [Man:
'Arthur'] and [Dog: 'Molly'], and the relation is (loves). The core meaning of the above
sentence is captured by these simple concepts and relations. A sentence like:
"Man Arthur loves dog Molly."
will also match the above sentence under our matching scheme, and we
can see that the two sentences have the same meaning. But a sentence like:
"Dog Molly loves man Arthur."
will not match, since its meaning is quite different.
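One way to realise this matching scheme is sketched below in Java. The sketch is our interpretation, not the system's algorithm: each graph is reduced to a set of relation signatures, a relation type followed by its ordered arguments, each argument written as a concept type plus an optional name designator, and two graphs match when the two sets are equal. Keeping the argument order is what separates "Man Arthur loves dog Molly" from "Dog Molly loves man Arthur". The class CoreGraph and its methods are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the matching scheme: a graph is reduced to the set of
// its relations, each written as TYPE(ARG1,ARG2,...) where an argument is a
// concept type optionally followed by ":name". Case is ignored throughout.
class CoreGraph {
    private final Set<String> relations = new HashSet<>();

    // Records one relation with its ordered arguments; returns this for chaining.
    CoreGraph rel(String type, String... args) {
        relations.add((type + "(" + String.join(",", args) + ")").toUpperCase());
        return this;
    }

    // Matches when both graphs link the same concept types and names by the
    // same relation types, in the same argument order.
    boolean matches(CoreGraph other) {
        return relations.equals(other.relations);
    }
}
```

Because concepts are identified here only by type and name, the sketch cannot tell apart two distinct generic concepts of the same type (for example, two different [Offer] concepts in one graph); a complete matcher would also have to compare coreference links.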
5.8 Design of the System
Object-oriented design is a new way of thinking about software based on abstractions that
exist in the real world. Object-oriented design emphasizes objects and the
relationships between these objects. An object incorporates both data structure and
behavior. This is in contrast to conventional programming, in which data structure and
behavior are only loosely connected. The design of the system is shown in Figure 5.6.
[Figure: class diagram of CGBase, CGIFLoader, Search, BTree, BTreeNode and BTreeElement, connected by "contains" and "uses" associations]
Figure 5.6 the system class diagram.
As we can see, the system consists of six object classes. The class 'CGBase' is the
interface between the user and the system. It accepts instructions and inputs
from the user. If the user wants to perform management operations on the knowledge
base, a 'CGIFLoader' object is created; if the user wants to search or update a graph,
a 'Search' object is created. The class 'CGBase' may create and contain one or
more 'CGIFLoader' objects. It may also create one or more 'Search' objects.
The class 'CGIFLoader' is the main class of the system. This class is able to create a
conceptual graph knowledge base, populate the knowledge base and build indexes on the
knowledge base with the help of the 'BTree' class. It also performs file management
functions such as "open" and "close", both for the knowledge base file and the index file.
Each 'CGIFLoader' object contains one 'BTree' object, a B-tree structure that holds
the index of the knowledge base that this 'CGIFLoader' manages.
There are two main functions of the class 'Search'. The first is to find the
matching graphs and print them on the standard output. The second is to update an
existing graph according to the user's wish. This class gets the 'BTree' object from the
'CGIFLoader', searches through the index tree to find the addresses of the matching
graphs, then randomly accesses the knowledge base file, reads the matching graphs into
memory and prints them on the standard output. If the user wants to update a graph,
it allows the user to enter the new graph and inserts the new graph into the knowledge
base. Meanwhile, it updates the index as well.
The class 'BTree', along with 'BTreeNode' and 'BTreeElement', forms the index
structure. This structure is able to read the index file from the disk and build the index
structure automatically. It is also able to write itself to the disk. Each 'BTree' object
consists of zero or more objects of the 'BTreeNode' class, and each 'BTreeNode' object
consists of exactly one object of the 'BTreeElement' class.
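The index structure described above can be approximated in memory by the following Java sketch. It is our own simplification, not the thesis code: IndexTree, IndexNode and IndexElement are hypothetical stand-ins for 'BTree', 'BTreeNode' and 'BTreeElement'. Because each node here holds exactly one element and has only a left and a right child, the sketch is a binary search tree rather than a multiway B-tree.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for 'BTreeElement': the indexed term.
class IndexElement {
    final String value;
    IndexElement(String value) { this.value = value; }
}

// Hypothetical stand-in for 'BTreeNode': one element plus the addresses of
// the graphs in the knowledge base file that contain it.
class IndexNode {
    final IndexElement data;
    final List<Long> addrVector = new ArrayList<>();
    IndexNode left, right;

    IndexNode(IndexElement data) { this.data = data; }
}

// Hypothetical stand-in for 'BTree' (a binary search tree in this sketch).
class IndexTree {
    private IndexNode root;

    // Adds an address under a key, creating the node if the key is new.
    void insert(String key, long address) {
        root = insert(root, key, address);
    }

    private IndexNode insert(IndexNode node, String key, long address) {
        if (node == null) {
            node = new IndexNode(new IndexElement(key));
        }
        int cmp = key.compareTo(node.data.value);
        if (cmp < 0) {
            node.left = insert(node.left, key, address);
        } else if (cmp > 0) {
            node.right = insert(node.right, key, address);
        } else if (!node.addrVector.contains(address)) {
            node.addrVector.add(address);   // same key: record a new address once
        }
        return node;
    }

    // Returns the addresses stored under key, or an empty list if it is absent.
    List<Long> find(String key) {
        IndexNode node = root;
        while (node != null) {
            int cmp = key.compareTo(node.data.value);
            if (cmp == 0) {
                return Collections.unmodifiableList(node.addrVector);
            }
            node = cmp < 0 ? node.left : node.right;
        }
        return Collections.emptyList();
    }
}
```

A 'Search' object would look up each query term in this way and then read the graphs at the returned addresses from the knowledge base file.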
CHAPTER 6
An Example
6.1 The Example
This example illustrates how the conceptual graph knowledge base management system
works. The example is an abstract of an article from the "Proceedings of the Thirteenth
International Conference on Data Engineering". We chose this example
because it is just a common article written in English, without any special
features of its own. This suggests that the conceptual graph
knowledge base management system can be applied to common English articles.
Here is the article:
Object-oriented database systems (OODBMS) offer powerful modelling concepts
as required by advanced application domains like CAX or office automation.
Typical applications have to handle large and complex structured objects which
frequently change their value and their structure. As the structure is described in
the schema of the database, support for schema evolution is a highly required
feature. Therefore, a set of schema update primitives must be provided which can
be used to perform the required changes, even in the presence of populated
databases and running applications.
In this paper, we use the versioning approach to schema evolution to support
schema updates as a complex design task. The presented propagation mechanism
is based on conversion functions that map objects between different types and can
be used to support schema evolution and schema integration.
6.2 Building of Conceptual Graphs from the Example
In order to capture the meaning of this article, we built seven conceptual graphs based on
it. Here are the conceptual graphs:
[Figures 6.1 to 6.7: display-form diagrams of the seven conceptual graphs built from the article; their CGIF equivalents are given in section 6.3]
6.3 Conceptual Graphs in CGIF Format
Since the conceptual graph knowledge base system only accepts conceptual graphs in
CGIF format, we also created the CGIF form of the above conceptual graphs as
follows:
1. [Offer *x] (Agent ?x [OODBMS]) (Object ?x [Modelling-Concepts *u]) [Require *y] (Agent ?y [Application-domain]) (Object ?y ?u)
2. [Handle *m] [Large *o] [Complex *p] [Structured *q] [Change *t] (Agent ?m [Application]) (Object ?m [Object *n]) (Character ?n ?o) (Character ?n ?p) (Character ?n ?q) (Have ?n [Value *s]) (Have ?n [Structure *x]) (Character ?s ?t) (Character ?x ?t)
3. [Structure *x] [Describe *y] [Schema *z] [Database *u] (Agent ?y ?z) (Object ?y ?x) (In ?z ?u)
4. [Support *u] [Schema-evolution *v] [Require *x] [Feature *y] (Object ?u ?v) (Definition ?u ?y) (Condition ?y ?x)
5. [Schema-update-primitive *u] [Provide *v] [Perform *w] [Change *x] [Populated *y] [Database *z] [Running *p] [Application *q] [Required *o] (Object ?v ?u) (Agent ?w ?u) (Object ?w ?x) (Condition ?x ?o) (Condition ?w ?z) (Condition ?w ?q) (State ?z ?y) (State ?q ?p)
6. [Versioning *u] [Approach *v] [Schema-evolution *w] [Support *x] [Schema-update *y] [Complex *z] [Design-task *s] (Method ?v ?u) (Target ?v ?w) (Agent ?x ?v) (Object ?x ?y) (Character ?s ?z) (Classify ?s ?x)
7. [Present *p] [Propagation-mechanism *q] [Conversion-functions *s] [Map *t] [Object *u] [Different *v] [Type *w] [Be-used *x] [Support *y] [Schema-evolution *z] [Schema-integration *r] (State ?q ?p) (Based-on ?q ?s) (Agent ?t ?s) (Object ?t ?u) (Character ?u ?w) (State ?w ?v) (Patient ?x ?q) (Target ?x ?y) (Object ?y ?z) (Object ?y ?r)
We also populate the system with other CGIFs:
1. CGIF for Figure 3.1:
[Person *x'John'][Reading *y](Agent ?y?x)(Object ?y
[Newspaper 'Toronto-Star'])(Instrument ?y [Microfiche-Reader])
2. CGIF for Figure 3.4:
[Man *x'John'][Eat *y][Fast *z][Apple *u](Agent ?y?x)
(Manner ?y?z)(Object ?y?u)
3. CGIF for "Dog Molly loves bone thrown by man Arthur.":
[Dog *x'Molly'][Man *y'Arthur'](Loves ?x[Bone *z])
(Throwed-by ?y ?z)
6.4 Searching of Conceptual Graphs
We first create a knowledge base and populate it with the above ten
conceptual graphs. Then we search the conceptual graphs. There are two kinds
of search mechanism available: loose (fuzzy) match and exact (knowledge based) match.
6.4.1 Fuzzy Match
For fuzzy match, the search result is based on the matching of individual concepts and
relations. We enter the queries:
1. Support + schema-evolution
Result: conceptual graphs #4, #6 and #7 are returned.
2. Support + schema-evolution + schema-integration
Result: only conceptual graph #7 is returned.
As we can see, the search results depend entirely on individual concept and relation
matches.
6.4.2 Knowledge Based Match
For knowledge based match, the search result is based on the matching scheme discussed in
chapter 5, section 5.7. Two graphs are considered to match only when both graphs
represent the same meaning. Let's look at the queries:
Query 1: Find information about object-oriented database systems that offer modelling
concepts which are required by application domains.
From this query, we can build a conceptual graph query:
[OODBMS *o][Offer *p][Modelling-Concepts *q][Require *r]
[Application-domain *s](Agent ?p?o)(Object ?p?q)(Agent ?r?s)(Object ?r?q)
Using this query, we get graph #1 as the result.
Query 2: Find information about supporting schema evolution as a required feature.
Now we build a conceptual graph query:
[Support *x](Object ?x [Schema-evolution])(Definition ?x [Feature *y])
(Condition ?y [Require])
This query yields graph #4.
Query 3: Find information about using the versioning approach to schema evolution that
supports schema updates as a design task.
We build a conceptual graph query as follows:
[Approach *v](Method ?v [Versioning])(Target ?v [Schema-evolution])
(Agent [Support *x] ?v)(Object ?x [Schema-update])[Design-task *s]
(Character ?s [Complex])(Classify ?s ?x)
Now the system returns graph #6.
We can enter queries for the other graphs in the same way as above, and
get exactly the graph that we want to find.
When we query the knowledge base, only one CG that matches the query CG is returned
for each query. Note that the query CGs are all different from the CGs stored in the
system, yet the system can still find the matching one. This is because, although the query
CGs and the CGs in the system look different, they represent the same meaning. It is
unreasonable to expect all people to derive the same set of conceptual graphs from an
article, but we can expect people to write different sets of conceptual graphs that
represent the same meaning based on a given article. For a detailed test example, please
refer to Appendix B.
6.5 Conclusion
In this example, the fuzzy search provides a looser match. It is suitable for users who
try to find articles on a set of similar topics. It is also designed for inexperienced
conceptual graph users, since the queries are written as English words. If users want to
find a specific article, they can use the exact match.
The knowledge based match is based on core graph matching, and it is designed for
experienced conceptual graph users, since it only accepts conceptual graphs as its query
language. It is much more accurate than the fuzzy search: in this example, knowledge
based graph matching retrieved exactly the intended graph for each query. However, in the
real world, it may not be able to search conceptual graphs with such high accuracy. The
major factors that largely affect the search accuracy will be:
1. people's understanding of the article from which they derive conceptual graphs
2. the ability to correctly derive conceptual graphs from the article
To solve the second problem, first we need a standard for conceptual graphs. With a
conceptual graph standard, we will be able to derive conceptual graphs from texts in a
consistent way. Second, we need software that is able to automatically build conceptual
graphs from a given text, since conceptual graphs derived by humans inherently
lack consistency: one text can be represented by different conceptual graphs by
different people even though they all follow the standard.
As we can see from the example, the system is able to find the right conceptual graph
even when the input conceptual graph does not look exactly the same as the conceptual graph
stored in the knowledge base. This shows that the system does allow different
representations of conceptual graphs, as long as the two conceptual graphs
represent the same meaning.
CHAPTER 7
Summary
7.1 Summary
Conceptual structures, as developed by Sowa, constitute a very rich knowledge representation
language intended to incorporate many concepts in natural and formal languages. The
conceptual graphs stored in computers must be made more readily accessible and
manageable, and this thesis addresses that need. The thesis has explored and
demonstrated some of the issues involved in building a conceptual graph knowledge base
system with object-oriented design and programming technology. Such issues include
knowledge representation methods, modelling and encoding techniques for natural
languages, knowledge retrieval techniques and knowledge base indexing techniques. The
object-oriented technology has increasingly been applied in modelling and building
complex information and knowledge systems. Thus, some object-oriented design and
modelling techniques are also presented.
7.2 Conclusion
The research in this thesis demonstrates a knowledge base system that is able to encode
knowledge in conceptual graphs, and it also demonstrates the use of that knowledge through a
simple query language. The conceptual graph is a promising technique in knowledge
representation, especially in natural language structuring and encoding. It is a graphical
language designed for the interchange of knowledge between humans and computers. It
can be processed effectively by a computer. It can also be stored and managed by an
object-oriented knowledge base system; in fact, object-oriented technology is a more
natural fit for conceptual graphs than a traditional relational database.
The accuracy in retrieving conceptual graphs stored in the knowledge base system largely
depends on the consistency between the query CGs and the CGs in the knowledge
base. In principle, if we guarantee 100% consistency in the building of query CGs and
the CGs in the knowledge base, the system will retrieve CGs with high accuracy.
The results of our experiment are consistent with this.
7.3 Future Work
The format of the conceptual graphs used in the thesis project is CGIF. However, a
conceptual graph can be expressed in many formats, such as the display form (DF), a
graphical format that is much easier for humans to understand but difficult for computers to
process, and the linear form (LF), a more compact notation than the display form. Future
research can be expected to build and include a "conceptual graph format translator",
which takes in one of these formats and translates it into any of the other formats. Thus,
the conceptual graph knowledge base management system, collaborating with the
translator, will be able to store and manage conceptual graphs in any of these formats.
Future research can also be expected to build conceptual graph models of legacy DBs,
use data mining techniques for knowledge extraction, and explore new algorithms and
techniques for CG matching. Then, instead of providing just knowledge based matching, the
system will also provide different layers of graph matching; such matches can be based
on sub-graphs, individual concepts, relations, concept or relation types etc.
Appendix:
A. Object Diagrams of the System
CGBase
    openedDBFileVector: Vector
    loaderTable: Hashtable
Figure 1. class diagram for 'CGBase'
CGIFLoader
    parseGraph(KnowledgeBase, TranslationContext, String)
Search
    cgifloader: CGIFLoader
    key: Vector
    dbFile: RandomAccessFile
    root: BTreeNode
    resultVector: Vector
    search()
    findMatch(Vector, Vector)
    printGraph(Vector)
    update(Vector, int)
    insert(String, long)
    deleteOld(long, long)
Figure 3. class diagram for 'Search'
BTree
    root: BTreeNode
    size: int
    fileName: String
    indexFile: File
    insert(BTreeElement)
    readIndex()
BTreeNode
    data: BTreeElement
    addrVector: Vector
    left: BTreeElement
    right: BTreeElement
    totalElements: int
    insert(BTreeElement, long, long)
    isNew(Vector, long, long)
    delete(long, long)
Figure 5. class diagram for 'BTreeNode'
BTreeElement
    value: String
Figure 6. class diagram for 'BTreeElement'
B. Detailed Example Test Script
1. Create a Knowledge Base
We run the system by typing "java CGBase" at the prompt. The following menu appears:
    1. Create a new conceptual graph knowledge base
    2. Open an existing conceptual graph knowledge base
    3. Load a conceptual graph knowledge base
    4. Search the conceptual graph knowledge base
    5. Update a conceptual graph
    6. Show all conceptual graphs
    7. Close a knowledge base
    8. Close all knowledge base
    0. Exit
Your selection is: 1
Please enter a knowledge base name: example
2. Poputate the knowledge base
M e r we enter a knowledge base name. the above main manual appears again.
Your selection is: 3
Please enter the knowledge base name you want to
populate: example
Please enter the file name from which to load the CGIF: cg1
(We assume a conceptual graph has already been saved in the file cg1.)
If successful, we will see: Populate knowledge finished.
If the conceptual graph is not in the right format, we will see: the input
conceptual graph is not in right formatter.
If the input file does not exist, we will see: The input file does not exist.
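Populating the knowledge base means storing each loaded graph where it can later be found again. The Search class diagram lists a `dbFile: RandomAccessFile` for this purpose, but the appendix does not show the file layout. The following is therefore only a minimal sketch, assuming each graph is appended as a length-prefixed CGIF string and identified by its starting byte offset. `GraphStore`, `append` and `read` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/**
 * Hypothetical graph store: CGIF strings are appended to one data file and
 * identified by the byte offset where each record starts. The record layout
 * (writeUTF's 2-byte length prefix) is an assumption, not the thesis's format.
 */
public class GraphStore implements AutoCloseable {
    private final RandomAccessFile dbFile;

    public GraphStore(File file) throws IOException {
        this.dbFile = new RandomAccessFile(file, "rw");
    }

    /** Appends one CGIF string and returns the offset at which it starts. */
    public long append(String cgif) throws IOException {
        long offset = dbFile.length();
        dbFile.seek(offset);
        dbFile.writeUTF(cgif); // length prefix + modified UTF-8; at most 65535 bytes
        return offset;
    }

    /** Reads back the CGIF string stored at the given offset. */
    public String read(long offset) throws IOException {
        dbFile.seek(offset);
        return dbFile.readUTF();
    }

    @Override
    public void close() throws IOException {
        dbFile.close();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("example", ".cgdb");
        tmp.deleteOnExit();
        try (GraphStore store = new GraphStore(tmp)) {
            long first = store.append("[Offer *x] (Agent ?x [OODBMS])");
            long second = store.append("[Require *y] (Agent ?y [Application-domain])");
            System.out.println(first);             // 0 (the file starts empty)
            System.out.println(store.read(second)); // the second graph, unchanged
        }
    }
}
```

Returning offsets rather than copies lets an index such as BTreeNode's `addrVector` hold small `long` values and fetch the full graph only when a match is reported.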
3. Search the Conceptual Knowledge Base
Your selection is: 4
Please enter the knowledge base name in which you want to
search: example
1. Saggy search
2. Knowledge based search
We perform an exact search by entering: 2
1. Enter a graph from keyboard
2. Enter a graph from a file
If we choose to enter a graph from the keyboard: 1
Please enter the graph in CGIF formatter:
[OODBMS *o] [Offer *p] [Modelling-Concepts *q] [Require *r] [Application-domain *s] (Agent ?p ?o) (Object ?p ?q) (Agent ?r ?s) (Object ?r ?q)
Assume the knowledge base has already been populated with the ten conceptual graphs
mentioned in Chapter 6. The system then displays the search results:
There is 1 graph(s) found to be match:
1. [Offer *x] (Agent ?x [OODBMS]) (Object ?x [Modelling-Concepts *u]) [Require *y] (Agent ?y [Application-domain]) (Object ?y ?u)
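The stored graph is reported as a match even though it uses different variable names and nests some concepts inside relations. One way to see why is to reduce both graphs to relation triples of the form RelationType(SourceType, TargetType). The sketch below does this for the small subset of CGIF used in this example and reports a match when every query triple occurs in the stored graph. It is a simplification assumed for illustration, not the thesis's matching algorithm. `TripleMatch` and its methods are hypothetical, only dyadic relations are handled, distinct concepts of the same type are not told apart, and no type hierarchy is consulted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: reduce a linear CGIF string to a set of
 * "Relation(SourceType, TargetType)" triples and test containment.
 * Handles only the CGIF subset used in this example; not a full projection.
 */
public class TripleMatch {

    // A concept such as [OODBMS *o] or [OODBMS]: group 1 = type, group 2 = label.
    private static final Pattern CONCEPT =
            Pattern.compile("\\[\\s*([\\w-]+)\\s*(?:\\*(\\w+))?\\s*\\]");
    // A relation whose arguments are ?labels or inline concepts.
    private static final Pattern RELATION =
            Pattern.compile("\\(\\s*([\\w-]+)\\s*((?:\\?\\w+|\\[[^\\]]*\\]|\\s)+)\\)");
    // One relation argument: group 1 = bound label, group 2 = inline concept body.
    private static final Pattern ARG = Pattern.compile("\\?(\\w+)|\\[([^\\]]*)\\]");

    /** Maps each defining label (*x) to its concept type, wherever it appears. */
    static Map<String, String> labelTypes(String cgif) {
        Map<String, String> types = new HashMap<>();
        Matcher m = CONCEPT.matcher(cgif);
        while (m.find()) {
            if (m.group(2) != null) {
                types.put(m.group(2), m.group(1));
            }
        }
        return types;
    }

    /** Resolves one relation argument to the type of the concept it denotes. */
    static String argType(Matcher arg, Map<String, String> labels) {
        if (arg.group(1) != null) {
            String type = labels.get(arg.group(1));
            if (type == null) {
                throw new IllegalArgumentException("unbound label ?" + arg.group(1));
            }
            return type;
        }
        return arg.group(2).trim().split("[\\s*]+")[0]; // "Type *x" -> "Type"
    }

    /** Collects one triple per dyadic relation in the graph. */
    public static Set<String> triples(String cgif) {
        Map<String, String> labels = labelTypes(cgif);
        Set<String> result = new HashSet<>();
        Matcher rel = RELATION.matcher(cgif);
        while (rel.find()) {
            Matcher arg = ARG.matcher(rel.group(2));
            String[] types = new String[2];
            int n = 0;
            while (arg.find()) {
                if (n == 2) {
                    throw new IllegalArgumentException("not dyadic: " + rel.group());
                }
                types[n++] = argType(arg, labels);
            }
            if (n != 2) {
                throw new IllegalArgumentException("not dyadic: " + rel.group());
            }
            result.add(rel.group(1) + "(" + types[0] + ", " + types[1] + ")");
        }
        return result;
    }

    /** A candidate matches when it contains every triple of the query. */
    public static boolean matches(String query, String candidate) {
        return triples(candidate).containsAll(triples(query));
    }

    public static void main(String[] args) {
        String query = "[OODBMS *o] [Offer *p] [Modelling-Concepts *q] [Require *r] "
                + "[Application-domain *s] (Agent ?p ?o) (Object ?p ?q) (Agent ?r ?s) (Object ?r ?q)";
        String stored = "[Offer *x] (Agent ?x [OODBMS]) (Object ?x [Modelling-Concepts *u]) "
                + "[Require *y] (Agent ?y [Application-domain]) (Object ?y ?u)";
        System.out.println(matches(query, stored)); // true
    }
}
```

Both graphs reduce to the same four triples: Agent(Offer, OODBMS), Object(Offer, Modelling-Concepts), Agent(Require, Application-domain) and Object(Require, Modelling-Concepts). A knowledge-based search would additionally accept subtypes of the query's concept types.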
C. References
Finnegan Southey and Jim G. Linders 1999
Notio: A Java API for Developing CG Tools. ICCS 1999.
T. Mueck, M. Polaschek 1997
The Multikey Type Index for Persistent Object Sets. The 13th International Conference on Data Engineering (ICDE '97).
K. Peltonen 1997
Adding Full Text Indexing to the Operating System. The 13th International Conference on Data Engineering (ICDE '97).
A. Sistla, O. Wolfson, S. Chamberlain, S. Dao 1997
Modeling and Querying Moving Objects. The 13th International Conference on Data Engineering (ICDE '97).
D. Konopnicki, O. Shmueli 1997
W3QS: A System for WWW Querying. The 13th International Conference on Data Engineering (ICDE '97).
David W. Embley 1997
Object Database Development: Concepts and Principles. Addison Wesley Longman, Inc.
Douglas K. Barry 1996
The Object Database Handbook: How to Select, Implement, and Use Object-Oriented Databases. Katherine Schowalter Press.
J. D. Ullman 1988
Principles of Database and Knowledge Base Systems, Vol. 1. Computer Science Press, 1988.
J. D. Ullman 1989
Principles of Database and Knowledge Base Systems, Vol. 2. Computer Science Press, 1989.
G. M. White 1990
Natural Language Understanding and Speech Recognition. Communications of the ACM, Vol. 33, August 1990.
J. F. Sowa 1984
Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley, Reading, MA, 1984.
J. F. Sowa 1990
Knowledge Representation in Databases, Expert Systems, and Natural Language. In Artificial Intelligence in Databases and Information Systems, edited by R. A. Meersman et al., North-Holland, 1990.
Timothy E. Nagle, Janice A. Nagle, Laurie L. Gerholz (eds.) 1992
Conceptual Structures: Current Research and Practice. Ellis Horwood Limited, 1992.
E. R. Tello
Object-Oriented Programming for Artificial Intelligence: A Guide to Tools and System Design. Addison-Wesley.
W. Kim 1991
Object-Oriented Database Systems: Strengths and Weaknesses. Journal of Object-Oriented Programming, July 1991.
J. C. Giarratano 1989
Expert Systems: Principles and Programming. PWS-KENT Publishing Co., Boston, 1989.
K. R. Dittrich 1986
Object-Oriented Database Systems: The Notion and the Issues. Proceedings of the International Workshop on Object-Oriented Database Systems, edited by K. R. Dittrich and U. Dayal. IEEE Computer Society Press, 1986.
Guy W. Mineau, Bernard Moulin, John F. Sowa 1993
Conceptual Graphs for Knowledge Representation. First International Conference on Conceptual Structures, ICCS '93, Quebec City, Canada, August 1993. Proceedings. Edited by G. Goos and J. Hartmanis.
William M. Tepfenhart, Judith P. Dick, John F. Sowa 1994
Conceptual Structures: Current Practices. Second International Conference on Conceptual Structures, ICCS '94, College Park, Maryland, USA, August 1994. Proceedings.
Gerard Ellis, Robert Levinson, William Rich, John F. Sowa 1995
Conceptual Structures: Applications, Implementation and Theory. Third International Conference on Conceptual Structures, ICCS '95, Santa Cruz, CA, USA, August 1995. Proceedings.
Dickson Lukose, Harry Delugach, Mary Keeler, Leroy Searle, John Sowa 1997
Conceptual Structures: Fulfilling Peirce's Dream. Fifth International Conference on Conceptual Structures, ICCS '97, Seattle, Washington, USA, August 1997. Proceedings.
G. Salton, M. Lesk 1986
Computer Evaluation of Indexing and Text Processing. ACM, Vol. 29, No. 7, July 1986.
M. L. Mauldin 1991
Conceptual Information Retrieval: A Case Study in Adaptive Partial Parsing. Kluwer Academic Publishers, Norwell, MA, 1991.
C. Faloutsos, H. V. Jagadish 1992
B-Tree Indices for Skewed Distributions. 18th VLDB Conference, Vancouver, BC, August 1992.
C. Faloutsos, H. V. Jagadish 1992
Hybrid Index Organizations for Text Databases. EDBT '92, March 1992.
P. Chen 1985
Entity-Relationship Approach: The Use of the ER Concept in Knowledge Representation. North-Holland, Amsterdam, 1985.