

Available online at www.sciencedirect.com

www.elsevier.com/locate/cogsys

Cognitive Systems Research 14 (2012) 84–100

Information retrieval with semantic memory model

Action editor: Minho Lee

Julian Szymański a, Włodzisław Duch b,c,*

a Department of Computer Systems Architecture, Gdańsk University of Technology, Poland
b Department of Informatics, Nicolaus Copernicus University, Toruń, Poland
c School of Computer Science, Nanyang Technological University, Singapore

Available online 13 February 2011

Abstract

Psycholinguistic theories of semantic memory form the basis of understanding of natural language concepts. These theories are used here as an inspiration for implementing a computational model of semantic memory in the form of a semantic network. Combining this network with a vector-based object-relation-feature value representation of concepts that also includes weights for confidence and support allows for recognition of concepts by referring to their features, enabling a semantic search algorithm. This algorithm has been used for word games, in particular the 20-question game in which the program tries to guess a concept that a human player thinks about. The game facilitates lexical knowledge validation and acquisition through the interaction with humans via supervised dialog templates. The elementary linguistic competencies of the proposed model have been evaluated by assessing how well it can represent the meaning of linguistic concepts. To study properties of information retrieval based on this type of semantic representation in contexts derived from on-going dialogs, experiments in limited domains have been performed. Several similarity measures have been used to compare the completeness of knowledge retrieved automatically and corrected through active dialogs to a "golden standard". Comparison of semantic search with human performance has been made in a series of 20-question games. On average results achieved by human players were better than those obtained by semantic search, but not by a wide margin.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Natural language processing (NLP) techniques that provide effective algorithms for search of relevant information in the huge number of text documents available in machine readable form are in growing demand. Search techniques have for a long time been based mainly on keywords. Single keywords or a few keywords (user queries) work well for small repositories of documents that belong to a single domain. More advanced NLP methods are required if search is made in large repositories containing documents from diverse domains. This is due to the strong ambiguity of keywords, leading to low precision, that is returning many unwanted documents, and to the idiosyncratic use of words by different authors, leading to low retrieval rates of relevant information. Effective NLP methods for information retrieval must rely on some basic knowledge about properties of language, and in particular about the semantics of concepts. The knowledge base should approximate relations between lexical elements as a prerequisite to achieve high linguistic competence. The use of such knowledge will be an important step forward towards automation of the process of natural language understanding.

1389-0417/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.cogsys.2011.02.002

* Corresponding author at: Department of Informatics, Nicolaus Copernicus University, Toruń, Poland.
E-mail addresses: [email protected] (J. Szymański), [email protected] (W. Duch).

Reading and understanding texts, people employ additional background knowledge stored in different types of their memory. Thanks to the recognition memory, small mistakes in the texts are ignored; semantic memory associates words with their general meaning in a given context; and episodic memory allows building a model of the discourse or narrative. This leads to a rich conceptual view of texts being read that is usually beyond the capabilities of NLP systems. A lot is now known about brain mechanisms responsible for understanding language (Feldman, 2006; Lamb, 1999; Pulvermüller, 2003). Action-perception circuits in the brain are activated by phonological and visual inputs, and the distribution of brain activity provides a natural basis for representation of concepts (Duch, Matykiewicz, & Pestian, 2008). These activations change quickly in time and strongly depend on priming by the context (McNamara, 2005) and on previous neural activity, therefore it is not easy to approximate them by knowledge representation schemes used in natural language processing.

The first step to improve NLP methods requires focusing on a better understanding of the basic concepts represented by words. Understanding here means the ability to give the word its proper meaning in agreement with the context it appears in, and to be able to answer questions that depend on correct interpretation of properties associated with the concept a given word points to. A related aspect of understanding the meaning of a word is the ability to create correct associations relevant in the context of the linguistic episode, giving responses based on information that has not been explicitly given, but may be retrieved from episodic and semantic memory of the cognitive agent. These two basic steps are essential to process natural language in a way similar to how humans do.

In the next section psycholinguistic models of semantic memory are briefly reviewed, in the third section our approach to knowledge representation by semantic memory is presented, Section 4 describes the semantic search algorithm, and Section 5 shows one particular application of this algorithm to the 20-questions game. Section 6 introduces active dialogs for knowledge acquisition, and Section 7 compares results of our algorithm to those achieved by humans playing the same game. The final section contains a discussion and presentation of plans for future research.

2. Psycholinguistic models of semantic memory

The idea of semantic memory has been introduced by Tulving (Tulving, Bower, & Donaldson, 1972), who proposed to distinguish memory involved in the cognition process that is used for organization of different types of human experience. Within long term memory structures he distinguished two kinds of memories, called the episodic and the semantic memory. Episodic memory refers to personal experiences, events from the past that may be recalled. Everyone has unique episodic memories, allowing us to understand idiosyncratic references to past events. Although they are formed from similar types of experiences, specific configurations of these experiences are always unique. The second kind, semantic memory, is related to the human language system, and is roughly common for all users of a given language, enabling the communication process.

Of course any division of brain processes into separate components is only approximate. Both types of memory are simultaneously active. Experiences are stored in episodic memory that engages not only cortical, but also hippocampal structures. Through the consolidation process relations and properties of objects are turned into abstract representations, stored in the semantic memory. Semantic memory works as a mental lexicon (Gleason & Ratner, 1997), a dedicated knowledge base storing basic lexical elements – concepts, or "units of knowledge". According to the idea of the Triangle of Reference (Ogden, Richards, Malinowski, & Crookshank, 1949) concepts are used for thinking about the referent. Within the semantic memory structures words serve as labels for concepts that describe elements of generalized experience. Words invoke brain states that encode these elements, enabling communication. Isolated concepts have little meaning – semantic memory contains information about relations between them, so they form a conceptual network of elements connected with each other by different kinds of associations. These associations make it possible to capture the meaning of words extrapolated from relations to other concepts.

Semantic representation of symbols in the brain has been a matter of extensive research (Pulvermüller, 2003) and thanks to various neuroimaging methods a lot is known about action-perception networks that give an intrinsic meaning to simple concepts. Analysis of fMRI scans shows that for different concepts activation within brain areas devoted to perception, motor manipulation, spatial representation, emotional and self-related regions significantly differs. Despite large individual variance of fMRI signals a prototype brain state of many people may be predicted sufficiently well to distinguish it from about a hundred other concepts (Mitchell et al., 2008). Reading simple stories leads to brain activity that reveals places, characters, subjects and objects of actions, goals, representations for visual exploration and motor activity, simulating in the imagination the events of the story as if they had been perceived (Speer, Reynolds, Swallow, & Zacks, 2009). In the long run this gives a chance to create a natural brain-based basis for representation of concepts in semantic memory (Duch et al., 2008).

A few psycholinguistic models of semantic memory exist. They describe how lexical elements are stored and processed by human brains. Below the main approaches that can be used as an inspiration for building a computational model are presented.

2.1. Hierarchical model

The hierarchical model (Collins & Quillian, 1969), presented in Fig. 1, is perhaps the simplest and most natural method to organize concepts. In this approach the predefined is_a relation type organizes concepts (representing natural objects) in the form of a taxonomy tree. Other types of relations (e.g. can_a, has_a) that are useful for building additional associations between nodes may also exist, but in the hierarchical model they play only an informative role. The is_a relation introduces inheritance, with properties of the concepts from higher levels of the taxonomy (parents) being propagated to their children nodes. Most ontologies are based on this type of representation.

Fig. 1. Hierarchical model of semantic memory.

The hierarchical model makes it possible to find connections between nodes, and thus provides answers about properties of concepts described by their features. In the simplest case the presence/absence of links is interpreted as the yes or no answer. Studies of response times to simple questions related to properties of objects (e.g. "Is a canary a bird?") revealed that answering questions about typical properties that are directly linked is faster than answering questions about properties that require analysis of links going through the upper taxonomy relations. This is presumably related to the transitions between brain states representing different concepts.

Although ontologies built on hierarchical models have many applications where a taxonomy is sufficient, it is clear that memory structures are not static. Thinking about new relations between two or more concepts that are placed far from each other in the hierarchical tree creates shortcuts or direct associations between these concepts in a way that cannot be accommodated in the hierarchical model. A more realistic approximation to brain processes responsible for acquisition of new semantic knowledge is needed.

2.2. Spreading activation model

The spreading activation model (Collins & Loftus, 1975) of the semantic memory, depicted in Fig. 2, organizes concepts in the form of a lexical network.

Fig. 2. Spreading activation model of the semantic memory.

Links between nodes of this network describe various relations, including semantic similarities between concepts stored in the network. The concept that is analyzed at a given moment (the current thought) is considered to be active, symbolizing coordinated neural activity of many brain areas. If the activation is strong enough it will spread further to several associated concepts, triggering their activity. Usually the winner-takes-most neural processes inhibit alternative concepts that could also be activated, leaving only a few that are involved in a sequential thinking process. Spreading activation creates a subnetwork of active concepts associated with the primary concept. In real networks this is a highly non-linear process. Activation of some nodes may result from weak associations with a number of concepts in the analyzed sentence. According to the Hebbian principles frequently used pathways are activated more easily, modifying association strength. Spreading activation to associated concepts depends on the number of hops through intermediate concepts and on their association strengths, providing a certain distance measure between concepts. Only concepts that are close to the concepts analyzed receive sufficient activation to have some influence on the semantic interpretation. Nodes have finite maximum activations and energy is conserved, therefore nodes with many links may spread only a weak activation, while a few strong associations will lead to larger activations.
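To make the propagation mechanism concrete, the following minimal Python sketch spreads activation through a small weighted concept graph, decaying activation with each hop and pruning weak activations. The toy network, the multiplicative decay and the threshold value are our own assumptions for illustration, not the model used in the works cited above.

```python
# Minimal sketch of spreading activation over a weighted concept graph.
# The toy data and decay scheme are illustrative assumptions only.
from collections import defaultdict

links = {  # association strengths between concepts, in [0, 1]
    "bird":   {"canary": 0.9, "wing": 0.8, "animal": 0.7},
    "canary": {"yellow": 0.9, "sing": 0.8},
    "animal": {"fish": 0.6},
}

def spread(source, initial=1.0, threshold=0.2, max_hops=3):
    """Propagate activation from source; weak activations are pruned."""
    activation = defaultdict(float)
    activation[source] = initial
    frontier = [(source, initial, 0)]
    while frontier:
        node, act, hops = frontier.pop()
        if hops >= max_hops:
            continue
        for neighbor, strength in links.get(node, {}).items():
            new_act = act * strength  # activation decays with every hop
            if new_act > threshold and new_act > activation[neighbor]:
                activation[neighbor] = new_act
                frontier.append((neighbor, new_act, hops + 1))
    return dict(activation)

print(spread("bird"))  # closely associated concepts get the most activation
```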


This model can describe non-taxonomic similarities between concepts in a better way than the hierarchical model is able to do. Better approximation to the functions of human semantic memory is seen in tests that analyze concept similarities, where semantic distance between concepts does not increase the time of an answer for false sentences, e.g. "all fruits are vegetables" or "fruits are flowers". Empirical studies show that the time of concept activation is related to the semantic distance, measured by intermediate associations that the activation must pass through (Warren, 1977). EEG and fMRI experiments with humans show that associations between closely related concepts arise in 30–100 ms (Mitchell et al., 2008). For distant concepts this time significantly grows (>700 ms), as demonstrated in tests with semantic priming using concept pairs (McNamara, 2005).

The spreading activation theory may be criticized on several points. First, it lacks different types of relations between concepts. Second, it does not keep cognitive economy: each definition of the concept should be complete, with all important associations defined directly. Third, early models of spreading activation did not include inhibitory associations that suppress concepts associated with a given word but not suitable in the particular context. Inhibitory associations restrict activation flow to those nodes that represent concepts relevant to the meaning of the whole sentence. This extension of the model has been successfully used for disambiguation of medical concepts in the graph of consistent concepts (GCC) (Matykiewicz, Pestian, Duch, & Johnson, 2006).

Dynamical aspects of biological memory are thus captured to some degree in the spreading activation model, although the process of forming episodic memories that contribute to the formation of new semantic representations is neglected. A more faithful representation of memory should also include a process of adding and removing links as a function of new experiences, and forgetting links that have not been activated for a long time.

Fig. 3. Semantic memory implemented by a feed-forward neural network (after McClelland & Rogers, 2003).

2.3. Connectionist distributed models

Modern approaches to understanding language processes descend from connectionist models introduced in the parallel distributed processing (PDP) book (Rumelhart & McClelland, 1986). In this approach semantic memory is modeled by a neural network, where the meaning of concepts results from the network dynamics that depends on the connections between neurons involved in distributed representations. In such models information is processed by interactions of many simple elements connected with each other via inhibitory or excitatory links. Distributed representations provided by neural network functions share many characteristics of human memory: they can deal with incomplete or distorted information, display content-based addressing sensitive to context, and allow for automatic generalizations, producing similar activation patterns for similar outputs. This allows for application of various types of attributes in retrieval of information stored in memory structures, with features that have strong impact only in precise contexts; for example the keywords bird, does not fly, cold climate are sufficient to activate the concept penguin, while the keywords bird, does not fly, hot climate activate the concept ostrich or emu. In contrast to semantic networks, in connectionist models information is not localized in a single network node, but is contained in coherent patterns of neuronal activations. This leads to a true sub-symbolic representation of knowledge, as single neurons do not represent microfeatures. However, in some simplified neural models identification of a subset of network nodes with microfeatures may be desirable, as it is done in the neural model of human memory shown in Fig. 3.

This model, developed by McClelland and Rogers (2003) for description of natural categories from the plant and animal domains, encodes relations of 4 types (ISA, IS, CAN, HAS) between objects and their properties. A feed-forward neural network with two hidden layers learns distributed representations of input objects, using as input plant and animal names and one of the relations, and as outputs properties of these objects. Simple sentences, like "Robin can fly", are parsed to determine inputs, relations and outputs. The final structure stores the knowledge in the form object – relation type – feature. In the learning process the layers named representation and hidden develop internal representations of objects processed by the neural network. This may be seen in the dendrograms showing distributions of hidden layer activity after training. Activity vectors are similar (measured by cosine distance) for each group: trees, flowers, birds or fish, reflecting similarity of objects within the group, and progressively larger differences between plants and animals. Such a network shows spontaneous generalizations of information that, although not given explicitly, may be induced through similar features of the presented objects. It may also build new associations between categories that have not been given during the learning process. Thus it shows how episodic knowledge, based on a collection of facts in the form of simple assertions, is converted into semantic knowledge with specific structure.

2.4. Approaches to estimation of meaning

When discussing psycholinguistic semantic memory models, the question of how the meaning of concepts is determined by the human brain should be considered. The simplest theories try to explain categorization of natural objects. Thus understanding is simply replaced by assigning a given concept to a proper category, assuming that the meaning of these categories has been established. Two main approaches should be mentioned here.

(A) Theory of semantic features (Smith, Shoben, & Rips, 1974).

This theory is based on defining a concept as a list of its features. The features can be divided into two sets:

1. defining features – determining the meaning of the concept,
2. characteristic features – determining the typicality of the concept.

This model takes into account common and differentiating features used during retrieval of similar concepts in the decision process. According to the Smith et al. (1974) conjecture, comparison of features is a two-stage process. In the first step a quick rating of general and typical features is done, allowing for fast decisions. If in this phase similarity of the input with known concepts has not been successfully established, a slower, more detailed second stage of analyzing defining features is performed. For example, the question "Is a canary a bird?" leads to a strong activation based on analysis of characteristic features, allowing for quick verification of the truth of this sentence. Verification of the sentence "Is a penguin a bird?" is not so direct because features that are characteristic of most birds are missing, therefore a second stage of comparison of the defining features should be performed.

The theory of semantic features accounts well for the typicality effects – judgments for typical members of a given category are faster than for unusual members, which lead to slower response times (e.g. penguin is a bird vs. canary is a bird). This is explained by the need for a two-stage comparison process before the answer is given.

However, empirical studies show some gaps in this theory, called category size effects (Forster, 2004). Analyzing sentences such as "a poodle is a dog" and "a squirrel is an animal" it has been shown that people evaluate sentences for objects that belong to a narrow (more precise) category faster, despite the fact that precise categories contain more features than higher, more abstract categories (new specific features are added to the inherited notions). According to the semantic feature theory more comparisons should be performed in this case, taking longer time, contrary to experimental results. Another drawback of the theory is the lack of cognitive economy. It also does not take into consideration the types of relations between objects that may influence the similarity.

(B) Theory of prototypes.

Instead of focusing on features, this theory of categorization of concepts focuses on whole objects. Some objects are more typical than others, e.g. a chair is a typical element of the semantic category furniture, more central than for example a wastebasket. The prototypes theory (Rosch, 1973) comes from psychological research on natural categories, and takes a different approach than traditional thinking in terms of sufficient and necessary conditions. The concept here is not defined by its features but rather by its similarity to a prototype for each category, with unequal category membership status for different objects (e.g. canary is a better prototype of the bird category than penguin). Thus the prototype theory assumes the existence of some archetypes representing semantic categories. Objects are assigned to categories using similarity measures performed on different processing levels.

Some evidence of organizing human cognition in the form of prototypes has been given in the research on building artificial conceptual categories (Posner & Keele, 1968). It is clear that prototypes must result from generalization of experience, but it has not been shown how exactly they arise. Similarity functions are usually calculated using a set of feature values, although brains may simply evaluate similarity of distributions of neural activities. A single prototype for each category is not feasible. A set of sufficiently similar examples may be generalized to create prototypes corresponding to similar distributions of neural activity. In the McClelland and Rogers (2003) model this may correspond, for example, to average distributions of activity of the hidden layer for general concepts, such as "bird", that the network has not been explicitly trained on. Vectors that describe activity for particular birds will be close to this prototype, with untypical features moving them further from the prototype. Still, untypical birds are closer to the prototype for "bird" than for "fish", and both are far from "trees" and "flowers".

These approaches to capturing the meaning of concepts by the brain are the inspiration for building a computational model for processing lexical data. Neural models certainly give very interesting results (Miikkulainen, 1993) but do not scale well. There is still a strong need to create simplified models that capture important properties of neural models but are easier to use from a computational point of view. A prerequisite for processing lexical data is a repository for storing lexical knowledge. In the next section the knowledge representation used for our implementation of a computational model of semantic memory is described, retaining functionalities postulated by psycholinguistic theories.

3. Knowledge representation for semantic memory model

Knowledge representation is one of the basic themes in artificial intelligence. It determines the way information is stored and processed within a machine and what kind of inferences can be performed on it (Davis, Shrobe, & Szolovits, 1993). From the human point of view natural language is the most flexible method for expressing knowledge. It is also the most difficult to formalize in artificial systems. The problem of knowledge representation for natural language is still unsolved and recent trends to connect concepts with action–perception in embodied cognitive systems (Ansorge, Kiefer, Khalid, Grassl, & König, 2010) show how difficult this task may be.

No computer system is able to use language the way humans do, but there are some implementations that help to improve human–computer interaction. Chatterbots are programs designed to maintain dialog with people. Most of them only mimic linguistic competences without any understanding of the meaning of concepts, therefore they fail to give meaningful answers even to simple questions. Question/answer systems and information retrieval tasks require more advanced approaches than just template matching or statistical correlations. Despite a lot of marketing hype behind the Wolfram Alpha computational knowledge engine and other such systems, question answering is still far from satisfactory.

A flexible method for representation of some aspects of language is based on triples in the form object – relation type – feature. This method has been employed for modeling data with first order logic (Guarino & Poli, 1995), and has been formalized in the popular RDF schemes for ontology implementations (Staab & Studer, 2004). Triples have also been used for building semantic networks (Sowa, 1991) and lexical machine readable dictionaries. Below an extended version of triples will be used for implementation of a computational semantic memory model which is in agreement with the psycholinguistic observations presented in Section 2.

In the standard RDF form learning is possible only by adding or removing triples, making it hard to represent uncertain knowledge. Triples may be considered as links between objects, represented by nodes of semantic networks, and features of these objects, with the relation determining the type of the link. The simplest way to extend the flexibility of triples and enable learning during the knowledge acquisition process is to add weights estimating the strength of relations. Such weights should encode fuzzy knowledge, to which degree some features are present (conveniently expressed in terms of fuzzy sets, Zadeh, 1996), as well as handle uncertainty of knowledge, estimating reliability or typicality of features.

Fig. 4. Atom of knowledge vwORF used in implementation of our semantic memory model.

1 http://wordventure.eti.pg.gda.pl.
2 http://swn.eti.pg.gda.pl.

In Fig. 4 the elementary atom of knowledge in the vwORF representation used for implementation of the Semantic Memory computational model is presented. This atom of knowledge consists of five elements which can be divided into two groups:

Triple of knowledge:

- O – the object described, represented by its name, usually a concept name rather than a single word.
- R – relation type, denoting in what way the object is related to the feature.
- F – feature, a given property of the object.

Weights:

- v – confidence, a real number in the range [0, 1], describing how reliable the knowledge expressed by this triple is. The value of v grows towards 1 when knowledge expressed by the triple is repeatedly confirmed; for new triples with no confirmations this value may be set near 0.
- w – support, a real number in the range [−1, +1], describing how typical this feature is for this object. Using this parameter adjectives such as "always", "frequent", "seldom", "never" can be expressed; for example the feature black as a property of the stork has w = −0.5, meaning that it is seldom true, and the feature white should have w = 0.9, which means that a stork is almost always white.

With the use of the vwORF knowledge representation the meaning of elementary natural language sentences can be expressed. In Fig. 4 the example sentence "bird has wing" has been expressed using vwORF notation. The confidence of this triple (v = 0.97) is very high, which means that this knowledge has been confirmed through many observations (the lifetime of the system). The support (w = 0.87) is also high, expressing the knowledge that birds generally have wings.
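As a data structure this atom is straightforward to encode. The sketch below is a minimal Python rendering; the class and field names are our own, and only the five components and the example values come from the text and Fig. 4.

```python
# Minimal encoding of the vwORF atom of knowledge; names are our own sketch.
from dataclasses import dataclass

@dataclass
class VwORF:
    o: str      # object described, usually a concept name
    r: str      # relation type linking the object and the feature
    f: str      # feature (property) of the object
    v: float    # confidence in [0, 1]: how reliable the triple is
    w: float    # support in [-1, +1]: how typical the feature is

# "bird has wing": well confirmed (v = 0.97) and generally true (w = 0.87)
bird_has_wing = VwORF(o="bird", r="has", f="wing", v=0.97, w=0.87)
```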

Note also the limitations of such representation: there is no way to express numerical information, for example that a bird has no more than two wings. However, this knowledge can be added by another triple, connecting "bird" and "pair of wings". Some knowledge is expressed more easily by setting constraints rather than specifying precise values. Confidence and support could be functions of numerical values, estimating how likely a given value is (for example height) for a given object (for example human). We shall not discuss such extensions here.

The set of weighted triples allows expressing relatively wide knowledge, including negative knowledge. The set of triples joined together forms a semantic network, denoted here with the Φ symbol, that represents the whole knowledge stored within the semantic memory model. This knowledge may be represented in a graphical form by visualization of the semantic network. A user friendly interface for navigation over such data using interactive components has been implemented by us, allowing the user to traverse the graph of concepts and features. This method of visualization has also been used in our other projects1: for building WordNet in a cooperative way (Szymański, Dusza, & Byczkowski, 2007) and integrating it with Wikipedia2.

Presenting knowledge Φ in the form of a semantic network is convenient for people, facilitating easy modification using a visual interface. Unfortunately data stored in this way cannot be efficiently processed by machines. To enable fast numerical operations the semantic network is replaced by a geometrical representation called "semantic space" and symbolized by Ψ. Turning knowledge Φ contained in the semantic network into the semantic space representation Ψ transforms each object node C into an n-dimensional feature space F, where each object is represented by a point, equivalent to a sparse vector of feature values. Many vwORF atoms are defined for each object and they are collected together in a vector called the Concept Description Vector (CDV).

The exact mapping used here requires two dimensions for each feature to store the v and w weights. In its simplest form CDV vectors could store only binary information about existing relations; an intermediate solution is to keep a single real feature value. The number of all features in Φ is large and the number of features that are applicable to a given object is rather small, therefore the vectors are quite sparse. Although some information is lost in such a transformation from Φ to Ψ, it is possible to perform some inferences on knowledge stored in the semantic network, thus expanding knowledge that is stored in an explicit way in the semantic space. Inferences are based on the processing of the predefined relation types and they add additional features stored in the CDV. Four types of relations that appear between Semantic Network nodes are processed:


1. is_a – This relation introduces in Φ a hierarchy of concepts that facilitates cognitive economy by inheritance of features. If a relation of the O1 is_a O2 type between two objects has been identified, features from CDV(O2) are copied to CDV(O1). A single weight is stored, obtained by multiplying the v confidence value for the is_a relation by the w support value for each feature copied. For all types of relations, features that already exist in the CDV of the object are not changed.

2. similar – If O2 is similar to O1, features from CDV(O1) should be copied to CDV(O2), adding features that have not been present in CDV(O2), with the weight factor obtained by multiplying the confidence v of the similar relation by the w support for the O1 features. This relation is not symmetric. However, if v = 1 the relation similar becomes same, implementing equivalence of semantic memory objects, therefore processing is performed also from O2 to O1.

3. excludes – Processing of this relation is similar to the one presented above, except that the w support value of the feature copied to CDV(O2) is multiplied by −1.

4. entail – If F1 entails F2, feature F2 may be added to the CDV of each object for which F1 is defined, with the w value of F2 being the same as for F1, and the confidence factor v associated with the relation. A sketch of these four inference rules in code follows the list.
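The sketch below is our minimal reading of these four rules, assuming a CDV is stored as a dictionary mapping each feature to a (v, w) pair; the function names and data layout are illustrative assumptions, not the original implementation.

```python
# Sketch of the four inference rules of the semantic network to semantic
# space transformation; a CDV is modeled here as {feature: (v, w)}.

def copy_features(src_cdv, dst_cdv, v_rel, sign=1.0):
    """Copy features absent from dst_cdv, weighting support by the
    relation's confidence; features already present are never changed."""
    for feat, (v, w) in src_cdv.items():
        if feat not in dst_cdv:
            dst_cdv[feat] = (v_rel, sign * v_rel * w)

def apply_relation(objects, rtype, v_rel, a, b):
    """Process one relation 'a rtype b' over objects: {name: CDV}."""
    if rtype == "is_a":            # a is_a b: a inherits features of parent b
        copy_features(objects[b], objects[a], v_rel)
    elif rtype == "similar":       # a similar b: copy b's features to a
        copy_features(objects[b], objects[a], v_rel)
        if v_rel == 1.0:           # v = 1 turns 'similar' into 'same'
            copy_features(objects[a], objects[b], v_rel)
    elif rtype == "excludes":      # as 'similar', but the support is negated
        copy_features(objects[b], objects[a], v_rel, sign=-1.0)

def apply_entailment(objects, f1, f2, v_rel):
    """f1 entails f2: add f2 wherever f1 is defined, keeping f1's support."""
    for cdv in objects.values():
        if f1 in cdv and f2 not in cdv:
            cdv[f2] = (v_rel, cdv[f1][1])
```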

As an example consider a semantic network constructed for 172 animals (or, more formally, objects from the animal kingdom domain). The 475 features describing them were selected from relations of these objects that have been found in three lexical databases: WordNet (Miller, Beckwith, Fellbaum, Gross, & Miller, 1993), the MIT Media Lab ConceptNet (Liu & Singh, 2004), and Microsoft MindNet (Vanderwende, Kacmarcik, Suzuki, & Menezes, 2005). Usage of three different data sources allows building the lexical semantic network in an automatic way. To assure the quality of the knowledge, v values have been set using confirmation of particular atoms of information in different sources. Only those relations that appear in more than one data source have been imported, with confidence factor v = 0.5 if they appear in only two sources, and 0.75 if they appear in all three sources. The confidence value associated with each relation is further increased or decreased as a result of the interaction with human users. Knowledge acquired by aggregating the three machine readable dictionaries consisted of the 5031 most reliable relations describing 172 animals with 475 features.

Performing inferences based on these four types of relations enhances the CDV representation of objects with new features. Fig. 5 presents how processing a particular relation type during the Φ to Ψ transformation influences the average number of features in the CDV vectors.

Fig. 5. Change of the average number of specified CDV features as a function of the processed relation types.

4. Semantic search algorithm

Semantic network describing relations between lexical elements can be useful in many applications. We have successfully applied the knowledge about relations of natural language elements, encoded using the knowledge representation proposed in Section 3, for improvement of text classification (Majewski & Szymański, 2008). Semantic space allows performing semantic searches for objects of interest referring only to their features. This kind of search is useful when a user cannot recall the name of an object she/he is looking for (consider the Tip of the Tongue problem, Burke, MacKay, Worthley, & Wade, 1991) or even does not know its proper name. This situation is quite common; for example, seeing an image of a flower one may try to identify its name. In such a case typical keyword-based search may not be effective and could be replaced by semantic memory guided search, as described below.

Searching for an unknown object in the semantic space is performed by selecting the most distinctive feature that will divide the whole space in a balanced way. In the semantic space Ψ containing M objects Oi with N features Fk the search based on a set of keywords should select the best feature, the one that gives the maximum amount of information. Information-based measures are frequently used in decision trees (Quinlan, 1986) and feature selection. In our case calculation of the information associated with each feature in Ψ is done according to the modified Shannon formula (1).

$$I(F_k) = -\sum_{i=1}^{M} \frac{|w_{ik}|}{M} \log \frac{|w_{ik}|}{M} \qquad (1)$$

where $w_{ik}$ is the support assigned to the relation between object $O_i$ and feature $F_k$. Information (1) depends on the objects contained in the part of the Ψ space considered; this subspace is reduced after each new keyword value is specified. For a large semantic space Ψ it is quite likely that more than one feature will have the same highest information, therefore additional heuristic criteria are needed for ranking. One heuristic is to select the most prevalent feature, according to term frequency stored in databases derived from general corpora (Hunston, 2001). Another heuristic is to use probabilities $p(O_i)$ from previous searches to guess which objects are most frequently searched for. For statistics based on $N_S$ previous searches the minimum probability for rare objects selected once is $p(O) = 1/N_S$. Ordering the $m$ objects selected so far out of all $M$ objects in a decreasing sequence of $p(O_i)$ probabilities, one obtains a curve that has roughly exponential shape. This curve can be used to estimate the probability of the remaining $M - m$ objects that have not been selected so far. Calling this probability $p_r$ one may then renormalize the estimated probabilities $p(O_i) \leftarrow p(O_i)/(1 + p_r)$ and use them in the modified formula (2):

$$I_W(F_k) = -\sum_{i=1}^{M} p(O_i)\,|w_{ik}| \log\big(p(O_i)\,|w_{ik}|\big) \qquad (2)$$
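A direct NumPy rendering of formulas (1) and (2) is sketched below. Holding Ψ as a dense M × N matrix of support values and treating 0·log 0 as 0 are our simplifications; the variable names are ours.

```python
# Sketch of feature scoring with Eqs. (1) and (2) over a dense support matrix.
import numpy as np

def information(W, k):
    """I(F_k) from Eq. (1): W is the M x N matrix of support values w_ik."""
    p = np.abs(W[:, k]) / W.shape[0]
    p = p[p > 0]                      # 0 * log 0 is taken as 0
    return -np.sum(p * np.log(p))

def weighted_information(W, k, p_obj):
    """I_W(F_k) from Eq. (2); p_obj holds renormalized search probabilities."""
    q = p_obj * np.abs(W[:, k])
    q = q[q > 0]
    return -np.sum(q * np.log(q))

def best_feature(W, p_obj):
    """Pick the feature carrying the maximum amount of information."""
    scores = [weighted_information(W, k, p_obj) for k in range(W.shape[1])]
    return int(np.argmax(scores))
```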

The best separating feature selected on the basis of the $I_W(F_k)$ value is used as a keyword. The user, determining its value, can narrow the set of objects which can be the result of her/his search. In the implementation presented below we allow only "yes", "no", "don't know", "sometimes" and "frequently" answers, but depending on the application other answers could be accepted (for example, a value in a given range). All answers given by the user are collected in the vector A that is used to calculate distance to all objects in the semantic space and select the most probable (closest) objects. The full representation of object features stored in $V_i = \mathrm{CDV}(O_i)$ is used to calculate the Euclidean distance in a subspace of K known feature values:

$$D(A, O_i) = \sqrt{\sum_{k=1}^{K} (V_{ik} - A_k)^2} \qquad (3)$$

where K is the length of the answer vector A, describing how many features have been tested so far. CDV vectors do not have full information about relations between objects and features, some answers may not be correct, and distances may be influenced by the different number of features defined in each CDV vector. Therefore instead of the Euclidean distance it is better to use the cosine measure, a normalized dot product of the A and V vectors, which has proved to be quite reliable in information retrieval problems (Qian, Sural, Gu, & Pramanik, 2004).

$$d(V, A) = \frac{\sum_i V_i A_i}{\sqrt{\sum_i V_i^2}\,\sqrt{\sum_i A_i^2}} \qquad (4)$$
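For reference, the two baseline measures of Eqs. (3) and (4) over the K features answered so far can be written in a few lines of NumPy; this is a plain illustration, not the production code.

```python
# The baseline distance measures of Eqs. (3) and (4).
import numpy as np

def euclidean(V, A):
    """Eq. (3): distance between a CDV slice V and the answer vector A."""
    return np.sqrt(np.sum((V - A) ** 2))

def cosine(V, A):
    """Eq. (4): normalized dot product of V and A."""
    return np.dot(V, A) / (np.linalg.norm(V) * np.linalg.norm(A))
```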

In our system knowledge has different confidence factors (v) and fuzzy support (w), therefore instead of these simple measures the similarity is computed by:

$$S(V, A) = \frac{1}{K}\sum_{i=1}^{K}\big(1 - \mathrm{dist}(V_i, A_i)\big) \qquad (5)$$

where the distance between components is defined by:

$$\mathrm{dist}(V_i, A_i) = \begin{cases} 0 & \text{if } w(A_i) = \mathrm{NULL} \\ -|w(A_i)|/K & \text{if } v(V_i) = 0 \\ v(V_i)\,|w(V_i) - w(A_i)| & \text{if } v(V_i) > 0 \end{cases}$$

where $v(V_i)$ and $w(V_i)$ are the confidence and support weights describing the relation with feature $F_i$ in the CDV, and $w(A_i)$ is the answer given by the user to the question "is the feature $F_i$ true" for the object she/he is searching for. Table 1 shows the numerical values that correspond to the verbal answers.

Similarity of the CDV and the answer vector A is calculated as a sum of differences between the user's answers and the system knowledge. If the user answers "don't know" the feature is omitted during calculation of similarity. Additionally the confidence factor v allows strengthening the importance of CDV components which are more reliable and weakening the influence of accidental ones.
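Putting Eq. (5), the component distance and the Table 1 answer codes together gives the sketch below. Representing an undefined CDV entry and a "don't know" answer as None is our assumption about how the NULL case is handled.

```python
# Sketch of the similarity measure of Eq. (5) with the component distance
# defined above; answers use the verbal codes of Table 1.

ANSWER_CODES = {"yes": 1.0, "frequently": 0.5, "seldom": -0.5,
                "no": -1.0, "don't know": None}

def component_dist(entry, answer, K):
    """dist(V_i, A_i): entry is (v, w) from the CDV, or None if undefined."""
    if answer is None:            # "don't know" - the feature is omitted
        return 0.0
    if entry is None:             # v(V_i) = 0: no knowledge in the CDV
        return -abs(answer) / K
    v, w = entry
    return v * abs(w - answer)

def similarity(cdv_entries, answers):
    """Eq. (5): cdv_entries and answers are parallel lists of length K."""
    K = len(answers)
    return sum(1.0 - component_dist(e, a, K)
               for e, a in zip(cdv_entries, answers)) / K
```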

After k steps (answers) the maximum similarity $S_{max}$ between the current answer vector A and all CDV vectors V is achieved in a subspace $O(A)_k$ containing objects that – with high probability – are looked for:

$$O(A)_k = \{\, O_i \mid S(A, V(O_i)) = S_{max} \,\} \qquad (6)$$

Using the maximum similarity, or equivalently a minimum distance criterion, to construct the O(A) subspace should lead to the fastest recognition of the searched objects using the minimum number of questions asked during the search. However, knowledge stored in CDV vectors is not perfect and answers given by the user are not always correct, therefore such an approach would sometimes miss the searched objects. Moreover, it would miss the opportunity to acquire new knowledge, as discussed further below.

5. The game of questions

Given a limited description of an object people are able to identify it because their semantic memory, storing many relations between objects in the real world, is able to formulate good questions and make inferences that complete partial descriptions. The anticipation and associations created by the lexical context are among the most important processes that a system capable of understanding natural language in a way similar to humans should possess. This requires good models of semantic and episodic memories.

The semantic search process is a good model of the popular 20-question word game, where one person is allowed to ask 20 questions trying to guess the concept the opponent has in mind. The game is relatively simple for people, because they have wide knowledge about the world. It is quite difficult for computer programs because success does not depend that much on computational power (as in chess and other board games) but relies on knowledge about the world represented as relations between lexical elements. For that reason it has been proposed as a good challenge for computational intelligence (Duch, 2007). This game may also serve as a demonstration of the elementary linguistic competences based on lexical knowledge, allowing for a real understanding of the meaning of discourse, and not just responding mechanically using templates.

In our implementation of the game the computer, using knowledge encoded in the form of a semantic network, tries to guess the object a human user has in mind. In reply to the questions, generated from the vwORF knowledge representation, a human can give answers only in the form specified in Table 1. What makes this approach3 different compared to the ones already available on the Internet4,5,6 is the flexibility of the knowledge representation used. All knowledge is stored in the semantic network and converted to the vector-based semantic space to increase computational efficiency. This makes our approach largely independent of particular applications. Various applications may use knowledge contained in the semantic network. For example, automatic generation of riddles for crosswords is easily achieved by selecting small subsets of features that allow for a unique identification of objects. A very large number of such subsets exist even in knowledge bases of modest size.

3 http://diodor.eti.pg.gda.pl.
4 http://www.20q.net.
5 http://www.braingle.com/games/animal/index.php.
6 http://en.akinator.com/.

The ability to use linguistic data in many applications allows placing our approach in the field of artificial general intelligence (Voss, 2005). Alternative approaches to word games encode the knowledge in a fixed form, using a matrix of objects and questions, which makes it easier to process by computers, but is only a superficial imitation of natural language abilities, although still better than the famous ELIZA template-based approach (Weizenbaum, 1966). Another difference is the way in which questions are generated during the game. Other approaches use hand-coded questions, while in semantic search questions are automatically generated using atoms of knowledge in the vwORF representation. Formulating questions that are grammatically correct is a challenge in itself because there are many forms (depending on the relation type) in which a question could be cast.

The third difference between the semantic search approach and other systems is the way knowledge is acquired. This is the main bottleneck of most knowledge-based systems (Cullen & Bryman, 1988). Our implementation bootstraps itself on knowledge from available machine readable dictionaries and other electronic sources, and thus can be run on a large scale, while other projects mostly exploit the interaction with the users to learn correct answers. Of course human–computer interaction is a very useful way of acquiring knowledge, but it is also very time consuming and needs "the snowball effect" to bring enough players, which requires strong marketing. Focusing on automatic data acquisition, interaction with humans is used here only for validation and correction of the results, as discussed below in Section 6.

To make the game of questions more attractive some modifications to the semantic search algorithm are introduced; they are listed below and sketched in code after the list.

1. Questions are generated by selecting vwORF atoms from the semantic space according to formula (1) or (2). If the same user repeats the game searching for the same object several times, a deterministic system would ask the same questions, and that could be annoying. This situation is a good opportunity from the knowledge acquisition point of view (see Section 6). To prevent choosing the same question many times, stochastic elements are introduced, selecting features randomly with probability related to the information calculated according to (1) or (2). This method is analogous to the roulette reproduction approach popular in genetic algorithms (Goldberg, 1989). Such modification makes selecting features (questions) a bit less effective, but in the tests it has not shown significant negative influence on the average number of questions.

2. In the classic version of the semantic search algorithm the subspace O(A) containing the most probable objects that are used to estimate information has the maximum similarity, or a minimum distance between the current answer vector and all vectors in O(A) (Eq. (5)). If there is some significant discrepancy between these answers and the stored knowledge (due to errors in the answers, the semantic network, or both) the object of interest may be left outside of O(A). To prevent this situation a larger subspace is taken, with objects accepted with probability given by the modified Boltzmann distribution:

$$p(\Delta d, k) = \frac{2}{1 + \exp(k\,\Delta d / c)} \qquad (7)$$

where k is the step of the game (the number of questions asked so far), $\Delta d = \mathrm{dist}(V(O), A) - d_{min} \geq 0$ is the distance of the CDV vector representing some object O from the answer vector A minus $d_{min}$, and c is a scaling constant, set to five in all experiments. All objects at the minimum distance are always included ($p(0, k) = 1$), while objects that are further than $d_{min}$ are included with decreasing probability, and may be different from those selected in the previous step. Selecting objects for the subspace O(A) with slightly less restrictive criteria than $d_{min}$ makes the game longer, but this can only be observed for popular objects that can be found by asking a few questions. For longer games k in Eq. (7) grows, and only objects at a minimal distance have a chance to be selected. If all answers are correct and match knowledge in the semantic space $d_{min} = 0$, but the subspace O(A) may still contain many objects and more questions will be generated.

3. Algorithm stop condition: three cases when the algorithm may stop are considered:
- The algorithm stops when there is only one object left in the subspace O(A). This is the most desired situation, but it is relatively rare because the CDV vectors are sparse – the knowledge relating features with objects is usually far from complete. Also, expanding the subspace O(A) using Eq. (7) brings into consideration less probable objects.
- The algorithm stops after asking the maximum number of questions allowed. Assuming only binary answers to the questions, and minimal differences between objects (objects differ by only one feature), 20 steps (questions) should distinguish 2^20 = 1,048,576 objects. These assumptions are of course not true; in practice knowledge in CDV is incomplete, vectors differ in more than one feature, features are not binary, and more than 20 features are used to describe objects. Notwithstanding these issues, 20 questions seems to be a reasonable maximum number of questions for one game.
- When the number of objects left in the O(A) subspace is relatively small, heuristics may be used to identify the searched object. The implemented heuristic is based on the observation that if there exists an object which significantly differs from other objects in the O(A) subspace, and it stands out during successive questions, then it may be the object of interest. The implementation of this heuristic is based on fulfilling the condition described by (8).


$$d_p = d_{min+1} - d_{min} > \mathrm{std}(O(A)) \qquad (8)$$

where $d_{min}$ is the minimal distance in O(A) between a CDV and the answer vector, $d_{min+1}$ is the second minimal distance, and $\mathrm{std}(O(A))$ is the standard deviation of distances in the O(A) subspace. Tests of this heuristic show that it considerably decreases the number of questions, but in some cases leads to a wrong guess. The tradeoff between the number of questions and the precision of finding the object is analogous to that between precision and recall – the two measures behave in opposition to each other and there is a problem of optimizing them simultaneously (Buckland & Gey, 1994).
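The three modifications can be sketched compactly with the standard library; the function names and data layout below are ours, covering roulette-wheel question selection, the acceptance probability of Eq. (7), and the stop condition of Eq. (8).

```python
# Sketch of the game modifications: stochastic question choice, the
# acceptance probability of Eq. (7), and the stop heuristic of Eq. (8).
import math
import random

def roulette_pick(features, scores):
    """Choose a feature with probability proportional to its information."""
    return random.choices(features, weights=scores)[0]

def accept_probability(delta_d, k, c=5.0):
    """Eq. (7): always 1 at the minimum distance (delta_d = 0)."""
    return 2.0 / (1.0 + math.exp(k * delta_d / c))

def should_stop(distances):
    """Eq. (8): stop when one object clearly stands out in O(A)."""
    d = sorted(distances)
    if len(d) < 2:
        return True
    mean = sum(d) / len(d)
    std = math.sqrt(sum((x - mean) ** 2 for x in d) / len(d))
    return (d[1] - d[0]) > std
```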

Technical implementation of the game has been done in the form of a server controlling a web page with an interactive user interface in the form of HIT (Humanised InTerface). Due to the MS ActiveX technology used for the Avatar, full interaction is possible only under Internet Explorer. The interface in the form of a humanoid talking head is depicted in Fig. 6. This implementation serves as a testbed for integration of technologies making web applications more user-friendly (Szymański, Sarnatowicz, & Duch, 2007). The Haptek7 3D head has been integrated with a text to speech engine (TTS) and endowed with speech recognition8 (due to the unacceptably high consumption of the server's computational resources available only in the console version). The problems faced with implementation of these attractive technologies on a large scale show that although the HIT functionalities are implementable they are still not mature enough to be widespread.

Fig. 6. Avatar used in the implementation of the game of questions seen in the Internet Explorer.

7 http://www.haptek.com.
8 MS Speech API, http://www.microsoft.com/speech/speech2007/default.mspx.


6. Knowledge acquisition through active dialogs

It cannot be expected that a semantic network Φ built automatically from available data sources will be complete, and each object will be properly related to all features that describe it. A method for validating and correcting the automatically acquired data is needed. The approach described here is based on the semantic search algorithm implemented in the form of a word game played by the human user, who modifies the lexical knowledge base as a result of her/his search. The interaction with the program has been limited so far to the answers (in the form defined in Table 1) accepted to the questions generated using the knowledge stored in the system.

Table 1
User answers and their numerical encodings.

 1     for the answer "yes"
 0.5   for the answer "frequently"
−0.5   for the answer "seldom"
−1     for the answer "no"
 0     denotes the answer "don't know"

Human–computer interaction during the games is enriched through active dialogs based on templates of interactions, run in specific parts of the game. Three such templates are described below:

1. At the end of the game, if the system correctly guessed the concept, an additional question Is that right? has been added to verify the quality of knowledge stored in the semantic space. Using the yes/no answer given by the user to that question, precision of the search is defined as Q = Nss/N, where Nss denotes the number of searches that finished with success, and N denotes the total number of performed searches. For the initial semantic network constructed in an automatic way, N = 30 test searches have been performed for objects randomly selected from the f set. Selection of objects for searches has been done with a probability distribution given by the normalized number of features in CDV vectors, so objects that are better described and more popular are favored. The Q = 0.7 result indicates that in a limited domain there are some possibilities to obtain common sense knowledge in the form of relations between lexical concepts automatically. It also shows that the method of integrating semantic data from three machine readable dictionaries requires manual validation and correction. The user's answers given to the questions asked by the system allow for correcting and also obtaining new knowledge stored in the semantic network. The answer vector is used to modify the knowledge according to the results of the search:

Table 1
User answers and their numerical encodings.

     1     for the answer "yes"
     0.5   for the answer "frequently"
    −0.5   for the answer "seldom"
    −1     for the answer "no"
     0     denotes the answer "don't know"

2. If to the last question Is that right? the user gives the answer yes, the entries in the answer vector are used for enriching the CDV representation of the object the system guessed correctly. If some features present in the answer vector already exist in the CDV, the w weights are modified by taking the average of the w values associated with a particular feature in the CDV and in the answer vector (see the sketch after this list). In addition the system asks an open question: Tell me something about <found object>. The answer may link an existing feature to the object through some type of relation, but may also add a completely new feature, one that has not existed in the knowledge base, to the semantic network. An automatic search for possible links between the new feature and stored objects is performed. This procedure requires a deep linguistic parser to convert the natural language sentence given by the user to the vwORF knowledge representation (Szymanski et al., 2007). Parsing sentences given by the users is the reverse of the question generation performed by the system during the game.

3. If instead of confirming the search results the user answers no, the system asks an additional question: Well, I fail to guess your concept. What was it? The name of the object the user was thinking about indicates which CDV should be corrected according to the information in the answer vector. If the object did not exist in the semantic memory before, a new object is created with initial features copied from the answer vector. This active dialog allows the system to learn new objects.
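The weight updates of templates 2 and 3 can be summarized in a short sketch. The dictionary-based CDV representation and all names below are our illustrative assumptions; the paper specifies only that weights of known features are averaged and that unknown objects are created from the answer vector:

    def update_cdv(cdv, answer_vector):
        # Merge an answer vector into an object's CDV; both map feature
        # names to w values encoded as in Table 1.
        for feature, w in answer_vector.items():
            if w == 0:                  # "don't know" carries no evidence
                continue
            if feature in cdv:          # known feature: average the weights
                cdv[feature] = (cdv[feature] + w) / 2.0
            else:                       # a new relation for this object
                cdv[feature] = w
        return cdv

    def learn_new_object(memory, name, answer_vector):
        # Template 3: create (or correct) the object the user had in mind.
        return update_cdv(memory.setdefault(name, {}), answer_vector)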

To validate the active dialog approach, five test objects with the largest number of features defined in CDVs have been selected. After their manual verification (using interactive visualization of the semantic network) their CDVs were taken as the Golden Standard and used to verify the capability for acquiring new knowledge through the active dialogs. The verification has been performed by removing these objects from the knowledge base and then learning about them through the interaction with users. The averaged dynamics of this process, performed for five objects in ten games, is presented in Fig. 7. All games have been limited to twenty questions.

The process of acquiring knowledge using active dialogs has been monitored by analyzing how complete the CDV of a new object (NO) became compared to the Golden Standard (GS). To analyze this process four measures are introduced (a short code sketch implementing them follows the list):

1. Sd = Nf(GS) − Nf(O) is the measure of incompleteness of the new object, showing how the NO differs from the GS in terms of the number of features. Nf(GS) is the number of features defined in the Golden Standard GS = G(O) for the concept O, and Nf(O) is the number of features defined for this concept in CDV(O). The Sd value shows how many features are still missing compared to the golden standard.

2. S_GS = Σ_{i=1..Nf(O)} [1 − δ(CDV_i(GS) − CDV_i(O))] is the measure of similarity based on the co-occurrence of features. It shows more precisely than Sd how close the NO is to the GS. The sum runs only over features with defined yes/no values.


Fig. 7. Dynamics of the process of acquiring new features; averaged results for five new objects.


The S_GS value is the number of features from O that are found in the golden standard GS vectors; the reverse measure (S_NO) is defined below. The ratio of the incompleteness and similarity measures, Sd/S_GS, shows the percentage of all features of the golden standard that have already been defined for the concept O.

3. Dif_w = Σ_{i=1..m} |CDV_i(O) − CDV_i(GS)| / m is the average difference over all m feature values that appear in both the O and GS representations. This measure shows how the feature values differ in the O and GS vectors for those features that are common to the two vectors. It allows for observing wrong values associated with relations, while the previous measures allowed only analysis of the existence of relations.

4. S_NO = Σ_{i=1..Nf(GS)} [1 − δ(CDV_i(O) − CDV_i(GS))], analogically to S_GS, is the measure of similarity of two CDV vectors based on co-occurring features, with the summation running over features in the GS. The S_NO value is equal to the number of features that appear in the description of the concept O and are not found in the GS, thus it measures the completeness of the Golden Standard.
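Read as set operations on the defined features, the four measures are easy to compute. The sketch below follows the textual descriptions above (S_GS counting NO features also present in the GS, S_NO those absent from it, so S_GS + S_NO = Nf(O)); the dictionary representation and the function name are our assumptions:

    def acquisition_measures(gs, no):
        # gs, no: CDVs as feature -> w dictionaries for the Golden
        # Standard and the newly acquired object, respectively.
        common = set(gs) & set(no)
        s_d = len(gs) - len(no)                # incompleteness (Sd)
        s_gs = len(common)                     # NO features found in the GS
        s_no = len(set(no) - set(gs))          # NO features beyond the GS
        dif_w = (sum(abs(no[f] - gs[f]) for f in common) / len(common)
                 if common else 0.0)           # average value mismatch (Difw)
        return s_d, s_gs, dif_w, s_no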

The difference between the CDV(O) and GS representations is due not only to the lack of knowledge, but also to the mechanism that randomizes questions, allowing for more knowledge acquisition when the game with the same concept is repeated several times. Results shown in Fig. 7 prove the usefulness of active dialogs for acquiring new knowledge. This can be seen by analyzing the NOq graph (NOq = S_NO + S_GS), where the average increase of the number of features defined in the CDV is shown. It can also be observed that during subsequent games the number of features acquired for the NO grows (graph S_GS). For the five test objects the average number of games required to make the system recognize a new object correctly was only Vn = 2.67. This means that after searching approximately three times for an unknown object the system can identify it correctly. It is also important to notice that the learning process makes objects stable – after the first successful search the object was always correctly recognized in subsequent games.

The Sd value is calculated with respect to the average number of features in all GS vectors, which had Dens(GS) = 55.5. The decreasing trend of Sd indicates that the number of features in the NO approaches the number of features in the GS. It can be expected that after playing more games this value could go below zero, indicating that the NO has more features than the GS. This shows that using the active dialogs one can build a better CDV than provided in the Golden Standard, which is imperfect, as shown by the S_NO graph. The limitations of the GS come from the fact that for a semantic space of 475 features it is hard to acquire a full description in the CDV form, even for a limited set of objects. Differences (Dif_w values) come mostly from the w weights that are negative, which make up 96.2% of all features in CDVs. This implies that the GS is well defined in terms of features that are positively related to the object.



Only a few active dialog templates have been shown here to demonstrate the ability to acquire common sense knowledge about language stored in the form of weighted vwORF triples. More templates may be added, which should lead to improved acquisition of structured knowledge through supervised dialogs in natural language with humans.

7. Comparison to human performance

Results of the semantic search algorithm applied to the 20-questions game may be directly compared to the same task performed by people. In the restricted knowledge domain in which our program works, a comparison is made between the average number of questions needed to find concepts in games between two people, and in games where one of the players is replaced by the computer program. Let Nq be the number of questions used for guessing an object in the game, and let Nu be the number of unsuccessful searches, so that for Ng games the ratio Nu/Ng measures the failure rate of retrieval.

Experiments were done separately with 4 groups of people of roughly the same size (20–23 persons), 86 people in total. First they were asked to play the game of questions in pairs, restricted to the animal domain. In this part of the experiment 93 games have been completed; 86 of these games finished in no more than 20 questions and could be used for evaluation. The average number of questions Nq asked by humans to find the object the opponent was thinking about is presented in Fig. 8, with bars labeling groups 1–4, a summary of results for games played only by people, and a summary of results for games played by people with our program. The height of a bar denotes, on a logarithmic scale, the number of performed games. For each group the minimum and maximum number of questions, the average value and the standard deviation are presented. Shorter and darker bars represent the number of unsuccessful games, requiring more than 20 questions or missing the target object. The summary of results shows that only a small number of games have been unsuccessful.

Fig. 8. Comparison of the semantic search algorithm for the 20-question games played in four groups of humans. The bars present: the total number of games played in each of the groups, as well as average results (denoted with Σ), the number of searches performed by the algorithm, and the number of unsuccessful searches.

In the second phase of the experiment people were asked to play the game of questions with the computer. The semantic search algorithm with the guessing heuristic (8) has been used. The quality of data stored in the semantic network is estimated by the fraction Q = 1 − Nu/Ng of successful runs and by the average number of questions Nq required to guess the object, shown in Fig. 8 in the “algorithm” bars.

The knowledge base used in this experiment had 197 objects described by 529 features, with an average of 50.64 features per object. 227 games performed with people gave the average Q = 0.64, which is a bit worse than the results obtained during tests on the semantic network acquired in an automatic way. This is due to the fact that human players introduced 46 new objects to the f knowledge base that had to be learned by the system. Of course the 197 objects chosen for automatic network construction do not cover the whole animal domain, but this allows for relatively frequent opportunities to learn new objects. If these 46 cases are not included in the number of failed searches, a much higher quality Q = 0.81 is obtained. Searching for new objects caused 56% of all errors, and through the use of active dialogs new knowledge is added and the competence of the system grows.
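These figures are mutually consistent, as a quick check shows (rounding Nu to a whole number of games is our assumption):

    ng = 227                     # games played against the program
    q = 0.64                     # reported overall quality
    nu = round((1 - q) * ng)     # about 82 unsuccessful searches
    new_objects = 46             # searches for objects unknown to the system
    print(new_objects / nu)                             # ~0.56, i.e. 56% of errors
    print(1 - (nu - new_objects) / (ng - new_objects))  # ~0.80, close to 0.81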




The difference between human and semantic search algorithm performance is not that big: on average people used nearly 12 questions to make a correct guess while our algorithm required about 14 questions. The main factor responsible for this difference is probably the relatively poor quality of knowledge acquired through automatic data acquisition. However, the overall approximation of associative inference mechanisms operating on semantic memory by such a simple knowledge representation and search algorithm is quite remarkable. Of course the task itself requires very limited linguistic competences. In most tasks involving natural language processing people use a wide range of common sense knowledge. It seems impossible to obtain such knowledge only from statistical analysis of unstructured text corpora, or even from structured resources such as machine readable dictionaries. What is needed is the active cognition aspect – functionality that allows verifying the obtained knowledge in action, which is reduced here to the interaction of the program with people in word games. Introducing more human interactions into the process of lexical knowledge acquisition seems necessary to increase the natural language competence of computer systems.

8. Discussion and future research

Cognitive processes rely on different types of memory: recognition, semantic, episodic, working, and procedural memories. We have focused here on semantic memory as the basis for understanding the meaning of general concepts. Semantic memory as an element of human cognitive processes has been the subject of many psycholinguistic theories. They are a rich source of inspiration for computational models approximating the mental processes used by the brain in language comprehension and production. Such computational models, besides the algorithm, also require a lot of data to operate on. This data should represent knowledge about language concepts and the common sense associations between lexical concepts and their properties. Such lexical data is linked to perception and action in embodied cognitive systems (Ansorge et al., 2010) and thus cannot be easily represented without sensory percepts, their categories and more abstract constructions. At present it is very hard to acquire because it can only be produced and verified by humans, although in the future robotic experiments may also offer interesting inspirations. Only a part of the symbolic knowledge involved in the description of natural objects and categories may to some degree be captured in semantic networks and create sufficiently rich relations to grant symbols some elementary meaning. Such representation of knowledge allows for approximation of the natural processes responsible for language comprehension.

In this paper a step towards a computationally efficient model of semantic memory has been made. Knowledge representation in the form of weighted vwORF triples has been used to implement functionalities inspired by psycholinguistic theories of human semantic memory. A semantic network built from the vwORF atoms of knowledge is a flexible way of storing lexical knowledge. For computational efficiency a vector-based semantic space is used instead of the semantic network. Elementary linguistic competence has been demonstrated in word games in this way, with results that have not been shown so far by more sophisticated linguistic approaches. Word games are a good ground for comparison of human competencies with the capabilities of computational models. Results achieved by people and by our semantic search approach based on the vwORF knowledge representation in the 20-question game show that although real brains are still better than computer programs, the difference is not so large.

Linguistic competence of programs depends more on lexical knowledge, the representation scheme and the search algorithm than on raw computational power. Research on expert systems showed the difficulty and the importance of knowledge acquisition from data, and despite the availability of huge structured and unstructured lexical resources, acquiring lexical information automatically is still a great challenge. Using three independent lexical databases we have shown possibilities for automatically obtaining common sense knowledge in the form of typed relations between lexical elements. Nevertheless, the data need to be validated and corrected by interacting with people. Active dialogs introduced here allow for acquisition of common sense knowledge and verification of this knowledge in action. This is frequently omitted in the construction of large lexical databases, such as WordNet, which has been built manually with great effort. Without systematic feedback from active use of its resources, the process of completing missing knowledge and proper stratification of WordNet synsets in different contexts is very slow.

The semantic search process introduced in this paper may be treated as a general model of decision making based on active queries, to find a particular action (object) appropriate in specific conditions (feature values). Consider for example the process of medical diagnosis, where a disease is identified using a series of observations and tests; a decision support system should ask a number of questions to identify the most distinctive symptoms. The algorithm has already been tested in the medical domain using data from the “Diagnostic and Statistical Manual of Mental Disorders” (DSM IV) (DSM, 1994). Queries generated by semantic search led to correct diagnoses in fewer steps than the original decision tree recommended by DSM IV. Other applications include WWW information retrieval (Duch & Szymanski, 2008). Web search engines return a large set of pages as a result of a keyword-based query, and the subset of most relevant pages is subsequently identified using the semantic search algorithm. However, such an approach requires features relevant for concepts contained in all possible knowledge domains indexed by the search engine. This implies building a very large scale semantic network, which is still a great challenge. The vwORF representation of knowledge has also been successfully applied to the



improvement of natural language text processing (Majewski & Szymanski, 2008).

The tests performed here in a limited domain should be treated as a proof of concept. The project will be scaled up and used to improve information retrieval from Wikipedia. The interaction of many volunteer contributors would be needed to create knowledge for a large scale semantic network, verified in action during actual searches. A good strategy is to start from a limited domain, such as animals or plants, trying to cover the whole domain, not just a small subset as has been done here. Identifying an arbitrary plant or animal shown in a photograph using a variant of the 20-question game is a challenging task. Going beyond simple nouns and trying to understand actions is still farther ahead. In all these tasks neurocognitive inspirations should be our guide.

Acknowledgment

This work has been supported by the Polish Committee for Scientific Research Grant N516 035 31/3499.

References

Ansorge, U., Kiefer, M., Khalid, S., Grassl, S., & König, P. (2010). Testing the theory of embodied cognition with subliminal words. Cognition, 116, 303–320.

Buckland, M., & Gey, F. (1994). The relationship between recall and precision. Journal of the American Society for Information Science, 45(1), 12–19.

Burke, D., MacKay, D., Worthley, J., & Wade, E. (1991). On the tip of the tongue: What causes word finding failures in young and older adults? Journal of Memory and Language, 30(5), 542–579.

Collins, A., & Loftus, E. (1975). A spreading-activation theory of semantic processing. Psychological Review, 82(6), 407–428.

Collins, A., & Quillian, M. (1969). Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behaviour, 8, 240–247.

Cullen, J., & Bryman, A. (1988). The knowledge acquisition bottleneck: Time for reassessment? Expert Systems, 5(3), 216–225.

Davis, R., Shrobe, H., & Szolovits, P. (1993). What is a knowledge representation? AI Magazine, 14(1), 17–33.

DSM (1994). Diagnostic and statistical manual of mental disorders. American Psychiatric Association.

Duch, W. (2007). What is computational intelligence and where is it going? In W. Duch & J. Mandziuk (Eds.), Challenges for computational intelligence (Vol. 63, pp. 1–13). Springer.

Duch, W., Matykiewicz, P., & Pestian, J. (2008). Neurolinguistic approach to natural language processing with applications to medical text analysis. Neural Networks, 21(10), 1500–1510.

Duch, W., & Szymanski, J. (2008). Semantic web: Asking the right questions. In Proceedings of the 7th international conference on information and management sciences (pp. 1–8). California Polytechnic State University.

Feldman, J. A. (2006). From molecule to metaphor: A neural theory of language. MIT Press.

Forster, K. (2004). Category size effects revisited: Frequency and masked priming effects in semantic categorization. Brain and Language, 90(1–3), 276–286.

Gleason, J. B., & Ratner, N. B. (1997). Psycholinguistics (2nd ed.). Wadsworth Publishing.

Goldberg, D. (1989). Genetic algorithms in search, optimization and machine learning. Boston, MA: Addison-Wesley Longman Publishing Co., Inc.

Guarino, N., & Poli, R. (1995). Formal ontology, conceptual analysis and knowledge representation. International Journal of Human Computer Studies, 43(5), 625–640.

Hunston, S. (2001). Word frequencies in written and spoken English: Based on the British National Corpus. Language Awareness, 11(2), 152–157.

Lamb, S. M. (1999). Pathways of the brain: The neurocognitive basis of language (Vol. 170). John Benjamins Publishing Company.

Liu, H., & Singh, P. (2004). ConceptNet: A practical commonsense reasoning tool-kit. BT Technology Journal, 22(4), 211–226.

Majewski, P., & Szymanski, J. (2008). Text categorisation with semantic common sense knowledge: First results. In Proceedings of the 14th international conference on neural information processing (ICONIP07), Lecture Notes in Computer Science (Vol. 4985, pp. 285–294). Springer.

Matykiewicz, P., Pestian, J., Duch, W., & Johnson, N. (2006). Unambiguous concept mapping in radiology reports: Graphs of consistent concepts. AMIA Annual Symposium Proceedings, 2006, 1024–1031.

McClelland, J., & Rogers, T. (2003). The parallel distributed processing approach to semantic cognition. Nature Reviews Neuroscience, 4(4), 310–322.

McNamara, T. P. (2005). Semantic priming: Perspectives from memory and word recognition. UK: Psychology Press.

Miikkulainen, R. (1993). Subsymbolic natural language processing: An integrated model of scripts, lexicon, and memory. Cambridge, MA: MIT Press.

Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. (1993). Introduction to WordNet: An on-line lexical database. Princeton University Press.

Mitchell, T., Shinkareva, S., Carlson, A., Chang, K., Malave, V., Mason, R., et al. (2008). Predicting human brain activity associated with the meanings of nouns. Science, 320(5880), 1191–1195.

Ogden, C., Richards, I., Malinowski, B., & Crookshank, F. (1949). The meaning of meaning. Routledge & Kegan Paul.

Posner, M., & Keele, S. (1968). On the genesis of abstract ideas. Journal of Experimental Psychology, 77(3), 353–363.

Pulvermüller, F. (2003). The neuroscience of language: On brain circuits of words and serial order. Cambridge University Press.

Qian, G., Sural, S., Gu, Y., & Pramanik, S. (2004). Similarity between Euclidean and cosine angle distance for nearest neighbor queries. In Proceedings of the 2004 ACM symposium on applied computing (pp. 1232–1237).

Quinlan, J. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.

Rosch, E. (1973). Natural categories. Cognitive Psychology, 4(3), 328–350.

Rumelhart, D. E., & McClelland, J. L. (Eds.). (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Smith, E., Shoben, E., & Rips, L. (1974). Structure and process in semantic memory: A featural model for semantic decisions. Psychological Review, 81(3), 214–241.

Sowa, J. (1991). Principles of semantic networks: Explorations in the representation of knowledge. Representation and reasoning. San Mateo, CA: Morgan Kaufmann.

Speer, N. K., Reynolds, J. R., Swallow, K. M., & Zacks, J. M. (2009). Reading stories activates neural representations of perceptual and motor experiences. Psychological Science, 20, 989–999.

Staab, S., & Studer, R. (2004). Handbook on ontologies. Springer Verlag.

Szymanski, J., Dusza, K., & Byczkowski, L. (2007). Cooperative editing approach for building WordNet database. In Proceedings of the XVI international conference on system science (pp. 448–457).

Szymanski, J., Sarnatowicz, T., & Duch, W. (2007). Towards avatars with artificial minds: Role of semantic memory. Journal of Ubiquitous Computing and Intelligence.

Tulving, E., Bower, G., & Donaldson, W. (1972). Organization of memory. New York: Academic Press.

Vanderwende, L., Kacmarcik, G., Suzuki, H., & Menezes, A. (2005). MindNet: An automatically-created lexical resource. Proceedings of HLT/EMNLP on Interactive Demonstrations, 8–19.

Voss, P. (2005). Essentials of general intelligence: The direct path to artificial general intelligence. In Artificial general intelligence (pp. 131–157). Springer.

Warren, R. (1977). Time and the spread of activation in memory. Journal of Experimental Psychology: Human Learning and Memory, 3(4), 458–466.

Weizenbaum, J. (1966). ELIZA – A computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1), 36–45.

Zadeh, L. (1996). Fuzzy logic = computing with words. IEEE Transactions on Fuzzy Systems, 4(2), 103–111.