Dual Role of Analogy in the Design of a Cognitive Computer


Abstract

This paper is about the computer analogy of the brain and how it can both help and hinder our understanding of the human mind. It is based on the assumptions that the mind can be understood in terms of the working of the brain, and that the brain’s function is to process information: that it is some kind of a computer, as contrasted for example with the heart, which is a pump. It is a computer whose design we do not understand but try to, by analogy; that is, by making a model—a “cognitive computer”—based on our understanding of computers, brains, and the working of the mind.

Human intelligence and language are fundamentally analogical and figurative, whereas lower forms of intelligence and conventional computers treat meaning literally. Therefore, the challenge in designing a cognitive computer is to find the kinds of information representation and operations that make figurative meaning come out naturally. The paper discusses holistic representation, which is unconventional and looks promising and worthy of investigation—it easily encodes recursive (list) structure, for example—and points out a danger in taking too literally cognitive models that have been developed on conventional computers, such as the following of rules.

Introduction

The human mind is unlike any computer or program we know. It is not literal, and when meaning is taken literally, the result can be funny or total nonsense. That’s the humor of puns. This must mean that the human mind, although capable of being literal, is fundamentally figurative or symbolical or analogical.

1. Real World Computing Partnership. Support of this research by Japan’s Ministry of International Trade and Industry through the RWCP is gratefully acknowledged.

2. Swedish Institute of Computer Science.

How else could we judge a literal interpretation as being at once both accurate and wrong?

The growth of the human mind—our grasp of things—is largely due to analogical perceiving and thinking. Some things are meaningful to us at birth or without learning; they are mostly things necessary for survival. The rest we learn through experience. Some learning is associative, as when we learn cause and effect. This kind of learning is basic to all animals.

To follow an example, or imitate, is a more advanced form of learning and is common at least in mammals and birds. It involves a basic form of analogy. The learner identifies with a role model—perceives one as the other, makes an analogical connection or mapping between oneself and the other.

Full-fledged analogy is central to human intelligence. We relate the unfamiliar to the familiar, and we see the new in terms of the old. This is most evident in language, which is thoroughly metaphorical. New and unfamiliar things are expressed and explained in familiar terms that are understood not literally but figuratively. It is possible that full-fledged analogy and human language need each other and that our faculties for them have coevolved.

Analogy is such an integral part of us that we hardly notice it or pay it its proper dues. That is, until we try to program a computer to act like a human. AI has puzzled over the programming of humanlike behavior for three decades. At first it was thought that programming computers to understand language, to translate, and the like, was just around the corner, waiting only for computers to get large and fast enough. Now they are large and fast,

Dual Role of Analogy in the Design of a Cognitive Computer

Pentti Kanerva

RWCP1 Theoretical Foundation SICS2 Laboratory

SICS, Box 1263, SE-164 29 Kista, Sweden

 E-mail: [email protected]


many things have been tried and much has been learned, but the puzzle remains and we have no clear idea of how to solve it.

This paper is a personal view of the lessons this holds for us. The theme is that we must rethink computing, put figurative meaning and analogy at its center, and find computing mechanisms that make it come out naturally. This can be construed as designing a new kind of computer, a “cognitive computer,” that is a better model of the brain than present-day computers are. I will also try to verbalize things that students of connectionist architectures take for granted but that might puzzle others, the main idea being that implementation matters when we try to understand how the mind works.

The Computer as a Brain and the Brain as a Computer

Equating computers with brains is an example of analogical thinking. Early computers were dubbed electronic brains, computers have memory, and we even say that a program knows, wants, or believes so and so. Such anthropomorphizing seems natural to us and it serves a purpose. It brings a technological mystery within the realm of the familiar, since we already have an idea of what the brain does even if we don’t know just how it does it.

We also talk of the brain as a computer. The appeal is that whereas the mechanisms of the brain are hidden, those of the computer are available to us, and through them we could possibly understand the brain’s mechanisms. The principle is sound and is the thesis behind Turing’s imitation game: if we can build a machine that behaves in the same way as a natural system does, we have understood the natural system.

Analogies not only help our thinking but they also channel and limit it. The computer analogy of the brain or of the mind has certainly done so, as modeling in cognitive science and AI has been dominated by programs written for the computer, while philosophical and qualitative treatment of issues is looked upon with suspicion.

Many things are modeled successfully on computers, such as weather, traffic flow, strength of materials and structures, industrial processes, and so forth. However, there are special pitfalls when the thing being modeled—the brain—is itself some kind of a computer: the danger is that our models begin to look like the computers they run on or the programming languages they are written in. For example, we talk of human short-term or working memory and think of the computer’s active registers, or we talk of human long-term memory and think of the computer’s permanent storage (RAM or disk), or we talk of the grammar of a language and think of a tree structure or a set of rewriting rules programmed in Lisp.

Of course these are analogical counterparts, but there is a danger of taking them too literally. Human memory works very differently from computer memory, and the brain is not a Lisp machine nor the mind a logic program. Some analogical comparisons have not been at all useful in understanding the working of the mind; for example, equating the brain with the computer’s hardware and the mind with its software. Finally, there is a worse danger of failing to notice what is missing in our models of the mind because it is missing or invisible in computers. To safeguard against it, we must treat the subject qualitatively: our models may behave as advertised, but is that how people behave; for example, how they use language?

Artificial Neural Nets as Biologically Motivated Models of Computing

The computer’s and the brain’s architectures are very different. Perhaps the differences account for the difficulty in programming computers to be more lifelike and less literal-minded. This has motivated the study of alternative computing architectures called (artificial) neural nets (NN), or parallel distributed processing (PDP), or connectionist architectures. The hope is that an architecture more similar to the brain’s should produce behavior more similar to the brain’s, which is a valid analogical argument. Unfortunately it does not tell us what in the architecture matters and what is incidental, and unfortunately our neural nets are not significantly more figurative than traditional computers.


Neural-net research has made a valuable contribution by focusing our attention on representation. Computer theoreticians and engineers know, for example, that the representation of numbers has a major effect on circuit design. A representation that works well for addition works reasonably well also for multiplication, whereas a representation that allows very fast multiplication is useless for addition. Thus a representation is a compromise that favors some operations and hinders others.

Information in computers is stored locally, that is, in records with fields. Local representation—one unit per concept—is common also in neural nets. The alternative is to distribute information from many sources over shared units. It is more brainlike, at least superficially, and it has been studied and used with neural nets for a long time. I take distributed representation to be fundamental to the brain’s operation and believe that a cognitive computer should be based on it, and that therefore we should find out all we can about the encoding of information into, and operating with, distributed representations.

Neural-net research has shown that these representations are robust and support some forms of generalization: representations (patterns) that are similar on the surface—close according to some metric—are treated similarly, for example as belonging in the same or similar classes. The representations are also suitable for learning from examples. The learning takes place by statistical averaging or clustering of representations (self-organizing). It is not very creative but it can be subtle and lifelike, which makes it cognitively interesting. It can produce behavior that looks like rule-following although the system has no explicit rules, as was demonstrated with the learning of the past tense of English verbs by Rumelhart and McClelland (1986). This is a significant discovery, in that it demonstrates a principle that probably governs the working of the brain in general and should govern the working of a cognitive computer. What we see and describe as rule-following is an emergent phenomenon that reflects an underlying mechanism. However, the rules do not produce the behavior even if they may accurately describe it.

Description vs. Explanation

The distinction between description and explanation of behavior is so central that I will highlight it with an example. Consider heredity. Long before the genetic bases of heredity were known, people knew about dominant and recessive traits and had figured out the basic laws of inheritance. For example, a plant species may come in three varieties, with white, pink, or red flowers, and cross-pollinating the white with the red always produces plants with pink flowers. The specific rule is that all of the first generation is pink, and when pink-flowered plants are crossed with each other, one-fourth of the offspring is white, one-fourth red, and half pink. So we can say that the inheritance mechanism works by this rule. However, no mechanism in the reproductive system keeps counting the numbers of offspring to make sure that the proportions come out right: “I have made so and so many white flowers, it’s time to make the same number of red flowers.” It is not the rule that makes the proportions come out in a certain way. The proportions are an outward reflection of the mechanism that passes traits from one generation to the next. It is significant, however, that long before chromosomes or genes, or RNA and DNA, were discovered, people speculated correctly about a hereditary mechanism that would produce offspring in those proportions. Clearly, the laws provided a useful description of the behavior, and accurate description often leads to discovery and explanation.

The situation is similar with regard to language and to mental functions at large. For example, we attribute the patterns of a language to its grammar and we devise sets of rules by which the grammar works. However, it is not the grammar that generates sentences in us when we speak or write. The regularities captured in the grammar are an outward expression of our underlying mechanisms for language—the grammar is an emergent phenomenon. This distinction is easily lost when we produce language output with computers, for there we actually use the grammar to generate sentences, and we work hard to develop a comprehensive grammar for a language. And when we think of the computer as a model of the brain and use computers to model mental functions, we tacitly assume that the brain uses grammar rules to generate language. Formal logic as a model of thinking can be criticized on similar grounds. It may describe rational thought but it does not explain thinking. Our understanding of the mechanisms of mind is not yet sufficient to allow us to explain thinking and language. The best we can do is to describe them, but as our descriptions improve, our chances for discovering the mechanisms improve.

The Brain as a Computer for Modeling the World, and Our Model of the Brain’s Computing

It is useful to think of the brain as a computer if we make the analogy between the two sufficiently abstract. So what in computers should we look at? The organization of computation as a sequence of programmed instructions for manipulating pieces of data stored in memory seems like an overly restricted model of how the brain or the mind works. A more useful analogy is made at the level of computers as state machines, the states being realized as configurations of matter, or patterns in some physical medium. Mental states and subjective experience then correspond to—or are caused by—physical states, so that when a physical state repeats, the corresponding subjective experience repeats. Thus the patterns that define the states are the objective counterpart of the subjective experience. Our senses are the primary source of the patterns, and our built-in faculties for pleasure and pain give primary meaning to some of the patterns. Brains are wired for rich feedback, and when the feedback works in such a way that an experience created by the senses—i.e., a succession of states—can later be created internally, we have the basis for learning. With learning, rich networks of meaningful states can be built.

The evolutionary function of this computer is to make the world predictable: the brain models the world as the world is presented to us by our senses. It appears to compute with patterns of activity over large sets of neurons. To study such computing mathematically, we can model the patterns by large patterns of bits, emphasizing the large size of the patterns, as that gives the models their power. The key question is, how do patterns that have already been established and have become meaningful give rise to new patterns; how do existing concepts give rise to new concepts?

I have used the binary Spatter Code (Kanerva, 1996) to model computing with large patterns. The code is related to Plate’s Holographic Reduced Representation (HRR; Plate, 1994) and allows simple demonstrations of it. The representation is distributed so that every item of information that is included in a composed pattern—every constituent pattern—contributes to every bit of the composed pattern: the patterns are holographic or holistic.

Computing with Large Patterns

The following description is in traditional symbolic terms and uses a two-place relation r(x, y) and a triplet t = (x, y, z) as examples.

Space of Representations

All HRRs, including the Spatter Code, work with large random patterns, or high-dimensional random vectors. All things—variables, values, composed structures, mappings between structures—are elements of a common space: they are very-high-dimensional random vectors with independent, identically distributed components. The dimensionality of the space, denoted by N, is usually between 1,000 and 10,000. The Spatter Code uses dense binary vectors (i.e., 0s and 1s are equally probable). The vectors are written in boldface, so that x stands for an N-vector representing the variable or role x, and a stands for an N-vector representing the value or filler a, for example.
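This space of representations is easy to sketch in code. The sketch below (plain Python; the paper gives no code, and the helper names `rand_vec` and `distance` are mine) draws dense random binary N-vectors and measures similarity by normalized Hamming distance; two independently drawn vectors land near distance 0.5, i.e., they are unrelated.

```python
import random

N = 10_000  # dimensionality; the paper puts N between 1,000 and 10,000

def rand_vec():
    """A dense random binary N-vector: 0s and 1s equally probable."""
    return [random.getrandbits(1) for _ in range(N)]

def distance(u, v):
    """Normalized Hamming distance: 0.0 = identical, about 0.5 = unrelated."""
    return sum(p != q for p, q in zip(u, v)) / N

a, b = rand_vec(), rand_vec()
print(distance(a, a))  # 0.0
print(distance(a, b))  # close to 0.5: independent random vectors are nearly orthogonal
```

With N this large, the distance between unrelated vectors concentrates tightly around 0.5, which is what makes a "no better than chance" criterion meaningful below.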

 Item Memory or Clean-up Memory

Some operations produce approximate vectors that need to be cleaned up (i.e., identified with their exact counterparts). That is done with an item memory that stores all valid vectors known to the system, and retrieves the best-matching vector when cued with a noisy vector, or retrieves nothing if the best match is no better than what results from random chance. The item memory performs a function that, at least in principle, is performed by an autoassociative neural memory.
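Such an item memory can be sketched as a brute-force nearest-neighbor store. The class below is only an illustration under my own naming, not the autoassociative neural memory the paper alludes to: a cue is accepted only if it matches some stored vector clearly better than chance (distance well below 0.5).

```python
import random

N = 10_000

def rand_vec():
    return [random.getrandbits(1) for _ in range(N)]

def distance(u, v):
    return sum(p != q for p, q in zip(u, v)) / N

class ItemMemory:
    """Stores all valid vectors known to the system; cued with a noisy
    vector, returns the best match, or None if the best match is no
    better than random chance (distance near 0.5)."""
    def __init__(self, threshold=0.45):
        self.items = {}             # name -> stored vector
        self.threshold = threshold
    def store(self, name, vec):
        self.items[name] = vec
    def cleanup(self, cue):
        name = min(self.items, key=lambda n: distance(self.items[n], cue))
        return name if distance(self.items[name], cue) < self.threshold else None

mem = ItemMemory()
x, y, z = rand_vec(), rand_vec(), rand_vec()
for name, vec in [('x', x), ('y', y), ('z', z)]:
    mem.store(name, vec)

noisy_x = [bit ^ (random.random() < 0.2) for bit in x]  # flip about 20% of x's bits
print(mem.cleanup(noisy_x))    # 'x': well within the noise tolerance
print(mem.cleanup(rand_vec())) # None: no stored vector beats chance
```

The threshold of 0.45 is an assumption of mine; any cutoff comfortably below 0.5 works at this dimensionality, because chance-level distances concentrate around 0.5.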

Page 5: Dual Role of Analogy in the Design of a Cognitive Computer

8/14/2019 Dual Role of Analogy in the Design of a Cognitive Computer

http://slidepdf.com/reader/full/dual-role-of-analogy-in-the-design-of-a-cognitive-computer 5/7

5

 Binding

Binding is the first level of composition, in which things that are very closely associated with each other are brought together. A variable is bound to a value with a binding operator that combines the N-vectors for the variable and the value into a single N-vector for the bound pair. The Spatter Code binds with coordinatewise (bitwise) Boolean Exclusive-OR (XOR, ⊗), so that the variable x having the value a (i.e., x = a) is encoded by the N-vector x⊗a whose nth bit is the bitwise XOR xn⊗an (xn and an are the nth bits of x and a, respectively). An important property of all HRRs is that binding of two random vectors produces a random vector that resembles neither of the two.

Unbinding

The inverse of the binding operator breaks a bound pair into its constituents: it finds the filler if the role is given, or the role if the filler is given. The XOR is its own inverse function, so that, for example, (x⊗a)⊗a = x finds the vector to which a is bound in x⊗a.
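These two properties — the bound pair resembles neither constituent, and XOR undoes itself exactly — can be checked with a small sketch (the helper names are mine; `bind` is bitwise XOR as the paper specifies):

```python
import random

N = 10_000

def rand_vec():
    return [random.getrandbits(1) for _ in range(N)]

def distance(u, v):
    return sum(p != q for p, q in zip(u, v)) / N

def bind(u, v):
    """The Spatter Code's binding operator: coordinatewise XOR."""
    return [p ^ q for p, q in zip(u, v)]

x, a = rand_vec(), rand_vec()  # a role x and a filler a
xa = bind(x, a)                # encodes x = a

# The bound pair resembles neither constituent...
print(round(distance(xa, x), 2), round(distance(xa, a), 2))  # both near 0.5

# ...and XOR is its own inverse: unbinding with a recovers x exactly.
print(bind(xa, a) == x)  # True
```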

 Merging

Merging is the second level of composition, in which identifiers and bound pairs are combined into a single item. It has also been called ‘superimposing’ (superposition), ‘bundling’, and ‘chunking’. It is done by a (normalized) mean vector, and the merging of G and H is written as [G + H], where […] stands for normalization. The relation r(a, b) can be represented by merging the representations for r, ‘x = a’, and ‘y = b’. It is encoded by

R = [r + x⊗a + y⊗b]

The normalized mean of binary vectors is given by the bitwise majority rule, with ties broken at random. An important property of all HRRs is that merging of two or more random vectors produces a random vector that resembles each of the merged vectors.

 Distributivity

In all HRRs, the binding and unbinding operators distribute over the merging operator, so that, for example,

[G + H + I]⊗a = [G⊗a + H⊗a + I⊗a]

Distributivity is a key to analyzing HRRs.

Probing

To find out whether the vector a appears bound in another vector R, we probe R with a using the unbinding operator. For example, if R represents the above relation, probing it with a yields a vector X that is recognizable as x (X will retrieve x from the item memory). The analysis is as follows:

X = R⊗a = [r + x⊗a + y⊗b]⊗a

which becomes

X = [r⊗a + (x⊗a)⊗a + (y⊗b)⊗a]

by distributivity and simplifies to

X = [r⊗a + x + y⊗b⊗a]

Thus X is similar to x; it is also similar to r⊗a and y⊗b⊗a, but those are not stored in the item memory and thus act as random noise.
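The probing analysis can be replayed numerically. In the sketch below (helper names are mine), R encodes the relation as above; probing with a yields a vector that is roughly a quarter of the bits away from x but at chance distance from everything else, so a clean-up over the stored items recovers x.

```python
import random

N = 10_001

def rand_vec():
    return [random.getrandbits(1) for _ in range(N)]

def bind(u, v):
    return [p ^ q for p, q in zip(u, v)]

def distance(u, v):
    return sum(p != q for p, q in zip(u, v)) / N

def merge(*vecs):
    k = len(vecs)
    return [random.getrandbits(1) if 2 * sum(bits) == k else int(2 * sum(bits) > k)
            for bits in zip(*vecs)]

r, x, y, a, b = (rand_vec() for _ in range(5))
R = merge(r, bind(x, a), bind(y, b))  # R encodes the relation r(a, b)

X = bind(R, a)  # probe R with a

# X is recognizably x: much closer to x than to any other stored item,
# so a clean-up over the item memory retrieves x.
items = {'r': r, 'x': x, 'y': y, 'a': a, 'b': b}
best = min(items, key=lambda name: distance(items[name], X))
print(best)                      # 'x'
print(round(distance(X, x), 2))  # about 0.25; everything else is near 0.5
```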

The functions described so far are sufficient for traditional symbol processing, for example, for realizing a Lisp-like list-processing system. Holistic mapping, which is discussed next, is a parallel alternative to what is traditionally accomplished with sequential search and substitution.

 Holistic Mapping and Simple Analogical Retrieval

Probing is the simplest form of holistic mapping. It approximately maps a composed pattern into one of its bound constituents, as discussed above and seen in the following example. Let F be a holistic pattern representing France: that its capital is Paris, geographic location is Western Europe, and monetary unit is franc. Denote the patterns for capital, Paris, geographic location, Western Europe, money, and franc by ca, Pa, ge, WE, mo, and fr. France is then represented by the pattern

F = [ca⊗Pa + ge⊗WE + mo⊗fr]

Probing F for “the Paris of France” is done bymapping (XORing) it with Pa and it yields

F⊗Pa = [ca + ge⊗WE⊗Pa + mo⊗fr⊗Pa]

(see ‘Probing’ above) and is approximatelyequal to ca:

F⊗Pa ≈ ca

XORing with Pa has mapped F approximatelyinto ca, meaning that Paris is France’s capital.

Much more than that can be done in a single mapping operation, as shown in the following two examples. Let S be a holistic pattern for Sweden, with capital Stockholm (St), located in Scandinavia (Sc), and with monetary unit krona (kr). This information about Sweden is then represented by the pattern

S = [ca⊗St + ge⊗Sc + mo⊗kr]

We can now ask ‘What is the Paris of Sweden?’ If we take the question literally and do the mapping S⊗Pa, as above, we get nothing recognizable, so we must take Paris in a more general sense. ‘Paris of France’ gave us a recognizable result above (i.e., approximately ca), so we can use it: we can map S (XOR it) with F⊗Pa and we get

S⊗F⊗Pa ≈ St

which is recognizable as the pattern for Stockholm. The derivation is based on distributivity and is similar to the one given under ‘Probing’. The significant thing in S⊗F⊗Pa is that S⊗F can be thought of as a binding of two composed patterns of equal status, rather than a binding of a variable to a value, and also as a holistic mapping between France and Sweden, capable of answering analogy questions of the kind ‘What is the Paris of Sweden?’ and ‘What is the krona of France?’

Holistic mapping allows multiple substitutions at once. What will happen to the pattern for France if we substitute Stockholm for Paris, Scandinavia for Western Europe, and krona for franc, all at once, and how is the substitution done? We create a mapping pattern as above, by binding the corresponding items to each other with XOR and by merging the results:

M = [Pa⊗St + WE⊗Sc + fr⊗kr]

Mapping the pattern for France with M thengives

F⊗M = [ca⊗Pa + ge⊗WE + mo⊗fr] ⊗ [Pa⊗St + WE⊗Sc + fr⊗kr]

= [ ca⊗Pa⊗ [Pa⊗St + WE⊗Sc + fr⊗kr]

+ ge⊗WE⊗ [Pa⊗St + WE⊗Sc + fr⊗kr]

+ mo⊗fr⊗ [Pa⊗St + WE⊗Sc + fr⊗kr] ]

by distributivity, which becomes

[ [ca⊗Pa⊗Pa⊗St + ca⊗Pa⊗WE⊗Sc+ ca⊗Pa⊗fr⊗kr]

+ [ge⊗WE⊗Pa⊗St + ge⊗WE⊗WE⊗Sc+ ge⊗WE⊗fr⊗kr]

+ [mo⊗fr⊗Pa⊗St + mo⊗fr⊗WE⊗Sc+ mo⊗fr⊗fr⊗kr] ]

again by distributivity. That simplifies to

[ [ca⊗St + ca⊗Pa⊗WE⊗Sc+ ca⊗Pa⊗fr⊗kr]

+ [ge⊗WE⊗Pa⊗St + ge⊗Sc+ ge⊗WE⊗fr⊗kr]

+ [mo⊗fr⊗Pa⊗St + mo⊗fr⊗WE⊗Sc+ mo⊗kr] ]

and is recognizable as

[ca⊗St + ge⊗Sc + mo⊗kr]

In other words,

F⊗M ≈ S

so that a single mapping operation composed of multiple substitutions changes the pattern for France to an approximate pattern for Sweden, recognizable by the clean-up memory.
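Both mappings can be checked numerically. The sketch below (vector names follow the paper; the helper functions are mine) builds F and S as above, answers ‘What is the Paris of Sweden?’ by mapping S with F⊗Pa, and then verifies that the substitution pattern M maps F to an approximate S.

```python
import random

N = 10_001

def rand_vec():
    return [random.getrandbits(1) for _ in range(N)]

def bind(u, v):
    return [p ^ q for p, q in zip(u, v)]

def distance(u, v):
    return sum(p != q for p, q in zip(u, v)) / N

def merge(*vecs):
    k = len(vecs)
    return [random.getrandbits(1) if 2 * sum(bits) == k else int(2 * sum(bits) > k)
            for bits in zip(*vecs)]

ca, Pa, ge, WE, mo, fr = (rand_vec() for _ in range(6))  # roles, and fillers for France
St, Sc, kr = (rand_vec() for _ in range(3))              # fillers for Sweden

F = merge(bind(ca, Pa), bind(ge, WE), bind(mo, fr))  # France
S = merge(bind(ca, St), bind(ge, Sc), bind(mo, kr))  # Sweden

# 'What is the Paris of Sweden?': map S with F(x)Pa and clean up.
answer = bind(S, bind(F, Pa))
items = {'St': St, 'Sc': Sc, 'kr': kr, 'ca': ca, 'ge': ge, 'mo': mo}
print(min(items, key=lambda name: distance(items[name], answer)))  # 'St'

# Multiple substitutions at once: the mapping M turns F into an approximate S.
M = merge(bind(Pa, St), bind(WE, Sc), bind(fr, kr))
print(distance(bind(F, M), S) < 0.45)  # True: F(x)M is recognizably S
```

The answer vector is noisier than in simple probing (the derivation above shows extra noise terms), but at this dimensionality it still stands out clearly against the chance level of 0.5.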

Toward a New Model of Computing

Holistic representation and holistic mapping hint at the possibility of organizing computing around analogy. However, the examples that I have shown are not very strong. This could mean that large random patterns and the suggested operations on them are not a good way to compute, but it is also possible that they are, and that we are not using them correctly. What stands out about the examples is that they are built around established notions of variable, value, property, relation, and the like. These are high-level abstractions that help us describe abstract things to each other, but they may be poor indicators of what goes on in the brain. For example, should a pattern for a variable, such as capital city in the above examples, be related to patterns that stand for individual cities, and how should those be related to the patterns for the countries they are capitals of? There are many questions to answer before we can decide about the utility or futility of computing with large patterns.

What is appealing about large random patterns is that they have rich and subtle mathematical properties, and they lend themselves to parallel computing. Furthermore, the brain’s connections and patterns of activity suggest that kind of computing.

For a computer to work like the human mind, it must be extremely flexible in its use of symbols. It cannot stumble on the multiplicity of meanings that a word can have but rather must be able to benefit from the multiplicity. The human mind conquers the unknown by making analogies to that which is known; it understands the new in terms of the old. In so doing it creates ambiguity or, rather, it creates rich networks of mental connections and becomes robust.

My hunch is that after we understand how the brain handles analogy—how it treats one thing as another—and have programmed it into computers, programming computers to handle language will be an easy task, but it will not be easy before.

References

Kanerva, P. (1996) Binary spatter-coding of ordered K-tuples. In C. von der Malsburg, W. von Seelen, J.C. Vorbrüggen, and B. Sendhoff (eds.), Artificial Neural Networks (Proc. ICANN ’96, Bochum, Germany), pp. 869–873. Berlin: Springer.

Plate, T.A. (1994) Distributed Representation and Nested Compositional Structure. Ph.D. thesis, Graduate Department of Computer Science, University of Toronto.

Rumelhart, D.E., and McClelland, J.L. (1986) On learning the past tenses of English verbs. In J.L. McClelland and D.E. Rumelhart (eds.), Parallel Distributed Processing 2: Applications, pp. 216–271. Cambridge, Mass.: MIT Press.

Advances in Analogy Research: Integration of Theory and Data from the Cognitive, Computational, and Neural Sciences. Workshop, Sofia, Bulgaria, July 17–20, 1998.