

To cite this article: Geunbae Lee, Margot Flowers & Michael G. Dyer (1990) Learning Distributed Representations of Conceptual Knowledge and their Application to Script-based Story Processing, Connection Science, 2:4, 313-345, DOI: 10.1080/09540099008915676.


Connection Science, Vol. 2, No. 4, 1990

Learning Distributed Representations of Conceptual Knowledge and their Application to Script-based Story Processing

GEUNBAE LEE, MARGOT FLOWERS & MICHAEL G. DYER

We propose a new method for developing distributed connectionist representations in order to serve as an adequate foundation for constructing and manipulating conceptual knowledge. In our approach, distributed representations of semantic relations (i.e. propositions) are formed by recirculating the hidden layer in two auto-associative recurrent PDP (parallel distributed processing) networks, and our experiments show that the resulting distributed semantic representations (DSRs) have many desirable properties, such as automaticity, portability, structure-encoding ability and similarity-based distributed representations. We have constructed a symbolic/connectionist hybrid script-based story processing system DYNASTY (DYNAmic STory understanding system) which incorporates DSR learning and 6 script-related processing modules. Each module communicates through a global dictionary, where DSRs are stored. DYNASTY is able to (1) learn similarity-based distributed representations of concepts and events in everyday scriptal experiences, (2) perform script-based causal chain completion inferences according to the acquired sequential knowledge, and (3) perform script role association and retrieval during script application.

1. Introduction

There has been an ongoing debate over whether distributed/holographic or localist/punctate representations should be used to represent high-level knowledge. Feldman (1986) has given arguments against both extreme punctate and holographic representations. PDP researchers, such as Rumelhart et al. (1986b), have listed numerous advantages that distributed representations have over localist representations. At the same time, a number of techniques, e.g. backpropagation (Rumelhart et al., 1986a) and extended backpropagation (Miikkulainen & Dyer, 1988), have been developed for forming distributed representations. Such representations also include conjunctive and coarse codings (Hinton et al., 1986), microfeature-based representations (Waltz & Pollack, 1985; McClelland & Kawamoto, 1986) and tensor product representations (Smolensky, 1987a; Dolan & Smolensky, 1989; Dolan, 1989).

Geunbae Lee, Margot Flowers & Michael G. Dyer, Artificial Intelligence Laboratory, 3532 Boelter Hall, Computer Science Department, University of California, Los Angeles, CA 90024, USA. [email protected], [email protected], [email protected]; 213-825-2303. This research was supported in part by a contract from the JTF program of the DoD, monitored by JPL, and by an ITA Foundation grant. The simulations were carried out on equipment awarded to the UCLA AI Laboratory by Hewlett Packard. Thanks to Risto Miikkulainen for providing a clustering code and to Trent Lange and John Reeves for proofreading a draft of this paper.



Developing distributed semantic representations (DSRs), which are able to support higher-level reasoning and represent conceptual knowledge, is not an easy task. Whereas symbolic representations start with a random bit string like ASCII code and build structural relationships between meaningless symbols to represent conceptual knowledge, distributed (or so-called 'subsymbolic' (Smolensky, 1988)) representations must encode both structures and semantics below the symbolic level, namely, as a pattern in an ensemble of neuron-like elements (i.e. creating a 'connectionist symbol').

We propose a new method for developing distributed connectionist representations in order to serve as an adequate foundation for constructing and manipulating conceptual knowledge.

In our approach, distributed representations of semantic relations (i.e. propositions) are formed by recirculating the hidden layer in two auto-associative recurrent PDP (ARPDP) networks. Our resulting distributed semantic representations (DSRs) are stored in a global dictionary (GD) and are used in DYNASTY (DYNAmic STory understanding system), which processes script-based stories. The DSRs have many desirable properties for distributed connectionist representations such as automaticity, portability, structure-encoding ability and similarity-based distributed representations.

The global dictionary has two entry points, one for the symbolic names and the other for the distributed patterns (DSRs), so that we can retrieve DSR patterns from the symbolic names and vice versa.

The eventual objective of DSR research is to develop distributed knowledge representations that can be utilized in high-level reasoning systems. Just as the von Neumann symbolic representation is utilized as a building block in symbolic AI systems, we want to use DSRs as building blocks in connectionist or connectionist/symbolic hybrid models (Dyer, 1990) which are able to support such tasks as natural language processing.

One example is a connectionist script processing system. A script is a knowledge structure of stereotypic action sequences (Schank & Abelson, 1977; Dyer et al., 1987). According to psychological experiments (Bower et al., 1979), people use scripts to understand and remember narrative texts. But proposed symbolic AI models of script processing (e.g. SAM (Cullingford, 1978)) have many unresolved problems: (1) They are too rigidly defined, so they cannot handle script deviations properly. (2) It is difficult to invoke the right script for the input story fragments using proposed script headers (Schank & Abelson, 1977; Cullingford, 1978). (3) There is no explanation of how the original script might be automatically acquired, so there remain difficult knowledge engineering problems in script formation; e.g. how many tracks (e.g. fast food versus drive-in restaurants) should there be?

A number of neurally inspired connectionist script-processing models have been proposed to overcome weaknesses in the symbolic models (Golden, 1986; Chun & Mimo, 1987; Rumelhart et al., 1986c), but while they have nodes for their objects and events, none of them has the semantics needed for representing the constituency of concepts and events in their node representations. Dolan & Dyer (1987) were the first to consider micro-feature based representations in connectionist script processing to make their representations have similarity properties: similar concepts have similar representations. But as noted in Dyer et al. (1988) and Miikkulainen & Dyer (1988), micro-features are arbitrary, lack recursive/hierarchical structure and create a knowledge engineering bottleneck.

In this paper we show how DSRs can be used as a basic concept representation scheme which can be integrated into event representations for script processing. The use of DSRs for object/event representations automatically creates similarity-based representations and therefore exhibits more generalization and fault-tolerance properties.



We have developed a modular distributed connectionist architecture called DYNASTY (DYNAmic STory understanding system) based on automatically learned distributed semantic representations (DSRs). DYNASTY takes simple script-based stories as input, e.g.:

John entered the Chart-House. John ate the steak. John left a tip.

and produces a completed paraphrase, with all script-based events fully expanded.

John entered the Chart-House. The waiter seated John. The waiter brought the menu. John read the menu. John ordered steak. John ate the steak. John paid the bill. John left a tip. John left the Chart-House for home.

In the rest of the paper, we explain how DSRs are formed by using recurrent PDP networks. Then we present a global view of our DYNASTY system, and explain each of the script-based story processing modules in detail.

2. Forming Distributed Semantic Representations

In this section we show how DSRs may be formed from the input propositions and demonstrate their validity for the task of encoding word meanings.

There are two alternate views on the semantic content of words: (1) The structural view defines a word meaning only in terms of its relationships to other meanings. (2) The componential view defines meaning as a vector of properties (e.g. microfeatures). We take an interim view: that word meaning can be defined in terms of a distributed representation of structural relationships, where each relationship is encoded as a proposition. Examples of propositions are verbal descriptions of action-oriented events in everyday experiences (e.g. John entered the Chart-House).

Below are the notational conventions which will be used throughout the paper. All propositions (or events) will be labelled p1, p2 (or ev1, ev2, ...), etc. All case-role names will be in small capital letters (e.g. AGENT, ACT, OBJECT). All word concepts (abbreviated as w-concepts), for which there is a DSR, will be in bold (e.g. milk, straw). All triples will be in square brackets, e.g. [milk, OBJECT, p1]. The semantic content of a sentence, for example, 'The man drinks milk with a straw', can be represented as a proposition, which is a set of [proposition-label, case-role, w-concept] triples such as:

[p1, ACT, drink] [p1, AGENT, man] [p1, OBJECT, milk] [p1, INSTRU, straw]

where each w-concept (e.g. drink) has its own representation.
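As a concrete, purely illustrative rendering of this triple notation (not from the original paper), the proposition above can be written down as plain data structures; the Triple type and variable names below are our own.

```python
from collections import namedtuple

# A triple is [label, case-role, filler]; propositions and w-concepts are both
# written this way, with the ordering reversed between the two views.
Triple = namedtuple("Triple", ["label", "case_role", "filler"])

# 'The man drinks milk with a straw' as proposition p1:
p1 = [
    Triple("p1", "ACT", "drink"),
    Triple("p1", "AGENT", "man"),
    Triple("p1", "OBJECT", "milk"),
    Triple("p1", "INSTRU", "straw"),
]

# The same relationship seen from the word-concept side, [w-concept, case-role, label]:
milk_view = [Triple(t.filler, t.case_role, t.label)
             for t in p1 if t.filler == "milk"]   # -> [Triple('milk', 'OBJECT', 'p1')]
```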

2.1. Representing DSRs

The intuition behind DSRs is based on our observation that people sometimes learn the meanings of words through examples of their relationships to other words. For example, after reading the 4 propositions below, the reader begins to form a hypothesis of what kind of meaning the word foo should have.


ep1: The man drinks foo with a straw.
ep2: The company delivers foo in a carton.
ep3: Humans get foo from cows.
ep4: The man eats bread with foo.

The meaning of foo should be something like that of milk. The interesting fact is that the semantics of foo is not fixed; rather it is gradually refined as one experiences more propositions in varying environments. In other words, the semantics of foo is based on the usage of the word foo. To develop DSRs based on propositions, we have to define the structural relationships between concepts with respect to those propositions. For action-oriented propositions, we use thematic case-roles, originally developed by Fillmore (1968), and extended in several natural language processing systems (Schank, 1973; Schank & Riesbeck, 1981). The case-roles used here are AGENT, OBJECT (PATIENT), CO-OBJECT, INSTRUMENT, FROM (SOURCE), TO (GOAL), LOCATION and TIME. For example, the DSR of milk is now defined as the composition of structural relationships, e.g. with respect to the 4 propositions above. These are then combined as follows:

milk = F(G(OBJECT, p1), G(OBJECT, p2), G(OBJECT, p3), G(CO-OBJECT, p4))

where milk is the meaning representation of milk; F is some integration function over all propositions involving milk and G is some combination function of structural relationships with respect to the corresponding propositions. In the same way, each proposition itself is defined as the composition of its constituent thematic case-role components, which are themselves combinations of structural relationships with their corresponding meaning representations of other words:

p1 = F(G(AGENT, man), G(ACT, drink), G(OBJECT, milk), G(INSTRUMENT, straw))

For the above two formulas, the arguments for the function G are represented as patterns of activation in two banks of a layer of a recurrent PDP network. The function G operates by compressing its arguments into the hidden layer of the PDP network. The function F operates by recycling each compressed pattern (in the hidden layer) back into the input layer. The architecture that implements these operations is described in more detail in the next section.

2.2. Learning DSRs

We use XRAAMs (extended recursive auto-associative memories) for automatically learning DSRs (Lee et al., 1989a). XRAAMs are based on RAAMs, originally developed by Pollack (1988). Pollack showed that RAAMs could be used to encode recursive data structures, such as trees and lists, by feeding the compressed representations in the hidden layer back into the input/output layers. RAAMs, however, lack an external storage for each representation formed. In contrast, XRAAMs make use of a global dictionary (GD) to store and retrieve these compressed representations. The GD is a distributed lexicon network which contains each concept name along with its DSR pattern. The basic idea of XRAAM is to recirculate (Dyer et al., 1989) the developing internal representation (the hidden layer of the network) back out to the environment (the input and output layers of the network), using a global dictionary as a symbolic memory.

Each XRAAM contains a symbolic memory (global dictionary or proposition buffer) and a 3-layer ARPDP (auto-associative recurrent PDP) network. The input and output layers of each network have 3 banks of units: bank1, bank2, bank3. These banks represent either a proposition [proposition-label, case-role, w-concept] or a word concept [w-concept, case-role, proposition-label] as triples. For example, the proposition (p1) 'The man drinks milk with a straw' would be represented as the triples [p1 ACT drink], [p1 AGENT man], [p1 OBJECT milk] and [p1 INSTRUMENT straw]. Similarly, the word concept man would be represented as the triples [man AGENT p1], [man AGENT p2], [man OBJECT p3], etc., where p2 is some other proposition in which man has the role of AGENT, and p3 is still another proposition in which man is an OBJECT of the action. After each of the 3 banks is properly loaded with the elements of a proposition, the DSR emerges in bank1 by an unsupervised auto-associative BP (backpropagation) learning algorithm (Rumelhart et al., 1986a).



The DSR learning procedure consists of two alternating cycles: concept encoding and proposition encoding. Below we describe each cycle. In each, all concept and proposition representations start with a 'don't know' pattern (i.e. 0.5 in all units), where the activation value range of each unit in the network is 0.0 to 1.0. The case-role representations (for AGENT, OBJECT, CO-OBJECT, etc.) are fixed, using orthogonal bit patterns to minimize interference (Figure 1).

Figure 1. Case-role representations (fixed, mutually orthogonal bit patterns for the case-roles ACT, AGENT, OBJECT, CO-OBJECT, INSTRU, FROM, TO, LOCATION and TIME).
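To make the fixed case-role encoding concrete, here is a minimal sketch of orthogonal (one unit on) case-role patterns and the 'don't know' start pattern, using the 10-unit bank size reported in Section 2.3; the exact bit assignment is an assumption, not the paper's.

```python
import numpy as np

CASE_ROLES = ["ACT", "AGENT", "OBJECT", "CO-OBJECT", "INSTRU",
              "FROM", "TO", "LOCATION", "TIME"]

ROLE_UNITS = 10  # each bank is 10 units in the reported simulations

# One-hot (orthogonal) pattern per case-role, so role patterns do not interfere.
case_role_pattern = {
    role: np.eye(ROLE_UNITS)[i] for i, role in enumerate(CASE_ROLES)
}

# Concepts and propositions start out as a 'don't know' pattern of 0.5s.
dont_know = np.full(ROLE_UNITS, 0.5)
```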

Figure 2 shows the information flow during the concept-encoding cycle in the DSR learning architecture.

Concept Encoding Cycle

(1) Pick one word concept to be represented, say CON1.
(2) Select all relevant triples for CON1. In the milk example, these are triples like [milk, OBJECT, p1], [milk, OBJECT, p2], [milk, OBJECT, p3] and [milk, CO-OBJECT, p4]. For the first triple, load the initial representation for CON1 into bank1.
(3) Load the case-role into bank2, and load its corresponding filler (i.e. the proposition) into bank3. In the milk example, for the first triple, [bank1, bank2, bank3] is loaded with the bit patterns for [milk, OBJECT, p1].
(4) Run the auto-associative BP algorithm, where the input and output layers have the same bit patterns.
(5) Recirculate the developed (hidden layer) representation into bank1 of both the input and output layers, and perform steps 3 to 5 for another triple until all triples are encoded.
(6) Store the developed DSR into the global dictionary and select another word concept to be represented. Perform steps 2 to 6 for all the word concepts (see the code sketch below).
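A minimal sketch of this concept-encoding cycle, assuming a plain three-bank auto-associative network trained with sigmoid units and simple gradient descent; the class, function and parameter names are illustrative, not the original implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class AutoAssociativeNet:
    """3-bank auto-associative net: 30 input units -> 10 hidden -> 30 output."""
    def __init__(self, bank=10, hidden=10, lr=0.07):
        self.W1 = rng.uniform(-0.5, 0.5, (3 * bank, hidden))
        self.W2 = rng.uniform(-0.5, 0.5, (hidden, 3 * bank))
        self.lr = lr

    def train_step(self, pattern):
        """One cycle of auto-associative BP; returns the hidden-layer code."""
        h = sigmoid(pattern @ self.W1)
        out = sigmoid(h @ self.W2)
        err_out = (pattern - out) * out * (1 - out)        # output delta
        err_hid = (err_out @ self.W2.T) * h * (1 - h)      # hidden delta
        self.W2 += self.lr * np.outer(h, err_out)
        self.W1 += self.lr * np.outer(pattern, err_hid)
        return h

def encode_concept(net, concept, triples, gd, prop_buffer, role_pat, cycles=200):
    """Concept-encoding cycle: recirculate the hidden layer through bank1."""
    dsr = gd.get(concept, np.full(10, 0.5))                # start with 'don't know'
    for _, case_role, prop in triples:                     # e.g. ('milk', 'OBJECT', 'p1')
        for _ in range(cycles):
            pattern = np.concatenate([dsr,
                                      role_pat[case_role],
                                      prop_buffer.get(prop, np.full(10, 0.5))])
            dsr = net.train_step(pattern)                  # recirculate hidden layer
    gd[concept] = dsr                                      # store developed DSR in the GD
    return dsr
```

Under this sketch, the proposition-encoding cycle would reuse the same train_step with the banks loaded as [proposition, case-role, DSR] instead.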


Figure 2. Concept encoding cycle in the DSR learning architecture. Each number in parentheses in (a) designates a corresponding procedure number in the text. The primes on each 'milk' in (b) indicate that DSRs are constantly changing to reflect the new propositional relations. The black lines in (a) show information flow in the concept-encoding cycle, while the grey lines show the information flow in the proposition-encoding cycle (illustrated in Figure 3).

Proposition encoding cycle. Basically this cycle undergoes the same steps as the concept encoding cycle except that, this time, we load bank1, bank2 and bank3 with (respectively) the proposition to be represented, the appropriate case-role, and its corresponding concept representation (DSR). The result of the encoding is stored in the proposition buffer. Figure 3 shows the information flow in the proposition-encoding cycle in the same DSR learning architecture.



Figure 3. Proposition encoding cycle in the DSR learning architecture.

Notice that, to encode a proposition (e.g. p1), the DSRs for the w-concepts (e.g. drink, man, etc.) appearing in that proposition must be accessed from the GD and used in the proposition encoding process. Likewise, to form the DSR of a w-concept, the distributed representations of all propositions containing that w-concept must be accessed from the proposition buffer. So concept encoding relies on proposition encoding and vice versa. Consequently, the overall DSR learning process is:



(1) Perform the entire concept encoding cycle for all the w-concepts.
(2) Perform the entire proposition encoding cycle for all the propositions.
(3) Repeat steps 1 and 2 until we get stable DSR patterns for all the concepts to be encoded (see the code sketch below).
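A minimal sketch of this alternating loop, reusing the illustrative AutoAssociativeNet and encode_concept helpers from the sketch above; encode_proposition mirrors them with the banks swapped, and the stability test is an assumption rather than the paper's stated criterion.

```python
def encode_proposition(net, prop, triples, prop_buffer, gd, role_pat, cycles=200):
    """Proposition-encoding cycle: banks hold [proposition, case-role, concept DSR]."""
    rep = prop_buffer.get(prop, np.full(10, 0.5))
    for _, case_role, concept in triples:                  # e.g. ('p1', 'ACT', 'drink')
        for _ in range(cycles):
            pattern = np.concatenate([rep,
                                      role_pat[case_role],
                                      gd.get(concept, np.full(10, 0.5))])
            rep = net.train_step(pattern)                  # recirculate hidden layer
    prop_buffer[prop] = rep                                # store in the proposition buffer
    return rep

def learn_dsrs(concept_triples, prop_triples, role_pat, epochs=150, tol=1e-3):
    gd, prop_buffer = {}, {}                 # global dictionary and proposition buffer
    concept_net, prop_net = AutoAssociativeNet(), AutoAssociativeNet()
    for _ in range(epochs):
        old = {k: v.copy() for k, v in gd.items()}
        # (1) concept encoding cycle for all w-concepts
        for concept, triples in concept_triples.items():
            encode_concept(concept_net, concept, triples, gd, prop_buffer, role_pat)
        # (2) proposition encoding cycle for all propositions
        for prop, triples in prop_triples.items():
            encode_proposition(prop_net, prop, triples, prop_buffer, gd, role_pat)
        # (3) stop once all DSR patterns have stabilized (assumed criterion)
        if old and all(np.max(np.abs(gd[k] - old[k])) < tol for k in old):
            break
    return gd, prop_buffer
```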

In this process, the composition function F is embodied in the dynamics of the RAAM stacking operation (Pollack, 1988, 1990) and the combination function G is embodied by compressing (in the hidden layer) the concatenation of representations in the three input banks. So what the XRAAM architecture does is form a representation by compressing propositions about a concept into the hidden layer, then use those compressions in the specification of propositions that define other concepts, and then recycle the compression formed for this concept back into the representation of the original concept (doing this over and over until all DSRs stabilize). Thus each DSR has in it the propositional structure that relates it to other concepts, where each of those concepts is also a DSR. The proposition-encoding network provides the necessary propositional representations for w-concept encoding, and the proposition buffer is a temporary storage for these proposition representations. This symbol recirculation method produces what can be viewed as generalizations of Hinton's 'reduced descriptions' (Hinton, 1988).

The decoding process is the reverse of the encoding process. We load the concept representations (DSRs) into the hidden layer of the concept-encoding network and perform value propagation from the hidden layer to the output layer until we get the desired case-role relationship in bank2 and the proposition in bank3 of the output layer. Next, we load the resulting proposition representations into the hidden layer of the proposition-encoding network and get back the constituent case-role relationships and concept representations. Figure 4 shows the decoding architecture.

For example, if the DSR for milk is loaded into the hidden layer of the concept-encoding network, then a [milk', CO-OBJECT, p4] triple will appear in the output layer. Similarly, if we load that p4 into the hidden layer of the proposition-encoding network, the [p4', CO-OBJECT, milk] triple will appear in the output layer. All concept and propositional information originally encoded can be extracted by recycling. Since we can think of each DSR as a stack of (case-role, proposition) pairs, the decoding operation is like a stack-popping operation. We get constituent pairs in a last-in-first-out (LIFO) fashion. Once the DSR is completely stabilized, the decoding performance does not degrade when the popping position varies from stack top to bottom.
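A rough sketch of this decode-by-recycling step, reusing the sigmoid and AutoAssociativeNet weights from the earlier sketch; decode_pops is illustrative and simply propagates a stored code through the decoder weights, splits the output into its three banks and recycles bank1' as the next code.

```python
def decode(net, code):
    """Propagate a hidden-layer code to the output layer and split it into banks."""
    out = sigmoid(code @ net.W2)
    return out[:10], out[10:20], out[20:30]   # (bank1', case-role, filler)

def decode_pops(net, dsr, n_pops):
    """Pop (case-role, filler) pairs LIFO-style by recycling bank1' as the next code."""
    pairs, code = [], dsr
    for _ in range(n_pops):
        code, role, filler = decode(net, code)
        pairs.append((role, filler))          # compare against stored patterns to name them
    return pairs
```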

2.3. Experiments for DSR Forming

We conducted a number of experiments to see how well XRAAM networks learn DSRs for nouns and verbs. We used proposition generators similar to the ones used by McClelland & Kawamoto (1986) and made up over 60 propositions, replacing each category by proper fillers in the proposition generators. We analyzed each proposition's case structure in order to load them into our network architecture. Table I shows the proposition generators with their case structures and Table II shows the concept categories with their fillers.

In this simulation, both the concept encoding network and the proposition encoding network have a 30 unit input layer (each bank has 10 units), a 10 unit hidden layer and a 30 unit output layer. So the DSR and proposition representations are 10 units each.


Figure 4. Decoding architecture for DSRs. The examples are according to the tree in Figure 2b (see also the example propositions in Section 2.1). The decoding sequence is from the tree root to the leaves (the opposite sequence of encoding).

Table I. Proposition generators. The proposition generators are presented with their proposition numbers and case structures. Each category slot (e.g. human) can be filled with any of the concepts in Table II (e.g. man). The OBJECT role in the case structures is different from the category name 'object' in the proposition generators.

P numb.  Proposition generator                        Case structure
 1       human ate                                    AGENT-ACT
 2       human ate food                               AGENT-ACT-OBJECT
 3       human ate food with food                     AGENT-ACT-OBJECT-COOBJ
 4       human ate food with utensil                  AGENT-ACT-OBJECT-INST
 5       animal ate                                   AGENT-ACT
 6       human broke fragile-object                   AGENT-ACT-OBJECT
 7       human broke fragile-object with breaker      AGENT-ACT-OBJECT-INST
 8       breaker broke fragile-object                 INST-ACT-OBJECT
 9       animal broke fragile-object                  AGENT-ACT-OBJECT
10       fragile-object broke                         OBJECT-ACT
11       human hit thing                              AGENT-ACT-OBJECT
12       human hit human with possession              AGENT-ACT-OBJECT-COOBJ
13       human hit thing with hitter                  AGENT-ACT-OBJECT-INST
14       hitter hit thing                             INST-ACT-OBJECT
15       human moved                                  AGENT-ACT
16       human moved object                           AGENT-ACT-OBJECT
17       animal moved                                 AGENT-ACT
18       object moved                                 OBJECT-ACT


Table II. Categories and their filler concepts

Categories       Concept fillers
human            man, woman
animal           dog, wolf
object           ball, desk
thing            human, animal
food             cheese, spaghetti
utensil          fork, spoon
fragile-object   plate, window
hitter           ball, hammer
breaker          hammer, rock
possession       ball, dog

Figure 5 shows DSRs learned for a number of nouns and verbs. These are snapshots at 120 epochs, where one epoch is 200 cycles of auto-associative BP for each concept and proposition. Notice that the learned representations are similarity-based according to the concept categories, that is, words in the same semantic category have similar representations. Interestingly, words with multiple categories (e.g. dog) develop less similar representations compared with words in a single category (e.g. wolf). This is because words with multiple categories can be considered to have multiple usages. For example, the word 'dog' is used as both AGENT and CO-OBJECT in the proposition generators.

In order to see this similarity structure more clearly, we have run the merge clustering algorithm (Hartigan, 1975) on the learned DSRs. Figure 6 shows the clustering analysis results. We can see that the DSRs in the same category start to merge together.

Even if two DSRs are in the same category, the clustering steps are different according to the homogeneity of their usages. For example, cheese and spaghetti are clustered at early time steps since they are mainly used as OBJECTs, but dog and possession are clustered at later time steps because dog is also used as an AGENT (in the animal category) as well as a CO-OBJECT (in the possession category). The somewhat non-intuitive clustering of human with food can be explained in the same way. The human also has multiple usages; that is, human was used as both AGENT and OBJECT. (Note that human is also a concept filler for the thing category.) But since human and food are not in the same category, they are clustered at a later step (step 13 in Figure 6).
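For readers who want to reproduce this kind of analysis, here is a minimal sketch of merge (agglomerative) clustering over a dictionary of learned DSRs with average Euclidean linkage; it uses SciPy's standard hierarchical-clustering routines rather than the paper's own clustering code.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

def cluster_dsrs(gd):
    """Agglomerative clustering of DSRs using average Euclidean distance."""
    names = sorted(gd)
    vectors = np.stack([gd[name] for name in names])
    merges = linkage(vectors, method="average", metric="euclidean")
    return names, merges

# names, merges = cluster_dsrs(gd)
# dendrogram(merges, labels=names)   # concepts in the same category merge at early steps
```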

Interestingly enough, the representation of each proposition also exhibits similarity structure (Figures 15 and 16 in Section 4.2), i.e. propositions involving similar case-roles and fillers have similar representations. These representations for the propositions can also be regarded as higher-level representations for event structures. We postulate that this kind of event representation could be used in connectionist schema processing systems such as reported in Dolan & Dyer (1987) and Lee et al. (1989b).

DSRs show many similar characteristics to those reported in Miikkulainen & Dyer (1988, 1989), but unlike FGREP representations, DSRs appear to be more portable because they encode propositional content directly. Each DSR can also reconstruct its constituent information through the decoding process. Moreover, DSRs are learned independently of any particular processing task, so the representations should be useful in any task requiring access to the propositional content of word meanings.


Figure 5. Learned DSRs of nouns/verbs with their concept category. The experiment was done using momentum-accelerated backpropagation (Rumelhart et al., 1986a, p. 330). The learning rate varies from 0.07 to 0.02; the momentum factor varies from 0.5 to 0.9. There are 120 epochs used for learning each concept and proposition; one epoch is 200 cycles of auto-associative backpropagation. The value range is 0.0-1.0, continuous, shown by the degree of box shading.

DSRs also show many similar properties to the recursive distributed representations (RDRs) of Pollack (1990) with respect to recursiveness and structure encoding/decoding. But unlike RDRs, DSRs have word-level semantics in them so that they can be utilized not only in syntactic-level applications (Chalmers, 1990) but also in conceptual-level applications such as script-based story processing.

3. DYNASTY: a Distributed Connectionist Script Applier

Here we describe a modular distributed connectionist architecture called DYNASTY (Lee et al., 1989b) which utilizes automatically learned distributed semantic representations (DSRs). DYNASTY takes simple script-based stories as input, such as going to a restaurant, attending a lecture, grocery shopping, visiting a doctor, etc., and produces a completed paraphrase, with all the script-based events fully expanded (e.g. the full restaurant story in Section 1).


Figure 6. Merge clustering the learned DSRs. The numbers designate the time step. At each step, the clusters with the shortest average Euclidean distance were merged.


There are three major tasks that DYNASTY must solve in order to handle these examples: (1) DYNASTY must learn distributed semantic representations (DSRs) for both concepts and events automatically from its input script data (DSR-learner and event-encoder modules), (2) DYNASTY must learn sequential knowledge to do causal-chain completion inferencing (script-recognizer and backbone-generator modules), (3) DYNASTY must bind script roles with their fillers for later retrieval of role bindings (role-binding module). Each of these tasks is performed by a separate network module.

3.1. System Architecture

DYNASTY has two different types of PDP modules in its system architecture: ARPDP (auto-associative recurrent PDP) and HRPDP (hetero-associative recurrent PDP) modules. Neither module is an entirely new architecture. For example, Pollack (1988) used an ARPDP architecture, called RAAM, to generate recursive distributed representations of stacks and parse trees. The HRPDP architecture has been used by many researchers (e.g. Hanson & Kegl, 1987; Allen, 1988; Elman, 1988; St John & McClelland, 1989) for several applications: natural language question-answering (Allen, 1988), parsing (Hanson & Kegl, 1987) and sentence comprehension (St John & McClelland, 1989). Here, our HRPDP architecture is mostly similar to the one used by Elman (1988).

What is new is that DYNASTY uses these different PDP sub-architectures as modular components, communicating via a global dictionary (GD) of DSRs, to achieve a high-level task, namely script-based story processing.



Besides being modular, DYNASTY also employs a functional decomposition approach, i.e. each module is classified according to its function/task in the system. Figure 7 shows the overall DYNASTY architecture during the training phase.

Figure 7. DYNASTY system architecture during the training phase. The ovals represent PDP modules, while the box represents a DSR memory. The lines designate uni-/bi-directional data flow. Each module is separately trained with its own training data. The grey lines show the provided training data. An object-triple is of the form [w-concept, case-role, event-label] and an event-triple is of the form [event-label, case-role, w-concept].

The global dictionary (GD) is implemented as a static symbol table and has (symbolic-name, DSR) pairs as entries. The GD allows the system to retrieve a DSR pattern when given a symbolic name, and to retrieve a symbolic name when given a DSR pattern as input.
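A minimal sketch of such a two-way symbol table, assuming the reverse (pattern-to-name) lookup is done by nearest Euclidean distance over the stored DSRs, as the role-binding discussion in Section 3.4 suggests; the class and method names are illustrative.

```python
import numpy as np

class GlobalDictionary:
    """Static symbol table of (symbolic-name, DSR) pairs with two entry points."""
    def __init__(self):
        self._by_name = {}

    def store(self, name, dsr):
        self._by_name[name] = np.asarray(dsr, dtype=float)

    def pattern_of(self, name):
        return self._by_name[name]

    def name_of(self, pattern):
        """Return the symbol whose stored DSR is closest to the given pattern."""
        pattern = np.asarray(pattern, dtype=float)
        return min(self._by_name,
                   key=lambda n: np.linalg.norm(self._by_name[n] - pattern))
```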

The DSR-learner consists of two XRAAM modules, and their function is to develop distributed semantic representations (as described in Section 2.2) for the w-concepts in an input script-based story. The event-encoder consists of one ARPDP module, and develops event-level representations in a manner similar to the proposition-encoding network in the DSR-learner. The script-recognizer consists of one HRPDP module, and its function is to recognize the correct script from the input event sequences. The backbone-generator consists of one HRPDP module, and its function is to generate complete event sequences from a chosen script. The event-encoder, script-recognizer and backbone-generator are described in the next section.

3.2. Description of Each Module during Training Phase

3.2.1. Event-encoder module. This module produces similarity-based event representations from DSRs and event case-triples for the script events. Figure 8 shows the event-encoder module.


Figure 8. Event-encoder module. The event encoding/decoding procedures are identical to the proposition encoding/decoding procedures described in Section 2.2. The primes on 'ev10' indicate that the event representations keep changing until stabilized, to reflect the accumulation of encoded event triples.

Basically this module functions in a similar way to the proposition-encoding network in the DSR-learner module, and the training procedure is identical to the one previously described (see the proposition-encoding cycle in Section 2.2). But we cannot use the proposition-encoding network in the DSR-learner directly for this purpose, because the representations for the propositions keep getting affected by the continuously changing DSRs. Here we need to build the event representations using the stabilized DSRs so that we can decode the event representations back into the constituent DSRs and case-role representations.

The same decoding procedure (e.g. see Figure 4) is used to decode the constituent triples from the event representations. For example, from the ev10 representation, the decoding procedure extracts the constituent triples [ev10, TO, Chart-House], [ev10, AGENT, John] and [ev10, ACT, entered].

3.2.2. Script-recognizer module. The script-recognizer's function is to recognize a complete, single script type from specific partial event sequences. Figure 9 shows the architecture with the orthogonal script representations used in DYNASTY.

This module is trained to associate the complete event sequences in several script instances with that script-type representation. In the training phase, the inputs are specific script instances. By script instances, we mean the full event sequences with each and every script role replaced by proper instances. The teaching inputs are the correct script representations.


Figure 9. Script-recognizer module. The event representations come from the event-encoder module, and the script-type representations ($restaurant, $attending-lecture, $shopping, $visit-doctor) are orthogonal to each other for best performance. The $ in a script-type pattern designates a script name. Each event is an event instance, i.e. each script-role (e.g. customer) has been replaced by its instance (e.g. John). Note also that the event numbers are not contiguous, since an event number is arbitrary and does not affect the processing.


Training procedure (for one script):

(1) Load bank1 with the initial 'don't know' pattern.
(2) Choose one complete event sequence to be a script and load bank2 with the first event representation.
(3) Load bank3 with the script pattern we want to be associated with the chosen event sequence.
(4) Do hetero-associative BP.
(5) Recirculate the hidden layer pattern to bank1.
(6) Load bank2 with the next event representation in the sequence.
(7) Repeat steps 4 to 6 for all the events in the script.
(8) Repeat steps 1 to 7 for all the script instances (see the code sketch below).
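A minimal sketch of this hetero-associative training loop, assuming a simple recurrent (Elman-style) network and reusing the rng and sigmoid helpers from the earlier sketch; the layer sizes follow those reported in Section 4.1, but the class itself is illustrative, not the original implementation.

```python
class ScriptRecognizer:
    """HRPDP: 30-unit context bank + 10-unit event bank -> 30 hidden -> 4 script units."""
    def __init__(self, context=30, event=10, hidden=30, scripts=4, lr=0.1):
        self.W1 = rng.uniform(-0.5, 0.5, (context + event, hidden))
        self.W2 = rng.uniform(-0.5, 0.5, (hidden, scripts))
        self.context_size, self.lr = context, lr

    def train_sequence(self, events, script_pattern):
        context = np.full(self.context_size, 0.5)              # step 1: 'don't know'
        for ev in events:                                       # steps 2, 6, 7
            x = np.concatenate([context, ev])
            h = sigmoid(x @ self.W1)
            out = sigmoid(h @ self.W2)
            err_out = (script_pattern - out) * out * (1 - out)  # step 4: hetero-assoc. BP
            err_hid = (err_out @ self.W2.T) * h * (1 - h)
            self.W2 += self.lr * np.outer(h, err_out)
            self.W1 += self.lr * np.outer(x, err_hid)
            context = h                                         # step 5: recirculate hidden layer
        return out
```

Under the same assumptions, the backbone-generator (Section 3.2.3) would use the same loop with banks 2 and 3 swapped: the script-type pattern as input and the next event representation as the teaching output.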

We use orthogonal bit vectors for each script to minimize the interference between script patterns. These are representations of script types, not their instances. We do not maintain any explicit representations for each script instance (a script plus its bindings). Script instances are the sequences of instance events and are later obtained from the backbone event sequence using a role binding scheme. For example, from input event sequences such as the representations of 'John entered the Chart-House. John ate steak. John left a tip', the script-recognizer produces the restaurant script-type representation (shown in Figure 9) during the performance phase.



3.2.3. Backbone-generator module. The backbone-generator is trained to associate a script-type representation with the complete event sequence in the script, and therefore to produce the script backbone event chain (e.g. the main events of a restaurant script but without any bindings) when a script-type representation is given as an input. Figure 10 shows the architecture, which uses the same HRPDP architecture as used in the script-recognizer module, but with different input/output.

Figure 10. Backbone-generator module. This module uses the same script-type representations as the script-recognizer. Each event produced is a backbone event, i.e. the script roles are not yet replaced with their bindings.

The training procedure is similar to the script-recognizer module, except that, this time, we load a script-type into bank2 and event representations into bank3. For example, after the script-recognizer produces the restaurant script-type representation in the above section, the backbone-generator produces the representations for the backbone events (from this script-type representation) such as 'Customer entered restaurant-name. The waiter seated customer. The waiter brought menu. Customer read menu. Customer ordered food. Customer ate the food. Customer paid the bill. Customer left a tip. Customer left restaurant-name for home'. These event representations for the backbone events are decoded into triple forms using the event-encoder module. The script roles (e.g. customer, restaurant-name, etc.) are also w-concepts, and their representations are learned using the DSR-learner module in the same way that the other w-concepts are learned. In other words, the training data for the DSR-learner contains propositions such as 'Customer entered restaurant-name' as well as 'John entered Chart-House'.


3.3. Story Processing during the Performance Phase

After training each module in parallel with its own training data, the next step is to perform the actual story processing task. Figure 11 shows the DYNASTY architecture during performance phase with the actual information flow during story processing.

Figure 11. DYNASTY system architecture during the performance phase. Each number in parentheses designates the corresponding performance step listed in the text (Section 3.3). The Parser and surface-generator are not yet implemented.

The following steps show how DYNASTY processes story input, such as the restaurant-going story in section 1:

(1) Get the event-triple forms using the Parser module.
(2) Look up each symbol in the GD and access its DSR pattern.
(3) Build event representations using the event-encoder.
(4) Recognize the correct script using the script-recognizer.
(5) Generate the complete backbone event sequence of the script using the backbone-generator (filling in missing events).
(6) Decode each event representation into its constituent case-roles and concept DSRs, and make each into an event-triple form.
(7) Do the script-role binding procedure (Section 3.4).
(8) Look up DSRs in the GD, and change DSR patterns to their symbolic forms.
(9) Generate the output story from the event-triple forms using the surface-generator module.

Steps 1 and 9 assume that the Parser module and surface-generator module (currently not implemented) take symbolic forms as input. Table III shows a sample performance trace of DYNASTY, with the input and the intermediate outputs for each step in the above procedure.
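A rough end-to-end sketch of this performance loop under the same assumptions as the earlier sketches; the module interfaces (encode, decode, recognize, generate, bind) and the GlobalDictionary methods are assumed for illustration, and parse_story/surface generation are stand-ins for the unimplemented Parser and surface-generator.

```python
def process_story(story_triples, gd, event_encoder, recognizer, generator, binder):
    """Steps 2-8 of the performance procedure (parsing/generation are stand-ins)."""
    # (2) symbols -> DSR patterns via the global dictionary
    input_events = [[(lbl, role, gd.pattern_of(w)) for (lbl, role, w) in ev]
                    for ev in story_triples]
    # (3) build event representations
    event_reps = [event_encoder.encode(ev) for ev in input_events]
    # (4) recognize the script type
    script_pattern = recognizer.recognize(event_reps)
    # (5) generate the full backbone event sequence
    backbone_reps = generator.generate(script_pattern)
    # (6) decode backbone events into triples, (7) bind script roles, (8) back to symbols
    backbone_triples = [event_encoder.decode(rep) for rep in backbone_reps]
    bound = binder.bind(input_events, backbone_triples)
    return [[(lbl, role, gd.name_of(p)) for (lbl, role, p) in ev] for ev in bound]
```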

3.4. Role-binding during the Performance Phase

Script role-binding is the process of associating the script roles with proper fillers, so that correct paraphrasing can be produced. This process is more restricted than general variable binding in pure symbolic systems, in the sense that the script roles are not content-free symbols, but are themselves concepts, e.g. 'customer' in the restaurant script.


Table III. Sample trace with given input. P(x) designates the pattern of symbol x. Each triple is of the form [event-label, case-role, w-concept]. Each case-role is represented in capital letters.

Input:   John entered Chart-House. John ate steak. John left a tip.
Step 1:  [ev10, ACT, entered], [ev10, AGENT, John], [ev10, TO, Chart-House];
         [ev14, ACT, ate], [ev14, AGENT, John], [ev14, OBJECT, steak];
         [ev16, ACT, left], [ev16, AGENT, John], [ev16, OBJECT, tip]
Step 2:  same as above, with each symbol replaced by its DSR except the 'ev' symbols
Step 3:  patterns of ev10, ev14, ev16
Step 4:  pattern of the restaurant script type
Step 5:  patterns of ev1, ev2, ev3, ev4, ev5, ev6, ev7, ev8 and ev9 (the restaurant script backbone events)
Step 8:  same as above, with each pattern replaced by its symbol
Step 9 (output): John entered Chart-House. Waiter seated John. Waiter brought menu. John read menu. John ordered steak. John ate steak. John paid bill. John left a tip. John left Chart-House for home.


Figure 12 shows our role-binding architecture. The script-role-binder is a control program which executes the role-binding procedure (described below). Comparison of each DSR is done by accessing the GD, that is, checking whether the two DSRs access the same symbol. The procedure utilizes the fact that there must exist a matching output backbone-event in the backbone-generator output corresponding to each input event from the input story. But the reverse is not true since we can recognize a script from partial event sequences. In other words, the mapping function from input event space to output event space is irreversible.

Role-binding procedure:

(1) Fetch the first input event triple (in the input story) from the event-encoder, say [ev10, ACT, entered], [ev10, AGENT, John] and [ev10, TO, Chart-House].
(2) Take the first output backbone event, say ev1.
(3) Decode ev1 into its case triple forms, such as [ev1, ACT, entered], [ev1, AGENT, customer] and [ev1, TO, restaurant-name]. In this backbone event, the w-concepts customer and restaurant-name are the script roles.


Figure 12. Role-binding architecture during story processing.


(4) Compare the case-role ACT's filler in the input and output events. In this case, the entered in ev1 and the entered in ev10 are compared.

(5) If the two fillers are the same, make a (script-role, instance) pair by extracting the fillers for the same case-roles in the two events, and store them in the binding-table. For example, in this case, the pairs (customer, John) and (restaurant-name, Chart-House) are stored. Now fetch the next input event triple.

(6) If the two fillers are not the same, skip that output event.
(7) Take the next output backbone event and repeat steps 3 to 6.
(8) Replace each script role in the output backbone events with its instance by accessing the binding-table (see the code sketch below).
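A minimal sketch of this matching procedure over decoded triples, assuming each event is given as a list of (event-label, case-role, w-concept-name) triples whose patterns have already been converted back to symbols via the GD; the helper names are illustrative.

```python
def role_fillers(event):
    """Map case-role -> filler name for one decoded event."""
    return {role: filler for (_, role, filler) in event}

def bind_roles(input_events, backbone_events, script_roles):
    """Pair script roles with instances by aligning input events to backbone events."""
    binding_table, backbone = {}, list(backbone_events)
    for in_ev in input_events:
        in_fillers = role_fillers(in_ev)
        while backbone:
            out_fillers = role_fillers(backbone.pop(0))
            if out_fillers.get("ACT") != in_fillers.get("ACT"):
                continue                                  # step 6: skip non-matching backbone event
            for role, filler in out_fillers.items():      # step 5: record (script-role, instance)
                if filler in script_roles and role in in_fillers:
                    binding_table[filler] = in_fillers[role]
            break
    return binding_table

# e.g. bind_roles(..., script_roles={"customer", "restaurant-name", "food"})
# -> {"customer": "John", "restaurant-name": "Chart-House", "food": "steak"}
```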

For example, in Table III, the first input event's ACT filler is entered. Since this is the same as the first output event's ACT filler, the pairs (customer, John) and (restaurant-name, Chart-House) are stored in the binding-table. When the second output event is compared with the second input event, since their ACT fillers are different (seated vs ate), that output event is skipped. After all the comparisons are performed, the binding-table maintains the completed (script-role, instance) list: (customer, John), (restaurant-name, Chart-House) and (food, steak). When the script-role-binder accesses the binding-table, it matches the minimum Euclidean distance pattern for each script-role in the backbone event, and fetches the instance of that pattern. Then the script-role is replaced with that instance pattern.

When DYNASTY processes a new story, the old binding-table is cleared. So if John is at some point associated with the customer script-role during story processing, the binding (customer, John) can only propagate until the end of that story's processing. When the system reads a new story, the previous binding will be forgotten. Therefore, if DYNASTY later reads a story about 'Fred going to Sizzler', it will bind Fred to customer and Sizzler to restaurant-name at this time, and forget about 'John going to Chart-House'.



4. Simulations

4.1. Script Training Data

We selected 4 scripts: going to a restaurant, taking a lecture, grocery shopping, and visiting a doctor from Bower et al. (1979) and constructed event-triple forms for each. From these 4 scripts, we extracted 73 w-concepts (DSRs) including script roles, and extracted 110 events (propositions) to train the DSR-learner and event-encoder modules. For each script role we have two instances, and by replacing each role by its instance, we constructed 32 script instances (8 instances for each script). Among them, we used 16 script instances to train the script-recognizer and backbone-generator modules, and left the remaining 16 instances for the generalization test.

In this simulation, the two networks in the DSR-learner have the same size as the previous simulation (Section 2.3). The event-encoder has 30 unit input and output layers (each bank has 10 units) and a 10 unit hidden layer. The script-recognizer has a 40 unit input layer (30 units of context bank and 10 units of event bank), a 30 unit hidden layer and a 4 unit output layer. The backbone-generator has a 34 unit input layer (30 units of context bank and 4 units of script bank), a 30 unit hidden layer and a 10 unit output layer. Each DSR and event representation is 10 units, and each script representation is 4 units (one unit 'on' for each script). Each case-role representation is also 10 units.
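To make these layer sizes concrete, here is a small configuration sketch instantiating the illustrative classes from the earlier code sketches with the reported dimensions; the constructor arguments belong to those hypothetical classes, not to the original implementation.

```python
# Network sizes reported for the DYNASTY simulation (all banks are 10 units;
# script types are 4 orthogonal units, one unit 'on' per script).
dsr_concept_net = AutoAssociativeNet(bank=10, hidden=10)       # 30-10-30
dsr_proposition_net = AutoAssociativeNet(bank=10, hidden=10)   # 30-10-30
event_encoder_net = AutoAssociativeNet(bank=10, hidden=10)     # 30-10-30

script_recognizer = ScriptRecognizer(context=30, event=10, hidden=30, scripts=4)  # 40-30-4

# The backbone-generator mirrors the recognizer with the script type as input
# and the next event representation as the teaching output (34-30-10).
```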

DYNASTY is trained with complete input (i.e. all the events of a script, shown below). During performance, DYNASTY was given instantiated versions of only the starred events (see below), which represent fragments of complete scripts. Then DYNASTY is tested to see if it can recognize the complete script with all the bindings filled in. Below is our actual script training data with the script roles and instances together. Both the script-roles and the instances are w-concepts (represented using DSRs). Below we use capital letters for the script-roles to distinguish them from their instances. All other w-concepts which are not instances in this training data (e.g. entered, menu, etc.) are also represented using DSRs.

Roles: CUSTOMER, RESTAURANT-NAME, FOOD

Instances: John, Jack, Chart-House, Korean-Garden, steak, short-rib

CUSTOMER entered RESTAURANT-NAME*

waiter seated CUSTOMER

waiter brought menu
CUSTOMER read menu
CUSTOMER ordered FOOD
CUSTOMER ate FOOD*
CUSTOMER paid bill
CUSTOMER left a tip*
CUSTOMER left RESTAURANT-NAME for home


Roles: STUDENT, CLASSROOM, PROFESSOR

Instances: Jay, Susan, room101, auditorium, Dr-Minsky, Dr-Turing

STUDENT entered CLASSROOM*
STUDENT sat down on the seat*
STUDENT took out a notebook
STUDENT listened to the PROFESSOR
STUDENT took notes*
STUDENT checked the time
STUDENT left the CLASSROOM for home*

Roles: SHOPPER, STORE, ITEM

Instances: Mary, Edward, Safeway, Vons, meat, vegetable

SHOPPER went to STORE*
SHOPPER got cart
SHOPPER picked out ITEM*
SHOPPER waited in line
SHOPPER paid to cashier*
SHOPPER left STORE for home

Roles: PATIENT, MAGAZINE, DOCTOR, OFFICE

Instances: Alex, Michael, Newsweek, TV-Guide, Dr-Kim, Dr-Jonson, Dr-Kim's-office, Dr-Jonson's-office

PATIENT went to OFFICE*

PATIENT checked in with receptionist*

PATIENT sat down on the seat

PATIENT read MAGAZINE*

PATIENT entered exam-room

nurse tested PATIENT

DOCTOR examined PATIENT*

PATIENT left OFFICE for home

4.2. Simulation Results

DYNASTY learned to correctly process all 16 trained script instances; in other words, it successfully produced the fully expanded script stories and correctly bound the script roles to their fillers. For the remaining 16 new script instances (4 for each script), DYNASTY also performed correctly, so it generalized appropriately to these 16 instances. For the DSR-learner and event-encoder, the results were taken after 150 epochs of training, where one epoch is 300 cycles of BP training for each and every training item. For the script-recognizer and backbone-generator, the training was stopped at 500 epochs, where one epoch is 10 cycles of BP for each and every item.
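The epoch/cycle schedule just described can be written out as a nested loop. This is a minimal sketch, assuming a generic bp_update(net, example) stand-in for one backpropagation pass; neither the function names nor the loop structure come from the original C code.

    # Illustrative training schedule; `bp_update` is a hypothetical stand-in
    # for one backpropagation pass on a single training example.
    def train(net, data, epochs, cycles_per_epoch, bp_update):
        for _ in range(epochs):
            for example in data:
                for _ in range(cycles_per_epoch):
                    bp_update(net, example)

    # Schedules reported above (calls shown for illustration only):
    #   train(dsr_learner_net,        concepts_and_events, 150, 300, bp_update)
    #   train(event_encoder_net,      events,              150, 300, bp_update)
    #   train(script_recognizer_net,  trained_instances,   500, 10,  bp_update)
    #   train(backbone_generator_net, trained_instances,   500, 10,  bp_update)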

The system's success critically depends on the similarity-based distributed representations (high-level event representations as well as w-concept DSRs) it developed


during the training phase. Figure 13 shows some of our learned DSRs for the restaurant script. In the figure, the instance concepts (e.g. John, Jack) for the same script role (e.g. customer) develop similar representations.

Figure 13. Learned DSRs of concepts (nouns/verbs) for the restaurant script. Learning rate = 0.07 to 0.02; momentum factor = 0.5 to 0.9; 150 epochs for each concept and event; one epoch = 300 cycles of auto-associative backprop.

Figure 14 shows their similarity structure, obtained using a merge clustering technique (Hartigan, 1975). The figure shows 3 major categories of w-concepts: script role/instance concepts (e.g. John, Jack, customer, etc.), script actions (e.g. entered, paid, etc.) and the remaining concepts used in the scripts (e.g. menu, bill, etc.). In each category, the most similar concepts in the script context (e.g. Chart-House and Korean-Garden) started to merge together.5
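To make the merge-clustering step concrete, the following is a minimal agglomerative clustering sketch in the spirit of Hartigan (1975). The single-link Euclidean distance and the toy three-unit vectors are our assumptions, not the exact procedure or data behind Figure 14.

    # Minimal agglomerative ("merge") clustering over representation vectors.
    # Illustrative only: single-link Euclidean distance and toy 3-unit "DSRs".
    from itertools import combinations

    def euclidean(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

    def merge_cluster(vectors):
        """vectors: dict of name -> list of floats.  Returns the merge history."""
        clusters = {name: [name] for name in vectors}
        history = []
        while len(clusters) > 1:
            # find the closest pair of current clusters (single-link distance)
            (a, b), d = min(
                (((x, y), min(euclidean(vectors[m], vectors[n])
                              for m in clusters[x] for n in clusters[y]))
                 for x, y in combinations(clusters, 2)),
                key=lambda pair: pair[1])
            clusters[a] = clusters[a] + clusters.pop(b)
            history.append((a, b, round(d, 3)))
        return history

    dsrs = {"John": [0.9, 0.1, 0.2], "Jack": [0.85, 0.15, 0.25],
            "entered": [0.1, 0.8, 0.7], "paid": [0.15, 0.75, 0.8]}
    print(merge_cluster(dsrs))   # John/Jack merge first, then entered/paid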

For the task of script paraphrasing, the high-level event representations, as well as the low-level w-concept representations, should also develop similarity-based representations. Figure 15 shows parts of the high-level event representations selected from all 4 scripts. Notice that similar events (events in which the same script roles are replaced by different instances, e.g. ev10 and ev28) developed similar representations.

Figure 16 shows the results of the clustering analysis on the event representations. In the figure, similar events in different script instances (e.g. the entering scene in the restaurant script, such as ev10 and ev28) started to merge together.

Event-level similarity plays an important role in both paraphrasing and generalization. For example, when the system is trained only on the event 'John entered Chart-House', and later processes the new event 'Jack entered Korean-Garden', this event-level similarity enables DYNASTY to correctly recognize the restaurant script even though DYNASTY was not trained on this new event. The reason the two events are similar is that every corresponding DSR in the two events (e.g. John and Jack, Chart-House and Korean-Garden) is similar (see Figure 13), so the event-encoder develops a similar representation for the new event on the fly.
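The following toy calculation illustrates why a never-seen event lands near a trained one: if every constituent DSR is close, the concatenated input to the event-encoder is close, and a smooth encoder maps nearby inputs to nearby codes. The three-unit vectors and the cosine measure are ours; the actual system compares the event-encoder's learned 10-unit codes.

    # Why 'Jack entered Korean-Garden' is treated like 'John entered Chart-House'.
    # Illustrative only: toy 3-unit "DSRs" and cosine similarity on the
    # concatenated case-role banks, not the trained event-encoder codes.
    import math

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm

    John, Jack = [0.9, 0.1, 0.2], [0.85, 0.15, 0.25]
    ChartHouse, KoreanGarden = [0.2, 0.9, 0.6], [0.25, 0.85, 0.65]
    entered = [0.1, 0.3, 0.9]

    ev10 = John + entered + ChartHouse       # agent / act / to banks, concatenated
    ev28 = Jack + entered + KoreanGarden
    print(round(cosine(ev10, ev28), 3))      # close to 1.0: nearly identical events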


Figure 14. Merge clustering of the learned DSRs.

5. Discussion

5.1. Properties of DSRs

DSRs have several desirable properties compared with previous distributed representations.

(1) Automaticity. DSRs are automatically learned by using XRAAMs, rather than built by hand using explicit nodes and links as in the systems of Schank & Riesbeck (1981). In addition, DSRs improve on hand-coded microfeature-based representations (McClelland & Kawamoto, 1986), in which a PDP knowledge engineer must define each microfeature in advance and hand-code each representational vector.

(2) Portability. DSRs are learned without any dependence on a particular task, so their encoded propositional content can be ported to any application environment. In other words, DSRs are global rather than locally confined to one training environment, and the DSRs learned in one task environment can be applied in another. To show this kind of portability more clearly, we also developed a distributed connectionist goal/plan-based story understanding system, DYNASTYII (Lee, 1990b), using the same DSR scheme that was used for DYNASTY (Lee, 1990a). In contrast, the internal representations developed in Hinton's (1986) family tree example cannot be used in another task environment (they are not global), even though they are automatically learned.

(3) Structure encoding. DSRs encode propositional structures with constituencies. Since DSRs are learned by stacking case-role and proposition pairs, we can extract the case-role patterns and proposition patterns used from each DSR. These propositions can in turn be decoded to recover their constituent case-roles and concepts. DSRs therefore support the answering of structural questions about concepts and events (Feldman, 1986). Because of this structure-encoding ability, DSRs are compositional; that is, the semantics of a DSR is a function of its constituent case-roles and propositions. DSRs can therefore be regarded as a counterexample to Fodor and Pylyshyn's criticism of connectionism (Fodor & Pylyshyn, 1988). (A schematic sketch of the decode loop appears at the end of this subsection.)


Figure 15. Event representations for the restaurant, take-lecture, shopping and doctor script groups. In each script group, events from the same scene (e.g. ev10 and ev28 from the restaurant script entering scene) are adjacent to each other, and develop almost identical representations.

Figure 16. Merge clustering of the event representations.



(4) Similarity-based representations. DSRs are similarity-based, i.e. similar concepts end up with similar representations through the DSR learning process. This is because similar concepts fill similar case-roles in similar propositions. For example, the concept milk fills case-roles similar to those of the concept juice in drink-type propositions, so milk and juice end up acquiring similar DSRs.

In the DYNASTY simulations, the resulting DSRs exhibit a rather strong similarity property; that is, similar concepts end up having almost the same representations. We postulate that this strong similarity property is due to the limited number of propositions: in the simulations reported here, only 110 events (propositions) are used to learn 73 w-concepts, which is clearly inadequate. The more propositions used in training, the more refined (distinct) the representations will be. We postulate that a child must experience a great number of propositions to learn a single concept correctly.
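As promised under property (3) above, here is a schematic sketch of the structure-decoding idea: a DSR is treated as a stack of (case-role, proposition) pairs, and repeatedly applying the decoder half of an XRAAM peels those pairs off. The function decode is a hypothetical stand-in for the trained decoder network and empty for the learned stack-bottom pattern; neither name comes from the original implementation.

    # Schematic decode loop; `decode` and `empty` are hypothetical stand-ins.
    def unstack(dsr, decode, empty, max_depth=10, tol=0.1):
        def dist(u, v):
            return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
        pairs, current = [], dsr
        for _ in range(max_depth):
            case_role, proposition, rest = decode(current)   # one decoder pass
            pairs.append((case_role, proposition))
            if dist(rest, empty) < tol:                      # reached stack bottom
                break
            current = rest
        return pairs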

5.2. Performance of DYNASTY

We have tested DYNASTY's generalization abilities by changing the ratio between trained and tested script instances. Table IV shows DYNASTY's generalization ability with full story inputs, where all the events in the script are shown in the input stories.

Table IV. Generalization performance for the full story input. Trained (Tested) designates the number of trained (tested) scripts. E Scr (E Ev) designates the average Euclidean distance over all the units for the script-recognizer (backbone-generator) module. Correct Scr (Correct Ev) designates the percentage of correctly recognized scripts (produced backbone-events), where 'correct' means that each and every unit in a script-type (backbone-event) representation is within 0.1 of the correct unit value (0.0 to 1.0 range). There are 4 units in script-type representations, and 10 units in backbone-event representations.

Trained   Tested   E Scr   Correct Scr   E Ev   Correct Ev

We used 2 script-types and 16 script-based stories (8 for each script). Table IV shows excellent generalization performance: the percentages of correctly recognized scripts and of correctly produced backbone-events are not reduced when the number of trained instances decreases. Table V shows DYNASTY's generalization ability for partial story input, where only the starred events in the training data (see Section 4.1) are shown in the input story.
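The 'correct' criterion defined in the Table IV caption amounts to an element-wise tolerance check. A minimal sketch; the function name and the example vectors are ours.

    # Correctness criterion from Table IV: every unit of the produced pattern
    # must lie within 0.1 of the target unit value (0.0 to 1.0 range).
    def is_correct(output, target, tol=0.1):
        return all(abs(o - t) <= tol for o, t in zip(output, target))

    # Example with a 4-unit script-type pattern (one unit 'on'):
    print(is_correct([0.05, 0.93, 0.02, 0.08], [0.0, 1.0, 0.0, 0.0]))  # True
    print(is_correct([0.30, 0.93, 0.02, 0.08], [0.0, 1.0, 0.0, 0.0]))  # False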


Table V. Generalization performance for the partial story input

Trained   Tested   E Scr   Correct Scr   E Ev   Correct Ev

The generalization performance is almost the same as in the full story case, since the script-recognizer robustly recognizes the correct script-type pattern even when only partial input stories are given.

Table VI shows the script-recognizer's resistance to weight damage, using the same 2 scripts and 16 stories.

Table VI. Damage resistance for the script-recognizer module. 'Lesioned' designates the percentage of lesioned weights. E designates the average Euclidean distance, and 'Correct Scr' designates the percentage of correctly recognized scripts.

This table shows results for the case where 12 script instances are trained and 4 instances are tested. Weights are randomly damaged according to the percentage under the Lesioned column. Interestingly, the percentage of correctly recognized scripts drops suddenly at 20% lesioning, where only one script-type out of two is correctly recognized. Table VII shows the damage resistance performance for the backbone-generator, which is poorer than the script-recognizer's.
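The lesioning manipulation can be sketched as randomly zeroing a given percentage of a module's weights before testing. A minimal sketch under our own assumptions (a flat list of weights, and zeroing as the form of damage); it is not the authors' procedure verbatim.

    # Randomly zero `percent`% of a module's weights ("lesioning") before testing.
    import random

    def lesion(weights, percent, seed=0):
        rng = random.Random(seed)
        damaged = list(weights)
        n = int(len(damaged) * percent / 100.0)
        for i in rng.sample(range(len(damaged)), n):
            damaged[i] = 0.0
        return damaged

    weights = [0.4, -0.2, 0.7, 0.1, -0.5, 0.9, 0.3, -0.8, 0.6, 0.2]
    print(lesion(weights, 20))    # two of the ten weights set to zero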

Table VII. Damage resistance for the backbone-generator module

Lesioned   E   Correct Ev   |   Lesioned   E   Correct Ev

We postulate that this poor damage resistance is due to the excessive similarity among the event representations. Because the number of possible events is exponential in the number of word concepts, the event representation space becomes excessively crowded, so even a little damage to the weights cannot be tolerated if each event representation is to be finely distinguished.

Compared with a pure symbolic system, such as SAM (Cullingford, 1978), DYNASTY demonstrates many desirable features of PDP systems, i.e. automatic generalization, fault-tolerance, and graceful degradation. Also, unlike symbolic systems,


DYNASTY is suitable for massively parallel hardware implementation using VLSI technology (Mead, 1987).

A modular connectionist architecture with recursive, compositional distributed representations (the DSRs in DYNASTY) opens a new way of building practical symbolic/connectionist systems that can perform fairly high-level inferencing tasks. This type of neurally inspired cognitive architecture can bridge the gap between logical/symbolic AI and the more numerical/statistical neural network field. Symbolic AI systems usually lack expandability, since they are brittle and break easily on larger practical data. In the DYNASTY case, by contrast, when the system needs to process larger practical data, all that is needed is to increase the amount of training data.

5.3. Current Status

DYNASTY has been fully implemented on an HP 9000/300 workstation in the C programming language, except for the parser and surface-generator modules. The amount of C code for the full DYNASTY system is about 3000 lines. Currently, DYNASTY has been trained on the 16 script instances (4 different script-types) and performed correctly on the 16 new script instances (stories). Training took about 18 days for the DSR-learner, 10 days for the event-encoder, 3 days for the script-recognizer and 1 day for the backbone-generator on the HP 9000/300 workstation (actual time, not CPU time).

Since the parser and surface-generator have not yet been implemented, the system's actual input is stories in event case-triple form (not natural language input), and its output of fully expanded stories is also in event case-triple form.
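For concreteness, the event case-triple form (see Note 3) can be written out as plain data. The list-of-tuples layout below is our own; the triples themselves follow the format given in the notes.

    # Event case-triple form of 'ev10: John entered the Chart-House' (see Note 3).
    ev10 = [
        ("ev10", "ACT",   "entered"),
        ("ev10", "AGENT", "John"),
        ("ev10", "TO",    "Chart-House"),
    ]

    # A fragmentary input story is a sequence of such events; the fully expanded
    # output story is produced in the same form.
    input_story = [ev10]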

There are several limitations to the current implementation of DYNASTY. These are listed below, along with their proposed resolutions.

First, DYNASTY uses orthogonal (localist) script-type representations, and does not have explicit script instance (script type plus bindings) representations. However, we can build script instance representations using the event-encoder architecture from [script-type, script-role, instance] triples. For example, the restaurant script instance of John's going to the Chart-House can be learned from the triples [s-restaurant, customer, John] and [s-restaurant, restaurant-name, Chart-House] by using the same procedures as in Section 3.2.1. Of course, when we use script instance representations, the script-recognizer's function changes to recognizing the correct script instances, not the script types, and its performance therefore degrades. The reason for the performance degradation is that the script instance representation of 'John's going to the Chart-House' is similar to the representation of 'Jack's going to the Chart-House', since John and Jack are similar to each other. In theory, we can adjust similarity levels by adding or deleting propositions (events) in the training data, but in practice it is not easy to find the right number of propositions to make John and Jack similar enough for generalization yet distinct enough for good binding performance.

Second, DYNASTY's role-binding is delayed and quasi-symbolic. After all the backbone-events for the recognized script are produced, DYNASTY instantiates each script role by accessing the quasi-symbolic binding-table.6 However, we could make the script-recognizer recognize explicit script instance representations and make the backbone-generator produce instantiated event sequences (with the script-roles already instantiated) from those script instance representations. In this case, we would not need an explicit binding architecture, and role-binding would be performed during the script recognition and generation process. However, role-binding performance degrades due to


the excessive similarity among the script instance representations within the same script type.

Third, the current implementation of the GD uses von Neumann-style addressing, and access to DSRs is sequential. In the GD, each entry has a symbolic part and a distributed representation (DSR) part, and the entries are stored in a table. For a more realistic model, the symbolic part of each w-concept could be replaced with a pattern representing the orthography or acoustic pattern of the words that refer to that w-concept, and access could be parallel and associative, using neural network models of the GD. Kohonen's feature maps are one approach to this kind of distributed lexicon model (Miikkulainen, 1990).
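The table form of the GD described above can be sketched as a mapping from a symbol to its DSR pattern, with retrieval by pattern done as a sequential minimum-distance scan (cf. Note 6). The dictionary layout and the toy three-unit patterns are our own illustration, not the original data structure.

    # Global dictionary (GD) sketch: symbolic part paired with its DSR pattern.
    GD = {
        "John":    [0.90, 0.10, 0.20],
        "Jack":    [0.85, 0.15, 0.25],
        "entered": [0.10, 0.80, 0.70],
    }

    def nearest_symbol(pattern, gd=GD):
        # sequential minimum-distance scan over the table entries
        def dist(u, v):
            return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
        return min(gd, key=lambda symbol: dist(gd[symbol], pattern))

    print(nearest_symbol([0.88, 0.12, 0.22]))   # -> 'John'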

5.4. Future Work

There are three main directions for future research on the DYNASTY project.

(1) Parsing and surface generation. We are trying to modify McClelland and Kawamoto's case-role assignment network (McClelland & Kawamoto, 1986) for DYNASTY's parser and surface generator. The parser and surface-generator modules could be Elman-type recurrent PDP networks (Elman, 1988). In the parser module, the input layer has one word bank and the output layer has several banks for the case-roles. Similarly, in the surface-generator module, the input layer has several banks for the case-roles and the output layer has one bank for the word (see Miikkulainen & Dyer, 1989, for a similar architecture; a minimal sketch of such a recurrent module is given after this list).

(2) Goal/plan-based story processing. We are also conducting research to extend DYNASTY to process not only stereotypical script-based stories but also goal/plan-based stories. The key to goal/plan-based story understanding (Wilensky, 1978) is so-called dynamic reinterpretation: the same action should be interpreted differently according to previous goals and plans. Our approach uses recurrent PDP network modules to recognize goals from actions and to store goal/plan inference chains. In these network modules, the same action can be recognized as different goals according to the context state, and the same goals/plans can form different inference chains according to the initial context state. For more details on the architecture, see Lee (1990a, b).

(3) Neural network models of the global dictionary. We are trying to implement the global dictionary using two Elman-type recurrent PDP networks. The orthography-to-DSR network maps the ASCII codes of the symbols to their DSRs, and the DSR-to-orthography network maps the DSRs to their ASCII codes. By using these two recurrent networks as a GD, the current sequential access to DSRs can be replaced with parallel, associative access.
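To illustrate item (1), here is a minimal Elman-style forward pass: at each step a word bank plus the previous hidden state (the context) feeds a hidden layer, and after the last word the output banks hold the case-role fillers. The layer sizes, random weights and function names are our own assumptions; the actual parser had not been implemented at the time of writing.

    # Minimal Elman-style forward pass for the proposed parser (sketch only).
    import math, random

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def layer(inputs, weights):
        # weights: one row of input weights per output unit
        return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

    WORD, HIDDEN, ROLE_BANKS = 10, 20, 3          # three 10-unit case-role banks
    rng = random.Random(0)
    W_hidden = [[rng.uniform(-1, 1) for _ in range(WORD + HIDDEN)] for _ in range(HIDDEN)]
    W_output = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(ROLE_BANKS * WORD)]

    def parse(word_sequence):
        context = [0.0] * HIDDEN                  # Elman context: previous hidden state
        hidden = context
        for word_bank in word_sequence:           # one 10-unit word bank per step
            hidden = layer(word_bank + context, W_hidden)
            context = hidden
        return layer(hidden, W_output)            # case-role banks after the last word

    sentence = [[rng.random() for _ in range(WORD)] for _ in range(4)]
    print(len(parse(sentence)))                   # 30 units = 3 case-role banks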

6. Related Research

Our XRAAMs are based on Hinton's reduced description idea (Hinton, 1988) and Pollack's RAAMs (Pollack, 1988). RAAM is a PDP architecture which can devise compositional, similarity-based, recursive PDP representations. The resulting RDRs (recursive distributed representations) (Pollack, 1990) can encode variable-sized recursive and sequential data structures, such as trees and stacks, in fixed-resource systems. It is therefore argued (Pollack, 1990) that RDRs can form a bridge between


the data structures necessary for high-level cognitive tasks and the associative, pattern-recognition machinery provided by neural networks, since they combine aspects of features, pointers and structures. RAAMs can be used to encode parse trees, and so can be applied to syntactic-level processing (Chalmers, 1990).

RAAMs, however, lack external storage for the representations they form, so the resulting RDRs are not global. RAAMs also lack word-level semantics in their representations. So at present, RAAM applications are restricted to syntax processing, and to our knowledge there are no conceptual-level applications of RAAMs. DSRs (using XRAAMs), in contrast, are learned from propositions, so they carry word-level semantics and can be applied to conceptual information processing, as has been done in DYNASTY and DYNASTYII (Lee, 1990b). What is needed for RDRs, then, is another level of RAAM-style architecture to develop semantic-level representations that retain all the strong features of RDRs, such as the DSRs in DYNASTY.

FGREP (forming global representations using extended backpropagation) (Miikkulainen & Dyer, 1988) is a mechanism for developing global distributed representations using the same kind of symbol recirculation ideas used in the XRAAM architecture. FGREP's representations are optimal for the processing tasks because the system learns the representations at the same time as it is trained to perform those tasks. The basic idea of FGREP is to extend the error signal to the input layer (during backpropagation learning) to modify the input representations as if they were weights. The advantage of FGREP representations is that they are optimal for the given tasks and have similarity properties, i.e. representations which are used similarly in the tasks end up being similar to each other.

However, FGREP representations are developed during specific task processing, so performance becomes poor when FGREP is applied to unrelated tasks. When the architectures for different tasks are combined, a single set of FGREP representations can be developed during the combined processing tasks, providing optimal performance for several different tasks (Miikkulainen & Dyer, 1989). DSRs, however, are learned independently of any particular processing task, so the representations should be useful in any task requiring access to the propositional content of word meanings.

DISPAR (DIStributed PARaphraser) (Miikkulainen & Dyer, 1989) is a PDP-level system, similar to DYNASTY, which reads partial script-based stories and paraphrases them as causally complete output stories using FGREP representations. After paraphrasing the stories, DISPAR ends up developing new representations which are optimal for the paraphrasing task. DISPAR uses a global lexicon which is the same as our global dictionary, but DISPAR's global lexicon contains not only word concepts but also script and role representations (Miikkulainen & Dyer, 1989).

DISPAR has a flat structure, i.e. it has a fixed number of banks in each network layer according to the predefined number of case-roles or script-roles, so the layer representations become very sparse, since most of the case-roles and script-roles in the system are empty in any given representation. The FGREP representations developed in DISPAR are too similar to each other, since there is no way to control the similarity of two words when they are used in exactly the same way in a task. So DISPAR introduces a cloning mechanism to develop instance representations (e.g. John) from generic concepts (e.g. human) by attaching random, fixed identification (ID) bits to the FGREP representations. Unfortunately, it turns out that 90% of the training time is actually spent learning the identification parts of the representations, since BP is very inefficient at copying static patterns to different places (Miikkulainen & Dyer, 1989). Moreover, attaching a fixed ID part to


the representations tends to undo FGREP's effort toward similarity-based representations, and so decreases the optimality of the FGREP representations. In the DSR case, by contrast, we can control the similarity between two words by adding more propositions, and there is no need to attach an orthogonal identification part to the DSRs in DYNASTY.

DCPS (distributed connectionist production system) (Touretzky & Hinton, 1988) is a connectionist interpreter for a restricted class of production systems, based on coarse-coded distributed representations and the Boltzmann machine learning algorithm. The condition part of each production rule consists of two triples of terms, which are either ground or of the form [x A B], where x is a variable. The working memory of DCPS is a set of coarse-coded binary state units which represent a set of triples. To find out which condition part of a production rule matches the working memory, a special set of units called the clause-space is introduced: one for the first and one for the second triple in the condition part of a production rule. For variable binding, another set of coarse-coded units called the bind-space is used. In the bind-space, a bind unit representing a constant, say 'A', is connected to all units in the clause-space which represent triples whose first element is 'A'.

DCPS is successful for an extremely limited form of rules (i.e. one variable in a fixed position in the triple). It is not obvious how the technique can be applied if the position of the variables in the rule is changed and/or if the working memory also contains variables. Moreover, their coarse-coded representations require a large amount of human effort, and complex access mechanisms such as clause-spaces. In contrast, DSRs are automatically learned from propositions and need relatively simple access mechanisms, such as XRAAMs and the GD. We cannot easily adopt their variable-binding mechanism, since their bind-space units are binary while our DSRs are continuous representations.

CRAM (Dolan & Dyer, 1987; Dolan, 1989) is a symbolic/PDP hybrid system which is able to read single-paragraph, fable-like stories and either give a thematically relevant summary or generate planning advice for a character in the story. CRAM is implemented using special PDP networks called tensor manipulation networks, in which the operation of the network is interpreted as manipulations of high-rank tensors (generalized vector outer products) (Dolan & Smolensky, 1989; Smolensky, 1987b). The operations on tensors are in turn interpreted as operations on symbol structures. CRAM and DYNASTY both exploit functional design approaches, in which functional modules are defined first and then replaced with PDP architectures. CRAM's role-binding architecture uses conjunctive cube coding, which stores several (schema, role, filler) triples in one cube, later allowing the system to retrieve fillers when the schemata and roles are given, using complex PDP architectures employing tensor representations.

CRAM's disadvantage lies in its representations: it uses carefully designed, almost orthogonal micro-features to prevent cross-talk (Feldman & Ballard, 1982). Since tensor manipulation networks cannot handle non-orthogonal, similarity-based representations without running into cross-talk, CRAM actually has to have additional circuits to reduce cross-talk for satisfactory performance, even with the micro-feature representations. So if we want to use automatically learned similarity-based representations (which are usually non-orthogonal) such as DSRs, then tensor-based architectures such as CRAM usually break down.

DUCS (Touretzky & Geva, 1988) is a distributed connectionist schema processing system which emphasizes a concept and role inheritance mechanism. A concept is a set of slot/filler pairs, and each slot can have only one value. Unlike other distributed connectionist schema processing systems such as CRAM, DUCS uses distributed


representations for both slot names and slot fillers. So DUCS can encode fine semantic distinctions as subtle variations on the canonical pattern for a slot. For example, in a bird schema, if we ask about the nose of the bird instead of the beak, DUCS can still answer the question. In the same spirit, DYNASTY's script roles (e.g. customer) also use distributed representations, so DYNASTY can likewise encode fine semantic distinctions between script-specific states such as restaurant-customer and market-customer.

7. Summary

We have proposed a new method for developing distributed semantic representations (DSRs) to serve as an adequate foundation for constructing and manipulating conceptual knowledge, and have presented an architecture, based on XRAAMs, for automatically developing DSRs. Our experiments indicate that DSRs show many desirable properties, such as automaticity, portability, structure-encoding ability and similarity-based distributed representations.

We have shown that DSRs can serve as building blocks in constructing symbolic/connectionist cognitive architectures by (1) building similarity-based automatic distributed representations and (2) actually constructing architectures to perform script-based story processing tasks using recurrent PDP modules.

DYNASTY, a modular symbolic/connectionist system for high-level inferencing, can (1) automatically form distributed representations of concepts (words) and events from the input sentences in the domain of script-based story understanding, (2) generate complete script event sequences from fragmentary inputs, and (3) successfully bind the roles in the script for the unstated events in the input. Moreover, the high-level representations formed (DSRs of concepts and events) contain constituent structure that can be decoded and extracted, making their semantic content available for multiple tasks. Finally, the DSRs formed for concepts, script-roles and events that have similar semantics end up with similar representations.

Notes

1. When input patterns are used as teaching patterns, BP can be considered an unsupervised learning algorithm, since we do not need a separate teaching pattern for each input.

2. An advantage of the XRAAM network is that we can decode the constituent structures from the representations by using the same XRAAM network.

3. By event case-triples, we mean structures such as [ev10, ACT, entered], [ev10, AGENT, John], and [ev10, TO, Chart-House]. These three triples form one event, i.e. 'ev10: John entered the Chart-House'.

4. Note that these representations come from the event-encoder module using the event triples.

5. The non-intuitive clustering of left-for and home occurs because they are under-defined in the training data. In this case, they are merged at step 12 (that is, they are not that similar; see Figure 13), so this anomalous clustering does not affect the system's performance.

6. Even though the binding-table was implemented using von Neumann-style addressing, the entries (script-role and instance) are DSR patterns, not symbols. So accessing the table requires matching the minimum-distance pattern.

References

Allen, R.B. (1988) Sequential connectionist networks for answering simple questions about a microworld. In Proceedings of the Tenth Annual Cognitive Science Society Conference, pp. 489-495. Hillsdale, NJ: Erlbaum.

Bower, G.H., Black, J.B. & Turner, T.J. (1979) Scripts in memory for text. Cognitive Psychology, 11, 177-220.


Chalmers, D.J. (1990) Syntactic transformations on distributed representations. Connection Science, 2, 53-63.

Chun, H.W. & Mimo, A. (1987) A model of schema selection using marker passing and connectionist spreading activation. In Proceedings of the Ninth Annual Cognitive Science Society Conference, pp. 887-896. Hillsdale, NJ: Erlbaum.

Cullingford, R.E. (1978) Script application: computer understanding of newspaper stories. PhD thesis, Department of Computer Science, Yale University. Technical Report 116.

Dolan, C.P. (1989) Tensor manipulation networks: connectionist and symbolic approaches to comprehension, learning and planning. PhD thesis, Computer Science Department, UCLA.

Dolan, C.P. & Dyer, M.G. (1987) Symbolic schemata, role binding, and the evolution of structure in connectionist memories. In Proceedings of the IEEE First Annual International Conference on Neural Networks, Vol. 2, pp. 287-298. IEEE.

Dolan, C.P. & Smolensky, P. (1989) Implementing a connectionist production system using tensor products. In D. S. Touretzky, G. E. Hinton & T. J. Sejnowski (Eds) Proceedings of the 1988 Connectionist Models Summer School, pp. 265-272. Los Altos, CA: Morgan Kaufmann.

Dyer, M.G. (1990) Symbolic NeuroEngineering for natural language processing: a multilevel research approach. In J. Barnden & J. Pollack (Eds) Advances in Connectionist and Neural Computation Theory. Norwood, NJ: Ablex (in press).

Dyer, M.G., Cullingford, R.E. & Alvarado, S. (1987) Scripts. In S. C. Shapiro (Ed.) Encyclopedia of Artificial Intelligence, pp. 980-994. New York: Wiley.

Dyer, M.G., Flowers, M. & Wang, A. (1988) Weight matrix = pattern of activation: encoding semantic networks as distributed representations in DUAL, a PDP architecture. Technical Report UCLA-AI-88-5, Artificial Intelligence Laboratory, Computer Science Department, University of California, Los Angeles.

Dyer, M.G., Flowers, M. & Wang, A. (1989) Distributed symbol discovery through symbol recirculation: toward natural language processing in distributed connectionist networks. In R. Reilly & N. Sharkey (Eds) Connectionist Approaches to Natural Language Understanding. Hillsdale, NJ: Erlbaum (in press).

Elman, J.L. (1988) Finding structure in time. Technical Report 8801, Center for Research in Language, University of California, San Diego.

Feldman, J.A. (1986) Neural representation of conceptual knowledge. Technical Report TR 189, Department of Computer Science, University of Rochester, NY.

Feldman, J.A. & Ballard, D.H. (1982) Connectionist models and their properties. Cognitive Science, 6, 205-254.

Fillmore, C.J. (1968) The case for case. In E. Bach & R. T. Harms (Eds) Universals in Linguistic Theory, pp. 1-90. New York: Holt, Rinehart & Winston.

Fodor, J. & Pylyshyn, Z. (1988) Connectionism and cognitive architecture: a critical analysis. Cognition, 28, 3-71.

Golden, R.M. (1986) Representing causal schemata in connectionist systems. In Proceedings of the Eighth Annual Cognitive Science Society Conference, pp. 13-22. Hillsdale, NJ: Erlbaum.

Hanson, S.J. & Kegl, J. (1987) Parsnip: a connectionist network that learns natural language grammar from exposure to natural language sentences. In Proceedings of the Ninth Annual Cognitive Science Society Conference, pp. 106-119. Hillsdale, NJ: Erlbaum.

Hartigan, J.A. (1975) Clustering Algorithms. New York: Wiley.

Hinton, G.E. (1986) Learning distributed representations of concepts. In Proceedings of the Eighth Annual Cognitive Science Society Conference, pp. 2-12. Hillsdale, NJ: Erlbaum.

Hinton, G.E. (1988) Representing part-whole hierarchies in connectionist networks. In Proceedings of the Tenth Annual Cognitive Science Society Conference, pp. 48-54. Hillsdale, NJ: Erlbaum.

Hinton, G.E., McClelland, J.L. & Rumelhart, D.E. (1986) Distributed representations. In D. E. Rumelhart & J. L. McClelland (Eds) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. I, Foundations, pp. 77-109. Cambridge, MA: MIT Press.

Lee, G. (1990a) Distributed semantic representations for goal/plan analysis of natural language stories in a connectionist architecture. PhD thesis, Computer Science Department, University of California, Los Angeles (in preparation).

Lee, G. (1990b) DYNASTYII: a neural network model of a goal-based story processing system. Unpublished research report, Computer Science Department, UCLA.

Lee, G., Flowers, M. & Dyer, M.G. (1989a) Learning distributed representations of conceptual knowledge. Technical Report UCLA-AI-89-13, Artificial Intelligence Laboratory, Computer Science Department, University of California, Los Angeles.

Lee, G., Flowers, M. & Dyer, M.G. (1989b) A symbolic/connectionist script applier mechanism. In Proceedings of the Eleventh Annual Cognitive Science Society Conference, pp. 714-721. Hillsdale, NJ: Erlbaum.


McClelland, J.L. & Kawamoto, A.H. (1986) Mechanisms of sentence processing: assigning roles to constituents. In J. L. McClelland & D. E. Rumelhart (Eds) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. II, Psychological and Biological Models, pp. 272-326. Cambridge, MA: MIT Press.

Mead, C. (1987) Silicon models of neural computation. In Proceedings of the IEEE First Annual International Conference on Neural Networks. IEEE.

Miikkulainen, R. (1990) A distributed feature map model of the lexicon. Technical Report UCLA-AI-90-04, Artificial Intelligence Laboratory, Computer Science Department, University of California, Los Angeles.

Miikkulainen, R. & Dyer, M.G. (1988) Forming global representations with extended back-propagation. In Proceedings of the IEEE Second Annual International Conference on Neural Networks, Vol. 1, pp. 285-292. IEEE.

Miikkulainen, R. & Dyer, M.G. (1989) A modular neural network architecture for sequential paraphrasing of script-based stories. In Proceedings of the International Joint Conference on Neural Networks, Vol. 2, pp. 49-56. IEEE.

Pollack, J.B. (1988) Recursive auto-associative memory: devising compositional distributed representations. Technical Report MCCS-88-124, Computing Research Laboratory, New Mexico State University.

Pollack, J.B. (1990) Recursive distributed representations. Artificial Intelligence, 45, Special issue on connectionist symbol processing.

Rumelhart, D.E., Hinton, G.E. & Williams, R.J. (1986a) Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. I, Foundations, pp. 318-362. Cambridge, MA: MIT Press.

Rumelhart, D.E., McClelland, J.L. & the PDP Research Group (1986b) Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Cambridge, MA: MIT Press.

Rumelhart, D.E., Smolensky, P., McClelland, J.L. & Hinton, G.E. (1986c) Schemata and sequential thought processes in PDP models. In J. L. McClelland & D. E. Rumelhart (Eds) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. II, Psychological and Biological Models, pp. 7-57. Cambridge, MA: MIT Press.

Schank, R. (1973) Identification of conceptualization underlying natural language. In R. Schank & R. Colby (Eds) Computer Models of Thought and Language, pp. 187-248. San Francisco, CA: Freeman.

Schank, R. & Abelson, R. (1977) Scripts, Plans, Goals, and Understanding: an Inquiry into Human Knowledge Structures. The Artificial Intelligence Series. Hillsdale, NJ: Erlbaum.

Schank, R. & Riesbeck, C.K. (Eds) (1981) Inside Computer Understanding. Hillsdale, NJ: Erlbaum.

Smolensky, P. (1987a) A method for connectionist variable binding. Technical Report CU-CS-356-87, Department of Computer Science and Institute of Cognitive Science, University of Colorado, Boulder.

Smolensky, P. (1987b) On variable binding and the representation of symbolic structures in connectionist systems. Technical Report CU-CS-355-87, Department of Computer Science and Institute of Cognitive Science, University of Colorado, Boulder.

Smolensky, P. (1988) On the proper treatment of connectionism. Behavioral and Brain Sciences, 11, 1-74.

St John, M.F. & McClelland, J.L. (1989) Applying contextual constraints in sentence comprehension. In D. S. Touretzky, G. E. Hinton & T. J. Sejnowski (Eds) Proceedings of the 1988 Connectionist Models Summer School, pp. 338-346. Los Altos, CA: Morgan Kaufmann.

Touretzky, D.S. & Geva, S.A. (1988) A distributed connectionist representation for concept structures. In Proceedings of the Tenth Annual Cognitive Science Society Conference, pp. 155-163. Hillsdale, NJ: Erlbaum.

Touretzky, D.S. & Hinton, G.E. (1988) A distributed connectionist production system. Cognitive Science, 12, 423-436.

Waltz, D.L. & Pollack, J.B. (1985) Massively parallel parsing: a strongly interactive model of natural language interpretation. Cognitive Science, 9, 51-74.

Wilensky, R. (1978) Understanding goal-based stories. PhD thesis, Computer Science Department, Yale University.
