42
LANGUAGE AND COGNITIVE PROCESSES, 1997, 12 (5/6), 765–805 Requests for reprints should be addressed to David C. Plaut, Mellon Institute 115-CNBC, Carnegie Mellon University, 4400 Fifth Avenue, Pittsburgh, PA 15213-2683, USA. E-mail: plaut6cmu.edu This work was supported nancially by the NIH/NIMH (Grant MH47566) and by the McDonnell-Pew Program in Cognitive Neuroscience (Grant T89-01245-016). I thank Marlene Behrmann, Matt Lambon Ralph, Jay McClelland, Karalyn Patterson, Mark Seidenberg and an anonymous reviewer for helpful comments and/or discussions. q 1997 Psychology Press Ltd Structure and Function in the Lexical System: Insights from Distributed Models of Word Reading and Lexical Decision David C. Plaut Departments of Psychology and Computer Science, Carnegie Mellon University, and the Center for the Neural Basis of Cognition, Pittsburgh, PA, USA The traditional view of the lexical system stipulates word-speci c representations and separate pathways for regular and exception words. An alternative approach views lexical knowledge as developing from general learning principles applied to mappings among distributed representations of written and spoken words and their meanings. On this distributed account, distinctions among words, and between words and nonwords, are not rei ed in the structure of the system but re ect the sensitivity of learning to the relative systematicity in the various mappings. Two computational simulations address ndings that have seemed problematic for the distributed approach. Both involve a consideration of the role of semantics in normal and impaired lexical processing. The rst simulation accounts for patients with impaired comprehension but intact reading in terms of individual differences in the division of labour between the semantic and phonological pathways. The second simulation demonstrates that a distributed network can reliably distinguish words from nonwords based on a measure of familiarity de ned over semantics. The results underscore the importance of relating function to structure in the lexical system within the context of an explicit computational framework.

Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

Embed Size (px)

Citation preview

Page 1: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

LANGUAGE AND COGNITIVE PROCESSES 1997 12 (56) 765ndash805

Requests for reprints should be addressed to David C Plaut Mellon Institute 115-CNBCCarnegie Mellon University 4400 Fifth Avenue Pittsburgh PA 15213-2683 USA E-mailplaut6cmuedu

This work was supported nancially by the NIHNIMH (Grant MH47566) and by theMcDonnell-Pew Program in Cognitive Neuroscience (Grant T89-01245-016) I thank MarleneBehrmann Matt Lambon Ralph Jay McClelland Karalyn Patterson Mark Seidenberg and ananonymous reviewer for helpful comments andor discussions

q 1997 Psychology Press Ltd

Structure and Function in the Lexical SystemInsights from Distributed Models of Word Reading

and Lexical Decision

David C Plaut

Departments of Psychology and Computer Science Carnegie MellonUniversity and the Center for the Neural Basis of Cognition Pittsburgh

PA USA

The traditional view of the lexical system stipulates word-specicrepresentations and separate pathways for regular and exception words Analternative approach views lexical knowledge as developing from generallearning principles applied to mappings among distributed representations ofwritten and spoken words and their meanings On this distributed accountdistinctions among words and between words and nonwords are not reied inthe structure of the system but reect the sensitivity of learning to the relativesystematicity in the various mappings Two computational simulations addressndings that have seemed problematic for the distributed approach Bothinvolve a consideration of the role of semantics in normal and impaired lexicalprocessing The rst simulation accounts for patients with impairedcomprehension but intact reading in terms of individual differences in thedivision of labour between the semantic and phonological pathways Thesecond simulation demonstrates that a distributed network can reliablydistinguish words from nonwords based on a measure of familiarity denedover semantics The results underscore the importance of relating function tostructure in the lexical system within the context of an explicit computationalframework

766 PLAUT

INTRODUCTION

Word reading is one of the most extensively investigated areas withincognitive psychology A tremendous amount of data has been gathered onhow various properties of a written word (frequency of occurrence age ofacquisition numbers of letters and phonemes orthographic neighbourhoodspellingndashsound regularity and consistency imageability meaningfulness prior context etc) inuence the speed and accuracy with which a word isunderstood and pronounced by normally and abnormally developingreaders skilled readers and brain-damaged patients with selective readingimpairments Developing a comprehensive theory of lexical processing thatcan account for this full range of ndings is a considerable challenge onethat has yet to be met

A natural way of accounting for the effects of some variable or distinctionon performance is to build the distinction directly into the structure of amodel A good example is the status of a letter string as a word Fewresearchers would argue against the relevance to understanding humanlanguage performance of the distinction between letter strings that arewords and those that are not The fact that in most contexts skilled readersare fast and accurate at discriminating written words from nonwords withrelatively little practice indicates that there must be some difference in howthe reading system responds to words and nonwords The simplest way ofincorporating this distinction is to introduce words as explicit structures in amodel Well-known examples of such structures are Mortonrsquos (1969)logogens and McClelland and Rumelhartrsquos (1981) word units in theInteractive Activation model In a model with word-specic representationswords can be distinguished from nonwords simply by whether one of themreaches threshold or becomes fully active Moreover it becomesstraightforward to account for effects of lexical variables by manipulat-ing the word representations to reect the variable explicitly Forexample effects of word frequency can be incorporated directly by settingthe resting level of the logogen or word unit to be proportional tofrequency

Distinctions among words also lead researchers to postulate additionalstructure in the lexical system The classic example is the distinction betweenregular words and exception words Many researchers assume thatknowledge of spellingndashsound correspondences is encoded in terms of anexplicit set of rules The rules need to be generalmdashfor example based ongraphemendashphoneme correspondencemdashto support effective pronunciationof unfamiliar word-like letter strings (eg M AVE RIN T) However in alanguage like English in which spellingndashsound correspondences are onlypartially systematic such rules will apply to only a subset of wordsmdashso-called regular words (eg GAVE H INT) The remaining exception words (eg

MODELS OF WORD READING AND LEXICAL DECISION 767

HAVE PIN T) are mispronouncedmdashregularised mdashby the rules Given thatskilled readers can pronounce such words correctly they must be using someother mechanism with word-specic knowledge like the logogens or wordunits In this way a given set of spellingndashsound correspondences rulesdenes a sharp dichotomy between the items which obey the rules and theitems which violate them theories that reify this dichotomy as two separateprocessing mechanisms are thus termed ldquodual-routerdquo theories (eg Besneramp Smith 1992 Coltheart 1978 1985 Coltheart Curtis Atkins amp Haller1993 Marshall amp Newcombe 1973 Meyer Schvaneveldt amp Ruddy 1974Morton amp Patterson 1980 Paap amp Noel 1991) Note that most such theoriesfurther subdivide the lexical mechanism into semantic and non-semanticcomponents to account for the conditions under which normal and impairedword reading is (or is not) inuenced by word meaning Thus despite thename the essence of a dual-route theory is not that it has exactly twopathways from print to sound rather it is the claim that the mechanism thatpronounces nonwords is functionally distinct from and operates accordingto different principles than the mechanism that pronounces exceptionwords

Instead of accounting for effects of lexical variables by manipulating thestructure of a model directly an alternative theoretical approach is to startwith a set of general assumptions on the nature of representation processingand learning in the cognitive system and to show how when instantiated toperform specic tasks these assumptions give rise to the relevantbehavioural effects This is the perspective adopted by researchersemploying connectionistparallel distributed processing (PDP) models oflexical processing (eg Plaut McClelland Seidenberg amp Patterson 1996Seidenberg amp McClelland 1989 Van Orden Pennington amp Stone 1990Van Orden amp Goldinger 1994) For example in the framework proposed bySeidenberg and McClelland (1989 see Fig 1) orthographic phonologicaland semantic information is represented in terms of distributed patterns ofactivity over separate groups of simple neuron-like processing units Withineach domain similar words are represented by similar patterns of activityLexical tasks involve transformations between these representations forexample oral reading requires the orthographic pattern for a word togenerate the appropriate phonological pattern Such transformations areaccomplished via the cooperative and competitive interactions among unitsincluding additional hidden units that mediate between the orthographicphonological and semantic units In processing an input units interact untilthe network as a whole settles into a stable pattern of activitymdashtermedan attractormdashcorresponding to its interpretation of the input Unitinteractions are governed by weighted connections between them whichcollectively encode the systemrsquos knowledge about how the different typesof information are related Weights that give rise to the appropriate

768 PLAUT

FIG 1 A connectionist framework for lexical processing based on that of Seidenberg andMcClelland (1989)

transformations are learned on the basis of the systemrsquos exposure to writtenwords spoken words and their meanings

As Fig 1 makes clear the distributed connectionist approach does notentail a complete lack of structure within the lexical system But thedistinctions that are relevant relate to the different types of information thatmust be coordinated orthographic phonological and semantic Given thatsuch information may be based on input from different modalities (at leastfor the surface forms) it is natural to assume that the correspondingrepresentationsmdashand hence the pathways between themmdashareneuroanatomically distinct In fact such divisions play an important role inaccounting for data on the selective effects of brain damage on readingHowever irrespective of these distinctions among types of representationsthere is a uniformity in the processing mechanisms by which they are derivedand interact In this way the distributed connectionist approach isfundamentally at odds with the core tenet of dual-route theories

Seidenberg and McClelland (1989) provided computational support forthe claim that word-specic representations and separate rule-based versuslexical processing routes are unnecessary to account for skilled oral readingThey trained a connectionist network (corresponding to the bottom portionof Fig 1 termed the phonological pathway) to map from the orthography of2897 monosyllabic English wordsmdashboth regular and exceptionmdashto theirphonology After training the network pronounced 977 of the wordscorrectly including most exception words The network also exhibited thestandard empirical pattern of an interaction of frequency and consistency innaming latency (Andrews 1982 Seidenberg Waters Barnes amp Tanenhaus1984 Taraban amp McClelland 1987 Waters amp Seidenberg 1985) if its

MODELS OF WORD READING AND LEXICAL DECISION 769

real-valued accuracy in generating correct responses was taken as a proxyfor reaction time

The theoretical impact of Seidenberg and McClellandrsquos model ishowever undermined by certain inadequacies in its match to humanperformance particularly in three respects First the model was much worsethan skilled readers at pronouncing orthographically legal nonwords(Besner Twilley McCann amp Seergobin 1990) Secondly it was unable toperform lexical decision accurately under many conditions (Besner et al1990 Fera amp Besner 1992) Thirdly it failed to exhibit central aspects ofuent surface dyslexia when damaged (Patterson Seidenberg ampMcClelland 1989) specically the frequency 3 consistency interaction innaming accuracy the high regularisation rates and (again) the near-perfectnonword reading (see Patterson Coltheart amp Marshall 1985)

More recently Plaut et al (1996) demonstrated that networkimplementations of the phonological pathway can learn to pronounceregular and exception words and yet also pronounce nonwords as well asskilled readers if they use more appropriately structured orthographic andphonological representations (based on graphemes and phonemes andembodying phonotactic and graphotactic constraints) Plaut and colleaguesalso provided a more adequate account of surface dyslexia based on thenotion that semantics plays an important role in skilled reading Specicallythe support for word pronunciations from the semantic pathway relieves thephonological pathway from having to master low-frequency exceptionwords by itself The combination of the two pathways is fully competent inskilled readers but following damage to the semantic pathway the limitedcompetence of the isolated phonological pathway manifests as surfacedyslexia

In the current work additional simulations are presented that furtherdevelop the role of semantics to address two remaining problematic issuesfor the distributed connectionist approach to lexical processing The rstconcerns a nding that would seem to challenge the above account of surfacedyslexia namely that some brain-damaged patients have substantialsemantic impairments but nonetheless can read low-frequency exceptionwords accurately (WLP Schwartz Marin amp Saffran 1979 MB Raymeramp Berndt 1994 DRN Cipolotti amp Warrington 1995 DC LambonRalph Ellis amp Franklin 1995) The rst set of simulations demonstrates thatsuch patients are natural consequences of expected individual differences inthe division of labour between the semantic and phonological pathwaysunder parametric variations even when the majority of individuals withsemantic damage would be expected to exhibit some degree of surfacedyslexia (Graham Hodges amp Patterson 1994 Patterson amp Hodges 1992)The second issue is the ability of distributed networks to perform lexicaldecision accurately A second simulation involving a full (although

770 PLAUT

feedforward) implementation of the framework in Fig 1 demonstrates thata network that maps orthography to semantics directly and via phonologycan reliably distinguish words from nonwords based on a measure offamiliarity dened over semantics

As background for the current work the next section presents a briefoverview of the approach taken by Plaut et al (1996) in accounting for theeffects of frequency and consistency in skilled word and nonword readingand in uent surface dyslexia The two simulation studies are then presentedfollowed by a general discussion

AN ACCOUNT OF FREQUENCY ANDCONSISTENCY EFFECTS IN NORMAL AND

IMPAIRED WORD READING

Seidenberg and McClelland (1989) based their orthographic andphonological representations on context-sensitive triples of letters andphonemic features (Wickelgren 1969) A central insight offered by Plaut etal (1996) was that such representations suffer from what was termed thedispersion problem and that it was this problem rather than any moregeneral failing of connectionist networks that lead to the poor nonwordreading performance of Seidenberg and McClellandrsquos model Specicallythe triples-based representations disperse the regularities between spellingand sound thereby hindering generalisation For example in theorthographic representation of the word GAVE the contribution made by theletter A depends entirely on the presence of the G and V there is no similarityto the contribution made by the A in SAVE or GATE and hence no basis for thetraining on the pronunciation of A in GAVE to support the samepronunciation in SAVE or G ATEmdashor in a nonword like M AVE

Plaut and colleagues hypothesised that the human reading systemgeneralises effectively because it uses representations that condensespellingndashsound regularities To support this hypothesis they designedorthographic and phonological representations in which within eachconsonant and vowel cluster letters and phonemes almost always activatedthe same units irrespective of context The representations were notintended to be fully general but rather to minimise the degree to whichspellingndashsound correspondences were dispersed across different units fordifferent words Plaut and colleagues found that when networks weretrained on essentially the same corpus as Seidenberg and McClellandrsquosmodel but using the new representations they were able to pronounce allthe words correctly and yet still generalise effectively to nonwordsMoreover the networks exhibited the empirical frequency 3 consistencyinteraction pattern when trained on actual (as opposed to logarithmically

MODELS OF WORD READING AND LEXICAL DECISION 771

FIG 2 (a) The frequency 3 consistency interaction exhibited in the settling time of anattractor network implementation of the phonological pathway in pronouncing words ofvarying frequency and spellingndashsound consistency (Plaut et al 1996 simulation 3) (b) Itsexplanation in terms of additive contributions of frequency and consistency subject to anasymptotic activation functionmdash s [ in equation (1) only the top half of which is shown FromPlaut et al (1996)

compressed) word frequencies even when naming latencies were modelleddirectly by the settling time of a recurrent attractor network (see Fig 2a)

Importantly Plaut et al (1996) went beyond providing only empiricaldemonstrations that networks could reproduce accuracy and latency data onword and nonword reading to offer a mathematical analysis of the critical

772 PLAUT

factors that govern why the networks (and by hypothesis subjects) behaveas they do This analysis was based on a network that while simpler than theactual simulationsmdashit had no hidden units and employed Hebbian ratherthan error-correcting learningmdashretained many of the essentialcharacteristics of the more general framework For this simplied networkPlaut and colleagues derived an expression for how the response of thenetwork to any input (test) pattern depended on its experience with everypattern on which the network was trained as a function of the patternrsquosfrequency of training its similarity with the test pattern and the consistencyof its output with that of the test pattern Specically the response s[t] of anyj

output unit j to a given test pattern t is given by

(1)

in which the standard smooth non-linear sigmoidal inputndashoutput functionfor each unit s [ is applied to the sum of three terms (1) the cumulativefrequency of training on the pattern t itself F[t] (2) the sum of the frequenciesF[f] of the ldquofriendsrdquo of pattern t (patterns trained to produce the sameresponse for unit j) each weighted by its similarity (overlap) with t O [ft] and(3) minus the sum of the frequencies F[e] of the ldquoenemiesrdquo of pattern t(patterns trained to produce the opposite response) each weighted by itssimilarity to t O [et]

Many of the basic phenomena in word reading can be seen as naturalconsequences of adherence to this ldquofrequencyndashconsistencyrdquo equationFactors that increase the summed input to units (eg word frequencyspellingndashsound consistency) improve performance as measured by namingaccuracy andor latency but their contributions are subject to ldquodiminishingreturnsrdquo due to the asymptotic nature of the activation function s [ (seeFig 2b) As a result performance on stimuli that are strong in one factor isrelatively insensitive to variation in other factors Thus regular words showlittle effect of frequency and high-frequency words show little effect ofconsistency giving rise to the standard pattern of interaction betweenfrequency and consistency in which the naming of low-frequency exceptionwords is disproportionately slow or inaccurate

It is important to note however that equation (1) is only approximate fornetworks with hidden units and trained by error-correcting algorithms likeback-propagation (Rumelhart Hinton amp Williams 1986a) These twoaspects of the Plaut et al (1996) simulations are critical in that they help toovercome interference from enemies (ie the negative terms in equation 1)thereby enabling the networks to achieve correct performance on exceptionwordsmdashthat is words with many enemies and few if any friendsmdashas well ason regular words and nonwords

MODELS OF WORD READING AND LEXICAL DECISION 773

Although Plaut and colleagues demonstrated that implementations of thephonological pathway on its own can learn to pronounce words andnonwords as well as skilled readers a central aspect of their general theory isthat skilled reading more typically requires the combined support of boththe semantic and phonological pathways (for similar proposals see Hillis ampCaramazza 1991 Van Orden amp Goldinger 1994) and that individuals maydiffer in the relative competence of each pathway (Seidenberg 1992)Certainly semantic involvement is necessary to pronounce homographs likeWIN D and READ correctly Furthermore a semantic variablemdashimageabilitymdashinuences the strength of the frequency 3 consistency interaction in thenaming latencies and errors of skilled readers (Strain Patterson ampSeidenberg 1995) Moreover brain damage that impairs lexical semanticsmdashtypically to the left temporal lobemdashcan lead to surface dyslexia (seePatterson et al 1985) In its purest ldquouentrdquo form (eg MP Behrmann ampBub 1992 Bub Cancelliere amp Kertesz 1985 KT McCarthy ampWarrington 1986 HTR Shallice Warrington amp McCarthy 1983) surfacedyslexic patients read nonwords and regular words with normal accuracyand latency but exhibit an interaction of frequency and consistency in wordreading accuracy that mirrors the interaction shown by normal subjects inlatency That is surface dyslexic patients are disproportionately poor atpronouncing low-frequency exception words often giving a pronunciationconsistent with more standard spellingndashsound correspondences (eg SEW

read as ldquosuerdquo termed a regularisation error) In fact there can be a closecorrelation for individual patients between the lack of comprehension ofexception words and their likelihood of being regularised (Funnell 1996Graham et al 1994 Hillis amp Caramazza 1991) and the surface dyslexicpattern may emerge gradually as lexical semantic knowledge deteriorates inpatients with some types of progressive dementia such as semanticdementia (Graham et al 1994 Patterson amp Hodges 1992 Schwartz et al1979) or dementia of the Alzheimerrsquos type (Patterson Graham amp Hodges1994b)

The distributed connectionist framework depicted in Fig 1 provides anatural formulation of how contributions from both the semantic andphonological pathways might be integrated in oral reading At a moreabstract level given that phonological units simply sum their inputs from thetwo pathways the inuence of the semantic pathway can be incorporatedinto equation (1) by adding an additional term S[t] to the summed input ofunit j [In fact if it is assumed that this term scales with imageability this canaccount for the three-way interaction of frequency consistency andimageability in skilled reading found by Strain et al (1995)] Whenformulated explicitly in connectionist terms however this integration hasimportant implications for the nature of learning in the two pathwaysDuring training to the extent that the contribution of the semantic pathway

774 PLAUT

reduces the overall error the phonological pathway will experience lesspressure to learn On its own it may master only those items it nds easiest tolearn words high in frequency andor consistency (ie those items with largepositive terms in equation 1) low-frequency exception words may never belearned completely As a result brain damage that reduced or eliminated thesemantic pathway would reveal the latent limitation of the phonologicalpathway giving rise to surface dyslexia

In further simulations Plaut and colleagues explored the possibility thatthe surface dyslexic reading pattern might reect the natural limitations ofan intact but isolated phonological pathway that had learned to rely partiallyon semantic support Given that a full implementation of the semanticpathway was beyond the scope of their work they approximated thecontribution that such a pathway would make to the oral reading of eachword during training by providing the output (phoneme) units of theimplemented phonological pathway with external input that pushed themtowards their correct values for each word during training Semanticdamage then was modelled by weakening or removing this external inputPlaut and colleagues found that indeed a phonological pathway trained inthe context of support from semantics exhibited the central phenomena ofsurface dyslexia when semantics was impaired Figure 3 shows how differentdegrees of semantic impairment provide a reasonably good quantitative tto the performance levels of individual surface dyslexic patients

In summary Plaut et al (1996) provided connectionist simulations andmathematical analyses supporting a view of lexical processing in which thedistinctions between words and nonwords and between regular andexception words are not reected in the structure of the system but rather infunctional aspects of its behaviour as it brings all its knowledge to bear inprocessing an input An important insight that emerges from the approach isthat semantic and phonological processing are intimately related over thecourse of reading acquisition in normal skilled performance and in theeffects of brain damage The following two simulation experiments aim toelaborate and clarify the role of semantics in oral reading and other lexicaltasks Although it might be more natural to consider normal performance(in lexical decision) before impaired performance (in surface dyslexia) webegin with the latter issue as the relevant simulations are more closelyrelated to existing work

SIMULATION 1 INDIVIDUAL DIFFERENCES INDIVISION OF LABOUR

Plaut and co-workersrsquo (1996) account of surface dyslexia involves a causalrelationship between an impairment in semantic input to phonology and theoccurrence of (regularisation) errors in reading low-frequency exception

775

FIG 3 (a) The effect of gradual elimination of semantics on the correct performance of Plautand co-workersrsquo (1996) network after 2000 epochs of training with semantics for Taraban andMcClellandrsquos (1987) high-frequency (HF) and low-frequency (LF) regular consistent words(Reg) and exception words (Exc) and for Glushkorsquos (1979) nonwords and approximatepercentage of errors on the exception words that are regularisations (b) Performance of twosurface dyslexic patients (MP Behrmann amp Bub 1992 Bub et al 1985 KT McCarthy ampWarrington 1986) and the network with different amounts of semantic damage From Plaut etal (1996)

776 PLAUT

words Note that such an impairment might arise from damage to semanticsitself or from damage to the mapping from semantics to phonology in whichsemantics is spared Thus it is perfectly consistent with this account thatsome surface dyslexic patients have intact comprehension (see GrahamPatterson amp Hodges 1995 Watt Jokel amp Behrmann 1997) Nonethelessthe semantics-to-phonology impairment should manifest in non-readingtasks such as picture naming Consistent with this expectation all knownsurface dyslexic patients are anomic In fact when the anomia is restricted toabstract concepts as in patient DRB (Franklin Howard amp Patterson1995) reading errors correspondingly occur only on exception words withabstract meanings

Although Plaut and co-workersrsquo account does not require that surfacedyslexic patients have impaired comprehension it does seem to imply thatpatients who do have a semantic impairment should exhibit some degree ofsurface dyslexia Thus the account is challenged by the nding that there arepatients with semantic impairments who nonetheless read low-frequencyexception words with normal or near-normal accuracy The two clearestcases of this are DRN (Cipolotti amp Warrington 1995) and DC (LambonRalph et al 1995) (see also Coslett 1991 Cummings Houlihan amp Hill1986 Friedman Ferguson Robinson amp Sunderland 1992 Raymer ampBerndt 1994 Schwartz et al 1979)

Cipolotti and Warrington (1995) presented data on patient DRN a67-year-old biological scientist with focal atrophy of the temporal lobes(more pronounced on the left) most likely due to Pickrsquos diseaseBehavioural ly he exhibited what is known as semantic dementia orprogressive uent aphasia (Hodges Patterson Oxbury amp Funnell 1992Snowden Goulding amp Neary 1989) Semantic dementia is a relatively pureimpairment in semantic memory as reected by severe anomia and wordcomprehension difculties with relatively few other cognitive impairmentsCipolotti and Warrington compared DRNrsquos ability to give denitions towords and to read them aloud for sets of words that varied in frequency andspellingndashsound consistency We focus here on his performance on low-frequency exception words as these items provide the most sensitivemeasure of surface dyslexia In keeping with his semantic dementia DRNshowed a substantial impairment in generating denitions for low-frequencyexception words producing 39 correct (1128) and 14 correct (321) fortwo sets of words using a very lenient scoring criterion By contrast his oralreading of low-frequency exception words was essentially intact heproduced 95 correct (4648 and 2021) on each of two lists

Lambon Ralph et al (1995) described the performance of patient DCwho had attended school only until the age of 14 At the time of testing DCwas 85 years old and exhibited the clinical symptoms of dementia of theAlzheimerrsquos type (see Schwartz 1990) including very poor episodic memory

MODELS OF WORD READING AND LEXICAL DECISION 777

with uent well-structured speech and occasional word-nding difcultiesLike DRN DC was impaired at dening low-frequency exception wordsbut was within the normal range at reading them aloud In generatingdenitions for the low-frequency exception words on Patterson and Hodgesrsquo(1992) ldquosurfacerdquo list DC was 31 correct (1342) by a lax criterion and57 correct (2442) by a very lax criterion By contrast she was 95 correct(4042) in reading the words aloud Similar dissociations between generatingdenitions and pronunciations were observed for the object labels from thePALPA (Kay Lesser amp Coltheart 1992) and for word lists from Shallice etal (1983) and from Strain et al (1995)

One possible explanation for the lack of surface dyslexia in patients likeDRN and DC is that the task of generating denitions is far moresensitive to semantic or semantics-to-phonology damage than is the task ofreading low-frequency exception words and that surface dyslexia emergesonly with very severe semantic damage In fact inspection of the results fromprogressive deterioration of semantics in Plaut and co-workersrsquo (1996)simulation (Fig 3a) indicates that low-frequency exception word reading isaffected only when semantic strength is reduced to a level of 15 (from amaximum of 50) Consistent with this idea patient WLP (Schwartz et al1979) became impaired at reading exception words only when her semanticdeterioration became very severe (Schwartz Saffran amp Marin 1980)Moreover many uent surface dyslexia patients show evidence ofcomprehension impairments andor anomia that seem more severe thanthose of DRN and DC As examples MP (Bub et al 1985) was at chanceat wordndashword and picturendashword semantic categorisation only very slightlyabove chance at semantic category sorting and exhibited no semanticpriming in word naming and KT (McCarthy amp Warrington 1986) scored 0on the WAIS Vocabulary test could not perform the Peabody PictureVocabulary test (Dunn 1965) could name only 11 (14130) of a subset ofthe Snodgrass and Vanderwart (1980) pictures and was only slightly abovechance at picturendashword matching By comparison on object naming DRNwas 25 correct (520) and DC was 58 correct (128220) However somepatients are no more anomic than DRN and DC but show severe surfacedyslexia For example GC (Patterson Graham amp Hodges 1994a) wascorrect in naming 45 (118260) of the Snodgrass and Vanderwart picturesbut could pronounce correctly only 38 (1642) of low-frequency exceptionwords (cf 95 for both DRN and DC) Thus the severity of semanticimpairment alone does seem to provide a full account of the differences inreading performance among all of the relevant patients

An additional hypothesis is that individual differences in the severityof surface dyslexia can arise not only from differences in the degree ofsemantic damage but also from premorbid differences in the division oflabour and overall competence of the reading system Recall that the

778 PLAUT

simulations of Plaut et al (1996) demonstrate that the phonological pathwayon its own can if necessary master the entire training corpus includinglow-frequency exception words but that support from the semantic pathwayrelieves it from having to do so Thus the ultimate competence of thephonological pathway is a function both of its own learning propertiesmdashitssensitivity to frequency and consistency as expressed by equation (1)mdashandof the strength of the semantic contribution to phonology Both of thesefactors are matters of degree that would be expected to vary acrossindividuals More generally Plaut and co-workers pointed out that thecompetence and division of labour of the reading system would be expectedto be inuenced by a wide variety of factors including the nature of readinginstruction the sophistication of preliterate phonological representationsthe overall amount of reading practice relative experience in reading aloudversus silently the computational resources devoted to each pathway andthe readerrsquos more general skill levels in visual pattern recognition and inspoken word comprehension and production Broadly speaking the overallcompetence of the phonological and semantic pathways together would beexpected to improve gradually with reading experience but that factors thataided the development of one pathway would be expected to limit thecompetence of the other

In Plaut and co-workersrsquo (1996) simulation of surface dyslexia this tradingrelation between the semantic and phonological pathways dependsprimarily on two specic parameters The rst is the asymptotic strengthover training of the external correct input provided to phoneme unitstermed semantic strength which corresponds directly to the competence ofthe putative semantic pathway The second parameter is the magnitude ofthe pressure to keep weights in the phonological pathway small termedweight decay Weight decay is standardly used in connectionist modelling toimprove generalisation (ie nonword reading) by preventing the networkfrom developing arbitrarily large weights in learning the training data Withweight decay the magnitude of each weight becomes proportional to itsimportance in reducing training error (Hinton 1989) Weights can still growlarge however if aspects of the task demand it In fact this property iscritical in the current context if the network is to master exception wordsBecause the pronunciations of such words violate standard spellingndashsoundcorrespondences they typically require the involvement of larger weights tooverride the standard mapping Thus the higher the amount of weightdecay the more difcult it is for the network to learn to pronounce exceptionwords particularly those of low frequency In this way the magnitude ofweight decay controls the base level of competence that can be achieved bythe phonological pathway in isolation There are a number of possibleneurophysiological analogues of weight decay that would be expected tovary across individuals For instance if a connection weight corresponds to

MODELS OF WORD READING AND LEXICAL DECISION 779

the number and strength of synapses between neurons then the magnitudeof weight decay can be thought of as the level of metabolic activity that canbe supported for maintaining synapses Low values of weight decaycorrespond to high levels of metabolic activity and thus a high level ofcompetence in the network

Under this view of individual differences as parametric variations of Plautand co-workersrsquo (1996) simulation patients such as DRN and DC maycorrespond to parameter combinations that lead to a high degree ofcompetence in the phonological pathway The goal of the current simulationwas to determine the effects of parametric variations of semantic strengthand weight decay on the severity of surface dyslexia exhibited by networkimplementations of the phonological pathway following semanticimpairment

Method

The method used in the simulation follows that used by Plaut et al (1996simulation 4) and readers are referred to that paper for additional details Afeedforward network was trained to map from orthographic representationsof 2998 monosyllabic words to their phonological representations Theorthographic representations consisted of 108 grapheme units that codesingle- and multi-letter graphemes for the onset vowel and coda of thewritten form of each word The phonological representations werestructured in an analogous fashion over 61 phoneme units Each of thegrapheme units was connected to each of 100 hidden units which in turnwas connected to each phoneme unit Each of the hidden and phoneme unitsalso had a bias weight that determined its ldquorestingrdquo state in the absence ofother input This bias is equivalent to the weight on a connection from a unitwhose state is always equal to 10 and was learned in the same way as otherweights

During training a word was presented to the network by clamping thestates of units representing graphemes contained in the word to 1 and allothers to 0 The hidden and phoneme units then computed their states basedon this input The state sj of each unit j varied between 0 and 1 as a smoothreal-valued function of the summed weighted input from other units

(2)

where wij is the weight on the connection from unit i to unit j bj is the biasweight of j and exp [ is the exponential function This is the standardsigmoid function s [ referred to in the frequencyndashconsistency equation(equation 1)

780 PLAUT

FIG 4 The magnitude of the additional external input supplied to phoneme units by theputative semantic pathway as a function of training epoch for different values of semanticstrength (g in equation 3) for a frequency f 5 533 corresponding to the average Kucera andFrancis (1967) frequency of the Patterson and Hodges (1992) low-frequency words

The summed input to phoneme units was augmented with an additionalterm S corresponding to the frequency-weighted contribution of thesemantic pathway to phonology

(3)

where f is the Kucera and Francis (1967) frequency of the word and tindexes the training epoch (ie number of presentations of the training cor-pus) The parameter g termed semantic strength determined the asymptoticlevel of input from semantics and was varied from 10 to 50 in the currentsimulation Figure 4 shows the magnitude of this input over time for eachvalue of g using f 5 533 the average Kucera and Francis (1967) frequency ofthe Patterson and Hodges (1992) low-frequency words The sign of thisexternal input was set to drive phoneme units towards their correct states

The error E of the network was measured by the sum of two terms (1)the cross-entropy (see Hinton 1989) between the generated phoneme statessj and the correct or target phoneme states tj for the word and (2) aproportion of the sum of the squares of the weights

(4)

where l is the weight decay parameter and was varied between 000005 and

MODELS OF WORD READING AND LEXICAL DECISION 781

1Note that this manipulation is equivalent in the limit of small weight changes to the morenatural procedure of sampling words for training as a function of their frequency

2This level of correct performance was calculated over only the 14 pairings of semanticstrength and weight decay giving rise to non-perfect performance on the low-frequencyexception words (see Fig 5)

0001 in the current simulation Back-propagation (Rumelhart Hinton ampWilliams 1986b) was used to calculate the derivative of this error withrespect to each weight in the network This derivative was scaled by a factorreecting the frequency of occurrence of the word proportional to thesquare-root of its Kucera and Francis (1967) frequency1 Each of the 2998words was presented in turn and their scaled derivatives were accumulatedAt the end of each epoch the weights were changed in proportion to the sumof the current accumulated derivatives and the past weight step (scaled by amomentum parameter of 09) The weight change was scaled both by a globallearning rate of 0001 and by a connection-specic learning rate that wasinitialised to 10 and adapted using the delta-bar-delta procedure (Jacobs1988) with an additive increment of 01 and a multiplicative decrement of09

For each combination of semantic strength and weight decay the networkwas initialised with the same set of random weights (sampled uniformlybetween 6 01) and trained for 2000 epochs At this point it was subjected toa complete semantic ldquolesionrdquo by removing the external input to thephoneme units The network was then tested for its pronunciations of thehigh- and low-frequency regular and exception words from Patterson andHodgesrsquo (1992) ldquosurfacerdquo list and 86 nonwords from Glushko (1979 seePlaut et al 1996 for details on scoring)

Results and Discussion

Figure 5 displays the number of errors to the Patterson and Hodges wordsmade by the network after the elimination of semantics as a function of thesemantic strength (g in equation 3) and weight decay ( l in equation 4) usedduring training These error scores were entered into an ANOVA over itemswith frequency and consistency as between-item factors and semanticstrength and weight decay as within-item factors

As is apparent from Fig 5 the network exhibited a strong frequency 3consistency interaction in accuracy [F(1164) 5 176 P 0001] Moreover866 (174201) of the errors to exception words were regularisations and985 (11861204)2 of presentations of the Glushko nonwords werepronounced correctly Thus overall the network exhibited the hallmarkcharacteristics of surface dyslexia

However as is also apparent from Fig 5 the effects of frequency andconsistency depended strongly on the levels of semantic strength and weight

782

783

FIG 5 Numbers of errors produced by the network to Patterson and Hodgesrsquo (1992) (a)high-frequency regular words (b) low-frequency regular words (c) high-frequency exceptionwords and (d) low-frequency exception words when semantics is removed completely after2000 epochs of training as a function of the semantic strength (g in equation 3) and weightdecay ( l in equation 4) used during training

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 2: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

766 PLAUT

INTRODUCTION

Word reading is one of the most extensively investigated areas withincognitive psychology A tremendous amount of data has been gathered onhow various properties of a written word (frequency of occurrence age ofacquisition numbers of letters and phonemes orthographic neighbourhoodspellingndashsound regularity and consistency imageability meaningfulness prior context etc) inuence the speed and accuracy with which a word isunderstood and pronounced by normally and abnormally developingreaders skilled readers and brain-damaged patients with selective readingimpairments Developing a comprehensive theory of lexical processing thatcan account for this full range of ndings is a considerable challenge onethat has yet to be met

A natural way of accounting for the effects of some variable or distinctionon performance is to build the distinction directly into the structure of amodel A good example is the status of a letter string as a word Fewresearchers would argue against the relevance to understanding humanlanguage performance of the distinction between letter strings that arewords and those that are not The fact that in most contexts skilled readersare fast and accurate at discriminating written words from nonwords withrelatively little practice indicates that there must be some difference in howthe reading system responds to words and nonwords The simplest way ofincorporating this distinction is to introduce words as explicit structures in amodel Well-known examples of such structures are Mortonrsquos (1969)logogens and McClelland and Rumelhartrsquos (1981) word units in theInteractive Activation model In a model with word-specic representationswords can be distinguished from nonwords simply by whether one of themreaches threshold or becomes fully active Moreover it becomesstraightforward to account for effects of lexical variables by manipulat-ing the word representations to reect the variable explicitly Forexample effects of word frequency can be incorporated directly by settingthe resting level of the logogen or word unit to be proportional tofrequency

Distinctions among words also lead researchers to postulate additionalstructure in the lexical system The classic example is the distinction betweenregular words and exception words Many researchers assume thatknowledge of spellingndashsound correspondences is encoded in terms of anexplicit set of rules The rules need to be generalmdashfor example based ongraphemendashphoneme correspondencemdashto support effective pronunciationof unfamiliar word-like letter strings (eg M AVE RIN T) However in alanguage like English in which spellingndashsound correspondences are onlypartially systematic such rules will apply to only a subset of wordsmdashso-called regular words (eg GAVE H INT) The remaining exception words (eg

MODELS OF WORD READING AND LEXICAL DECISION 767

HAVE PIN T) are mispronouncedmdashregularised mdashby the rules Given thatskilled readers can pronounce such words correctly they must be using someother mechanism with word-specic knowledge like the logogens or wordunits In this way a given set of spellingndashsound correspondences rulesdenes a sharp dichotomy between the items which obey the rules and theitems which violate them theories that reify this dichotomy as two separateprocessing mechanisms are thus termed ldquodual-routerdquo theories (eg Besneramp Smith 1992 Coltheart 1978 1985 Coltheart Curtis Atkins amp Haller1993 Marshall amp Newcombe 1973 Meyer Schvaneveldt amp Ruddy 1974Morton amp Patterson 1980 Paap amp Noel 1991) Note that most such theoriesfurther subdivide the lexical mechanism into semantic and non-semanticcomponents to account for the conditions under which normal and impairedword reading is (or is not) inuenced by word meaning Thus despite thename the essence of a dual-route theory is not that it has exactly twopathways from print to sound rather it is the claim that the mechanism thatpronounces nonwords is functionally distinct from and operates accordingto different principles than the mechanism that pronounces exceptionwords

Instead of accounting for effects of lexical variables by manipulating thestructure of a model directly an alternative theoretical approach is to startwith a set of general assumptions on the nature of representation processingand learning in the cognitive system and to show how when instantiated toperform specic tasks these assumptions give rise to the relevantbehavioural effects This is the perspective adopted by researchersemploying connectionistparallel distributed processing (PDP) models oflexical processing (eg Plaut McClelland Seidenberg amp Patterson 1996Seidenberg amp McClelland 1989 Van Orden Pennington amp Stone 1990Van Orden amp Goldinger 1994) For example in the framework proposed bySeidenberg and McClelland (1989 see Fig 1) orthographic phonologicaland semantic information is represented in terms of distributed patterns ofactivity over separate groups of simple neuron-like processing units Withineach domain similar words are represented by similar patterns of activityLexical tasks involve transformations between these representations forexample oral reading requires the orthographic pattern for a word togenerate the appropriate phonological pattern Such transformations areaccomplished via the cooperative and competitive interactions among unitsincluding additional hidden units that mediate between the orthographicphonological and semantic units In processing an input units interact untilthe network as a whole settles into a stable pattern of activitymdashtermedan attractormdashcorresponding to its interpretation of the input Unitinteractions are governed by weighted connections between them whichcollectively encode the systemrsquos knowledge about how the different typesof information are related Weights that give rise to the appropriate

768 PLAUT

FIG 1 A connectionist framework for lexical processing based on that of Seidenberg andMcClelland (1989)

transformations are learned on the basis of the systemrsquos exposure to writtenwords spoken words and their meanings

As Fig 1 makes clear the distributed connectionist approach does notentail a complete lack of structure within the lexical system But thedistinctions that are relevant relate to the different types of information thatmust be coordinated orthographic phonological and semantic Given thatsuch information may be based on input from different modalities (at leastfor the surface forms) it is natural to assume that the correspondingrepresentationsmdashand hence the pathways between themmdashareneuroanatomically distinct In fact such divisions play an important role inaccounting for data on the selective effects of brain damage on readingHowever irrespective of these distinctions among types of representationsthere is a uniformity in the processing mechanisms by which they are derivedand interact In this way the distributed connectionist approach isfundamentally at odds with the core tenet of dual-route theories

Seidenberg and McClelland (1989) provided computational support forthe claim that word-specic representations and separate rule-based versuslexical processing routes are unnecessary to account for skilled oral readingThey trained a connectionist network (corresponding to the bottom portionof Fig 1 termed the phonological pathway) to map from the orthography of2897 monosyllabic English wordsmdashboth regular and exceptionmdashto theirphonology After training the network pronounced 977 of the wordscorrectly including most exception words The network also exhibited thestandard empirical pattern of an interaction of frequency and consistency innaming latency (Andrews 1982 Seidenberg Waters Barnes amp Tanenhaus1984 Taraban amp McClelland 1987 Waters amp Seidenberg 1985) if its

MODELS OF WORD READING AND LEXICAL DECISION 769

real-valued accuracy in generating correct responses was taken as a proxyfor reaction time

The theoretical impact of Seidenberg and McClellandrsquos model ishowever undermined by certain inadequacies in its match to humanperformance particularly in three respects First the model was much worsethan skilled readers at pronouncing orthographically legal nonwords(Besner Twilley McCann amp Seergobin 1990) Secondly it was unable toperform lexical decision accurately under many conditions (Besner et al1990 Fera amp Besner 1992) Thirdly it failed to exhibit central aspects ofuent surface dyslexia when damaged (Patterson Seidenberg ampMcClelland 1989) specically the frequency 3 consistency interaction innaming accuracy the high regularisation rates and (again) the near-perfectnonword reading (see Patterson Coltheart amp Marshall 1985)

More recently Plaut et al (1996) demonstrated that networkimplementations of the phonological pathway can learn to pronounceregular and exception words and yet also pronounce nonwords as well asskilled readers if they use more appropriately structured orthographic andphonological representations (based on graphemes and phonemes andembodying phonotactic and graphotactic constraints) Plaut and colleaguesalso provided a more adequate account of surface dyslexia based on thenotion that semantics plays an important role in skilled reading Specicallythe support for word pronunciations from the semantic pathway relieves thephonological pathway from having to master low-frequency exceptionwords by itself The combination of the two pathways is fully competent inskilled readers but following damage to the semantic pathway the limitedcompetence of the isolated phonological pathway manifests as surfacedyslexia

In the current work additional simulations are presented that furtherdevelop the role of semantics to address two remaining problematic issuesfor the distributed connectionist approach to lexical processing The rstconcerns a nding that would seem to challenge the above account of surfacedyslexia namely that some brain-damaged patients have substantialsemantic impairments but nonetheless can read low-frequency exceptionwords accurately (WLP Schwartz Marin amp Saffran 1979 MB Raymeramp Berndt 1994 DRN Cipolotti amp Warrington 1995 DC LambonRalph Ellis amp Franklin 1995) The rst set of simulations demonstrates thatsuch patients are natural consequences of expected individual differences inthe division of labour between the semantic and phonological pathwaysunder parametric variations even when the majority of individuals withsemantic damage would be expected to exhibit some degree of surfacedyslexia (Graham Hodges amp Patterson 1994 Patterson amp Hodges 1992)The second issue is the ability of distributed networks to perform lexicaldecision accurately A second simulation involving a full (although

770 PLAUT

feedforward) implementation of the framework in Fig 1 demonstrates thata network that maps orthography to semantics directly and via phonologycan reliably distinguish words from nonwords based on a measure offamiliarity dened over semantics

As background for the current work the next section presents a briefoverview of the approach taken by Plaut et al (1996) in accounting for theeffects of frequency and consistency in skilled word and nonword readingand in uent surface dyslexia The two simulation studies are then presentedfollowed by a general discussion

AN ACCOUNT OF FREQUENCY ANDCONSISTENCY EFFECTS IN NORMAL AND

IMPAIRED WORD READING

Seidenberg and McClelland (1989) based their orthographic andphonological representations on context-sensitive triples of letters andphonemic features (Wickelgren 1969) A central insight offered by Plaut etal (1996) was that such representations suffer from what was termed thedispersion problem and that it was this problem rather than any moregeneral failing of connectionist networks that lead to the poor nonwordreading performance of Seidenberg and McClellandrsquos model Specicallythe triples-based representations disperse the regularities between spellingand sound thereby hindering generalisation For example in theorthographic representation of the word GAVE the contribution made by theletter A depends entirely on the presence of the G and V there is no similarityto the contribution made by the A in SAVE or GATE and hence no basis for thetraining on the pronunciation of A in GAVE to support the samepronunciation in SAVE or G ATEmdashor in a nonword like M AVE

Plaut and colleagues hypothesised that the human reading systemgeneralises effectively because it uses representations that condensespellingndashsound regularities To support this hypothesis they designedorthographic and phonological representations in which within eachconsonant and vowel cluster letters and phonemes almost always activatedthe same units irrespective of context The representations were notintended to be fully general but rather to minimise the degree to whichspellingndashsound correspondences were dispersed across different units fordifferent words Plaut and colleagues found that when networks weretrained on essentially the same corpus as Seidenberg and McClellandrsquosmodel but using the new representations they were able to pronounce allthe words correctly and yet still generalise effectively to nonwordsMoreover the networks exhibited the empirical frequency 3 consistencyinteraction pattern when trained on actual (as opposed to logarithmically

MODELS OF WORD READING AND LEXICAL DECISION 771

FIG 2 (a) The frequency 3 consistency interaction exhibited in the settling time of anattractor network implementation of the phonological pathway in pronouncing words ofvarying frequency and spellingndashsound consistency (Plaut et al 1996 simulation 3) (b) Itsexplanation in terms of additive contributions of frequency and consistency subject to anasymptotic activation functionmdash s [ in equation (1) only the top half of which is shown FromPlaut et al (1996)

compressed) word frequencies even when naming latencies were modelleddirectly by the settling time of a recurrent attractor network (see Fig 2a)

Importantly Plaut et al (1996) went beyond providing only empiricaldemonstrations that networks could reproduce accuracy and latency data onword and nonword reading to offer a mathematical analysis of the critical

772 PLAUT

factors that govern why the networks (and by hypothesis subjects) behaveas they do This analysis was based on a network that while simpler than theactual simulationsmdashit had no hidden units and employed Hebbian ratherthan error-correcting learningmdashretained many of the essentialcharacteristics of the more general framework For this simplied networkPlaut and colleagues derived an expression for how the response of thenetwork to any input (test) pattern depended on its experience with everypattern on which the network was trained as a function of the patternrsquosfrequency of training its similarity with the test pattern and the consistencyof its output with that of the test pattern Specically the response s[t] of anyj

output unit j to a given test pattern t is given by

(1)

in which the standard smooth non-linear sigmoidal inputndashoutput functionfor each unit s [ is applied to the sum of three terms (1) the cumulativefrequency of training on the pattern t itself F[t] (2) the sum of the frequenciesF[f] of the ldquofriendsrdquo of pattern t (patterns trained to produce the sameresponse for unit j) each weighted by its similarity (overlap) with t O [ft] and(3) minus the sum of the frequencies F[e] of the ldquoenemiesrdquo of pattern t(patterns trained to produce the opposite response) each weighted by itssimilarity to t O [et]

Many of the basic phenomena in word reading can be seen as naturalconsequences of adherence to this ldquofrequencyndashconsistencyrdquo equationFactors that increase the summed input to units (eg word frequencyspellingndashsound consistency) improve performance as measured by namingaccuracy andor latency but their contributions are subject to ldquodiminishingreturnsrdquo due to the asymptotic nature of the activation function s [ (seeFig 2b) As a result performance on stimuli that are strong in one factor isrelatively insensitive to variation in other factors Thus regular words showlittle effect of frequency and high-frequency words show little effect ofconsistency giving rise to the standard pattern of interaction betweenfrequency and consistency in which the naming of low-frequency exceptionwords is disproportionately slow or inaccurate

It is important to note however that equation (1) is only approximate fornetworks with hidden units and trained by error-correcting algorithms likeback-propagation (Rumelhart Hinton amp Williams 1986a) These twoaspects of the Plaut et al (1996) simulations are critical in that they help toovercome interference from enemies (ie the negative terms in equation 1)thereby enabling the networks to achieve correct performance on exceptionwordsmdashthat is words with many enemies and few if any friendsmdashas well ason regular words and nonwords

MODELS OF WORD READING AND LEXICAL DECISION 773

Although Plaut and colleagues demonstrated that implementations of thephonological pathway on its own can learn to pronounce words andnonwords as well as skilled readers a central aspect of their general theory isthat skilled reading more typically requires the combined support of boththe semantic and phonological pathways (for similar proposals see Hillis ampCaramazza 1991 Van Orden amp Goldinger 1994) and that individuals maydiffer in the relative competence of each pathway (Seidenberg 1992)Certainly semantic involvement is necessary to pronounce homographs likeWIN D and READ correctly Furthermore a semantic variablemdashimageabilitymdashinuences the strength of the frequency 3 consistency interaction in thenaming latencies and errors of skilled readers (Strain Patterson ampSeidenberg 1995) Moreover brain damage that impairs lexical semanticsmdashtypically to the left temporal lobemdashcan lead to surface dyslexia (seePatterson et al 1985) In its purest ldquouentrdquo form (eg MP Behrmann ampBub 1992 Bub Cancelliere amp Kertesz 1985 KT McCarthy ampWarrington 1986 HTR Shallice Warrington amp McCarthy 1983) surfacedyslexic patients read nonwords and regular words with normal accuracyand latency but exhibit an interaction of frequency and consistency in wordreading accuracy that mirrors the interaction shown by normal subjects inlatency That is surface dyslexic patients are disproportionately poor atpronouncing low-frequency exception words often giving a pronunciationconsistent with more standard spellingndashsound correspondences (eg SEW

read as ldquosuerdquo termed a regularisation error) In fact there can be a closecorrelation for individual patients between the lack of comprehension ofexception words and their likelihood of being regularised (Funnell 1996Graham et al 1994 Hillis amp Caramazza 1991) and the surface dyslexicpattern may emerge gradually as lexical semantic knowledge deteriorates inpatients with some types of progressive dementia such as semanticdementia (Graham et al 1994 Patterson amp Hodges 1992 Schwartz et al1979) or dementia of the Alzheimerrsquos type (Patterson Graham amp Hodges1994b)

The distributed connectionist framework depicted in Fig 1 provides anatural formulation of how contributions from both the semantic andphonological pathways might be integrated in oral reading At a moreabstract level given that phonological units simply sum their inputs from thetwo pathways the inuence of the semantic pathway can be incorporatedinto equation (1) by adding an additional term S[t] to the summed input ofunit j [In fact if it is assumed that this term scales with imageability this canaccount for the three-way interaction of frequency consistency andimageability in skilled reading found by Strain et al (1995)] Whenformulated explicitly in connectionist terms however this integration hasimportant implications for the nature of learning in the two pathwaysDuring training to the extent that the contribution of the semantic pathway

774 PLAUT

reduces the overall error the phonological pathway will experience lesspressure to learn On its own it may master only those items it nds easiest tolearn words high in frequency andor consistency (ie those items with largepositive terms in equation 1) low-frequency exception words may never belearned completely As a result brain damage that reduced or eliminated thesemantic pathway would reveal the latent limitation of the phonologicalpathway giving rise to surface dyslexia

In further simulations Plaut and colleagues explored the possibility thatthe surface dyslexic reading pattern might reect the natural limitations ofan intact but isolated phonological pathway that had learned to rely partiallyon semantic support Given that a full implementation of the semanticpathway was beyond the scope of their work they approximated thecontribution that such a pathway would make to the oral reading of eachword during training by providing the output (phoneme) units of theimplemented phonological pathway with external input that pushed themtowards their correct values for each word during training Semanticdamage then was modelled by weakening or removing this external inputPlaut and colleagues found that indeed a phonological pathway trained inthe context of support from semantics exhibited the central phenomena ofsurface dyslexia when semantics was impaired Figure 3 shows how differentdegrees of semantic impairment provide a reasonably good quantitative tto the performance levels of individual surface dyslexic patients

In summary Plaut et al (1996) provided connectionist simulations andmathematical analyses supporting a view of lexical processing in which thedistinctions between words and nonwords and between regular andexception words are not reected in the structure of the system but rather infunctional aspects of its behaviour as it brings all its knowledge to bear inprocessing an input An important insight that emerges from the approach isthat semantic and phonological processing are intimately related over thecourse of reading acquisition in normal skilled performance and in theeffects of brain damage The following two simulation experiments aim toelaborate and clarify the role of semantics in oral reading and other lexicaltasks Although it might be more natural to consider normal performance(in lexical decision) before impaired performance (in surface dyslexia) webegin with the latter issue as the relevant simulations are more closelyrelated to existing work

SIMULATION 1 INDIVIDUAL DIFFERENCES INDIVISION OF LABOUR

Plaut and co-workersrsquo (1996) account of surface dyslexia involves a causalrelationship between an impairment in semantic input to phonology and theoccurrence of (regularisation) errors in reading low-frequency exception

775

FIG 3 (a) The effect of gradual elimination of semantics on the correct performance of Plautand co-workersrsquo (1996) network after 2000 epochs of training with semantics for Taraban andMcClellandrsquos (1987) high-frequency (HF) and low-frequency (LF) regular consistent words(Reg) and exception words (Exc) and for Glushkorsquos (1979) nonwords and approximatepercentage of errors on the exception words that are regularisations (b) Performance of twosurface dyslexic patients (MP Behrmann amp Bub 1992 Bub et al 1985 KT McCarthy ampWarrington 1986) and the network with different amounts of semantic damage From Plaut etal (1996)

776 PLAUT

words Note that such an impairment might arise from damage to semanticsitself or from damage to the mapping from semantics to phonology in whichsemantics is spared Thus it is perfectly consistent with this account thatsome surface dyslexic patients have intact comprehension (see GrahamPatterson amp Hodges 1995 Watt Jokel amp Behrmann 1997) Nonethelessthe semantics-to-phonology impairment should manifest in non-readingtasks such as picture naming Consistent with this expectation all knownsurface dyslexic patients are anomic In fact when the anomia is restricted toabstract concepts as in patient DRB (Franklin Howard amp Patterson1995) reading errors correspondingly occur only on exception words withabstract meanings

Although Plaut and co-workersrsquo account does not require that surfacedyslexic patients have impaired comprehension it does seem to imply thatpatients who do have a semantic impairment should exhibit some degree ofsurface dyslexia Thus the account is challenged by the nding that there arepatients with semantic impairments who nonetheless read low-frequencyexception words with normal or near-normal accuracy The two clearestcases of this are DRN (Cipolotti amp Warrington 1995) and DC (LambonRalph et al 1995) (see also Coslett 1991 Cummings Houlihan amp Hill1986 Friedman Ferguson Robinson amp Sunderland 1992 Raymer ampBerndt 1994 Schwartz et al 1979)

Cipolotti and Warrington (1995) presented data on patient DRN a67-year-old biological scientist with focal atrophy of the temporal lobes(more pronounced on the left) most likely due to Pickrsquos diseaseBehavioural ly he exhibited what is known as semantic dementia orprogressive uent aphasia (Hodges Patterson Oxbury amp Funnell 1992Snowden Goulding amp Neary 1989) Semantic dementia is a relatively pureimpairment in semantic memory as reected by severe anomia and wordcomprehension difculties with relatively few other cognitive impairmentsCipolotti and Warrington compared DRNrsquos ability to give denitions towords and to read them aloud for sets of words that varied in frequency andspellingndashsound consistency We focus here on his performance on low-frequency exception words as these items provide the most sensitivemeasure of surface dyslexia In keeping with his semantic dementia DRNshowed a substantial impairment in generating denitions for low-frequencyexception words producing 39 correct (1128) and 14 correct (321) fortwo sets of words using a very lenient scoring criterion By contrast his oralreading of low-frequency exception words was essentially intact heproduced 95 correct (4648 and 2021) on each of two lists

Lambon Ralph et al (1995) described the performance of patient DCwho had attended school only until the age of 14 At the time of testing DCwas 85 years old and exhibited the clinical symptoms of dementia of theAlzheimerrsquos type (see Schwartz 1990) including very poor episodic memory

MODELS OF WORD READING AND LEXICAL DECISION 777

with uent well-structured speech and occasional word-nding difcultiesLike DRN DC was impaired at dening low-frequency exception wordsbut was within the normal range at reading them aloud In generatingdenitions for the low-frequency exception words on Patterson and Hodgesrsquo(1992) ldquosurfacerdquo list DC was 31 correct (1342) by a lax criterion and57 correct (2442) by a very lax criterion By contrast she was 95 correct(4042) in reading the words aloud Similar dissociations between generatingdenitions and pronunciations were observed for the object labels from thePALPA (Kay Lesser amp Coltheart 1992) and for word lists from Shallice etal (1983) and from Strain et al (1995)

One possible explanation for the lack of surface dyslexia in patients likeDRN and DC is that the task of generating denitions is far moresensitive to semantic or semantics-to-phonology damage than is the task ofreading low-frequency exception words and that surface dyslexia emergesonly with very severe semantic damage In fact inspection of the results fromprogressive deterioration of semantics in Plaut and co-workersrsquo (1996)simulation (Fig 3a) indicates that low-frequency exception word reading isaffected only when semantic strength is reduced to a level of 15 (from amaximum of 50) Consistent with this idea patient WLP (Schwartz et al1979) became impaired at reading exception words only when her semanticdeterioration became very severe (Schwartz Saffran amp Marin 1980)Moreover many uent surface dyslexia patients show evidence ofcomprehension impairments andor anomia that seem more severe thanthose of DRN and DC As examples MP (Bub et al 1985) was at chanceat wordndashword and picturendashword semantic categorisation only very slightlyabove chance at semantic category sorting and exhibited no semanticpriming in word naming and KT (McCarthy amp Warrington 1986) scored 0on the WAIS Vocabulary test could not perform the Peabody PictureVocabulary test (Dunn 1965) could name only 11 (14130) of a subset ofthe Snodgrass and Vanderwart (1980) pictures and was only slightly abovechance at picturendashword matching By comparison on object naming DRNwas 25 correct (520) and DC was 58 correct (128220) However somepatients are no more anomic than DRN and DC but show severe surfacedyslexia For example GC (Patterson Graham amp Hodges 1994a) wascorrect in naming 45 (118260) of the Snodgrass and Vanderwart picturesbut could pronounce correctly only 38 (1642) of low-frequency exceptionwords (cf 95 for both DRN and DC) Thus the severity of semanticimpairment alone does seem to provide a full account of the differences inreading performance among all of the relevant patients

An additional hypothesis is that individual differences in the severityof surface dyslexia can arise not only from differences in the degree ofsemantic damage but also from premorbid differences in the division oflabour and overall competence of the reading system Recall that the

778 PLAUT

simulations of Plaut et al (1996) demonstrate that the phonological pathwayon its own can if necessary master the entire training corpus includinglow-frequency exception words but that support from the semantic pathwayrelieves it from having to do so Thus the ultimate competence of thephonological pathway is a function both of its own learning propertiesmdashitssensitivity to frequency and consistency as expressed by equation (1)mdashandof the strength of the semantic contribution to phonology Both of thesefactors are matters of degree that would be expected to vary acrossindividuals More generally Plaut and co-workers pointed out that thecompetence and division of labour of the reading system would be expectedto be inuenced by a wide variety of factors including the nature of readinginstruction the sophistication of preliterate phonological representationsthe overall amount of reading practice relative experience in reading aloudversus silently the computational resources devoted to each pathway andthe readerrsquos more general skill levels in visual pattern recognition and inspoken word comprehension and production Broadly speaking the overallcompetence of the phonological and semantic pathways together would beexpected to improve gradually with reading experience but that factors thataided the development of one pathway would be expected to limit thecompetence of the other

In Plaut and co-workersrsquo (1996) simulation of surface dyslexia this tradingrelation between the semantic and phonological pathways dependsprimarily on two specic parameters The rst is the asymptotic strengthover training of the external correct input provided to phoneme unitstermed semantic strength which corresponds directly to the competence ofthe putative semantic pathway The second parameter is the magnitude ofthe pressure to keep weights in the phonological pathway small termedweight decay Weight decay is standardly used in connectionist modelling toimprove generalisation (ie nonword reading) by preventing the networkfrom developing arbitrarily large weights in learning the training data Withweight decay the magnitude of each weight becomes proportional to itsimportance in reducing training error (Hinton 1989) Weights can still growlarge however if aspects of the task demand it In fact this property iscritical in the current context if the network is to master exception wordsBecause the pronunciations of such words violate standard spellingndashsoundcorrespondences they typically require the involvement of larger weights tooverride the standard mapping Thus the higher the amount of weightdecay the more difcult it is for the network to learn to pronounce exceptionwords particularly those of low frequency In this way the magnitude ofweight decay controls the base level of competence that can be achieved bythe phonological pathway in isolation There are a number of possibleneurophysiological analogues of weight decay that would be expected tovary across individuals For instance if a connection weight corresponds to

MODELS OF WORD READING AND LEXICAL DECISION 779

the number and strength of synapses between neurons then the magnitudeof weight decay can be thought of as the level of metabolic activity that canbe supported for maintaining synapses Low values of weight decaycorrespond to high levels of metabolic activity and thus a high level ofcompetence in the network

Under this view of individual differences as parametric variations of Plautand co-workersrsquo (1996) simulation patients such as DRN and DC maycorrespond to parameter combinations that lead to a high degree ofcompetence in the phonological pathway The goal of the current simulationwas to determine the effects of parametric variations of semantic strengthand weight decay on the severity of surface dyslexia exhibited by networkimplementations of the phonological pathway following semanticimpairment

Method

The method used in the simulation follows that used by Plaut et al (1996simulation 4) and readers are referred to that paper for additional details Afeedforward network was trained to map from orthographic representationsof 2998 monosyllabic words to their phonological representations Theorthographic representations consisted of 108 grapheme units that codesingle- and multi-letter graphemes for the onset vowel and coda of thewritten form of each word The phonological representations werestructured in an analogous fashion over 61 phoneme units Each of thegrapheme units was connected to each of 100 hidden units which in turnwas connected to each phoneme unit Each of the hidden and phoneme unitsalso had a bias weight that determined its ldquorestingrdquo state in the absence ofother input This bias is equivalent to the weight on a connection from a unitwhose state is always equal to 10 and was learned in the same way as otherweights

During training a word was presented to the network by clamping thestates of units representing graphemes contained in the word to 1 and allothers to 0 The hidden and phoneme units then computed their states basedon this input The state sj of each unit j varied between 0 and 1 as a smoothreal-valued function of the summed weighted input from other units

(2)

where wij is the weight on the connection from unit i to unit j bj is the biasweight of j and exp [ is the exponential function This is the standardsigmoid function s [ referred to in the frequencyndashconsistency equation(equation 1)

780 PLAUT

FIG 4 The magnitude of the additional external input supplied to phoneme units by theputative semantic pathway as a function of training epoch for different values of semanticstrength (g in equation 3) for a frequency f 5 533 corresponding to the average Kucera andFrancis (1967) frequency of the Patterson and Hodges (1992) low-frequency words

The summed input to phoneme units was augmented with an additionalterm S corresponding to the frequency-weighted contribution of thesemantic pathway to phonology

(3)

where f is the Kucera and Francis (1967) frequency of the word and tindexes the training epoch (ie number of presentations of the training cor-pus) The parameter g termed semantic strength determined the asymptoticlevel of input from semantics and was varied from 10 to 50 in the currentsimulation Figure 4 shows the magnitude of this input over time for eachvalue of g using f 5 533 the average Kucera and Francis (1967) frequency ofthe Patterson and Hodges (1992) low-frequency words The sign of thisexternal input was set to drive phoneme units towards their correct states

The error E of the network was measured by the sum of two terms (1)the cross-entropy (see Hinton 1989) between the generated phoneme statessj and the correct or target phoneme states tj for the word and (2) aproportion of the sum of the squares of the weights

(4)

where l is the weight decay parameter and was varied between 000005 and

MODELS OF WORD READING AND LEXICAL DECISION 781

1Note that this manipulation is equivalent in the limit of small weight changes to the morenatural procedure of sampling words for training as a function of their frequency

2This level of correct performance was calculated over only the 14 pairings of semanticstrength and weight decay giving rise to non-perfect performance on the low-frequencyexception words (see Fig 5)

0001 in the current simulation Back-propagation (Rumelhart Hinton ampWilliams 1986b) was used to calculate the derivative of this error withrespect to each weight in the network This derivative was scaled by a factorreecting the frequency of occurrence of the word proportional to thesquare-root of its Kucera and Francis (1967) frequency1 Each of the 2998words was presented in turn and their scaled derivatives were accumulatedAt the end of each epoch the weights were changed in proportion to the sumof the current accumulated derivatives and the past weight step (scaled by amomentum parameter of 09) The weight change was scaled both by a globallearning rate of 0001 and by a connection-specic learning rate that wasinitialised to 10 and adapted using the delta-bar-delta procedure (Jacobs1988) with an additive increment of 01 and a multiplicative decrement of09

For each combination of semantic strength and weight decay the networkwas initialised with the same set of random weights (sampled uniformlybetween 6 01) and trained for 2000 epochs At this point it was subjected toa complete semantic ldquolesionrdquo by removing the external input to thephoneme units The network was then tested for its pronunciations of thehigh- and low-frequency regular and exception words from Patterson andHodgesrsquo (1992) ldquosurfacerdquo list and 86 nonwords from Glushko (1979 seePlaut et al 1996 for details on scoring)

Results and Discussion

Figure 5 displays the number of errors to the Patterson and Hodges wordsmade by the network after the elimination of semantics as a function of thesemantic strength (g in equation 3) and weight decay ( l in equation 4) usedduring training These error scores were entered into an ANOVA over itemswith frequency and consistency as between-item factors and semanticstrength and weight decay as within-item factors

As is apparent from Fig 5 the network exhibited a strong frequency 3consistency interaction in accuracy [F(1164) 5 176 P 0001] Moreover866 (174201) of the errors to exception words were regularisations and985 (11861204)2 of presentations of the Glushko nonwords werepronounced correctly Thus overall the network exhibited the hallmarkcharacteristics of surface dyslexia

However as is also apparent from Fig 5 the effects of frequency andconsistency depended strongly on the levels of semantic strength and weight

782

783

FIG 5 Numbers of errors produced by the network to Patterson and Hodgesrsquo (1992) (a)high-frequency regular words (b) low-frequency regular words (c) high-frequency exceptionwords and (d) low-frequency exception words when semantics is removed completely after2000 epochs of training as a function of the semantic strength (g in equation 3) and weightdecay ( l in equation 4) used during training

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 3: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

MODELS OF WORD READING AND LEXICAL DECISION 767

HAVE PIN T) are mispronouncedmdashregularised mdashby the rules Given thatskilled readers can pronounce such words correctly they must be using someother mechanism with word-specic knowledge like the logogens or wordunits In this way a given set of spellingndashsound correspondences rulesdenes a sharp dichotomy between the items which obey the rules and theitems which violate them theories that reify this dichotomy as two separateprocessing mechanisms are thus termed ldquodual-routerdquo theories (eg Besneramp Smith 1992 Coltheart 1978 1985 Coltheart Curtis Atkins amp Haller1993 Marshall amp Newcombe 1973 Meyer Schvaneveldt amp Ruddy 1974Morton amp Patterson 1980 Paap amp Noel 1991) Note that most such theoriesfurther subdivide the lexical mechanism into semantic and non-semanticcomponents to account for the conditions under which normal and impairedword reading is (or is not) inuenced by word meaning Thus despite thename the essence of a dual-route theory is not that it has exactly twopathways from print to sound rather it is the claim that the mechanism thatpronounces nonwords is functionally distinct from and operates accordingto different principles than the mechanism that pronounces exceptionwords

Instead of accounting for effects of lexical variables by manipulating thestructure of a model directly an alternative theoretical approach is to startwith a set of general assumptions on the nature of representation processingand learning in the cognitive system and to show how when instantiated toperform specic tasks these assumptions give rise to the relevantbehavioural effects This is the perspective adopted by researchersemploying connectionistparallel distributed processing (PDP) models oflexical processing (eg Plaut McClelland Seidenberg amp Patterson 1996Seidenberg amp McClelland 1989 Van Orden Pennington amp Stone 1990Van Orden amp Goldinger 1994) For example in the framework proposed bySeidenberg and McClelland (1989 see Fig 1) orthographic phonologicaland semantic information is represented in terms of distributed patterns ofactivity over separate groups of simple neuron-like processing units Withineach domain similar words are represented by similar patterns of activityLexical tasks involve transformations between these representations forexample oral reading requires the orthographic pattern for a word togenerate the appropriate phonological pattern Such transformations areaccomplished via the cooperative and competitive interactions among unitsincluding additional hidden units that mediate between the orthographicphonological and semantic units In processing an input units interact untilthe network as a whole settles into a stable pattern of activitymdashtermedan attractormdashcorresponding to its interpretation of the input Unitinteractions are governed by weighted connections between them whichcollectively encode the systemrsquos knowledge about how the different typesof information are related Weights that give rise to the appropriate

768 PLAUT

FIG 1 A connectionist framework for lexical processing based on that of Seidenberg andMcClelland (1989)

transformations are learned on the basis of the systemrsquos exposure to writtenwords spoken words and their meanings

As Fig 1 makes clear the distributed connectionist approach does notentail a complete lack of structure within the lexical system But thedistinctions that are relevant relate to the different types of information thatmust be coordinated orthographic phonological and semantic Given thatsuch information may be based on input from different modalities (at leastfor the surface forms) it is natural to assume that the correspondingrepresentationsmdashand hence the pathways between themmdashareneuroanatomically distinct In fact such divisions play an important role inaccounting for data on the selective effects of brain damage on readingHowever irrespective of these distinctions among types of representationsthere is a uniformity in the processing mechanisms by which they are derivedand interact In this way the distributed connectionist approach isfundamentally at odds with the core tenet of dual-route theories

Seidenberg and McClelland (1989) provided computational support forthe claim that word-specic representations and separate rule-based versuslexical processing routes are unnecessary to account for skilled oral readingThey trained a connectionist network (corresponding to the bottom portionof Fig 1 termed the phonological pathway) to map from the orthography of2897 monosyllabic English wordsmdashboth regular and exceptionmdashto theirphonology After training the network pronounced 977 of the wordscorrectly including most exception words The network also exhibited thestandard empirical pattern of an interaction of frequency and consistency innaming latency (Andrews 1982 Seidenberg Waters Barnes amp Tanenhaus1984 Taraban amp McClelland 1987 Waters amp Seidenberg 1985) if its

MODELS OF WORD READING AND LEXICAL DECISION 769

real-valued accuracy in generating correct responses was taken as a proxyfor reaction time

The theoretical impact of Seidenberg and McClellandrsquos model ishowever undermined by certain inadequacies in its match to humanperformance particularly in three respects First the model was much worsethan skilled readers at pronouncing orthographically legal nonwords(Besner Twilley McCann amp Seergobin 1990) Secondly it was unable toperform lexical decision accurately under many conditions (Besner et al1990 Fera amp Besner 1992) Thirdly it failed to exhibit central aspects ofuent surface dyslexia when damaged (Patterson Seidenberg ampMcClelland 1989) specically the frequency 3 consistency interaction innaming accuracy the high regularisation rates and (again) the near-perfectnonword reading (see Patterson Coltheart amp Marshall 1985)

More recently Plaut et al (1996) demonstrated that networkimplementations of the phonological pathway can learn to pronounceregular and exception words and yet also pronounce nonwords as well asskilled readers if they use more appropriately structured orthographic andphonological representations (based on graphemes and phonemes andembodying phonotactic and graphotactic constraints) Plaut and colleaguesalso provided a more adequate account of surface dyslexia based on thenotion that semantics plays an important role in skilled reading Specicallythe support for word pronunciations from the semantic pathway relieves thephonological pathway from having to master low-frequency exceptionwords by itself The combination of the two pathways is fully competent inskilled readers but following damage to the semantic pathway the limitedcompetence of the isolated phonological pathway manifests as surfacedyslexia

In the current work additional simulations are presented that furtherdevelop the role of semantics to address two remaining problematic issuesfor the distributed connectionist approach to lexical processing The rstconcerns a nding that would seem to challenge the above account of surfacedyslexia namely that some brain-damaged patients have substantialsemantic impairments but nonetheless can read low-frequency exceptionwords accurately (WLP Schwartz Marin amp Saffran 1979 MB Raymeramp Berndt 1994 DRN Cipolotti amp Warrington 1995 DC LambonRalph Ellis amp Franklin 1995) The rst set of simulations demonstrates thatsuch patients are natural consequences of expected individual differences inthe division of labour between the semantic and phonological pathwaysunder parametric variations even when the majority of individuals withsemantic damage would be expected to exhibit some degree of surfacedyslexia (Graham Hodges amp Patterson 1994 Patterson amp Hodges 1992)The second issue is the ability of distributed networks to perform lexicaldecision accurately A second simulation involving a full (although

770 PLAUT

feedforward) implementation of the framework in Fig 1 demonstrates thata network that maps orthography to semantics directly and via phonologycan reliably distinguish words from nonwords based on a measure offamiliarity dened over semantics

As background for the current work the next section presents a briefoverview of the approach taken by Plaut et al (1996) in accounting for theeffects of frequency and consistency in skilled word and nonword readingand in uent surface dyslexia The two simulation studies are then presentedfollowed by a general discussion

AN ACCOUNT OF FREQUENCY ANDCONSISTENCY EFFECTS IN NORMAL AND

IMPAIRED WORD READING

Seidenberg and McClelland (1989) based their orthographic andphonological representations on context-sensitive triples of letters andphonemic features (Wickelgren 1969) A central insight offered by Plaut etal (1996) was that such representations suffer from what was termed thedispersion problem and that it was this problem rather than any moregeneral failing of connectionist networks that lead to the poor nonwordreading performance of Seidenberg and McClellandrsquos model Specicallythe triples-based representations disperse the regularities between spellingand sound thereby hindering generalisation For example in theorthographic representation of the word GAVE the contribution made by theletter A depends entirely on the presence of the G and V there is no similarityto the contribution made by the A in SAVE or GATE and hence no basis for thetraining on the pronunciation of A in GAVE to support the samepronunciation in SAVE or G ATEmdashor in a nonword like M AVE

Plaut and colleagues hypothesised that the human reading systemgeneralises effectively because it uses representations that condensespellingndashsound regularities To support this hypothesis they designedorthographic and phonological representations in which within eachconsonant and vowel cluster letters and phonemes almost always activatedthe same units irrespective of context The representations were notintended to be fully general but rather to minimise the degree to whichspellingndashsound correspondences were dispersed across different units fordifferent words Plaut and colleagues found that when networks weretrained on essentially the same corpus as Seidenberg and McClellandrsquosmodel but using the new representations they were able to pronounce allthe words correctly and yet still generalise effectively to nonwordsMoreover the networks exhibited the empirical frequency 3 consistencyinteraction pattern when trained on actual (as opposed to logarithmically

MODELS OF WORD READING AND LEXICAL DECISION 771

FIG 2 (a) The frequency 3 consistency interaction exhibited in the settling time of anattractor network implementation of the phonological pathway in pronouncing words ofvarying frequency and spellingndashsound consistency (Plaut et al 1996 simulation 3) (b) Itsexplanation in terms of additive contributions of frequency and consistency subject to anasymptotic activation functionmdash s [ in equation (1) only the top half of which is shown FromPlaut et al (1996)

compressed) word frequencies even when naming latencies were modelleddirectly by the settling time of a recurrent attractor network (see Fig 2a)

Importantly Plaut et al (1996) went beyond providing only empiricaldemonstrations that networks could reproduce accuracy and latency data onword and nonword reading to offer a mathematical analysis of the critical

772 PLAUT

factors that govern why the networks (and by hypothesis subjects) behaveas they do This analysis was based on a network that while simpler than theactual simulationsmdashit had no hidden units and employed Hebbian ratherthan error-correcting learningmdashretained many of the essentialcharacteristics of the more general framework For this simplied networkPlaut and colleagues derived an expression for how the response of thenetwork to any input (test) pattern depended on its experience with everypattern on which the network was trained as a function of the patternrsquosfrequency of training its similarity with the test pattern and the consistencyof its output with that of the test pattern Specically the response s[t] of anyj

output unit j to a given test pattern t is given by

(1)

in which the standard smooth non-linear sigmoidal inputndashoutput functionfor each unit s [ is applied to the sum of three terms (1) the cumulativefrequency of training on the pattern t itself F[t] (2) the sum of the frequenciesF[f] of the ldquofriendsrdquo of pattern t (patterns trained to produce the sameresponse for unit j) each weighted by its similarity (overlap) with t O [ft] and(3) minus the sum of the frequencies F[e] of the ldquoenemiesrdquo of pattern t(patterns trained to produce the opposite response) each weighted by itssimilarity to t O [et]

Many of the basic phenomena in word reading can be seen as naturalconsequences of adherence to this ldquofrequencyndashconsistencyrdquo equationFactors that increase the summed input to units (eg word frequencyspellingndashsound consistency) improve performance as measured by namingaccuracy andor latency but their contributions are subject to ldquodiminishingreturnsrdquo due to the asymptotic nature of the activation function s [ (seeFig 2b) As a result performance on stimuli that are strong in one factor isrelatively insensitive to variation in other factors Thus regular words showlittle effect of frequency and high-frequency words show little effect ofconsistency giving rise to the standard pattern of interaction betweenfrequency and consistency in which the naming of low-frequency exceptionwords is disproportionately slow or inaccurate

It is important to note however that equation (1) is only approximate fornetworks with hidden units and trained by error-correcting algorithms likeback-propagation (Rumelhart Hinton amp Williams 1986a) These twoaspects of the Plaut et al (1996) simulations are critical in that they help toovercome interference from enemies (ie the negative terms in equation 1)thereby enabling the networks to achieve correct performance on exceptionwordsmdashthat is words with many enemies and few if any friendsmdashas well ason regular words and nonwords

MODELS OF WORD READING AND LEXICAL DECISION 773

Although Plaut and colleagues demonstrated that implementations of thephonological pathway on its own can learn to pronounce words andnonwords as well as skilled readers a central aspect of their general theory isthat skilled reading more typically requires the combined support of boththe semantic and phonological pathways (for similar proposals see Hillis ampCaramazza 1991 Van Orden amp Goldinger 1994) and that individuals maydiffer in the relative competence of each pathway (Seidenberg 1992)Certainly semantic involvement is necessary to pronounce homographs likeWIN D and READ correctly Furthermore a semantic variablemdashimageabilitymdashinuences the strength of the frequency 3 consistency interaction in thenaming latencies and errors of skilled readers (Strain Patterson ampSeidenberg 1995) Moreover brain damage that impairs lexical semanticsmdashtypically to the left temporal lobemdashcan lead to surface dyslexia (seePatterson et al 1985) In its purest ldquouentrdquo form (eg MP Behrmann ampBub 1992 Bub Cancelliere amp Kertesz 1985 KT McCarthy ampWarrington 1986 HTR Shallice Warrington amp McCarthy 1983) surfacedyslexic patients read nonwords and regular words with normal accuracyand latency but exhibit an interaction of frequency and consistency in wordreading accuracy that mirrors the interaction shown by normal subjects inlatency That is surface dyslexic patients are disproportionately poor atpronouncing low-frequency exception words often giving a pronunciationconsistent with more standard spellingndashsound correspondences (eg SEW

read as ldquosuerdquo termed a regularisation error) In fact there can be a closecorrelation for individual patients between the lack of comprehension ofexception words and their likelihood of being regularised (Funnell 1996Graham et al 1994 Hillis amp Caramazza 1991) and the surface dyslexicpattern may emerge gradually as lexical semantic knowledge deteriorates inpatients with some types of progressive dementia such as semanticdementia (Graham et al 1994 Patterson amp Hodges 1992 Schwartz et al1979) or dementia of the Alzheimerrsquos type (Patterson Graham amp Hodges1994b)

The distributed connectionist framework depicted in Fig 1 provides anatural formulation of how contributions from both the semantic andphonological pathways might be integrated in oral reading At a moreabstract level given that phonological units simply sum their inputs from thetwo pathways the inuence of the semantic pathway can be incorporatedinto equation (1) by adding an additional term S[t] to the summed input ofunit j [In fact if it is assumed that this term scales with imageability this canaccount for the three-way interaction of frequency consistency andimageability in skilled reading found by Strain et al (1995)] Whenformulated explicitly in connectionist terms however this integration hasimportant implications for the nature of learning in the two pathwaysDuring training to the extent that the contribution of the semantic pathway

774 PLAUT

reduces the overall error the phonological pathway will experience lesspressure to learn On its own it may master only those items it nds easiest tolearn words high in frequency andor consistency (ie those items with largepositive terms in equation 1) low-frequency exception words may never belearned completely As a result brain damage that reduced or eliminated thesemantic pathway would reveal the latent limitation of the phonologicalpathway giving rise to surface dyslexia

In further simulations Plaut and colleagues explored the possibility thatthe surface dyslexic reading pattern might reect the natural limitations ofan intact but isolated phonological pathway that had learned to rely partiallyon semantic support Given that a full implementation of the semanticpathway was beyond the scope of their work they approximated thecontribution that such a pathway would make to the oral reading of eachword during training by providing the output (phoneme) units of theimplemented phonological pathway with external input that pushed themtowards their correct values for each word during training Semanticdamage then was modelled by weakening or removing this external inputPlaut and colleagues found that indeed a phonological pathway trained inthe context of support from semantics exhibited the central phenomena ofsurface dyslexia when semantics was impaired Figure 3 shows how differentdegrees of semantic impairment provide a reasonably good quantitative tto the performance levels of individual surface dyslexic patients

In summary Plaut et al (1996) provided connectionist simulations andmathematical analyses supporting a view of lexical processing in which thedistinctions between words and nonwords and between regular andexception words are not reected in the structure of the system but rather infunctional aspects of its behaviour as it brings all its knowledge to bear inprocessing an input An important insight that emerges from the approach isthat semantic and phonological processing are intimately related over thecourse of reading acquisition in normal skilled performance and in theeffects of brain damage The following two simulation experiments aim toelaborate and clarify the role of semantics in oral reading and other lexicaltasks Although it might be more natural to consider normal performance(in lexical decision) before impaired performance (in surface dyslexia) webegin with the latter issue as the relevant simulations are more closelyrelated to existing work

SIMULATION 1 INDIVIDUAL DIFFERENCES INDIVISION OF LABOUR

Plaut and co-workersrsquo (1996) account of surface dyslexia involves a causalrelationship between an impairment in semantic input to phonology and theoccurrence of (regularisation) errors in reading low-frequency exception

775

FIG 3 (a) The effect of gradual elimination of semantics on the correct performance of Plautand co-workersrsquo (1996) network after 2000 epochs of training with semantics for Taraban andMcClellandrsquos (1987) high-frequency (HF) and low-frequency (LF) regular consistent words(Reg) and exception words (Exc) and for Glushkorsquos (1979) nonwords and approximatepercentage of errors on the exception words that are regularisations (b) Performance of twosurface dyslexic patients (MP Behrmann amp Bub 1992 Bub et al 1985 KT McCarthy ampWarrington 1986) and the network with different amounts of semantic damage From Plaut etal (1996)

776 PLAUT

words Note that such an impairment might arise from damage to semanticsitself or from damage to the mapping from semantics to phonology in whichsemantics is spared Thus it is perfectly consistent with this account thatsome surface dyslexic patients have intact comprehension (see GrahamPatterson amp Hodges 1995 Watt Jokel amp Behrmann 1997) Nonethelessthe semantics-to-phonology impairment should manifest in non-readingtasks such as picture naming Consistent with this expectation all knownsurface dyslexic patients are anomic In fact when the anomia is restricted toabstract concepts as in patient DRB (Franklin Howard amp Patterson1995) reading errors correspondingly occur only on exception words withabstract meanings

Although Plaut and co-workersrsquo account does not require that surfacedyslexic patients have impaired comprehension it does seem to imply thatpatients who do have a semantic impairment should exhibit some degree ofsurface dyslexia Thus the account is challenged by the nding that there arepatients with semantic impairments who nonetheless read low-frequencyexception words with normal or near-normal accuracy The two clearestcases of this are DRN (Cipolotti amp Warrington 1995) and DC (LambonRalph et al 1995) (see also Coslett 1991 Cummings Houlihan amp Hill1986 Friedman Ferguson Robinson amp Sunderland 1992 Raymer ampBerndt 1994 Schwartz et al 1979)

Cipolotti and Warrington (1995) presented data on patient DRN a67-year-old biological scientist with focal atrophy of the temporal lobes(more pronounced on the left) most likely due to Pickrsquos diseaseBehavioural ly he exhibited what is known as semantic dementia orprogressive uent aphasia (Hodges Patterson Oxbury amp Funnell 1992Snowden Goulding amp Neary 1989) Semantic dementia is a relatively pureimpairment in semantic memory as reected by severe anomia and wordcomprehension difculties with relatively few other cognitive impairmentsCipolotti and Warrington compared DRNrsquos ability to give denitions towords and to read them aloud for sets of words that varied in frequency andspellingndashsound consistency We focus here on his performance on low-frequency exception words as these items provide the most sensitivemeasure of surface dyslexia In keeping with his semantic dementia DRNshowed a substantial impairment in generating denitions for low-frequencyexception words producing 39 correct (1128) and 14 correct (321) fortwo sets of words using a very lenient scoring criterion By contrast his oralreading of low-frequency exception words was essentially intact heproduced 95 correct (4648 and 2021) on each of two lists

Lambon Ralph et al (1995) described the performance of patient DCwho had attended school only until the age of 14 At the time of testing DCwas 85 years old and exhibited the clinical symptoms of dementia of theAlzheimerrsquos type (see Schwartz 1990) including very poor episodic memory

MODELS OF WORD READING AND LEXICAL DECISION 777

with uent well-structured speech and occasional word-nding difcultiesLike DRN DC was impaired at dening low-frequency exception wordsbut was within the normal range at reading them aloud In generatingdenitions for the low-frequency exception words on Patterson and Hodgesrsquo(1992) ldquosurfacerdquo list DC was 31 correct (1342) by a lax criterion and57 correct (2442) by a very lax criterion By contrast she was 95 correct(4042) in reading the words aloud Similar dissociations between generatingdenitions and pronunciations were observed for the object labels from thePALPA (Kay Lesser amp Coltheart 1992) and for word lists from Shallice etal (1983) and from Strain et al (1995)

One possible explanation for the lack of surface dyslexia in patients likeDRN and DC is that the task of generating denitions is far moresensitive to semantic or semantics-to-phonology damage than is the task ofreading low-frequency exception words and that surface dyslexia emergesonly with very severe semantic damage In fact inspection of the results fromprogressive deterioration of semantics in Plaut and co-workersrsquo (1996)simulation (Fig 3a) indicates that low-frequency exception word reading isaffected only when semantic strength is reduced to a level of 15 (from amaximum of 50) Consistent with this idea patient WLP (Schwartz et al1979) became impaired at reading exception words only when her semanticdeterioration became very severe (Schwartz Saffran amp Marin 1980)Moreover many uent surface dyslexia patients show evidence ofcomprehension impairments andor anomia that seem more severe thanthose of DRN and DC As examples MP (Bub et al 1985) was at chanceat wordndashword and picturendashword semantic categorisation only very slightlyabove chance at semantic category sorting and exhibited no semanticpriming in word naming and KT (McCarthy amp Warrington 1986) scored 0on the WAIS Vocabulary test could not perform the Peabody PictureVocabulary test (Dunn 1965) could name only 11 (14130) of a subset ofthe Snodgrass and Vanderwart (1980) pictures and was only slightly abovechance at picturendashword matching By comparison on object naming DRNwas 25 correct (520) and DC was 58 correct (128220) However somepatients are no more anomic than DRN and DC but show severe surfacedyslexia For example GC (Patterson Graham amp Hodges 1994a) wascorrect in naming 45 (118260) of the Snodgrass and Vanderwart picturesbut could pronounce correctly only 38 (1642) of low-frequency exceptionwords (cf 95 for both DRN and DC) Thus the severity of semanticimpairment alone does seem to provide a full account of the differences inreading performance among all of the relevant patients

An additional hypothesis is that individual differences in the severityof surface dyslexia can arise not only from differences in the degree ofsemantic damage but also from premorbid differences in the division oflabour and overall competence of the reading system Recall that the

778 PLAUT

simulations of Plaut et al (1996) demonstrate that the phonological pathwayon its own can if necessary master the entire training corpus includinglow-frequency exception words but that support from the semantic pathwayrelieves it from having to do so Thus the ultimate competence of thephonological pathway is a function both of its own learning propertiesmdashitssensitivity to frequency and consistency as expressed by equation (1)mdashandof the strength of the semantic contribution to phonology Both of thesefactors are matters of degree that would be expected to vary acrossindividuals More generally Plaut and co-workers pointed out that thecompetence and division of labour of the reading system would be expectedto be inuenced by a wide variety of factors including the nature of readinginstruction the sophistication of preliterate phonological representationsthe overall amount of reading practice relative experience in reading aloudversus silently the computational resources devoted to each pathway andthe readerrsquos more general skill levels in visual pattern recognition and inspoken word comprehension and production Broadly speaking the overallcompetence of the phonological and semantic pathways together would beexpected to improve gradually with reading experience but that factors thataided the development of one pathway would be expected to limit thecompetence of the other

In Plaut and co-workersrsquo (1996) simulation of surface dyslexia this tradingrelation between the semantic and phonological pathways dependsprimarily on two specic parameters The rst is the asymptotic strengthover training of the external correct input provided to phoneme unitstermed semantic strength which corresponds directly to the competence ofthe putative semantic pathway The second parameter is the magnitude ofthe pressure to keep weights in the phonological pathway small termedweight decay Weight decay is standardly used in connectionist modelling toimprove generalisation (ie nonword reading) by preventing the networkfrom developing arbitrarily large weights in learning the training data Withweight decay the magnitude of each weight becomes proportional to itsimportance in reducing training error (Hinton 1989) Weights can still growlarge however if aspects of the task demand it In fact this property iscritical in the current context if the network is to master exception wordsBecause the pronunciations of such words violate standard spellingndashsoundcorrespondences they typically require the involvement of larger weights tooverride the standard mapping Thus the higher the amount of weightdecay the more difcult it is for the network to learn to pronounce exceptionwords particularly those of low frequency In this way the magnitude ofweight decay controls the base level of competence that can be achieved bythe phonological pathway in isolation There are a number of possibleneurophysiological analogues of weight decay that would be expected tovary across individuals For instance if a connection weight corresponds to

MODELS OF WORD READING AND LEXICAL DECISION 779

the number and strength of synapses between neurons then the magnitudeof weight decay can be thought of as the level of metabolic activity that canbe supported for maintaining synapses Low values of weight decaycorrespond to high levels of metabolic activity and thus a high level ofcompetence in the network

Under this view of individual differences as parametric variations of Plautand co-workersrsquo (1996) simulation patients such as DRN and DC maycorrespond to parameter combinations that lead to a high degree ofcompetence in the phonological pathway The goal of the current simulationwas to determine the effects of parametric variations of semantic strengthand weight decay on the severity of surface dyslexia exhibited by networkimplementations of the phonological pathway following semanticimpairment

Method

The method used in the simulation follows that used by Plaut et al (1996simulation 4) and readers are referred to that paper for additional details Afeedforward network was trained to map from orthographic representationsof 2998 monosyllabic words to their phonological representations Theorthographic representations consisted of 108 grapheme units that codesingle- and multi-letter graphemes for the onset vowel and coda of thewritten form of each word The phonological representations werestructured in an analogous fashion over 61 phoneme units Each of thegrapheme units was connected to each of 100 hidden units which in turnwas connected to each phoneme unit Each of the hidden and phoneme unitsalso had a bias weight that determined its ldquorestingrdquo state in the absence ofother input This bias is equivalent to the weight on a connection from a unitwhose state is always equal to 10 and was learned in the same way as otherweights

During training a word was presented to the network by clamping thestates of units representing graphemes contained in the word to 1 and allothers to 0 The hidden and phoneme units then computed their states basedon this input The state sj of each unit j varied between 0 and 1 as a smoothreal-valued function of the summed weighted input from other units

(2)

where wij is the weight on the connection from unit i to unit j bj is the biasweight of j and exp [ is the exponential function This is the standardsigmoid function s [ referred to in the frequencyndashconsistency equation(equation 1)

780 PLAUT

FIG 4 The magnitude of the additional external input supplied to phoneme units by theputative semantic pathway as a function of training epoch for different values of semanticstrength (g in equation 3) for a frequency f 5 533 corresponding to the average Kucera andFrancis (1967) frequency of the Patterson and Hodges (1992) low-frequency words

The summed input to phoneme units was augmented with an additionalterm S corresponding to the frequency-weighted contribution of thesemantic pathway to phonology

(3)

where f is the Kucera and Francis (1967) frequency of the word and tindexes the training epoch (ie number of presentations of the training cor-pus) The parameter g termed semantic strength determined the asymptoticlevel of input from semantics and was varied from 10 to 50 in the currentsimulation Figure 4 shows the magnitude of this input over time for eachvalue of g using f 5 533 the average Kucera and Francis (1967) frequency ofthe Patterson and Hodges (1992) low-frequency words The sign of thisexternal input was set to drive phoneme units towards their correct states

The error E of the network was measured by the sum of two terms (1)the cross-entropy (see Hinton 1989) between the generated phoneme statessj and the correct or target phoneme states tj for the word and (2) aproportion of the sum of the squares of the weights

(4)

where l is the weight decay parameter and was varied between 000005 and

MODELS OF WORD READING AND LEXICAL DECISION 781

1Note that this manipulation is equivalent in the limit of small weight changes to the morenatural procedure of sampling words for training as a function of their frequency

2This level of correct performance was calculated over only the 14 pairings of semanticstrength and weight decay giving rise to non-perfect performance on the low-frequencyexception words (see Fig 5)

0001 in the current simulation Back-propagation (Rumelhart Hinton ampWilliams 1986b) was used to calculate the derivative of this error withrespect to each weight in the network This derivative was scaled by a factorreecting the frequency of occurrence of the word proportional to thesquare-root of its Kucera and Francis (1967) frequency1 Each of the 2998words was presented in turn and their scaled derivatives were accumulatedAt the end of each epoch the weights were changed in proportion to the sumof the current accumulated derivatives and the past weight step (scaled by amomentum parameter of 09) The weight change was scaled both by a globallearning rate of 0001 and by a connection-specic learning rate that wasinitialised to 10 and adapted using the delta-bar-delta procedure (Jacobs1988) with an additive increment of 01 and a multiplicative decrement of09

For each combination of semantic strength and weight decay the networkwas initialised with the same set of random weights (sampled uniformlybetween 6 01) and trained for 2000 epochs At this point it was subjected toa complete semantic ldquolesionrdquo by removing the external input to thephoneme units The network was then tested for its pronunciations of thehigh- and low-frequency regular and exception words from Patterson andHodgesrsquo (1992) ldquosurfacerdquo list and 86 nonwords from Glushko (1979 seePlaut et al 1996 for details on scoring)

Results and Discussion

Figure 5 displays the number of errors to the Patterson and Hodges wordsmade by the network after the elimination of semantics as a function of thesemantic strength (g in equation 3) and weight decay ( l in equation 4) usedduring training These error scores were entered into an ANOVA over itemswith frequency and consistency as between-item factors and semanticstrength and weight decay as within-item factors

As is apparent from Fig 5 the network exhibited a strong frequency 3consistency interaction in accuracy [F(1164) 5 176 P 0001] Moreover866 (174201) of the errors to exception words were regularisations and985 (11861204)2 of presentations of the Glushko nonwords werepronounced correctly Thus overall the network exhibited the hallmarkcharacteristics of surface dyslexia

However as is also apparent from Fig 5 the effects of frequency andconsistency depended strongly on the levels of semantic strength and weight

782

783

FIG 5 Numbers of errors produced by the network to Patterson and Hodgesrsquo (1992) (a)high-frequency regular words (b) low-frequency regular words (c) high-frequency exceptionwords and (d) low-frequency exception words when semantics is removed completely after2000 epochs of training as a function of the semantic strength (g in equation 3) and weightdecay ( l in equation 4) used during training

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 4: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

768 PLAUT

FIG 1 A connectionist framework for lexical processing based on that of Seidenberg andMcClelland (1989)

transformations are learned on the basis of the systemrsquos exposure to writtenwords spoken words and their meanings

As Fig 1 makes clear the distributed connectionist approach does notentail a complete lack of structure within the lexical system But thedistinctions that are relevant relate to the different types of information thatmust be coordinated orthographic phonological and semantic Given thatsuch information may be based on input from different modalities (at leastfor the surface forms) it is natural to assume that the correspondingrepresentationsmdashand hence the pathways between themmdashareneuroanatomically distinct In fact such divisions play an important role inaccounting for data on the selective effects of brain damage on readingHowever irrespective of these distinctions among types of representationsthere is a uniformity in the processing mechanisms by which they are derivedand interact In this way the distributed connectionist approach isfundamentally at odds with the core tenet of dual-route theories

Seidenberg and McClelland (1989) provided computational support forthe claim that word-specic representations and separate rule-based versuslexical processing routes are unnecessary to account for skilled oral readingThey trained a connectionist network (corresponding to the bottom portionof Fig 1 termed the phonological pathway) to map from the orthography of2897 monosyllabic English wordsmdashboth regular and exceptionmdashto theirphonology After training the network pronounced 977 of the wordscorrectly including most exception words The network also exhibited thestandard empirical pattern of an interaction of frequency and consistency innaming latency (Andrews 1982 Seidenberg Waters Barnes amp Tanenhaus1984 Taraban amp McClelland 1987 Waters amp Seidenberg 1985) if its

MODELS OF WORD READING AND LEXICAL DECISION 769

real-valued accuracy in generating correct responses was taken as a proxyfor reaction time

The theoretical impact of Seidenberg and McClellandrsquos model ishowever undermined by certain inadequacies in its match to humanperformance particularly in three respects First the model was much worsethan skilled readers at pronouncing orthographically legal nonwords(Besner Twilley McCann amp Seergobin 1990) Secondly it was unable toperform lexical decision accurately under many conditions (Besner et al1990 Fera amp Besner 1992) Thirdly it failed to exhibit central aspects ofuent surface dyslexia when damaged (Patterson Seidenberg ampMcClelland 1989) specically the frequency 3 consistency interaction innaming accuracy the high regularisation rates and (again) the near-perfectnonword reading (see Patterson Coltheart amp Marshall 1985)

More recently Plaut et al (1996) demonstrated that networkimplementations of the phonological pathway can learn to pronounceregular and exception words and yet also pronounce nonwords as well asskilled readers if they use more appropriately structured orthographic andphonological representations (based on graphemes and phonemes andembodying phonotactic and graphotactic constraints) Plaut and colleaguesalso provided a more adequate account of surface dyslexia based on thenotion that semantics plays an important role in skilled reading Specicallythe support for word pronunciations from the semantic pathway relieves thephonological pathway from having to master low-frequency exceptionwords by itself The combination of the two pathways is fully competent inskilled readers but following damage to the semantic pathway the limitedcompetence of the isolated phonological pathway manifests as surfacedyslexia

In the current work additional simulations are presented that furtherdevelop the role of semantics to address two remaining problematic issuesfor the distributed connectionist approach to lexical processing The rstconcerns a nding that would seem to challenge the above account of surfacedyslexia namely that some brain-damaged patients have substantialsemantic impairments but nonetheless can read low-frequency exceptionwords accurately (WLP Schwartz Marin amp Saffran 1979 MB Raymeramp Berndt 1994 DRN Cipolotti amp Warrington 1995 DC LambonRalph Ellis amp Franklin 1995) The rst set of simulations demonstrates thatsuch patients are natural consequences of expected individual differences inthe division of labour between the semantic and phonological pathwaysunder parametric variations even when the majority of individuals withsemantic damage would be expected to exhibit some degree of surfacedyslexia (Graham Hodges amp Patterson 1994 Patterson amp Hodges 1992)The second issue is the ability of distributed networks to perform lexicaldecision accurately A second simulation involving a full (although

770 PLAUT

feedforward) implementation of the framework in Fig 1 demonstrates thata network that maps orthography to semantics directly and via phonologycan reliably distinguish words from nonwords based on a measure offamiliarity dened over semantics

As background for the current work the next section presents a briefoverview of the approach taken by Plaut et al (1996) in accounting for theeffects of frequency and consistency in skilled word and nonword readingand in uent surface dyslexia The two simulation studies are then presentedfollowed by a general discussion

AN ACCOUNT OF FREQUENCY ANDCONSISTENCY EFFECTS IN NORMAL AND

IMPAIRED WORD READING

Seidenberg and McClelland (1989) based their orthographic andphonological representations on context-sensitive triples of letters andphonemic features (Wickelgren 1969) A central insight offered by Plaut etal (1996) was that such representations suffer from what was termed thedispersion problem and that it was this problem rather than any moregeneral failing of connectionist networks that lead to the poor nonwordreading performance of Seidenberg and McClellandrsquos model Specicallythe triples-based representations disperse the regularities between spellingand sound thereby hindering generalisation For example in theorthographic representation of the word GAVE the contribution made by theletter A depends entirely on the presence of the G and V there is no similarityto the contribution made by the A in SAVE or GATE and hence no basis for thetraining on the pronunciation of A in GAVE to support the samepronunciation in SAVE or G ATEmdashor in a nonword like M AVE

Plaut and colleagues hypothesised that the human reading systemgeneralises effectively because it uses representations that condensespellingndashsound regularities To support this hypothesis they designedorthographic and phonological representations in which within eachconsonant and vowel cluster letters and phonemes almost always activatedthe same units irrespective of context The representations were notintended to be fully general but rather to minimise the degree to whichspellingndashsound correspondences were dispersed across different units fordifferent words Plaut and colleagues found that when networks weretrained on essentially the same corpus as Seidenberg and McClellandrsquosmodel but using the new representations they were able to pronounce allthe words correctly and yet still generalise effectively to nonwordsMoreover the networks exhibited the empirical frequency 3 consistencyinteraction pattern when trained on actual (as opposed to logarithmically

MODELS OF WORD READING AND LEXICAL DECISION 771

FIG 2 (a) The frequency 3 consistency interaction exhibited in the settling time of anattractor network implementation of the phonological pathway in pronouncing words ofvarying frequency and spellingndashsound consistency (Plaut et al 1996 simulation 3) (b) Itsexplanation in terms of additive contributions of frequency and consistency subject to anasymptotic activation functionmdash s [ in equation (1) only the top half of which is shown FromPlaut et al (1996)

compressed) word frequencies even when naming latencies were modelleddirectly by the settling time of a recurrent attractor network (see Fig 2a)

Importantly Plaut et al (1996) went beyond providing only empiricaldemonstrations that networks could reproduce accuracy and latency data onword and nonword reading to offer a mathematical analysis of the critical

772 PLAUT

factors that govern why the networks (and by hypothesis subjects) behaveas they do This analysis was based on a network that while simpler than theactual simulationsmdashit had no hidden units and employed Hebbian ratherthan error-correcting learningmdashretained many of the essentialcharacteristics of the more general framework For this simplied networkPlaut and colleagues derived an expression for how the response of thenetwork to any input (test) pattern depended on its experience with everypattern on which the network was trained as a function of the patternrsquosfrequency of training its similarity with the test pattern and the consistencyof its output with that of the test pattern Specically the response s[t] of anyj

output unit j to a given test pattern t is given by

(1)

in which the standard smooth non-linear sigmoidal inputndashoutput functionfor each unit s [ is applied to the sum of three terms (1) the cumulativefrequency of training on the pattern t itself F[t] (2) the sum of the frequenciesF[f] of the ldquofriendsrdquo of pattern t (patterns trained to produce the sameresponse for unit j) each weighted by its similarity (overlap) with t O [ft] and(3) minus the sum of the frequencies F[e] of the ldquoenemiesrdquo of pattern t(patterns trained to produce the opposite response) each weighted by itssimilarity to t O [et]

Many of the basic phenomena in word reading can be seen as naturalconsequences of adherence to this ldquofrequencyndashconsistencyrdquo equationFactors that increase the summed input to units (eg word frequencyspellingndashsound consistency) improve performance as measured by namingaccuracy andor latency but their contributions are subject to ldquodiminishingreturnsrdquo due to the asymptotic nature of the activation function s [ (seeFig 2b) As a result performance on stimuli that are strong in one factor isrelatively insensitive to variation in other factors Thus regular words showlittle effect of frequency and high-frequency words show little effect ofconsistency giving rise to the standard pattern of interaction betweenfrequency and consistency in which the naming of low-frequency exceptionwords is disproportionately slow or inaccurate

It is important to note however that equation (1) is only approximate fornetworks with hidden units and trained by error-correcting algorithms likeback-propagation (Rumelhart Hinton amp Williams 1986a) These twoaspects of the Plaut et al (1996) simulations are critical in that they help toovercome interference from enemies (ie the negative terms in equation 1)thereby enabling the networks to achieve correct performance on exceptionwordsmdashthat is words with many enemies and few if any friendsmdashas well ason regular words and nonwords

MODELS OF WORD READING AND LEXICAL DECISION 773

Although Plaut and colleagues demonstrated that implementations of thephonological pathway on its own can learn to pronounce words andnonwords as well as skilled readers a central aspect of their general theory isthat skilled reading more typically requires the combined support of boththe semantic and phonological pathways (for similar proposals see Hillis ampCaramazza 1991 Van Orden amp Goldinger 1994) and that individuals maydiffer in the relative competence of each pathway (Seidenberg 1992)Certainly semantic involvement is necessary to pronounce homographs likeWIN D and READ correctly Furthermore a semantic variablemdashimageabilitymdashinuences the strength of the frequency 3 consistency interaction in thenaming latencies and errors of skilled readers (Strain Patterson ampSeidenberg 1995) Moreover brain damage that impairs lexical semanticsmdashtypically to the left temporal lobemdashcan lead to surface dyslexia (seePatterson et al 1985) In its purest ldquouentrdquo form (eg MP Behrmann ampBub 1992 Bub Cancelliere amp Kertesz 1985 KT McCarthy ampWarrington 1986 HTR Shallice Warrington amp McCarthy 1983) surfacedyslexic patients read nonwords and regular words with normal accuracyand latency but exhibit an interaction of frequency and consistency in wordreading accuracy that mirrors the interaction shown by normal subjects inlatency That is surface dyslexic patients are disproportionately poor atpronouncing low-frequency exception words often giving a pronunciationconsistent with more standard spellingndashsound correspondences (eg SEW

read as ldquosuerdquo termed a regularisation error) In fact there can be a closecorrelation for individual patients between the lack of comprehension ofexception words and their likelihood of being regularised (Funnell 1996Graham et al 1994 Hillis amp Caramazza 1991) and the surface dyslexicpattern may emerge gradually as lexical semantic knowledge deteriorates inpatients with some types of progressive dementia such as semanticdementia (Graham et al 1994 Patterson amp Hodges 1992 Schwartz et al1979) or dementia of the Alzheimerrsquos type (Patterson Graham amp Hodges1994b)

The distributed connectionist framework depicted in Fig 1 provides anatural formulation of how contributions from both the semantic andphonological pathways might be integrated in oral reading At a moreabstract level given that phonological units simply sum their inputs from thetwo pathways the inuence of the semantic pathway can be incorporatedinto equation (1) by adding an additional term S[t] to the summed input ofunit j [In fact if it is assumed that this term scales with imageability this canaccount for the three-way interaction of frequency consistency andimageability in skilled reading found by Strain et al (1995)] Whenformulated explicitly in connectionist terms however this integration hasimportant implications for the nature of learning in the two pathwaysDuring training to the extent that the contribution of the semantic pathway

774 PLAUT

reduces the overall error the phonological pathway will experience lesspressure to learn On its own it may master only those items it nds easiest tolearn words high in frequency andor consistency (ie those items with largepositive terms in equation 1) low-frequency exception words may never belearned completely As a result brain damage that reduced or eliminated thesemantic pathway would reveal the latent limitation of the phonologicalpathway giving rise to surface dyslexia

In further simulations Plaut and colleagues explored the possibility thatthe surface dyslexic reading pattern might reect the natural limitations ofan intact but isolated phonological pathway that had learned to rely partiallyon semantic support Given that a full implementation of the semanticpathway was beyond the scope of their work they approximated thecontribution that such a pathway would make to the oral reading of eachword during training by providing the output (phoneme) units of theimplemented phonological pathway with external input that pushed themtowards their correct values for each word during training Semanticdamage then was modelled by weakening or removing this external inputPlaut and colleagues found that indeed a phonological pathway trained inthe context of support from semantics exhibited the central phenomena ofsurface dyslexia when semantics was impaired Figure 3 shows how differentdegrees of semantic impairment provide a reasonably good quantitative tto the performance levels of individual surface dyslexic patients

In summary Plaut et al (1996) provided connectionist simulations andmathematical analyses supporting a view of lexical processing in which thedistinctions between words and nonwords and between regular andexception words are not reected in the structure of the system but rather infunctional aspects of its behaviour as it brings all its knowledge to bear inprocessing an input An important insight that emerges from the approach isthat semantic and phonological processing are intimately related over thecourse of reading acquisition in normal skilled performance and in theeffects of brain damage The following two simulation experiments aim toelaborate and clarify the role of semantics in oral reading and other lexicaltasks Although it might be more natural to consider normal performance(in lexical decision) before impaired performance (in surface dyslexia) webegin with the latter issue as the relevant simulations are more closelyrelated to existing work

SIMULATION 1 INDIVIDUAL DIFFERENCES INDIVISION OF LABOUR

Plaut and co-workersrsquo (1996) account of surface dyslexia involves a causalrelationship between an impairment in semantic input to phonology and theoccurrence of (regularisation) errors in reading low-frequency exception

775

FIG 3 (a) The effect of gradual elimination of semantics on the correct performance of Plautand co-workersrsquo (1996) network after 2000 epochs of training with semantics for Taraban andMcClellandrsquos (1987) high-frequency (HF) and low-frequency (LF) regular consistent words(Reg) and exception words (Exc) and for Glushkorsquos (1979) nonwords and approximatepercentage of errors on the exception words that are regularisations (b) Performance of twosurface dyslexic patients (MP Behrmann amp Bub 1992 Bub et al 1985 KT McCarthy ampWarrington 1986) and the network with different amounts of semantic damage From Plaut etal (1996)

776 PLAUT

words Note that such an impairment might arise from damage to semanticsitself or from damage to the mapping from semantics to phonology in whichsemantics is spared Thus it is perfectly consistent with this account thatsome surface dyslexic patients have intact comprehension (see GrahamPatterson amp Hodges 1995 Watt Jokel amp Behrmann 1997) Nonethelessthe semantics-to-phonology impairment should manifest in non-readingtasks such as picture naming Consistent with this expectation all knownsurface dyslexic patients are anomic In fact when the anomia is restricted toabstract concepts as in patient DRB (Franklin Howard amp Patterson1995) reading errors correspondingly occur only on exception words withabstract meanings

Although Plaut and co-workersrsquo account does not require that surfacedyslexic patients have impaired comprehension it does seem to imply thatpatients who do have a semantic impairment should exhibit some degree ofsurface dyslexia Thus the account is challenged by the nding that there arepatients with semantic impairments who nonetheless read low-frequencyexception words with normal or near-normal accuracy The two clearestcases of this are DRN (Cipolotti amp Warrington 1995) and DC (LambonRalph et al 1995) (see also Coslett 1991 Cummings Houlihan amp Hill1986 Friedman Ferguson Robinson amp Sunderland 1992 Raymer ampBerndt 1994 Schwartz et al 1979)

Cipolotti and Warrington (1995) presented data on patient DRN a67-year-old biological scientist with focal atrophy of the temporal lobes(more pronounced on the left) most likely due to Pickrsquos diseaseBehavioural ly he exhibited what is known as semantic dementia orprogressive uent aphasia (Hodges Patterson Oxbury amp Funnell 1992Snowden Goulding amp Neary 1989) Semantic dementia is a relatively pureimpairment in semantic memory as reected by severe anomia and wordcomprehension difculties with relatively few other cognitive impairmentsCipolotti and Warrington compared DRNrsquos ability to give denitions towords and to read them aloud for sets of words that varied in frequency andspellingndashsound consistency We focus here on his performance on low-frequency exception words as these items provide the most sensitivemeasure of surface dyslexia In keeping with his semantic dementia DRNshowed a substantial impairment in generating denitions for low-frequencyexception words producing 39 correct (1128) and 14 correct (321) fortwo sets of words using a very lenient scoring criterion By contrast his oralreading of low-frequency exception words was essentially intact heproduced 95 correct (4648 and 2021) on each of two lists

Lambon Ralph et al (1995) described the performance of patient DCwho had attended school only until the age of 14 At the time of testing DCwas 85 years old and exhibited the clinical symptoms of dementia of theAlzheimerrsquos type (see Schwartz 1990) including very poor episodic memory

MODELS OF WORD READING AND LEXICAL DECISION 777

with uent well-structured speech and occasional word-nding difcultiesLike DRN DC was impaired at dening low-frequency exception wordsbut was within the normal range at reading them aloud In generatingdenitions for the low-frequency exception words on Patterson and Hodgesrsquo(1992) ldquosurfacerdquo list DC was 31 correct (1342) by a lax criterion and57 correct (2442) by a very lax criterion By contrast she was 95 correct(4042) in reading the words aloud Similar dissociations between generatingdenitions and pronunciations were observed for the object labels from thePALPA (Kay Lesser amp Coltheart 1992) and for word lists from Shallice etal (1983) and from Strain et al (1995)

One possible explanation for the lack of surface dyslexia in patients likeDRN and DC is that the task of generating denitions is far moresensitive to semantic or semantics-to-phonology damage than is the task ofreading low-frequency exception words and that surface dyslexia emergesonly with very severe semantic damage In fact inspection of the results fromprogressive deterioration of semantics in Plaut and co-workersrsquo (1996)simulation (Fig 3a) indicates that low-frequency exception word reading isaffected only when semantic strength is reduced to a level of 15 (from amaximum of 50) Consistent with this idea patient WLP (Schwartz et al1979) became impaired at reading exception words only when her semanticdeterioration became very severe (Schwartz Saffran amp Marin 1980)Moreover many uent surface dyslexia patients show evidence ofcomprehension impairments andor anomia that seem more severe thanthose of DRN and DC As examples MP (Bub et al 1985) was at chanceat wordndashword and picturendashword semantic categorisation only very slightlyabove chance at semantic category sorting and exhibited no semanticpriming in word naming and KT (McCarthy amp Warrington 1986) scored 0on the WAIS Vocabulary test could not perform the Peabody PictureVocabulary test (Dunn 1965) could name only 11 (14130) of a subset ofthe Snodgrass and Vanderwart (1980) pictures and was only slightly abovechance at picturendashword matching By comparison on object naming DRNwas 25 correct (520) and DC was 58 correct (128220) However somepatients are no more anomic than DRN and DC but show severe surfacedyslexia For example GC (Patterson Graham amp Hodges 1994a) wascorrect in naming 45 (118260) of the Snodgrass and Vanderwart picturesbut could pronounce correctly only 38 (1642) of low-frequency exceptionwords (cf 95 for both DRN and DC) Thus the severity of semanticimpairment alone does seem to provide a full account of the differences inreading performance among all of the relevant patients

An additional hypothesis is that individual differences in the severityof surface dyslexia can arise not only from differences in the degree ofsemantic damage but also from premorbid differences in the division oflabour and overall competence of the reading system Recall that the

778 PLAUT

simulations of Plaut et al (1996) demonstrate that the phonological pathwayon its own can if necessary master the entire training corpus includinglow-frequency exception words but that support from the semantic pathwayrelieves it from having to do so Thus the ultimate competence of thephonological pathway is a function both of its own learning propertiesmdashitssensitivity to frequency and consistency as expressed by equation (1)mdashandof the strength of the semantic contribution to phonology Both of thesefactors are matters of degree that would be expected to vary acrossindividuals More generally Plaut and co-workers pointed out that thecompetence and division of labour of the reading system would be expectedto be inuenced by a wide variety of factors including the nature of readinginstruction the sophistication of preliterate phonological representationsthe overall amount of reading practice relative experience in reading aloudversus silently the computational resources devoted to each pathway andthe readerrsquos more general skill levels in visual pattern recognition and inspoken word comprehension and production Broadly speaking the overallcompetence of the phonological and semantic pathways together would beexpected to improve gradually with reading experience but that factors thataided the development of one pathway would be expected to limit thecompetence of the other

In Plaut and co-workersrsquo (1996) simulation of surface dyslexia this tradingrelation between the semantic and phonological pathways dependsprimarily on two specic parameters The rst is the asymptotic strengthover training of the external correct input provided to phoneme unitstermed semantic strength which corresponds directly to the competence ofthe putative semantic pathway The second parameter is the magnitude ofthe pressure to keep weights in the phonological pathway small termedweight decay Weight decay is standardly used in connectionist modelling toimprove generalisation (ie nonword reading) by preventing the networkfrom developing arbitrarily large weights in learning the training data Withweight decay the magnitude of each weight becomes proportional to itsimportance in reducing training error (Hinton 1989) Weights can still growlarge however if aspects of the task demand it In fact this property iscritical in the current context if the network is to master exception wordsBecause the pronunciations of such words violate standard spellingndashsoundcorrespondences they typically require the involvement of larger weights tooverride the standard mapping Thus the higher the amount of weightdecay the more difcult it is for the network to learn to pronounce exceptionwords particularly those of low frequency In this way the magnitude ofweight decay controls the base level of competence that can be achieved bythe phonological pathway in isolation There are a number of possibleneurophysiological analogues of weight decay that would be expected tovary across individuals For instance if a connection weight corresponds to

MODELS OF WORD READING AND LEXICAL DECISION 779

the number and strength of synapses between neurons then the magnitudeof weight decay can be thought of as the level of metabolic activity that canbe supported for maintaining synapses Low values of weight decaycorrespond to high levels of metabolic activity and thus a high level ofcompetence in the network

Under this view of individual differences as parametric variations of Plautand co-workersrsquo (1996) simulation patients such as DRN and DC maycorrespond to parameter combinations that lead to a high degree ofcompetence in the phonological pathway The goal of the current simulationwas to determine the effects of parametric variations of semantic strengthand weight decay on the severity of surface dyslexia exhibited by networkimplementations of the phonological pathway following semanticimpairment

Method

The method used in the simulation follows that used by Plaut et al (1996simulation 4) and readers are referred to that paper for additional details Afeedforward network was trained to map from orthographic representationsof 2998 monosyllabic words to their phonological representations Theorthographic representations consisted of 108 grapheme units that codesingle- and multi-letter graphemes for the onset vowel and coda of thewritten form of each word The phonological representations werestructured in an analogous fashion over 61 phoneme units Each of thegrapheme units was connected to each of 100 hidden units which in turnwas connected to each phoneme unit Each of the hidden and phoneme unitsalso had a bias weight that determined its ldquorestingrdquo state in the absence ofother input This bias is equivalent to the weight on a connection from a unitwhose state is always equal to 10 and was learned in the same way as otherweights

During training a word was presented to the network by clamping thestates of units representing graphemes contained in the word to 1 and allothers to 0 The hidden and phoneme units then computed their states basedon this input The state sj of each unit j varied between 0 and 1 as a smoothreal-valued function of the summed weighted input from other units

(2)

where wij is the weight on the connection from unit i to unit j bj is the biasweight of j and exp [ is the exponential function This is the standardsigmoid function s [ referred to in the frequencyndashconsistency equation(equation 1)

780 PLAUT

FIG 4 The magnitude of the additional external input supplied to phoneme units by theputative semantic pathway as a function of training epoch for different values of semanticstrength (g in equation 3) for a frequency f 5 533 corresponding to the average Kucera andFrancis (1967) frequency of the Patterson and Hodges (1992) low-frequency words

The summed input to phoneme units was augmented with an additionalterm S corresponding to the frequency-weighted contribution of thesemantic pathway to phonology

(3)

where f is the Kucera and Francis (1967) frequency of the word and tindexes the training epoch (ie number of presentations of the training cor-pus) The parameter g termed semantic strength determined the asymptoticlevel of input from semantics and was varied from 10 to 50 in the currentsimulation Figure 4 shows the magnitude of this input over time for eachvalue of g using f 5 533 the average Kucera and Francis (1967) frequency ofthe Patterson and Hodges (1992) low-frequency words The sign of thisexternal input was set to drive phoneme units towards their correct states

The error E of the network was measured by the sum of two terms (1)the cross-entropy (see Hinton 1989) between the generated phoneme statessj and the correct or target phoneme states tj for the word and (2) aproportion of the sum of the squares of the weights

(4)

where l is the weight decay parameter and was varied between 000005 and

MODELS OF WORD READING AND LEXICAL DECISION 781

1Note that this manipulation is equivalent in the limit of small weight changes to the morenatural procedure of sampling words for training as a function of their frequency

2This level of correct performance was calculated over only the 14 pairings of semanticstrength and weight decay giving rise to non-perfect performance on the low-frequencyexception words (see Fig 5)

0001 in the current simulation Back-propagation (Rumelhart Hinton ampWilliams 1986b) was used to calculate the derivative of this error withrespect to each weight in the network This derivative was scaled by a factorreecting the frequency of occurrence of the word proportional to thesquare-root of its Kucera and Francis (1967) frequency1 Each of the 2998words was presented in turn and their scaled derivatives were accumulatedAt the end of each epoch the weights were changed in proportion to the sumof the current accumulated derivatives and the past weight step (scaled by amomentum parameter of 09) The weight change was scaled both by a globallearning rate of 0001 and by a connection-specic learning rate that wasinitialised to 10 and adapted using the delta-bar-delta procedure (Jacobs1988) with an additive increment of 01 and a multiplicative decrement of09

For each combination of semantic strength and weight decay the networkwas initialised with the same set of random weights (sampled uniformlybetween 6 01) and trained for 2000 epochs At this point it was subjected toa complete semantic ldquolesionrdquo by removing the external input to thephoneme units The network was then tested for its pronunciations of thehigh- and low-frequency regular and exception words from Patterson andHodgesrsquo (1992) ldquosurfacerdquo list and 86 nonwords from Glushko (1979 seePlaut et al 1996 for details on scoring)

Results and Discussion

Figure 5 displays the number of errors to the Patterson and Hodges wordsmade by the network after the elimination of semantics as a function of thesemantic strength (g in equation 3) and weight decay ( l in equation 4) usedduring training These error scores were entered into an ANOVA over itemswith frequency and consistency as between-item factors and semanticstrength and weight decay as within-item factors

As is apparent from Fig 5 the network exhibited a strong frequency 3consistency interaction in accuracy [F(1164) 5 176 P 0001] Moreover866 (174201) of the errors to exception words were regularisations and985 (11861204)2 of presentations of the Glushko nonwords werepronounced correctly Thus overall the network exhibited the hallmarkcharacteristics of surface dyslexia

However as is also apparent from Fig 5 the effects of frequency andconsistency depended strongly on the levels of semantic strength and weight

782

783

FIG 5 Numbers of errors produced by the network to Patterson and Hodgesrsquo (1992) (a)high-frequency regular words (b) low-frequency regular words (c) high-frequency exceptionwords and (d) low-frequency exception words when semantics is removed completely after2000 epochs of training as a function of the semantic strength (g in equation 3) and weightdecay ( l in equation 4) used during training

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 5: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

MODELS OF WORD READING AND LEXICAL DECISION 769

real-valued accuracy in generating correct responses was taken as a proxyfor reaction time

The theoretical impact of Seidenberg and McClellandrsquos model ishowever undermined by certain inadequacies in its match to humanperformance particularly in three respects First the model was much worsethan skilled readers at pronouncing orthographically legal nonwords(Besner Twilley McCann amp Seergobin 1990) Secondly it was unable toperform lexical decision accurately under many conditions (Besner et al1990 Fera amp Besner 1992) Thirdly it failed to exhibit central aspects ofuent surface dyslexia when damaged (Patterson Seidenberg ampMcClelland 1989) specically the frequency 3 consistency interaction innaming accuracy the high regularisation rates and (again) the near-perfectnonword reading (see Patterson Coltheart amp Marshall 1985)

More recently Plaut et al (1996) demonstrated that networkimplementations of the phonological pathway can learn to pronounceregular and exception words and yet also pronounce nonwords as well asskilled readers if they use more appropriately structured orthographic andphonological representations (based on graphemes and phonemes andembodying phonotactic and graphotactic constraints) Plaut and colleaguesalso provided a more adequate account of surface dyslexia based on thenotion that semantics plays an important role in skilled reading Specicallythe support for word pronunciations from the semantic pathway relieves thephonological pathway from having to master low-frequency exceptionwords by itself The combination of the two pathways is fully competent inskilled readers but following damage to the semantic pathway the limitedcompetence of the isolated phonological pathway manifests as surfacedyslexia

In the current work additional simulations are presented that furtherdevelop the role of semantics to address two remaining problematic issuesfor the distributed connectionist approach to lexical processing The rstconcerns a nding that would seem to challenge the above account of surfacedyslexia namely that some brain-damaged patients have substantialsemantic impairments but nonetheless can read low-frequency exceptionwords accurately (WLP Schwartz Marin amp Saffran 1979 MB Raymeramp Berndt 1994 DRN Cipolotti amp Warrington 1995 DC LambonRalph Ellis amp Franklin 1995) The rst set of simulations demonstrates thatsuch patients are natural consequences of expected individual differences inthe division of labour between the semantic and phonological pathwaysunder parametric variations even when the majority of individuals withsemantic damage would be expected to exhibit some degree of surfacedyslexia (Graham Hodges amp Patterson 1994 Patterson amp Hodges 1992)The second issue is the ability of distributed networks to perform lexicaldecision accurately A second simulation involving a full (although

770 PLAUT

feedforward) implementation of the framework in Fig 1 demonstrates thata network that maps orthography to semantics directly and via phonologycan reliably distinguish words from nonwords based on a measure offamiliarity dened over semantics

As background for the current work the next section presents a briefoverview of the approach taken by Plaut et al (1996) in accounting for theeffects of frequency and consistency in skilled word and nonword readingand in uent surface dyslexia The two simulation studies are then presentedfollowed by a general discussion

AN ACCOUNT OF FREQUENCY ANDCONSISTENCY EFFECTS IN NORMAL AND

IMPAIRED WORD READING

Seidenberg and McClelland (1989) based their orthographic andphonological representations on context-sensitive triples of letters andphonemic features (Wickelgren 1969) A central insight offered by Plaut etal (1996) was that such representations suffer from what was termed thedispersion problem and that it was this problem rather than any moregeneral failing of connectionist networks that lead to the poor nonwordreading performance of Seidenberg and McClellandrsquos model Specicallythe triples-based representations disperse the regularities between spellingand sound thereby hindering generalisation For example in theorthographic representation of the word GAVE the contribution made by theletter A depends entirely on the presence of the G and V there is no similarityto the contribution made by the A in SAVE or GATE and hence no basis for thetraining on the pronunciation of A in GAVE to support the samepronunciation in SAVE or G ATEmdashor in a nonword like M AVE

Plaut and colleagues hypothesised that the human reading systemgeneralises effectively because it uses representations that condensespellingndashsound regularities To support this hypothesis they designedorthographic and phonological representations in which within eachconsonant and vowel cluster letters and phonemes almost always activatedthe same units irrespective of context The representations were notintended to be fully general but rather to minimise the degree to whichspellingndashsound correspondences were dispersed across different units fordifferent words Plaut and colleagues found that when networks weretrained on essentially the same corpus as Seidenberg and McClellandrsquosmodel but using the new representations they were able to pronounce allthe words correctly and yet still generalise effectively to nonwordsMoreover the networks exhibited the empirical frequency 3 consistencyinteraction pattern when trained on actual (as opposed to logarithmically

MODELS OF WORD READING AND LEXICAL DECISION 771

FIG 2 (a) The frequency 3 consistency interaction exhibited in the settling time of anattractor network implementation of the phonological pathway in pronouncing words ofvarying frequency and spellingndashsound consistency (Plaut et al 1996 simulation 3) (b) Itsexplanation in terms of additive contributions of frequency and consistency subject to anasymptotic activation functionmdash s [ in equation (1) only the top half of which is shown FromPlaut et al (1996)

compressed) word frequencies even when naming latencies were modelleddirectly by the settling time of a recurrent attractor network (see Fig 2a)

Importantly Plaut et al (1996) went beyond providing only empiricaldemonstrations that networks could reproduce accuracy and latency data onword and nonword reading to offer a mathematical analysis of the critical

772 PLAUT

factors that govern why the networks (and by hypothesis subjects) behaveas they do This analysis was based on a network that while simpler than theactual simulationsmdashit had no hidden units and employed Hebbian ratherthan error-correcting learningmdashretained many of the essentialcharacteristics of the more general framework For this simplied networkPlaut and colleagues derived an expression for how the response of thenetwork to any input (test) pattern depended on its experience with everypattern on which the network was trained as a function of the patternrsquosfrequency of training its similarity with the test pattern and the consistencyof its output with that of the test pattern Specically the response s[t] of anyj

output unit j to a given test pattern t is given by

(1)

in which the standard smooth non-linear sigmoidal inputndashoutput functionfor each unit s [ is applied to the sum of three terms (1) the cumulativefrequency of training on the pattern t itself F[t] (2) the sum of the frequenciesF[f] of the ldquofriendsrdquo of pattern t (patterns trained to produce the sameresponse for unit j) each weighted by its similarity (overlap) with t O [ft] and(3) minus the sum of the frequencies F[e] of the ldquoenemiesrdquo of pattern t(patterns trained to produce the opposite response) each weighted by itssimilarity to t O [et]

Many of the basic phenomena in word reading can be seen as naturalconsequences of adherence to this ldquofrequencyndashconsistencyrdquo equationFactors that increase the summed input to units (eg word frequencyspellingndashsound consistency) improve performance as measured by namingaccuracy andor latency but their contributions are subject to ldquodiminishingreturnsrdquo due to the asymptotic nature of the activation function s [ (seeFig 2b) As a result performance on stimuli that are strong in one factor isrelatively insensitive to variation in other factors Thus regular words showlittle effect of frequency and high-frequency words show little effect ofconsistency giving rise to the standard pattern of interaction betweenfrequency and consistency in which the naming of low-frequency exceptionwords is disproportionately slow or inaccurate

It is important to note however that equation (1) is only approximate fornetworks with hidden units and trained by error-correcting algorithms likeback-propagation (Rumelhart Hinton amp Williams 1986a) These twoaspects of the Plaut et al (1996) simulations are critical in that they help toovercome interference from enemies (ie the negative terms in equation 1)thereby enabling the networks to achieve correct performance on exceptionwordsmdashthat is words with many enemies and few if any friendsmdashas well ason regular words and nonwords

MODELS OF WORD READING AND LEXICAL DECISION 773

Although Plaut and colleagues demonstrated that implementations of thephonological pathway on its own can learn to pronounce words andnonwords as well as skilled readers a central aspect of their general theory isthat skilled reading more typically requires the combined support of boththe semantic and phonological pathways (for similar proposals see Hillis ampCaramazza 1991 Van Orden amp Goldinger 1994) and that individuals maydiffer in the relative competence of each pathway (Seidenberg 1992)Certainly semantic involvement is necessary to pronounce homographs likeWIN D and READ correctly Furthermore a semantic variablemdashimageabilitymdashinuences the strength of the frequency 3 consistency interaction in thenaming latencies and errors of skilled readers (Strain Patterson ampSeidenberg 1995) Moreover brain damage that impairs lexical semanticsmdashtypically to the left temporal lobemdashcan lead to surface dyslexia (seePatterson et al 1985) In its purest ldquouentrdquo form (eg MP Behrmann ampBub 1992 Bub Cancelliere amp Kertesz 1985 KT McCarthy ampWarrington 1986 HTR Shallice Warrington amp McCarthy 1983) surfacedyslexic patients read nonwords and regular words with normal accuracyand latency but exhibit an interaction of frequency and consistency in wordreading accuracy that mirrors the interaction shown by normal subjects inlatency That is surface dyslexic patients are disproportionately poor atpronouncing low-frequency exception words often giving a pronunciationconsistent with more standard spellingndashsound correspondences (eg SEW

read as ldquosuerdquo termed a regularisation error) In fact there can be a closecorrelation for individual patients between the lack of comprehension ofexception words and their likelihood of being regularised (Funnell 1996Graham et al 1994 Hillis amp Caramazza 1991) and the surface dyslexicpattern may emerge gradually as lexical semantic knowledge deteriorates inpatients with some types of progressive dementia such as semanticdementia (Graham et al 1994 Patterson amp Hodges 1992 Schwartz et al1979) or dementia of the Alzheimerrsquos type (Patterson Graham amp Hodges1994b)

The distributed connectionist framework depicted in Fig 1 provides anatural formulation of how contributions from both the semantic andphonological pathways might be integrated in oral reading At a moreabstract level given that phonological units simply sum their inputs from thetwo pathways the inuence of the semantic pathway can be incorporatedinto equation (1) by adding an additional term S[t] to the summed input ofunit j [In fact if it is assumed that this term scales with imageability this canaccount for the three-way interaction of frequency consistency andimageability in skilled reading found by Strain et al (1995)] Whenformulated explicitly in connectionist terms however this integration hasimportant implications for the nature of learning in the two pathwaysDuring training to the extent that the contribution of the semantic pathway

774 PLAUT

reduces the overall error the phonological pathway will experience lesspressure to learn On its own it may master only those items it nds easiest tolearn words high in frequency andor consistency (ie those items with largepositive terms in equation 1) low-frequency exception words may never belearned completely As a result brain damage that reduced or eliminated thesemantic pathway would reveal the latent limitation of the phonologicalpathway giving rise to surface dyslexia

In further simulations Plaut and colleagues explored the possibility thatthe surface dyslexic reading pattern might reect the natural limitations ofan intact but isolated phonological pathway that had learned to rely partiallyon semantic support Given that a full implementation of the semanticpathway was beyond the scope of their work they approximated thecontribution that such a pathway would make to the oral reading of eachword during training by providing the output (phoneme) units of theimplemented phonological pathway with external input that pushed themtowards their correct values for each word during training Semanticdamage then was modelled by weakening or removing this external inputPlaut and colleagues found that indeed a phonological pathway trained inthe context of support from semantics exhibited the central phenomena ofsurface dyslexia when semantics was impaired Figure 3 shows how differentdegrees of semantic impairment provide a reasonably good quantitative tto the performance levels of individual surface dyslexic patients

In summary Plaut et al (1996) provided connectionist simulations andmathematical analyses supporting a view of lexical processing in which thedistinctions between words and nonwords and between regular andexception words are not reected in the structure of the system but rather infunctional aspects of its behaviour as it brings all its knowledge to bear inprocessing an input An important insight that emerges from the approach isthat semantic and phonological processing are intimately related over thecourse of reading acquisition in normal skilled performance and in theeffects of brain damage The following two simulation experiments aim toelaborate and clarify the role of semantics in oral reading and other lexicaltasks Although it might be more natural to consider normal performance(in lexical decision) before impaired performance (in surface dyslexia) webegin with the latter issue as the relevant simulations are more closelyrelated to existing work

SIMULATION 1 INDIVIDUAL DIFFERENCES INDIVISION OF LABOUR

Plaut and co-workersrsquo (1996) account of surface dyslexia involves a causalrelationship between an impairment in semantic input to phonology and theoccurrence of (regularisation) errors in reading low-frequency exception

775

FIG 3 (a) The effect of gradual elimination of semantics on the correct performance of Plautand co-workersrsquo (1996) network after 2000 epochs of training with semantics for Taraban andMcClellandrsquos (1987) high-frequency (HF) and low-frequency (LF) regular consistent words(Reg) and exception words (Exc) and for Glushkorsquos (1979) nonwords and approximatepercentage of errors on the exception words that are regularisations (b) Performance of twosurface dyslexic patients (MP Behrmann amp Bub 1992 Bub et al 1985 KT McCarthy ampWarrington 1986) and the network with different amounts of semantic damage From Plaut etal (1996)

776 PLAUT

words Note that such an impairment might arise from damage to semanticsitself or from damage to the mapping from semantics to phonology in whichsemantics is spared Thus it is perfectly consistent with this account thatsome surface dyslexic patients have intact comprehension (see GrahamPatterson amp Hodges 1995 Watt Jokel amp Behrmann 1997) Nonethelessthe semantics-to-phonology impairment should manifest in non-readingtasks such as picture naming Consistent with this expectation all knownsurface dyslexic patients are anomic In fact when the anomia is restricted toabstract concepts as in patient DRB (Franklin Howard amp Patterson1995) reading errors correspondingly occur only on exception words withabstract meanings

Although Plaut and co-workersrsquo account does not require that surfacedyslexic patients have impaired comprehension it does seem to imply thatpatients who do have a semantic impairment should exhibit some degree ofsurface dyslexia Thus the account is challenged by the nding that there arepatients with semantic impairments who nonetheless read low-frequencyexception words with normal or near-normal accuracy The two clearestcases of this are DRN (Cipolotti amp Warrington 1995) and DC (LambonRalph et al 1995) (see also Coslett 1991 Cummings Houlihan amp Hill1986 Friedman Ferguson Robinson amp Sunderland 1992 Raymer ampBerndt 1994 Schwartz et al 1979)

Cipolotti and Warrington (1995) presented data on patient DRN a67-year-old biological scientist with focal atrophy of the temporal lobes(more pronounced on the left) most likely due to Pickrsquos diseaseBehavioural ly he exhibited what is known as semantic dementia orprogressive uent aphasia (Hodges Patterson Oxbury amp Funnell 1992Snowden Goulding amp Neary 1989) Semantic dementia is a relatively pureimpairment in semantic memory as reected by severe anomia and wordcomprehension difculties with relatively few other cognitive impairmentsCipolotti and Warrington compared DRNrsquos ability to give denitions towords and to read them aloud for sets of words that varied in frequency andspellingndashsound consistency We focus here on his performance on low-frequency exception words as these items provide the most sensitivemeasure of surface dyslexia In keeping with his semantic dementia DRNshowed a substantial impairment in generating denitions for low-frequencyexception words producing 39 correct (1128) and 14 correct (321) fortwo sets of words using a very lenient scoring criterion By contrast his oralreading of low-frequency exception words was essentially intact heproduced 95 correct (4648 and 2021) on each of two lists

Lambon Ralph et al (1995) described the performance of patient DCwho had attended school only until the age of 14 At the time of testing DCwas 85 years old and exhibited the clinical symptoms of dementia of theAlzheimerrsquos type (see Schwartz 1990) including very poor episodic memory

MODELS OF WORD READING AND LEXICAL DECISION 777

with uent well-structured speech and occasional word-nding difcultiesLike DRN DC was impaired at dening low-frequency exception wordsbut was within the normal range at reading them aloud In generatingdenitions for the low-frequency exception words on Patterson and Hodgesrsquo(1992) ldquosurfacerdquo list DC was 31 correct (1342) by a lax criterion and57 correct (2442) by a very lax criterion By contrast she was 95 correct(4042) in reading the words aloud Similar dissociations between generatingdenitions and pronunciations were observed for the object labels from thePALPA (Kay Lesser amp Coltheart 1992) and for word lists from Shallice etal (1983) and from Strain et al (1995)

One possible explanation for the lack of surface dyslexia in patients likeDRN and DC is that the task of generating denitions is far moresensitive to semantic or semantics-to-phonology damage than is the task ofreading low-frequency exception words and that surface dyslexia emergesonly with very severe semantic damage In fact inspection of the results fromprogressive deterioration of semantics in Plaut and co-workersrsquo (1996)simulation (Fig 3a) indicates that low-frequency exception word reading isaffected only when semantic strength is reduced to a level of 15 (from amaximum of 50) Consistent with this idea patient WLP (Schwartz et al1979) became impaired at reading exception words only when her semanticdeterioration became very severe (Schwartz Saffran amp Marin 1980)Moreover many uent surface dyslexia patients show evidence ofcomprehension impairments andor anomia that seem more severe thanthose of DRN and DC As examples MP (Bub et al 1985) was at chanceat wordndashword and picturendashword semantic categorisation only very slightlyabove chance at semantic category sorting and exhibited no semanticpriming in word naming and KT (McCarthy amp Warrington 1986) scored 0on the WAIS Vocabulary test could not perform the Peabody PictureVocabulary test (Dunn 1965) could name only 11 (14130) of a subset ofthe Snodgrass and Vanderwart (1980) pictures and was only slightly abovechance at picturendashword matching By comparison on object naming DRNwas 25 correct (520) and DC was 58 correct (128220) However somepatients are no more anomic than DRN and DC but show severe surfacedyslexia For example GC (Patterson Graham amp Hodges 1994a) wascorrect in naming 45 (118260) of the Snodgrass and Vanderwart picturesbut could pronounce correctly only 38 (1642) of low-frequency exceptionwords (cf 95 for both DRN and DC) Thus the severity of semanticimpairment alone does seem to provide a full account of the differences inreading performance among all of the relevant patients

An additional hypothesis is that individual differences in the severityof surface dyslexia can arise not only from differences in the degree ofsemantic damage but also from premorbid differences in the division oflabour and overall competence of the reading system Recall that the

778 PLAUT

simulations of Plaut et al (1996) demonstrate that the phonological pathwayon its own can if necessary master the entire training corpus includinglow-frequency exception words but that support from the semantic pathwayrelieves it from having to do so Thus the ultimate competence of thephonological pathway is a function both of its own learning propertiesmdashitssensitivity to frequency and consistency as expressed by equation (1)mdashandof the strength of the semantic contribution to phonology Both of thesefactors are matters of degree that would be expected to vary acrossindividuals More generally Plaut and co-workers pointed out that thecompetence and division of labour of the reading system would be expectedto be inuenced by a wide variety of factors including the nature of readinginstruction the sophistication of preliterate phonological representationsthe overall amount of reading practice relative experience in reading aloudversus silently the computational resources devoted to each pathway andthe readerrsquos more general skill levels in visual pattern recognition and inspoken word comprehension and production Broadly speaking the overallcompetence of the phonological and semantic pathways together would beexpected to improve gradually with reading experience but that factors thataided the development of one pathway would be expected to limit thecompetence of the other

In Plaut and co-workersrsquo (1996) simulation of surface dyslexia this tradingrelation between the semantic and phonological pathways dependsprimarily on two specic parameters The rst is the asymptotic strengthover training of the external correct input provided to phoneme unitstermed semantic strength which corresponds directly to the competence ofthe putative semantic pathway The second parameter is the magnitude ofthe pressure to keep weights in the phonological pathway small termedweight decay Weight decay is standardly used in connectionist modelling toimprove generalisation (ie nonword reading) by preventing the networkfrom developing arbitrarily large weights in learning the training data Withweight decay the magnitude of each weight becomes proportional to itsimportance in reducing training error (Hinton 1989) Weights can still growlarge however if aspects of the task demand it In fact this property iscritical in the current context if the network is to master exception wordsBecause the pronunciations of such words violate standard spellingndashsoundcorrespondences they typically require the involvement of larger weights tooverride the standard mapping Thus the higher the amount of weightdecay the more difcult it is for the network to learn to pronounce exceptionwords particularly those of low frequency In this way the magnitude ofweight decay controls the base level of competence that can be achieved bythe phonological pathway in isolation There are a number of possibleneurophysiological analogues of weight decay that would be expected tovary across individuals For instance if a connection weight corresponds to

MODELS OF WORD READING AND LEXICAL DECISION 779

the number and strength of synapses between neurons then the magnitudeof weight decay can be thought of as the level of metabolic activity that canbe supported for maintaining synapses Low values of weight decaycorrespond to high levels of metabolic activity and thus a high level ofcompetence in the network

Under this view of individual differences as parametric variations of Plautand co-workersrsquo (1996) simulation patients such as DRN and DC maycorrespond to parameter combinations that lead to a high degree ofcompetence in the phonological pathway The goal of the current simulationwas to determine the effects of parametric variations of semantic strengthand weight decay on the severity of surface dyslexia exhibited by networkimplementations of the phonological pathway following semanticimpairment

Method

The method used in the simulation follows that used by Plaut et al (1996simulation 4) and readers are referred to that paper for additional details Afeedforward network was trained to map from orthographic representationsof 2998 monosyllabic words to their phonological representations Theorthographic representations consisted of 108 grapheme units that codesingle- and multi-letter graphemes for the onset vowel and coda of thewritten form of each word The phonological representations werestructured in an analogous fashion over 61 phoneme units Each of thegrapheme units was connected to each of 100 hidden units which in turnwas connected to each phoneme unit Each of the hidden and phoneme unitsalso had a bias weight that determined its ldquorestingrdquo state in the absence ofother input This bias is equivalent to the weight on a connection from a unitwhose state is always equal to 10 and was learned in the same way as otherweights

During training a word was presented to the network by clamping thestates of units representing graphemes contained in the word to 1 and allothers to 0 The hidden and phoneme units then computed their states basedon this input The state sj of each unit j varied between 0 and 1 as a smoothreal-valued function of the summed weighted input from other units

(2)

where wij is the weight on the connection from unit i to unit j bj is the biasweight of j and exp [ is the exponential function This is the standardsigmoid function s [ referred to in the frequencyndashconsistency equation(equation 1)

780 PLAUT

FIG 4 The magnitude of the additional external input supplied to phoneme units by theputative semantic pathway as a function of training epoch for different values of semanticstrength (g in equation 3) for a frequency f 5 533 corresponding to the average Kucera andFrancis (1967) frequency of the Patterson and Hodges (1992) low-frequency words

The summed input to phoneme units was augmented with an additionalterm S corresponding to the frequency-weighted contribution of thesemantic pathway to phonology

(3)

where f is the Kucera and Francis (1967) frequency of the word and tindexes the training epoch (ie number of presentations of the training cor-pus) The parameter g termed semantic strength determined the asymptoticlevel of input from semantics and was varied from 10 to 50 in the currentsimulation Figure 4 shows the magnitude of this input over time for eachvalue of g using f 5 533 the average Kucera and Francis (1967) frequency ofthe Patterson and Hodges (1992) low-frequency words The sign of thisexternal input was set to drive phoneme units towards their correct states

The error E of the network was measured by the sum of two terms (1)the cross-entropy (see Hinton 1989) between the generated phoneme statessj and the correct or target phoneme states tj for the word and (2) aproportion of the sum of the squares of the weights

(4)

where l is the weight decay parameter and was varied between 000005 and

MODELS OF WORD READING AND LEXICAL DECISION 781

1Note that this manipulation is equivalent in the limit of small weight changes to the morenatural procedure of sampling words for training as a function of their frequency

2This level of correct performance was calculated over only the 14 pairings of semanticstrength and weight decay giving rise to non-perfect performance on the low-frequencyexception words (see Fig 5)

0001 in the current simulation Back-propagation (Rumelhart Hinton ampWilliams 1986b) was used to calculate the derivative of this error withrespect to each weight in the network This derivative was scaled by a factorreecting the frequency of occurrence of the word proportional to thesquare-root of its Kucera and Francis (1967) frequency1 Each of the 2998words was presented in turn and their scaled derivatives were accumulatedAt the end of each epoch the weights were changed in proportion to the sumof the current accumulated derivatives and the past weight step (scaled by amomentum parameter of 09) The weight change was scaled both by a globallearning rate of 0001 and by a connection-specic learning rate that wasinitialised to 10 and adapted using the delta-bar-delta procedure (Jacobs1988) with an additive increment of 01 and a multiplicative decrement of09

For each combination of semantic strength and weight decay the networkwas initialised with the same set of random weights (sampled uniformlybetween 6 01) and trained for 2000 epochs At this point it was subjected toa complete semantic ldquolesionrdquo by removing the external input to thephoneme units The network was then tested for its pronunciations of thehigh- and low-frequency regular and exception words from Patterson andHodgesrsquo (1992) ldquosurfacerdquo list and 86 nonwords from Glushko (1979 seePlaut et al 1996 for details on scoring)

Results and Discussion

Figure 5 displays the number of errors to the Patterson and Hodges wordsmade by the network after the elimination of semantics as a function of thesemantic strength (g in equation 3) and weight decay ( l in equation 4) usedduring training These error scores were entered into an ANOVA over itemswith frequency and consistency as between-item factors and semanticstrength and weight decay as within-item factors

As is apparent from Fig 5 the network exhibited a strong frequency 3consistency interaction in accuracy [F(1164) 5 176 P 0001] Moreover866 (174201) of the errors to exception words were regularisations and985 (11861204)2 of presentations of the Glushko nonwords werepronounced correctly Thus overall the network exhibited the hallmarkcharacteristics of surface dyslexia

However as is also apparent from Fig 5 the effects of frequency andconsistency depended strongly on the levels of semantic strength and weight

782

783

FIG 5 Numbers of errors produced by the network to Patterson and Hodgesrsquo (1992) (a)high-frequency regular words (b) low-frequency regular words (c) high-frequency exceptionwords and (d) low-frequency exception words when semantics is removed completely after2000 epochs of training as a function of the semantic strength (g in equation 3) and weightdecay ( l in equation 4) used during training

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 6: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

770 PLAUT

feedforward) implementation of the framework in Fig 1 demonstrates thata network that maps orthography to semantics directly and via phonologycan reliably distinguish words from nonwords based on a measure offamiliarity dened over semantics

As background for the current work the next section presents a briefoverview of the approach taken by Plaut et al (1996) in accounting for theeffects of frequency and consistency in skilled word and nonword readingand in uent surface dyslexia The two simulation studies are then presentedfollowed by a general discussion

AN ACCOUNT OF FREQUENCY ANDCONSISTENCY EFFECTS IN NORMAL AND

IMPAIRED WORD READING

Seidenberg and McClelland (1989) based their orthographic andphonological representations on context-sensitive triples of letters andphonemic features (Wickelgren 1969) A central insight offered by Plaut etal (1996) was that such representations suffer from what was termed thedispersion problem and that it was this problem rather than any moregeneral failing of connectionist networks that lead to the poor nonwordreading performance of Seidenberg and McClellandrsquos model Specicallythe triples-based representations disperse the regularities between spellingand sound thereby hindering generalisation For example in theorthographic representation of the word GAVE the contribution made by theletter A depends entirely on the presence of the G and V there is no similarityto the contribution made by the A in SAVE or GATE and hence no basis for thetraining on the pronunciation of A in GAVE to support the samepronunciation in SAVE or G ATEmdashor in a nonword like M AVE

Plaut and colleagues hypothesised that the human reading systemgeneralises effectively because it uses representations that condensespellingndashsound regularities To support this hypothesis they designedorthographic and phonological representations in which within eachconsonant and vowel cluster letters and phonemes almost always activatedthe same units irrespective of context The representations were notintended to be fully general but rather to minimise the degree to whichspellingndashsound correspondences were dispersed across different units fordifferent words Plaut and colleagues found that when networks weretrained on essentially the same corpus as Seidenberg and McClellandrsquosmodel but using the new representations they were able to pronounce allthe words correctly and yet still generalise effectively to nonwordsMoreover the networks exhibited the empirical frequency 3 consistencyinteraction pattern when trained on actual (as opposed to logarithmically

MODELS OF WORD READING AND LEXICAL DECISION 771

FIG 2 (a) The frequency 3 consistency interaction exhibited in the settling time of anattractor network implementation of the phonological pathway in pronouncing words ofvarying frequency and spellingndashsound consistency (Plaut et al 1996 simulation 3) (b) Itsexplanation in terms of additive contributions of frequency and consistency subject to anasymptotic activation functionmdash s [ in equation (1) only the top half of which is shown FromPlaut et al (1996)

compressed) word frequencies even when naming latencies were modelleddirectly by the settling time of a recurrent attractor network (see Fig 2a)

Importantly Plaut et al (1996) went beyond providing only empiricaldemonstrations that networks could reproduce accuracy and latency data onword and nonword reading to offer a mathematical analysis of the critical

772 PLAUT

factors that govern why the networks (and by hypothesis subjects) behaveas they do This analysis was based on a network that while simpler than theactual simulationsmdashit had no hidden units and employed Hebbian ratherthan error-correcting learningmdashretained many of the essentialcharacteristics of the more general framework For this simplied networkPlaut and colleagues derived an expression for how the response of thenetwork to any input (test) pattern depended on its experience with everypattern on which the network was trained as a function of the patternrsquosfrequency of training its similarity with the test pattern and the consistencyof its output with that of the test pattern Specically the response s[t] of anyj

output unit j to a given test pattern t is given by

(1)

in which the standard smooth non-linear sigmoidal inputndashoutput functionfor each unit s [ is applied to the sum of three terms (1) the cumulativefrequency of training on the pattern t itself F[t] (2) the sum of the frequenciesF[f] of the ldquofriendsrdquo of pattern t (patterns trained to produce the sameresponse for unit j) each weighted by its similarity (overlap) with t O [ft] and(3) minus the sum of the frequencies F[e] of the ldquoenemiesrdquo of pattern t(patterns trained to produce the opposite response) each weighted by itssimilarity to t O [et]

Many of the basic phenomena in word reading can be seen as naturalconsequences of adherence to this ldquofrequencyndashconsistencyrdquo equationFactors that increase the summed input to units (eg word frequencyspellingndashsound consistency) improve performance as measured by namingaccuracy andor latency but their contributions are subject to ldquodiminishingreturnsrdquo due to the asymptotic nature of the activation function s [ (seeFig 2b) As a result performance on stimuli that are strong in one factor isrelatively insensitive to variation in other factors Thus regular words showlittle effect of frequency and high-frequency words show little effect ofconsistency giving rise to the standard pattern of interaction betweenfrequency and consistency in which the naming of low-frequency exceptionwords is disproportionately slow or inaccurate

It is important to note however that equation (1) is only approximate fornetworks with hidden units and trained by error-correcting algorithms likeback-propagation (Rumelhart Hinton amp Williams 1986a) These twoaspects of the Plaut et al (1996) simulations are critical in that they help toovercome interference from enemies (ie the negative terms in equation 1)thereby enabling the networks to achieve correct performance on exceptionwordsmdashthat is words with many enemies and few if any friendsmdashas well ason regular words and nonwords

MODELS OF WORD READING AND LEXICAL DECISION 773

Although Plaut and colleagues demonstrated that implementations of thephonological pathway on its own can learn to pronounce words andnonwords as well as skilled readers a central aspect of their general theory isthat skilled reading more typically requires the combined support of boththe semantic and phonological pathways (for similar proposals see Hillis ampCaramazza 1991 Van Orden amp Goldinger 1994) and that individuals maydiffer in the relative competence of each pathway (Seidenberg 1992)Certainly semantic involvement is necessary to pronounce homographs likeWIN D and READ correctly Furthermore a semantic variablemdashimageabilitymdashinuences the strength of the frequency 3 consistency interaction in thenaming latencies and errors of skilled readers (Strain Patterson ampSeidenberg 1995) Moreover brain damage that impairs lexical semanticsmdashtypically to the left temporal lobemdashcan lead to surface dyslexia (seePatterson et al 1985) In its purest ldquouentrdquo form (eg MP Behrmann ampBub 1992 Bub Cancelliere amp Kertesz 1985 KT McCarthy ampWarrington 1986 HTR Shallice Warrington amp McCarthy 1983) surfacedyslexic patients read nonwords and regular words with normal accuracyand latency but exhibit an interaction of frequency and consistency in wordreading accuracy that mirrors the interaction shown by normal subjects inlatency That is surface dyslexic patients are disproportionately poor atpronouncing low-frequency exception words often giving a pronunciationconsistent with more standard spellingndashsound correspondences (eg SEW

read as ldquosuerdquo termed a regularisation error) In fact there can be a closecorrelation for individual patients between the lack of comprehension ofexception words and their likelihood of being regularised (Funnell 1996Graham et al 1994 Hillis amp Caramazza 1991) and the surface dyslexicpattern may emerge gradually as lexical semantic knowledge deteriorates inpatients with some types of progressive dementia such as semanticdementia (Graham et al 1994 Patterson amp Hodges 1992 Schwartz et al1979) or dementia of the Alzheimerrsquos type (Patterson Graham amp Hodges1994b)

The distributed connectionist framework depicted in Fig 1 provides anatural formulation of how contributions from both the semantic andphonological pathways might be integrated in oral reading At a moreabstract level given that phonological units simply sum their inputs from thetwo pathways the inuence of the semantic pathway can be incorporatedinto equation (1) by adding an additional term S[t] to the summed input ofunit j [In fact if it is assumed that this term scales with imageability this canaccount for the three-way interaction of frequency consistency andimageability in skilled reading found by Strain et al (1995)] Whenformulated explicitly in connectionist terms however this integration hasimportant implications for the nature of learning in the two pathwaysDuring training to the extent that the contribution of the semantic pathway

774 PLAUT

reduces the overall error the phonological pathway will experience lesspressure to learn On its own it may master only those items it nds easiest tolearn words high in frequency andor consistency (ie those items with largepositive terms in equation 1) low-frequency exception words may never belearned completely As a result brain damage that reduced or eliminated thesemantic pathway would reveal the latent limitation of the phonologicalpathway giving rise to surface dyslexia

In further simulations Plaut and colleagues explored the possibility thatthe surface dyslexic reading pattern might reect the natural limitations ofan intact but isolated phonological pathway that had learned to rely partiallyon semantic support Given that a full implementation of the semanticpathway was beyond the scope of their work they approximated thecontribution that such a pathway would make to the oral reading of eachword during training by providing the output (phoneme) units of theimplemented phonological pathway with external input that pushed themtowards their correct values for each word during training Semanticdamage then was modelled by weakening or removing this external inputPlaut and colleagues found that indeed a phonological pathway trained inthe context of support from semantics exhibited the central phenomena ofsurface dyslexia when semantics was impaired Figure 3 shows how differentdegrees of semantic impairment provide a reasonably good quantitative tto the performance levels of individual surface dyslexic patients

In summary Plaut et al (1996) provided connectionist simulations andmathematical analyses supporting a view of lexical processing in which thedistinctions between words and nonwords and between regular andexception words are not reected in the structure of the system but rather infunctional aspects of its behaviour as it brings all its knowledge to bear inprocessing an input An important insight that emerges from the approach isthat semantic and phonological processing are intimately related over thecourse of reading acquisition in normal skilled performance and in theeffects of brain damage The following two simulation experiments aim toelaborate and clarify the role of semantics in oral reading and other lexicaltasks Although it might be more natural to consider normal performance(in lexical decision) before impaired performance (in surface dyslexia) webegin with the latter issue as the relevant simulations are more closelyrelated to existing work

SIMULATION 1 INDIVIDUAL DIFFERENCES INDIVISION OF LABOUR

Plaut and co-workersrsquo (1996) account of surface dyslexia involves a causalrelationship between an impairment in semantic input to phonology and theoccurrence of (regularisation) errors in reading low-frequency exception

775

FIG 3 (a) The effect of gradual elimination of semantics on the correct performance of Plautand co-workersrsquo (1996) network after 2000 epochs of training with semantics for Taraban andMcClellandrsquos (1987) high-frequency (HF) and low-frequency (LF) regular consistent words(Reg) and exception words (Exc) and for Glushkorsquos (1979) nonwords and approximatepercentage of errors on the exception words that are regularisations (b) Performance of twosurface dyslexic patients (MP Behrmann amp Bub 1992 Bub et al 1985 KT McCarthy ampWarrington 1986) and the network with different amounts of semantic damage From Plaut etal (1996)

776 PLAUT

words Note that such an impairment might arise from damage to semanticsitself or from damage to the mapping from semantics to phonology in whichsemantics is spared Thus it is perfectly consistent with this account thatsome surface dyslexic patients have intact comprehension (see GrahamPatterson amp Hodges 1995 Watt Jokel amp Behrmann 1997) Nonethelessthe semantics-to-phonology impairment should manifest in non-readingtasks such as picture naming Consistent with this expectation all knownsurface dyslexic patients are anomic In fact when the anomia is restricted toabstract concepts as in patient DRB (Franklin Howard amp Patterson1995) reading errors correspondingly occur only on exception words withabstract meanings

Although Plaut and co-workersrsquo account does not require that surfacedyslexic patients have impaired comprehension it does seem to imply thatpatients who do have a semantic impairment should exhibit some degree ofsurface dyslexia Thus the account is challenged by the nding that there arepatients with semantic impairments who nonetheless read low-frequencyexception words with normal or near-normal accuracy The two clearestcases of this are DRN (Cipolotti amp Warrington 1995) and DC (LambonRalph et al 1995) (see also Coslett 1991 Cummings Houlihan amp Hill1986 Friedman Ferguson Robinson amp Sunderland 1992 Raymer ampBerndt 1994 Schwartz et al 1979)

Cipolotti and Warrington (1995) presented data on patient DRN a67-year-old biological scientist with focal atrophy of the temporal lobes(more pronounced on the left) most likely due to Pickrsquos diseaseBehavioural ly he exhibited what is known as semantic dementia orprogressive uent aphasia (Hodges Patterson Oxbury amp Funnell 1992Snowden Goulding amp Neary 1989) Semantic dementia is a relatively pureimpairment in semantic memory as reected by severe anomia and wordcomprehension difculties with relatively few other cognitive impairmentsCipolotti and Warrington compared DRNrsquos ability to give denitions towords and to read them aloud for sets of words that varied in frequency andspellingndashsound consistency We focus here on his performance on low-frequency exception words as these items provide the most sensitivemeasure of surface dyslexia In keeping with his semantic dementia DRNshowed a substantial impairment in generating denitions for low-frequencyexception words producing 39 correct (1128) and 14 correct (321) fortwo sets of words using a very lenient scoring criterion By contrast his oralreading of low-frequency exception words was essentially intact heproduced 95 correct (4648 and 2021) on each of two lists

Lambon Ralph et al (1995) described the performance of patient DCwho had attended school only until the age of 14 At the time of testing DCwas 85 years old and exhibited the clinical symptoms of dementia of theAlzheimerrsquos type (see Schwartz 1990) including very poor episodic memory

MODELS OF WORD READING AND LEXICAL DECISION 777

with uent well-structured speech and occasional word-nding difcultiesLike DRN DC was impaired at dening low-frequency exception wordsbut was within the normal range at reading them aloud In generatingdenitions for the low-frequency exception words on Patterson and Hodgesrsquo(1992) ldquosurfacerdquo list DC was 31 correct (1342) by a lax criterion and57 correct (2442) by a very lax criterion By contrast she was 95 correct(4042) in reading the words aloud Similar dissociations between generatingdenitions and pronunciations were observed for the object labels from thePALPA (Kay Lesser amp Coltheart 1992) and for word lists from Shallice etal (1983) and from Strain et al (1995)

One possible explanation for the lack of surface dyslexia in patients likeDRN and DC is that the task of generating denitions is far moresensitive to semantic or semantics-to-phonology damage than is the task ofreading low-frequency exception words and that surface dyslexia emergesonly with very severe semantic damage In fact inspection of the results fromprogressive deterioration of semantics in Plaut and co-workersrsquo (1996)simulation (Fig 3a) indicates that low-frequency exception word reading isaffected only when semantic strength is reduced to a level of 15 (from amaximum of 50) Consistent with this idea patient WLP (Schwartz et al1979) became impaired at reading exception words only when her semanticdeterioration became very severe (Schwartz Saffran amp Marin 1980)Moreover many uent surface dyslexia patients show evidence ofcomprehension impairments andor anomia that seem more severe thanthose of DRN and DC As examples MP (Bub et al 1985) was at chanceat wordndashword and picturendashword semantic categorisation only very slightlyabove chance at semantic category sorting and exhibited no semanticpriming in word naming and KT (McCarthy amp Warrington 1986) scored 0on the WAIS Vocabulary test could not perform the Peabody PictureVocabulary test (Dunn 1965) could name only 11 (14130) of a subset ofthe Snodgrass and Vanderwart (1980) pictures and was only slightly abovechance at picturendashword matching By comparison on object naming DRNwas 25 correct (520) and DC was 58 correct (128220) However somepatients are no more anomic than DRN and DC but show severe surfacedyslexia For example GC (Patterson Graham amp Hodges 1994a) wascorrect in naming 45 (118260) of the Snodgrass and Vanderwart picturesbut could pronounce correctly only 38 (1642) of low-frequency exceptionwords (cf 95 for both DRN and DC) Thus the severity of semanticimpairment alone does seem to provide a full account of the differences inreading performance among all of the relevant patients

An additional hypothesis is that individual differences in the severityof surface dyslexia can arise not only from differences in the degree ofsemantic damage but also from premorbid differences in the division oflabour and overall competence of the reading system Recall that the

778 PLAUT

simulations of Plaut et al (1996) demonstrate that the phonological pathwayon its own can if necessary master the entire training corpus includinglow-frequency exception words but that support from the semantic pathwayrelieves it from having to do so Thus the ultimate competence of thephonological pathway is a function both of its own learning propertiesmdashitssensitivity to frequency and consistency as expressed by equation (1)mdashandof the strength of the semantic contribution to phonology Both of thesefactors are matters of degree that would be expected to vary acrossindividuals More generally Plaut and co-workers pointed out that thecompetence and division of labour of the reading system would be expectedto be inuenced by a wide variety of factors including the nature of readinginstruction the sophistication of preliterate phonological representationsthe overall amount of reading practice relative experience in reading aloudversus silently the computational resources devoted to each pathway andthe readerrsquos more general skill levels in visual pattern recognition and inspoken word comprehension and production Broadly speaking the overallcompetence of the phonological and semantic pathways together would beexpected to improve gradually with reading experience but that factors thataided the development of one pathway would be expected to limit thecompetence of the other

In Plaut and co-workersrsquo (1996) simulation of surface dyslexia this tradingrelation between the semantic and phonological pathways dependsprimarily on two specic parameters The rst is the asymptotic strengthover training of the external correct input provided to phoneme unitstermed semantic strength which corresponds directly to the competence ofthe putative semantic pathway The second parameter is the magnitude ofthe pressure to keep weights in the phonological pathway small termedweight decay Weight decay is standardly used in connectionist modelling toimprove generalisation (ie nonword reading) by preventing the networkfrom developing arbitrarily large weights in learning the training data Withweight decay the magnitude of each weight becomes proportional to itsimportance in reducing training error (Hinton 1989) Weights can still growlarge however if aspects of the task demand it In fact this property iscritical in the current context if the network is to master exception wordsBecause the pronunciations of such words violate standard spellingndashsoundcorrespondences they typically require the involvement of larger weights tooverride the standard mapping Thus the higher the amount of weightdecay the more difcult it is for the network to learn to pronounce exceptionwords particularly those of low frequency In this way the magnitude ofweight decay controls the base level of competence that can be achieved bythe phonological pathway in isolation There are a number of possibleneurophysiological analogues of weight decay that would be expected tovary across individuals For instance if a connection weight corresponds to

MODELS OF WORD READING AND LEXICAL DECISION 779

the number and strength of synapses between neurons then the magnitudeof weight decay can be thought of as the level of metabolic activity that canbe supported for maintaining synapses Low values of weight decaycorrespond to high levels of metabolic activity and thus a high level ofcompetence in the network

Under this view of individual differences as parametric variations of Plautand co-workersrsquo (1996) simulation patients such as DRN and DC maycorrespond to parameter combinations that lead to a high degree ofcompetence in the phonological pathway The goal of the current simulationwas to determine the effects of parametric variations of semantic strengthand weight decay on the severity of surface dyslexia exhibited by networkimplementations of the phonological pathway following semanticimpairment

Method

The method used in the simulation follows that used by Plaut et al (1996simulation 4) and readers are referred to that paper for additional details Afeedforward network was trained to map from orthographic representationsof 2998 monosyllabic words to their phonological representations Theorthographic representations consisted of 108 grapheme units that codesingle- and multi-letter graphemes for the onset vowel and coda of thewritten form of each word The phonological representations werestructured in an analogous fashion over 61 phoneme units Each of thegrapheme units was connected to each of 100 hidden units which in turnwas connected to each phoneme unit Each of the hidden and phoneme unitsalso had a bias weight that determined its ldquorestingrdquo state in the absence ofother input This bias is equivalent to the weight on a connection from a unitwhose state is always equal to 10 and was learned in the same way as otherweights

During training a word was presented to the network by clamping thestates of units representing graphemes contained in the word to 1 and allothers to 0 The hidden and phoneme units then computed their states basedon this input The state sj of each unit j varied between 0 and 1 as a smoothreal-valued function of the summed weighted input from other units

(2)

where wij is the weight on the connection from unit i to unit j bj is the biasweight of j and exp [ is the exponential function This is the standardsigmoid function s [ referred to in the frequencyndashconsistency equation(equation 1)

780 PLAUT

FIG 4 The magnitude of the additional external input supplied to phoneme units by theputative semantic pathway as a function of training epoch for different values of semanticstrength (g in equation 3) for a frequency f 5 533 corresponding to the average Kucera andFrancis (1967) frequency of the Patterson and Hodges (1992) low-frequency words

The summed input to phoneme units was augmented with an additionalterm S corresponding to the frequency-weighted contribution of thesemantic pathway to phonology

(3)

where f is the Kucera and Francis (1967) frequency of the word and tindexes the training epoch (ie number of presentations of the training cor-pus) The parameter g termed semantic strength determined the asymptoticlevel of input from semantics and was varied from 10 to 50 in the currentsimulation Figure 4 shows the magnitude of this input over time for eachvalue of g using f 5 533 the average Kucera and Francis (1967) frequency ofthe Patterson and Hodges (1992) low-frequency words The sign of thisexternal input was set to drive phoneme units towards their correct states

The error E of the network was measured by the sum of two terms (1)the cross-entropy (see Hinton 1989) between the generated phoneme statessj and the correct or target phoneme states tj for the word and (2) aproportion of the sum of the squares of the weights

(4)

where l is the weight decay parameter and was varied between 000005 and

MODELS OF WORD READING AND LEXICAL DECISION 781

1Note that this manipulation is equivalent in the limit of small weight changes to the morenatural procedure of sampling words for training as a function of their frequency

2This level of correct performance was calculated over only the 14 pairings of semanticstrength and weight decay giving rise to non-perfect performance on the low-frequencyexception words (see Fig 5)

0001 in the current simulation Back-propagation (Rumelhart Hinton ampWilliams 1986b) was used to calculate the derivative of this error withrespect to each weight in the network This derivative was scaled by a factorreecting the frequency of occurrence of the word proportional to thesquare-root of its Kucera and Francis (1967) frequency1 Each of the 2998words was presented in turn and their scaled derivatives were accumulatedAt the end of each epoch the weights were changed in proportion to the sumof the current accumulated derivatives and the past weight step (scaled by amomentum parameter of 09) The weight change was scaled both by a globallearning rate of 0001 and by a connection-specic learning rate that wasinitialised to 10 and adapted using the delta-bar-delta procedure (Jacobs1988) with an additive increment of 01 and a multiplicative decrement of09

For each combination of semantic strength and weight decay the networkwas initialised with the same set of random weights (sampled uniformlybetween 6 01) and trained for 2000 epochs At this point it was subjected toa complete semantic ldquolesionrdquo by removing the external input to thephoneme units The network was then tested for its pronunciations of thehigh- and low-frequency regular and exception words from Patterson andHodgesrsquo (1992) ldquosurfacerdquo list and 86 nonwords from Glushko (1979 seePlaut et al 1996 for details on scoring)

Results and Discussion

Figure 5 displays the number of errors to the Patterson and Hodges wordsmade by the network after the elimination of semantics as a function of thesemantic strength (g in equation 3) and weight decay ( l in equation 4) usedduring training These error scores were entered into an ANOVA over itemswith frequency and consistency as between-item factors and semanticstrength and weight decay as within-item factors

As is apparent from Fig 5 the network exhibited a strong frequency 3consistency interaction in accuracy [F(1164) 5 176 P 0001] Moreover866 (174201) of the errors to exception words were regularisations and985 (11861204)2 of presentations of the Glushko nonwords werepronounced correctly Thus overall the network exhibited the hallmarkcharacteristics of surface dyslexia

However as is also apparent from Fig 5 the effects of frequency andconsistency depended strongly on the levels of semantic strength and weight

782

783

FIG 5 Numbers of errors produced by the network to Patterson and Hodgesrsquo (1992) (a)high-frequency regular words (b) low-frequency regular words (c) high-frequency exceptionwords and (d) low-frequency exception words when semantics is removed completely after2000 epochs of training as a function of the semantic strength (g in equation 3) and weightdecay ( l in equation 4) used during training

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 7: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

MODELS OF WORD READING AND LEXICAL DECISION 771

FIG 2 (a) The frequency 3 consistency interaction exhibited in the settling time of anattractor network implementation of the phonological pathway in pronouncing words ofvarying frequency and spellingndashsound consistency (Plaut et al 1996 simulation 3) (b) Itsexplanation in terms of additive contributions of frequency and consistency subject to anasymptotic activation functionmdash s [ in equation (1) only the top half of which is shown FromPlaut et al (1996)

compressed) word frequencies even when naming latencies were modelleddirectly by the settling time of a recurrent attractor network (see Fig 2a)

Importantly Plaut et al (1996) went beyond providing only empiricaldemonstrations that networks could reproduce accuracy and latency data onword and nonword reading to offer a mathematical analysis of the critical

772 PLAUT

factors that govern why the networks (and by hypothesis subjects) behaveas they do This analysis was based on a network that while simpler than theactual simulationsmdashit had no hidden units and employed Hebbian ratherthan error-correcting learningmdashretained many of the essentialcharacteristics of the more general framework For this simplied networkPlaut and colleagues derived an expression for how the response of thenetwork to any input (test) pattern depended on its experience with everypattern on which the network was trained as a function of the patternrsquosfrequency of training its similarity with the test pattern and the consistencyof its output with that of the test pattern Specically the response s[t] of anyj

output unit j to a given test pattern t is given by

(1)

in which the standard smooth non-linear sigmoidal inputndashoutput functionfor each unit s [ is applied to the sum of three terms (1) the cumulativefrequency of training on the pattern t itself F[t] (2) the sum of the frequenciesF[f] of the ldquofriendsrdquo of pattern t (patterns trained to produce the sameresponse for unit j) each weighted by its similarity (overlap) with t O [ft] and(3) minus the sum of the frequencies F[e] of the ldquoenemiesrdquo of pattern t(patterns trained to produce the opposite response) each weighted by itssimilarity to t O [et]

Many of the basic phenomena in word reading can be seen as naturalconsequences of adherence to this ldquofrequencyndashconsistencyrdquo equationFactors that increase the summed input to units (eg word frequencyspellingndashsound consistency) improve performance as measured by namingaccuracy andor latency but their contributions are subject to ldquodiminishingreturnsrdquo due to the asymptotic nature of the activation function s [ (seeFig 2b) As a result performance on stimuli that are strong in one factor isrelatively insensitive to variation in other factors Thus regular words showlittle effect of frequency and high-frequency words show little effect ofconsistency giving rise to the standard pattern of interaction betweenfrequency and consistency in which the naming of low-frequency exceptionwords is disproportionately slow or inaccurate

It is important to note however that equation (1) is only approximate fornetworks with hidden units and trained by error-correcting algorithms likeback-propagation (Rumelhart Hinton amp Williams 1986a) These twoaspects of the Plaut et al (1996) simulations are critical in that they help toovercome interference from enemies (ie the negative terms in equation 1)thereby enabling the networks to achieve correct performance on exceptionwordsmdashthat is words with many enemies and few if any friendsmdashas well ason regular words and nonwords

MODELS OF WORD READING AND LEXICAL DECISION 773

Although Plaut and colleagues demonstrated that implementations of thephonological pathway on its own can learn to pronounce words andnonwords as well as skilled readers a central aspect of their general theory isthat skilled reading more typically requires the combined support of boththe semantic and phonological pathways (for similar proposals see Hillis ampCaramazza 1991 Van Orden amp Goldinger 1994) and that individuals maydiffer in the relative competence of each pathway (Seidenberg 1992)Certainly semantic involvement is necessary to pronounce homographs likeWIN D and READ correctly Furthermore a semantic variablemdashimageabilitymdashinuences the strength of the frequency 3 consistency interaction in thenaming latencies and errors of skilled readers (Strain Patterson ampSeidenberg 1995) Moreover brain damage that impairs lexical semanticsmdashtypically to the left temporal lobemdashcan lead to surface dyslexia (seePatterson et al 1985) In its purest ldquouentrdquo form (eg MP Behrmann ampBub 1992 Bub Cancelliere amp Kertesz 1985 KT McCarthy ampWarrington 1986 HTR Shallice Warrington amp McCarthy 1983) surfacedyslexic patients read nonwords and regular words with normal accuracyand latency but exhibit an interaction of frequency and consistency in wordreading accuracy that mirrors the interaction shown by normal subjects inlatency That is surface dyslexic patients are disproportionately poor atpronouncing low-frequency exception words often giving a pronunciationconsistent with more standard spellingndashsound correspondences (eg SEW

read as ldquosuerdquo termed a regularisation error) In fact there can be a closecorrelation for individual patients between the lack of comprehension ofexception words and their likelihood of being regularised (Funnell 1996Graham et al 1994 Hillis amp Caramazza 1991) and the surface dyslexicpattern may emerge gradually as lexical semantic knowledge deteriorates inpatients with some types of progressive dementia such as semanticdementia (Graham et al 1994 Patterson amp Hodges 1992 Schwartz et al1979) or dementia of the Alzheimerrsquos type (Patterson Graham amp Hodges1994b)

The distributed connectionist framework depicted in Fig 1 provides anatural formulation of how contributions from both the semantic andphonological pathways might be integrated in oral reading At a moreabstract level given that phonological units simply sum their inputs from thetwo pathways the inuence of the semantic pathway can be incorporatedinto equation (1) by adding an additional term S[t] to the summed input ofunit j [In fact if it is assumed that this term scales with imageability this canaccount for the three-way interaction of frequency consistency andimageability in skilled reading found by Strain et al (1995)] Whenformulated explicitly in connectionist terms however this integration hasimportant implications for the nature of learning in the two pathwaysDuring training to the extent that the contribution of the semantic pathway

774 PLAUT

reduces the overall error the phonological pathway will experience lesspressure to learn On its own it may master only those items it nds easiest tolearn words high in frequency andor consistency (ie those items with largepositive terms in equation 1) low-frequency exception words may never belearned completely As a result brain damage that reduced or eliminated thesemantic pathway would reveal the latent limitation of the phonologicalpathway giving rise to surface dyslexia

In further simulations Plaut and colleagues explored the possibility thatthe surface dyslexic reading pattern might reect the natural limitations ofan intact but isolated phonological pathway that had learned to rely partiallyon semantic support Given that a full implementation of the semanticpathway was beyond the scope of their work they approximated thecontribution that such a pathway would make to the oral reading of eachword during training by providing the output (phoneme) units of theimplemented phonological pathway with external input that pushed themtowards their correct values for each word during training Semanticdamage then was modelled by weakening or removing this external inputPlaut and colleagues found that indeed a phonological pathway trained inthe context of support from semantics exhibited the central phenomena ofsurface dyslexia when semantics was impaired Figure 3 shows how differentdegrees of semantic impairment provide a reasonably good quantitative tto the performance levels of individual surface dyslexic patients

In summary Plaut et al (1996) provided connectionist simulations andmathematical analyses supporting a view of lexical processing in which thedistinctions between words and nonwords and between regular andexception words are not reected in the structure of the system but rather infunctional aspects of its behaviour as it brings all its knowledge to bear inprocessing an input An important insight that emerges from the approach isthat semantic and phonological processing are intimately related over thecourse of reading acquisition in normal skilled performance and in theeffects of brain damage The following two simulation experiments aim toelaborate and clarify the role of semantics in oral reading and other lexicaltasks Although it might be more natural to consider normal performance(in lexical decision) before impaired performance (in surface dyslexia) webegin with the latter issue as the relevant simulations are more closelyrelated to existing work

SIMULATION 1 INDIVIDUAL DIFFERENCES INDIVISION OF LABOUR

Plaut and co-workersrsquo (1996) account of surface dyslexia involves a causalrelationship between an impairment in semantic input to phonology and theoccurrence of (regularisation) errors in reading low-frequency exception

775

FIG 3 (a) The effect of gradual elimination of semantics on the correct performance of Plautand co-workersrsquo (1996) network after 2000 epochs of training with semantics for Taraban andMcClellandrsquos (1987) high-frequency (HF) and low-frequency (LF) regular consistent words(Reg) and exception words (Exc) and for Glushkorsquos (1979) nonwords and approximatepercentage of errors on the exception words that are regularisations (b) Performance of twosurface dyslexic patients (MP Behrmann amp Bub 1992 Bub et al 1985 KT McCarthy ampWarrington 1986) and the network with different amounts of semantic damage From Plaut etal (1996)

776 PLAUT

words Note that such an impairment might arise from damage to semanticsitself or from damage to the mapping from semantics to phonology in whichsemantics is spared Thus it is perfectly consistent with this account thatsome surface dyslexic patients have intact comprehension (see GrahamPatterson amp Hodges 1995 Watt Jokel amp Behrmann 1997) Nonethelessthe semantics-to-phonology impairment should manifest in non-readingtasks such as picture naming Consistent with this expectation all knownsurface dyslexic patients are anomic In fact when the anomia is restricted toabstract concepts as in patient DRB (Franklin Howard amp Patterson1995) reading errors correspondingly occur only on exception words withabstract meanings

Although Plaut and co-workersrsquo account does not require that surfacedyslexic patients have impaired comprehension it does seem to imply thatpatients who do have a semantic impairment should exhibit some degree ofsurface dyslexia Thus the account is challenged by the nding that there arepatients with semantic impairments who nonetheless read low-frequencyexception words with normal or near-normal accuracy The two clearestcases of this are DRN (Cipolotti amp Warrington 1995) and DC (LambonRalph et al 1995) (see also Coslett 1991 Cummings Houlihan amp Hill1986 Friedman Ferguson Robinson amp Sunderland 1992 Raymer ampBerndt 1994 Schwartz et al 1979)

Cipolotti and Warrington (1995) presented data on patient DRN a67-year-old biological scientist with focal atrophy of the temporal lobes(more pronounced on the left) most likely due to Pickrsquos diseaseBehavioural ly he exhibited what is known as semantic dementia orprogressive uent aphasia (Hodges Patterson Oxbury amp Funnell 1992Snowden Goulding amp Neary 1989) Semantic dementia is a relatively pureimpairment in semantic memory as reected by severe anomia and wordcomprehension difculties with relatively few other cognitive impairmentsCipolotti and Warrington compared DRNrsquos ability to give denitions towords and to read them aloud for sets of words that varied in frequency andspellingndashsound consistency We focus here on his performance on low-frequency exception words as these items provide the most sensitivemeasure of surface dyslexia In keeping with his semantic dementia DRNshowed a substantial impairment in generating denitions for low-frequencyexception words producing 39 correct (1128) and 14 correct (321) fortwo sets of words using a very lenient scoring criterion By contrast his oralreading of low-frequency exception words was essentially intact heproduced 95 correct (4648 and 2021) on each of two lists

Lambon Ralph et al (1995) described the performance of patient DCwho had attended school only until the age of 14 At the time of testing DCwas 85 years old and exhibited the clinical symptoms of dementia of theAlzheimerrsquos type (see Schwartz 1990) including very poor episodic memory

MODELS OF WORD READING AND LEXICAL DECISION 777

with uent well-structured speech and occasional word-nding difcultiesLike DRN DC was impaired at dening low-frequency exception wordsbut was within the normal range at reading them aloud In generatingdenitions for the low-frequency exception words on Patterson and Hodgesrsquo(1992) ldquosurfacerdquo list DC was 31 correct (1342) by a lax criterion and57 correct (2442) by a very lax criterion By contrast she was 95 correct(4042) in reading the words aloud Similar dissociations between generatingdenitions and pronunciations were observed for the object labels from thePALPA (Kay Lesser amp Coltheart 1992) and for word lists from Shallice etal (1983) and from Strain et al (1995)

One possible explanation for the lack of surface dyslexia in patients likeDRN and DC is that the task of generating denitions is far moresensitive to semantic or semantics-to-phonology damage than is the task ofreading low-frequency exception words and that surface dyslexia emergesonly with very severe semantic damage In fact inspection of the results fromprogressive deterioration of semantics in Plaut and co-workersrsquo (1996)simulation (Fig 3a) indicates that low-frequency exception word reading isaffected only when semantic strength is reduced to a level of 15 (from amaximum of 50) Consistent with this idea patient WLP (Schwartz et al1979) became impaired at reading exception words only when her semanticdeterioration became very severe (Schwartz Saffran amp Marin 1980)Moreover many uent surface dyslexia patients show evidence ofcomprehension impairments andor anomia that seem more severe thanthose of DRN and DC As examples MP (Bub et al 1985) was at chanceat wordndashword and picturendashword semantic categorisation only very slightlyabove chance at semantic category sorting and exhibited no semanticpriming in word naming and KT (McCarthy amp Warrington 1986) scored 0on the WAIS Vocabulary test could not perform the Peabody PictureVocabulary test (Dunn 1965) could name only 11 (14130) of a subset ofthe Snodgrass and Vanderwart (1980) pictures and was only slightly abovechance at picturendashword matching By comparison on object naming DRNwas 25 correct (520) and DC was 58 correct (128220) However somepatients are no more anomic than DRN and DC but show severe surfacedyslexia For example GC (Patterson Graham amp Hodges 1994a) wascorrect in naming 45 (118260) of the Snodgrass and Vanderwart picturesbut could pronounce correctly only 38 (1642) of low-frequency exceptionwords (cf 95 for both DRN and DC) Thus the severity of semanticimpairment alone does seem to provide a full account of the differences inreading performance among all of the relevant patients

An additional hypothesis is that individual differences in the severityof surface dyslexia can arise not only from differences in the degree ofsemantic damage but also from premorbid differences in the division oflabour and overall competence of the reading system Recall that the

778 PLAUT

simulations of Plaut et al (1996) demonstrate that the phonological pathwayon its own can if necessary master the entire training corpus includinglow-frequency exception words but that support from the semantic pathwayrelieves it from having to do so Thus the ultimate competence of thephonological pathway is a function both of its own learning propertiesmdashitssensitivity to frequency and consistency as expressed by equation (1)mdashandof the strength of the semantic contribution to phonology Both of thesefactors are matters of degree that would be expected to vary acrossindividuals More generally Plaut and co-workers pointed out that thecompetence and division of labour of the reading system would be expectedto be inuenced by a wide variety of factors including the nature of readinginstruction the sophistication of preliterate phonological representationsthe overall amount of reading practice relative experience in reading aloudversus silently the computational resources devoted to each pathway andthe readerrsquos more general skill levels in visual pattern recognition and inspoken word comprehension and production Broadly speaking the overallcompetence of the phonological and semantic pathways together would beexpected to improve gradually with reading experience but that factors thataided the development of one pathway would be expected to limit thecompetence of the other

In Plaut and co-workersrsquo (1996) simulation of surface dyslexia this tradingrelation between the semantic and phonological pathways dependsprimarily on two specic parameters The rst is the asymptotic strengthover training of the external correct input provided to phoneme unitstermed semantic strength which corresponds directly to the competence ofthe putative semantic pathway The second parameter is the magnitude ofthe pressure to keep weights in the phonological pathway small termedweight decay Weight decay is standardly used in connectionist modelling toimprove generalisation (ie nonword reading) by preventing the networkfrom developing arbitrarily large weights in learning the training data Withweight decay the magnitude of each weight becomes proportional to itsimportance in reducing training error (Hinton 1989) Weights can still growlarge however if aspects of the task demand it In fact this property iscritical in the current context if the network is to master exception wordsBecause the pronunciations of such words violate standard spellingndashsoundcorrespondences they typically require the involvement of larger weights tooverride the standard mapping Thus the higher the amount of weightdecay the more difcult it is for the network to learn to pronounce exceptionwords particularly those of low frequency In this way the magnitude ofweight decay controls the base level of competence that can be achieved bythe phonological pathway in isolation There are a number of possibleneurophysiological analogues of weight decay that would be expected tovary across individuals For instance if a connection weight corresponds to

MODELS OF WORD READING AND LEXICAL DECISION 779

the number and strength of synapses between neurons then the magnitudeof weight decay can be thought of as the level of metabolic activity that canbe supported for maintaining synapses Low values of weight decaycorrespond to high levels of metabolic activity and thus a high level ofcompetence in the network

Under this view of individual differences as parametric variations of Plautand co-workersrsquo (1996) simulation patients such as DRN and DC maycorrespond to parameter combinations that lead to a high degree ofcompetence in the phonological pathway The goal of the current simulationwas to determine the effects of parametric variations of semantic strengthand weight decay on the severity of surface dyslexia exhibited by networkimplementations of the phonological pathway following semanticimpairment

Method

The method used in the simulation follows that used by Plaut et al (1996simulation 4) and readers are referred to that paper for additional details Afeedforward network was trained to map from orthographic representationsof 2998 monosyllabic words to their phonological representations Theorthographic representations consisted of 108 grapheme units that codesingle- and multi-letter graphemes for the onset vowel and coda of thewritten form of each word The phonological representations werestructured in an analogous fashion over 61 phoneme units Each of thegrapheme units was connected to each of 100 hidden units which in turnwas connected to each phoneme unit Each of the hidden and phoneme unitsalso had a bias weight that determined its ldquorestingrdquo state in the absence ofother input This bias is equivalent to the weight on a connection from a unitwhose state is always equal to 10 and was learned in the same way as otherweights

During training a word was presented to the network by clamping thestates of units representing graphemes contained in the word to 1 and allothers to 0 The hidden and phoneme units then computed their states basedon this input The state sj of each unit j varied between 0 and 1 as a smoothreal-valued function of the summed weighted input from other units

(2)

where wij is the weight on the connection from unit i to unit j bj is the biasweight of j and exp [ is the exponential function This is the standardsigmoid function s [ referred to in the frequencyndashconsistency equation(equation 1)

780 PLAUT

FIG 4 The magnitude of the additional external input supplied to phoneme units by theputative semantic pathway as a function of training epoch for different values of semanticstrength (g in equation 3) for a frequency f 5 533 corresponding to the average Kucera andFrancis (1967) frequency of the Patterson and Hodges (1992) low-frequency words

The summed input to phoneme units was augmented with an additionalterm S corresponding to the frequency-weighted contribution of thesemantic pathway to phonology

(3)

where f is the Kucera and Francis (1967) frequency of the word and tindexes the training epoch (ie number of presentations of the training cor-pus) The parameter g termed semantic strength determined the asymptoticlevel of input from semantics and was varied from 10 to 50 in the currentsimulation Figure 4 shows the magnitude of this input over time for eachvalue of g using f 5 533 the average Kucera and Francis (1967) frequency ofthe Patterson and Hodges (1992) low-frequency words The sign of thisexternal input was set to drive phoneme units towards their correct states

The error E of the network was measured by the sum of two terms (1)the cross-entropy (see Hinton 1989) between the generated phoneme statessj and the correct or target phoneme states tj for the word and (2) aproportion of the sum of the squares of the weights

(4)

where l is the weight decay parameter and was varied between 000005 and

MODELS OF WORD READING AND LEXICAL DECISION 781

1Note that this manipulation is equivalent in the limit of small weight changes to the morenatural procedure of sampling words for training as a function of their frequency

2This level of correct performance was calculated over only the 14 pairings of semanticstrength and weight decay giving rise to non-perfect performance on the low-frequencyexception words (see Fig 5)

0001 in the current simulation Back-propagation (Rumelhart Hinton ampWilliams 1986b) was used to calculate the derivative of this error withrespect to each weight in the network This derivative was scaled by a factorreecting the frequency of occurrence of the word proportional to thesquare-root of its Kucera and Francis (1967) frequency1 Each of the 2998words was presented in turn and their scaled derivatives were accumulatedAt the end of each epoch the weights were changed in proportion to the sumof the current accumulated derivatives and the past weight step (scaled by amomentum parameter of 09) The weight change was scaled both by a globallearning rate of 0001 and by a connection-specic learning rate that wasinitialised to 10 and adapted using the delta-bar-delta procedure (Jacobs1988) with an additive increment of 01 and a multiplicative decrement of09

For each combination of semantic strength and weight decay the networkwas initialised with the same set of random weights (sampled uniformlybetween 6 01) and trained for 2000 epochs At this point it was subjected toa complete semantic ldquolesionrdquo by removing the external input to thephoneme units The network was then tested for its pronunciations of thehigh- and low-frequency regular and exception words from Patterson andHodgesrsquo (1992) ldquosurfacerdquo list and 86 nonwords from Glushko (1979 seePlaut et al 1996 for details on scoring)

Results and Discussion

Figure 5 displays the number of errors to the Patterson and Hodges wordsmade by the network after the elimination of semantics as a function of thesemantic strength (g in equation 3) and weight decay ( l in equation 4) usedduring training These error scores were entered into an ANOVA over itemswith frequency and consistency as between-item factors and semanticstrength and weight decay as within-item factors

As is apparent from Fig 5 the network exhibited a strong frequency 3consistency interaction in accuracy [F(1164) 5 176 P 0001] Moreover866 (174201) of the errors to exception words were regularisations and985 (11861204)2 of presentations of the Glushko nonwords werepronounced correctly Thus overall the network exhibited the hallmarkcharacteristics of surface dyslexia

However as is also apparent from Fig 5 the effects of frequency andconsistency depended strongly on the levels of semantic strength and weight

782

783

FIG 5 Numbers of errors produced by the network to Patterson and Hodgesrsquo (1992) (a)high-frequency regular words (b) low-frequency regular words (c) high-frequency exceptionwords and (d) low-frequency exception words when semantics is removed completely after2000 epochs of training as a function of the semantic strength (g in equation 3) and weightdecay ( l in equation 4) used during training

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 8: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

772 PLAUT

factors that govern why the networks (and by hypothesis subjects) behaveas they do This analysis was based on a network that while simpler than theactual simulationsmdashit had no hidden units and employed Hebbian ratherthan error-correcting learningmdashretained many of the essentialcharacteristics of the more general framework For this simplied networkPlaut and colleagues derived an expression for how the response of thenetwork to any input (test) pattern depended on its experience with everypattern on which the network was trained as a function of the patternrsquosfrequency of training its similarity with the test pattern and the consistencyof its output with that of the test pattern Specically the response s[t] of anyj

output unit j to a given test pattern t is given by

(1)

in which the standard smooth non-linear sigmoidal inputndashoutput functionfor each unit s [ is applied to the sum of three terms (1) the cumulativefrequency of training on the pattern t itself F[t] (2) the sum of the frequenciesF[f] of the ldquofriendsrdquo of pattern t (patterns trained to produce the sameresponse for unit j) each weighted by its similarity (overlap) with t O [ft] and(3) minus the sum of the frequencies F[e] of the ldquoenemiesrdquo of pattern t(patterns trained to produce the opposite response) each weighted by itssimilarity to t O [et]

Many of the basic phenomena in word reading can be seen as naturalconsequences of adherence to this ldquofrequencyndashconsistencyrdquo equationFactors that increase the summed input to units (eg word frequencyspellingndashsound consistency) improve performance as measured by namingaccuracy andor latency but their contributions are subject to ldquodiminishingreturnsrdquo due to the asymptotic nature of the activation function s [ (seeFig 2b) As a result performance on stimuli that are strong in one factor isrelatively insensitive to variation in other factors Thus regular words showlittle effect of frequency and high-frequency words show little effect ofconsistency giving rise to the standard pattern of interaction betweenfrequency and consistency in which the naming of low-frequency exceptionwords is disproportionately slow or inaccurate

It is important to note however that equation (1) is only approximate fornetworks with hidden units and trained by error-correcting algorithms likeback-propagation (Rumelhart Hinton amp Williams 1986a) These twoaspects of the Plaut et al (1996) simulations are critical in that they help toovercome interference from enemies (ie the negative terms in equation 1)thereby enabling the networks to achieve correct performance on exceptionwordsmdashthat is words with many enemies and few if any friendsmdashas well ason regular words and nonwords

MODELS OF WORD READING AND LEXICAL DECISION 773

Although Plaut and colleagues demonstrated that implementations of thephonological pathway on its own can learn to pronounce words andnonwords as well as skilled readers a central aspect of their general theory isthat skilled reading more typically requires the combined support of boththe semantic and phonological pathways (for similar proposals see Hillis ampCaramazza 1991 Van Orden amp Goldinger 1994) and that individuals maydiffer in the relative competence of each pathway (Seidenberg 1992)Certainly semantic involvement is necessary to pronounce homographs likeWIN D and READ correctly Furthermore a semantic variablemdashimageabilitymdashinuences the strength of the frequency 3 consistency interaction in thenaming latencies and errors of skilled readers (Strain Patterson ampSeidenberg 1995) Moreover brain damage that impairs lexical semanticsmdashtypically to the left temporal lobemdashcan lead to surface dyslexia (seePatterson et al 1985) In its purest ldquouentrdquo form (eg MP Behrmann ampBub 1992 Bub Cancelliere amp Kertesz 1985 KT McCarthy ampWarrington 1986 HTR Shallice Warrington amp McCarthy 1983) surfacedyslexic patients read nonwords and regular words with normal accuracyand latency but exhibit an interaction of frequency and consistency in wordreading accuracy that mirrors the interaction shown by normal subjects inlatency That is surface dyslexic patients are disproportionately poor atpronouncing low-frequency exception words often giving a pronunciationconsistent with more standard spellingndashsound correspondences (eg SEW

read as ldquosuerdquo termed a regularisation error) In fact there can be a closecorrelation for individual patients between the lack of comprehension ofexception words and their likelihood of being regularised (Funnell 1996Graham et al 1994 Hillis amp Caramazza 1991) and the surface dyslexicpattern may emerge gradually as lexical semantic knowledge deteriorates inpatients with some types of progressive dementia such as semanticdementia (Graham et al 1994 Patterson amp Hodges 1992 Schwartz et al1979) or dementia of the Alzheimerrsquos type (Patterson Graham amp Hodges1994b)

The distributed connectionist framework depicted in Fig 1 provides anatural formulation of how contributions from both the semantic andphonological pathways might be integrated in oral reading At a moreabstract level given that phonological units simply sum their inputs from thetwo pathways the inuence of the semantic pathway can be incorporatedinto equation (1) by adding an additional term S[t] to the summed input ofunit j [In fact if it is assumed that this term scales with imageability this canaccount for the three-way interaction of frequency consistency andimageability in skilled reading found by Strain et al (1995)] Whenformulated explicitly in connectionist terms however this integration hasimportant implications for the nature of learning in the two pathwaysDuring training to the extent that the contribution of the semantic pathway

774 PLAUT

reduces the overall error the phonological pathway will experience lesspressure to learn On its own it may master only those items it nds easiest tolearn words high in frequency andor consistency (ie those items with largepositive terms in equation 1) low-frequency exception words may never belearned completely As a result brain damage that reduced or eliminated thesemantic pathway would reveal the latent limitation of the phonologicalpathway giving rise to surface dyslexia

In further simulations Plaut and colleagues explored the possibility thatthe surface dyslexic reading pattern might reect the natural limitations ofan intact but isolated phonological pathway that had learned to rely partiallyon semantic support Given that a full implementation of the semanticpathway was beyond the scope of their work they approximated thecontribution that such a pathway would make to the oral reading of eachword during training by providing the output (phoneme) units of theimplemented phonological pathway with external input that pushed themtowards their correct values for each word during training Semanticdamage then was modelled by weakening or removing this external inputPlaut and colleagues found that indeed a phonological pathway trained inthe context of support from semantics exhibited the central phenomena ofsurface dyslexia when semantics was impaired Figure 3 shows how differentdegrees of semantic impairment provide a reasonably good quantitative tto the performance levels of individual surface dyslexic patients

In summary Plaut et al (1996) provided connectionist simulations andmathematical analyses supporting a view of lexical processing in which thedistinctions between words and nonwords and between regular andexception words are not reected in the structure of the system but rather infunctional aspects of its behaviour as it brings all its knowledge to bear inprocessing an input An important insight that emerges from the approach isthat semantic and phonological processing are intimately related over thecourse of reading acquisition in normal skilled performance and in theeffects of brain damage The following two simulation experiments aim toelaborate and clarify the role of semantics in oral reading and other lexicaltasks Although it might be more natural to consider normal performance(in lexical decision) before impaired performance (in surface dyslexia) webegin with the latter issue as the relevant simulations are more closelyrelated to existing work

SIMULATION 1 INDIVIDUAL DIFFERENCES INDIVISION OF LABOUR

Plaut and co-workersrsquo (1996) account of surface dyslexia involves a causalrelationship between an impairment in semantic input to phonology and theoccurrence of (regularisation) errors in reading low-frequency exception

775

FIG 3 (a) The effect of gradual elimination of semantics on the correct performance of Plautand co-workersrsquo (1996) network after 2000 epochs of training with semantics for Taraban andMcClellandrsquos (1987) high-frequency (HF) and low-frequency (LF) regular consistent words(Reg) and exception words (Exc) and for Glushkorsquos (1979) nonwords and approximatepercentage of errors on the exception words that are regularisations (b) Performance of twosurface dyslexic patients (MP Behrmann amp Bub 1992 Bub et al 1985 KT McCarthy ampWarrington 1986) and the network with different amounts of semantic damage From Plaut etal (1996)

776 PLAUT

words Note that such an impairment might arise from damage to semanticsitself or from damage to the mapping from semantics to phonology in whichsemantics is spared Thus it is perfectly consistent with this account thatsome surface dyslexic patients have intact comprehension (see GrahamPatterson amp Hodges 1995 Watt Jokel amp Behrmann 1997) Nonethelessthe semantics-to-phonology impairment should manifest in non-readingtasks such as picture naming Consistent with this expectation all knownsurface dyslexic patients are anomic In fact when the anomia is restricted toabstract concepts as in patient DRB (Franklin Howard amp Patterson1995) reading errors correspondingly occur only on exception words withabstract meanings

Although Plaut and co-workersrsquo account does not require that surfacedyslexic patients have impaired comprehension it does seem to imply thatpatients who do have a semantic impairment should exhibit some degree ofsurface dyslexia Thus the account is challenged by the nding that there arepatients with semantic impairments who nonetheless read low-frequencyexception words with normal or near-normal accuracy The two clearestcases of this are DRN (Cipolotti amp Warrington 1995) and DC (LambonRalph et al 1995) (see also Coslett 1991 Cummings Houlihan amp Hill1986 Friedman Ferguson Robinson amp Sunderland 1992 Raymer ampBerndt 1994 Schwartz et al 1979)

Cipolotti and Warrington (1995) presented data on patient DRN a67-year-old biological scientist with focal atrophy of the temporal lobes(more pronounced on the left) most likely due to Pickrsquos diseaseBehavioural ly he exhibited what is known as semantic dementia orprogressive uent aphasia (Hodges Patterson Oxbury amp Funnell 1992Snowden Goulding amp Neary 1989) Semantic dementia is a relatively pureimpairment in semantic memory as reected by severe anomia and wordcomprehension difculties with relatively few other cognitive impairmentsCipolotti and Warrington compared DRNrsquos ability to give denitions towords and to read them aloud for sets of words that varied in frequency andspellingndashsound consistency We focus here on his performance on low-frequency exception words as these items provide the most sensitivemeasure of surface dyslexia In keeping with his semantic dementia DRNshowed a substantial impairment in generating denitions for low-frequencyexception words producing 39 correct (1128) and 14 correct (321) fortwo sets of words using a very lenient scoring criterion By contrast his oralreading of low-frequency exception words was essentially intact heproduced 95 correct (4648 and 2021) on each of two lists

Lambon Ralph et al (1995) described the performance of patient DCwho had attended school only until the age of 14 At the time of testing DCwas 85 years old and exhibited the clinical symptoms of dementia of theAlzheimerrsquos type (see Schwartz 1990) including very poor episodic memory

MODELS OF WORD READING AND LEXICAL DECISION 777

with uent well-structured speech and occasional word-nding difcultiesLike DRN DC was impaired at dening low-frequency exception wordsbut was within the normal range at reading them aloud In generatingdenitions for the low-frequency exception words on Patterson and Hodgesrsquo(1992) ldquosurfacerdquo list DC was 31 correct (1342) by a lax criterion and57 correct (2442) by a very lax criterion By contrast she was 95 correct(4042) in reading the words aloud Similar dissociations between generatingdenitions and pronunciations were observed for the object labels from thePALPA (Kay Lesser amp Coltheart 1992) and for word lists from Shallice etal (1983) and from Strain et al (1995)

One possible explanation for the lack of surface dyslexia in patients likeDRN and DC is that the task of generating denitions is far moresensitive to semantic or semantics-to-phonology damage than is the task ofreading low-frequency exception words and that surface dyslexia emergesonly with very severe semantic damage In fact inspection of the results fromprogressive deterioration of semantics in Plaut and co-workersrsquo (1996)simulation (Fig 3a) indicates that low-frequency exception word reading isaffected only when semantic strength is reduced to a level of 15 (from amaximum of 50) Consistent with this idea patient WLP (Schwartz et al1979) became impaired at reading exception words only when her semanticdeterioration became very severe (Schwartz Saffran amp Marin 1980)Moreover many uent surface dyslexia patients show evidence ofcomprehension impairments andor anomia that seem more severe thanthose of DRN and DC As examples MP (Bub et al 1985) was at chanceat wordndashword and picturendashword semantic categorisation only very slightlyabove chance at semantic category sorting and exhibited no semanticpriming in word naming and KT (McCarthy amp Warrington 1986) scored 0on the WAIS Vocabulary test could not perform the Peabody PictureVocabulary test (Dunn 1965) could name only 11 (14130) of a subset ofthe Snodgrass and Vanderwart (1980) pictures and was only slightly abovechance at picturendashword matching By comparison on object naming DRNwas 25 correct (520) and DC was 58 correct (128220) However somepatients are no more anomic than DRN and DC but show severe surfacedyslexia For example GC (Patterson Graham amp Hodges 1994a) wascorrect in naming 45 (118260) of the Snodgrass and Vanderwart picturesbut could pronounce correctly only 38 (1642) of low-frequency exceptionwords (cf 95 for both DRN and DC) Thus the severity of semanticimpairment alone does seem to provide a full account of the differences inreading performance among all of the relevant patients

An additional hypothesis is that individual differences in the severityof surface dyslexia can arise not only from differences in the degree ofsemantic damage but also from premorbid differences in the division oflabour and overall competence of the reading system Recall that the

778 PLAUT

simulations of Plaut et al (1996) demonstrate that the phonological pathwayon its own can if necessary master the entire training corpus includinglow-frequency exception words but that support from the semantic pathwayrelieves it from having to do so Thus the ultimate competence of thephonological pathway is a function both of its own learning propertiesmdashitssensitivity to frequency and consistency as expressed by equation (1)mdashandof the strength of the semantic contribution to phonology Both of thesefactors are matters of degree that would be expected to vary acrossindividuals More generally Plaut and co-workers pointed out that thecompetence and division of labour of the reading system would be expectedto be inuenced by a wide variety of factors including the nature of readinginstruction the sophistication of preliterate phonological representationsthe overall amount of reading practice relative experience in reading aloudversus silently the computational resources devoted to each pathway andthe readerrsquos more general skill levels in visual pattern recognition and inspoken word comprehension and production Broadly speaking the overallcompetence of the phonological and semantic pathways together would beexpected to improve gradually with reading experience but that factors thataided the development of one pathway would be expected to limit thecompetence of the other

In Plaut and co-workersrsquo (1996) simulation of surface dyslexia this tradingrelation between the semantic and phonological pathways dependsprimarily on two specic parameters The rst is the asymptotic strengthover training of the external correct input provided to phoneme unitstermed semantic strength which corresponds directly to the competence ofthe putative semantic pathway The second parameter is the magnitude ofthe pressure to keep weights in the phonological pathway small termedweight decay Weight decay is standardly used in connectionist modelling toimprove generalisation (ie nonword reading) by preventing the networkfrom developing arbitrarily large weights in learning the training data Withweight decay the magnitude of each weight becomes proportional to itsimportance in reducing training error (Hinton 1989) Weights can still growlarge however if aspects of the task demand it In fact this property iscritical in the current context if the network is to master exception wordsBecause the pronunciations of such words violate standard spellingndashsoundcorrespondences they typically require the involvement of larger weights tooverride the standard mapping Thus the higher the amount of weightdecay the more difcult it is for the network to learn to pronounce exceptionwords particularly those of low frequency In this way the magnitude ofweight decay controls the base level of competence that can be achieved bythe phonological pathway in isolation There are a number of possibleneurophysiological analogues of weight decay that would be expected tovary across individuals For instance if a connection weight corresponds to

MODELS OF WORD READING AND LEXICAL DECISION 779

the number and strength of synapses between neurons then the magnitudeof weight decay can be thought of as the level of metabolic activity that canbe supported for maintaining synapses Low values of weight decaycorrespond to high levels of metabolic activity and thus a high level ofcompetence in the network

Under this view of individual differences as parametric variations of Plautand co-workersrsquo (1996) simulation patients such as DRN and DC maycorrespond to parameter combinations that lead to a high degree ofcompetence in the phonological pathway The goal of the current simulationwas to determine the effects of parametric variations of semantic strengthand weight decay on the severity of surface dyslexia exhibited by networkimplementations of the phonological pathway following semanticimpairment

Method

The method used in the simulation follows that used by Plaut et al (1996simulation 4) and readers are referred to that paper for additional details Afeedforward network was trained to map from orthographic representationsof 2998 monosyllabic words to their phonological representations Theorthographic representations consisted of 108 grapheme units that codesingle- and multi-letter graphemes for the onset vowel and coda of thewritten form of each word The phonological representations werestructured in an analogous fashion over 61 phoneme units Each of thegrapheme units was connected to each of 100 hidden units which in turnwas connected to each phoneme unit Each of the hidden and phoneme unitsalso had a bias weight that determined its ldquorestingrdquo state in the absence ofother input This bias is equivalent to the weight on a connection from a unitwhose state is always equal to 10 and was learned in the same way as otherweights

During training a word was presented to the network by clamping thestates of units representing graphemes contained in the word to 1 and allothers to 0 The hidden and phoneme units then computed their states basedon this input The state sj of each unit j varied between 0 and 1 as a smoothreal-valued function of the summed weighted input from other units

(2)

where wij is the weight on the connection from unit i to unit j bj is the biasweight of j and exp [ is the exponential function This is the standardsigmoid function s [ referred to in the frequencyndashconsistency equation(equation 1)

780 PLAUT

FIG 4 The magnitude of the additional external input supplied to phoneme units by theputative semantic pathway as a function of training epoch for different values of semanticstrength (g in equation 3) for a frequency f 5 533 corresponding to the average Kucera andFrancis (1967) frequency of the Patterson and Hodges (1992) low-frequency words

The summed input to phoneme units was augmented with an additionalterm S corresponding to the frequency-weighted contribution of thesemantic pathway to phonology

(3)

where f is the Kucera and Francis (1967) frequency of the word and tindexes the training epoch (ie number of presentations of the training cor-pus) The parameter g termed semantic strength determined the asymptoticlevel of input from semantics and was varied from 10 to 50 in the currentsimulation Figure 4 shows the magnitude of this input over time for eachvalue of g using f 5 533 the average Kucera and Francis (1967) frequency ofthe Patterson and Hodges (1992) low-frequency words The sign of thisexternal input was set to drive phoneme units towards their correct states

The error E of the network was measured by the sum of two terms (1)the cross-entropy (see Hinton 1989) between the generated phoneme statessj and the correct or target phoneme states tj for the word and (2) aproportion of the sum of the squares of the weights

(4)

where l is the weight decay parameter and was varied between 000005 and

MODELS OF WORD READING AND LEXICAL DECISION 781

1Note that this manipulation is equivalent in the limit of small weight changes to the morenatural procedure of sampling words for training as a function of their frequency

2This level of correct performance was calculated over only the 14 pairings of semanticstrength and weight decay giving rise to non-perfect performance on the low-frequencyexception words (see Fig 5)

0001 in the current simulation Back-propagation (Rumelhart Hinton ampWilliams 1986b) was used to calculate the derivative of this error withrespect to each weight in the network This derivative was scaled by a factorreecting the frequency of occurrence of the word proportional to thesquare-root of its Kucera and Francis (1967) frequency1 Each of the 2998words was presented in turn and their scaled derivatives were accumulatedAt the end of each epoch the weights were changed in proportion to the sumof the current accumulated derivatives and the past weight step (scaled by amomentum parameter of 09) The weight change was scaled both by a globallearning rate of 0001 and by a connection-specic learning rate that wasinitialised to 10 and adapted using the delta-bar-delta procedure (Jacobs1988) with an additive increment of 01 and a multiplicative decrement of09

For each combination of semantic strength and weight decay the networkwas initialised with the same set of random weights (sampled uniformlybetween 6 01) and trained for 2000 epochs At this point it was subjected toa complete semantic ldquolesionrdquo by removing the external input to thephoneme units The network was then tested for its pronunciations of thehigh- and low-frequency regular and exception words from Patterson andHodgesrsquo (1992) ldquosurfacerdquo list and 86 nonwords from Glushko (1979 seePlaut et al 1996 for details on scoring)

Results and Discussion

Figure 5 displays the number of errors to the Patterson and Hodges wordsmade by the network after the elimination of semantics as a function of thesemantic strength (g in equation 3) and weight decay ( l in equation 4) usedduring training These error scores were entered into an ANOVA over itemswith frequency and consistency as between-item factors and semanticstrength and weight decay as within-item factors

As is apparent from Fig 5 the network exhibited a strong frequency 3consistency interaction in accuracy [F(1164) 5 176 P 0001] Moreover866 (174201) of the errors to exception words were regularisations and985 (11861204)2 of presentations of the Glushko nonwords werepronounced correctly Thus overall the network exhibited the hallmarkcharacteristics of surface dyslexia

However as is also apparent from Fig 5 the effects of frequency andconsistency depended strongly on the levels of semantic strength and weight

782

783

FIG 5 Numbers of errors produced by the network to Patterson and Hodgesrsquo (1992) (a)high-frequency regular words (b) low-frequency regular words (c) high-frequency exceptionwords and (d) low-frequency exception words when semantics is removed completely after2000 epochs of training as a function of the semantic strength (g in equation 3) and weightdecay ( l in equation 4) used during training

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 9: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

MODELS OF WORD READING AND LEXICAL DECISION 773

Although Plaut and colleagues demonstrated that implementations of thephonological pathway on its own can learn to pronounce words andnonwords as well as skilled readers a central aspect of their general theory isthat skilled reading more typically requires the combined support of boththe semantic and phonological pathways (for similar proposals see Hillis ampCaramazza 1991 Van Orden amp Goldinger 1994) and that individuals maydiffer in the relative competence of each pathway (Seidenberg 1992)Certainly semantic involvement is necessary to pronounce homographs likeWIN D and READ correctly Furthermore a semantic variablemdashimageabilitymdashinuences the strength of the frequency 3 consistency interaction in thenaming latencies and errors of skilled readers (Strain Patterson ampSeidenberg 1995) Moreover brain damage that impairs lexical semanticsmdashtypically to the left temporal lobemdashcan lead to surface dyslexia (seePatterson et al 1985) In its purest ldquouentrdquo form (eg MP Behrmann ampBub 1992 Bub Cancelliere amp Kertesz 1985 KT McCarthy ampWarrington 1986 HTR Shallice Warrington amp McCarthy 1983) surfacedyslexic patients read nonwords and regular words with normal accuracyand latency but exhibit an interaction of frequency and consistency in wordreading accuracy that mirrors the interaction shown by normal subjects inlatency That is surface dyslexic patients are disproportionately poor atpronouncing low-frequency exception words often giving a pronunciationconsistent with more standard spellingndashsound correspondences (eg SEW

read as ldquosuerdquo termed a regularisation error) In fact there can be a closecorrelation for individual patients between the lack of comprehension ofexception words and their likelihood of being regularised (Funnell 1996Graham et al 1994 Hillis amp Caramazza 1991) and the surface dyslexicpattern may emerge gradually as lexical semantic knowledge deteriorates inpatients with some types of progressive dementia such as semanticdementia (Graham et al 1994 Patterson amp Hodges 1992 Schwartz et al1979) or dementia of the Alzheimerrsquos type (Patterson Graham amp Hodges1994b)

The distributed connectionist framework depicted in Fig 1 provides anatural formulation of how contributions from both the semantic andphonological pathways might be integrated in oral reading At a moreabstract level given that phonological units simply sum their inputs from thetwo pathways the inuence of the semantic pathway can be incorporatedinto equation (1) by adding an additional term S[t] to the summed input ofunit j [In fact if it is assumed that this term scales with imageability this canaccount for the three-way interaction of frequency consistency andimageability in skilled reading found by Strain et al (1995)] Whenformulated explicitly in connectionist terms however this integration hasimportant implications for the nature of learning in the two pathwaysDuring training to the extent that the contribution of the semantic pathway

774 PLAUT

reduces the overall error the phonological pathway will experience lesspressure to learn On its own it may master only those items it nds easiest tolearn words high in frequency andor consistency (ie those items with largepositive terms in equation 1) low-frequency exception words may never belearned completely As a result brain damage that reduced or eliminated thesemantic pathway would reveal the latent limitation of the phonologicalpathway giving rise to surface dyslexia

In further simulations Plaut and colleagues explored the possibility thatthe surface dyslexic reading pattern might reect the natural limitations ofan intact but isolated phonological pathway that had learned to rely partiallyon semantic support Given that a full implementation of the semanticpathway was beyond the scope of their work they approximated thecontribution that such a pathway would make to the oral reading of eachword during training by providing the output (phoneme) units of theimplemented phonological pathway with external input that pushed themtowards their correct values for each word during training Semanticdamage then was modelled by weakening or removing this external inputPlaut and colleagues found that indeed a phonological pathway trained inthe context of support from semantics exhibited the central phenomena ofsurface dyslexia when semantics was impaired Figure 3 shows how differentdegrees of semantic impairment provide a reasonably good quantitative tto the performance levels of individual surface dyslexic patients

In summary Plaut et al (1996) provided connectionist simulations andmathematical analyses supporting a view of lexical processing in which thedistinctions between words and nonwords and between regular andexception words are not reected in the structure of the system but rather infunctional aspects of its behaviour as it brings all its knowledge to bear inprocessing an input An important insight that emerges from the approach isthat semantic and phonological processing are intimately related over thecourse of reading acquisition in normal skilled performance and in theeffects of brain damage The following two simulation experiments aim toelaborate and clarify the role of semantics in oral reading and other lexicaltasks Although it might be more natural to consider normal performance(in lexical decision) before impaired performance (in surface dyslexia) webegin with the latter issue as the relevant simulations are more closelyrelated to existing work

SIMULATION 1 INDIVIDUAL DIFFERENCES INDIVISION OF LABOUR

Plaut and co-workersrsquo (1996) account of surface dyslexia involves a causalrelationship between an impairment in semantic input to phonology and theoccurrence of (regularisation) errors in reading low-frequency exception

775

FIG 3 (a) The effect of gradual elimination of semantics on the correct performance of Plautand co-workersrsquo (1996) network after 2000 epochs of training with semantics for Taraban andMcClellandrsquos (1987) high-frequency (HF) and low-frequency (LF) regular consistent words(Reg) and exception words (Exc) and for Glushkorsquos (1979) nonwords and approximatepercentage of errors on the exception words that are regularisations (b) Performance of twosurface dyslexic patients (MP Behrmann amp Bub 1992 Bub et al 1985 KT McCarthy ampWarrington 1986) and the network with different amounts of semantic damage From Plaut etal (1996)

776 PLAUT

words Note that such an impairment might arise from damage to semanticsitself or from damage to the mapping from semantics to phonology in whichsemantics is spared Thus it is perfectly consistent with this account thatsome surface dyslexic patients have intact comprehension (see GrahamPatterson amp Hodges 1995 Watt Jokel amp Behrmann 1997) Nonethelessthe semantics-to-phonology impairment should manifest in non-readingtasks such as picture naming Consistent with this expectation all knownsurface dyslexic patients are anomic In fact when the anomia is restricted toabstract concepts as in patient DRB (Franklin Howard amp Patterson1995) reading errors correspondingly occur only on exception words withabstract meanings

Although Plaut and co-workersrsquo account does not require that surfacedyslexic patients have impaired comprehension it does seem to imply thatpatients who do have a semantic impairment should exhibit some degree ofsurface dyslexia Thus the account is challenged by the nding that there arepatients with semantic impairments who nonetheless read low-frequencyexception words with normal or near-normal accuracy The two clearestcases of this are DRN (Cipolotti amp Warrington 1995) and DC (LambonRalph et al 1995) (see also Coslett 1991 Cummings Houlihan amp Hill1986 Friedman Ferguson Robinson amp Sunderland 1992 Raymer ampBerndt 1994 Schwartz et al 1979)

Cipolotti and Warrington (1995) presented data on patient DRN a67-year-old biological scientist with focal atrophy of the temporal lobes(more pronounced on the left) most likely due to Pickrsquos diseaseBehavioural ly he exhibited what is known as semantic dementia orprogressive uent aphasia (Hodges Patterson Oxbury amp Funnell 1992Snowden Goulding amp Neary 1989) Semantic dementia is a relatively pureimpairment in semantic memory as reected by severe anomia and wordcomprehension difculties with relatively few other cognitive impairmentsCipolotti and Warrington compared DRNrsquos ability to give denitions towords and to read them aloud for sets of words that varied in frequency andspellingndashsound consistency We focus here on his performance on low-frequency exception words as these items provide the most sensitivemeasure of surface dyslexia In keeping with his semantic dementia DRNshowed a substantial impairment in generating denitions for low-frequencyexception words producing 39 correct (1128) and 14 correct (321) fortwo sets of words using a very lenient scoring criterion By contrast his oralreading of low-frequency exception words was essentially intact heproduced 95 correct (4648 and 2021) on each of two lists

Lambon Ralph et al (1995) described the performance of patient DCwho had attended school only until the age of 14 At the time of testing DCwas 85 years old and exhibited the clinical symptoms of dementia of theAlzheimerrsquos type (see Schwartz 1990) including very poor episodic memory

MODELS OF WORD READING AND LEXICAL DECISION 777

with uent well-structured speech and occasional word-nding difcultiesLike DRN DC was impaired at dening low-frequency exception wordsbut was within the normal range at reading them aloud In generatingdenitions for the low-frequency exception words on Patterson and Hodgesrsquo(1992) ldquosurfacerdquo list DC was 31 correct (1342) by a lax criterion and57 correct (2442) by a very lax criterion By contrast she was 95 correct(4042) in reading the words aloud Similar dissociations between generatingdenitions and pronunciations were observed for the object labels from thePALPA (Kay Lesser amp Coltheart 1992) and for word lists from Shallice etal (1983) and from Strain et al (1995)

One possible explanation for the lack of surface dyslexia in patients likeDRN and DC is that the task of generating denitions is far moresensitive to semantic or semantics-to-phonology damage than is the task ofreading low-frequency exception words and that surface dyslexia emergesonly with very severe semantic damage In fact inspection of the results fromprogressive deterioration of semantics in Plaut and co-workersrsquo (1996)simulation (Fig 3a) indicates that low-frequency exception word reading isaffected only when semantic strength is reduced to a level of 15 (from amaximum of 50) Consistent with this idea patient WLP (Schwartz et al1979) became impaired at reading exception words only when her semanticdeterioration became very severe (Schwartz Saffran amp Marin 1980)Moreover many uent surface dyslexia patients show evidence ofcomprehension impairments andor anomia that seem more severe thanthose of DRN and DC As examples MP (Bub et al 1985) was at chanceat wordndashword and picturendashword semantic categorisation only very slightlyabove chance at semantic category sorting and exhibited no semanticpriming in word naming and KT (McCarthy amp Warrington 1986) scored 0on the WAIS Vocabulary test could not perform the Peabody PictureVocabulary test (Dunn 1965) could name only 11 (14130) of a subset ofthe Snodgrass and Vanderwart (1980) pictures and was only slightly abovechance at picturendashword matching By comparison on object naming DRNwas 25 correct (520) and DC was 58 correct (128220) However somepatients are no more anomic than DRN and DC but show severe surfacedyslexia For example GC (Patterson Graham amp Hodges 1994a) wascorrect in naming 45 (118260) of the Snodgrass and Vanderwart picturesbut could pronounce correctly only 38 (1642) of low-frequency exceptionwords (cf 95 for both DRN and DC) Thus the severity of semanticimpairment alone does seem to provide a full account of the differences inreading performance among all of the relevant patients

An additional hypothesis is that individual differences in the severityof surface dyslexia can arise not only from differences in the degree ofsemantic damage but also from premorbid differences in the division oflabour and overall competence of the reading system Recall that the

778 PLAUT

simulations of Plaut et al (1996) demonstrate that the phonological pathwayon its own can if necessary master the entire training corpus includinglow-frequency exception words but that support from the semantic pathwayrelieves it from having to do so Thus the ultimate competence of thephonological pathway is a function both of its own learning propertiesmdashitssensitivity to frequency and consistency as expressed by equation (1)mdashandof the strength of the semantic contribution to phonology Both of thesefactors are matters of degree that would be expected to vary acrossindividuals More generally Plaut and co-workers pointed out that thecompetence and division of labour of the reading system would be expectedto be inuenced by a wide variety of factors including the nature of readinginstruction the sophistication of preliterate phonological representationsthe overall amount of reading practice relative experience in reading aloudversus silently the computational resources devoted to each pathway andthe readerrsquos more general skill levels in visual pattern recognition and inspoken word comprehension and production Broadly speaking the overallcompetence of the phonological and semantic pathways together would beexpected to improve gradually with reading experience but that factors thataided the development of one pathway would be expected to limit thecompetence of the other

In Plaut and co-workersrsquo (1996) simulation of surface dyslexia this tradingrelation between the semantic and phonological pathways dependsprimarily on two specic parameters The rst is the asymptotic strengthover training of the external correct input provided to phoneme unitstermed semantic strength which corresponds directly to the competence ofthe putative semantic pathway The second parameter is the magnitude ofthe pressure to keep weights in the phonological pathway small termedweight decay Weight decay is standardly used in connectionist modelling toimprove generalisation (ie nonword reading) by preventing the networkfrom developing arbitrarily large weights in learning the training data Withweight decay the magnitude of each weight becomes proportional to itsimportance in reducing training error (Hinton 1989) Weights can still growlarge however if aspects of the task demand it In fact this property iscritical in the current context if the network is to master exception wordsBecause the pronunciations of such words violate standard spellingndashsoundcorrespondences they typically require the involvement of larger weights tooverride the standard mapping Thus the higher the amount of weightdecay the more difcult it is for the network to learn to pronounce exceptionwords particularly those of low frequency In this way the magnitude ofweight decay controls the base level of competence that can be achieved bythe phonological pathway in isolation There are a number of possibleneurophysiological analogues of weight decay that would be expected tovary across individuals For instance if a connection weight corresponds to

MODELS OF WORD READING AND LEXICAL DECISION 779

the number and strength of synapses between neurons then the magnitudeof weight decay can be thought of as the level of metabolic activity that canbe supported for maintaining synapses Low values of weight decaycorrespond to high levels of metabolic activity and thus a high level ofcompetence in the network

Under this view of individual differences as parametric variations of Plautand co-workersrsquo (1996) simulation patients such as DRN and DC maycorrespond to parameter combinations that lead to a high degree ofcompetence in the phonological pathway The goal of the current simulationwas to determine the effects of parametric variations of semantic strengthand weight decay on the severity of surface dyslexia exhibited by networkimplementations of the phonological pathway following semanticimpairment

Method

The method used in the simulation follows that used by Plaut et al (1996simulation 4) and readers are referred to that paper for additional details Afeedforward network was trained to map from orthographic representationsof 2998 monosyllabic words to their phonological representations Theorthographic representations consisted of 108 grapheme units that codesingle- and multi-letter graphemes for the onset vowel and coda of thewritten form of each word The phonological representations werestructured in an analogous fashion over 61 phoneme units Each of thegrapheme units was connected to each of 100 hidden units which in turnwas connected to each phoneme unit Each of the hidden and phoneme unitsalso had a bias weight that determined its ldquorestingrdquo state in the absence ofother input This bias is equivalent to the weight on a connection from a unitwhose state is always equal to 10 and was learned in the same way as otherweights

During training a word was presented to the network by clamping thestates of units representing graphemes contained in the word to 1 and allothers to 0 The hidden and phoneme units then computed their states basedon this input The state sj of each unit j varied between 0 and 1 as a smoothreal-valued function of the summed weighted input from other units

(2)

where wij is the weight on the connection from unit i to unit j bj is the biasweight of j and exp [ is the exponential function This is the standardsigmoid function s [ referred to in the frequencyndashconsistency equation(equation 1)

780 PLAUT

FIG 4 The magnitude of the additional external input supplied to phoneme units by theputative semantic pathway as a function of training epoch for different values of semanticstrength (g in equation 3) for a frequency f 5 533 corresponding to the average Kucera andFrancis (1967) frequency of the Patterson and Hodges (1992) low-frequency words

The summed input to phoneme units was augmented with an additionalterm S corresponding to the frequency-weighted contribution of thesemantic pathway to phonology

(3)

where f is the Kucera and Francis (1967) frequency of the word and tindexes the training epoch (ie number of presentations of the training cor-pus) The parameter g termed semantic strength determined the asymptoticlevel of input from semantics and was varied from 10 to 50 in the currentsimulation Figure 4 shows the magnitude of this input over time for eachvalue of g using f 5 533 the average Kucera and Francis (1967) frequency ofthe Patterson and Hodges (1992) low-frequency words The sign of thisexternal input was set to drive phoneme units towards their correct states

The error E of the network was measured by the sum of two terms (1)the cross-entropy (see Hinton 1989) between the generated phoneme statessj and the correct or target phoneme states tj for the word and (2) aproportion of the sum of the squares of the weights

(4)

where l is the weight decay parameter and was varied between 000005 and

MODELS OF WORD READING AND LEXICAL DECISION 781

1Note that this manipulation is equivalent in the limit of small weight changes to the morenatural procedure of sampling words for training as a function of their frequency

2This level of correct performance was calculated over only the 14 pairings of semanticstrength and weight decay giving rise to non-perfect performance on the low-frequencyexception words (see Fig 5)

0001 in the current simulation Back-propagation (Rumelhart Hinton ampWilliams 1986b) was used to calculate the derivative of this error withrespect to each weight in the network This derivative was scaled by a factorreecting the frequency of occurrence of the word proportional to thesquare-root of its Kucera and Francis (1967) frequency1 Each of the 2998words was presented in turn and their scaled derivatives were accumulatedAt the end of each epoch the weights were changed in proportion to the sumof the current accumulated derivatives and the past weight step (scaled by amomentum parameter of 09) The weight change was scaled both by a globallearning rate of 0001 and by a connection-specic learning rate that wasinitialised to 10 and adapted using the delta-bar-delta procedure (Jacobs1988) with an additive increment of 01 and a multiplicative decrement of09

For each combination of semantic strength and weight decay the networkwas initialised with the same set of random weights (sampled uniformlybetween 6 01) and trained for 2000 epochs At this point it was subjected toa complete semantic ldquolesionrdquo by removing the external input to thephoneme units The network was then tested for its pronunciations of thehigh- and low-frequency regular and exception words from Patterson andHodgesrsquo (1992) ldquosurfacerdquo list and 86 nonwords from Glushko (1979 seePlaut et al 1996 for details on scoring)

Results and Discussion

Figure 5 displays the number of errors to the Patterson and Hodges wordsmade by the network after the elimination of semantics as a function of thesemantic strength (g in equation 3) and weight decay ( l in equation 4) usedduring training These error scores were entered into an ANOVA over itemswith frequency and consistency as between-item factors and semanticstrength and weight decay as within-item factors

As is apparent from Fig 5 the network exhibited a strong frequency 3consistency interaction in accuracy [F(1164) 5 176 P 0001] Moreover866 (174201) of the errors to exception words were regularisations and985 (11861204)2 of presentations of the Glushko nonwords werepronounced correctly Thus overall the network exhibited the hallmarkcharacteristics of surface dyslexia

However as is also apparent from Fig 5 the effects of frequency andconsistency depended strongly on the levels of semantic strength and weight

782

783

FIG 5 Numbers of errors produced by the network to Patterson and Hodgesrsquo (1992) (a)high-frequency regular words (b) low-frequency regular words (c) high-frequency exceptionwords and (d) low-frequency exception words when semantics is removed completely after2000 epochs of training as a function of the semantic strength (g in equation 3) and weightdecay ( l in equation 4) used during training

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 10: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

774 PLAUT

reduces the overall error the phonological pathway will experience lesspressure to learn On its own it may master only those items it nds easiest tolearn words high in frequency andor consistency (ie those items with largepositive terms in equation 1) low-frequency exception words may never belearned completely As a result brain damage that reduced or eliminated thesemantic pathway would reveal the latent limitation of the phonologicalpathway giving rise to surface dyslexia

In further simulations Plaut and colleagues explored the possibility thatthe surface dyslexic reading pattern might reect the natural limitations ofan intact but isolated phonological pathway that had learned to rely partiallyon semantic support Given that a full implementation of the semanticpathway was beyond the scope of their work they approximated thecontribution that such a pathway would make to the oral reading of eachword during training by providing the output (phoneme) units of theimplemented phonological pathway with external input that pushed themtowards their correct values for each word during training Semanticdamage then was modelled by weakening or removing this external inputPlaut and colleagues found that indeed a phonological pathway trained inthe context of support from semantics exhibited the central phenomena ofsurface dyslexia when semantics was impaired Figure 3 shows how differentdegrees of semantic impairment provide a reasonably good quantitative tto the performance levels of individual surface dyslexic patients

In summary Plaut et al (1996) provided connectionist simulations andmathematical analyses supporting a view of lexical processing in which thedistinctions between words and nonwords and between regular andexception words are not reected in the structure of the system but rather infunctional aspects of its behaviour as it brings all its knowledge to bear inprocessing an input An important insight that emerges from the approach isthat semantic and phonological processing are intimately related over thecourse of reading acquisition in normal skilled performance and in theeffects of brain damage The following two simulation experiments aim toelaborate and clarify the role of semantics in oral reading and other lexicaltasks Although it might be more natural to consider normal performance(in lexical decision) before impaired performance (in surface dyslexia) webegin with the latter issue as the relevant simulations are more closelyrelated to existing work

SIMULATION 1 INDIVIDUAL DIFFERENCES INDIVISION OF LABOUR

Plaut and co-workersrsquo (1996) account of surface dyslexia involves a causalrelationship between an impairment in semantic input to phonology and theoccurrence of (regularisation) errors in reading low-frequency exception

775

FIG 3 (a) The effect of gradual elimination of semantics on the correct performance of Plautand co-workersrsquo (1996) network after 2000 epochs of training with semantics for Taraban andMcClellandrsquos (1987) high-frequency (HF) and low-frequency (LF) regular consistent words(Reg) and exception words (Exc) and for Glushkorsquos (1979) nonwords and approximatepercentage of errors on the exception words that are regularisations (b) Performance of twosurface dyslexic patients (MP Behrmann amp Bub 1992 Bub et al 1985 KT McCarthy ampWarrington 1986) and the network with different amounts of semantic damage From Plaut etal (1996)

776 PLAUT

words Note that such an impairment might arise from damage to semanticsitself or from damage to the mapping from semantics to phonology in whichsemantics is spared Thus it is perfectly consistent with this account thatsome surface dyslexic patients have intact comprehension (see GrahamPatterson amp Hodges 1995 Watt Jokel amp Behrmann 1997) Nonethelessthe semantics-to-phonology impairment should manifest in non-readingtasks such as picture naming Consistent with this expectation all knownsurface dyslexic patients are anomic In fact when the anomia is restricted toabstract concepts as in patient DRB (Franklin Howard amp Patterson1995) reading errors correspondingly occur only on exception words withabstract meanings

Although Plaut and co-workersrsquo account does not require that surfacedyslexic patients have impaired comprehension it does seem to imply thatpatients who do have a semantic impairment should exhibit some degree ofsurface dyslexia Thus the account is challenged by the nding that there arepatients with semantic impairments who nonetheless read low-frequencyexception words with normal or near-normal accuracy The two clearestcases of this are DRN (Cipolotti amp Warrington 1995) and DC (LambonRalph et al 1995) (see also Coslett 1991 Cummings Houlihan amp Hill1986 Friedman Ferguson Robinson amp Sunderland 1992 Raymer ampBerndt 1994 Schwartz et al 1979)

Cipolotti and Warrington (1995) presented data on patient DRN a67-year-old biological scientist with focal atrophy of the temporal lobes(more pronounced on the left) most likely due to Pickrsquos diseaseBehavioural ly he exhibited what is known as semantic dementia orprogressive uent aphasia (Hodges Patterson Oxbury amp Funnell 1992Snowden Goulding amp Neary 1989) Semantic dementia is a relatively pureimpairment in semantic memory as reected by severe anomia and wordcomprehension difculties with relatively few other cognitive impairmentsCipolotti and Warrington compared DRNrsquos ability to give denitions towords and to read them aloud for sets of words that varied in frequency andspellingndashsound consistency We focus here on his performance on low-frequency exception words as these items provide the most sensitivemeasure of surface dyslexia In keeping with his semantic dementia DRNshowed a substantial impairment in generating denitions for low-frequencyexception words producing 39 correct (1128) and 14 correct (321) fortwo sets of words using a very lenient scoring criterion By contrast his oralreading of low-frequency exception words was essentially intact heproduced 95 correct (4648 and 2021) on each of two lists

Lambon Ralph et al (1995) described the performance of patient DCwho had attended school only until the age of 14 At the time of testing DCwas 85 years old and exhibited the clinical symptoms of dementia of theAlzheimerrsquos type (see Schwartz 1990) including very poor episodic memory

MODELS OF WORD READING AND LEXICAL DECISION 777

with uent well-structured speech and occasional word-nding difcultiesLike DRN DC was impaired at dening low-frequency exception wordsbut was within the normal range at reading them aloud In generatingdenitions for the low-frequency exception words on Patterson and Hodgesrsquo(1992) ldquosurfacerdquo list DC was 31 correct (1342) by a lax criterion and57 correct (2442) by a very lax criterion By contrast she was 95 correct(4042) in reading the words aloud Similar dissociations between generatingdenitions and pronunciations were observed for the object labels from thePALPA (Kay Lesser amp Coltheart 1992) and for word lists from Shallice etal (1983) and from Strain et al (1995)

One possible explanation for the lack of surface dyslexia in patients likeDRN and DC is that the task of generating denitions is far moresensitive to semantic or semantics-to-phonology damage than is the task ofreading low-frequency exception words and that surface dyslexia emergesonly with very severe semantic damage In fact inspection of the results fromprogressive deterioration of semantics in Plaut and co-workersrsquo (1996)simulation (Fig 3a) indicates that low-frequency exception word reading isaffected only when semantic strength is reduced to a level of 15 (from amaximum of 50) Consistent with this idea patient WLP (Schwartz et al1979) became impaired at reading exception words only when her semanticdeterioration became very severe (Schwartz Saffran amp Marin 1980)Moreover many uent surface dyslexia patients show evidence ofcomprehension impairments andor anomia that seem more severe thanthose of DRN and DC As examples MP (Bub et al 1985) was at chanceat wordndashword and picturendashword semantic categorisation only very slightlyabove chance at semantic category sorting and exhibited no semanticpriming in word naming and KT (McCarthy amp Warrington 1986) scored 0on the WAIS Vocabulary test could not perform the Peabody PictureVocabulary test (Dunn 1965) could name only 11 (14130) of a subset ofthe Snodgrass and Vanderwart (1980) pictures and was only slightly abovechance at picturendashword matching By comparison on object naming DRNwas 25 correct (520) and DC was 58 correct (128220) However somepatients are no more anomic than DRN and DC but show severe surfacedyslexia For example GC (Patterson Graham amp Hodges 1994a) wascorrect in naming 45 (118260) of the Snodgrass and Vanderwart picturesbut could pronounce correctly only 38 (1642) of low-frequency exceptionwords (cf 95 for both DRN and DC) Thus the severity of semanticimpairment alone does seem to provide a full account of the differences inreading performance among all of the relevant patients

An additional hypothesis is that individual differences in the severityof surface dyslexia can arise not only from differences in the degree ofsemantic damage but also from premorbid differences in the division oflabour and overall competence of the reading system Recall that the

778 PLAUT

simulations of Plaut et al (1996) demonstrate that the phonological pathwayon its own can if necessary master the entire training corpus includinglow-frequency exception words but that support from the semantic pathwayrelieves it from having to do so Thus the ultimate competence of thephonological pathway is a function both of its own learning propertiesmdashitssensitivity to frequency and consistency as expressed by equation (1)mdashandof the strength of the semantic contribution to phonology Both of thesefactors are matters of degree that would be expected to vary acrossindividuals More generally Plaut and co-workers pointed out that thecompetence and division of labour of the reading system would be expectedto be inuenced by a wide variety of factors including the nature of readinginstruction the sophistication of preliterate phonological representationsthe overall amount of reading practice relative experience in reading aloudversus silently the computational resources devoted to each pathway andthe readerrsquos more general skill levels in visual pattern recognition and inspoken word comprehension and production Broadly speaking the overallcompetence of the phonological and semantic pathways together would beexpected to improve gradually with reading experience but that factors thataided the development of one pathway would be expected to limit thecompetence of the other

In Plaut and co-workersrsquo (1996) simulation of surface dyslexia this tradingrelation between the semantic and phonological pathways dependsprimarily on two specic parameters The rst is the asymptotic strengthover training of the external correct input provided to phoneme unitstermed semantic strength which corresponds directly to the competence ofthe putative semantic pathway The second parameter is the magnitude ofthe pressure to keep weights in the phonological pathway small termedweight decay Weight decay is standardly used in connectionist modelling toimprove generalisation (ie nonword reading) by preventing the networkfrom developing arbitrarily large weights in learning the training data Withweight decay the magnitude of each weight becomes proportional to itsimportance in reducing training error (Hinton 1989) Weights can still growlarge however if aspects of the task demand it In fact this property iscritical in the current context if the network is to master exception wordsBecause the pronunciations of such words violate standard spellingndashsoundcorrespondences they typically require the involvement of larger weights tooverride the standard mapping Thus the higher the amount of weightdecay the more difcult it is for the network to learn to pronounce exceptionwords particularly those of low frequency In this way the magnitude ofweight decay controls the base level of competence that can be achieved bythe phonological pathway in isolation There are a number of possibleneurophysiological analogues of weight decay that would be expected tovary across individuals For instance if a connection weight corresponds to

MODELS OF WORD READING AND LEXICAL DECISION 779

the number and strength of synapses between neurons then the magnitudeof weight decay can be thought of as the level of metabolic activity that canbe supported for maintaining synapses Low values of weight decaycorrespond to high levels of metabolic activity and thus a high level ofcompetence in the network

Under this view of individual differences as parametric variations of Plautand co-workersrsquo (1996) simulation patients such as DRN and DC maycorrespond to parameter combinations that lead to a high degree ofcompetence in the phonological pathway The goal of the current simulationwas to determine the effects of parametric variations of semantic strengthand weight decay on the severity of surface dyslexia exhibited by networkimplementations of the phonological pathway following semanticimpairment

Method

The method used in the simulation follows that used by Plaut et al (1996simulation 4) and readers are referred to that paper for additional details Afeedforward network was trained to map from orthographic representationsof 2998 monosyllabic words to their phonological representations Theorthographic representations consisted of 108 grapheme units that codesingle- and multi-letter graphemes for the onset vowel and coda of thewritten form of each word The phonological representations werestructured in an analogous fashion over 61 phoneme units Each of thegrapheme units was connected to each of 100 hidden units which in turnwas connected to each phoneme unit Each of the hidden and phoneme unitsalso had a bias weight that determined its ldquorestingrdquo state in the absence ofother input This bias is equivalent to the weight on a connection from a unitwhose state is always equal to 10 and was learned in the same way as otherweights

During training a word was presented to the network by clamping thestates of units representing graphemes contained in the word to 1 and allothers to 0 The hidden and phoneme units then computed their states basedon this input The state sj of each unit j varied between 0 and 1 as a smoothreal-valued function of the summed weighted input from other units

(2)

where wij is the weight on the connection from unit i to unit j bj is the biasweight of j and exp [ is the exponential function This is the standardsigmoid function s [ referred to in the frequencyndashconsistency equation(equation 1)

780 PLAUT

FIG 4 The magnitude of the additional external input supplied to phoneme units by theputative semantic pathway as a function of training epoch for different values of semanticstrength (g in equation 3) for a frequency f 5 533 corresponding to the average Kucera andFrancis (1967) frequency of the Patterson and Hodges (1992) low-frequency words

The summed input to phoneme units was augmented with an additionalterm S corresponding to the frequency-weighted contribution of thesemantic pathway to phonology

(3)

where f is the Kucera and Francis (1967) frequency of the word and tindexes the training epoch (ie number of presentations of the training cor-pus) The parameter g termed semantic strength determined the asymptoticlevel of input from semantics and was varied from 10 to 50 in the currentsimulation Figure 4 shows the magnitude of this input over time for eachvalue of g using f 5 533 the average Kucera and Francis (1967) frequency ofthe Patterson and Hodges (1992) low-frequency words The sign of thisexternal input was set to drive phoneme units towards their correct states

The error E of the network was measured by the sum of two terms (1)the cross-entropy (see Hinton 1989) between the generated phoneme statessj and the correct or target phoneme states tj for the word and (2) aproportion of the sum of the squares of the weights

(4)

where l is the weight decay parameter and was varied between 000005 and

MODELS OF WORD READING AND LEXICAL DECISION 781

1Note that this manipulation is equivalent in the limit of small weight changes to the morenatural procedure of sampling words for training as a function of their frequency

2This level of correct performance was calculated over only the 14 pairings of semanticstrength and weight decay giving rise to non-perfect performance on the low-frequencyexception words (see Fig 5)

0001 in the current simulation Back-propagation (Rumelhart Hinton ampWilliams 1986b) was used to calculate the derivative of this error withrespect to each weight in the network This derivative was scaled by a factorreecting the frequency of occurrence of the word proportional to thesquare-root of its Kucera and Francis (1967) frequency1 Each of the 2998words was presented in turn and their scaled derivatives were accumulatedAt the end of each epoch the weights were changed in proportion to the sumof the current accumulated derivatives and the past weight step (scaled by amomentum parameter of 09) The weight change was scaled both by a globallearning rate of 0001 and by a connection-specic learning rate that wasinitialised to 10 and adapted using the delta-bar-delta procedure (Jacobs1988) with an additive increment of 01 and a multiplicative decrement of09

For each combination of semantic strength and weight decay the networkwas initialised with the same set of random weights (sampled uniformlybetween 6 01) and trained for 2000 epochs At this point it was subjected toa complete semantic ldquolesionrdquo by removing the external input to thephoneme units The network was then tested for its pronunciations of thehigh- and low-frequency regular and exception words from Patterson andHodgesrsquo (1992) ldquosurfacerdquo list and 86 nonwords from Glushko (1979 seePlaut et al 1996 for details on scoring)

Results and Discussion

Figure 5 displays the number of errors to the Patterson and Hodges wordsmade by the network after the elimination of semantics as a function of thesemantic strength (g in equation 3) and weight decay ( l in equation 4) usedduring training These error scores were entered into an ANOVA over itemswith frequency and consistency as between-item factors and semanticstrength and weight decay as within-item factors

As is apparent from Fig 5 the network exhibited a strong frequency 3consistency interaction in accuracy [F(1164) 5 176 P 0001] Moreover866 (174201) of the errors to exception words were regularisations and985 (11861204)2 of presentations of the Glushko nonwords werepronounced correctly Thus overall the network exhibited the hallmarkcharacteristics of surface dyslexia

However as is also apparent from Fig 5 the effects of frequency andconsistency depended strongly on the levels of semantic strength and weight

782

783

FIG 5 Numbers of errors produced by the network to Patterson and Hodgesrsquo (1992) (a)high-frequency regular words (b) low-frequency regular words (c) high-frequency exceptionwords and (d) low-frequency exception words when semantics is removed completely after2000 epochs of training as a function of the semantic strength (g in equation 3) and weightdecay ( l in equation 4) used during training

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 11: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

775

FIG 3 (a) The effect of gradual elimination of semantics on the correct performance of Plautand co-workersrsquo (1996) network after 2000 epochs of training with semantics for Taraban andMcClellandrsquos (1987) high-frequency (HF) and low-frequency (LF) regular consistent words(Reg) and exception words (Exc) and for Glushkorsquos (1979) nonwords and approximatepercentage of errors on the exception words that are regularisations (b) Performance of twosurface dyslexic patients (MP Behrmann amp Bub 1992 Bub et al 1985 KT McCarthy ampWarrington 1986) and the network with different amounts of semantic damage From Plaut etal (1996)

776 PLAUT

words Note that such an impairment might arise from damage to semanticsitself or from damage to the mapping from semantics to phonology in whichsemantics is spared Thus it is perfectly consistent with this account thatsome surface dyslexic patients have intact comprehension (see GrahamPatterson amp Hodges 1995 Watt Jokel amp Behrmann 1997) Nonethelessthe semantics-to-phonology impairment should manifest in non-readingtasks such as picture naming Consistent with this expectation all knownsurface dyslexic patients are anomic In fact when the anomia is restricted toabstract concepts as in patient DRB (Franklin Howard amp Patterson1995) reading errors correspondingly occur only on exception words withabstract meanings

Although Plaut and co-workersrsquo account does not require that surfacedyslexic patients have impaired comprehension it does seem to imply thatpatients who do have a semantic impairment should exhibit some degree ofsurface dyslexia Thus the account is challenged by the nding that there arepatients with semantic impairments who nonetheless read low-frequencyexception words with normal or near-normal accuracy The two clearestcases of this are DRN (Cipolotti amp Warrington 1995) and DC (LambonRalph et al 1995) (see also Coslett 1991 Cummings Houlihan amp Hill1986 Friedman Ferguson Robinson amp Sunderland 1992 Raymer ampBerndt 1994 Schwartz et al 1979)

Cipolotti and Warrington (1995) presented data on patient DRN a67-year-old biological scientist with focal atrophy of the temporal lobes(more pronounced on the left) most likely due to Pickrsquos diseaseBehavioural ly he exhibited what is known as semantic dementia orprogressive uent aphasia (Hodges Patterson Oxbury amp Funnell 1992Snowden Goulding amp Neary 1989) Semantic dementia is a relatively pureimpairment in semantic memory as reected by severe anomia and wordcomprehension difculties with relatively few other cognitive impairmentsCipolotti and Warrington compared DRNrsquos ability to give denitions towords and to read them aloud for sets of words that varied in frequency andspellingndashsound consistency We focus here on his performance on low-frequency exception words as these items provide the most sensitivemeasure of surface dyslexia In keeping with his semantic dementia DRNshowed a substantial impairment in generating denitions for low-frequencyexception words producing 39 correct (1128) and 14 correct (321) fortwo sets of words using a very lenient scoring criterion By contrast his oralreading of low-frequency exception words was essentially intact heproduced 95 correct (4648 and 2021) on each of two lists

Lambon Ralph et al (1995) described the performance of patient DCwho had attended school only until the age of 14 At the time of testing DCwas 85 years old and exhibited the clinical symptoms of dementia of theAlzheimerrsquos type (see Schwartz 1990) including very poor episodic memory

MODELS OF WORD READING AND LEXICAL DECISION 777

with uent well-structured speech and occasional word-nding difcultiesLike DRN DC was impaired at dening low-frequency exception wordsbut was within the normal range at reading them aloud In generatingdenitions for the low-frequency exception words on Patterson and Hodgesrsquo(1992) ldquosurfacerdquo list DC was 31 correct (1342) by a lax criterion and57 correct (2442) by a very lax criterion By contrast she was 95 correct(4042) in reading the words aloud Similar dissociations between generatingdenitions and pronunciations were observed for the object labels from thePALPA (Kay Lesser amp Coltheart 1992) and for word lists from Shallice etal (1983) and from Strain et al (1995)

One possible explanation for the lack of surface dyslexia in patients likeDRN and DC is that the task of generating denitions is far moresensitive to semantic or semantics-to-phonology damage than is the task ofreading low-frequency exception words and that surface dyslexia emergesonly with very severe semantic damage In fact inspection of the results fromprogressive deterioration of semantics in Plaut and co-workersrsquo (1996)simulation (Fig 3a) indicates that low-frequency exception word reading isaffected only when semantic strength is reduced to a level of 15 (from amaximum of 50) Consistent with this idea patient WLP (Schwartz et al1979) became impaired at reading exception words only when her semanticdeterioration became very severe (Schwartz Saffran amp Marin 1980)Moreover many uent surface dyslexia patients show evidence ofcomprehension impairments andor anomia that seem more severe thanthose of DRN and DC As examples MP (Bub et al 1985) was at chanceat wordndashword and picturendashword semantic categorisation only very slightlyabove chance at semantic category sorting and exhibited no semanticpriming in word naming and KT (McCarthy amp Warrington 1986) scored 0on the WAIS Vocabulary test could not perform the Peabody PictureVocabulary test (Dunn 1965) could name only 11 (14130) of a subset ofthe Snodgrass and Vanderwart (1980) pictures and was only slightly abovechance at picturendashword matching By comparison on object naming DRNwas 25 correct (520) and DC was 58 correct (128220) However somepatients are no more anomic than DRN and DC but show severe surfacedyslexia For example GC (Patterson Graham amp Hodges 1994a) wascorrect in naming 45 (118260) of the Snodgrass and Vanderwart picturesbut could pronounce correctly only 38 (1642) of low-frequency exceptionwords (cf 95 for both DRN and DC) Thus the severity of semanticimpairment alone does seem to provide a full account of the differences inreading performance among all of the relevant patients

An additional hypothesis is that individual differences in the severityof surface dyslexia can arise not only from differences in the degree ofsemantic damage but also from premorbid differences in the division oflabour and overall competence of the reading system Recall that the

778 PLAUT

simulations of Plaut et al (1996) demonstrate that the phonological pathwayon its own can if necessary master the entire training corpus includinglow-frequency exception words but that support from the semantic pathwayrelieves it from having to do so Thus the ultimate competence of thephonological pathway is a function both of its own learning propertiesmdashitssensitivity to frequency and consistency as expressed by equation (1)mdashandof the strength of the semantic contribution to phonology Both of thesefactors are matters of degree that would be expected to vary acrossindividuals More generally Plaut and co-workers pointed out that thecompetence and division of labour of the reading system would be expectedto be inuenced by a wide variety of factors including the nature of readinginstruction the sophistication of preliterate phonological representationsthe overall amount of reading practice relative experience in reading aloudversus silently the computational resources devoted to each pathway andthe readerrsquos more general skill levels in visual pattern recognition and inspoken word comprehension and production Broadly speaking the overallcompetence of the phonological and semantic pathways together would beexpected to improve gradually with reading experience but that factors thataided the development of one pathway would be expected to limit thecompetence of the other

In Plaut and co-workersrsquo (1996) simulation of surface dyslexia this tradingrelation between the semantic and phonological pathways dependsprimarily on two specic parameters The rst is the asymptotic strengthover training of the external correct input provided to phoneme unitstermed semantic strength which corresponds directly to the competence ofthe putative semantic pathway The second parameter is the magnitude ofthe pressure to keep weights in the phonological pathway small termedweight decay Weight decay is standardly used in connectionist modelling toimprove generalisation (ie nonword reading) by preventing the networkfrom developing arbitrarily large weights in learning the training data Withweight decay the magnitude of each weight becomes proportional to itsimportance in reducing training error (Hinton 1989) Weights can still growlarge however if aspects of the task demand it In fact this property iscritical in the current context if the network is to master exception wordsBecause the pronunciations of such words violate standard spellingndashsoundcorrespondences they typically require the involvement of larger weights tooverride the standard mapping Thus the higher the amount of weightdecay the more difcult it is for the network to learn to pronounce exceptionwords particularly those of low frequency In this way the magnitude ofweight decay controls the base level of competence that can be achieved bythe phonological pathway in isolation There are a number of possibleneurophysiological analogues of weight decay that would be expected tovary across individuals For instance if a connection weight corresponds to

MODELS OF WORD READING AND LEXICAL DECISION 779

the number and strength of synapses between neurons then the magnitudeof weight decay can be thought of as the level of metabolic activity that canbe supported for maintaining synapses Low values of weight decaycorrespond to high levels of metabolic activity and thus a high level ofcompetence in the network

Under this view of individual differences as parametric variations of Plautand co-workersrsquo (1996) simulation patients such as DRN and DC maycorrespond to parameter combinations that lead to a high degree ofcompetence in the phonological pathway The goal of the current simulationwas to determine the effects of parametric variations of semantic strengthand weight decay on the severity of surface dyslexia exhibited by networkimplementations of the phonological pathway following semanticimpairment

Method

The method used in the simulation follows that used by Plaut et al (1996simulation 4) and readers are referred to that paper for additional details Afeedforward network was trained to map from orthographic representationsof 2998 monosyllabic words to their phonological representations Theorthographic representations consisted of 108 grapheme units that codesingle- and multi-letter graphemes for the onset vowel and coda of thewritten form of each word The phonological representations werestructured in an analogous fashion over 61 phoneme units Each of thegrapheme units was connected to each of 100 hidden units which in turnwas connected to each phoneme unit Each of the hidden and phoneme unitsalso had a bias weight that determined its ldquorestingrdquo state in the absence ofother input This bias is equivalent to the weight on a connection from a unitwhose state is always equal to 10 and was learned in the same way as otherweights

During training a word was presented to the network by clamping thestates of units representing graphemes contained in the word to 1 and allothers to 0 The hidden and phoneme units then computed their states basedon this input The state sj of each unit j varied between 0 and 1 as a smoothreal-valued function of the summed weighted input from other units

(2)

where wij is the weight on the connection from unit i to unit j bj is the biasweight of j and exp [ is the exponential function This is the standardsigmoid function s [ referred to in the frequencyndashconsistency equation(equation 1)

780 PLAUT

FIG 4 The magnitude of the additional external input supplied to phoneme units by theputative semantic pathway as a function of training epoch for different values of semanticstrength (g in equation 3) for a frequency f 5 533 corresponding to the average Kucera andFrancis (1967) frequency of the Patterson and Hodges (1992) low-frequency words

The summed input to phoneme units was augmented with an additionalterm S corresponding to the frequency-weighted contribution of thesemantic pathway to phonology

(3)

where f is the Kucera and Francis (1967) frequency of the word and tindexes the training epoch (ie number of presentations of the training cor-pus) The parameter g termed semantic strength determined the asymptoticlevel of input from semantics and was varied from 10 to 50 in the currentsimulation Figure 4 shows the magnitude of this input over time for eachvalue of g using f 5 533 the average Kucera and Francis (1967) frequency ofthe Patterson and Hodges (1992) low-frequency words The sign of thisexternal input was set to drive phoneme units towards their correct states

The error E of the network was measured by the sum of two terms (1)the cross-entropy (see Hinton 1989) between the generated phoneme statessj and the correct or target phoneme states tj for the word and (2) aproportion of the sum of the squares of the weights

(4)

where l is the weight decay parameter and was varied between 000005 and

MODELS OF WORD READING AND LEXICAL DECISION 781

1Note that this manipulation is equivalent in the limit of small weight changes to the morenatural procedure of sampling words for training as a function of their frequency

2This level of correct performance was calculated over only the 14 pairings of semanticstrength and weight decay giving rise to non-perfect performance on the low-frequencyexception words (see Fig 5)

0001 in the current simulation Back-propagation (Rumelhart Hinton ampWilliams 1986b) was used to calculate the derivative of this error withrespect to each weight in the network This derivative was scaled by a factorreecting the frequency of occurrence of the word proportional to thesquare-root of its Kucera and Francis (1967) frequency1 Each of the 2998words was presented in turn and their scaled derivatives were accumulatedAt the end of each epoch the weights were changed in proportion to the sumof the current accumulated derivatives and the past weight step (scaled by amomentum parameter of 09) The weight change was scaled both by a globallearning rate of 0001 and by a connection-specic learning rate that wasinitialised to 10 and adapted using the delta-bar-delta procedure (Jacobs1988) with an additive increment of 01 and a multiplicative decrement of09

For each combination of semantic strength and weight decay the networkwas initialised with the same set of random weights (sampled uniformlybetween 6 01) and trained for 2000 epochs At this point it was subjected toa complete semantic ldquolesionrdquo by removing the external input to thephoneme units The network was then tested for its pronunciations of thehigh- and low-frequency regular and exception words from Patterson andHodgesrsquo (1992) ldquosurfacerdquo list and 86 nonwords from Glushko (1979 seePlaut et al 1996 for details on scoring)

Results and Discussion

Figure 5 displays the number of errors to the Patterson and Hodges wordsmade by the network after the elimination of semantics as a function of thesemantic strength (g in equation 3) and weight decay ( l in equation 4) usedduring training These error scores were entered into an ANOVA over itemswith frequency and consistency as between-item factors and semanticstrength and weight decay as within-item factors

As is apparent from Fig 5 the network exhibited a strong frequency 3consistency interaction in accuracy [F(1164) 5 176 P 0001] Moreover866 (174201) of the errors to exception words were regularisations and985 (11861204)2 of presentations of the Glushko nonwords werepronounced correctly Thus overall the network exhibited the hallmarkcharacteristics of surface dyslexia

However as is also apparent from Fig 5 the effects of frequency andconsistency depended strongly on the levels of semantic strength and weight

782

783

FIG 5 Numbers of errors produced by the network to Patterson and Hodgesrsquo (1992) (a)high-frequency regular words (b) low-frequency regular words (c) high-frequency exceptionwords and (d) low-frequency exception words when semantics is removed completely after2000 epochs of training as a function of the semantic strength (g in equation 3) and weightdecay ( l in equation 4) used during training

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 12: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

776 PLAUT

words Note that such an impairment might arise from damage to semanticsitself or from damage to the mapping from semantics to phonology in whichsemantics is spared Thus it is perfectly consistent with this account thatsome surface dyslexic patients have intact comprehension (see GrahamPatterson amp Hodges 1995 Watt Jokel amp Behrmann 1997) Nonethelessthe semantics-to-phonology impairment should manifest in non-readingtasks such as picture naming Consistent with this expectation all knownsurface dyslexic patients are anomic In fact when the anomia is restricted toabstract concepts as in patient DRB (Franklin Howard amp Patterson1995) reading errors correspondingly occur only on exception words withabstract meanings

Although Plaut and co-workersrsquo account does not require that surfacedyslexic patients have impaired comprehension it does seem to imply thatpatients who do have a semantic impairment should exhibit some degree ofsurface dyslexia Thus the account is challenged by the nding that there arepatients with semantic impairments who nonetheless read low-frequencyexception words with normal or near-normal accuracy The two clearestcases of this are DRN (Cipolotti amp Warrington 1995) and DC (LambonRalph et al 1995) (see also Coslett 1991 Cummings Houlihan amp Hill1986 Friedman Ferguson Robinson amp Sunderland 1992 Raymer ampBerndt 1994 Schwartz et al 1979)

Cipolotti and Warrington (1995) presented data on patient DRN a67-year-old biological scientist with focal atrophy of the temporal lobes(more pronounced on the left) most likely due to Pickrsquos diseaseBehavioural ly he exhibited what is known as semantic dementia orprogressive uent aphasia (Hodges Patterson Oxbury amp Funnell 1992Snowden Goulding amp Neary 1989) Semantic dementia is a relatively pureimpairment in semantic memory as reected by severe anomia and wordcomprehension difculties with relatively few other cognitive impairmentsCipolotti and Warrington compared DRNrsquos ability to give denitions towords and to read them aloud for sets of words that varied in frequency andspellingndashsound consistency We focus here on his performance on low-frequency exception words as these items provide the most sensitivemeasure of surface dyslexia In keeping with his semantic dementia DRNshowed a substantial impairment in generating denitions for low-frequencyexception words producing 39 correct (1128) and 14 correct (321) fortwo sets of words using a very lenient scoring criterion By contrast his oralreading of low-frequency exception words was essentially intact heproduced 95 correct (4648 and 2021) on each of two lists

Lambon Ralph et al (1995) described the performance of patient DCwho had attended school only until the age of 14 At the time of testing DCwas 85 years old and exhibited the clinical symptoms of dementia of theAlzheimerrsquos type (see Schwartz 1990) including very poor episodic memory

MODELS OF WORD READING AND LEXICAL DECISION 777

with uent well-structured speech and occasional word-nding difcultiesLike DRN DC was impaired at dening low-frequency exception wordsbut was within the normal range at reading them aloud In generatingdenitions for the low-frequency exception words on Patterson and Hodgesrsquo(1992) ldquosurfacerdquo list DC was 31 correct (1342) by a lax criterion and57 correct (2442) by a very lax criterion By contrast she was 95 correct(4042) in reading the words aloud Similar dissociations between generatingdenitions and pronunciations were observed for the object labels from thePALPA (Kay Lesser amp Coltheart 1992) and for word lists from Shallice etal (1983) and from Strain et al (1995)

One possible explanation for the lack of surface dyslexia in patients likeDRN and DC is that the task of generating denitions is far moresensitive to semantic or semantics-to-phonology damage than is the task ofreading low-frequency exception words and that surface dyslexia emergesonly with very severe semantic damage In fact inspection of the results fromprogressive deterioration of semantics in Plaut and co-workersrsquo (1996)simulation (Fig 3a) indicates that low-frequency exception word reading isaffected only when semantic strength is reduced to a level of 15 (from amaximum of 50) Consistent with this idea patient WLP (Schwartz et al1979) became impaired at reading exception words only when her semanticdeterioration became very severe (Schwartz Saffran amp Marin 1980)Moreover many uent surface dyslexia patients show evidence ofcomprehension impairments andor anomia that seem more severe thanthose of DRN and DC As examples MP (Bub et al 1985) was at chanceat wordndashword and picturendashword semantic categorisation only very slightlyabove chance at semantic category sorting and exhibited no semanticpriming in word naming and KT (McCarthy amp Warrington 1986) scored 0on the WAIS Vocabulary test could not perform the Peabody PictureVocabulary test (Dunn 1965) could name only 11 (14130) of a subset ofthe Snodgrass and Vanderwart (1980) pictures and was only slightly abovechance at picturendashword matching By comparison on object naming DRNwas 25 correct (520) and DC was 58 correct (128220) However somepatients are no more anomic than DRN and DC but show severe surfacedyslexia For example GC (Patterson Graham amp Hodges 1994a) wascorrect in naming 45 (118260) of the Snodgrass and Vanderwart picturesbut could pronounce correctly only 38 (1642) of low-frequency exceptionwords (cf 95 for both DRN and DC) Thus the severity of semanticimpairment alone does seem to provide a full account of the differences inreading performance among all of the relevant patients

An additional hypothesis is that individual differences in the severityof surface dyslexia can arise not only from differences in the degree ofsemantic damage but also from premorbid differences in the division oflabour and overall competence of the reading system Recall that the

778 PLAUT

simulations of Plaut et al (1996) demonstrate that the phonological pathwayon its own can if necessary master the entire training corpus includinglow-frequency exception words but that support from the semantic pathwayrelieves it from having to do so Thus the ultimate competence of thephonological pathway is a function both of its own learning propertiesmdashitssensitivity to frequency and consistency as expressed by equation (1)mdashandof the strength of the semantic contribution to phonology Both of thesefactors are matters of degree that would be expected to vary acrossindividuals More generally Plaut and co-workers pointed out that thecompetence and division of labour of the reading system would be expectedto be inuenced by a wide variety of factors including the nature of readinginstruction the sophistication of preliterate phonological representationsthe overall amount of reading practice relative experience in reading aloudversus silently the computational resources devoted to each pathway andthe readerrsquos more general skill levels in visual pattern recognition and inspoken word comprehension and production Broadly speaking the overallcompetence of the phonological and semantic pathways together would beexpected to improve gradually with reading experience but that factors thataided the development of one pathway would be expected to limit thecompetence of the other

In Plaut and co-workersrsquo (1996) simulation of surface dyslexia this tradingrelation between the semantic and phonological pathways dependsprimarily on two specic parameters The rst is the asymptotic strengthover training of the external correct input provided to phoneme unitstermed semantic strength which corresponds directly to the competence ofthe putative semantic pathway The second parameter is the magnitude ofthe pressure to keep weights in the phonological pathway small termedweight decay Weight decay is standardly used in connectionist modelling toimprove generalisation (ie nonword reading) by preventing the networkfrom developing arbitrarily large weights in learning the training data Withweight decay the magnitude of each weight becomes proportional to itsimportance in reducing training error (Hinton 1989) Weights can still growlarge however if aspects of the task demand it In fact this property iscritical in the current context if the network is to master exception wordsBecause the pronunciations of such words violate standard spellingndashsoundcorrespondences they typically require the involvement of larger weights tooverride the standard mapping Thus the higher the amount of weightdecay the more difcult it is for the network to learn to pronounce exceptionwords particularly those of low frequency In this way the magnitude ofweight decay controls the base level of competence that can be achieved bythe phonological pathway in isolation There are a number of possibleneurophysiological analogues of weight decay that would be expected tovary across individuals For instance if a connection weight corresponds to

MODELS OF WORD READING AND LEXICAL DECISION 779

the number and strength of synapses between neurons then the magnitudeof weight decay can be thought of as the level of metabolic activity that canbe supported for maintaining synapses Low values of weight decaycorrespond to high levels of metabolic activity and thus a high level ofcompetence in the network

Under this view of individual differences as parametric variations of Plautand co-workersrsquo (1996) simulation patients such as DRN and DC maycorrespond to parameter combinations that lead to a high degree ofcompetence in the phonological pathway The goal of the current simulationwas to determine the effects of parametric variations of semantic strengthand weight decay on the severity of surface dyslexia exhibited by networkimplementations of the phonological pathway following semanticimpairment

Method

The method used in the simulation follows that used by Plaut et al (1996simulation 4) and readers are referred to that paper for additional details Afeedforward network was trained to map from orthographic representationsof 2998 monosyllabic words to their phonological representations Theorthographic representations consisted of 108 grapheme units that codesingle- and multi-letter graphemes for the onset vowel and coda of thewritten form of each word The phonological representations werestructured in an analogous fashion over 61 phoneme units Each of thegrapheme units was connected to each of 100 hidden units which in turnwas connected to each phoneme unit Each of the hidden and phoneme unitsalso had a bias weight that determined its ldquorestingrdquo state in the absence ofother input This bias is equivalent to the weight on a connection from a unitwhose state is always equal to 10 and was learned in the same way as otherweights

During training a word was presented to the network by clamping thestates of units representing graphemes contained in the word to 1 and allothers to 0 The hidden and phoneme units then computed their states basedon this input The state sj of each unit j varied between 0 and 1 as a smoothreal-valued function of the summed weighted input from other units

(2)

where wij is the weight on the connection from unit i to unit j bj is the biasweight of j and exp [ is the exponential function This is the standardsigmoid function s [ referred to in the frequencyndashconsistency equation(equation 1)

780 PLAUT

FIG 4 The magnitude of the additional external input supplied to phoneme units by theputative semantic pathway as a function of training epoch for different values of semanticstrength (g in equation 3) for a frequency f 5 533 corresponding to the average Kucera andFrancis (1967) frequency of the Patterson and Hodges (1992) low-frequency words

The summed input to phoneme units was augmented with an additionalterm S corresponding to the frequency-weighted contribution of thesemantic pathway to phonology

(3)

where f is the Kucera and Francis (1967) frequency of the word and tindexes the training epoch (ie number of presentations of the training cor-pus) The parameter g termed semantic strength determined the asymptoticlevel of input from semantics and was varied from 10 to 50 in the currentsimulation Figure 4 shows the magnitude of this input over time for eachvalue of g using f 5 533 the average Kucera and Francis (1967) frequency ofthe Patterson and Hodges (1992) low-frequency words The sign of thisexternal input was set to drive phoneme units towards their correct states

The error E of the network was measured by the sum of two terms (1)the cross-entropy (see Hinton 1989) between the generated phoneme statessj and the correct or target phoneme states tj for the word and (2) aproportion of the sum of the squares of the weights

(4)

where l is the weight decay parameter and was varied between 000005 and

MODELS OF WORD READING AND LEXICAL DECISION 781

1Note that this manipulation is equivalent in the limit of small weight changes to the morenatural procedure of sampling words for training as a function of their frequency

2This level of correct performance was calculated over only the 14 pairings of semanticstrength and weight decay giving rise to non-perfect performance on the low-frequencyexception words (see Fig 5)

0001 in the current simulation Back-propagation (Rumelhart Hinton ampWilliams 1986b) was used to calculate the derivative of this error withrespect to each weight in the network This derivative was scaled by a factorreecting the frequency of occurrence of the word proportional to thesquare-root of its Kucera and Francis (1967) frequency1 Each of the 2998words was presented in turn and their scaled derivatives were accumulatedAt the end of each epoch the weights were changed in proportion to the sumof the current accumulated derivatives and the past weight step (scaled by amomentum parameter of 09) The weight change was scaled both by a globallearning rate of 0001 and by a connection-specic learning rate that wasinitialised to 10 and adapted using the delta-bar-delta procedure (Jacobs1988) with an additive increment of 01 and a multiplicative decrement of09

For each combination of semantic strength and weight decay the networkwas initialised with the same set of random weights (sampled uniformlybetween 6 01) and trained for 2000 epochs At this point it was subjected toa complete semantic ldquolesionrdquo by removing the external input to thephoneme units The network was then tested for its pronunciations of thehigh- and low-frequency regular and exception words from Patterson andHodgesrsquo (1992) ldquosurfacerdquo list and 86 nonwords from Glushko (1979 seePlaut et al 1996 for details on scoring)

Results and Discussion

Figure 5 displays the number of errors to the Patterson and Hodges wordsmade by the network after the elimination of semantics as a function of thesemantic strength (g in equation 3) and weight decay ( l in equation 4) usedduring training These error scores were entered into an ANOVA over itemswith frequency and consistency as between-item factors and semanticstrength and weight decay as within-item factors

As is apparent from Fig 5 the network exhibited a strong frequency 3consistency interaction in accuracy [F(1164) 5 176 P 0001] Moreover866 (174201) of the errors to exception words were regularisations and985 (11861204)2 of presentations of the Glushko nonwords werepronounced correctly Thus overall the network exhibited the hallmarkcharacteristics of surface dyslexia

However as is also apparent from Fig 5 the effects of frequency andconsistency depended strongly on the levels of semantic strength and weight

782

783

FIG 5 Numbers of errors produced by the network to Patterson and Hodgesrsquo (1992) (a)high-frequency regular words (b) low-frequency regular words (c) high-frequency exceptionwords and (d) low-frequency exception words when semantics is removed completely after2000 epochs of training as a function of the semantic strength (g in equation 3) and weightdecay ( l in equation 4) used during training

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 13: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

MODELS OF WORD READING AND LEXICAL DECISION 777

with uent well-structured speech and occasional word-nding difcultiesLike DRN DC was impaired at dening low-frequency exception wordsbut was within the normal range at reading them aloud In generatingdenitions for the low-frequency exception words on Patterson and Hodgesrsquo(1992) ldquosurfacerdquo list DC was 31 correct (1342) by a lax criterion and57 correct (2442) by a very lax criterion By contrast she was 95 correct(4042) in reading the words aloud Similar dissociations between generatingdenitions and pronunciations were observed for the object labels from thePALPA (Kay Lesser amp Coltheart 1992) and for word lists from Shallice etal (1983) and from Strain et al (1995)

One possible explanation for the lack of surface dyslexia in patients likeDRN and DC is that the task of generating denitions is far moresensitive to semantic or semantics-to-phonology damage than is the task ofreading low-frequency exception words and that surface dyslexia emergesonly with very severe semantic damage In fact inspection of the results fromprogressive deterioration of semantics in Plaut and co-workersrsquo (1996)simulation (Fig 3a) indicates that low-frequency exception word reading isaffected only when semantic strength is reduced to a level of 15 (from amaximum of 50) Consistent with this idea patient WLP (Schwartz et al1979) became impaired at reading exception words only when her semanticdeterioration became very severe (Schwartz Saffran amp Marin 1980)Moreover many uent surface dyslexia patients show evidence ofcomprehension impairments andor anomia that seem more severe thanthose of DRN and DC As examples MP (Bub et al 1985) was at chanceat wordndashword and picturendashword semantic categorisation only very slightlyabove chance at semantic category sorting and exhibited no semanticpriming in word naming and KT (McCarthy amp Warrington 1986) scored 0on the WAIS Vocabulary test could not perform the Peabody PictureVocabulary test (Dunn 1965) could name only 11 (14130) of a subset ofthe Snodgrass and Vanderwart (1980) pictures and was only slightly abovechance at picturendashword matching By comparison on object naming DRNwas 25 correct (520) and DC was 58 correct (128220) However somepatients are no more anomic than DRN and DC but show severe surfacedyslexia For example GC (Patterson Graham amp Hodges 1994a) wascorrect in naming 45 (118260) of the Snodgrass and Vanderwart picturesbut could pronounce correctly only 38 (1642) of low-frequency exceptionwords (cf 95 for both DRN and DC) Thus the severity of semanticimpairment alone does seem to provide a full account of the differences inreading performance among all of the relevant patients

An additional hypothesis is that individual differences in the severityof surface dyslexia can arise not only from differences in the degree ofsemantic damage but also from premorbid differences in the division oflabour and overall competence of the reading system Recall that the

778 PLAUT

simulations of Plaut et al (1996) demonstrate that the phonological pathwayon its own can if necessary master the entire training corpus includinglow-frequency exception words but that support from the semantic pathwayrelieves it from having to do so Thus the ultimate competence of thephonological pathway is a function both of its own learning propertiesmdashitssensitivity to frequency and consistency as expressed by equation (1)mdashandof the strength of the semantic contribution to phonology Both of thesefactors are matters of degree that would be expected to vary acrossindividuals More generally Plaut and co-workers pointed out that thecompetence and division of labour of the reading system would be expectedto be inuenced by a wide variety of factors including the nature of readinginstruction the sophistication of preliterate phonological representationsthe overall amount of reading practice relative experience in reading aloudversus silently the computational resources devoted to each pathway andthe readerrsquos more general skill levels in visual pattern recognition and inspoken word comprehension and production Broadly speaking the overallcompetence of the phonological and semantic pathways together would beexpected to improve gradually with reading experience but that factors thataided the development of one pathway would be expected to limit thecompetence of the other

In Plaut and co-workersrsquo (1996) simulation of surface dyslexia this tradingrelation between the semantic and phonological pathways dependsprimarily on two specic parameters The rst is the asymptotic strengthover training of the external correct input provided to phoneme unitstermed semantic strength which corresponds directly to the competence ofthe putative semantic pathway The second parameter is the magnitude ofthe pressure to keep weights in the phonological pathway small termedweight decay Weight decay is standardly used in connectionist modelling toimprove generalisation (ie nonword reading) by preventing the networkfrom developing arbitrarily large weights in learning the training data Withweight decay the magnitude of each weight becomes proportional to itsimportance in reducing training error (Hinton 1989) Weights can still growlarge however if aspects of the task demand it In fact this property iscritical in the current context if the network is to master exception wordsBecause the pronunciations of such words violate standard spellingndashsoundcorrespondences they typically require the involvement of larger weights tooverride the standard mapping Thus the higher the amount of weightdecay the more difcult it is for the network to learn to pronounce exceptionwords particularly those of low frequency In this way the magnitude ofweight decay controls the base level of competence that can be achieved bythe phonological pathway in isolation There are a number of possibleneurophysiological analogues of weight decay that would be expected tovary across individuals For instance if a connection weight corresponds to

MODELS OF WORD READING AND LEXICAL DECISION 779

the number and strength of synapses between neurons then the magnitudeof weight decay can be thought of as the level of metabolic activity that canbe supported for maintaining synapses Low values of weight decaycorrespond to high levels of metabolic activity and thus a high level ofcompetence in the network

Under this view of individual differences as parametric variations of Plautand co-workersrsquo (1996) simulation patients such as DRN and DC maycorrespond to parameter combinations that lead to a high degree ofcompetence in the phonological pathway The goal of the current simulationwas to determine the effects of parametric variations of semantic strengthand weight decay on the severity of surface dyslexia exhibited by networkimplementations of the phonological pathway following semanticimpairment

Method

The method used in the simulation follows that used by Plaut et al (1996simulation 4) and readers are referred to that paper for additional details Afeedforward network was trained to map from orthographic representationsof 2998 monosyllabic words to their phonological representations Theorthographic representations consisted of 108 grapheme units that codesingle- and multi-letter graphemes for the onset vowel and coda of thewritten form of each word The phonological representations werestructured in an analogous fashion over 61 phoneme units Each of thegrapheme units was connected to each of 100 hidden units which in turnwas connected to each phoneme unit Each of the hidden and phoneme unitsalso had a bias weight that determined its ldquorestingrdquo state in the absence ofother input This bias is equivalent to the weight on a connection from a unitwhose state is always equal to 10 and was learned in the same way as otherweights

During training a word was presented to the network by clamping thestates of units representing graphemes contained in the word to 1 and allothers to 0 The hidden and phoneme units then computed their states basedon this input The state sj of each unit j varied between 0 and 1 as a smoothreal-valued function of the summed weighted input from other units

(2)

where wij is the weight on the connection from unit i to unit j bj is the biasweight of j and exp [ is the exponential function This is the standardsigmoid function s [ referred to in the frequencyndashconsistency equation(equation 1)

780 PLAUT

FIG 4 The magnitude of the additional external input supplied to phoneme units by theputative semantic pathway as a function of training epoch for different values of semanticstrength (g in equation 3) for a frequency f 5 533 corresponding to the average Kucera andFrancis (1967) frequency of the Patterson and Hodges (1992) low-frequency words

The summed input to phoneme units was augmented with an additionalterm S corresponding to the frequency-weighted contribution of thesemantic pathway to phonology

(3)

where f is the Kucera and Francis (1967) frequency of the word and tindexes the training epoch (ie number of presentations of the training cor-pus) The parameter g termed semantic strength determined the asymptoticlevel of input from semantics and was varied from 10 to 50 in the currentsimulation Figure 4 shows the magnitude of this input over time for eachvalue of g using f 5 533 the average Kucera and Francis (1967) frequency ofthe Patterson and Hodges (1992) low-frequency words The sign of thisexternal input was set to drive phoneme units towards their correct states

The error E of the network was measured by the sum of two terms (1)the cross-entropy (see Hinton 1989) between the generated phoneme statessj and the correct or target phoneme states tj for the word and (2) aproportion of the sum of the squares of the weights

(4)

where l is the weight decay parameter and was varied between 000005 and

MODELS OF WORD READING AND LEXICAL DECISION 781

1Note that this manipulation is equivalent in the limit of small weight changes to the morenatural procedure of sampling words for training as a function of their frequency

2This level of correct performance was calculated over only the 14 pairings of semanticstrength and weight decay giving rise to non-perfect performance on the low-frequencyexception words (see Fig 5)

0001 in the current simulation Back-propagation (Rumelhart Hinton ampWilliams 1986b) was used to calculate the derivative of this error withrespect to each weight in the network This derivative was scaled by a factorreecting the frequency of occurrence of the word proportional to thesquare-root of its Kucera and Francis (1967) frequency1 Each of the 2998words was presented in turn and their scaled derivatives were accumulatedAt the end of each epoch the weights were changed in proportion to the sumof the current accumulated derivatives and the past weight step (scaled by amomentum parameter of 09) The weight change was scaled both by a globallearning rate of 0001 and by a connection-specic learning rate that wasinitialised to 10 and adapted using the delta-bar-delta procedure (Jacobs1988) with an additive increment of 01 and a multiplicative decrement of09

For each combination of semantic strength and weight decay the networkwas initialised with the same set of random weights (sampled uniformlybetween 6 01) and trained for 2000 epochs At this point it was subjected toa complete semantic ldquolesionrdquo by removing the external input to thephoneme units The network was then tested for its pronunciations of thehigh- and low-frequency regular and exception words from Patterson andHodgesrsquo (1992) ldquosurfacerdquo list and 86 nonwords from Glushko (1979 seePlaut et al 1996 for details on scoring)

Results and Discussion

Figure 5 displays the number of errors to the Patterson and Hodges wordsmade by the network after the elimination of semantics as a function of thesemantic strength (g in equation 3) and weight decay ( l in equation 4) usedduring training These error scores were entered into an ANOVA over itemswith frequency and consistency as between-item factors and semanticstrength and weight decay as within-item factors

As is apparent from Fig 5 the network exhibited a strong frequency 3consistency interaction in accuracy [F(1164) 5 176 P 0001] Moreover866 (174201) of the errors to exception words were regularisations and985 (11861204)2 of presentations of the Glushko nonwords werepronounced correctly Thus overall the network exhibited the hallmarkcharacteristics of surface dyslexia

However as is also apparent from Fig 5 the effects of frequency andconsistency depended strongly on the levels of semantic strength and weight

782

783

FIG 5 Numbers of errors produced by the network to Patterson and Hodgesrsquo (1992) (a)high-frequency regular words (b) low-frequency regular words (c) high-frequency exceptionwords and (d) low-frequency exception words when semantics is removed completely after2000 epochs of training as a function of the semantic strength (g in equation 3) and weightdecay ( l in equation 4) used during training

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 14: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

778 PLAUT

simulations of Plaut et al (1996) demonstrate that the phonological pathwayon its own can if necessary master the entire training corpus includinglow-frequency exception words but that support from the semantic pathwayrelieves it from having to do so Thus the ultimate competence of thephonological pathway is a function both of its own learning propertiesmdashitssensitivity to frequency and consistency as expressed by equation (1)mdashandof the strength of the semantic contribution to phonology Both of thesefactors are matters of degree that would be expected to vary acrossindividuals More generally Plaut and co-workers pointed out that thecompetence and division of labour of the reading system would be expectedto be inuenced by a wide variety of factors including the nature of readinginstruction the sophistication of preliterate phonological representationsthe overall amount of reading practice relative experience in reading aloudversus silently the computational resources devoted to each pathway andthe readerrsquos more general skill levels in visual pattern recognition and inspoken word comprehension and production Broadly speaking the overallcompetence of the phonological and semantic pathways together would beexpected to improve gradually with reading experience but that factors thataided the development of one pathway would be expected to limit thecompetence of the other

In Plaut and co-workersrsquo (1996) simulation of surface dyslexia this tradingrelation between the semantic and phonological pathways dependsprimarily on two specic parameters The rst is the asymptotic strengthover training of the external correct input provided to phoneme unitstermed semantic strength which corresponds directly to the competence ofthe putative semantic pathway The second parameter is the magnitude ofthe pressure to keep weights in the phonological pathway small termedweight decay Weight decay is standardly used in connectionist modelling toimprove generalisation (ie nonword reading) by preventing the networkfrom developing arbitrarily large weights in learning the training data Withweight decay the magnitude of each weight becomes proportional to itsimportance in reducing training error (Hinton 1989) Weights can still growlarge however if aspects of the task demand it In fact this property iscritical in the current context if the network is to master exception wordsBecause the pronunciations of such words violate standard spellingndashsoundcorrespondences they typically require the involvement of larger weights tooverride the standard mapping Thus the higher the amount of weightdecay the more difcult it is for the network to learn to pronounce exceptionwords particularly those of low frequency In this way the magnitude ofweight decay controls the base level of competence that can be achieved bythe phonological pathway in isolation There are a number of possibleneurophysiological analogues of weight decay that would be expected tovary across individuals For instance if a connection weight corresponds to

MODELS OF WORD READING AND LEXICAL DECISION 779

the number and strength of synapses between neurons then the magnitudeof weight decay can be thought of as the level of metabolic activity that canbe supported for maintaining synapses Low values of weight decaycorrespond to high levels of metabolic activity and thus a high level ofcompetence in the network

Under this view of individual differences as parametric variations of Plautand co-workersrsquo (1996) simulation patients such as DRN and DC maycorrespond to parameter combinations that lead to a high degree ofcompetence in the phonological pathway The goal of the current simulationwas to determine the effects of parametric variations of semantic strengthand weight decay on the severity of surface dyslexia exhibited by networkimplementations of the phonological pathway following semanticimpairment

Method

The method used in the simulation follows that used by Plaut et al (1996simulation 4) and readers are referred to that paper for additional details Afeedforward network was trained to map from orthographic representationsof 2998 monosyllabic words to their phonological representations Theorthographic representations consisted of 108 grapheme units that codesingle- and multi-letter graphemes for the onset vowel and coda of thewritten form of each word The phonological representations werestructured in an analogous fashion over 61 phoneme units Each of thegrapheme units was connected to each of 100 hidden units which in turnwas connected to each phoneme unit Each of the hidden and phoneme unitsalso had a bias weight that determined its ldquorestingrdquo state in the absence ofother input This bias is equivalent to the weight on a connection from a unitwhose state is always equal to 10 and was learned in the same way as otherweights

During training a word was presented to the network by clamping thestates of units representing graphemes contained in the word to 1 and allothers to 0 The hidden and phoneme units then computed their states basedon this input The state sj of each unit j varied between 0 and 1 as a smoothreal-valued function of the summed weighted input from other units

(2)

where wij is the weight on the connection from unit i to unit j bj is the biasweight of j and exp [ is the exponential function This is the standardsigmoid function s [ referred to in the frequencyndashconsistency equation(equation 1)

780 PLAUT

FIG 4 The magnitude of the additional external input supplied to phoneme units by theputative semantic pathway as a function of training epoch for different values of semanticstrength (g in equation 3) for a frequency f 5 533 corresponding to the average Kucera andFrancis (1967) frequency of the Patterson and Hodges (1992) low-frequency words

The summed input to phoneme units was augmented with an additionalterm S corresponding to the frequency-weighted contribution of thesemantic pathway to phonology

(3)

where f is the Kucera and Francis (1967) frequency of the word and tindexes the training epoch (ie number of presentations of the training cor-pus) The parameter g termed semantic strength determined the asymptoticlevel of input from semantics and was varied from 10 to 50 in the currentsimulation Figure 4 shows the magnitude of this input over time for eachvalue of g using f 5 533 the average Kucera and Francis (1967) frequency ofthe Patterson and Hodges (1992) low-frequency words The sign of thisexternal input was set to drive phoneme units towards their correct states

The error E of the network was measured by the sum of two terms (1)the cross-entropy (see Hinton 1989) between the generated phoneme statessj and the correct or target phoneme states tj for the word and (2) aproportion of the sum of the squares of the weights

(4)

where l is the weight decay parameter and was varied between 000005 and

MODELS OF WORD READING AND LEXICAL DECISION 781

1Note that this manipulation is equivalent in the limit of small weight changes to the morenatural procedure of sampling words for training as a function of their frequency

2This level of correct performance was calculated over only the 14 pairings of semanticstrength and weight decay giving rise to non-perfect performance on the low-frequencyexception words (see Fig 5)

0001 in the current simulation Back-propagation (Rumelhart Hinton ampWilliams 1986b) was used to calculate the derivative of this error withrespect to each weight in the network This derivative was scaled by a factorreecting the frequency of occurrence of the word proportional to thesquare-root of its Kucera and Francis (1967) frequency1 Each of the 2998words was presented in turn and their scaled derivatives were accumulatedAt the end of each epoch the weights were changed in proportion to the sumof the current accumulated derivatives and the past weight step (scaled by amomentum parameter of 09) The weight change was scaled both by a globallearning rate of 0001 and by a connection-specic learning rate that wasinitialised to 10 and adapted using the delta-bar-delta procedure (Jacobs1988) with an additive increment of 01 and a multiplicative decrement of09

For each combination of semantic strength and weight decay the networkwas initialised with the same set of random weights (sampled uniformlybetween 6 01) and trained for 2000 epochs At this point it was subjected toa complete semantic ldquolesionrdquo by removing the external input to thephoneme units The network was then tested for its pronunciations of thehigh- and low-frequency regular and exception words from Patterson andHodgesrsquo (1992) ldquosurfacerdquo list and 86 nonwords from Glushko (1979 seePlaut et al 1996 for details on scoring)

Results and Discussion

Figure 5 displays the number of errors to the Patterson and Hodges wordsmade by the network after the elimination of semantics as a function of thesemantic strength (g in equation 3) and weight decay ( l in equation 4) usedduring training These error scores were entered into an ANOVA over itemswith frequency and consistency as between-item factors and semanticstrength and weight decay as within-item factors

As is apparent from Fig 5 the network exhibited a strong frequency 3consistency interaction in accuracy [F(1164) 5 176 P 0001] Moreover866 (174201) of the errors to exception words were regularisations and985 (11861204)2 of presentations of the Glushko nonwords werepronounced correctly Thus overall the network exhibited the hallmarkcharacteristics of surface dyslexia

However as is also apparent from Fig 5 the effects of frequency andconsistency depended strongly on the levels of semantic strength and weight

782

783

FIG 5 Numbers of errors produced by the network to Patterson and Hodgesrsquo (1992) (a)high-frequency regular words (b) low-frequency regular words (c) high-frequency exceptionwords and (d) low-frequency exception words when semantics is removed completely after2000 epochs of training as a function of the semantic strength (g in equation 3) and weightdecay ( l in equation 4) used during training

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 15: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

MODELS OF WORD READING AND LEXICAL DECISION 779

the number and strength of synapses between neurons then the magnitudeof weight decay can be thought of as the level of metabolic activity that canbe supported for maintaining synapses Low values of weight decaycorrespond to high levels of metabolic activity and thus a high level ofcompetence in the network

Under this view of individual differences as parametric variations of Plautand co-workersrsquo (1996) simulation patients such as DRN and DC maycorrespond to parameter combinations that lead to a high degree ofcompetence in the phonological pathway The goal of the current simulationwas to determine the effects of parametric variations of semantic strengthand weight decay on the severity of surface dyslexia exhibited by networkimplementations of the phonological pathway following semanticimpairment

Method

The method used in the simulation follows that used by Plaut et al (1996simulation 4) and readers are referred to that paper for additional details Afeedforward network was trained to map from orthographic representationsof 2998 monosyllabic words to their phonological representations Theorthographic representations consisted of 108 grapheme units that codesingle- and multi-letter graphemes for the onset vowel and coda of thewritten form of each word The phonological representations werestructured in an analogous fashion over 61 phoneme units Each of thegrapheme units was connected to each of 100 hidden units which in turnwas connected to each phoneme unit Each of the hidden and phoneme unitsalso had a bias weight that determined its ldquorestingrdquo state in the absence ofother input This bias is equivalent to the weight on a connection from a unitwhose state is always equal to 10 and was learned in the same way as otherweights

During training a word was presented to the network by clamping thestates of units representing graphemes contained in the word to 1 and allothers to 0 The hidden and phoneme units then computed their states basedon this input The state sj of each unit j varied between 0 and 1 as a smoothreal-valued function of the summed weighted input from other units

(2)

where wij is the weight on the connection from unit i to unit j bj is the biasweight of j and exp [ is the exponential function This is the standardsigmoid function s [ referred to in the frequencyndashconsistency equation(equation 1)

780 PLAUT

FIG 4 The magnitude of the additional external input supplied to phoneme units by theputative semantic pathway as a function of training epoch for different values of semanticstrength (g in equation 3) for a frequency f 5 533 corresponding to the average Kucera andFrancis (1967) frequency of the Patterson and Hodges (1992) low-frequency words

The summed input to phoneme units was augmented with an additionalterm S corresponding to the frequency-weighted contribution of thesemantic pathway to phonology

(3)

where f is the Kucera and Francis (1967) frequency of the word and tindexes the training epoch (ie number of presentations of the training cor-pus) The parameter g termed semantic strength determined the asymptoticlevel of input from semantics and was varied from 10 to 50 in the currentsimulation Figure 4 shows the magnitude of this input over time for eachvalue of g using f 5 533 the average Kucera and Francis (1967) frequency ofthe Patterson and Hodges (1992) low-frequency words The sign of thisexternal input was set to drive phoneme units towards their correct states

The error E of the network was measured by the sum of two terms (1)the cross-entropy (see Hinton 1989) between the generated phoneme statessj and the correct or target phoneme states tj for the word and (2) aproportion of the sum of the squares of the weights

(4)

where l is the weight decay parameter and was varied between 000005 and

MODELS OF WORD READING AND LEXICAL DECISION 781

1Note that this manipulation is equivalent in the limit of small weight changes to the morenatural procedure of sampling words for training as a function of their frequency

2This level of correct performance was calculated over only the 14 pairings of semanticstrength and weight decay giving rise to non-perfect performance on the low-frequencyexception words (see Fig 5)

0001 in the current simulation Back-propagation (Rumelhart Hinton ampWilliams 1986b) was used to calculate the derivative of this error withrespect to each weight in the network This derivative was scaled by a factorreecting the frequency of occurrence of the word proportional to thesquare-root of its Kucera and Francis (1967) frequency1 Each of the 2998words was presented in turn and their scaled derivatives were accumulatedAt the end of each epoch the weights were changed in proportion to the sumof the current accumulated derivatives and the past weight step (scaled by amomentum parameter of 09) The weight change was scaled both by a globallearning rate of 0001 and by a connection-specic learning rate that wasinitialised to 10 and adapted using the delta-bar-delta procedure (Jacobs1988) with an additive increment of 01 and a multiplicative decrement of09

For each combination of semantic strength and weight decay the networkwas initialised with the same set of random weights (sampled uniformlybetween 6 01) and trained for 2000 epochs At this point it was subjected toa complete semantic ldquolesionrdquo by removing the external input to thephoneme units The network was then tested for its pronunciations of thehigh- and low-frequency regular and exception words from Patterson andHodgesrsquo (1992) ldquosurfacerdquo list and 86 nonwords from Glushko (1979 seePlaut et al 1996 for details on scoring)

Results and Discussion

Figure 5 displays the number of errors to the Patterson and Hodges wordsmade by the network after the elimination of semantics as a function of thesemantic strength (g in equation 3) and weight decay ( l in equation 4) usedduring training These error scores were entered into an ANOVA over itemswith frequency and consistency as between-item factors and semanticstrength and weight decay as within-item factors

As is apparent from Fig 5 the network exhibited a strong frequency 3consistency interaction in accuracy [F(1164) 5 176 P 0001] Moreover866 (174201) of the errors to exception words were regularisations and985 (11861204)2 of presentations of the Glushko nonwords werepronounced correctly Thus overall the network exhibited the hallmarkcharacteristics of surface dyslexia

However as is also apparent from Fig 5 the effects of frequency andconsistency depended strongly on the levels of semantic strength and weight

782

783

FIG 5 Numbers of errors produced by the network to Patterson and Hodgesrsquo (1992) (a)high-frequency regular words (b) low-frequency regular words (c) high-frequency exceptionwords and (d) low-frequency exception words when semantics is removed completely after2000 epochs of training as a function of the semantic strength (g in equation 3) and weightdecay ( l in equation 4) used during training

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 16: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

780 PLAUT

FIG 4 The magnitude of the additional external input supplied to phoneme units by theputative semantic pathway as a function of training epoch for different values of semanticstrength (g in equation 3) for a frequency f 5 533 corresponding to the average Kucera andFrancis (1967) frequency of the Patterson and Hodges (1992) low-frequency words

The summed input to phoneme units was augmented with an additionalterm S corresponding to the frequency-weighted contribution of thesemantic pathway to phonology

(3)

where f is the Kucera and Francis (1967) frequency of the word and tindexes the training epoch (ie number of presentations of the training cor-pus) The parameter g termed semantic strength determined the asymptoticlevel of input from semantics and was varied from 10 to 50 in the currentsimulation Figure 4 shows the magnitude of this input over time for eachvalue of g using f 5 533 the average Kucera and Francis (1967) frequency ofthe Patterson and Hodges (1992) low-frequency words The sign of thisexternal input was set to drive phoneme units towards their correct states

The error E of the network was measured by the sum of two terms (1)the cross-entropy (see Hinton 1989) between the generated phoneme statessj and the correct or target phoneme states tj for the word and (2) aproportion of the sum of the squares of the weights

(4)

where l is the weight decay parameter and was varied between 000005 and

MODELS OF WORD READING AND LEXICAL DECISION 781

1Note that this manipulation is equivalent in the limit of small weight changes to the morenatural procedure of sampling words for training as a function of their frequency

2This level of correct performance was calculated over only the 14 pairings of semanticstrength and weight decay giving rise to non-perfect performance on the low-frequencyexception words (see Fig 5)

0001 in the current simulation Back-propagation (Rumelhart Hinton ampWilliams 1986b) was used to calculate the derivative of this error withrespect to each weight in the network This derivative was scaled by a factorreecting the frequency of occurrence of the word proportional to thesquare-root of its Kucera and Francis (1967) frequency1 Each of the 2998words was presented in turn and their scaled derivatives were accumulatedAt the end of each epoch the weights were changed in proportion to the sumof the current accumulated derivatives and the past weight step (scaled by amomentum parameter of 09) The weight change was scaled both by a globallearning rate of 0001 and by a connection-specic learning rate that wasinitialised to 10 and adapted using the delta-bar-delta procedure (Jacobs1988) with an additive increment of 01 and a multiplicative decrement of09

For each combination of semantic strength and weight decay the networkwas initialised with the same set of random weights (sampled uniformlybetween 6 01) and trained for 2000 epochs At this point it was subjected toa complete semantic ldquolesionrdquo by removing the external input to thephoneme units The network was then tested for its pronunciations of thehigh- and low-frequency regular and exception words from Patterson andHodgesrsquo (1992) ldquosurfacerdquo list and 86 nonwords from Glushko (1979 seePlaut et al 1996 for details on scoring)

Results and Discussion

Figure 5 displays the number of errors to the Patterson and Hodges wordsmade by the network after the elimination of semantics as a function of thesemantic strength (g in equation 3) and weight decay ( l in equation 4) usedduring training These error scores were entered into an ANOVA over itemswith frequency and consistency as between-item factors and semanticstrength and weight decay as within-item factors

As is apparent from Fig 5 the network exhibited a strong frequency 3consistency interaction in accuracy [F(1164) 5 176 P 0001] Moreover866 (174201) of the errors to exception words were regularisations and985 (11861204)2 of presentations of the Glushko nonwords werepronounced correctly Thus overall the network exhibited the hallmarkcharacteristics of surface dyslexia

However as is also apparent from Fig 5 the effects of frequency andconsistency depended strongly on the levels of semantic strength and weight

782

783

FIG 5 Numbers of errors produced by the network to Patterson and Hodgesrsquo (1992) (a)high-frequency regular words (b) low-frequency regular words (c) high-frequency exceptionwords and (d) low-frequency exception words when semantics is removed completely after2000 epochs of training as a function of the semantic strength (g in equation 3) and weightdecay ( l in equation 4) used during training

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 17: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

MODELS OF WORD READING AND LEXICAL DECISION 781

1Note that this manipulation is equivalent in the limit of small weight changes to the morenatural procedure of sampling words for training as a function of their frequency

2This level of correct performance was calculated over only the 14 pairings of semanticstrength and weight decay giving rise to non-perfect performance on the low-frequencyexception words (see Fig 5)

0001 in the current simulation Back-propagation (Rumelhart Hinton ampWilliams 1986b) was used to calculate the derivative of this error withrespect to each weight in the network This derivative was scaled by a factorreecting the frequency of occurrence of the word proportional to thesquare-root of its Kucera and Francis (1967) frequency1 Each of the 2998words was presented in turn and their scaled derivatives were accumulatedAt the end of each epoch the weights were changed in proportion to the sumof the current accumulated derivatives and the past weight step (scaled by amomentum parameter of 09) The weight change was scaled both by a globallearning rate of 0001 and by a connection-specic learning rate that wasinitialised to 10 and adapted using the delta-bar-delta procedure (Jacobs1988) with an additive increment of 01 and a multiplicative decrement of09

For each combination of semantic strength and weight decay the networkwas initialised with the same set of random weights (sampled uniformlybetween 6 01) and trained for 2000 epochs At this point it was subjected toa complete semantic ldquolesionrdquo by removing the external input to thephoneme units The network was then tested for its pronunciations of thehigh- and low-frequency regular and exception words from Patterson andHodgesrsquo (1992) ldquosurfacerdquo list and 86 nonwords from Glushko (1979 seePlaut et al 1996 for details on scoring)

Results and Discussion

Figure 5 displays the number of errors to the Patterson and Hodges wordsmade by the network after the elimination of semantics as a function of thesemantic strength (g in equation 3) and weight decay ( l in equation 4) usedduring training These error scores were entered into an ANOVA over itemswith frequency and consistency as between-item factors and semanticstrength and weight decay as within-item factors

As is apparent from Fig 5 the network exhibited a strong frequency 3consistency interaction in accuracy [F(1164) 5 176 P 0001] Moreover866 (174201) of the errors to exception words were regularisations and985 (11861204)2 of presentations of the Glushko nonwords werepronounced correctly Thus overall the network exhibited the hallmarkcharacteristics of surface dyslexia

However as is also apparent from Fig 5 the effects of frequency andconsistency depended strongly on the levels of semantic strength and weight

782

783

FIG 5 Numbers of errors produced by the network to Patterson and Hodgesrsquo (1992) (a)high-frequency regular words (b) low-frequency regular words (c) high-frequency exceptionwords and (d) low-frequency exception words when semantics is removed completely after2000 epochs of training as a function of the semantic strength (g in equation 3) and weightdecay ( l in equation 4) used during training

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 18: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

782

783

FIG 5 Numbers of errors produced by the network to Patterson and Hodgesrsquo (1992) (a)high-frequency regular words (b) low-frequency regular words (c) high-frequency exceptionwords and (d) low-frequency exception words when semantics is removed completely after2000 epochs of training as a function of the semantic strength (g in equation 3) and weightdecay ( l in equation 4) used during training

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 19: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

783

FIG 5 Numbers of errors produced by the network to Patterson and Hodgesrsquo (1992) (a)high-frequency regular words (b) low-frequency regular words (c) high-frequency exceptionwords and (d) low-frequency exception words when semantics is removed completely after2000 epochs of training as a function of the semantic strength (g in equation 3) and weightdecay ( l in equation 4) used during training

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 20: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

784 PLAUT

decay In fact there was a four-way interaction among these factors[F(162624) 5 425 P 0001] In general accuracy on low-frequencyexception words improves if either semantic strength or weight decay isreduced reaching perfect or near-perfect levels if either of these factors isextreme or if both are moderate (eg g 5 30 and l 5 00002)

Thus patients like DRN and DC who have semantic impairments butno surface dyslexia can be understood as falling at one end of a continuumof performance levels that would be expected under different divisions oflabour within the reading system Note that this perspective provides anaccount of how these two patients can exhibit qualitatively similarperformance even though their educational backgroundsmdashand hencepresumably their reading experiencemdashdiffer so dramatically DRN witha high degree of education may have a highly developed phonologicalpathway corresponding to low levels of weight decay (eg l 5 000005)Even with a strong semantic pathway (eg g 5 50) complete semanticdamage yields 905 correct (3842) on low-frequency exception wordswhich is comparable to DRNrsquos performance of 95 on such words Bycontrast DC with little formal education may have developed semanticand phonological pathways that are both weak (eg g 5 10 and l 5 0001)This combination also yields 905 correct (3842) on low-frequencyexception words which is comparable to the 95 correct performance ofDC

It should be kept in mind that the current demonstration involves thesimplifying assumptions that the patientsrsquo brain damage completelyeliminates the contribution of the semantic pathway and completely sparesthe phonological pathway In general the networkrsquos performance improveswith partial sparing of the semantic pathway (see Fig 3a) and degrades withpartial impairment of the phonological pathway (see Plaut et al 1996 g 20and table 9 pp 94ndash95) Thus the performance of individual patients mayalso reect a balance of these two factors In particular partial phonologicalpathway damage would seem to be implicated in patients in the later stagesof semantic dementia Such patients become impaired at pronouncingregular words as well as exception words although an advantage for regularwords remains (see Patterson et al 1996) Interestingly apart from visualerrors (which occur equally often to regular and exception words) errors onregular words are predominantly what Patterson terms LARC errorsmdashorLegitimate Alternative Reading of Components The classic LARC error isa regularisation of an exception word but such errors also occur to regularwords with inconsistent bodies As examples (from Patterson amp Hodges1992) PB named HO OT to rhyme with ldquofootrdquo YEAST like ldquobreastrdquo HEAR likeldquobearrdquo FM pronounced BROWN like ldquoblownrdquo HEAT like ldquothreatrdquo COST likeldquopostrdquo etc Thus although errors to regular and exception words differ inquantity they do not seem to differ in nature suggesting that they have a

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 21: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

MODELS OF WORD READING AND LEXICAL DECISION 785

common cause (see Patterson et al 1996 for further discussion) On thePlaut et al (1996) account LARC errors to both regular and exceptionwords derive from the sensitivity of learning in the phonological pathwayto graded degrees of frequency and consistency as reected in equation(1)

In summary the patterns of performance of patients like DRN and DCindicate that low-frequency exception word reading does not always dependon semantic support Nonetheless the current simulation demonstrates thatsuch patients fall within the distribution of performance patterns expectedfrom parametric variations within a system in which the division of labourand mutual support of the semantic and phonological pathways play acritical role Thus the role of semantics in skilled reading and the effects ofsemantic impairment on dyslexic reading remain of central importance inunderstanding the organisation and function of the reading system At amore general level the simulation results also underscore the importance ofconsidering not just the performance of individual patients but also the fulldistribution of performance patterns that can arise following brain damage(see Plaut 1995a for additional results and discussion)

We now turn to issues in normal reading that have seemed problematic forthe distributed connectionist accountmdashspecically how skilled readersdistinguish words from nonwords and how this process is inuenced by thenature of the stimuli in the task Just as with the performance ofbrain-damaged patients a consideration of the contribution of semanticswill prove crucial in understanding the performance of normal subjects

SIMULATION 2 LEXICAL DECISION

As mentioned in the Introduction the distinction between words andnonwords is of fundamental importance to the lexical system and manyresearchers incorporate this distinction into their models by representingwords as explicit structural entities such as logogens or localist word unitsSuch units provide a natural account of how skilled readers can accuratelydistinguish written words from nonwords in lexical decision (LD) tasksGiven that the current distributed connectionist approach to lexicalprocessing does not contain word-specic representations it becomesimportant to establish that distributed models can in fact perform lexicaldecision accurately and that in doing so they are inuenced by properties ofthe words and nonwords in the same way as human readers

Seidenberg and McClelland (1989) attempted to demonstrate thatdistributed representations can provide a sufcient basis for lexical decisionThey assumed that subjects make wordnonword decisions based on somemeasure of the ldquofamiliarityrdquo of the stimulus (Atkinson amp Juola 1973 Balota

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 22: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

786 PLAUT

amp Chumbley 1984) This familiarity measure could potentially be computedfrom a variety of types of information derived by the lexical systemmdashorthographic phonological and semantic Subjects are assumed to adopt aspecic strategy and decision criterion that allows fast responding withacceptable error rates depending on the composition of the word andnonword stimuli Given that orthographic information is provided directlyin the input subjects rely on orthographic familiarity when this provides asufcient basis for distinguishing words from nonwordsmdashfor example if thenonwords are orthographically illegal consonant strings (eg PSLR) If thenonword stimuli include very word-like orthographically legal nonwords(eg NUST) subjects may consider phonological familiaritymdashwhether thestimulus sounds like a word If the nonwords include so-calledldquopseudohomophonesrdquo (eg BRANE) that have the same pronunciation as aword subjects may have to base their decision on whether the orthographygenerates a familiar meaning

Although Seidenberg and McClellandrsquos general framework includessemantics their specic implemented network did not Accordingly theyfocused on demonstrating that under some conditions lexical decision canbe performed by a distributed network using only orthographic andorphonological information In their simulation the orthographic familiarityof a stimulus was measured by the discrepancy between the orthographicpattern presented to the network and the one regenerated by the networkfrom its hidden representation Seidenberg and McClelland showed that thisorthographic error score tended to be smaller for trained stimuli (words)than for novel stimuli (nonwords) although there was some overlap in thedistributions Moreover the degree of overlap depended on properties ofthe words and nonwords in a way that corresponded to the effects of theseproperties on LD accuracy and latency in empirical studies For example theoverlap is increased and hence lexical decision is slower and less accuratewhen the words are of lower frequency (eg Gordon 1983) or includeso-called ldquostrangerdquo words with unusual spelling patterns (eg AISLE G AUG EWaters amp Seidenberg 1985) or when the nonwords include pseudo-homophones (eg Coltheart Davelaar Jonasson amp Besner 1977 McCannBesner amp Davelaar 1988) Seidenberg and McClelland assumed that as theoverlap in orthographic error scores for words and nonwords increases thenetwork (and subjects) would have to turn to phonological andor semanticinformation to perform lexical decision accurately In particular thepresence of strange words causes subjects to rely on phonologicalinformation giving rise to effects of spellingndashsound consistency in lexicaldecision (Waters amp Seidenberg 1985)

However Besner and colleagues (Besner et al 1990 Fera amp Besner1992) challenged the claim that Seidenberg and McClellandrsquos modelprovides an adequate account of LD performance even under conditions in

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 23: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

MODELS OF WORD READING AND LEXICAL DECISION 787

which orthographic information is deemed sufcient (ie in the absence ofstrange words) Besner et al (1990) reported that when a decision criterionover orthographic error scores is set to yield an error rate on words of 52(as observed by Waters amp Seidenberg 1985 experiment 2) the false-positive rate on nonwords exceeded 28 Although Waters and Seidenbergdid not report nonword error rates this value is certainly higher than wouldbe expected of subjects especially if given unlimited time to respondMoreover Fera and Besner (1992) demonstrated that for human subjectsthe degree of overlap in orthographic error scores for words and nonwordshad no effect on LD accuracy and latency nor on the magnitude of apseudohomophone effect among nonwords These negative ndingsconcerning Seidenberg and McClellandrsquos implemented model and itspredictions call into question the more general claim that words can bedistinguished from nonwords by a distributed system lacking word-specicrepresentations

The goal of the current simulation was to demonstrate that lexical decisioncan be performed accurately when based on a familiarity measure applied tosemantics and that moreover such a measure exhibits the appropriatesensitivity to properties of the words and nonwords in the task It isimportant to be clear at the outset that this demonstration should not beinterpreted as implying that lexical decision always relies on semantics Theperspective taken is exactly that of Seidenberg and McClelland (1989)mdashsubjects performing lexical decision may adopt a variety of strategiesincluding those that depend only on orthographic andor phonologicalinformation as a function of the composition of the stimuli In fact subjectsare assumed to rely on these more surface representations as much aspossible It is acknowledged however that in some contexts orthographicand phonological representations will not support sufciently accurate LDperformance on their own The current work aimed to establish thatsemantic familiarity provides a sufcient basis for lexical decisionwithin a distributed lexical system and supports the hypothesis thatsubjects can perform lexical decision accurately by relying on semanticswhen necessary

Method

A feedforward network was trained to map from the orthographicrepresentations of the 2998 monosyllabic words in Plaut et alrsquos (1996)corpus to their phonological representations and to newly created semanticrepresentations The architecture of the network shown in Fig 6 cor-responded to an instantiation of the full framework for lexical processingin Fig 1 except that the depicted hidden units between orthography and

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 24: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

788 PLAUT

3This alteration was introduced only to reduce the time required to train the networkmdashit isnot intended as an important theoretical claim about the organisation of the semantic pathwayIn fact a network with two separate sets of 500 hidden units was developed subsequently andreplicated the current results

4Note that the portion of the network that maps to semantics still has fewer hidden units (n 51000) than words on which it is trained (n 5 2998) so the network cannot solve the task bydevoting a single hidden unit to each word (ie by developing word-specic representations) Infact networks with sigmoidal units and trained with error-correcting procedures likeback-propagation do not develop this ldquolocalistrdquo solution even when given far more hidden unitsthan training examples

FIG 6 The network architecture used to model naming and lexical decision Large arrowsrepresent full connectivity between layers except where indicated Small arrows indicate inputunits (incoming arrow) or output units (outgoing arrow)

semantics and between phonology and semantics were combined3

Specically 108 grapheme units were connected to 100 hidden units whichin turn were connected to 61 phoneme units The grapheme and phonemeunits were also connected to a second group of 1000 hidden units whichwere then connected to 200 semantic units such that each hidden unit had aprobability of 05 of being connected to each semantic unit A much largernumber of hidden units was needed to map to semantics than to map fromorthography to phonology because there is no systematicity between thesurface forms of words and their meanings and connectionist networks ndunsystematic mappings particularly difcult to learn (see HintonMcClelland amp Rumelhart 1986 for discussion)4 Including bias weights thenetwork contained a total of 283998 connections

The fact that the network has a feedforward architecture should not beinterpreted as a theoretical claim about the structure of the human readingsystem The underlying theory incorporates interactivity and its use informing attractors as an important computational principle Thus thecurrent feedforward network should be thought of as an approximation to a

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 25: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

MODELS OF WORD READING AND LEXICAL DECISION 789

5Training the feedforward network took just under 10 days of CPU time on a Silicon GraphicsR4400 (150 MHz) processor Based on prior experience training a fully recurrent version wouldhave taken approximately 20 times this amount of processing or about 6 months

fully recurrent one Training a recurrent version of the network washowever infeasible due to limitations in the available computationalresources5 Nonetheless there are certain aspects of the feedforwardapproximation that are important to point out In particular processing isunidirectional from phonology to semantics so that phonologicalrepresentations (derived from orthography) inuence semantics inperforming lexical decision but semantic representations cannot inuencephonology in naming As a result unlike the rst simulation the namingperformance of the current network does not reect a contribution ofsemantics and thus constitutes only a coarse approximation of humanperformance Even so Plaut et al (1996) demonstrated that with regard tonormal performance networks trained without semantics produce patternsof accuracy and latency results that are very similar to those of networkstrained with semantics

The semantic representations for words were designed to capture only themost abstract characteristics of word meaningsmdashnamely that they clusterinto categories and are arbitrarily related to orthographic and phonologicalrepresentations They were created in the following way First 120 randomprototype patterns were generated over 200 semantic features such that eachfeature had a probability Pa 5 01 of being active Then each prototype wasused to generate 25 exemplars that cluster around it by regenerating eachsemantic feature (using Pa 5 01) with a probability Pr 5 005 under theconstraint that each exemplar had to differ from every other exemplar by atleast three features This procedure created 3000 semantic exemplars eachwith an average number of active features of 2012 (SD 1712 range 14ndash26the mean is slightly greater than 20 due to the constraint on the minimumdifference between semantic patterns) Fifteen of these exemplars werechosen at random and discarded and the remaining 2985 patterns wereassigned randomly to the words in the training corpus (to avoid thedifculties involved in learning one-to-many mappings from orthography-to-semantics the 13 pairs of homographs in the corpus such as W IN D andREAD were assigned only one meaning) The orthographic and phonologicalrepresentations for the words were the same as used in Simulation 1 and byPlaut et al (1996)

Although the semantic patterns used in the simulation do not reect therelative similarities among the actual meanings of these words their randomassignment to words ensures that there is no systematic relationship betweenthe written and spoken form of each word and its meaning On the currentapproach it is only this property that is critical for demonstrating that

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 26: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

790 PLAUT

semantics can support lexical decision In fact abstract semanticrepresentations like those used in the current simulation have been usedsuccessfully to model empirical phenomena in a number of psycholinguisticdomains including word learning (Chauvin 1988) inectional morphology(Cottrell amp Plunkett 1991 Hoeffner 1992) lexical ambiguity resolution(Joordens amp Besner 1994 Kawamoto 1988 1993 Kawamoto Farrar ampKello 1994) semantic and associative priming (Masson 1995 Plaut 1995b)and rehabilitation of impaired reading via meaning (Plaut 1996)

Given that the number of incoming connections to units varied widely inparts of the network some sets of connections were initialised differentlythan others Specically weights in the network were initialised to randomvalues drawn uniformly between 6 1 except that hidden-to-phonemeweights were initialised between 6 02 hidden-to-semantics weights wereinitialised between 6 01 and the bias weights for phoneme and semanticunits were pre-set to 2 294444 (the magnitude of input producing an initialstate of 005) These conditions ensured effective learning at the very outsetof training but are otherwise irrelevant to the results reported below

The network was trained with back-propagation using a learning rate of00001 momentum of 09 (set to 00 for the rst 10 epochs) adaptiveconnection-specic learning rates (Jacobs 1988 increment of 01decrement of 09) and the cross-entropy error function without weightdecay (see equation 4) During training each presentation of a wordgenerated both a phonological pattern and a semantic pattern as output Theerror computed over the phoneme units was back-propagated to thegrapheme units and used to calculate derivatives for weights in theorthography-to-phonology pathway exactly as in Simulation 1 The errorcomputed over the semantic units was back-propagated both to thegrapheme units and to the phoneme units and used to calculate derivativesfor the orthography-to-semantics and phonology-to-semantics weightsSemantic error was not back-propagated through the phoneme units toinuence the derivatives for the orthography-to-phonology weightshowever to prevent the network from trading off phonological accuracy forsemantic accuracy These error derivatives were scaled by a factorproportional to the logarithm of the wordrsquos Kucera and Francis (1967)frequency and accumulated for each word before being used to change theweights at the end of each epoch

After 1300 epochs of training the network was tested for its performanceboth at naming and at lexical decision using the words in the training corpusand two lists of nonwords The rst list consists of 591 pronounceablenonwords created by Seidenberg et al (1994) from each unique body inSeidenberg and McClellandrsquos (1989) corpus and is referred to below as theldquobody-matchedrdquo nonword list It was used by Seidenberg and colleagues tocompare the nonword reading performance of 24 human subjects with the

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 27: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

MODELS OF WORD READING AND LEXICAL DECISION 791

performance of Plaut and McClellandrsquos (1993) network (a precursor to thesimulations of Plaut et al 1996) and the performance of the pronunciationrules in Coltheart and co-workersrsquo (1993) Dual Route Cascaded model Thislist was chosen because its size and diversity allows a thorough evaluation ofoverall LD accuracy The second list of nonwords (Seidenberg PetersenMacDonald amp Plaut 1996) contains 64 pseudohomophones and 64nonpseudohomophone control nonwords and is referred to below as theldquoPHnonPHrdquo nonword list The two sets of items are very closely matchedorthographically because they were constructed in groups of four byexchanging onsets and bodies (eg pseudohomophones JOAK and HOAPnonpseudohomophones HOAK and JOAP) Thus these nonwords provide astringent test for the existence of pseudohomophone effects in the absenceof orthographic confounds (see Seidenberg et al 1996 for discussion)

Lexical decision was based on a measure of the familiarity of the semanticpattern generated by a word or nonword A commonly used measure offamiliarity in distributed networks is the negative of the ldquoenergyrdquo S i jsisjwij

(Hopeld 1982) and a number of researchers (eg Besner amp Joordens1995 Borowsky amp Masson 1996 Joordens amp Becker 1997 Masson ampBorowsky 1995 Rueckl 1995) have proposed recently that it may bepossible to perform lexical decision in distributed models on the basis ofdifferences in the energy for words versus nonwords A serious drawback ofthis measure however is that it requires decision processes to have explicitaccess to the weights between units (analogous to the size and number ofsynapses between neurons) which is far less neurobiologically plausiblethan a procedure that need only access unit states

An appropriate alternative measure termed stress is based only on thestates of units Specically the stress Sj of unit j is a measure of theinformation content (entropy) of its state sj corresponding to the degree towhich it differs from rest

(5)

The stress of a unit is 0 when its state is 05 and approaches 1 as its stateapproaches either 0 or 1 The target semantic patterns for words are binaryand thus have maximal stress Because over the course of training thesemantic patterns generated by words increasingly approximate their targetpatterns the average stress of semantic units approaches 1 for words Bycontrast nonwords are novel stimuli that share graphemes with words thathave conicting semantic features As a result nonwords will typically fail todrive semantic units as strongly as words do producing semantic patternswith much lower average stress Accordingly the average stress of semanticunits here termed simply semantic stress should provide an adequate basisfor performing lexical decision We assume that LD responses are actuallygenerated by a stochastic decision process (eg Ratcliff 1978 Usher

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 28: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

792 PLAUT

amp McClelland 1995) in which a decision criterion is adopted such thatstimuli with stress values farther from this criterion are responded to morequickly

Results and Discussion

Naming After the 1300 epochs of training the network pronounced allof the 2998 words in the training corpus correctly (where homographs wereconsidered correct if they elicited either correct pronunciation) Moreoverthe cross-entropy error the network produced when tested on Patterson andHodgesrsquo (1992) high- and low-frequency regular and exception wordsreplicated the standard empirical nding of a frequency 3 consistencyinteraction in naming latency [means high-frequency regular 5 00022low-frequency regular 5 00051 high-frequency exception 5 00051low-frequency exception 5 00203 F(1164) 5 2056 P 0001]

With regard to nonword reading 914 (540591) of body-matchednonwords were given a pronunciation that either matched the pronunciationgiven by at least one of Seidenberg and co-workersrsquo (1994) 24 humansubjects or was consistent with the pronunciation of a word in the trainingcorpus with the same body For the PHnonPH nonwords the networkrsquospronunciations of 969 (6264) of the pseudohomophones and 953(6164) of the nonpseudohomophones matched that of some word in thetraining corpus with the same body

Thus overall the network exhibited the appropriate pattern of skilledperformance in naming words and nonwords

Lexical Decision Semantic stress values were calculated using equation(5) for the body-matched nonwords and for sets of high- medium- andlow-frequency words The words sets consisted of the 600 words in thetraining corpus with the highest Kucera and Francis (1967) frequency (mean6534 median 158) the 600 with median frequency (mean 8807 median 9)and the 600 with the lowest frequency (mean 06298 median 1) Figure 7displays the distributions of semantic stress for these words and nonwords

As Fig 7 shows there is very little overlap between the semantic stressvalues for words and those for nonwords In fact if a decision criterion isadopted such that a stimulus is accepted as a word if it generates semanticstress in excess of 0955 then the network produces error rates of 15 onboth words (271800) and nonwords (9591) and a d 9 value of 433Considering only low-frequency words a criterion of 095 yields 25 misses(15600) and 44 false-alarms (26591) and a d 9 of 367 For high-frequencywords a criterion of 0965 yields 0167 misses (1600) and false-alarms(1591) and a d 9 of 587 These very high levels of discriminability betweenwords and nonwords should be interpreted as reecting asymptotic

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 29: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

MODELS OF WORD READING AND LEXICAL DECISION 793

FIG 7 Distributions of semantic stress values for the body-matched nonwords (n 5 591Seidenberg et al 1994) and for high- medium- and low-frequency words from the trainingcorpus (n 5 600 for each)

performance in the absence of time pressure We assume that subjects cantrade accuracy for reduced latency under conditions in which they areencouraged to respond as quickly as possible while keeping error ratesacceptably low (eg by adjusting the response threshold in Ratcliffrsquos 1978diffusion model)

Moreover Fig 7 illustrates that the distributions of stress values for wordsvary systematically as a function of their frequency Specically the overlapwith nonwords increases as word frequency decreases This propertyprovides an account of the frequency blocking effect in lexical decisionGordon (1983 see also Glanzer amp Ehrenreich 1979) compared LD latencyto high- medium- and low-frequency words when presented in mixed blocksversus when blocked by frequency The blocking manipulation did notinuence the error rates for words in each frequency band nor the latenciesto low-frequency words By contrast LD latencies to medium- andhigh-frequency words were faster (by 19 and 40 msec respectively) whenpresented in pure blocks than when mixed with low-frequency words Thesendings make sense in the context of the distributions of semantic stressvalues shown in Fig 7 if subjects adjust their decision criterion to optimisetheir performance within each block given the composition of the stimuli Aconservative criterion is required to achieve acceptable levels of accuracywhen low-frequency words are present in the block (either mixed or pure)However when only higher-frequency words are present a more aggressivecriterion can be adopted that for the same error rate produces fasterresponding

The network also produced reliably higher semantic stress values for thepseudohomophones in the PHnonPH list compared with their non-

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 30: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

794 PLAUT

pseudohomophone control nonwords [means PH 5 09246 nonPH 509184 paired t(63) 5 2408 P 5 0019] The network tends to producegreater semantic stress for a pseudohomophone (eg H OAP) compared with acontrol nonword (eg JOAP) because the pronunciation it derives for thepseudohomophone (H OPE) was trained to help drive semantic units toextreme valuemdashspecically the (binary) semantic representation of thebase word Given that higher stress values for nonwords decrease theirdiscriminability from words this result corresponds to the empirical ndingof increased latency andor error rates to pseudohomophones comparedwith control nonwords in lexical decision under time pressure (eg Coltheartet al 1977 McCann et al 1987) Nonetheless in the absence of timepressure both types of nonwords are easily discriminated from even thelowest-frequency words For example a decision criterion of 095 produces25 errors to words (15600) 625 errors to pseudohomophones (464)and no errors to nonpseudohomophones (064) corresponding to a d 9 of382

In summary a network that maps orthography to semantics both directlyand via phonology performed lexical decision accurately if wordnonworddecisions were based on a measure of semantic familiarity termed stressthat reects the degree to which generated semantic patterns are binaryMoreover the distributions of stress values for different types of words andnonwords accounted for empirical ndings concerning the effects of wordfrequency and nonword pseudohomophony on LD performance In thisway the ndings establish clearly that words can be distinguished fromnonwords based on the functional properties of distributed semanticrepresentations without recourse to word-specic structuralrepresentations

GENERAL DISCUSSION

The traditional view of the lexical system stipulates rather complicated anddomain-specic structures and processes including those that apply toindividual words but not to nonwords or to some words (regulars) but not toothers (exceptions) The current article adopts an alternative view in whichlexical knowledge and processing develop through the operation of generallearning principles as applied to distributed representations of written andspoken words and their meanings Distinctions between words andnonwords and among different types of words are not reied in thestructure of the system but rather reect the functional implications of thestatistical structure relating orthographic phonological and semanticinformation The structural divisions within the system are assumed to arisefrom the neuroanatomic localisation of input and output modalities not

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 31: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

MODELS OF WORD READING AND LEXICAL DECISION 795

from differences in representational content (for similar arguments seeFarah 1994 Farah amp McClelland 1991)

This distributed view of lexical processing has been championed mostexplicitly by Van Orden et al (1990) and by Seidenberg and McClelland(1989) Seidenberg and McClelland supported the view with an explicitcomputational model but this support was limited by inadequacies in themodelrsquos ability to account for skilled performance in nonword reading and inlexical decision (Besner et al 1990) and impaired performance in uentsurface dyslexia (Patterson et al 1989) Plaut et al (1996) provided a moreadequate account of nonword reading and of surface dyslexia by improvingthe orthographic and phonological representations and by incorporating aninuence of semantics on word reading However the claimed role ofsemantics in reading as reected in the account of surface dyslexia has beenchallenged by the performance of patients whose word reading is unaffectedby semantic impairment (eg Cipolotti amp Warrington 1995 Lambon Ralphet al 1995) Moreover the limitations of Seidenberg and McClellandrsquosmodel in performing lexical decision remained unaddressed

The current simulations provide additional support for a distributedtheory of lexical processing by addressing these two challenges Theapproach taken to both involves a reconsideration and elaboration of therole of semantics in naming and lexical decision

Semantics in Word Naming Individual Differences

The rst simulation demonstrated that parametric variations within readingmodels give rise to individual differences in the overall competence anddivision of labour between the semantic and phonological pathways suchthat individuals who are able to pronounce low-frequency exception wordswithout semantic support fall at one end of a continuum In particular thesemantic and phonological pathways together learn to support skilled wordreading but the specic division of labour between the twomdashparticularly forlow-frequency exception wordsmdashdepends on factors that either improve thecompetence of the semantic pathway or impede the competence of thephonological pathway Two factors were investigated The rst theasymptotic strength of semantic support for phonology summarises avariety of factors that would be expected to inuence learning in thesemantic pathway particularly the mapping from orthography to semanticsThe second factor weight decay in the phonological pathway can bethought of as reecting the degree to which the underlying physiology cansupport large numbers of synapses and hence strong interactions betweenneurons Simulations demonstrated that the phonological pathway canpronounce low-frequency exception words without semantics if eithersemantic strength or weight decay is particularly low during training but

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 32: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

796 PLAUT

that in general semantic damage leads to some degree of readingimpairment (surface dyslexia)

On the one hand this explanation would seem to be at odds with claimsadvanced by Patterson and colleagues (Graham et al 1994 1995 Pattersonet al 1994a b Patterson and Hodges 1992) that semantic support is criticalto the integrity of phonological representations The current ndings suggestthat this property may still hold for most individuals but not for those withhighly developed phonological pathways On the other hand theexplanation for individual differences in division of labour relies critically onthe supposition that the semantic and phonological pathways combine tosupport skilled reading and that the semantic contribution to phonology hasimportant implications for the nature of learning in the phonologicalpathway In this way even though the current account allows for thepossibility that semantics plays a minimal role in the skilled readingperformance of some individuals (and hence the preservation of readingperformance despite semantic damage) it nonetheless incorporates thefundamental insight of Patterson and colleaguesmdashthat a consideration ofsemanticndashphonological interactions is critical for a general under-standing of the organisation and operation of the reading system Theaccount also retains the ability to explain why in individuals who do exhibitsurface dyslexia following semantic damage there is a close relationshipbetween the comprehension and correct pronunciation of individualexception words (Funnell 1996 Graham et al 1994 Hillis amp Caramazza1991)

The introduction of individual differences into the current explanationraises the question of whether Plaut and co-workersrsquo (1996) account ofsurface dyslexia is underconstrained After all a variety of types and degreesof impairment to the models give rise to the qualitative pattern of surfacedyslexia It should be pointed out though that the approach is no differentthan the traditional dual-route model in this respect (see Coltheart ampFunnell 1987) To be clear what is in question about the current account isthe specic claim regarding the relationship between premorbid differencesin division of labour and the quantitative severity of surface dyslexiafollowing semantic or semantic-to-phonological damage More extensivetesting of the non-reading capabilities of surface dyslexic patients is neededto address this concern to establish independent patterns of performancethat predict the severity of their reading impairment

What is not left underspecied by the approach is the computational basisfor the surface dyslexia pattern itself This pattern arises directly from theintrinsic sensitivity of learning in distributed networks to word frequencyand spellingndashsound consistency as expressed by the frequencyndashconsistencyequation (equation 1) for a two-layer Hebbian network In the normalsystem this sensitivity manifests as a frequency 3 consistency interaction in

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 33: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

MODELS OF WORD READING AND LEXICAL DECISION 797

naming latency Low-frequency exception words (eg PINT) are nameddisproportionately slowly because their vowel pronunciations (I THORN aI) havethe least support from friends and suffer the greatest competition fromenemies (eg I THORN I in PRIN T HINT M IN T etc) However when theperformance of the system is limited (eg by weight decay andor strongsemantic support during learning or by direct damage) these weaklysupported vowels may now lose the competition to their enemies resultingin regularisation errors (PINT THORN ldquopihntrdquo) High-frequency exceptions (egHAVE) are more immune to these inuences because their own frequency(F[t] in equation 1) counterbalances the effect of enemies (eg G AVE SAVEetc) However with greater impairment the vowels in these items alsobegin to lose the competition and are regularised (HAVE THORN ldquohaiverdquo) Withstill greater impairment low-frequency regular words (eg SO UR) begin toelicit errors If they have inconsistent bodies they are also subject tocompetition from enemies (eg POUR FOU R) and their errors reect thiscompetition (SOUR THORN ldquosorerdquo)mdashso-called LARC errors (Patterson et al1996) Thus on the current account the errors made by surface dyslexicpatients to all types of words and the pattern of naming latencies exhibitedby skilled readers have a common underlying cause the inherent sensitivityof distributed networks to frequency and consistency This is why thesimulations of surface dyslexia in particular account for the full pattern ofperformance across a range of levels of severity of impairment even whenthe specic factors that lead to a particular level of impairment in a specicpatient may be in need of further specication

It is important to emphasise that the postulated individual differences indivision of labour do not give rise to all possible patterns of readingimpairment following semantic damage allowing any observed pattern to beexplained For instance no parametric variation in the network will cause itto be severely impaired on low-frequency exception words without alsoshowing some impairment on high-frequency exception words (see Fig 5)or to be completely unable to read exception words while remainingunimpaired on regular words (Note that the standard dual-route theory hasno difculty exhibiting either of these patterns) Rather the patterns ofperformance that result from semantic impairment are highly constrainedThat is they all correspond to the specic pattern of uent surfacedyslexiamdasha frequency 3 consistency interaction in accuracy with poorreading of low-frequency exception words high rates of regularisations andnormal nonword reading Individual differences serve only to locate patientsalong a single continuummdashseverity of reading impairmentmdashwith thepossibility that some like DRN and DC fall at one end of the continuumand are essentially unimpaired This extreme pattern however belies thecommonality of the effects that are observed when the full distribution ofpatients is considered This commonality derives on the current account

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 34: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

798 PLAUT

from inherent properties of connectionist learning applied to distributedrepresentations Essentially the effects of word frequency and spellingndashsound consistency are inextricably related because they have the sameunderlying cause frequency-weighted sensitivity to inputndashoutput similaritywith frequency reecting the most similar item (the stimulus itself) andconsistency reecting the less similar items (its friends and enemies)

Semantics in Lexical Decision Stress as aMeasure of Familiarity

The second simulation addressed a further challenge to a distributed theoryof lexical processing by demonstrating that in a network that mappedorthography to semantics directly and via phonology a measure offamiliarity derived from semantic activationmdashtermed semantic stressmdashprovide a sufcient basis for distinguishing words from nonwordsMoreover this measure was shown to account for some aspects of howlexical decision performance is inuenced by the nature of the words andnonwords in the task namely the frequency blocking effect (Gordon 1983)and the pseudohomophone effect (Coltheart et al 1977 McCann et al1988)

The stress measure reects the strength with which semantic units aredriven from their ldquorestingrdquo activation level (05) towards an extreme value (0or 1) The reason why semantic stress distinguishes words from nonwordsstems from the lack of systematicity in the mapping from orthography tosemantics This lack of systematicity means that orthographic similarity isunpredictive of semantic similarity Thus during training the network mustlearn to map visually similar words (eg HAVE GAVE SAVE) to completelyunrelated sets of semantic features The presentation of a nonword (egM AVE) partially engages the mappings for visually similar words but becausethese mappings are inconsistent with each other they generate conictinginput to the semantic units resulting in only weak (non-binary) semanticactivation By contrast the considerable systematicity between orthographyand phonology enables the mappings for visually similar words to cooperateand collectively produce strongly active correct pronunciations fornonwords

Other researchers (eg Besner amp Joordens 1995 Borowsky amp Masson1996 Joordens amp Becker 1997 Masson amp Borowsky 1995 Rueckl 1995)have proposed performing lexical decision in distributed networks based onthe negative of the ldquoenergyrdquo in the network ( S i jsisjwij Hopeld 1982) Thecurrent stress measure has the advantage of being based entirely on unitactivations rather than requiring decision processes to have access to weightson connections between other units However it should be pointed out thatthe two measures are closely related To the extent that unit states are on the

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 35: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

MODELS OF WORD READING AND LEXICAL DECISION 799

6What is meant by ldquoappropriaterdquo here is that both unit states should be on the same side ofldquorestrdquo if the weight between them is positive and on opposite sides if the weight is negativeassuming that unit states are normalised relative to the rest value (ie 05 for standard [01]units) before entering into the energy equation

appropriate side of ldquorestrdquo given the sign of the weight between them6 thenincreasing stress (by moving states towards their extreme values) will alsodecrease energy Thus the present results would also be expected to hold atleast qualitatively if lexical decision were based on the negative of energy

Although the simulation establishes that lexical decision can beperformed accurately based on semantics it should not be interpreted asimplying that subjects always do so In fact following Seidenberg andMcClelland (1989) we assume that subjects can base their decisions on anyavailable information in the lexical system and that they adopt a strategythat optimises their performance given the composition of the stimuli Wealso assume that subjects will rely on orthographic information when thissufces given that such information is more reliable and more rapidlyavailable than either phonological or semantic information Thus thesimulation is not intended as a fully adequate account of lexical decisionunder all conditions In particular it is not intended to account for theinuence of orthographic factors such as neighbourhood density on lexicaldecision (see eg Andrews 1992 Sears Hino amp Lupker 1995) Howeverthe simulation does serve to demonstrate that subjects can fall back onsemantic information when necessarymdashfor example when orthographicallystrange words like AISLE and GAUGE or highly word-like pseudohomophoneslike HOAP and JO AK are included among the stimuli

An important aspect of LD performance that is not treated in detail in thecurrent work is a specication of the actual processing mechanism thatcomputes semantic stress and uses it to make wordnonword decisions in realtime One of the advantages of stress over an alternative measure offamiliarity like energy (Hopeld 1982) is that the computation of stress doesnot require access to the values of connection weights between units (whichare presumably inaccessible to other units) It is fairly straightforward for adecision process to compute semantic stress if it has access to the semanticunits as input Moreover Usher and McClelland (1995) have demonstratedrecently that competition between linear stochastic time-averaging unitsrepresenting alternative responses gives rise to a number of basic propertiesof empirical ndings in standard choice reaction time tasks including theshapes of time-accuracy functions latency-probability functions hazardfunctions and reaction-time distributions Their approach could be appliedto generate LD latencies in the current context by creating a ldquoyesrdquo unitwhose input is the current level of semantic stress and a ldquonordquo unit whoseinput constitutes a decision criterion that is adjusted to optimise

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 36: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

800 PLAUT

performance and having the network respond when one of the units exceedsa response criterion

CONCLUSION

The simulations described in this article in conjunction with thosedeveloped previously (Plaut et al 1996 Seidenberg amp McClelland 1989)illustrate how connectionist computational principlesmdashdistributedrepresentation structure-sensitive learning and interactivitymdashcan provideinsight into central empirical phenomena in normal and impaired lexicalprocessing Moreover they make it clear that distinctions in the function ofthe lexical systemmdashas manifest in the behaviour of experimental subjectsmdashneed not reect corresponding distinctions in the structure of the systemThus networks exhibit word-frequency effects and wordnonworddiscrimination without word representations and spellingndashsoundconsistency effects without separate mechanisms for regular and exceptionitems In this way gaining insight into the structure and function of thecognitive system by observing its normal and impaired behaviour mdashthecentral goal of cognitive psychology and neuropsychologymdashmay dependcritically on developing theories and explicit simulations in the context of aspecic computational framework that relates structure to function

The current work demonstrates how the distributed connectionistapproach can provide an effective theoretical framework for understandingword naming and lexical decision This is not to say that the existingdistributed models are fully adequate and account for all of the relevant datain sufcient detailmdashthis is certainly not the case In fact given that they aremodels they are abstractions from the actual processing system and arecertainly wrong in their details Nonetheless their relative success atreproducing key patterns of data in the domain of word reading and the factthat the very same computational principles are being applied successfullyacross a wide range of linguistic and cognitive domains suggests that thesemodels capture important aspects of representation and processing in thehuman language and cognitive system

REFERENCESAndrews S (1982) Phonological recoding Is the regularity effect consistent Memory and

Cognition 10 565ndash575Andrews S (1992) Frequency and neighborhood effects on lexical access Lexical similarity

or orthographic redundancy Journal of Experimental Psychology Learning Memory andCognition 18 234ndash254

Atkinson RC amp Juola JF (1973) Factors inuencing speed and accuracy of wordrecognition In S Kornblum (Ed) Attention and performance IV pp 583ndash612 New YorkAcademic Press

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 37: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

MODELS OF WORD READING AND LEXICAL DECISION 801

Balota DA amp Chumbley JI (1984) Are lexical decisions a good measure of lexica laccess The role of word frequency in the neglected decision stage Journal of ExperimentalPsychology Human Perception and Performance 10 340ndash357

Behrmann M amp Bub D (1992) Surface dyslexia and dysgraphia Dual routes a singlelexicon Cognitive Neuropsychology 9 209ndash258

Besner D amp Joordens S (1995) Wrestling with ambiguitymdashFurther reections Reply toMasson and Borowsky (1995) and Rueckl (1995) Journal of Experimental PsychologyLearning Memory and Cognition 21 515ndash301

Besner D amp Smith MC (1992) Models of visual word recognition When obscuring thestimulus yields a clearer view Journal of Experimental Psychology Learning Memory andCognition 18 468ndash482

Besner D Twilley L McCann RS amp Seergobin K (1990) On the connection betweenconnectionism and data Are a few words necessary Psychological Review 97 432ndash446

Borowsky R amp Masson MEJ (1996) Semantic ambiguity effects in word identicationJournal of Experimental Psychology Learning Memory and Cognition 22 63ndash85

Bub D Cancelliere A amp Kertesz A (1985) Whole-word and analytic translation ofspelling-to-sound in a non-semantic reader In K Patterson M Coltheart amp JC Marshall(Eds) Surface dyslexia pp 15ndash34 Hove Lawrence Erlbaum Associates Ltd

Chauvin Y (1988) Symbol acquisition in humans and neural (PDP) networks PhD thesisUniversity of California San Diego CA

Cipolotti L amp Warrington EK (1995) Semantic memory and reading abilities A casereport Journal of the International Neuropsychological Society 1 104ndash110

Coltheart M (1978) Lexical access in simple reading tasks In G Underwood (Ed)Strategies of information processing pp 151ndash216 New York Academic Press

Coltheart M (1985) Cognitive neuropsychology and the study of reading In MI Posner ampOSM Marin (Eds) Attention and performance XI pp 3ndash37 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M amp Funnell E (1987) Reading writing One lexicon or two In DA AllportDG MacKay W Printz amp E Scheerer (Eds) Language perception and production Sharedmechanisms in listening speaking reading and writing pp 313ndash339 New York AcademicPress

Coltheart M Davelaar E Jonasson J amp Besner D (1977) Access to the internal lexiconIn S Dornic (Ed) Attention and performance VI pp 535ndash555 Hillsdale NJ LawrenceErlbaum Associates Inc

Coltheart M Curtis B Atkins P amp Haller M (1993) Models of reading aloud Dual-route and parallel-distributed-processing approaches Psychological Review 100 589ndash608

Coslett HB (1991) Read but not write ldquoideardquo Evidence for a third reading mechanismBrain and Language 40 425ndash443

Cottrell GW amp Plunkett K (1991) Learning the past tense in a recurrent networkAcquiring the mapping from meaning to sounds In Proceedings of the 13th AnnualConference of the Cognitive Science Society pp 328ndash333 Hillsdale NJ Lawrence ErlbaumAssociates Inc

Cummings JL Houlihan JP amp Hill MA (1986) The pattern of reading deterioration indementia of Alzheimer type Observations and implications Brain and Language 29315ndash323

Dunn LM (1965) The Peabody picture vocabulary test Circle Pines American GuidanceService

Farah MJ (1994) Neuropsychological inference with an interactive brain A critique of thelocality assumption Behavioral and Brain Sciences 17 43ndash104

Farah MJ amp McClelland JL (1991) A computational model of semantic memoryimpairment Modality-specicity and emergent category-specicity Journal ofExperimental Psychology General 120 339ndash357

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 38: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

802 PLAUT

Fera P amp Besner D (1992) The process of lexical decision More words about a paralleldistributed processing model Journal of Experimental Psychology Learning Memory andCognition 18 749ndash764

Franklin S Howard D amp Patterson K (1995) Abstract word anomia CognitiveNeuropsychology 12 549ndash566

Friedman RB Ferguson S Robinson S amp Sunderland T (1992) Dissociation ofmechanisms of reading in Alzheimerrsquos disease Brain and Language 43 400ndash413

Funnell E (1996) Response biases in oral reading An account of the co-occurrence ofsurface dyslexia and semantic dementia Quarterly Journal of Experimental Psychology49A 417ndash314

Glanzer M amp Ehrenreich SL (1979) Structure and search of the internal lexicon Journalof Verbal Learning and Verbal Behavior 18 381ndash398

Glushko RJ (1979) The organization and activation of orthographic knowledge in readingaloud Journal of Experimental Psychology Human Perception and Performance 5674ndash691

Gordon B (1983) Lexical access and lexical decision Mechanisms of frequency sensitivityJournal of Verbal Learning and Verbal Behavior 22 24ndash44

Graham KS Hodges JR amp Patterson K (1994) The relationship betweencomprehension and oral reading in progressive uent aphasia Neuropsychologia 32299ndash316

Graham KS Patterson K amp Hodges JR (1995) Progressive pure anomia Insufcientactivation of phonology by meaning Neurocase 1 25ndash38

Hillis AE amp Caramazza A (1991) Category-specic naming and comprehensionimpairment A double dissociation Brain 114 2081ndash2094

Hinton GE (1989) Connectionist learning procedures Articial Intelligence 40 185ndash234

Hinton GE McClelland JL amp Rumelhart DE (1986) Distributed representations InDE Rumelhart JL McClelland amp the PDP Research Group (Eds) Parallel distributedprocessing Explorations in the microstructure of cognition Vol 1 Foundations pp 77ndash109Cambridge MA MIT Press

Hodges JR Patterson K Oxbury S amp Funnell E (1992) Semantic dementiaProgressive uent aphasia with temporal lobe atrophy Brain 115 1783ndash1806

Hoeffner J (1992) Are rules a thing of the past The acquisition of verbal morphology by anattractor network In Proceedings of the 14th Annual Conference of the Cognitive ScienceSociety pp 861ndash866 Hillsdale NJ Lawrence Erlbaum Associates Inc

Hopeld JJ (1982) Neural networks and physical systems with emergent collectivecomputational abilities Proceedings of the National Academy of Science USA 792554ndash2558

Jacobs RA (1988) Increased rates of convergence through learning rate adaptationNeural Networks 1 295ndash307

Joordens S amp Becker S (1997) The long and short of semantic priming effects in lexica ldecision Journal of Experimental Psychology Learning Memory and Cognition 231083ndash1105

Joordens S amp Besner D (1994) When banking on meaning is not (yet) money in the bankExplorations in connectionist modeling Journal of Experimental Psychology LearningMemory and Cognition 20 1051ndash1062

Kawamoto A (1988) Distributed representations of ambiguous words and their resolutionin a connectionist network In SL Small GW Cottrell amp MK Tanenhaus (Eds) Lexicalambiguity resolution Perspectives from psycholinguistics neuropsychology and articialintelligence San Mateo CA Morgan Kaufmann

Kawamoto AH (1993) Nonlinear dynamics in the resolution of lexical ambiguity Aparallel distributed processing approach Journal of Memory and Language 32 474ndash516

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 39: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

MODELS OF WORD READING AND LEXICAL DECISION 803

Kawamoto AH Farrar WT amp Kello CT (1994) When two meanings are better thanone Modeling the ambiguity advantage using a recurrent distributed network Journal ofExperimental Psychology Human Perception and Performance 20 1233ndash1247

Kay J Lesser R amp Coltheart M (1992) Palpa Psycholinguistic assessments of languageprocessing in aphasia Hove Lawrence Erlbaum Associates Ltd

Kucera H amp Francis WN (1967) Computational analysis of present-day AmericanEnglish Providence RI Brown University Press

Lambon Ralph M Ellis AW amp Franklin S (1995) Semantic loss without surfacedyslexia Neurocase 1 363ndash369

Marshall JC amp Newcombe F (1973) Patterns of paralexia A psycholinguistic approachJournal of Psycholinguistic Research 2 175ndash199

Masson MEJ (1995) A distributed memory model of semantic priming Journal ofExperimental Psychology Learning Memory and Cognition 21 3ndash23

Masson MEJ amp Borowsky R (1995) Unsettling questions about semantic ambiguity inconnectionist models Comment on Joordens and Besner (1994) Journal of ExperimentalPsychology Learning Memory and Cognition 21 509ndash514

McCann RS Besner D amp Davelaar E (1988) Word recognition and identication Doword frequency effects reect lexical access Journal of Experimental Psychology HumanPerception and Performance 14 693ndash706

McCarthy R amp Warrington EK (1986) Phonological reading Phenomena andparadoxes Cortex 22 359ndash380

McClelland JL amp Rumelhart DE (1981) An interactive activation model of contexteffects in letter perception Part 1 An account of basic ndings Psychological Review 88375ndash407

Meyer DE Schvaneveldt RW amp Ruddy MG (1974) Functions of graphemic andphonemic codes in visual word recognition Memory and Cognition 2 309ndash321

Morton J (1969) The interaction of information in word recognition Psychological Review76 165ndash178

Morton J amp Patterson K (1980) A new attempt at an interpretation or an attempt at anew interpretation In M Coltheart K Patterson amp JC Marshall (Eds) Deep dyslexia pp91ndash118 London Routledge and Kegan Paul

Paap KR amp Noel RW (1991) Dual route models of print to sound Still a good horse racePsychological Research 53 13ndash24

Patterson K amp Hodges JR (1992) Deterioration of word meaning Implications forreading Neuropsychologia 30 1025ndash1040

Patterson K Coltheart M amp Marshall JC (Eds) (1985) Surface dyslexia HoveLawrence Erlbaum Associates Ltd

Patterson K Seidenberg MS amp McClelland JL (1989) Connections and disconnectionsAcquired dyslexia in a computational model of reading processes In RGM Morris (Ed)Parallel distributed processing Implications for psychology and neuroscience pp 131ndash181Oxford Oxford University Press

Patterson K Graham N amp Hodges JR (1994a) The impact of semantic memory loss onphonological representations Journal of Cognitive Neuroscience 6 57ndash69

Patterson K Graham N amp Hodges JR (1994b) Reading in Alzheimerrsquos type dementiaA preserved ability Neuropsychology 8 395ndash412

Patterson K Plaut DC McClelland JL Seidenberg MS Behrmann M amp Hodges JR(1996) Connections and disconnections A connectionist account of surface dyslexia In JReggia R Berndt amp E Ruppin (Eds) Neural modeling of cognitive and brain disorderspp 177ndash199 New York World Scientic

Plaut DC (1995a) Double dissociation without modularity Evidence from connectionistneuropsychology Journal of Clinical and Experimental Neuropsychology 17 291ndash321

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 40: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

804 PLAUT

Plaut DC (1995b) Semantic and associative priming in a distributed attractor network InProceedings of the 17th Annual Conference of the Cognitive Science Society pp 37ndash42Hillsdale NJ Lawrence Erlbaum Associates Inc

Plaut DC (1996) Relearning after damage in connectionist networks Toward a theory ofrehabilitation Brain and Language 52 25ndash82

Plaut DC amp McClelland JL (1993) Generalization with componential attractors Wordand nonword reading in an attractor network In Proceedings of the 15th Annual Conferenceof the Cognitive Science Society pp 824ndash829 Hillsdale NJ Lawrence Erlbaum AssociatesInc

Plaut DC McClelland JL Seidenberg MS amp Patterson K (1996) Understandingnormal and impaired word reading Computational principles in quasi-regular domainsPsychological Review 103 56ndash115

Ratcliff R (1978) A theory of memory retrieval Psychological Review 85 59ndash108Raymer AM amp Berndt RS (1994) Models of word reading Evidence from Alzheimerrsquos

disease Brain and Language 47 479ndash482Rueckl JG (1995) Ambiguity and connectionist networks Still settling into a solution

Comment on Joordens and Besner (1994) Journal of Experimental Psychology LearningMemory and Cognition 21 501ndash508

Rumelhart DE Hinton GE amp Williams RJ (1986a) Learning internal representationsby error propagation In DE Rumelhart JL McClelland amp the PDP Research Group(Eds) Parallel distributed processing Explorations in the microstructure of cognition Vol 1Foundations pp 318ndash362 Cambridge MA MIT Press

Rumelhart DE Hinton GE amp Williams RJ (1986b) Learning representations byback-propagating errors Nature 323 533ndash536

Schwartz MF (Ed) (1990) Modular decits in Alzheimer-type dementia Cambridge MAMIT Press

Schwartz MF Marin OSM amp Saffran EM (1979) Dissociations of language function indementia A case study Brain and Language 7 277ndash306

Schwartz MF Saffran EM amp Marin OSM (1980) Fractioning the reading process indementia Evidence for word-specic print-to-sound associations In M Coltheart KPatterson amp JC Marshall (Eds) Deep dyslexia pp 259ndash269 London Routledge andKegan Paul

Sears CR Hino Y amp Lupker SJ (1995) Neighborhood size and neighborhoodfrequency effects in word recognition Journal of Experimental Psychology HumanPerception and Performance 21 876ndash900

Seidenberg MS (1992) Beyond orthographic depth Equitable division of labour In RFrost amp K Katz (Eds) Orthography phonology morphology and meaning pp 85ndash118Amsterdam Elsevier

Seidenberg MS amp McClelland JL (1989) A distributed developmental model of wordrecognition and naming Psychological Review 96 523ndash568

Seidenberg MS Waters GS Barnes MA amp Tanenhaus MK (1984) When doesirregular spelling or pronunciation inuence word recognition Journal of Verbal Learningand Verbal Behaviour 23 383ndash404

Seidenberg MS Plaut DC Petersen AS McClelland JL amp McRae K (1994) Non-word pronunciation and models of word recognition Journal of Experimental PsychologyHuman Perception and Performance 20 1177ndash1196

Seidenberg MS Petersen A MacDonald MC amp Plaut DC (1996) Pseudohomophoneeffects and models of word recognition Journal of Experimental Psychology LearningMemory and Cognition 22 48ndash62

Shallice T Warrington EK amp McCarthy R (1983) Reading without semanticsQuarterly Journal of Experimental Psychology 35A 111ndash138

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 41: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general

MODELS OF WORD READING AND LEXICAL DECISION 805

Snodgrass JG amp Vanderwart M (1980) A standardized set of 260 pictures Norms forname agreement image agreement familiarity and visual complexity Journal ofExperimental Psychology Learning Memory and Cognition 6 174ndash215

Snowden JS Goulding PJ amp Neary D (1989) Semantic dementia A form ofcircumscribed cerebral atrophy Behavioral Neurology 2 167ndash182

Strain E Patterson K amp Seidenberg MS (1995) Semantic effects in single-word namingJournal of Experimental Psychology Learning Memory and Cognition 21 1140ndash1154

Taraban R amp McClelland JL (1987) Conspiracy effects in word recognition Journal ofMemory and Language 26 608ndash631

Usher M amp McClelland JL (1995) On the time course of perceptual choice A model basedon principles of neural computation Technical Report PDPCNS955 Pittsburgh PACarnegie Mellon University Department of Psychology

Van Orden GC amp Goldinger SD (1994) Interdependence of form and function incognitive systems explains perception of printed words Journal of ExperimentalPsychology Human Perception and Performance 20 1269

Van Orden GC Pennington BF amp Stone GO (1990) Word identication in readingand the promise of subsymbolic psycholinguistics Psychological Review 97 488ndash522

Waters GS amp Seidenberg MS (1985) Spellingndashsound effects in reading Time course anddecision criteria Memory and Cognition 13 557ndash572

Watt S Jokel R amp Behrmann M (1997) Surface dyslexia in progressive aphasia Brainand Language 56 211ndash233

Wickelgren WA (1969) Context-sensitive coding associative memory and serial order in(speech) behavior Psychological Review 76 1ndash15

Page 42: Structure and Function in the Lexical System: Insights ...plaut/papers/pdf/Plaut97LCP.structure.pdf · alternative approach views lexical knowledge as developing from general