
Journal of Memory and Language 93 (2017) 67–81


Does working memory have a single capacity limit?

http://dx.doi.org/10.1016/j.jml.2016.09.004
0749-596X/© 2016 Elsevier Inc. All rights reserved.

* Corresponding author at: School of Psychology, University of New South Wales, Kensington, NSW 2052, Australia.

E-mail address: [email protected] (R. Taylor).

Robert Taylor *, Hana Thomson, David Sutton, Chris Donkin
School of Psychology, University of New South Wales, Australia

Article info

Article history: Received 11 April 2016; revision received 13 September 2016

Keywords: Stimulus complexity; Visual working memory; Computational models

Abstract

Debate continues over whether visual working memory has a single, fixed capacity. Empirically, performance in working memory tasks worsens as the complexity of stimuli increases. However, there exist two explanations for this result. One proposal is that visual working memory is capable of holding fewer complex stimuli. The alternative proposal is that visual working memory can store 3–4 items, irrespective of their complexity. According to this fixed-capacity explanation, performance is worse for complex items because discrimination between complex items is more difficult than discrimination between simple items. These so-called comparison errors are more likely with complex items, and when left unaccounted for, lead to an underestimate of the capacity of working memory. Previous attempts at resolving this debate have relied on clever empirical manipulations of the similarity between stimuli. However, such approaches change the task that is given to the participant, and so may also change the way that participants use their memory. Here, we use a standard change detection task, but use a measurement model to estimate both the capacity of memory and the probability of comparison errors. We apply the model to two change detection experiments in which we varied the complexity of the stimuli that participants must remember. Critically, we found that capacity estimates, and not comparison error estimates, varied depending upon stimulus complexity. Our results suggest that the number of items that can be stored is dependent on the complexity of the stimuli.

© 2016 Elsevier Inc. All rights reserved.

Introduction

There is continuing debate within the field of visual working memory (VWM) regarding the processes through which items are stored and the capacity limits imposed by these processes. The discrete slots view suggests that only a limited number of items can be retained in memory (Luck & Vogel, 1997), whereas the continuous resource view suggests that a limited pool of resources may be flexibly allocated across all items, with no necessary constraint on the number of items that can be stored (Ma, Husain, & Bays, 2014; Wilken & Ma, 2004).

Early evidence suggested that VWM capacity is limited to around 3–4 visual items, and that this capacity was invariant to changes in the number of stimulus features (Fukuda, Awh, & Vogel, 2010; Luck & Vogel, 1997; Vogel, Woodman, & Luck, 2001). However, there have since been multiple demonstrations of a detriment to performance in visual working memory tasks due to additional stimulus complexity (Cowan, Blume, & Saults, 2013; Fougnie, Asplund, & Marois, 2010; Hardman & Cowan, 2015; Oberauer & Eichenberger, 2013; Olson & Jiang, 2002). Though it is now largely agreed upon that performance in visual working memory tasks is worse for more complex stimuli, the cause of this result remains an open issue.

Alvarez and Cavanagh (2004) were among the first to demonstrate that performance worsened for more complex stimuli. They showed that there was an inverse linear relationship between visual complexity and working memory capacity, as operationalized by visual search time and the conventional change detection task, respectively. They found that capacity estimates were smaller for items with slower search rates, and concluded that fewer visually complex items can be held in visual working memory. Alvarez and Cavanagh (2004) concluded that the limit on the number of items that can be held in visual working memory was not fixed, and was jointly affected by the number of items that must be remembered and the complexity of the stimuli.

Advocates of the slots model highlighted a critical flaw in Alvarez and Cavanagh's (2004) claim that people stored fewer complex items than simple items. Alvarez and Cavanagh (2004) used the standard method for calculating capacity, which attributes all incorrect responses to a failure to store an item. As such, increased errors in a change detection task are associated with smaller capacity estimates; however, if another type of error is contributing to incorrect responses in the change detection task, then capacity will be underestimated. Awh, Barton, and Vogel (2007) argued that errors in change detection tasks can also arise through perceptual confusion between the contents of memory and the test item, and that such confusion is increasingly likely for more visually complex items. Fundamentally, Awh et al.'s argument is that participants are able to store approximately 3–4 complex items in memory, but a reduction in the mnemonic resolution for complex items increases the probability of making a comparison error, which ultimately confounds estimates of item capacity.

To support this claim, Awh et al. (2007) conducted an experiment in which participants were asked to remember stimulus sets that included two types of complex items – shaded cubes and Chinese characters. At test, two types of changes could occur. A within-category change involved a stimulus being swapped for another member of the same category; e.g., one Chinese character changes to another. Conversely, a cross-category change involved a stimulus being swapped for a member of the other category; e.g., a Chinese character being replaced by a shaded cube. The critical finding was that participants struggled to detect within-category changes. On the other hand, cross-category detection was excellent; in fact, participants were as able to detect cross-category changes as they were to identify changes between simple colors. Awh et al. (2007) concluded that within-category changes involving complex stimuli increased the rate of comparison errors.

Scolari, Vogel, and Awh (2008), Barton, Ester, and Awh (2009) and Umemoto, Scolari, Vogel, and Awh (2010) have also shown that cross-category change detection of complex items is superior to within-category discrimination, and have further argued for a fixed number of discrete slots. In Barton et al. (2009) and Umemoto et al. (2010), the authors also attempted to estimate the error rate associated with comparing complex within-category stimuli. They reasoned that a participant's ability to detect cross-category changes provided a pure estimate of item capacity. Thus, any additional errors committed in the within-category change condition could be attributed to the comparison of the test item with the contents of memory. To address this issue they proposed a model that could measure both the probability that an item was stored and the probability that a comparison error was made, given the item had been stored in memory. However, despite the excellent motivation behind the model, Morey, Morey, Brisson, and Tremblay (2012) showed that this particular implementation was incorrect.

Brady and Alvarez (2015) also raise concerns regarding the validity of capacity estimates derived from cross-category changes. They instead suggest that participants form global representations of the study array which facilitate detection of change. For example, on some arrays the Chinese characters (or cubes) may be clustered more closely together, whereas on others the items may be located more diffusely. They found that when items of one category were clustered together, cross-category changes could be detected without having to attend to the item characteristics, because the change was quite obvious. Conversely, as the items became more dispersed, cross-category change detection grew poorer. To further support their claim, they demonstrated that when the study arrays were more heterogeneous (consisting of more complex items than just cubes and Chinese characters), and so limited the ability to form global representations, capacity estimates dropped to approximately 1–2 complex items for cross-category changes, consistent with the findings from Alvarez and Cavanagh (2004). Brady and Alvarez (2015) conclude that performance based on large changes overestimates capacity, which challenges the assumption of a fixed working memory capacity.

Given the foregoing, it is unclear whether item capacity remains invariant to item complexity. However, the two positions are clear and distinct. One proposal is that the number of items stored in memory is invariant to stimulus complexity, but that complex items will lead to comparison errors. The alternative proposal is that working memory can hold fewer complex items than simple items, irrespective of comparison errors. We set out to distinguish between these two very specific hypotheses.

Our approach differs from previous attempts to answer the question of a fixed capacity of visual working memory. We develop and apply a computational model that simply infers both comparison errors and capacity from performance in a standard change detection task. As such, we do not require within- and cross-category changes, and thus leave the task for the participant as simple as possible, free from any potential issues with the encoding of items from different categories (Brady & Alvarez, 2015). The model we use is inspired by that of Barton et al. (2009), but corrects the issues raised by Morey et al. (2012).

We apply the model to two experiments in which we vary the particular stimuli that participants must remember. By estimating both capacity and comparison errors, we provide a direct test of the two claims over the influence of complexity on visual working memory. According to Alvarez and Cavanagh (2004), we should expect that the effect of complexity influences the number of items that can be held in memory (i.e., smaller capacity). According to Awh et al. (2007), we should expect that the same number of items will always enter into memory, but more complex items will yield a greater number of errors once the items are in memory (i.e., more comparison errors).


Experiment 1

In Experiment 1 we used the single-probe change detection paradigm (Cowan et al., 2005). Participants are presented with an array of items to study and must decide whether a single test item is the same as one of the study items, or has changed. Participants completed four sessions, and in each session they were presented with only one of four different types of stimuli: letters, words, colors, or complex shapes.

Participants

Twenty participants were recruited via an on-line recruitment system at the University of New South Wales. Participants completed four experimental sessions, in the lab, each of which lasted approximately one hour. Participants had the option to complete multiple sessions per day, though a minimum break of one hour was required between sessions. All participants were paid $15 per session.

Stimuli

The experimental stimuli consisted of four sets of stimuli, as shown in Fig. 1. There was a list of 10 letters (B, D, G, H, K, L, M, P, R, and S), and a list of 10 words (BRIM, DUSK, GLUE, HERB, KNOT, LUMP, MAST, PORK, RAIL and SEAM). The words were approximately matched for frequency according to the Kucera-Francis written frequency norms (Kucera & Francis, 1967). The visual stimuli were composed of either 10 highly-discriminable colors (dark-blue-green, purple, cyan, orange, yellow, green, blue, red, white and black), or 10 novel complex shapes. Stimuli were presented using a 24 inch LCD monitor located approximately 60 cm from the participant. All stimuli were made to be of a similar size, having an approximate visual angle of 0.75 × 0.75 degrees. On a given trial, the set of study items was randomly positioned within an 8.8 × 7.3 degree array, subject to the constraint that there be at least a 2 degree separation between all items, and from any item to the center of the array.

Design

All participants completed a single session with each stimulus type. For each session participants completed 540 trials, which were broken into 9 blocks of 60 trials.

Fig. 1. The four sets of stimuli used in each experiment. The top left are the color stimuli, the top right are the complex shape stimuli, while the bottom left and right are letters and words, respectively. Note that the white color stimulus did not have a black border. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Within each session both set size, N = [2, 5, 8], and the proportion of trials on which the test item changed from the study item, p = [0.3, 0.5, 0.7], were manipulated. Set size was manipulated across trials and probed equally often for same and change trials. Change proportion was manipulated across blocks, subject to the constraint that all change proportion conditions were completed before they were repeated. Within any given session the stimulus type remained unchanged, though the order in which participants received each stimulus type was randomized.

Procedure

Participants were instructed that they would undergo a simple memory task. At the start of each session, participants were introduced to the stimulus set to be shown during that hour. Prior to each block the participant was informed of the proportion of change trials in the current block (with a verbal description, and a pie graph). Trials began with a fixation cross presented in the center of the screen for 500 ms. Next, a study array consisting of N items was presented on the screen for 1000 ms. Following the study array a black screen was displayed for 500 ms, which was then followed by a 500 ms screen containing multicolored mask stimuli at the location of each of the study items. A single test item was then presented in one of the locations in which a study item was presented. On change trials, the test item was a stimulus that had not been presented during the study phase. A circular cue surrounded the test item. The participant was asked to respond as to whether the test item matched the study item presented in that location by pressing the "J" key, or whether the test item had changed by pressing the "F" key. Responses were self-paced. Once a response had been made a 1000 ms feedback screen indicated whether the response was correct or not, after which a 1000 ms blank screen was presented before the next trial began.

A measurement model for capacity and comparison errors

Model overview

The model that we apply was designed to reflect the two assumptions that are fundamental to the Awh et al. (2007) theory. First, we assume that the observer is capable of storing a subset, k, of the N items they are asked to remember. Second, if an item has been stored in memory, then there is some probability that the observer will make a comparison error, and thus give an incorrect response. In other words, the model does not assume that items stored in memory can be recalled with perfect precision.

In certain instantiations of the slots model there is an assumption that responses based upon memory will be perfect. This assumption has been referred to as the 'certainty assumption' (Rouder, Province, Swagman, & Thiele, submitted for publication). However, the model detailed here is based upon that outlined in Rouder et al. (submitted for publication) and relaxes this assumption. Specifically, we assume that responses based on memory may be incorrect due to a poor mapping between the test item and the contents of memory, which results in a comparison error. Like Barton et al. (2009), our model proposes two independent processes that influence an observer's ability to detect change: the probability that an item has been stored in memory, and the probability of making a comparison error.

Model specification

We assume that study items are allocated to a limited number of slots. The average number of available slots, denoted by the capacity parameter, k, is assumed to remain fixed across changes in both set size, N, and change proportion, p. Note that the slots model assumes that capacity is a discrete quantity. However, conventionally capacity is estimated as if it were a continuous quantity, which approximates variability in capacity across trials. While our model does not explicitly assume that the number of slots varies across trials, we adopt the convention of estimating an average capacity value.

The probability that the probed item is successfully stored in memory, given the ith set size, is given by $d_i = \min(k / N_i, 1)$. The 'min' function simply says that if the number of study items is smaller than the available capacity, then all items are stored with probability 1. If an item is stored in memory, then there is some probability of a comparison error at test, though this probability depends on whether the trial is a change or same trial. If the test item changed, then a correct change response is made with probability $a_j$. If the test item is the same as the study item, then an incorrect change response is given with probability $b_j$. If the test item at the probed location was not stored in memory, the participant will be forced to guess. We estimate the probability that the participant guesses change in the jth change proportion condition to be $g_j$. That is, we assume that participants will adjust their guessing behavior in response to the base-rate of change trials (see Fig. 2).

Fig. 2. Description of the comparison error slots model for the change detection task.

The model outlined in Fig. 2 says that for the ith set size and the jth change proportion, the probability of making a correct change response on a change trial, $\theta^c_{ij}$, or an incorrect change response on a same trial, $\theta^s_{ij}$, is given by

$\theta^c_{ij} = d_i a_j + (1 - d_i) g_j$
$\theta^s_{ij} = d_i b_j + (1 - d_i) g_j$    (1)
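To make Eq. (1) concrete, the short Python sketch below computes the predicted hit and false alarm probabilities from the model parameters. The parameter values are purely illustrative; they are not estimates from our data.

def predicted_rates(k, N, a, b, g):
    # probability the probed item is stored: d_i = min(k / N_i, 1)
    d = min(k / N, 1.0)
    theta_c = d * a + (1 - d) * g  # P(respond "change" | change trial)
    theta_s = d * b + (1 - d) * g  # P(respond "change" | same trial)
    return theta_c, theta_s

for N in (2, 5, 8):
    hit, fa = predicted_rates(k=3.0, N=N, a=0.95, b=0.05, g=0.5)
    print(f"N={N}: hit={hit:.2f}, false alarm={fa:.2f}")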

Implementation of comparison errors

Note that both $a_j$ and $b_j$ are conditional upon the item having successfully made it into memory. They do not affect the probability that an item is stored. In essence, they are conditional hit and false alarm rates, respectively, and determine the probability that a correct response is made given that an item is in memory. Note also that the probability of a comparison error, given that an item is in memory, depends on the change proportion condition.


The probability of a comparison error depends on change proportion because $a_j$ and $b_j$ represent errors in change and same trials, respectively, and are simply a proportion of change responses. So, if the proportion of change trials is large, then both $a_j$ and $b_j$ will be larger than if the proportion of change trials is smaller. This assumption allows participants to require different levels of evidence of a match between test items and the contents of memory depending on the probability of a change trial. Note also that because both $a_j$ and $b_j$ can vary across change proportion, it is possible to produce curvilinear ROC functions. This behavior is typically not possible in the discrete-slots framework.

We also assume that the probability of a comparison error is constant across all set size conditions. In other words, we assume that an item in memory is stored with the same resolution, regardless of how many items were presented to the participant. Such an assumption follows directly from the simplest versions of slot-based models of visual working memory. However, the slots plus averaging model in Zhang and Luck (2008) proposes that items can be stored with higher resolution when the capacity of memory is larger than N. The idea is that items may be stored in multiple slots, and these multiple representations can be aggregated to yield better memory. As such, we also fitted a version of our model that estimated one set of a and b parameters for the N = 2 conditions, and another for the N = 5 and N = 8 conditions. The results for this alternative model were extremely similar to those presented here, and certainly yield the same conclusions. We have placed the details and results for this model in the Supplementary Materials.


Model inference

We estimate the parameters of the model separately for each of the four different stimulus types. So, for the mth participant, completing the task with the sth stimulus type, the relevant parameters are: $k_{sm}$, $a_{jsm}$, $b_{jsm}$, and $g_{jsm}$. Each individual produces a number of correct change responses, or hits $h_{ijsm}$, and a number of incorrect change responses, or false alarms $f_{ijsm}$. We assume that a Binomial process generates these observed hits and false alarms, such that

$h_{ijsm} \sim \mathrm{Bin}(\theta^c_{ijsm}, n^c_{ijsm})$
$f_{ijsm} \sim \mathrm{Bin}(\theta^s_{ijsm}, n^s_{ijsm})$    (2)

where $n^c_{ijsm}$ and $n^s_{ijsm}$ are the number of change and same trials completed by an individual in each condition.

We use hierarchical Bayesian estimation to fit the model to data. The hierarchical model assumes that the parameter values of individual participants are drawn from a population-level Normal distribution. For example, we assume that the mth individual's capacity for the sth stimulus type comes from a Normal distribution with mean $K_s$ and standard deviation $\sigma_K$, such that $k_{sm} \sim N(K_s, \sigma_K)$. For the remaining parameters, we chose to set the standard deviation of the population-level parameters to be $\sigma$, in order to constrain the model. For example, we assumed that the mth individual's guessing rate for the sth stimulus type in the jth change proportion condition was $g_{jsm} \sim N(G_{js}, \sigma)$.[1] Since our focus was on parameter estimation we used vague priors for our model parameters. We placed a Beta distribution $\mathrm{Be}(1, 1)$ prior on the population-level means for rates ($A_{js}$, $B_{js}$, and $G_{js}$) and a Uniform $U(0, 8)$ prior on the population-level mean capacity ($K_s$). We also placed a Uniform $U(0.01, 10)$ prior on both $\sigma_K$ and $\sigma$. Finally, all posterior distributions were obtained using JAGS by running 6 chains of 5000 samples (after 500 burn-in samples).

[1] For rate parameters we truncated the normal distributions between 0 and 1. Another approach is to model individual differences as normally distributed latent variables that are transformed into probabilities through a logistic function. We considered this latter approach also. The fits to the data were virtually identical. We subsequently report the fits based upon the original truncated normals. Plots and code for both model implementations are available on the OSF.
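For readers who prefer code to notation, the following numpy sketch simulates data from the generative process the hierarchical model assumes. The actual model was fit with JAGS; every numerical value here is hypothetical rather than taken from our posteriors.

import numpy as np

rng = np.random.default_rng(0)

def trunc_normal_01(mean, sd, size):
    # rejection sampling from a Normal truncated to [0, 1]
    samples = rng.normal(mean, sd, size)
    bad = (samples < 0) | (samples > 1)
    while bad.any():
        samples[bad] = rng.normal(mean, sd, bad.sum())
        bad = (samples < 0) | (samples > 1)
    return samples

# hypothetical population-level values
K, sigma_K, sigma = 3.0, 1.4, 0.11
A, B, G = 0.95, 0.05, 0.50

M = 20                                  # participants
k = rng.normal(K, sigma_K, M)           # k_sm ~ N(K_s, sigma_K)
a = trunc_normal_01(A, sigma, M)        # a_jsm ~ truncated N(A_js, sigma)
b = trunc_normal_01(B, sigma, M)        # b_jsm ~ truncated N(B_js, sigma)
g = trunc_normal_01(G, sigma, M)        # g_jsm ~ truncated N(G_js, sigma)

N, n_change = 5, 90                     # set size, change trials per condition
d = np.clip(k / N, 0.0, 1.0)            # storage probability, d = min(k/N, 1)
hits = rng.binomial(n_change, d * a + (1 - d) * g)  # Eq. (2), change trials
print(hits)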

Model validation

Because our conclusions depend entirely upon the inferences drawn from our model, we ran a number of validation analyses. The details of these are reported in full in the Supplementary Materials section. Briefly, we first report a simulation study that demonstrates that the design of our experiment allows us to estimate the parameters of our model with sufficient precision to discriminate between the two focal hypotheses. That is, our design allows us to determine whether changes in performance for more complicated stimuli are due to increased comparison errors or reduced capacity.

Second, we show that the comparison error parameter in the model captures what should be an increased rate of comparison errors in actual data. We report the results of an experiment in which the change between study and test stimuli, when it occurred, was either small or large. Critically, since all of the stimuli were colored squares, there should be no difference in capacity estimates for small- and large-change trials. Rather, we should expect the number of comparison errors to increase when the change between study and test items is smaller. We fit our model to the small- and large-change trials separately, and found large changes in the comparison error parameter, while the capacity parameter (and all other parameters) remained relatively constant between the two conditions. That is, we find that our model attributes the difference between small- and large-change conditions to the comparison error parameter, and not to the capacity parameter.

Model predictions

We focus our inference on the means of the population-level distributions, $K_s$, $A_{js}$, $B_{js}$, and $G_{js}$. The values of $K_s$ will tell us how many items of each stimulus type are stored in memory. $A_{js}$ and $B_{js}$ will tell us about the rate of comparison errors that occur for each of the different types of stimuli. The way that the K, A and B parameters change as a function of stimulus complexity will allow us to distinguish between our two hypotheses. According to Alvarez and Cavanagh (2004), we expect that K will be smaller for more complex items, while Awh et al. (2007) would predict that K will be constant for all stimuli, but that A and 1 − B will be smaller for more complex stimuli.

Our primary analysis will be to estimate parameters from the data. However, to facilitate a more qualitative interpretation of our data, we plot the predicted hit and false alarm rates for a range of parameter settings of our model in Fig. 3. In the top panels of the figure, we hold the capacity parameter constant and change the probability of a comparison error. If we imagine that moving from the top left to the top right panel is equivalent to increasing the complexity of a stimulus, then we see the pattern of data we expect based on Awh et al. (2007). The bottom panels of the figure demonstrate the effect of reducing the capacity of working memory. The difference between the top left and the bottom left panels shows the effect of increasing stimulus complexity according to Alvarez and Cavanagh (2004). The key difference between the two accounts is that when the capacity for complex stimuli is reduced, we expect to see a greater difference between the small set size and the larger set sizes. That is, we expect performance to remain good when there are only 2 stimuli, but to get much worse for the N = 5 and N = 8 conditions.
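The two signatures in Fig. 3 can also be generated numerically. In the Python sketch below (hypothetical parameter values), degrading the comparison error parameters while holding capacity fixed lowers performance at every set size, whereas shrinking capacity leaves the small set size comparatively intact and mainly hurts the larger set sizes.

import numpy as np

def rates(k, a, b, g=0.5, Ns=(2, 5, 8)):
    d = np.minimum(k / np.array(Ns, dtype=float), 1.0)
    return d * a + (1 - d) * g, d * b + (1 - d) * g  # hits, false alarms

# fixed capacity, more comparison errors (Awh et al., 2007)
print(rates(k=3.0, a=0.75, b=0.25))
# reduced capacity, few comparison errors (Alvarez & Cavanagh, 2004)
print(rates(k=1.5, a=0.95, b=0.05))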

Fig. 3. Predicted hit and false alarm rates for four different parameter settings of our model. The parameters in the top left panel reflect those that one would expect for simple color stimuli. The top right panel shows the predictions for more complex stimuli according to Awh et al. (2007). The bottom figures reflect the predictions for complex stimuli that are consistent with Alvarez and Cavanagh's (2004) account. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Results

Turning first to the model parameter estimates, Fig. 4 plots the posterior distributions for the mean of the population-level parameter distributions in our model. The posteriors are plotted as violin plots, which are composed of a central box-plot that is surrounded by a smoothed density. These plots represent the relative likelihood of the values that each parameter could take to account for the observed data. There are two critical results to take away from these posterior distributions. First, we find that the capacity estimates, K, for letters, words, and colors are consistent with the standard upper limit for short term memory of 3–4 items. However, the capacity for complex shapes is markedly lower, at around 2 items. This finding is consistent with previous arguments that there is a smaller capacity for more complex items (Alvarez & Cavanagh, 2004; Brady & Alvarez, 2015).

Second, we observe negligible changes in the comparison error parameters, A and B, across the different stimulus types. This result implies that items stored in memory, regardless of their complexity, yield the same probability of a correct or incorrect change response. More specifically, we see that changes to items that make it into memory are identified with near perfect accuracy.

Finally, though not of particular theoretical relevance, we see a typical pattern of guessing behavior. Participants appear to do something akin to probability matching, where they increase the rate at which they guess change as the proportion of change trials increases (as opposed to the more rational approach of maximizing, such as always guessing change when the probability of a change trial is greater than 50%). This pattern is observed for all stimulus types, though there seems to be an inflated rate of responding change to color stimuli.

To assess the fit of the model to the data we generated posterior predictive distributions. We do this by simulating hit and false alarm rates, defined by Eq. (1), using the values contained within the population-level posterior distributions. So, we generate distributions of hits and false alarms based upon our model's most likely parameter values. We then compare the posterior model predictions to the observed data. We estimated the observed hit and false alarm rates by fitting a hierarchical data model that simply estimates a binomial rate across all conditions. This ensures that the observed rates have also undergone the shrinkage imposed by hierarchical models, as did our computational model. As before, the model assumes that individual-participant hit and false alarm rates are drawn from population-level normal distributions.

Fig. 4. Posterior distributions for the population-level parameter means. The top panel displays the capacity means across each stimulus type; the middle panel displays the mean comparison error probability across both stimulus type and change proportion for change, A, and same, B, trials; the bottom panel displays the guessing rates across each level of change proportion for each stimulus type.
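In code, generating a posterior predictive distribution amounts to pushing posterior samples through Eq. (1) and then through a Binomial draw. The sketch below is a minimal illustration: the arrays stand in for the real MCMC samples, and all values are hypothetical.

import numpy as np

rng = np.random.default_rng(1)

# stand-ins for S MCMC samples of the population-level means
S = 2000
post_K = rng.normal(3.0, 0.3, S)
post_A = rng.beta(50, 2, S)
post_G = rng.beta(60, 60, S)

N, n_change = 5, 45                          # set size, change trials
d = np.clip(post_K / N, 0.0, 1.0)            # storage probability per sample
theta_c = d * post_A + (1 - d) * post_G      # Eq. (1), change trials
pred_hit_rate = rng.binomial(n_change, theta_c) / n_change
print(pred_hit_rate.mean(), np.percentile(pred_hit_rate, [2.5, 97.5]))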

Fig. 5 displays the empirical data (in white) and the model fits (in color) for each stimulus condition. Looking first at the data, we see that the shape stimuli show a different qualitative pattern to the other stimuli. Compared with the letter, word, and color stimuli, the shapes show a smaller difference between the N = 5 and N = 8 conditions than between the N = 2 and N = 5 conditions. Recall that this is the hallmark pattern of a reduction in capacity, based on Fig. 3.

The posterior predictive distributions in Fig. 5 suggest that the model provides a reasonable fit to the data. The model is able to capture the general pattern in the observed data, in terms of the effect of set size, change proportion, and stimulus type. Though the model certainly does not capture all of the patterns in the data, recall that we aim to use this model as a measurement tool. As such, we are more tolerant of some of the misfits, as we do not think that these speak directly to the measurement of comparison errors and capacity. For example, the model fails to capture the false alarm rates when p = 0.3 for color and shape stimuli. However, to foreshadow, we do not see such patterns in Experiment 2, suggesting that such misfits may not be a large problem.

Discussion

Our results indicate that participants stored fewer of the complex shape items than they did letters, words, or colors. However, interestingly, we find that participants made almost no comparison errors with any of the stimuli that were contained in memory. These results differ from previous suggestions that comparison errors provide the key limit upon capacity estimates for complex stimuli (Awh et al., 2007; Barton et al., 2009). That is, our results are more in line with the conclusions of Alvarez and Cavanagh (2004). What's more, we are able to draw these conclusions while having allowed for the possibility of comparison errors.

We now demonstrate the same empirical effect in a second change detection experiment. In the previous experiment we had participants complete a change detection task with stimuli that varied in complexity. In the change detection paradigm, participants are presented with all stimuli simultaneously, and so are allowed to choose how to allocate items into memory. However, participants only have a limited amount of time during which item encoding can be performed. For example, one could reason that there was not enough time to encode the complex stimuli, and so the number of items that could be stored was limited by encoding limitations.

In Experiment 2 we sought to replicate our previous findings, while allowing more time to view the study array when there are more items in the array. Specifically, we allowed 250 ms encoding time per item. Accordingly, when participants were shown an array of five items they had a total of 1250 ms (5 × 250 ms) encoding time. Additionally, we also incorporate the knowledge we have gained from Experiment 1 to update our computational model. When fitting the model to the data from Experiment 1 we had very little substantive knowledge about what values the model parameters might take. As a result we sought only to estimate the model parameters, and so placed uninformative priors on the population-level parameter means.

The resulting posterior distribution reflects the reallocation of credibility to the most likely parameter values. That is, it encapsulates what has been learned from the observed data and how we should revise our beliefs about the value of a specific model parameter. Moreover, this new information can be used to update the model specification. Specifically, we can use the posterior distributions estimated in Experiment 1 as priors on the population-level model parameters in Experiment 2 (Kary, Taylor, & Donkin, 2016). Using informed priors also allows us to perform Bayesian inferential tests on differences in the model parameters across conditions.

Experiment 2

Method

Twenty new participants were recruited into Experiment 2. All methodological details regarding participants, design, stimuli, and modeling were the same as in Experiment 1. The only procedural difference was the length of time the study items were presented to the participant. We allowed 250 ms encoding time per item within the study array. The total presentation time was thus equal to the number of study items in the array multiplied by 250 ms. Accordingly, for the N = [2, 5, 8] set size conditions, total presentation time was equal to 500 ms, 1250 ms, and 2000 ms, respectively.

Model updating

Our model is hierarchical, and so participant-level parameters are draws from population-level normal distributions. Each population-level distribution has a mean and a standard deviation parameter; for example, the population-level capacity parameter for a given stimulus type has a mean of $K_s$ and a standard deviation of $\sigma_K$. Recall that in Experiment 1, we placed uninformative priors on the mean and standard deviations of these normal distributions. For example, we placed a Uniform $U(0, 8)$ prior on the mean of the population-level capacity parameter, $K_s$. In Experiment 2, we will instead use informative priors, based upon the posterior distributions from Experiment 1.[2]

[2] It is worth noting that the pattern of parameter estimates, and the conclusions from this experiment, are practically identical when vague priors are used to analyze the data from Experiment 2. All of the materials necessary to carry out this analysis are available on the OSF.

We need an informative prior for the mean and standard deviation of each population-level normal distribution. First, for the means, we defined our informative priors by fitting Normal and Beta distributions to the population-level mean posterior distributions shown in Fig. 4. We assumed that the population-level mean capacity parameter, $K_s$, would be normally distributed. So, for each stimulus type, we fitted normal distributions to the posterior distributions shown in the top row of Fig. 4, and used the maximum-likelihood estimates as our prior. For example, for Letter stimuli, the best-fitting normal distribution to the posterior distribution plotted in the far left of the top row of Fig. 4 had a mean of 3.3 and a standard deviation of 0.35, and so our prior on $K_L$ was $N(3.3, 0.35)$. The prior distributions for all population-level mean capacity parameters are given in the top row of Table 1.


Fig. 5. Posterior predictive distribution for the comparison error model across each stimulus condition in Experiment 1. Each color band corresponds to a different set size condition: red = set size 2; green = set size 5; and blue = set size 8. Within each band there are three distinct clusters, each of which corresponds to the different change proportion conditions. The white data points are the mean hit and false alarm rates for the observed data, with 2 standard deviations indicated. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


The same basic approach applied when estimating the distributions placed on the population-level rate parameters (i.e., $A_j$, $B_j$, and $G_j$), only here the parameters must lie in the interval [0, 1]. We thus used a Beta distribution for each of these parameters. So, for the jth change proportion condition we assumed that $A_j$, $B_j$, and $G_j$ were distributed as a $\mathrm{Be}(a, b)$, where a and b were estimated by fitting the posterior distributions from Experiment 1. The full set of distributions used to inform the population-level mean parameters are contained in Table 1.

We also used informative priors for the population-level standard deviation parameters. We used normal distributions to characterize the posterior distributions for the variance parameters. We required only two such priors: one for the standard deviation on the rate parameters, $\sigma$, and one for the standard deviation of the capacity parameter, $\sigma_K$. For the variance of the rate parameters, the updated prior was $\sigma \sim N(0.11, 0.01)$, whereas the variance for the capacity parameter was $\sigma_K \sim N(1.43, 0.18)$.

Table 1
Prior distributions placed on the population-level mean parameters for Experiment 2.

Parameter   Letter               Word                 Colour                Shape
K           N(3.30, 0.35)        N(2.67, 0.40)        N(3.05, 0.35)         N(1.15, 0.46)
A_{0.3}     Be(51.55, 1.08)      Be(25.41, 1.23)      Be(51.55, 1.10)       Be(26.17, 1.30)
A_{0.5}     Be(69.64, 1.08)      Be(46.53, 1.13)      Be(71.16, 1.06)       Be(28.09, 1.45)
A_{0.7}     Be(59.33, 1.08)      Be(67.30, 1.05)      Be(82.11, 1.06)       Be(38.00, 1.16)
B_{0.3}     Be(1.10, 67.59)      Be(1.05, 77.23)      Be(1.07, 66.48)       Be(1.11, 52.03)
B_{0.5}     Be(1.08, 60.85)      Be(0.99, 48.10)      Be(1.12, 50.84)       Be(1.29, 31.13)
B_{0.7}     Be(1.25, 31.83)      Be(2.13, 24.77)      Be(1.47, 28.99)       Be(1.29, 27.17)
G_{0.3}     Be(71.05, 132.52)    Be(66.63, 145.42)    Be(107.34, 132.89)    Be(92.76, 179.66)
G_{0.5}     Be(120.01, 95.74)    Be(117.79, 112.19)   Be(140.81, 90.64)     Be(160.33, 129.14)
G_{0.7}     Be(112.97, 33.19)    Be(147.54, 70.13)    Be(82.74, 16.82)      Be(161.35, 56.93)

Note: N(μ, σ) denotes a Normal distribution and Be(a, b) denotes a Beta distribution.
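The prior-updating step described above reduces to fitting parametric densities to posterior samples. A minimal scipy sketch, assuming stand-in arrays in place of the Experiment 1 posterior samples (seeded here with values resembling the Letter entries of Table 1):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# stand-ins for Experiment 1 posterior samples of K (letters) and A_{0.3}
post_K = rng.normal(3.3, 0.35, 5000)
post_A = rng.beta(51.55, 1.08, 5000)

mu, sd = stats.norm.fit(post_K)  # maximum-likelihood Normal fit -> prior on K_s
a, b, _, _ = stats.beta.fit(post_A, floc=0, fscale=1)  # Beta fit -> prior on a rate
print(f"K prior: N({mu:.2f}, {sd:.2f}); rate prior: Be({a:.2f}, {b:.2f})")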

Bayesian hypothesis tests

We applied the Savage-Dickey density ratio test to assess the differences between the capacity posterior distributions, denoted as d, for each stimulus type. The Savage-Dickey test uses the prior and posterior distribution on d to test whether the difference is reliably different from zero. Accordingly, the test corresponds to a conventional two-tailed hypothesis test insofar as the null hypothesis assumes that there is no difference between the capacity posterior distributions, $H_0: d = 0$, and the alternative hypothesis assumes that there is a difference, $H_1: d \neq 0$. However, unlike conventional hypothesis tests we can obtain evidence for both the null and alternative hypotheses. This permits an evaluation of the relative odds that the data were generated by the null hypothesis relative to the alternative.

The prior placed on the population-level difference between stimulus capacities, p(d), was simply the difference between the priors placed on the population-level mean capacity for each stimulus type (i.e., $K_s$). So, as an example, for the capacity difference between letters and colors, denoted as $p(d_{LC}) = p(K_L) - p(K_C)$, where $p(K_L)$ and $p(K_C)$ are the informative priors as described in the previous section (also see Table 1), samples were first drawn from $p(K_L)$ and $p(K_C)$ and then the difference between each pair of samples was calculated. This procedure yields the prior for the difference in capacity between color and letter stimuli. Similarly, the posterior distribution on the capacity differences is the difference between the obtained posterior distributions for $K_L$ and $K_C$. Finally, we used a logspline density estimator to obtain the prior and posterior density estimates of the capacity differences (Wagenmakers, Lodewyckx, Kuriyal, & Grasman, 2010).

The Savage-Dickey density ratio is the ratio of the prior and posterior densities at the given point of interest. In the present case, the point of interest is d = 0. This ratio is called the Bayes factor and it quantifies the degree of support for the null hypothesis relative to the alternative, denoted here as $BF_{01}$. A Bayes factor greater than 1 provides support for the null hypothesis, whereas a Bayes factor less than 1 indicates support for the alternative. Conveniently, we may also express the degree of support for the alternative hypothesis relative to the null hypothesis in the following way: $BF_{10} = 1/BF_{01}$.
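A sampling-based sketch of the Savage-Dickey computation is given below. The paper's analysis uses an R logspline estimator; because d = 0 falls far in the tail here, where kernel estimates are unstable, a maximum-likelihood Normal fit to the samples stands in for the density estimator. The posterior samples are hypothetical stand-ins, so the resulting Bayes factor is illustrative only.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
S = 50_000

# prior on the letter-shape capacity difference, built as in the text:
# samples from p(K_L) minus samples from p(K_Shape), using Table 1 values
prior_d = rng.normal(3.30, 0.35, S) - rng.normal(1.15, 0.46, S)

# hypothetical stand-in for Experiment 2 posterior samples of the difference
post_d = rng.normal(2.40, 0.30, S)

# Savage-Dickey: BF01 = posterior density at d = 0 / prior density at d = 0,
# computed in log space for numerical stability
log_prior_at_0 = stats.norm.logpdf(0.0, *stats.norm.fit(prior_d))
log_post_at_0 = stats.norm.logpdf(0.0, *stats.norm.fit(post_d))
bf10 = np.exp(log_prior_at_0 - log_post_at_0)  # BF10 = 1 / BF01
print("BF10 =", bf10)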

Results

Fig. 6 displays the posterior distributions of the model parameters for each stimulus type. The immediate impression from the figure is that the population-level means exhibited a very similar pattern to the estimates from Experiment 1. There was, however, a slight increase in the capacity estimates across stimulus types. Nevertheless, we again see that the capacity for complex shapes was smaller, at around 2 items. Also, the comparison error parameters again remain fairly constant across all stimulus types. Our results also suggest that, while ostensibly participants may have been able to use the additional presentation time to encode a little more information into memory, they do not appear to use that extra time to remember a larger number of shapes at a cost of more comparison errors. That is, even with more time to encode items, the number of complex items that can be stored is smaller than that for simpler items, and this limit is below the assumed fixed capacity of 3–4 items. Crucially, the results from Experiment 2 replicate the results reported for Experiment 1.

Turning next to the Bayesian hypothesis tests, our immediate inferential focus was on the differences between the capacity estimates for complex shapes relative to all other stimulus types. The Bayes factors for all capacity differences are displayed in Table 2, though we present only the prior and posterior distributions for the shape stimuli comparisons in Fig. 7. It is worth noting that because we used informative priors, we expect that capacity is different for shapes and other stimuli – the prior distributions fall to the right of the dotted horizontal line in Fig. 7. The posterior distributions from Experiment 2 indicate a solidification of such beliefs, such that the posterior distributions shift slightly further away from 0. The posterior distributions also have smaller variance than the priors, indicating increased certainty about the difference between the capacity for shape and other stimuli. When the difference between prior and posterior at 0 was quantified via a Bayes factor, we found that the alternative hypothesis was 608 times, 62 times, and 16 times more likely than the null model (which expects no difference in capacity) for the letter, word, and color comparisons, respectively (cf. Table 2). These results provide decisive evidence for capacity differences between the shape and other stimuli.

Finally, we performed a posterior predictive check to assess the fit of the model predictions to the empirical data. The empirical data (white) and model fits (color) for Experiment 2 are displayed in Fig. 8. The data exhibited a very similar pattern to Experiment 1.

Fig. 6. Posterior distributions for the population-level parameter means. The top panel displays the capacity means across each stimulus type; the middle panel displays the mean comparison error probability across both stimulus type and change proportion for change, A, and same, B, trials; the bottom panel displays the guessing rates across each level of change proportion for each stimulus type.

Table 2
$BF_{10}$ for the posterior difference between capacity estimates across stimulus type.

          Letter   Word   Colour   Shape
Letter    0        1.37   2.68     608.84
Word      –        0      0.57     62.48
Colour    –        –      0        15.97
Shape     –        –      –        0

The model again appeared to capture the general pattern of results across both set size and stimulus type. Most critically, the model describes the attenuation in the hit and false alarm rates for set sizes 5 and 8 for shape stimuli. Analogous to Experiment 1, the model did not perfectly capture all the empirical data patterns. For example, we see that the model predicts worse performance than was observed for letter and word stimuli. This mis-prediction is likely because performance for letters and words was better in Experiment 2 than in Experiment 1, and our model used informative priors that were based on Experiment 1. That said, we do not think that these mis-predictions necessarily threaten the validity of our conclusions about capacity and comparison errors. For example, with the exception of the set size 2 condition for shapes, none of the misfits are systematic across the two experiments. That is, while the model may under-predict performance in a particular condition in Experiment 1, no such misfit appears in Experiment 2. Furthermore, the misfits relate less to the pattern of attenuation (or squashing) in the rates for the set size 5 and 8 conditions for the shape stimuli, which was the signature pattern of smaller capacity we highlighted in Fig. 3.

Fig. 7. Top Panel: Prior on the difference between the population-level mean capacity estimates across stimulus type in Experiment 2. These prior distributions are derived by taking the difference between the informative prior distributions placed upon the population-level means. Bottom Panel: Posterior distribution of the difference between the population-level mean capacity estimates across stimulus type. These distributions are derived by taking the difference between the posterior population-level mean estimates across stimulus type having conditioned on the data from Experiment 2.

General discussion

Our results suggest that changes in stimulus complexity produce negligible changes in the rate of comparison errors across stimulus types. Instead, the present findings indicate that item capacity is sensitive to changes in the complexity of the stimuli, and suggest that the original interpretation of the result in Alvarez and Cavanagh (2004) was correct – observers store fewer complex stimuli. Specifically, we find that when people undergo change detection tasks using perceptually simple stimuli (i.e., letters, words, and colors), capacity estimates tend to hover about the accepted item limit of approximately 3–4 items. However, when participants are subjected to stimuli that are more complex (i.e., abstract polygons) we find capacity estimates drop to between 1 and 2 items. Our results are thus more consistent with the conclusions made by Alvarez and Cavanagh (2004) and Brady and Alvarez (2015) regarding capacity limits for complex stimuli.

Our conclusions differ from those of Awh et al. (2007), who showed that cross-category change detection for complex stimuli was as good as change detection for simple stimuli. However, our experiment differs from that of Awh et al. (2007), in that participants only ever needed to remember items from just one stimulus category. In the Awh et al. (2007) experiments, perhaps when participants were asked to remember items from multiple categories, they encoded only enough information to be able to identify the particular type of stimulus. In that way, participants would have been able to make accurate cross-category decisions, but would have made more errors on within-category decisions (Brady & Alvarez, 2015). The participants in our experiments, however, appear to have encoded more information about a given stimulus, allowing for more accurate within-category decisions, but were thus only able to store a smaller number of those items. Such an explanation for our results suggests that participants are capable of allocating their mnemonic resources in a flexible manner. That is, they can store fewer items but with higher resolution, or more items with poorer precision.

Fig. 8. Posterior predictive distribution for the comparison error model across each stimulus condition in Experiment 2. Each color band corresponds to a different set size condition: red = set size 2; green = set size 5; and blue = set size 8. Within each band there are three distinct clusters, each of which corresponds to the different change proportion conditions. The white data points are the mean hit and false alarm rates for the observed data, with 2 standard deviations indicated. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Our results, on the surface, appear inconsistent with recent results reported in Jackson, Linden, Roberts, Kriegeskorte, and Haenschel (2015). Jackson et al. (2015) independently manipulated the similarity and complexity of visual stimuli, and found that it was only the similarity between stimuli, but not complexity, that increased the rate at which errors occur in a change detection task. However, we note that the capacity estimates for the stimuli used in this study hovered about 1.5–2 items, lower than the usual capacity of 4 items. As such, despite manipulating the relative complexity of their stimuli, we expect that Jackson et al. used a fairly complex set of visual stimuli. In other words, we expect that their simplest stimuli were as difficult to remember as the stimuli that are generally considered to be visually complex.

Our model makes no theoretical distinction between whether comparison errors occur due to an increase in similarity between items, or because the items become more ‘complex’. However, this point is irrelevant with respect to our fundamental conclusion. That is, if the complex stimuli had led our participants to commit comparison errors, those errors might have occurred because the shapes were more similar to one another, or because they were more visually complex. Our model is agnostic to such descriptions; moreover, we found no evidence for these types of errors in our data. If comparison errors did exist in the data, our model has a demonstrated capacity to measure them. Simply put, regardless of how one might describe the process underlying comparison errors, our model classified the poorer performance with complex stimuli as being due to fewer items being stored, and not to an increase in comparison errors.

It is worth stating explicitly that we are not proposing the model used here as a theoretical account of visual working memory capacity. Rather, the model was developed and used to distinguish between two clear and distinct explanations for an empirical effect. It was constructed as a direct instantiation of Awh et al.'s (2007) alternative explanation of the empirical result reported in Alvarez and Cavanagh (2004). We then used this model to learn that the original interpretation of the result in Alvarez and Cavanagh (2004) was correct – observers store fewer complex than simple stimuli. There remains, however, a question of what particular process gives rise to such differences in capacity estimates.
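To give a concrete sense of how capacity and comparison errors can, in principle, be separated by a model of this kind, consider the following deliberately simplified sketch. It assumes the probed item is in memory with probability min(k/N, 1), that a comparison involving an in-memory item goes wrong with a single flip rate e, and that out-of-memory trials are resolved by guessing ‘‘change” with probability g. These are simplifying assumptions of ours, not the hierarchical model reported in the paper.

```python
# A deliberately simplified sketch of a fixed-capacity model with a
# comparison-error parameter. NOT the paper's hierarchical Bayesian
# model: the single flip rate e and the guessing rate g are our
# assumptions for illustration.

def predicted_rates(k: float, set_size: int, e: float, g: float):
    """Return (hit rate, false alarm rate) for single-probe change detection.

    k: item capacity; e: probability that comparing an in-memory item
    gives the wrong answer; g: probability of guessing 'change' when
    the probed item is not in memory.
    """
    d = min(k / set_size, 1.0)        # P(probed item is in memory)
    hit = d * (1 - e) + (1 - d) * g   # change trial: correct comparison or lucky guess
    fa = d * e + (1 - d) * g          # same trial: comparison error or unlucky guess
    return hit, fa

print(predicted_rates(k=2.0, set_size=8, e=0.00, g=0.5))  # (0.625, 0.375)
print(predicted_rates(k=4.0, set_size=8, e=0.25, g=0.5))  # (0.625, 0.375) -- identical!
# At a single large set size, k and e trade off exactly in this toy model.
# At small set sizes every item is in memory, so the false alarm rate
# directly reflects e, which is what breaks the tie:
print(predicted_rates(k=2.0, set_size=2, e=0.00, g=0.5))  # (1.0, 0.0)
print(predicted_rates(k=4.0, set_size=2, e=0.25, g=0.5))  # (0.75, 0.25)
```

In this toy version, it is the joint constraint across set sizes (and, in our experiments, across change proportion conditions) that allows capacity and comparison-error parameters to be estimated separately.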

One potential explanation concerns the degree to which each stimulus type can be verbally encoded. While it is surely the case that the abstract shapes used here are not amenable to verbal encoding, it might be the case that categorical labels supplement encoding and improve performance with the color stimuli. Consistent with this idea, capacity for color stimuli was virtually identical to the capacity for letters and words. Accordingly, the capacity observed for complex shapes suggests that perhaps the true capacity of visual working memory is closer to 1–2 items, and that simpler stimuli readily exceed this limit via the recruitment of verbal encoding mechanisms.

With respect to this line of reasoning, a recent paper by Sense, Morey, Prince, Heathcote, and Morey (2016) demonstrated that verbal recoding of visual stimuli is unlikely to occur during change detection tasks. Sense and colleagues found that whether articulatory suppression was included or not during the study phase had no differential effect upon change detection performance with simple colors. Additionally, the null effect persisted regardless of whether study items were presented in an array or sequentially. A state-trace analysis further suggested that there was no interaction between articulatory suppression and performance. That is, performance could be adequately summarized by invoking a single latent dimension relating to visual memory. Sense et al. (2016) conclude that if individuals had employed any verbal strategies then these strategies had no demonstrable effect upon performance. In light of these findings it is unlikely that the differences between capacity estimates reported here are merely an artifact of comparing stimuli that differ in how well they can be verbalized. Instead, the complexity of the stimulus imposes a limit that is fundamentally different to the limit imposed by simpler items.

Our belief is that visual working memory is a continuous resource, but that not all items may be encoded into memory on a given trial (Donkin, Kary, Tahir, & Taylor, 2016; Donkin, Nosofsky, Gold, & Shiffrin, 2013; Donkin, Tran, & Nosofsky, 2014; Nosofsky & Donkin, 2016). That is, though we have presented a model that is firmly rooted within the ‘‘slots” framework, our position is that these changes in capacity estimates point to the allocation of a flexible resource. Simple stimuli require less of that resource, and so more of those stimuli can be encoded. Complex stimuli, however, require a larger amount of mnemonic resource to be encoded with the same precision, and so fewer items can be remembered. It is interesting to note that such an account does allow for the possibility that participants could allocate less resource to more complex items, and so be able to remember a greater number of complex items. Had participants used such an encoding strategy, then our results may have turned out to be in agreement with Awh et al. (2007). Thankfully, it appears that participants prefer to maintain high-precision representations of fewer complex items, thus allowing us to discriminate between our two competing hypotheses.
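As a toy illustration of this trade-off – with a resource budget and per-item encoding costs that are entirely hypothetical, not quantities estimated in the paper – the arithmetic looks like this:

```python
# A toy illustration of the flexible-resource intuition. The budget and
# per-item costs are our hypothetical assumptions, used only to show how
# a fixed resource yields fewer complex items at a fixed precision.

def items_encodable(budget: float, cost_per_item: float) -> int:
    """Number of items that can be encoded at a fixed target precision."""
    return int(budget // cost_per_item)

BUDGET = 12.0                          # hypothetical total mnemonic resource
print(items_encodable(BUDGET, 3.0))    # simple items, cheap to encode -> 4
print(items_encodable(BUDGET, 8.0))    # complex items, costly to encode -> 1
```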

An alternative explanation for our results may be provided by a ‘‘slots + averaging” account. According to the slots + averaging theory, it is possible to encode an item into multiple slots. An item that has been stored in multiple slots will yield a more precise representation. One could argue that participants choose to allocate multiple slots to more complex items, in order to gain a precise representation of those items. Though possible, we think that the distinction between slots and continuous resource models becomes largely semantic once such possibilities are introduced. If one allows resource to be allocated within a slots framework, then the resolution, or granularity, of that resource is simply a matter of degree. Further, slots + averaging models have not fared well when their quantitative predictions have been contrasted with continuous resource alternatives (e.g., van den Berg, Awh, & Ma, 2014; van den Berg, Shin, Chou, George, & Ma, 2012). Moreover, our Supplementary Materials contain an instantiation of this type of model. We found that the model provided virtually identical results and loaded the poorer performance for the complex stimuli onto the capacity parameter. The model also yielded noisier posterior predictions and did not provide an improved description of the empirical data.
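The averaging intuition behind that account can be made concrete: if each of m slots stores an independent noisy copy of an item, averaging the copies shrinks the representational noise by a factor of sqrt(m) (the slots + averaging proposal of Zhang & Luck, 2008). The noise values below are hypothetical.

```python
# Why storing one item in several slots yields a more precise
# representation: averaging m independent noisy copies shrinks the
# noise by sqrt(m). The starting SD of 20.0 is a hypothetical value.

import math

def averaged_sd(sigma: float, m_slots: int) -> float:
    """SD of the mean of m_slots independent copies, each with SD sigma."""
    return sigma / math.sqrt(m_slots)

for m in (1, 2, 4):
    print(m, round(averaged_sd(20.0, m), 1))  # 20.0, 14.1, 10.0
```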

In summary, we presented a model that allowed for the simultaneous estimation of capacity and comparison errors in a simple working memory task. Across two such change detection experiments, we found that participants were able to store fewer complex items than simpler objects. Our results are inconsistent with the notion of a single, fixed capacity of visual working memory.

Acknowledgements

Thank you to Robert Nosofsky, with whom the data from our pilot experiment were collected. All data and code are available on the OSF: https://osf.io/rhmu2/. Chris Donkin's contribution to this research was supported by the Australian Research Council (DP130100124; DE130100129).

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.jml.2016.09.004.


References

Alvarez, G. A., & Cavanagh, P. (2004). The capacity of visual short-term memory is set both by visual information load and by number of objects. Psychological Science, 15(2), 106–111.

Awh, E., Barton, B., & Vogel, E. K. (2007). Visual working memory represents a fixed number of items regardless of complexity. Psychological Science, 18, 622–628.

Barton, B., Ester, E. F., & Awh, E. (2009). Discrete resource allocation in visual working memory. Journal of Experimental Psychology: Human Perception and Performance, 35, 1359–1367.

Brady, T. F., & Alvarez, G. A. (2015). No evidence for a fixed object limit in working memory: Spatial ensemble representations inflate estimates of working memory capacity for complex objects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41, 921.

Cowan, N., Blume, C., & Saults, J. S. (2013). Attention to attributes and objects in working memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39, 731–747.

Cowan, N., Elliott, E. M., Saults, J. S., Morey, C. C., Mattox, S., Hismjatullina, A., et al. (2005). On the capacity of attention: Its estimation and its role in working memory and cognitive aptitudes. Cognitive Psychology, 51, 42–100.

Donkin, C., Kary, A., Tahir, F., & Taylor, R. (2016). Resources masquerading as slots: Flexible allocation of visual working memory. Cognitive Psychology, 85, 30–42.

Donkin, C., Nosofsky, R. M., Gold, J., & Shiffrin, R. M. (2013). Fixed slots models of visual working memory response times. Psychological Review, 120, 873–902.

Donkin, C., Tran, S., & Nosofsky, R. M. (2014). Landscaping analyses of the ROC predictions of discrete-slots and signal-detection models of visual working memory. Attention, Perception & Psychophysics, 76, 2103–2116.

Fougnie, D., Asplund, C. L., & Marois, R. (2010). What are the units of storage in visual working memory? Journal of Vision, 10, 1–11.

Fukuda, K., Awh, E., & Vogel, E. K. (2010). Discrete capacity limits in visual working memory. Current Opinion in Neurobiology, 20, 177–182.

Hardman, K. O., & Cowan, N. (2015). Remembering complex objects in visual working memory: Do capacity limits restrict objects or features? Journal of Experimental Psychology: Learning, Memory, and Cognition, 41, 235–247.

Jackson, M. C., Linden, D., Roberts, M. V., Kriegeskorte, N., & Haenschel, C. (2015). Similarity, not complexity, determines visual working memory performance. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41, 1882–1892.

Kary, A., Taylor, R., & Donkin, C. (2016). Using Bayes factors to test the predictions of models: A case study in visual working memory. Journal of Mathematical Psychology, 72, 210–219.

Kucera, H., & Francis, N. (1967). Computational analysis of present-day American English. Brown University Press.

Luck, S. J., & Vogel, E. K. (1997). The capacity of visual working memory for features and conjunctions. Nature, 390, 279–281.

Ma, W. J., Husain, M., & Bays, P. M. (2014). Changing concepts of working memory. Nature Neuroscience, 17, 347–356.

Morey, R. D., Morey, C. C., Brisson, B., & Tremblay, S. (2012). A critical evaluation of c as a measure of mnemonic resolution. Journal of Experimental Psychology: Human Perception and Performance, 38(4), 1069.

Nosofsky, R. M., & Donkin, C. (2016). Response-time evidence for mixed memory states in a sequential-presentation change-detection task. Cognitive Psychology, 84, 31–62.

Oberauer, K., & Eichenberger, S. (2013). Visual working memory declines when more features must be remembered for each object. Memory & Cognition, 41, 1212–1227.

Olson, I. R., & Jiang, Y. (2002). Is visual short-term memory object based? Rejection of the ‘‘strong-object” hypothesis. Perception & Psychophysics, 64, 1055–1067.

Rouder, J. N., Province, J. M., Swagman, A. R., & Thiele, J. (submitted for publication). From ROC curves to psychological theory. http://pcl.missouri.edu/sites/default/files/p_6.pdf, http://pcl.missouri.edu/sites/default/files/sup.pdf.

Scolari, M., Vogel, E. K., & Awh, E. (2008). Perceptual expertise enhances the resolution but not the number of representations in working memory. Psychonomic Bulletin & Review, 15(1), 215–222.

Sense, F., Morey, C. C., Prince, M., Heathcote, A., & Morey, R. D. (2016). Opportunity for verbalization does not improve visual change detection performance: A state-trace analysis. Behavior Research Methods, 1–10.

Umemoto, A., Scolari, M., Vogel, E. K., & Awh, E. (2010). Statistical learning induces discrete shifts in the allocation of working memory resources. Journal of Experimental Psychology: Human Perception and Performance, 36(6), 1419.

van den Berg, R., Awh, E., & Ma, W. J. (2014). Factorial comparison of working memory models. Psychological Review, 121, 129–149.

van den Berg, R., Shin, H., Chou, W. C., George, R., & Ma, W. J. (2012). Variability in encoding precision accounts for visual short-term memory limitations. Proceedings of the National Academy of Sciences, 109, 8780–8785.

Vogel, E. K., Woodman, G. F., & Luck, S. J. (2001). Storage of features, conjunctions, and objects in visual working memory. Journal of Experimental Psychology: Human Perception and Performance, 27, 92–114.

Wagenmakers, E. J., Lodewyckx, T., Kuriyal, H., & Grasman, R. (2010). Bayesian hypothesis testing for psychologists: A tutorial on the Savage–Dickey method. Cognitive Psychology, 60, 158–189.

Wilken, P., & Ma, W. J. (2004). A detection theory account of change detection. Journal of Vision, 4, 1120–1135.

Zhang, W., & Luck, S. J. (2008). Discrete fixed-resolution representations in visual working memory. Nature, 453(7192), 233–235.