A stochastic model for free recall



  PSYCHOMETRIKA, VOL. 27, NO. 2, JUNE 1962





    J. E. KEITH SMITH


    A statistical model for verbal learning is presented and tested against experimental data. The model describes a Markov process with a realizable absorbing state, allowing complete learning on some finite trial as well as imperfect retention prior to this trial.

    This paper describes a probabilistic model for verbal learning. The reason for adding another such model to the number already available [2] is that none of the earlier models adequately describes some recent experimental data on free recall. In our experiment, subjects looked at a series of 48 words, presented one at a time, and then attempted to recall them in any order they chose. Most of the nine subjects recited 12 such lists, each for six trials. The total number of lists recited was 105, and the total number of subject-words was therefore 105 × 48 = 5040. The data were self-consistent and reliable enough to make us dissatisfied with the models that would not fit them, and to motivate us to find a better alternative.

    The experiment, described in detail in [9], was similar to one reported by Bruner, Miller, and Zimmerman [1]. It was in fact performed partly in order to replicate their data, which have been fitted with linear-operator and set-theoretical models by Bush and Mosteller [3] and by Miller and McGill [7], respectively. Our experimental procedure differed from that of Bruner, Miller, and Zimmerman in two ways: our subjects each learned several lists of words rather than just one, and instead of listening to the words they looked at them. We have not attempted to discover which of these variations may be responsible for certain differences in our data.

    Most conspicuous among these differences is the proportion of words recalled for the first time on each trial. The stochastic models mentioned above predict a geometric distribution for these data. The variance of the distribution that we observed, however, was so large as to render a geometric distribution implausible. At first we thought that this unexpected finding might stem from differences (i) between our subjects in their ability to learn words or (ii) between our words in their ability to be learned. The excessive variance remained, however, even when each subject's data were analyzed separately. Moreover, no relation was found between the average trial on which a word was first recalled and its frequency of usage in printed English, as estimated by Thorndike and Lorge [8].

    *This work was carried out while the author was at Lincoln Laboratory, Massachusetts Institute of Technology.

    †Operated with support from the U. S. Army, Navy, and Air Force.

    We were thus unable to find any obvious artifactual basis for the discrepancy between our distribution of trials to first recall and that predicted by the earlier models. Therefore we decided to suppose that our distribution was in fact generated by a process different from the one-parameter process assumed by the latter. We did not have to search far in order to find an alternative hypothesis: a straightforward two-stage process was sufficient to account for the distribution of first recalls. A third parameter was subsequently found necessary to describe retention after initial recall.

    The following description will be expressed in terms of three hypothetical processes that we have found helpful in understanding the data. These processes were suggested by the three parameters of the model. No attempt has been made to identify them with any classical psychological functions: they cannot be differentiated empirically until we discover how small variations in the experimental procedure affect the data and thus the parameters of the model. The reader should bear in mind, however, that the empirical significance of the model's parameters is not tested by how well the model describes the present set of data. We shall here restrict ourselves entirely to the descriptive problem and leave open the question of the model's generality. We present the following interpretation of the three parameters principally for its heuristic value.

    TABLE 1

    Relative Frequency of Recall as a Function of the Number of Previous Consecutive Recalls

    Previous consecutive recalls    1       2       3       4       5
    Proportion recalled             0.797   0.879   0.914   0.968   0.958

    One of the three processes, which we call labeling, occurs with probability $\lambda$ on any trial, and is irreversible. Labeling, in other words, need occur only once in order for a word to be recalled for the first time. Another process, selecting, is assumed to occur with probability $\sigma$ on each trial. It is as though selecting a word were to rehearse it, and labeling it, to find a mnemonic association for it. Blind rehearsal is ineffective, but once a word has acquired a mnemonic tag it is recalled after every trial on which it is rehearsed (or attended to, or selected). A word may be either labeled, or selected, or both, on any trial. In order to be recalled for the first time after a given trial, the word must have been selected on that trial. It must also have been labeled on that trial, or it must have been labeled (but not yet selected) on some previous trial. A word that is selected on trial $t$, with probability $\sigma$, but not yet labeled, with probability $(1-\lambda)^t$, will not yet be recalled. On the other hand, a word that is labeled on trial $t$, with probability $\lambda(1-\lambda)^{t-1}$, will not be recalled until it is selected, with probability $\sigma$, either on that trial or on some subsequent trial. This word is then recalled again after every trial on which it is selected.
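    The two-stage labeling and selecting process can be checked by simulation. The Python sketch below is our illustration, not the author's procedure: it draws labeling and selecting events trial by trial for a single word and records the trial of first recall. The values $\lambda = 0.42$ and $\sigma = 0.60$ are the estimates adopted later in the paper. The comparison value $1/\lambda + 1/\sigma - 1$ is our own derivation, not a result stated in the text: it is the mean of a geometric wait for labeling plus a geometric wait for selection counted from the labeling trial.

```python
import random


def first_recall_trial(lam: float, sigma: float, rng: random.Random) -> int:
    """Trial on which one word is first recalled under labeling + selecting."""
    labeled = False
    trial = 0
    while True:
        trial += 1
        # Labeling occurs with probability lam and is irreversible.
        if not labeled and rng.random() < lam:
            labeled = True
        # Selecting occurs with probability sigma on every trial; the word is
        # recalled on the first trial on which it is selected after labeling.
        if labeled and rng.random() < sigma:
            return trial


def simulate_first_recalls(lam: float, sigma: float, n_words: int, seed: int = 0) -> list[int]:
    rng = random.Random(seed)
    return [first_recall_trial(lam, sigma, rng) for _ in range(n_words)]


if __name__ == "__main__":
    lam, sigma = 0.42, 0.60
    trials = simulate_first_recalls(lam, sigma, 200_000)
    print("simulated mean trial of first recall:", sum(trials) / len(trials))
    print("1/lam + 1/sigma - 1:", 1 / lam + 1 / sigma - 1)
```

    Because the two waits add, the distribution of first recalls is not geometric, which is the point of introducing the second stage.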

    This formulation accounts for the distribution of trials to first recall. The model as it stands, however, implies that the conditional probability of recalling an item once it has been recalled should simply be $\sigma$, the probability of its being selected, no matter how often this particular item has been recalled. (The model so far implies in addition that the total proportion of items recalled on each successive trial should approach $\sigma$ rather than unity.) From Table 1 it is clear, however, that the probability of recall is an increasing function of the number of previous consecutive recalls. Consequently, a third process has to be invoked. This process we call fixing. It is assumed that, on any trial on which an item is recalled, it is fixed with probability $\varphi$. Once fixed, this item will be recalled on every subsequent trial, regardless of whether it is selected. Before it has been fixed, on the other hand, it must be selected in order to be recalled. Thus it is as though each item in a list will sooner or later become permanently fixed in the learner's memory. It will then always be recalled even when it has not been rehearsed on a particular trial.
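    The effect of fixing on repeated recall can be made concrete. A word recalled on $k$ consecutive trials, counting from its first recall, is either already fixed (and certain to be recalled again) or still unfixed (and recalled again only if selected), so the conditional probability of another recall rises with $k$ toward 1. The sketch below is our illustration of that reasoning; $\varphi = 0.3$ is a hypothetical value, since no estimate of $\varphi$ appears in this part of the paper, and the numbers are not meant to reproduce Table 1.

```python
def recall_after_run(sigma: float, phi: float, max_run: int) -> list[float]:
    """P(recalled on the next trial | recalled on k consecutive trials,
    counting from first recall), for k = 1..max_run."""
    # Joint probabilities of (state, recalled on every trial of the run so far).
    fixed = phi           # fixed on the trial of first recall
    unfixed = 1.0 - phi   # recalled on that trial but not fixed
    probs = []
    for _ in range(max_run):
        run_prob = fixed + unfixed
        # Fixed words are always recalled; unfixed words only if selected.
        recalled_next = fixed + unfixed * sigma
        probs.append(recalled_next / run_prob)
        # Extend the run by one recall: a selected, unfixed word is fixed w.p. phi.
        fixed, unfixed = fixed + unfixed * sigma * phi, unfixed * sigma * (1.0 - phi)
    return probs


if __name__ == "__main__":
    for k, p in enumerate(recall_after_run(sigma=0.60, phi=0.3, max_run=5), start=1):
        print(k, round(p, 3))
```

    With $\varphi = 0$ every entry collapses to $\sigma$, which is the constant conditional probability the two-process model implied.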

    A word may accordingly be in any one of five states after a given trial.

    1. It has not yet been labeled.
    2. It has been labeled but not yet selected.
    3. It has been labeled and was selected (and therefore recalled) on this trial, but has not yet been fixed.
    4. It was recalled but not fixed on some previous trial, and it was not selected (and therefore not recalled) on this trial.
    5. It has been fixed, either on this trial or on some previous trial.

    All words are initially in state 1. All of them eventually end up in state 5. As far as the formal properties of the model are concerned, state 4 is exactly equivalent to state 2. In applying the model, however, it will be necessary to distinguish between a word that has been forgotten and one that has not yet been recalled. We have therefore distinguished between these two formally identical states.

    FIGURE 1
    Directed Graph of the Hypothetical Learning Process

    The states are defined as follows: (1) not labeled (not yet recalled), (2) labeled, not processed (not yet recalled), (3) processed, not stored (recalled), (4) not processed, not stored (forgotten), (5) stored (recalled).

    The five states are represented by the numbered circles in Fig. 1. The arrows here denote the paths open to a word on a particular trial. According to this diagram, a word may go from state 1 (not labeled) to state 2 (labeled but not selected), to state 3 (recalled but not fixed), or to state 5 (fixed). It may similarly go from state 2 to state 3 or state 5. A word in state 3 (recalled but not fixed) may go to state 4 (forgotten) or to state 5 (fixed). A word in state 4 (forgotten) may move to state 3 (recalled) or to state 5 (recalled and fixed). A word in any one of states 1 through 4, furthermore, may also remain in that state on a given trial. A word in state 5 always remains in this state. The present model, then, describes a five-state Markov process with an absorbing state (state 5). All words eventually reach this state, which represents perfect retention. For a general discussion of Markovian models in psychology, see Miller [6].
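    The directed graph can be written as a table of one-trial transition probabilities. The Python sketch below is our transcription of Fig. 1, with each arrow's probability built from the three processes (labeling $\lambda$, selecting $\sigma$, fixing $\varphi$); the parameter values, including a hypothetical $\varphi = 0.3$, are illustrative only. It checks that the probabilities leaving each state sum to one and samples one word's path until it is absorbed in state 5.

```python
import random

State = int
Transitions = dict[State, dict[State, float]]


def transition_table(lam: float, sigma: float, phi: float) -> Transitions:
    """One-trial transition probabilities for the five states of Fig. 1."""
    return {
        1: {1: 1 - lam, 2: lam * (1 - sigma), 3: lam * sigma * (1 - phi), 5: lam * sigma * phi},
        2: {2: 1 - sigma, 3: sigma * (1 - phi), 5: sigma * phi},
        3: {3: sigma * (1 - phi), 4: 1 - sigma, 5: sigma * phi},
        4: {4: 1 - sigma, 3: sigma * (1 - phi), 5: sigma * phi},
        5: {5: 1.0},  # absorbing: a fixed word stays fixed
    }


def sample_path(table: Transitions, rng: random.Random, max_trials: int = 1000) -> list[State]:
    """States occupied by one word, starting in state 1, until absorption in state 5."""
    state, path = 1, [1]
    while state != 5 and len(path) <= max_trials:
        r, cumulative = rng.random(), 0.0
        for nxt, p in table[state].items():
            cumulative += p
            if r < cumulative:
                break
        # If rounding leaves r >= cumulative, the loop ends on the last outcome.
        state = nxt
        path.append(state)
    return path


if __name__ == "__main__":
    table = transition_table(lam=0.42, sigma=0.60, phi=0.3)
    for s, row in table.items():
        print(s, round(sum(row.values()), 12))
    print(sample_path(table, random.Random(1)))
```

    Rows 2 and 4 have the same shape (stay with $1-\sigma$, go to 3 or 5 otherwise), which is the formal equivalence of states 2 and 4 noted in the text.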


    According to the present hypothesis, the proportion of words that have not yet been labeled (and have therefore not yet been recalled) on trial i is given by

    $$p_{i,1} = (1-\lambda)\,p_{i-1,1}. \tag{1}$$

    The proportion that is labeled but not yet selected, and thus not yet recalled, by this trial is

    $$p_{i,2} = (1-\sigma)\,p_{i-1,2} + \lambda(1-\sigma)\,p_{i-1,1}. \tag{2}$$

    The proportion recalled on this trial but not yet fixed is

    $$p_{i,3} = \sigma(1-\varphi)\left(p_{i-1,2} + p_{i-1,3} + p_{i-1,4}\right) + \lambda\sigma(1-\varphi)\,p_{i-1,1}. \tag{3}$$

    The proportion forgotten on this trial (recalled at least once before but not yet fixed, and not selected on this trial) is

    $$p_{i,4} = (1-\sigma)\left(p_{i-1,3} + p_{i-1,4}\right). \tag{4}$$

    Finally, the proportion fixed by this trial is

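    $$p_{i,5} = p_{i-1,5} + \sigma\varphi\left(p_{i-1,2} + p_{i-1,3} + p_{i-1,4}\right) + \lambda\sigma\varphi\,p_{i-1,1}. \tag{5}$$

    This system of equations may be written in matrix notation as follows:

    $$\begin{bmatrix} p_{i,5}\\ p_{i,4}\\ p_{i,3}\\ p_{i,2}\\ p_{i,1} \end{bmatrix} = \begin{bmatrix} 1 & \sigma\varphi & \sigma\varphi & \sigma\varphi & \lambda\sigma\varphi\\ 0 & 1-\sigma & 1-\sigma & 0 & 0\\ 0 & \sigma(1-\varphi) & \sigma(1-\varphi) & \sigma(1-\varphi) & \lambda\sigma(1-\varphi)\\ 0 & 0 & 0 & 1-\sigma & \lambda(1-\sigma)\\ 0 & 0 & 0 & 0 & 1-\lambda \end{bmatrix} \begin{bmatrix} p_{i-1,5}\\ p_{i-1,4}\\ p_{i-1,3}\\ p_{i-1,2}\\ p_{i-1,1} \end{bmatrix}. \tag{6}$$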

    Let $T$ denote this matrix of transitional probabilities, and let $p_i$ denote the column vector of state probabilities on trial $i$, with the states listed in the order 5, 4, 3, 2, 1. Then $T p_{i-1} = p_i$. Before the learning trials begin, all words are in state 1. The initial distribution of probabilities $p_0$ is thus the column vector $[0, 0, 0, 0, 1]'$. Given this initial vector, the state probabilities on trial $i$ are
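    The recursion $T p_{i-1} = p_i$ is easy to iterate numerically. The sketch below is ours, written in plain Python without external libraries: it builds $T$ from equations (1) to (5) with rows and columns in the order 5, 4, 3, 2, 1, the order implied by $p_0 = [0, 0, 0, 0, 1]'$, and applies it trial by trial. The value $\varphi = 0.3$ is hypothetical.

```python
ORDER = (5, 4, 3, 2, 1)  # row/column order of T implied by p_0 = [0, 0, 0, 0, 1]'


def transition_matrix(lam: float, sigma: float, phi: float) -> list[list[float]]:
    """T such that p_i = T p_{i-1}; column c gives the fate of a word in state ORDER[c]."""
    s, f = sigma, phi
    return [
        # from state: 5   4             3             2             1
        [1.0, s * f,       s * f,       s * f,       lam * s * f],        # to state 5
        [0.0, 1 - s,       1 - s,       0.0,         0.0],                # to state 4
        [0.0, s * (1 - f), s * (1 - f), s * (1 - f), lam * s * (1 - f)],  # to state 3
        [0.0, 0.0,         0.0,         1 - s,       lam * (1 - s)],      # to state 2
        [0.0, 0.0,         0.0,         0.0,         1 - lam],            # to state 1
    ]


def iterate(T: list[list[float]], n_trials: int) -> list[list[float]]:
    """State-probability vectors p_0, ..., p_n, in the order given by ORDER."""
    p = [0.0, 0.0, 0.0, 0.0, 1.0]  # every word starts in state 1
    history = [p]
    for _ in range(n_trials):
        p = [sum(T[r][c] * p[c] for c in range(5)) for r in range(5)]
        history.append(p)
    return history


if __name__ == "__main__":
    T = transition_matrix(lam=0.42, sigma=0.60, phi=0.3)
    for i, p in enumerate(iterate(T, 6)):
        print(i, dict(zip(ORDER, (round(x, 4) for x in p))))
```

    Each column of $T$ sums to one, so every $p_i$ remains a probability vector.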

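    $$\begin{aligned}
    p_{i,5} &= 1-(1-\lambda)^{i+1}-\frac{\lambda}{\lambda-\sigma\varphi}\left[(1-\sigma\varphi)^{i+1}-(1-\lambda)^{i+1}\right],\\
    p_{i,4} &= \lambda(1-\sigma)\left[\frac{(1-\sigma\varphi)^{i}}{\lambda-\sigma\varphi}+\frac{(1-\sigma)^{i}}{\sigma-\lambda}-\frac{\sigma(1-\varphi)(1-\lambda)^{i}}{(\lambda-\sigma\varphi)(\sigma-\lambda)}\right],\\
    p_{i,3} &= \frac{\lambda\sigma(1-\varphi)}{\lambda-\sigma\varphi}\left[(1-\sigma\varphi)^{i}-(1-\lambda)^{i}\right],\\
    p_{i,2} &= \frac{\lambda(1-\sigma)}{\sigma-\lambda}\left[(1-\lambda)^{i}-(1-\sigma)^{i}\right],\\
    p_{i,1} &= (1-\lambda)^{i}.
    \end{aligned} \tag{7}$$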
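    As a check on the closed forms for the five state probabilities, the sketch below compares them with the one-trial recursion (1) to (5) over the first several trials. It is our verification code, not part of the paper, and uses a hypothetical $\varphi = 0.3$. The closed forms assume $\sigma \neq \lambda$ and $\lambda \neq \sigma\varphi$, since those differences appear in denominators.

```python
def recursion(lam: float, sigma: float, phi: float, n: int) -> list[dict[int, float]]:
    """State probabilities p_i for i = 0..n from the one-trial equations (1)-(5)."""
    s, f = sigma, phi
    p = {1: 1.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0}
    history = [p]
    for _ in range(n):
        q = p  # probabilities on the previous trial
        unfixed = q[2] + q[3] + q[4]
        p = {
            1: (1 - lam) * q[1],
            2: (1 - s) * q[2] + lam * (1 - s) * q[1],
            3: s * (1 - f) * unfixed + lam * s * (1 - f) * q[1],
            4: (1 - s) * (q[3] + q[4]),
            5: q[5] + s * f * unfixed + lam * s * f * q[1],
        }
        history.append(p)
    return history


def closed_form(lam: float, sigma: float, phi: float, i: int) -> dict[int, float]:
    """Closed-form state probabilities on trial i; needs sigma != lam, lam != sigma*phi."""
    s, f = sigma, phi
    a, b, c = 1 - lam, 1 - s, 1 - s * f  # the three geometric ratios
    d = lam - s * f
    return {
        1: a**i,
        2: lam * b / (s - lam) * (a**i - b**i),
        3: lam * s * (1 - f) / d * (c**i - a**i),
        4: lam * b * (c**i / d + b**i / (s - lam) - s * (1 - f) * a**i / (d * (s - lam))),
        5: 1 - a ** (i + 1) - lam / d * (c ** (i + 1) - a ** (i + 1)),
    }


if __name__ == "__main__":
    lam, sigma, phi = 0.42, 0.60, 0.3
    for i, p in enumerate(recursion(lam, sigma, phi, 8)):
        cf = closed_form(lam, sigma, phi, i)
        print(i, f"max gap {max(abs(p[k] - cf[k]) for k in p):.1e}")
```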

    Estimation of the Parameters

    The main consideration that led to this model was the distribution of trials to first recall, which is shown in Fig. 2. According to the model, the probability of first recall on trial $i$, $F_i$, is given by $p_{i-1,1} + p_{i-1,2} - p_{i,1} - p_{i,2}$, which by (7) is

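    $$F_i = \begin{cases} \dfrac{\lambda\sigma}{\sigma-\lambda}\left[(1-\lambda)^i-(1-\sigma)^i\right] & \text{when } \sigma \neq \lambda,\\[2ex] i\lambda^2(1-\lambda)^{i-1} & \text{when } \sigma = \lambda. \end{cases} \tag{8}$$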
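    Equation (8) can be evaluated directly. The sketch below is ours: it computes $F_i$ for the two parameter settings discussed in the next paragraph, and the checks confirm that the probabilities sum to one over a long run of trials, that the expression is unchanged when $\lambda$ and $\sigma$ are interchanged, and that the general form approaches the $\sigma = \lambda$ case as the two parameters converge.

```python
def first_recall_prob(lam: float, sigma: float, i: int) -> float:
    """F_i from equation (8): probability that a word is first recalled on trial i."""
    if abs(sigma - lam) < 1e-12:
        # Limiting case sigma == lam (a negative binomial form).
        return i * lam**2 * (1 - lam) ** (i - 1)
    return lam * sigma / (sigma - lam) * ((1 - lam) ** i - (1 - sigma) ** i)


if __name__ == "__main__":
    for lam, sigma in [(0.495, 0.495), (0.42, 0.60)]:
        dist = [first_recall_prob(lam, sigma, i) for i in range(1, 7)]
        print(lam, sigma, [round(f, 3) for f in dist])
```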

    Equation (8) describes a negative binomial distribution in the special case $\sigma = \lambda$. The minimum chi-square estimates of $\lambda$ and $\sigma$ based on the data shown in Fig. 2 are $\lambda = \sigma = 0.495$. This is the center of a confidence ellipse within which $\chi^2$ is less than 9.5, the 5-percent significance level. The extreme values of the ellipse are reached at $\sigma = 0.60$, $\lambda = 0.42$, or vice versa, since (8) is symmetric in $\sigma$ and $\lambda$.

    TABLE 2

    Proportion of Words Forgotten on Trial $t-1$ but Recalled on Trial $t$ as a Function of the Number of Previous Recalls ($j$)
    (Number of occurrences in parentheses)

    The final estimates of $\lambda$ and $\sigma$ were chosen so as to be maximally consistent with the data shown in Table 2. Here each entry represents the transitional probability for the recall of a word on trial $t$, given that it was forgotten on trial $t-1$ after having been recalled $j$ times previously ($j \ge 1$). For the various combinations of $t$ and $j$, these relative frequencies range from .54 to .79. Now, the present model predicts that a word which has been recalled at least once, but has then been forgotten on one or more consecutive trials, will be recalled again on the next trial with probability $\sigma$. A word that has been forgotten is in state 4, and its chances of moving either to state 3 or to state 5 on the next trial are $\sigma(1-\varphi)$ and $\sigma\varphi$, respectively; these sum to $\sigma$. The entries in Table 2, then, are estimates of $\sigma$. They are all larger than .495, which was found to be the minimum chi-square estimate for this parameter. Therefore, we chose the largest estimate of $\sigma$ consistent with the first-recall data, or .60. The corresponding value of $\lambda$ is in this case .42.

    According to (3) and (5), the entries in Table 2 should be independent of $j$, the number of times a word has been recalled previously, as well as of $t$, the trial on which it is recalled again. It is evident, however, that these proportions are greater for $j \ge 2$ than for $j = 1$. The data are in this respect at variance with the model.
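    The model's prediction for Table 2 can be illustrated by simulation. The sketch below is ours: it runs many words through six trials, tallies, for each $j$, how often a word forgotten on trial $t-1$ after $j$ earlier recalls is recalled on trial $t$, and returns those proportions with their counts. Under the model each proportion should lie close to $\sigma$ whatever $j$ is. The parameter values, including a hypothetical $\varphi = 0.3$, are illustrative.

```python
import random
from collections import defaultdict


def relapse_recall_rates(lam, sigma, phi, n_words, n_trials=6, seed=1):
    """Proportion recalled on trial t among words forgotten on trial t-1
    after j >= 1 earlier recalls, keyed by j, plus the counts behind them."""
    rng = random.Random(seed)
    hits: dict[int, int] = defaultdict(int)
    counts: dict[int, int] = defaultdict(int)
    for _ in range(n_words):
        labeled = fixed = prev_recalled = False
        n_recalls = 0
        for t in range(1, n_trials + 1):
            if not labeled and rng.random() < lam:
                labeled = True
            selected = rng.random() < sigma
            recalled = fixed or (labeled and selected)
            # A recalled word that is not yet fixed becomes fixed w.p. phi.
            if recalled and not fixed and rng.random() < phi:
                fixed = True
            # Table 2 condition: forgotten on t-1 after j >= 1 earlier recalls.
            if t > 1 and not prev_recalled and n_recalls >= 1:
                counts[n_recalls] += 1
                hits[n_recalls] += recalled
            n_recalls += recalled
            prev_recalled = recalled
    rates = {j: hits[j] / counts[j] for j in sorted(counts)}
    return rates, dict(counts)


if __name__ == "__main__":
    rates, counts = relapse_recall_rates(0.42, 0.60, 0.3, 100_000)
    for j in rates:
        print(j, round(rates[j], 3), counts[j])
```

    A forgotten word cannot be fixed, so its next recall depends only on selection; that is why the simulated rates do not vary with $j$, unlike the observed proportions.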

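    The next characteristic of the data that we examined was the learning curve, the proportion of recalls as a function of the number of trials. This function appears in Fig. 3. From the model, the proportion of recalls on trial $i$, $R_i$, is given by $p_{i,3} + p_{i,5}$, which by (7) is

    $$R_i = 1 - \frac{\lambda(1-\sigma)}{\lambda-\sigma\varphi}(1-\sigma\varphi)^i - \frac{\sigma(\lambda-\varphi)}{\lambda-\sigma\varphi}(1-\lambda)^i. \tag{9}$$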

    Note that if $\varphi = \lambda$ this yields a negative exponential function; and even when $\varphi \neq \lambda$ the third term of (9) is likely to be rather small. Using the


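    The learning curve can be checked the same way. In the sketch below, which is ours, `learning_curve` evaluates $R_i = p_{i,3} + p_{i,5}$ in the three-term form obtained by adding the closed forms for states 3 and 5, and `recursion_recall` obtains the same quantity by iterating (1) to (5). Setting $\varphi = \lambda$ makes the third term vanish and leaves the negative exponential $1 - (1-\lambda\sigma)^i$. The values of $\varphi$ used are hypothetical.

```python
def learning_curve(lam: float, sigma: float, phi: float, i: int) -> float:
    """R_i, the proportion recalled on trial i (assumes lam != sigma * phi)."""
    d = lam - sigma * phi
    return (1.0
            - lam * (1 - sigma) / d * (1 - sigma * phi) ** i
            - sigma * (lam - phi) / d * (1 - lam) ** i)


def recursion_recall(lam: float, sigma: float, phi: float, n: int) -> list[float]:
    """p_{i,3} + p_{i,5} for i = 0..n, from the one-trial recursion (1)-(5)."""
    s, f = sigma, phi
    p1, p2, p3, p4, p5 = 1.0, 0.0, 0.0, 0.0, 0.0
    out = [p3 + p5]
    for _ in range(n):
        unfixed_prev = p2 + p3 + p4
        # The right-hand side uses only the previous trial's values.
        p1, p2, p3, p4, p5 = (
            (1 - lam) * p1,
            (1 - s) * p2 + lam * (1 - s) * p1,
            s * (1 - f) * unfixed_prev + lam * s * (1 - f) * p1,
            (1 - s) * (p3 + p4),
            p5 + s * f * unfixed_prev + lam * s * f * p1,
        )
        out.append(p3 + p5)
    return out


if __name__ == "__main__":
    for i, r in enumerate(recursion_recall(0.42, 0.60, 0.3, 6)):
        print(i, round(r, 4), round(learning_curve(0.42, 0.60, 0.3, i), 4))
```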


    (i) Let us first determine what proportion of the items that were recalled for the first time on trial i will be recalled again on trial i + j. The model states that an item that is initially recalled on trial i has moved on this trial from state 1 or state 2 into state 3 or state 5. Once it has done so, it cannot revert to state 1 or state 2. Thus the nine transitional probabilities that appear in the intersection of the first three columns and rows of T, the matrix operator in (6), form a closed set. Therefore, in order to predict what state a word will be in on the jth trial after it was first recalled, we have simply to apply this set of nine transitional probabilities to the vector which represents a set of state probabilities on the previous trial, where these states are now 3, 4, and 5.

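    Let us call the new matrix operator $U$, and let us designate by $R_{j,k}$ the probability that a word which was recalled for the first time on trial $i$ will be in state $k$ on trial $i + j$. Let $r_j$ denote the column vector of these state probabilities. Then $U r_{j-1} = r_j$:

    $$\begin{bmatrix} 1 & \sigma\varphi & \sigma\varphi\\ 0 & 1-\sigma & 1-\sigma\\ 0 & \sigma(1-\varphi) & \sigma(1-\varphi) \end{bmatrix} \begin{bmatrix} R_{j-1,5}\\ R_{j-1,4}\\ R_{j-1,3} \end{bmatrix} = \begin{bmatrix} R_{j,5}\\ R_{j,4}\\ R_{j,3} \end{bmatrix}, \qquad j \ge 1. \tag{10}$$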
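    The reduced operator $U$ can be iterated in the same way. The sketch below is ours: it starts from the initial vector $[\varphi, 0, 1-\varphi]'$ (states ordered 5, 4, 3), applies $U$ repeatedly, and reports $R_{j,5} + R_{j,3}$, the proportion of first-recalled words that are recalled again $j$ trials later. Under the model this depends on $j$ but not on the trial $i$ of first recall. The value $\varphi = 0.3$ is hypothetical.

```python
def rerecall_curve(sigma: float, phi: float, max_j: int) -> list[float]:
    """Proportion of words first recalled on trial i that are recalled on trial i + j."""
    s, f = sigma, phi
    U = [
        [1.0, s * f, s * f],              # to state 5
        [0.0, 1 - s, 1 - s],              # to state 4
        [0.0, s * (1 - f), s * (1 - f)],  # to state 3
    ]
    r = [f, 0.0, 1 - f]  # r_0: fixed, or recalled but not fixed, at first recall
    curve = []
    for _ in range(max_j):
        r = [sum(U[row][col] * r[col] for col in range(3)) for row in range(3)]
        curve.append(r[0] + r[2])  # states 5 and 3 are the recalled states
    return curve


if __name__ == "__main__":
    print([round(x, 3) for x in rerecall_curve(0.60, 0.3, 5)])
```

    One trial after first recall the proportion is $\varphi + \sigma(1-\varphi)$, and it rises toward 1 as more words become fixed.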

    The column vector of initial probabilities, $r_0$, is $[\varphi, 0, 1-\varphi]'$, since a proportion $\varphi$ of the words are fixed (go into state 5) on the trial on which they are first recalled, while a proportion $1-\varphi$ are selected but not fixed (go into

    TABLE 3

    Number of Words Reca...