
STOCHASTIC MODELS IN MATHEMATICAL LEARNING THEORY

Patrick Suppes

Originally in this lecture I thought I would give a survey of the work in stochastic models of learning that has taken place over the past decade, but after talking to a number of you informally and getting a better sense of the interests of the group, I decided that a concentration on learning theory would not be as interesting or useful as a description of my recent work that connects learning theory on the one hand, especially stimulus-response theory, with automata theory and language learning on the other. In my lecture on Monday, I mentioned that one open problem is the question of whether our theories are too simple, or whether we simply have not pursued them far enough to deal with complex behaviour. In subsequent conversation, Professor Ewens raised the following important objection. It is of course appropriate to investigate the reduction of the complex to the simple. But to do this, as in the case that I cited of the reduction of classical mathematical analysis to set theory, it was important and fundamental to develop classical analysis first. In other words, in most historical cases of interest, the reduction depended upon much prior work at various levels of complexity.

The problem that faces anyone in psychology who believes that the fundamental notions of stimulus-response theory can serve as a reduction is this. What sorts of behaviour have been formulated sufficiently in a mathematically precise way to attempt an analysis in terms of the simple concepts of stimulus-response theory? If we look, for example, at what generally goes under the heading of cognitive psychology, it is hard to see how seriously to begin an attempted reduction. The writings of Piaget are highly suggestive, but they do not represent an integrated systematic body of knowledge, organized in an explicit and formal way. If we attempted a reduction of Piagetian theory, our aim would be obscure, because the imprecise formulation of the Piaget corpus does not define what is to be done. Fortunately, a great deal of work that does give an explicit challenge to behavioural psychology now exists. I refer to the important and intellectually challenging work of linguists, particularly the transformational linguists. The analysis of language, especially the grammar of language in terms of automata theory, provides something of a complexity and a formal exactness that constitutes a real reductionist challenge to stimulus-response theory.

To forestall any misunderstanding, I must emphasize at once that what I say today does not constitute anything like a final or a fully satisfactory analysis of language behaviour. I would be surprised, and I am sure you would not believe me, if I claimed that I had such an analysis. There is no deeper problem that confronts us than the understanding of language learning and language behaviour. I want none of you to think that the analysis I give this morning solves all problems by showing how in one clear sense automata theory can be reduced to stimulus-response theory.


Mathematics in the Social Sciences in Australia. Canberra: Australian Government Publishing Service, 1972, pp. 265-273.


Automata theory can take another direction that has interesting and significant psychological connections. I have in mind the study of mathematical thinking and algorithmic behaviour, especially the learning of the algorithms of arithmetic by young students. In spite of a very keen current interest in this application of automata theory, I shall not discuss it further in this lecture.

I return now to language learning. Let me first recall for you the situation regarding automata and grammars. A finite automaton generates a regular language. There are various equivalent definitions of regular languages, but for convenience we can use the one that says a regular language is a language that has a one-sided linear grammar. Rewrite rules are of the form A → xB. Such languages are certainly restricted. Next in the natural hierarchy of automata, we encounter pushdown automata, which generate context-free languages. Linear bounded automata, which generate context-sensitive languages, follow. Although a good claim can be made that natural languages are not context-free, for our purposes it is sufficient to discuss the relation between pushdown automata and finite automata. The difference between the two is that pushdown automata have an unbounded capacity for memory storage. As the name indicates, in a pushdown automaton one must read from the top of the memory stack, but an indefinitely large number of words can be pushed into that stack. Even though we may wish to claim that simply as a language English is at the very least context-free and therefore requires a restricted infinite device like a pushdown automaton, in another sense strong claims certainly can be made that human-language processing can be represented by a finite device. One way of making the distinction is this. It is evident from the examination of any corpus of spoken speech that the length of sentences has a low upper bound and the depth of imbedding also has a low upper bound. If a uniform upper bound can be imposed for spoken speech, then a strictly finite device can do all of the processing required.
It is not to the point to discuss this argument now, but it is evident that an upper bound of a few thousand words certainly would constitute a uniform bound for spoken speech as we know it in any natural language, given that our analysis is in terms of segmentation of something like sentences. In any event, it would be a measurable step forward to show how a psychological theory could account for the most general type of finite device.
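To make the notion of a one-sided linear grammar concrete, here is a minimal sketch in Python. The grammar itself is an invented example, not one from the lecture; it generates the regular language a*bb. Only the rule format A → xB (a single terminal followed by at most one nonterminal) comes from the discussion above.

```python
import random

# A toy one-sided (right-linear) grammar: every rewrite rule has the form
# A -> xB (one terminal, then one nonterminal) or A -> x.  This particular
# grammar is an invented example; it generates the regular language a*bb.
RULES = {
    "S": [("a", "S"), ("b", "A")],  # S -> aS | bA
    "A": [("b", None)],             # A -> b
}

def derive(start="S", rng=None):
    """Carry out one random derivation and return the terminal string."""
    rng = rng or random.Random(0)
    out, nonterminal = [], start
    while nonterminal is not None:
        terminal, nonterminal = rng.choice(RULES[nonterminal])
        out.append(terminal)
    return "".join(out)
```

Because each rule emits one terminal and leaves at most one nonterminal pending, a derivation never needs more than a single symbol of "state" at a time, which is exactly why such grammars correspond to finite automata.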

A better case for thinking of written English as being represented or generated by a pushdown automaton, or some still richer infinite device, can be made because of the unlimited tape storage and the possibility of continual re-scanning of the text, which reduces considerably the memory load on the writer or reader. Whatever we may say about written natural languages, the case for a finite processing device for speakers and hearers of the spoken language seems very strong. I shall not attempt to discuss it further here.

I turn now to stimulus-response theory. Let me begin by sketching how the theory postulates that an organism works. We may think of a sequence of trials and concentrate on what happens on a single trial. The organism begins the trial in a certain state of conditioning. The state of conditioning represents the way in which the relevant stimuli are conditioned to the various possible responses. We may represent this conditioning by a partition of the set S of stimuli, with each element of the partition associated with the response to which the stimuli in that element of the partition are conditioned. At the beginning of the trial, certain stimuli are presented or are present in the environment. This subset T of S I call the presentation set. The organism then samples stimuli from this set T. On the basis of the stimuli sampled, a response is made. The response rule is that the probability of making response r is the proportion of conditioned stimuli sampled that are conditioned to response r. For example, if 15 stimuli are sampled, 5 of them are conditioned to response r_1, 3 to response r_2, and the remaining 7 are unconditioned, then the probability of making response r_1 is 5/8. If the sample of stimuli contains no conditioned stimuli, then the probability of response is some fixed guessing probability. After a response is made, a reinforcing event occurs. In general this reinforcing event transmits information to the organism about which of the possible responses was appropriate. Following the reinforcing event, the sampled stimuli become conditioned to the reinforced response. This process is not deterministic, but is assumed to be probabilistic. That is, there is a probability c that each sampled stimulus not conditioned to the reinforced response will become so conditioned at the end of the trial. A new state of conditioning is thus entered, and the organism is ready for the next trial.
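The trial structure just sketched can be written out as a short simulation. This is only an illustrative sketch: the function and parameter names, the sample size, and the uniform guessing rule are my own choices, not part of the theory; only the response rule and the conditioning probability c follow the description above.

```python
import random

def run_trial(conditioning, presentation, responses, reinforced,
              sample_size=5, c=0.2, rng=None):
    """Simulate one trial of the stimulus-response process.

    conditioning maps each stimulus to the response it is conditioned to,
    or to None if it is unconditioned.  The values chosen here are
    illustrative, not prescribed by the theory.
    """
    rng = rng or random.Random(1)
    # The organism samples stimuli from the presentation set T.
    sample = rng.sample(sorted(presentation), min(sample_size, len(presentation)))
    # Response rule: the probability of response r is the proportion of the
    # conditioned stimuli in the sample that are conditioned to r ...
    conditioned = [conditioning[s] for s in sample if conditioning[s] is not None]
    if conditioned:
        response = rng.choice(conditioned)
    else:
        # ... and with no conditioned stimuli sampled, a fixed guessing
        # probability applies (uniform over responses, for this sketch).
        response = rng.choice(responses)
    # After the reinforcing event, each sampled stimulus not already
    # conditioned to the reinforced response becomes so with probability c.
    for s in sample:
        if conditioning[s] != reinforced and rng.random() < c:
            conditioning[s] = reinforced
    return response
```

Calling run_trial repeatedly while updating the same conditioning dictionary traces out the sequence of conditioning states from trial to trial.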


This sketch of the theory has been formulated in terms of the experimental concept of discrete trials. Clearly, in a natural environment experience cannot be so segmented. The theory can be generalized to a continuous-time formulation in which the sequence of sampling, responding, and reinforcing is not fixed, but occurs according to time-dependent probability distributions. The conceptual foundations of such a continuous-time approach are to be found in Suppes and Donio (1967). However, for the purposes of reducing automata theory to stimulus-response theory, it is natural to remain within the discrete-trial framework, because automata theory also is formulated in such terms.

A second task is that of converting the sketch I have just given into a systematic formal statement of the theory, with axioms formulated in appropriate mathematical form. Such a formulation is to be found in Suppes (1969), and I shall not repeat it here. It should be clear from the sketch of the theory that I have drawn, nevertheless, that a proper mathematical formulation of the theory can be given within the general framework of stochastic processes.

I turn now to finite automata, and begin with an explicit definition.

Definition. The structure 𝔄 = ⟨A, Σ, M, s, F⟩ is a finite automaton if and only if:

(i) the set A is finite and nonempty (the set of internal states),

(ii) the set Σ is finite and nonempty (the alphabet that the automaton accepts),

(iii) M is a function from A × Σ to A (M is the transition table of the automaton),

(iv) s is a member of A (s is the initial state of the automaton),

(v) the set F is a subset of A (F is the set of final states).

The standard definitions for this basic characterization of finite automata will not be given here. For example, it is clear how we define formally the notion of a tape, or sequence of symbols of the alphabet, being accepted by the automaton; namely, at the beginning of the tape the automaton starts in the initial state, and at the end of the tape the automaton is in one of the final states. For some applications a separate output alphabet and a separate output function are useful, but they are not necessary for our purposes and do not constitute an essential generalization.
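The acceptance condition just described can be transcribed almost directly. The particular automaton below (two states, accepting tapes with an even number of b's) is an invented illustration, not one from the lecture.

```python
# A direct transcription of the definition: the transition table M is a
# function from A x Sigma to A, and a tape is accepted when reading it
# symbol by symbol from the initial state s leaves the automaton in a
# final state (a member of F).
def accepts(M, s, F, tape):
    state = s
    for symbol in tape:
        state = M[(state, symbol)]
    return state in F

# An invented illustration: a connected automaton with A = {"even", "odd"},
# Sigma = {"a", "b"}, accepting exactly the tapes with an even number of b's.
M = {("even", "a"): "even", ("even", "b"): "odd",
     ("odd", "a"): "odd", ("odd", "b"): "even"}

accepts(M, "even", {"even"}, "abba")  # two b's -> accepted (True)
```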

From a general mathematical and conceptual viewpoint, the striking thing about finite automata is that we postulate very little about their internal structure. The conditions of finiteness are almost the only essential restrictions. It is surprising how much we can do with the general theory, but for deeper analysis it is evident that further restrictive assumptions are required. In the present context we want to speak at a maximum level of generality to obtain a general representation theorem. I shall restrict myself to connected automata. A finite automaton is connected when every state can be reached from the initial state by a sequence of inputs. States that cannot be reached are of no interest, and there is a strong behavioural sense of equivalence in terms of which we may show that any automaton is equivalent to a connected automaton.

Theorem. Given any connected finite automaton there exists a stimulus- response model that is asymptotically isomorphic to it.

Asymptotic here denotes the number of trials going to infinity. It must be understood of course that this sense of an asymptotic notion does not mean that we cannot realize the automaton in an experimental context. It is just the natural use of asymptotic notions in theory, customary in all parts of probability, not simply in learning theory.

In order to make the mathematical content of the theorem definite, I would have to give the axioms of the stimulus-response theory that characterize stimulus-response models and to define the isomorphism of automata. It should be apparent from the definition above how isomorphism is defined, and I have said a good deal in an informal way about the conceptual basis of stimulus-response theory. The formal details are contained in Suppes (1969) and will not be repeated here.

What I would like to do is to discuss the interpretation of the theorem and remark on its significance. First of all, the most important step in attempting to establish the theorem is to decide what in a stimulus-response model will represent the internal states of the automaton and what will represent the alphabet of the automaton. Because of the language I have used thus far it is natural to think that the states of conditioning of the stimulus-response model should correspond to the internal states of the automaton. However, this is a mistake, because each state of conditioning corresponds to a different automaton. In other words, within the stimulus-response model there is represented a collection of automata, with each state of conditioning corresponding to a different automaton. The correspondence that does work is to make the internal states of the automaton coincide with the responses in the stimulus-response model. These responses we can think of as both overt and covert on the part of the organism.

Next, what about the alphabet of the automaton? This correspondence seems more obvious. The alphabet corresponds either to stimuli or to an appropriate subset of stimuli. A letter of the alphabet in the simplest analysis will correspond to a stimulus, and in a more general and more subtle analysis, to a set of stimuli. In an experimentally realistic representation, we undoubtedly would represent a symbol of the alphabet of the automaton by a number of stimuli corresponding to distinctive features. Thus if the letters of the alphabet are ordinary letters of the Roman alphabet, and not words as is often the case in automata applications, and we are considering how children learn to read, various features of each grapheme correspond to stimuli, as for example, the presence of a vertical segment, the presence of a horizontal segment, the closed or open nature of the figure, etc.

In thinking about the proof, two additional questions occur. First, what is a reinforcement schedule that asymptotically will lead to the automaton? Second, what representation of states of conditioning will lead to a proof of the theorem, given the reinforcement schedule? To keep the notation of our discussion simple, and because we are dealing with the intuitive argument and not a formal proof, I concentrate on a simple two-by-two automaton, that is, an automaton with two internal states and a two-letter alphabet. We may think of the organism making left and right responses that correspond to the two internal states; the two letters of the alphabet are the color stimuli red and green displayed above the response levers. These responses and stimuli correspond to some experiments we did early in 1968 with pigeons. The transition table of the automaton looks like this.

             L    R
    (L, r)   1    0
    (L, g)   0    1
    (R, r)   0    1
    (R, g)   1    0


It should be clear how to read the table. A 1 stands for the response that should be given when the preceding response is that shown on the left and the stimulus displayed is that indicated. In the table I have used L for the left response, R for the right response, r for the red stimulus, and g for the green stimulus. It is important to note that already in this transition table for the two-by-two automaton the correct response depends not only upon the stimulus display on the trial, but also upon the previous response. The transition table tells the experimenter exactly how the reinforcement schedule should be organized. As shown in the table, if e_L is the event of reinforcing the left response and e_R is the event of reinforcing the right response, then the part of the reinforcement schedule corresponding just to the first line of the table may be written as follows:

Prob(e_{L,n} | r_n & L_{n-1}) = 1.

In writing this conditional probability for the reinforcement schedule, I have followed the usual familiar notation of subscripts to indicate the trial number. The format of this reinforcement schedule is classical. It is slightly more complex than the usual one, because of the dependence on both the current stimulus and the preceding response. What is interesting is that the reinforcement schedule for an automaton of arbitrary complexity will assume exactly the same general form. It especially should be noted that we do not have to consider the history of previous responses and stimuli occurring before the response of the immediately preceding trial.
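The point that the schedule reads directly off the transition table can be shown in a few lines. This is a sketch; the encoding of responses and stimuli as the strings 'L', 'R', 'r', 'g' is my own, while the table entries follow the text.

```python
# The transition table of the two-by-two automaton, keyed by the pattern
# (response on trial n-1, stimulus on trial n).
TRANSITION = {("L", "r"): "L", ("L", "g"): "R",
              ("R", "r"): "R", ("R", "g"): "L"}

def reinforcement(prev_response, stimulus):
    """Return which response the schedule reinforces (with probability 1)
    on trial n, given the trial n-1 response and the trial n stimulus.
    The first row of the table is the case Prob(e_{L,n} | r_n & L_{n-1}) = 1."""
    return TRANSITION[(prev_response, stimulus)]
```

Nothing earlier than the immediately preceding response enters the lookup, which is the point made above about not needing the deeper history.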

Concerning the second question, I shall not say too much about the states of conditioning. I do want to remark that in the simplest analysis each response-stimulus pattern, that is, each pattern consisting of the response on trial n - 1 and the displayed stimulus on trial n, is represented in the model as a single stimulus. This is not wholly realistic, but is done for the sake of simplifying the notation. It does not constitute a loss of generality in terms of the mathematical course of the argument. When this simplifying assumption is made, then the state of conditioning consists essentially of stating what the conditioning connection is for each of these patterns. The theory is so stated that we have a Markov chain in these states of conditioning. What we want to show is that with probability 1 the process becomes absorbed asymptotically in the appropriate subset of states that represent the automaton. The thing to note is that matters will not come out right if we treat the red and green stimuli as stimulus elements, conditioned independently of the preceding responses. In other words, we cannot use a component analysis of stimuli, but must use a pattern analysis in which the pattern of the preceding response and present stimulus is treated as a unit. It is important to note at the same time, however, that this conception of patterns does not originate with this work on representation of automata in stimulus-response theory. It is a train of thought that has been used extensively experimentally ever since the first article on pattern models by Estes in 1959. Extensive experimental applications to multiperson interactions are to be found in Suppes and Atkinson (1960).
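The absorption argument can be illustrated with a small simulation of the pattern model for the two-by-two automaton. The parameter values (conditioning probability, number of trials, starting response) are invented for illustration; the structure of the process — one pattern per (previous response, current stimulus) pair, reinforcement read from the transition table, conditioning effective with probability c — follows the text.

```python
import random

# Pattern-model learning for the two-by-two automaton: each pattern
# (previous response, current stimulus) is treated as a single stimulus.
TABLE = {("L", "r"): "L", ("L", "g"): "R",
         ("R", "r"): "R", ("R", "g"): "L"}

def learn(c=0.3, trials=1000, rng=None):
    rng = rng or random.Random(0)
    conditioning = {pattern: None for pattern in TABLE}  # all unconditioned
    prev = "L"  # arbitrary starting response
    for _ in range(trials):
        pattern = (prev, rng.choice("rg"))
        # Respond with the conditioned response, or guess if unconditioned.
        response = conditioning[pattern] or rng.choice("LR")
        # Reinforce per the transition table; conditioning of the sampled
        # pattern becomes effective with probability c.
        if conditioning[pattern] != TABLE[pattern] and rng.random() < c:
            conditioning[pattern] = TABLE[pattern]
        prev = response
    return conditioning
```

With enough trials the chain of conditioning states is, for all practical purposes, absorbed in the state that realizes the automaton: learn() returns a conditioning map identical to the transition table, after which the model's responses coincide with the automaton's state transitions.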


One remark to be made in the spirit of classical stimulus-response conceptions is that the sampling of the response that occurred on the previous trial does not fit naturally into many classical discussions. It is easy, however, to modify the formal setup to accommodate these classical ideas by introducing the notion of a stimulus trace, which in a more contemporary idiom we may want to refer to as an encoding in short-term memory.

Two corollaries to the main theorem stated above are worth mentioning. The first is that any regular language is generated by some stimulus-response model. This follows more or less directly from what I have said already. The second is related to some of the themes discussed in Hayward Alker's opening lecture. Within psychology one of the books that has come to the fore in the discussion of plans and purposive behaviour is Plans and the Structure of Behavior by Miller, Galanter, and Pribram (1960). Those of you who have looked at the book know how strongly the authors object to the conditioned reflex as a model of psychological processes. In dealing with their criticisms I want to make a distinction between the general theory of conditioning and particular classical conditioning experiments. As Miller, Galanter, and Pribram claim, classical conditioning experiments are, in automata terms, trivial when we talk of complex behaviour. I do not mean that other aspects of a physiological or neurophysiological sort are unimportant. It is just that at the level of behaviour they are trivial, and we can say why in an exact way. Classical conditioning experiments exemplify the most trivial automaton, namely, the automaton with one state and a one-letter alphabet. Because of the one-state, one-letter restriction, we can all agree that these experiments do not exhibit complex behaviour. But it is a mistake, and I would claim a mistake of a fundamental nature, to generalize from these experiments to a condemnation of conditioning theory itself. In order to show in an explicit way the kind of mistake that is made, we can look at a representation of tote hierarchies, as introduced by Miller and Chomsky, to provide substance to the analysis found in the earlier book. In Volume 2 of the Handbook of Mathematical Psychology, Miller and Chomsky (1963) show that tote hierarchies in the sense of Plans and the Structure of Behavior can be represented as finite automata. Their results may be used to establish the corollary that any tote hierarchy is isomorphic to some stimulus-response model at asymptote. In a sharp and explicit formal sense we thereby have a reduction of tote hierarchies to stimulus-response models and a corresponding reduction of the notion of purposive behaviour, as introduced in the theory of totes.

In order to emphasize the limitations of what I have said today, let me close by mentioning two important aspects in which the analysis I have given seems unsatisfactory. Neither aspect represents a special problem of the stimulus-response approach to automata and language learning, but it is characteristic of all current approaches.

If we examine the structure of automata and the way in which automata generate a language, we find a fundamental puzzle: what are the input data that produce the output? This is not the same sort of problem for comprehension. In the case of generating speech, however, analyzing, identifying, and giving some psychological reality to the input is a major unsolved problem.

The second aspect I wish to mention is that the account given of language learning, or perhaps more appropriately the account implicit in what I have said about a stimulus-response approach, does not come to grips in any fundamental way with the problem of meaning and communication. In any fully satisfactory theory, we must be able to analyze not just the grammar, but the meaning of what is said and why it is said when it is said. We can break this problem into two parts. First there is the more or less classical semantical problem of analysis, the assigning of meaning to the utterances of a language. I emphasize strongly, however, that a classical assignment of meaning or the provision of a classical semantics of truth is not a complete answer to this problem. We further need an account of why a particular meaningful sentence is said when it is. In other words, what in the context of the environment and preceding conversation determines the occurrence of the sentence that is in fact uttered? The classical semantical analyses that derive from contemporary work in logic do not give us sufficient tools for solving this problem.

Either one of these major aspects of language behaviour is left in an unsatisfactory state by what I have said today, and also by current psycholinguistic theories of a different sort. It must be emphasized that the outlines of a fully satisfactory theory are as yet far from clear.


REFERENCES

(1) Estes, W.K., 'Component and pattern models with Markovian interpretations'. In R.R. Bush & W.K. Estes (Eds.), Studies in mathematical learning theory. Stanford, California: Stanford University Press, 1959. Pp. 9-52.

(2) Miller, G.A., & Chomsky, N., 'Finitary models of language users'. In R.D. Luce, R.R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology, Vol. 2. New York: Wiley, 1963. Pp. 419-492.

(3) Miller, G.A., Galanter, E., & Pribram, K.H., Plans and the structure of behavior. New York: Holt, 1960.

(4) Suppes, P., 'Stimulus-response theory of finite automata'. Journal of Mathematical Psychology, 1969, 6, 327-355.

(5) Suppes, P., & Atkinson, R.C., Markov learning models for multiperson interactions. Stanford, California: Stanford University Press, 1960.

(6) Suppes, P., & Donio, J., 'Foundations of stimulus-sampling theory for continuous-time processes'. Journal of Mathematical Psychology, 1967, 4, 202-225.