
Psychological Review, 1981, Vol. 88, No. 1, 16-45
Copyright 1981 by the American Psychological Association, Inc. 0033-295X/81/8801-0016$00.75

The Imagery Debate: Analogue Media Versus Tacit Knowledge

Zenon W. Pylyshyn
Center for Advanced Study in the Behavioral Sciences
Stanford, California

The debate over the nature of mental imagery, especially with respect to the interpretation of recent findings on the transformation of images, has failed to focus on the crucial differences between the so-called "analogue" and "propositional" approaches. In this paper I attempt to clarify the disagreements by focusing on the alleged spatial nature of images and on recent findings concerned with "rotation" and "scanning" of mental images. It is argued that the main point of disagreement concerns whether certain aspects of the way in which images are transformed should be attributed to intrinsic knowledge-independent properties of the medium in which images are instantiated or the mechanisms by which they are processed, or whether images are typically transformed in certain ways because subjects take their task to be the simulation of the act of witnessing certain real events taking place and therefore use their tacit knowledge of the imaged situation to cause the transformation to proceed as they believe it would have proceeded in reality. The fundamental difference between these two modes of processing is examined, and certain general difficulties inherent in the analogue account are discussed. It is argued that the tacit knowledge account is more plausible, at least in the cases examined, because it is a more general account and also because certain empirical results demonstrate that both "mental scanning" and "mental rotation" transformations can be critically influenced by varying the instructions given to subjects and the precise form of the task used and that the form of the influence is explainable in terms of the semantic content of subjects' beliefs and goals; that is, that these operations are cognitively penetrable by subjects' beliefs and goals. Functions that are cognitively penetrable in this sense, it is argued, must be explained, at least in part, by reference to computational cognitive processes whose behavior is governed by goals, beliefs, and tacit knowledge rather than by properties of analogue mechanisms.

This article was written while I was a fellow at the Center for Cognitive Science at MIT and at the Center for Advanced Study in the Behavioral Sciences at Stanford. The support of both these institutions is gratefully acknowledged, as well as the assistance provided by the Alfred P. Sloan Foundation and by a leave fellowship from the Social Science and Humanities Research Council of Canada. I would also like to thank Ned Block, Jerry Fodor, and Bob Moore for their helpful [comments].

Requests for reprints should be sent to Zenon W. Pylyshyn, Department of Psychology, University of Western Ontario, London, Ontario, Canada N6A 5C2.

The study of mental imagery continues to be a major concern in cognitive psychology. Since regaining acceptance about 15 years ago, the study of processes underlying the sort of reasoning that is accompanied by perceptionlike experiences has become one of the focal points of the new mentalistic psychology. The purpose of this article is to comment on some of the recent theoretical work in this area in the light of the debate over the nature of mental imagery that has been recurring in the literature over the past 6 or 7 years. The various positions in this debate have been summarized in a number of [papers], including most recently in Shepard (1975, 1978), Kosslyn and Pomerantz (1977), Kosslyn, Pinker, Smith, and Shwartz (1979), Paivio (1977), Anderson (1978), and Pylyshyn (1973, 1978, 1979a, 1979b). What I shall do in this article is pick out what I consider to be the most substantive strand in this disagreement and discuss it in relation to some of the most persuasive recent empirical findings and the most widely accepted theoretical accounts of these findings. For this purpose I shall make extensive reference to the overview article by Kosslyn et al. (1979), since it represents the most explicit formulation of the "imagistic" (or "pictorial" or "analogical") position to date. In doing this I shall be highly selective in the questions I shall address. There is much in the imagery literature that can be (and has been) debated, not all of which is equally significant from a theoretical standpoint. Thus one could argue over whether images are continuous or discrete, concrete or abstract, holistic or articulated, pictorial or discursive (whether they depict, like pictures, or refer, like descriptions), and whether they constitute a fundamentally different form of cognition or are merely a species of a single form used in all cognitive processing (and at what level they are considered to be the same or different). There have even been arguments over whether images are epiphenomenal or whether they are functional in cognition; but questions such as the latter cannot even be addressed until one takes a theoretical stand concerning the properties of images. One cannot say of something that it is or is not epiphenomenal until one has a clear statement of what that something is. For example, to the extent that image refers to what I experience when I imagine a scene, then surely that exists in the same sense that any other sensation or conscious content does (e.g., pains, tickles, etc.). If, on the other hand, image refers to a certain theoretical construct that is claimed to have certain properties (e.g., to be spatially extended) and to play a specified role in certain cognitive processes, then the appropriate question to ask is not whether the construct is epiphenomenal but whether the theoretical claims are warranted, and indeed whether they are true.

In my view, however, the central theoretical question in this controversy is whether the explanation of certain imagery phenomena requires that we postulate special types of processes or mechanisms, such as ones commonly referred to by the term analogue. I shall discuss one plausible interpretation of this notion, one that does indeed represent a fundamental difference in approach from the one I have been advocating. In addition to this issue, several of the other distinctions mentioned above can also be touched upon if we focus as sharply as possible on one particular claim often made in the imagery literature, namely, the alleged spatiality of images, and on a set of prototypical experimental findings that have been taken as establishing this particular property of images, namely, those that demonstrate "mental rotation" and "mental scanning" of images. In this regard I shall argue that the only real issue that divides the proponents of what has unfortunately become known as the "images versus propositions" debate is the question of whether certain aspects of cognition, generally (though not exclusively) associated with imagery, ought to be viewed as governed by tacit knowledge, that is, whether they should be explained in terms of processes which operate upon symbolic encodings of rules and other representations (such as beliefs and goals) or whether they should be viewed as intrinsic properties of certain representational media or of certain mechanisms that are not alterable in nomologically arbitrary ways by tacit knowledge. I have elsewhere referred to such mechanisms as constituting the "functional architecture" of the mind (Pylyshyn, 1980a, 1980b). In this article I will present arguments and evidence in support of the view that most of the empirical phenomena involving transformations of images (such as the image scanning results of Kosslyn et al., 1979) are better explained according to the tacit knowledge theory.

The Appeal to Properties of an Analogue Medium

In discussing the question of whether images are epiphenomenal, Kosslyn et al. (1979) assert that "none of the models of imagery based on Artificial Intelligence research treat the images that people report experiencing as functional representations" (p. 536). This strange yet widely held view is based in part on a misconception concerning what in fact is reported in imagery. As Hebb (1968) has pointed out, what people report is not properties of their image but of the objects that they are imaging. Such properties as color, shape, size, and so on are clearly properties of the objects that are being imagined. This distinction is crucial. The seemingly innocent scope slip that takes image of object X with property P to mean (image of object X) with property P instead of the correct image of (object X with property P) is probably the most ubiquitous and damaging conceptual confusion in the whole imagery literature.

To see that this slip is not a mere way of speaking but carries considerable weight in explanations of imagery phenomena, consider the case of the generally accepted "spatial" character of images. Take, for example, the elegant experiments by Kosslyn (1973, 1975) and by Kosslyn, Ball, and Reiser (1978) involving "mental scanning" of images, which show that the further away an item is from the place on an image that is currently being focused on, the longer it takes to see or focus on and report that item in the image. I shall take up the question of the interpretation of these results in the Tacit Knowledge and Mental Scanning section below. For the present I simply wish to point out that the story that goes with Kosslyn's interpretation inherits its plausibility and compellingness from a systematic equivocation over which particular entity has the property length in precisely the manner suggested in the previous paragraph.

For example, there can be no disputing the Kosslyn et al. (1979) claim that "these results seem to indicate that images do represent metrical distance" (p. 537). But in the very next sentence this format-neutral claim becomes transformed into the substantive assertion that "images have spatial extent", that is, that the image itself has rather than represents length or size. This transformation, moreover, is essential to the particular account of the scanning experiments that Kosslyn et al. wish to promote. That is because the naturalness of the scanning notion comes from the lawfulness of

$$T = \frac{D}{S}. \tag{1}$$

In this equation, of course, $T$, $D$, and $S$ are to be interpreted as real time, real physical distance, and real mean speed, respectively. If Equation 1 were literally applicable to the image, then this account of the scanning results would be a principled one, since the equation represents a universal principle or basic fact of nature. If, on the other hand, we were to keep with the first way Kosslyn et al. put their claim (viz., that images represent, rather than have distance), we would, instead, have to appeal to a different sort of regularity, one that might for instance be expressed roughly by

$$T = F(D', S'), \tag{2}$$

where $D' = R_1(D)$ is some representation of distance using encoding $R_1$, $S' = R_2(S)$ is some representation of mean speed using encoding $R_2$, and $F$ is a function that maps pairs of representations $D'$ and $S'$ onto real time such that for all distances, $d$, and speeds, $s$, it will be the case that

$$F(R_1(d), R_2(s)) = \frac{d}{s}.$$

Now Equation 2 is clearly not a law of nature. There can be no general universal law governing the amount of time that it takes to transform representations, since obviously that depends upon both the form of the representations and the available operations for transforming them. The equation $T = R_1(D) \div R_2(S)$ is far from expressing a nomological law. In fact, if $F$ is to be realized computationally, we must view Equation 2 as asserting that there is some process, $P$, which, given the representations $D'$ and $S'$ as inputs (together with other specifications such as a beginning and ending state), takes $T$ sec to complete, where $T$ in this case has to equal $D \div S$. Obviously, unless the various representations and the process $P$ are especially selected, Equation 2 will be false. For example, it is a nontrivial exercise to design an algorithm that always terminates after $D \div S$ sec when given two expressions representing the numerals for $D$ and $S$ (except for the degenerate case in which the algorithm calculates $D \div S$ and then simply waits idly for that amount of time to go by). Now by systematically leaving out the words representation of or by using ambiguous descriptions, such as saying that images "preserve relative metrical distances" (which can be interpreted as meaning either that they have or that they represent distances), it is possible to create the illusion of having the explanatory power provided by Equation 1 while at the same time avoiding the ontological claim that goes with it (viz., that images are actually laid out in space somewhere in the brain).
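The point about Equation 2 can be made concrete with a small sketch. Everything named here is a hypothetical illustration rather than anything proposed in the imagery literature: `R1` and `R2` are arbitrary encodings chosen for the example, `F` is a function written specifically to decode them, `degenerate_scan` is the "compute, then wait idly" process mentioned above, and `symbol_count_steps` is a second process over the same representations whose duration bears no lawful relation to distance divided by speed.

```python
import time

# Hypothetical encodings, chosen only for illustration:
# R1 writes a distance as a decimal string, R2 writes a speed as a tally of marks.
def R1(distance: float) -> str:
    return f"{distance:.3f}"

def R2(speed: int) -> str:
    return "|" * speed

def F(d_rep: str, s_rep: str) -> float:
    """Map a pair of representations onto a duration, in the role of Equation 2.

    F returns d / s only because it was written to decode these two
    particular encodings; nothing about the strings themselves forces it.
    """
    return float(d_rep) / len(s_rep)

def degenerate_scan(d_rep: str, s_rep: str, sleep=time.sleep) -> float:
    """The degenerate process: calculate D / S, then wait idly that long."""
    duration = F(d_rep, s_rep)
    sleep(duration)
    return duration

def symbol_count_steps(d_rep: str, s_rep: str) -> int:
    """A different process over the same inputs: one step per symbol read."""
    return len(d_rep) + len(s_rep)
```

Under these assumptions, the time taken depends on which process is run over the representations, not on the representations alone, which is the sense in which Equation 2 holds only when the encodings and the process are specially selected.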

Another way to put the point about the relative explanatory power of the literal account based on Equation 1, compared with the representation account based on Equation 2, is in terms of the degrees of freedom in these two explanatory principles. If we assume that it is literally the case that physical space is involved, then the form of the relation among distance, speed, and time would be fixed as in Equation 1. If, on the other hand, only a representation of space is involved, and thus the regularity is expressed by Equation 2, the form of the function F is actually a free empirical parameter that is obtained by observing instances of the very phenomena that require explaining. This means that an explanation based on Equation 2 has more degrees of freedom and hence less explanatory power than an explanation based on Equation 1. Even more seriously, however, if we take Equation 2 as the appropriate formulation, then we need a theoretical account of why the relation holds and by what mechanisms it is realized. Even if we maintain that the cognitive system has evolved that way for one reason or another, we still want to know what cognitive mechanisms are responsible for that behavior. There have traditionally been two approaches to providing such an account.

1. The first is to say that a subject makes Equation 2 come out (perhaps voluntarily, though often unconsciously) because he or she has tacit knowledge of Equation 1. In other words, regardless of the form of his or her representation, the subject knows that Equation 1 holds in the world and therefore makes it be the case (using some form of symbolic analysis, the exact nature of which need not concern us here) that the amount of time spent imagining the scanning will conform to this relation. We shall discuss this possibility in greater detail in the Tacit Knowledge and Mental Scanning section below.

2. The second way is to say that Equation 2 is the case because of properties of the representational medium. This is just to say that the observed function has that form as a consequence of the intrinsic lawful relations that hold among the particular physical properties that in fact represent distance and mean speed in the brain. For example, if distance were represented by the electrical potential between two points separated by a certain electrical capacitance, and mean speed were represented by current flow, then (within limits) the time taken would have the form given by Equation 2. This corresponds to what I would call the analogue view.
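The capacitor example in Alternative 2 can be worked through under some simplifying assumptions that are not stated in the text: an ideal capacitor of capacitance $C$ charged at a constant current, and linear encodings of distance and speed with hypothetical constants $k_1$ and $k_2$. On this reading, the time to reach the potential that represents a given distance follows from the physics of the medium alone:

```latex
\begin{align*}
Q &= C V, \qquad Q = I\,T
  && \text{(ideal capacitor, constant charging current)} \\
T &= \frac{C V}{I} \\
V &= k_1 D, \qquad I = k_2 S
  && \text{(assumed linear encodings } R_1, R_2\text{)} \\
T &= \frac{C k_1}{k_2} \cdot \frac{D}{S}
  && \text{(proportional to } D/S\text{, as Equation 2 requires)}
\end{align*}
```

Here the form of the function is fixed by lawful relations among the physical magnitudes doing the representing, with no appeal to what the subject knows; the constant $C k_1 / k_2$ equals 1 only if the encodings are suitably scaled.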

It should be appreciated that Alternatives 1 and 2 represent two fundamentally different ways of explaining the underlying process responsible for the observed behavioral regularities. Alternative 1 appeals to symbolically encoded facts about the world and to rules for transforming representations and drawing inferences. It is a "cognitivist" approach such as advocated by Fodor (1975, 1980), Chomsky (1980), Newell and Simon (1976), and others. On the other hand, Alternative 2 represents what I would call the analogue approach to mental representation and mental processing. The term analogue has been used to refer to a wide range of characteristics of models and representations covering everything from the mathematical continuity of representations to the simple requirement that the representation go through intermediate states representing the intermediate states that the actual system being represented would go through (e.g., Shepard, 1975). All of these capture something of what we intuitively mean by analogue. In my view, however, the only aspect of analogues that is relevant to the imagery debate (i.e., that differentiates among the major competing views) is the one raised by the distinction between Alternatives 1 and 2; that is, an analogue process (represented by Alternative 2) is one whose behavior must be characterized in terms of intrinsic lawful relations among properties of a particular physical instantiation of a process, rather than in terms of rules and representations (or algorithms). Whenever people appeal to an "analogue representational medium" (e.g., Attneave, 1974) or to a "surface display" (e.g., Kosslyn et al., 1979), they take it for granted that this medium incorporates a whole system of lawfully connected properties or intrinsic constraints (some of which have mathematical properties isomorphic to Equation 1 above) and that it is precisely this set of properties and relations that determines how objects represented in that medium will behave. Such people specifically contrast these accounts with ones like our Alternative 1, which claims that how the representation will behave is a function of what the person knows about the actual behavior of the things represented, rather than of properties of the medium in which it is represented.

Although there are various conceptions of what analogue processing is (as I suggested above), I suspect that the other senses are actually derivative from the sense I am adopting. Thus, for instance, any process can be made to go through an appropriate sequence of intermediate states, and even to do so in very small (quasi-continuous) steps, even a purely verbal process. Yet we would not want to count such a model as analogue if the mechanism were not naturally constrained to go through such a sequence. Thus we would count the process as analogue if its going through particular intermediate states were a necessary consequence of intrinsic properties of the mechanism or medium, rather than simply being a stipulated restriction that we arbitrarily imposed on a mechanism that could carry out the task in a quite different way. Palmer (1978) has taken a similar position with regard to the distinction between analogical and nonanalogical processes. From this, however, Palmer draws the unwarranted conclusion that only biological evidence will distinguish between the two forms of processing. But, as I have argued at some length (Pylyshyn, 1979b, 1980b), if we contrast analogue mechanisms with ones that operate on representations or tacit knowledge, the distinction can be seen to be a functional one that can be empirically decided by behavioral criteria. An example of one such criterion is discussed below. Other criteria are discussed in Pylyshyn (1979b, 1980b).

The Appeal to Tacit Knowledge

The distinction between analogue processes and rule-governed or cognitive processes (also referred to as computational or informational processes) is one that, in its most general form, needs to be drawn with some care, since after all, both are physically realized in the brain, although in quite different ways. The issue reduces to the question of when different forms of explanation of the behavior are appropriate. I have attempted to develop the general argument at length elsewhere (Pylyshyn, 1980a, 1980b). For the present purposes, however, a brief sketch of that discussion will do, since the only cases relevant to the imagery debate are unproblematic.

The operation of some processes can be explained perfectly well by giving an account of how various of their physical properties are causally connected, so that, for instance, altering some physical parameter here (e.g., by turning a knob) leads to specifiable changes in another parameter there because of some law connecting these two properties. Such a physical causal account will not do, however, to explain connections that are independent of the particular physical form the input takes yet that follow a single general principle that depends only on the semantic content of what might be called the input message. Thus, for example, if being told over the telephone that there is a fire in the building, seeing the word fire flash on a screen, hearing what you take to be a fire alarm, smelling smoke in the ventilator duct, seeing flames in the hallway, and so on without limit, all lead to the same building-evacuation behavior, the relevant generalization cannot be captured by a purely causal input-output story, since each such stimulus would involve a distinct causal chain and the set of such chains need have no physical laws in common. In that case, the generalization can only be stated by postulating internal belief and goal states (e.g., the belief that the building is on fire and knowledge about what one ought to do in such circumstances, as well as other tacit knowledge and the capacity to make inferences). Such processes are explainable only in terms of the mediation of rules and representations (since it is clear, for example, that neither a behavioral principle such as "Make sure you don't get too close to a fire" nor a logical principle such as modus ponens expresses a physical law and that they hold regardless of what kind of physical substance they are instantiated in).

A corollary of this explainability claim is that if a certain behavior pattern (or input-output function) can be altered in a way that is rationally connected with the meaning of certain inputs (i.e., what they refer to, as opposed to their physical properties alone), then the explanation of that function must appeal to operations upon symbolic representations such as beliefs and goals: It must, in other words, contain rule-governed cognitive or computational processes. A function that is alterable in this particular way is said to be cognitively penetrable. The criterion of cognitive penetrability (among other considerations) will be used in later discussions as a way of deciding whether particular empirically observed functions ought to be explained by Alternative 1 or by Alternative 2 above. Specifically, I shall maintain that if the form of certain image transformation functions reported in the literature can be altered in a particular sort of rationally explicable manner by changing what the subject believes the stimulus to be or by changing the subject's interpretation of the task (keeping all other conditions the same), then the explanation of the function must involve such constructs as beliefs, goals, or tacit knowledge, rather than the intrinsic properties of some medium; that is, some part of the explanation must take the form of Alternative 1.

The essence of the penetrability condition is this: Suppose subjects exhibit some behavior characterized by a function, $f_1$ (say, some relation between reaction time and distance or angle or perceived size of an imagined object), when they believe one thing, and some different function, $f_2$, when they believe another. Suppose further that which particular $f$ they exhibit bears some logical or rational relation to the content of their belief: For example, they might believe that what they are imagining is very heavy and cannot accelerate rapidly under some particular applied force, and the observed $f$ might then reflect slow movement of that object on their image. Such a logically coherent relation between the form of $f$ and their belief (which we refer to as the "cognitive penetrability of $f$") must be explained somehow. Our claim is that to account for this sort of penetrability of the process, the explanation of $f$ itself will have to contain processes that are rule governed or computational, such as processes of logical inference, and that make reference to semantically interpreted entities (i.e., symbols). The explanation cannot simply say that there are some causal (biological) laws which result in the observed function $f$ (i.e., it cannot cite an analogue process), for exactly the same reason that an explanation of this kind would not be satisfactory in the building-evacuation example above: because the regularity in question depends on the semantic content (in this case of beliefs) and on logical relations that hold among these contents. Although in each particular case some physical process does cause the behavior, the general explanatory principle goes beyond the set of all observed cases (i.e., there may be token reduction but no type reduction of such principles to physical principles; see Fodor, 1975). A process that is sensitive to the logical content of beliefs must itself contain at least some inferential (or other content-dependent) rule-governed process. It should be emphasized that cognitive penetrability refers not merely to any influence of cognitive factors on behavior but to a specific kind of semantically explicable (e.g., rational or logically coherent) relationship. The examples we shall encounter in connection with discussions of imagery will be clear cases of this sort of influence (for more on this particular point, see Pylyshyn, 1980a). It should also be noted that being cognitively penetrable does not prevent a process from having analogue components: It simply says that it should not be explained solely in terms of analogues with no reference to tacit knowledge, inference, or computational processes.
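A toy simulation may help fix the logic of the criterion. The sketch below is an illustration under invented assumptions, not a model from the literature: `medium_bound_process` stands in for an Alternative 2 account whose timing is set by a made-up constant of the medium, `knowledge_driven_process` stands in for an Alternative 1 account whose timing is computed from a believed speed, and `differs_across_beliefs` checks only whether the observed function changes when the belief changes. That check is necessary but not sufficient for penetrability in the sense defined above, which also requires that the change be rationally related to the content of the belief.

```python
from typing import Callable, Dict, List, Sequence

Belief = Dict[str, float]
Process = Callable[[float, Belief], float]

def medium_bound_process(distance: float, belief: Belief) -> float:
    """Alternative 2 style: time is fixed by the medium; belief is ignored."""
    medium_rate = 2.0  # hypothetical intrinsic constant of the medium
    return distance / medium_rate

def knowledge_driven_process(distance: float, belief: Belief) -> float:
    """Alternative 1 style: time is computed from what the subject believes."""
    return distance / belief["believed_speed"]

def response_function(process: Process, belief: Belief,
                      distances: Sequence[float]) -> List[float]:
    """Tabulate the function f that the process exhibits under one belief."""
    return [process(d, belief) for d in distances]

def differs_across_beliefs(process: Process, belief_a: Belief, belief_b: Belief,
                           distances: Sequence[float], tol: float = 1e-9) -> bool:
    """Return True if f1 (under belief_a) and f2 (under belief_b) differ."""
    f1 = response_function(process, belief_a, distances)
    f2 = response_function(process, belief_b, distances)
    return any(abs(a - b) > tol for a, b in zip(f1, f2))
```

In this sketch, a subject who believes the imagined object is heavy (low `believed_speed`) produces longer times in the knowledge-driven process, while the medium-bound process yields the same function under either belief.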

The concept of tacit knowledge, as a generalization and extension of the everyday notion of knowledge (much as the physicists' concept of energy is an extension of the everyday notion), is one of the most powerful ideas to emerge from contemporary cognitive science (cf. Fodor, 1968), although much remains to be worked out regarding the details of its form and function. It is already clear, however, that tacit knowledge cannot be freely accessed or updated by every cognitive process within the organism, nor can it enter freely into any logically valid inference. For example, much of it is not introspectable or verbally articulable (relevant examples of the latter would include our tacit knowledge of grammatical or logical rules, or even of most social conventions). A great deal needs to be learned about the control structures of the cognitive system that constrains our access to tacit knowledge in various elaborate ways. The existence of such constraints is no doubt what makes it possible for people to hold contradictory beliefs or to have beliefs that are only effective within certain relatively narrow classes of tasks. For example, it might well be that many people only have access to their tacit knowledge of physics when they are acting upon the world (e.g., playing baseball) or perhaps when they are engaged in something we call visualizing some physical process, but not when they have to reason verbally or answer certain kinds of questions in the abstract. Nonetheless, in all of these cases it would clearly be inappropriate to view such visualizing as being controlled by a medium or "surface display" that caused the laws of physics to hold in the image. A better way to view the cause of the regularities in the movement of objects in the visualized scene is in terms of subjects' tacit knowledge about the physical world and in terms of the inferences that they make from this knowledge. I shall consider other such examples in the next section when I argue that the appearance of autonomous unfolding of imagery sequences may be very misleading.

Incidentally, when one constructs a computer model using something called a matrix data structure rather than something called an analogue representational medium, one is not thereby relieved of the need to make a distinction between Alternatives 1 and 2. Because the existence of a computer model often carries the implication that there can no longer be any ambiguity or terminological confusions, it is worth examining one such model briefly.

In describing the Kosslyn and Shwartz (1977) computer model, Kosslyn et al. (1979) appear ready to admit that much of the model's explanatory and predictive capacity derives from what they refer to as the "cathode ray tube proto-model." In this proto-model there is no problem in seeing how a principled account of the scanning results can be derived. The CRT is a real physical device to which properties such as distance apply literally, and thus our earlier explanation of the observed scanning function applies in virtue of the applicability of Equation 1. But as we have already noted, such an explanation is a principled one only when it refers to a physical system whose intrinsic lawful behavior is described by Equation 1. Thus the explanatory power of the CRT proto-model only transfers to the human cognition case if there is something in the brain to which Equation 1 also applies. On the other hand, the version of the model that uses the matrix data structure, in which there is no actual physical CRT, lends itself equally to either one of the following two interpretations. In the first interpretation, the part of the model that contains the two-dimensional image (i.e., the 2-D matrix and its relevant access operations) is considered as merely a simulation of the physical screen, in which case the model really does assume the existence of a spatially laid out pattern in the brain. In the second interpretation, the matrix and the set of relevant accessing operations are viewed as a specific proposal for how the function F required by Equation 2 might be realized. In the latter case, however, when we give a theoretical interpretation of the claims associated with that part of the model, we still have a choice of the two basic views I have been calling Alternatives 1 and 2, exactly as we did when we were examining the informal account of the scanning results (e.g., Does the adjacency relation in the matrix represent subjects' knowledge that the elements referred to are next to one another, or is it an intrinsic constraint of that particular format?).

Appealing to a matrix in explaining certain imagery results is only useful if matrices constrain the representations or operations on representations in specified ways. If they do constrain the form of representations, then they function essentially as a simulation of an underlying analogue representational medium. (It might be noted in passing that if we were to take the matrix structure seriously, we would be stuck with the unavoidable conclusion that mentally represented space is necessarily nonisotropic. This is a formal consequence of the fact that a matrix is a tessellation of cells of some fixed shape and hence has certain essential nonisotropic properties. For example, if the cells are assumed to be square, then regardless of how fine we make them, scanning diagonally will be faster by a factor of the square root of two than scanning vertically or horizontally. Such an entailment cannot easily be glossed over, except by viewing the matrix as merely a metaphor for some unspecified spatial characteristics.)
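The square-root-of-two claim can be checked with a small sketch. It rests on an assumption that is not spelled out in the text: scanning proceeds cell by cell, and a move to any of the eight neighbouring cells, diagonal ones included, costs one unit of time. The function names are hypothetical and are used only for illustration.

```python
import math

def steps_needed(dx: int, dy: int) -> int:
    """Cell-to-cell moves needed to cover an offset (dx, dy) when one move
    may go to any of the 8 neighbouring cells (assumed unit cost per move)."""
    return max(abs(dx), abs(dy))

def distance_per_step(dx: int, dy: int) -> float:
    """Represented (Euclidean) distance covered per unit-cost move."""
    return math.hypot(dx, dy) / steps_needed(dx, dy)

horizontal = distance_per_step(10, 0)   # scanning along a row
diagonal = distance_per_step(10, 10)    # scanning along the main diagonal
ratio = diagonal / horizontal           # compare with math.sqrt(2)
```

Under this assumption the ratio does not depend on how many cells are used, which is the sense in which making the cells finer does not remove the anisotropy.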

The distinction between analogue and what I have sometimes referred to as propositional, but is perhaps better thought of as simply symbolic, is fundamental to a wide range of issues in the foundations of cognitive science (see Pylyshyn, 1980b). In the specific case of models of mental scanning, the distinction is important because Alternative 1 allows for the possibility that the results of mental scanning experiments may represent a discovery about what subjects believe and what they take the goal of the experiments to be, rather than a discovery about what the underlying mechanisms of image processing are. A consequence of the former alternative is that if subjects perceived the task differently or had different tacit beliefs about how the objects in question would move or about properties of space, then the experimental results could be quite different. On the other hand, if Alternative 2 were correct, then manipulation of such things as the form of the task and the instructions should not have a corresponding, rationally explainable effect (provided, of course, that imagery was still being used). Otherwise we would have to say that the medium changes its properties to correspond to what subjects believe about the world, in which case appealing to the existence of an analogue medium would serve no function.

Before turning to a discussion of some specific theoretical proposals, let me summarize the picture I have presented. Figure 1 illustrates the structure of alternatives available in explaining a variety of imagery findings. We can, first, choose a literal spatially extended brain-projection model. Although there is no a priori reason for excluding this alternative, it does raise some special problems if we try to explain the full range of imagery phenomena this way, and as far as I can tell, no one since Wertheimer has taken it seriously (though some, for instance Arbib, 1972, have come close). In any case the literal approach can be viewed as a special case of the analogue approach I shall be discussing in detail later. Continuing down our tree of alternatives, if we take the functional, as opposed to the literal or structural, approach, our task becomes to explain how this function could be realized by some possible mechanism (e.g., how Equation 2 could be realized in the case of mental scanning). Here we come to what I take to be the fundamental bifurcation between the two camps in the imagery debate, between those who advocate the analogue (or intrinsic property of a medium) view and those who advocate the symbolic or tacit knowledge view. Much confusion arises in this debate because there is considerable equivocation regarding exactly what the referents of ambiguous phrases such as spatial representation or preserves metric spatial information are intended to be from the point of view of this tree of alternatives. However, once the problem has been formulated so as to factor away the misleading implications associated with the use of a physical vocabulary, or a vocabulary that is appropriate for describing the represented domain as opposed to the psychological processes or mechanisms (see the next section), we are left with a basic empirical question: Which aspects of an organism's function are attributable to intrinsic (analogue) processes, and which are attributable to transactions on a knowledge base? It is to this empirical question that I now turn.

The Autonomy of the Imagery Process

It seems to me that the single most intriguing property of imagery, and the property that appears, at least on first impression, to distinguish it from other forms of deliberate rational thought, is that it has a certain intrinsic autonomy, both in terms of requiring that certain properties of stimuli (e.g., shape, size) must always be represented in an image and with respect to the way in which dynamic imagery unfolds over time. Consider the second of these. The literature contains many anecdotes suggesting that in order to imagine a certain property, we first have to imagine something else (e.g., to imagine the color of someone's hair, we must first imagine the person's head or face; to imagine a certain room, we must first imagine entering it from a certain door; to imagine a figure in a certain orientation, we must first imagine it in a standard orientation and then imagine it rotating; to have a clear image of a tiny object, we must first imagine "zooming in" on it, and so on). Sometimes imagery even seems to resist our voluntary control. For example, in conducting a study of mental rotation of images, I instructed subjects to imagine moving around a figure, pictured as painted on the floor of a room. A number of subjects reported considerable difficulty in one of the conditions because the path of the imaginal movement was impeded by a wall visible in the photograph. They reported that they could not make themselves imagine moving around the figure because they kept bumping into the wall! Such responsiveness of the imagination to involuntary processes and unconscious control is one of the main reasons why imagery is associated with the creative process: It appears to have access to tacit knowledge and beliefs through other than deliberate intellectual routes.

[Figure 1 (tree diagram): Explanation of "Spatial Representation" (reason why images are "spatial") branches into Literal Space (e.g., Formula #1) and Functional Space (e.g., Formula #2); Functional Space branches into Tacit Spatial Knowledge (Alternative Explanation 1) and Analogue Medium (Alternative Explanation 2).]

Figure 1. Theoretical positions on the nature of spatial representations.

Other examples involving imaginal movement may be even more compelling in this respect. Imagine dropping an object and watching it fall to the ground or throwing a ball and watching it bounce off a wall. Does it not naturally obey physical laws? Imagine rotating the letter C counterclockwise through 90°. Does it not suddenly appear to have the shape of a U without your having to deduce this? Imagine a square with a dot inside it. Now imagine the width of the square elongating until it becomes a wide rectangle. Is the dot not still inside the figure? Imagine the letters A through E written on a piece of paper in front of you. Can you not simply see by inspection that the letter D is to the right of the letter B? In none of these examples is there any awareness of what Haugeland (1978) calls "reasoning the problem through." The answer appears so directly and immediately available to inspection that it seems absurd to suggest, for example, that knowledge of topological properties of figures is relevant to the elongating square example or that tacit knowledge of the formal properties of the relation "to the right of" (that it is irreflexive, antisymmetric, transitive, connected, acyclic, etc.) is involved in the array of letters example. Such considerations have suggested to people that various intrinsic properties of imaginal representations are fixed by the underlying medium and that we exploit these fixed functional capacities when we reason imagistically. I believe that this intuition is the primary motivation for the widespread interest in analogue processes.

Now in general these are not implausible views. One should, however, be cautious in what one assumes to be an intrinsic function that is instantiated by the underlying biological structure, as opposed to one that is computed from tacit knowledge by the application of rules to symbolically represented beliefs, goals, and so on. In the previous section (as well as in Pylyshyn, 1980b) I have attempted to provide some necessary (though not sufficient) conditions for a function's being instantiated in this sense. The condition that I have found to be particularly useful in clarifying this distinction, especially in the case of deciding how to interpret observations such as those sketched above, is the one I called the cognitive impenetrability criterion. Recall that a function was said to be cognitively impenetrable if it could not be altered in a way that exhibits a coherent relation to the meaning of its inputs. For example, although a function might still count as being cognitively impenetrable if it varied with such things as practice or arousal level or ingestion of drugs, it would not be viewed as cognitively impenetrable if it changed in rationally explainable ways as a function of such things as whether a subject believes that the visually presented stimulus depicts a heavy object (and hence visualizes it as moving very slowly) or whether the subject views it as consisting of one or two figures, or as depicting an old woman or a young lady (in the well-known illusion), and as a consequence behaves in a way appropriate to that reading of the stimulus. I argued that cognitively penetrable phenomena such as the latter would have to be explained in terms of a cognitive rule-governed process, acting upon semantically interpreted representations and involving such activity as logical inferences, problem solving, guessing, associative recall, and so on, rather than in terms of the sort of natural laws that explain the behavior of analogue processes.

Now many functions that appear at first to be biologically instantiated, and therefore alterable only in certain highly constrained law-governed respects, could turn out on closer inspection to be arbitrarily alterable in logically coherent ways by changes in subjects' beliefs and goals (i.e., they could turn out to be cognitively penetrable) and therefore to require a cognitive process account (based on appeal to tacit knowledge and rules). The tremendous flexibility of human cognition, especially in respect to the more central processes involved in thinking and commonsense reasoning, may very well not admit of many highly constrained (nonprogrammable) functions. It may illuminate the nature of the appeals to tacit knowledge if we consider some additional everyday examples. For instance, imagine holding in your two hands, and then simultaneously dropping, a large and a small object or two identically shaped objects of different weights. Which object in your image hits the ground first? Imagine turning a large heavy flywheel by hand. Now imagine applying the same torque to a small aluminum pulley. Which one completes one revolution in your image first? Imagine a transparent yellow filter and a transparent blue filter side by side. Now imagine slowly superimposing the two filters. What color do you see in your image through the superimposed filters? Form a clear and stable image of your favorite familiar scene. Can you now imagine it as a photographic negative, or as being out of focus, or in mirror image inversion, or upside down? Imagine a transparent plastic bag containing a colored fluid, being held open with four parallel rods at right angles to the mouth of the bag, and in such a way that the cross section of the bag is a square. Now imagine the four rods being moved apart so that, with the plastic bag still tight around them, the rods now give the bag a rectangular cross section. As you imagine this happening, does the fluid in the bag rise, fall, or stay at the same level (in other words, how does volume vary with changes of cross-sectional shape, perimeter remaining constant)? Imagine a glass half full of sugar and another nearly full of water. Imagine the sugar being poured into the glass with water in it. Examine your image to see the extent to which the resulting height of water rises (if at all).
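The plastic bag example turns on a piece of geometry that can be worked out explicitly. The following is a minimal sketch under simplifying assumptions that go beyond the text: the bag is treated as a prism whose cross section stays an exact rectangle, its walls do not stretch (so the perimeter is fixed), and the fluid volume is constant. The function name and the numbers are hypothetical.

```python
def fluid_height(volume: float, width: float, perimeter: float) -> float:
    """Fluid level in a prism with a rectangular cross section of fixed perimeter.

    Assumes an idealized bag: rigid rectangular cross section, fixed
    perimeter, constant fluid volume (illustrative values only).
    """
    depth = perimeter / 2 - width          # other side of the rectangle
    if width <= 0 or depth <= 0:
        raise ValueError("width must lie strictly between 0 and perimeter / 2")
    return volume / (width * depth)        # height = volume / cross-sectional area

square_level = fluid_height(volume=500.0, width=10.0, perimeter=40.0)     # 10 x 10 cross section
rectangle_level = fluid_height(volume=500.0, width=15.0, perimeter=40.0)  # 15 x 5 cross section
```

For a fixed perimeter the square encloses the largest area of any rectangle, so under these assumptions any move away from the square shrinks the cross section and the computed level goes up.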

These examples, it seems to me, are not in principle different from the ones in the first list I presented. For many people these imaginings also unfold naturally and effortlessly, without any need to reason through what would happen. Yet it seems clearer in these cases that whatever happens as the sequence unfolds under one's "mind's eye" is a function of what principles one believes govern the events in question. In fact, most people tend to get several of these examples wrong. Clearly the laws of dynamics or optics and the principles of geometry that determine the relation, say, between the perimeter and the area of a figure are not intrinsic (built in) to the representational media or to the functional mechanisms of the mind. Not only must one have tacit knowledge of them, but the way in which the imaginal events unfold naturally can usually be influenced with considerable freedom simply by informing the subject of the appropriate principle. Thus what seems to be a natural and autonomous unfolding process is cognitively penetrable; that is, it is under the control of an intellectual process, with all that this implies concerning the intervention of inferences and "reasoning through." As Harman (1973) has argued, our intuitions concerning when there are or are not inferences taking place must give way before the logical necessity to posit such processes. The mind, it seems, is faster than even the mind's eye.

Another particularly intriguing demonstration, by Ian Howard, shows that even in the case of a simple task involving the recognition of physically possible events, knowledge of physical principles is crucial, which suggests that in the case of imaging, such knowledge would be even more indispensable in explaining why images undergo transformations in certain systematic ways. Howard (1978) showed that over half the population of undergraduate subjects he tested could not correctly recognize trick photographs of tilted pitchers containing colored fluids whose surface orientations were artificially set at various anomalous angles relative to the horizontal. Using the method of random presentation and repeating the study with both stereoscopic photographs and motion pictures, Howard found that the subjects who failed the recognition task, for levels as much as 30° off horizontal, could nevertheless correctly report that the fluid surface was not parallel to shelves visible in the background, thus showing that it was not a failure of perceptual discrimination. What was particularly noteworthy, however, was that postexperimental interviews, scored blindly by two independent judges, revealed that every subject who scored perfect on the recognition test (i.e., no stimulus with orientation more than 5° off horizontal failed to be correctly classified as anomalous) could clearly articulate the principle of fluid level invariance, whereas no subject who made errors gave even a hint of understanding the relevant principle. In this case, unlike what typically happens in other areas such as phonology, evidence of the relevant knowledge was obtainable through direct interviews. Even when such direct evidence is not available, however, indirect behavioral evidence that tacit knowledge is involved can frequently be obtained, for example by demonstrating cognitive penetrability of the phenomenon to new knowledge.

Once examples of this kind are presented, no one finds the claim that tacit knowledge is required for some imaginings at all surprising. In fact, Kosslyn et al. (1979) admit that both image formation and image transformation can be cognitively penetrable and hence not explainable by appealing to properties of the imaginal medium. But given that these examples are not distinguishable from the earlier ones in terms of the apparent autonomy of their progression, why do we continue to find it so compelling to view phenomena such as those associated with mental scanning as providing evidence for an intrinsic property of the imaginal medium (or, as Kosslyn et al. put it, for the "spatial structure of the surface display")? Is it because processes such as mental scanning are more resistant to voluntary control? Is it really inconceivable that we could "search for" objects in a mental image without passing through intermediate points, or that we could compare two shapes in different orientations without necessarily first imagining one of them being at each of a large number of intermediate orientations?

I believe that what makes certain spatial operations resistant to a natural interpretation in terms of knowledge-governed reasoning through is primarily the "objective pull" I discussed in Pylyshyn (1978), which results in the tendency to view the cognitive process in terms of properties of the represented objects (i.e., the semantics of the representation) instead of the structure of the representation itself (i.e., the syntax of the representation). This tendency, however, leads to a way of stating the principles by which mental processes operate that deprives the principles of any explanatory value. This involves appealing to principles that are expressed in terms of properties of the represented object rather than in terms of the structure or form of the representation itself. But expressing a principle in terms of properties of the represented domain begs the question of why processing occurs this way. The mechanism has no access to the properties of the represented domain except insofar as they are encoded in the form of the representation itself. Consequently, a principle of mental processing must be stated in terms of the formal structural properties of its representations, not in terms of what they are taken to represent in the theory.

Consider the following example. In describing their model, Kosslyn et al. (1979) take care to abstract the general principles of operation of the model, a step that is essential if the model is to be explanatory. Such principles include, for example, "Mental images are transformed in small steps, so the images pass through intermediate stages of transformation" (p. 542), or "The degree of distortion will be proportional to the size of the transformational step" (p. 542). Shepard (1975) also cites such principles in discussing his image transformation results.

In each case these principles only make sense if terms such as size or small steps refer to the represented domain. In other words, the intended interpretation of the first of the above principles would have to be something like the following: Representations are transformed in such a way that successive representations correspond to small differences in the scene being depicted. However, what we need is a statement that refers only to the structure of the representation and its accessing process. We need to be able to say something like, "Representations are transformed in small structural steps or degrees," where small is relative to a metric defined over the formal structure of the representation and process. For example, relative to a binary representation of numbers and a machine with a bit-shifting operation, the formal (or syntactic) transformation from a representation of the number 137 to a representation of the number 274 is smaller than the transformation from a representation of 137 to a representation of 140 (since the former requires only one shift operation), even though it clearly corresponds to a larger (semantic) transformation in the represented domain of abstract numbers. It is thus important to distinguish between the intrinsic syntactic domain and the extrinsic semantic domain when speaking (typically ambiguously) about transformations of representations. Not only are the two domains logically distinct, but, as we have seen, they involve two quite different similarity metrics and their behavior is governed by quite different principles.
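The contrast between the two metrics in the 137/274/140 example can be made concrete. The sketch below is illustrative only: the text specifies a machine with a bit-shifting operation, and the increment operation is an added assumption, needed so that 140 is reachable from 137 at all. The function names are hypothetical.

```python
from collections import deque

def syntactic_distance(start: int, target: int) -> int:
    """Fewest machine operations taking `start` to `target`, where the
    assumed machine offers two operations: left shift by one bit, and
    increment by one (the increment is an assumption of this sketch)."""
    if not 0 < start <= target:
        raise ValueError("requires 0 < start <= target")
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:                      # breadth-first search over operations
        value, cost = frontier.popleft()
        if value == target:
            return cost
        for nxt in (value << 1, value + 1):
            if nxt <= target and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, cost + 1))
    raise AssertionError("unreachable: increments alone always reach target")

def semantic_distance(start: int, target: int) -> int:
    """Difference in the represented domain of abstract numbers."""
    return abs(target - start)

shift_case = (syntactic_distance(137, 274), semantic_distance(137, 274))
nearby_case = (syntactic_distance(137, 140), semantic_distance(137, 140))
```

On this assumed machine the pair 137 and 274 is close syntactically and far semantically, while 137 and 140 show the reverse ordering, so the two metrics rank the same transformations differently.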

Cognitive principles such as those invoked by Kosslyn et al. (1979), Shepard (1975), and Anderson (1978) would only be theoretically substantive (i.e., explanatory) if they specified (a) how it was possible to have formal operations that had the desired semantic generalization as their consequence (that is, how one could arrange a formal representation and operations upon it so that small steps in the formal representations corresponded to small steps in the represented domain) and (b) why these particular operations, rather than some other ones that could also accomplish the task, should be used (this is the issue of making the underlying theory principled, or restricting its degrees of freedom by reducing the number of free parameters in it). Simply asserting that representations do, as a matter of fact, have this property (because, for example, they are said to "depict" rather than merely "represent"; cf. Kosslyn et al., 1979) is not enough. One reason that it is not enough is that such a property is simply stipulated in order to conform to the data at hand: It is a free empirical parameter. Another reason is that, as with our earlier cases, there are two distinct options available to account for how this can happen. They correspond to our Alternatives 1 and 2: We can appeal either to tacit knowledge or to intrinsic properties of a representational medium. In other words, we can make the account principled by relating the process either to the rationality of the method adopted, given the organism's goals and tacit knowledge, or to the causal relations that hold among physical properties of the representational medium.

Before examining in greater detail the proposal that many imagery phenomena, including specifically those dealing with mental scanning, should be explained in terms of tacit knowledge, we need to touch on two additional points, since they frequently muddy the discussion. For this reason we shall make a brief digression.

Constraints of Habit and the Executive Retreat

The first point is that we must distinguish here, as we do in other areas of theory evaluation, between what typically or frequently or habitually happens and what must happen because of some lawful regularity. Insofar as the origin of visual imagery no doubt lies in visual experience, and insofar as we frequently see things happen in certain ways, this could easily influence the way in which we typically imagine certain kinds of events. For example, since our visual experiences are primarily with common, middle-sized objects, certain typical speeds of acceleration, deceleration, and trajectory shapes are much more common than others. If this sort of experience did influence our tendency to most frequently imagine movements in certain ways, it would clearly not be something attributable to the nature of the mechanism or to properties of the representational medium.

Examples of habitual modes of processing determined by the nature of our experience rather than by our fixed functional capacities are frequent in many areas of cognition. For example, when we learn to read, the teacher monitors our performance auditorily. Consequently, we first learn to read out loud and later to suppress the actual sound. As a result, many of us continue to read by converting written text into a phonetic form prior to further analysis. But there is good reason to believe that this stage of processing is not a necessary one (e.g., Forster, 1976). In fact, there is a plausible view that this habitual mode of processing is responsible for the slow reading speed that some of us suffer from and that we can be trained to abandon.

Although knowing the habitual modes of processing is useful if one is interested in describing the typical, or in accounting for variance, or in developing practical tools (say for education), providing an explanation of behavior requires that we understand the nature of the underlying mechanism (or medium). This, in turn, requires that we empirically establish how imaginal processing is constrained, or which of its functions are independent of particular beliefs or goals (i.e., we must discover the cognitively impenetrable properties of imaginal processing). Thus it becomes important to ask which particular characteristics (if any) the use of imagery forces on us, rather than to report the strategies that tend to go along with the use of imagery. For example, rather than asking whether, in using imagery, subjects typically take more time to locate (or otherwise focus their attention on) objects represented as more distant or to report the presence of features in an image that they describe as smaller, we should ask whether subjects must do so whenever the image mode of representation is being used. A variety of experimental findings (e.g., Kosslyn et al., 1979; Bannon, Note 1; Spoehr & Williams, Note 2) have demonstrated that, left to their own devices, people habitually solve certain kinds of problems (typically ones involving metrical or geometrical properties) by visualizing some physically possible event taking place (e.g., they imagine themselves witnessing the stimulus changing in certain characteristic ways). Yet no one, to my knowledge, has tried to set up an experimental situation in which subjects were discouraged from carrying out the task by this habitual means to determine whether they were constrained to do so by some cognitive mechanism or medium. Later in this paper I shall report several studies carried out with this goal in mind.

The second point that needs clearing up concerns the grain of truth in Anderson's (1978) claim that the form of representation cannot be determined unequivocally by appeal to behavioral data alone. Anderson's argument is that for any model using some particular form of representation, one can always conjure up another behaviorally indistinguishable model that uses a different form of representation simply by making compensatory changes in the accessing process. As I tried to show (Pylyshyn, 1979b), this cannot be done in general without considerable loss in explanatory power. One of the considerations that led Anderson to the indeterminism view is the apparent unresolvability of the so-called imaginal versus propositional representation debate. No sooner does one side produce what they take to be a damaging result than the other side finds a way to compensate for this by adjusting the process that accesses the representation while leaving fixed the assumed properties of the representation itself. In my view, the lesson to be learned from this observation is simply that in adjudicating such a debate, as in determining the correct interpretation of any empirical phenomenon, one should not appeal solely to data (for even adding neurophysiological and any other class of data does not solve the problem, since no finite amount of data alone can ever uniquely determine a theory), but one should also consider the explanatory power of the model, that is, how well it captures important generalizations, how constrained it is (i.e., how many free parameters it has), how general it is, and so on. Anderson's views on this criterion notwithstanding, the issue is not fraught with vagueness and subjectivity. Although it clearly is not simple to apply in practice, the notion of explanatory power is crucial to the conduct of scientific inquiry, inasmuch as we need to distinguish between such predictive devices as curve fitting or statistical extrapolations and genuine cases of law-like explanatory principles. I shall have more to say about the issue of predictive versus explanatory adequacy in the concluding section.

These remarks are intended as an introduction to the most common rejoinder made to arguments (such as those based on the informal examples considered above, as well as experimental observations of cognitive penetrability) that we should attribute the properties and behavior of images to tacit knowledge rather than to intrinsic properties of an imaginal medium. This rejoinder consists of the counterproposal that we retain the analogue medium but simply modify the processes that generate, transform, and interpret the representation and thus enable the analogical model to account for such findings. For example, my objections to certain particular analogue models of image processing, which are based on demonstrations that certain imaginal processes are cognitively penetrable, can often be sidestepped by merely adding an additional layer of executive process that varies the generation and transformation of images in response to cognitive factors. Sometimes this sort of executive overlay can be made to produce the desired behavior in an imagery model, but rarely can this be done without adding various ad hoc contrivances and consequently losing explanatory power.

Consider, for the sake of a concrete example, the case of the phenomenon called "mental rotation." I claimed (Pylyshyn, 1979a) that the operation of mentally rotating a whole image is not one of the functions that is instantiated by the knowledge-independent functional capacities of the brain and hence should not be explained by appealing to properties of some analogue medium. My conclusion in this case was based on the empirical finding that the slope of the relationship between the relative orientations of two figures and the time it takes to carry out certain comparisons between them (such as deciding whether they are identical), which is generally taken as the behavioral measure of rate of mental rotation, depends on various cognitive factors such as figural complexity and the difficulty of the actual postrotation comparison task. In these studies the difficulty of the comparison task was varied by requiring subjects to decide whether one figure was embedded in the other and then varying the "goodness" or gestalt value of the embedding (see Pylyshyn, 1979a, for details). One counterargument to my conclusion suggested by Kosslyn et al. (1979), and one, incidentally, that I considered in my original paper, is that these findings are compatible with a holistic analogue view because an executive process might have determined, on the basis of some property of the stimulus or probe figure, what rate of rotation to use and set this as the value of a parameter to the rotation function. A number of responses might be made to this suggestion.

First, although nothing in principle prevents one from making rotation rate a parameter of the analogue, such a proposal weakens the explanatory power of the model considerably, for any behavioral property, not just rate of rotation, could be made a parameter (including, for example, the form of the function relating reaction time to orientation). The more such parameters there are, the more the model becomes an exercise in curve fitting. Because these parameters are not constrained a priori, each contributes to the degrees of freedom available for fitting the observed data and hence detracts from the explanatory power of the model. This is another way of saying that unless we have some independent means of theoretically assigning a rotation rate to each stimulus, such a parametric feature of the model will be completely ad hoc. Thus there is considerable incentive to try to account for the comparison times in some principled way, either on the basis of some intrinsic property of the representational medium or else in terms of some aggregate characteristic of the cognitive process itself (such as, for example, the number of basic operations carried out in each condition).
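The curve-fitting worry can be illustrated with a minimal sketch. It assumes, purely for illustration, a linear model in which reaction time equals an intercept plus a slope times angular disparity. The data values, condition names, and function names are hypothetical and are not taken from any of the studies cited. The sketch compares a constrained model (one rotation rate shared by all stimuli) with a parametrized model (a separate rate for each stimulus).

```python
from statistics import mean

# Hypothetical data: (angular disparity in degrees, reaction time in ms).
# These numbers are invented for illustration only.
DATA = {
    "simple_figure": [(0, 500), (60, 800), (120, 1100)],
    "complex_figure": [(0, 500), (60, 1100), (120, 1700)],
}


def fit_line(points):
    """Ordinary least-squares fit of rt = intercept + slope * angle."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def squared_error(points, intercept, slope):
    """Sum of squared residuals of the fitted line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points)


def constrained_fit(data):
    """One rate for every stimulus: returns (error, number of free parameters)."""
    pooled = [p for points in data.values() for p in points]
    intercept, slope = fit_line(pooled)
    return squared_error(pooled, intercept, slope), 2


def per_stimulus_fit(data):
    """A separate rate per stimulus: returns (error, number of free parameters)."""
    total = 0.0
    for points in data.values():
        intercept, slope = fit_line(points)
        total += squared_error(points, intercept, slope)
    return total, 2 * len(data)
```

With these invented numbers the per-stimulus model fits more closely than the constrained one, but it does so by spending twice as many free parameters, and nothing in it says why the slopes should differ. That is the sense in which an unconstrained rate parameter buys fit at the cost of explanation.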

It is this very consideration that leads me to agree with Kosslyn et al. (1979) when, in discussing the relative merits of the analogue view of mental rotation (as opposed to the alternative propositional account proposed by Anderson, 1978, in which a parameter describing the orientation of a figure is incrementally recomputed), they state, "Thus the question now becomes: Is incremental transformation an equally motivated assumption in both theories, or is it integral to one and added on as an afterthought in the other?" (p. 545). From this point of view, the analogue proposal is clearly less ad hoc, since it posits a universal constraint that is associated with the medium itself (not with that phenomenon alone) and that is therefore not a free empirical parameter. However, it should be noted that this account is only principled when it refers to the intrinsic analogue medium model of rotation, not to the symbol-structure (or matrix) view I discussed earlier. In the latter case the principles (e.g., rotation proceeds by application of transformations that correspond to small distances) appeal to properties of the represented domain, rather than to intrinsic properties of the representation, precisely as does Anderson's ad hoc incremental parameter adjustment proposal. In both cases no principle based on some independently determined property of the representation or of the structure of the process is given for why this should be so. On the other hand, the trouble with the principled analogue view is that it appears to be false as it stands, as I argued in Pylyshyn (1979a).

Despite the fact that there is strong incentive to account for observed properties, such as rotation rate, in the principled ways suggested above, it could still turn out that the best we can do at the present time is to appeal to something like a rotation rate parameter. In fact, as I have argued (Pylyshyn, 1980b), there will necessarily be some primitive functions that are themselves not explainable in terms of symbol manipulation processes. These constitute what I called the "functional architecture" of the mind. There is no a priori reason why a one-argument version of the operator ROTATE(speed) cannot be such a process. In such a model, the speed parameter would be viewed as being adjusted by some physical means (e.g., a digital-to-analogue converter) on the basis of a cognitive analysis of the stimulus and the subject's beliefs and goals. The question of whether this is the correct story is ultimately an empirical one, just as was the original question of whether ROTATE is an instantiated analogue function (i.e., an intrinsic property of the medium, or what I have called Alternative 2). What has to be done in that case is to expose this new proposal to empirical tests such as those that assess cognitive penetrability. Of course, as I pointed out, each such retreat from the original holistic analogue hypothesis brings us closer to Alternative 1 (the tacit knowledge explanation), as more of the determinants of the phenomena are put into the class of logical analyses and inferences.

Finally it should be pointed out that the particular proposal for parametrizing rotation rate does not, in any case, apply to the experimental results I reported. In these studies the slope of the reaction time versus angle curve was shown to be a function not merely of properties of the stimulus figure or the comparison figure but of the difficulty of the comparison task itself (i.e., the task of deciding whether the probe was an embedded subfigure of the stimulus). The holistic analogue model assumes that the comparison phase can only be carried out after the stimulus figure has been rotated into the appropriate (independently determined) orientation (indeed, that is the very phase of the process that is responsible for the linear relation between angle and time). In fact Kosslyn et al. (1979) appear to implicitly accept this particular order of events when they propose the alternative that "people may choose in advance slower rates for 'worse' probes" (p. 546). The trouble with this alternative, however, is that we found rotation rate to be not only a function of the nature of the stimulus and of the probe but also of the relation between them, specifically of how well the probe fits as an embedded part of the stimulus. Since this particular feature of the comparison phase cannot be known in advance of rotation, it could not possibly be used as a basis for setting a rate parameter.

Now I have no doubt that one could come up with some kind of executive process that utilized a holistic analogue and yet exhibited different rates of apparent rotation for the different conditions, as observed. However, the fact that one could design such an executive would itself be of little interest. It would require some strong independent motivation for going to such lengths in order to retain the analogue rotation components. As I suggested earlier, the main attraction of the analogue model is that it is both principled (i.e., it posits a universal property of mind) and constrained. It is constrained because it permits only one way to transform an image of a figure in one orientation into an image of that figure in another orientation, in contrast with the unlimited number of ways in which an arbitrary symbol structure can in principle be transformed. This constraint would have constituted a powerful explanatory principle. But now as we locate more and more of the explanatory burden in the executive process, there remains less and less reason to retain the ROTATE analogue operation, although as I stated above, we will always need to posit some knowledge-independent functional properties or capacities (i.e., analogue in my sense).

Having thus outlined two general methodological considerations that need to be kept in mind when interpreting empirical findings bearing on the contrast between the intrinsic property of the medium view and the tacit knowledge view, I am ready to consider the specific case of the mental scanning phenomena in some detail.

Tacit Knowledge and Mental Scanning

In examining what takes place in studies such as those discussed by Kosslyn et al. (1979), it is critical to note the difference between the following two tasks:

1a. Solve a particular problem by using a certain prescribed form of representation, or a certain medium or mechanism.

1b. Attempt to recreate as accurately as possible the sequence of perceptual events that would occur if you were actually observing a certain real event happening.

The reason this difference is critical is that quite different criteria of success apply in these two cases. For example, solving a problem by using a certain representational format does not entail that various incidental properties of a known situation even be considered, let alone simulated. On the other hand, this is precisely what is required of someone solving Task 1b. In this case failure to duplicate such conditions as the speed with which an event occurs would constitute a failure to carry out that task correctly. Take the case of imagining. The task of imagining that something is the case, or of considering an imagined situation in order to answer questions about it, does not entail (as part of the specification of the task itself) that it take any particular length of time. On the other hand, the task of imagining that an event is actually happening before your very eyes does entail, for a successful realization of this task, that you consider as many as possible of the characteristics of the event, even if they are irrelevant to the discrimination task itself, and that you attempt to place them into the correct time relationships.

For instance, in discussing how he imaged his music, Mozart claimed (see Mozart's letter reproduced in Ghiselin, 1952), "Nor do I hear in my imagination, the parts successively, but I hear them, as it were, all at once" (p. 45). He felt that he could hear a whole symphony in his imagination all at once and apprehend its structure and beauty. Clearly he had in mind a task that is best described in terms of 1a. Even the word hear, taken in the sense of having an auditorylike imaginal experience, need not entail anything about the duration of that experience. We can be reasonably sure that Mozart did not intend the sense of imagining implied by 1b, simply because if what he claimed to be doing was imagining witnessing the real event of, say, sitting in the Odeon Conservatoire in Munich and hearing his Symphony Number 40 in G Minor being played with impeccable precision by the resident orchestra under the veteran Kapellmeister, and if he had been imagining that it was actually happening before him in real time and in complete detail (including the minutest flourishes of the horns and the trills of the flute and oboe, all in the correct temporal relations and durations), then he would have taken very close to 22 minutes for this task. If he had not taken that long to imagine it, this would only signify that he had not quite been doing what he had alleged, that is, he had not been imagining witnessing the actual real event in which every note was being played at its proper duration, or else we might conclude that what he had in fact been imagining was not a good performance of his symphony. In other words, if it takes n sec to witness a certain event, then an accurate mental simulation of the act of witnessing that same event should also take n sec, simply because how well the latter task is performed is by definition dependent on how accurately it mimics various properties of the former task. On the other hand, the same need not apply merely to the act of imagining that the event has a certain set of properties, that is, imagining a situation to be the case but without the added requirements as specified in the 1b version of the task. These are not empirical assertions about how people imagine and think: They are simply claims about the existence of two distinct natural interpretations of the specification of a certain task.

Applying this to the particular case of mental scanning, one must be careful to distinguish between the following two tasks that subjects might set themselves:

2a. Using a mental image and focusing your attention on a certain object in that image, decide as quickly as possible whether a second named object is present elsewhere in that image.

2b. Imagine yourself in a certain real situation in which you are viewing a certain scene and are focusing directly on some particular object in that scene. Now imagine that you are looking for (or scanning toward, or glancing up at, or seeing a speck moving across the scene toward, etc.) a second named object in the scene. When you succeed in imagining yourself finding (and seeing) the object (or when you see the speck arrive at the object), press this button.

The relevant differences between Tasks 2a and 2b should be obvious. As in the previous examples, the criteria of successful completion of the task are different in the two cases. In particular, Task 2b includes, as part of its specification, such requirements as that subjects should attempt to imagine various intermediate states (corresponding to ones that they believe would be passed through in actually carrying out the corresponding real task) and that they spend more time visualizing those episodes that they believe (or infer) would take more time in the corresponding real task. The latter conditions are clearly not part of the specification of Task 2a, as there is nothing about Task 2a that requires that such incidental features of the visual task be considered in answering the question. In the words of Newell and Simon (1972), the two tasks have quite different "task demands."

To show that subjects are actually carrying out Task 2b in the various studies reported by Kosslyn (and therefore that the proper explanation of these findings should appeal to subjects' tacit knowledge of the depicted situation rather than to properties of their imaginal medium), I shall attempt to establish several independent points. First, it is independently plausible that the methods used in experiments reported in the literature should be inviting subjects to carry out Task 2b rather than Task 2a. Second, the arguments against experimental demand effects raised by Kosslyn et al. (1979) do not bear on the above proposal. Third, this alternative view has considerable generality and can account for a variety of imaginal phenomena. And fourth, there is independent experimental evidence showing that subjects can indeed be led to carry out Task 2a rather than Task 2b, and when they do, the increase in reaction time with increase in imagined distance disappears.

Task Demands of Scanning Experiments

With respect to the first point, all published studies that I am aware of in which larger image distances led to longer reaction times used instructions that quite explicitly required subjects to imagine witnessing the occurrence of a real physical event. In most scanning experiments subjects are asked to imagine a spot moving from one point to another, although in a few (e.g., in Kosslyn, 1973; Kosslyn, Ball, & Reiser, 1978, Experiment 4) they were asked to imagine shifting their attention or their glance from one imagined object to another in the same imagined scene. In each case, what subjects were required to imagine was a real physical event (since terms like move and shift refer to physical processes) about the duration of which they would clearly have some reasonable tacit knowledge. For example, they would know implicitly that it takes a moving object longer to move through a greater distance, that it takes longer to shift one's attention through greater distances (both transversely and in depth), and so on. Although subjects may or may not be able to state these regularities, they plainly do have that tacit knowledge, as evidenced by the critical precision necessary to make realistic motion pictures by splicing pan and zoom sequences. (The exact time relationships needed to make such sequences appear realistic, especially in the case of splicing together takes of slower and more deliberate movements of actors and of points of view, seem to depend on one's prior interpretation of the actions. Hence the process involved in detecting poor film editing, like the process of imagining realistic scenarios, would seem to be knowledge dependent and therefore cognitively penetrable.)

The Arguments Against Demand Characteristics

Kosslyn et al. (1979) appear to recognize some of the force of the tacit knowledge position, but in responding to it they concern themselves only with the possibility that "experimental demand characteristics," or unintentional influences due to the experimental setting and subjects' expectations, might have been responsible for the outcome of the experiments. Although recent results by Richman, Mitchell, and Reznick (1979) and Mitchell and Richman (1980) indicate that phenomena such as those found in mental scanning experiments can be brought about by experimental demand factors, it has not yet been established that this is in fact the correct explanation for all such results. Kosslyn et al. have argued that it is unlikely that demand factors could explain all their results. On the other hand, neither have they provided any definitive control studies to rule out this alternative (the "pseudo-experiment" described by Kosslyn et al. is inadequate in this respect, inasmuch as simply asking subjects what they expect is the best way to invite acquiescence effects, as opposed to genuine expectations or other types of demand biases).

However, whether the case for experimental demand effects will stand up to empirical tests or whether the Kosslyn et al. counterarguments are correct is not relevant to the present proposal. There is a major difference between the contaminating effects of experimental demands, or subjects' expectations of the outcome or their desire to please, and the entirely legitimate task demands, or requirements placed on the solution process by the specifications of the task itself. In the latter case what is at issue is not a contamination of results but simply a case of subjects solving the task as they interpret it (or as they choose to interpret it, for one reason or another) by bringing to bear everything that they know about a class of physical events, which they take to be the ones that they are to imagine witnessing. If they take the task to be the one characterized in Task 2b, then they will naturally attempt to reproduce a temporal sequence of representations corresponding to the sequence they believe would arise from actually viewing the event of scanning across a scene (or seeing a spot move across the scene). Thus, beginning with the representation corresponding to "imagining seeing the initial point of focus," the process would continue until a representation was arrived at which corresponded to "imagining seeing the named point." Of course, according to this way of viewing what is going on, there is no need to assume that the process halts as a result of a certain imagined state's being reached, or when a certain visual predicate is satisfied. It could just as plausibly stop when some independent psychophysical mechanism had generated a time interval corresponding to an estimate of expected duration (we know such mechanisms exist, since subjects can generate time intervals corresponding to known magnitudes with even greater reliability than they can estimate them; cf. Fraisse, 1963). In other words, it could just as easily be independently estimated time intervals that drive the imagined state changes.
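The contrast between the two task interpretations can be made concrete with a minimal sketch. It assumes, purely for illustration, that a subject carrying out Task 2b estimates the expected duration as believed distance divided by believed speed and then paces a sequence of belief states to fill that interval. All function names, the tick size, and the constant retrieval time for Task 2a are hypothetical choices, not part of any model proposed in the literature discussed here.

```python
def simulate_task_2b(distance, believed_speed, tick=0.05):
    """Sketch of Task 2b: imagine witnessing a spot cross `distance`.

    The response time derives from an independently estimated duration
    (distance / believed_speed, i.e., tacit knowledge), and the belief
    states are paced to fill it. No property of a medium is consulted.
    Returns (response_time_in_seconds, list_of_belief_states).
    """
    if distance < 0 or believed_speed <= 0:
        raise ValueError("distance must be >= 0 and believed_speed > 0")
    estimated_duration = distance / believed_speed
    steps = max(1, round(estimated_duration / tick))  # number of state changes
    beliefs = [
        f"the spot is now at {distance * i / steps:.1f}"
        for i in range(steps + 1)
    ]
    return steps * tick, beliefs


def simulate_task_2a(distance, retrieval_time=0.4):
    """Sketch of Task 2a: decide whether the named object is present.

    Nothing in the task specification mentions distance, so this sketch
    returns a constant (hypothetical) retrieval time and ignores it.
    """
    return retrieval_time
```

In this sketch the linear increase of response time with distance under Task 2b comes entirely from the subject's believed speed; changing that belief changes the slope, which is what cognitive penetrability would predict. Under Task 2a the distance plays no role at all.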

For the purpose of this account of the scanning results, we need assume little or nothing about intrinsic constraints on the process or even about the content of the sequence of representations that are generated. Such a sequence could, for example, simply consist of a sequence of beliefs such as that the spot is now here and now it is there, where the locative demonstratives are pointers into the symbolic representation being constructed and updated. Though the sequence is almost certainly more complex than this, there is no need to assume that it is constrained by any special property of the representational medium, as opposed to simply being governed by what subjects believe or infer about some likely intermediate stages of the event being imagined and about the relative times at which they would occur. Now such beliefs and inferences could obviously depend on anything that the subject might tacitly know or believe concerning what usually happens in the corresponding perceptual situations. Thus the sequence could in one case depend on tacit knowledge of the dynamics of physical objects, in another on tacit knowledge of some aspects of eye movements or of what happens when one has to glance up or refocus on a more distant object, or even on tacit knowledge of how long it takes to notice or to recognize certain kinds of visual patterns (e.g., it might even take subjects longer to imagine trying to see something in dim light or against a camouflage background for this reason). Thus none of the examples and contrary evidence that Kosslyn et al. (1979) cite against one or another of the alternative "experimental demand" explanations is to the point here, since the exact domain of knowledge being appealed to can vary from case to case, as is to be expected if imagining is viewed as a species of commonsense reasoning, as opposed to a process that has access to a special sort of representational medium with extraordinary functional properties (e.g., being characterized by Euclidean axioms).

Sometimes experiments involving superimposing images on actual visual stimuli have been cited against the demand characteristics view (e.g., Kosslyn et al., 1979). However, such experiments differ from studies of imaginal thinking in several important respects that make them largely irrelevant to the present discussion. When a subject is instructed to view a display and then to imagine a stationary or a moving pattern superimposed on it (as in the studies by Hays, 1973; Finke, 1979; Shulman, Remington, & McLean, 1979; and those mentioned in Shepard, 1978), there is no need to posit an internal medium of representation to explain the stable, geometrical relationships that hold among features of the resulting construction. The perceived background itself is all we need in this case. For example, when a subject thinks of an imaginary spot as being here and then there (as in the discussion above), the locative terms can in this case be bound to places in a perceptual construction that are under direct stimulus control and that are generally veridical with respect to relative spatial locations. This is essentially equivalent to binding the internal symbols to the actual places in the stimulus, which, being in the actual stimulus, will maintain their locations relative to one another regardless of subjects' beliefs about space or about what they are viewing (assuming only that perception is free from major time-varying distortions). In fact, Pylyshyn, Elcock, Marmor, and Sander (1978) have developed a model of how indexical binding of internal symbols to primitive perceptual features can be carried out within a limited-resource computational system and how such bindings can be used by the motor system to enable it to, say, point to the bound features.

Thus in such superposition cases if, for instance, the subject imagines a spot moving from perceived location A to perceived location B, then all that is required to ensure that the spot crosses some location C is (a) that the successive locations where the point is imagined to be actually correspond to a certain path on the stimulus (i.e., that the successive mental locatives in fact refer to a certain sequence of adjacent places on the stimulus) and (b) that place C actually be on that path, somewhere between A and B. In the pure imagery case, by contrast, the corresponding notions of path, lying on, and between are not available in the same literal sense (i.e., there are only representations of paths). In other words, subjects must not only imagine the spot to be moving with respect to an imagined background but they must have tacit knowledge of such things as that if C lies between A and B, then going from A to B requires passing through C. Another way to put this is to say that the geometrical properties of the layout that is being viewed (e.g., the relative locations of features in it) remain fixed because of the way the world being viewed is (in this case, rigid), and different geometrical characteristics of the layout can simply be "noticed" or "perceived" by the viewer, including the relative position of a place being attended to (i.e., a place that is bound to an internal locative indexical symbol). On the other hand, what remains fixed and what can be noticed in a purely constructed image depends either on intrinsic properties of some medium of representation or on subjects' tacit knowledge about the behavior of the sorts of things they are imagining and their ability to draw inferences from such knowledge: exactly the dichotomy we are examining. This issue is closely connected with the general problem of reasoning about actions, which in artificial intelligence research raises a technical problem called the "frame problem." The relevance of such issues to the imagery controversy is discussed in Pylyshyn (1978, 1980b).

The Generality of the Tacit Knowledge View

With respect to the generality of explanations based on appeal to tacit knowledge, one could point to a variety of findings that fall nicely within this explanatory framework. For instance, the list of illustrative examples presented in the last section shows clearly that in order to imagine the episode of seeing certain physical events, one needs to have access to tacit knowledge about physical regularities. In some of these cases one might even say that one needed an implicit theory, since a variety of related generalizations must be brought to bear in order to correctly predict what some imagined process would do (e.g., the sugar solution or the color filter case). In other cases simply the knowledge (or recollection) that certain things typically happen in certain ways and that they take certain relative amounts of time will suffice.

Several of Kosslyn's findings, allegedly revealing properties of the "mind's eye," might also be explainable on this basis, including the finding (Kosslyn, 1975) that it takes longer to report properties of objects when the objects are imagined as being small. Consider that the usual way to inspect an object is to take up a viewing position at some convenient distance from the object which depends on its size (and in certain cases on other things as well; e.g., consider imagining a deadly snake or a raging fire). So long as we have a reasonably good idea of the object's true size we would imagine viewing it at the appropriate distance. Now if someone instructed me to imagine some object as especially small, I might perhaps think of myself as being further away or as seeing it through, say, the wrong end of a telescope. In any case if I were then asked to do something, such as report some of its properties, and if the instructions were to imagine that I could actually see the property I was reporting (which was the case in the experiments reported), or even if I simply chose to make that my task for some obscure reason, I would naturally try to imagine the occurrence of some real sequence of events in which I went from seeing the object as small to seeing it as big enough so I could easily discern certain details (i.e., I would very likely take the instructions as meaning that I should carry out Task 1b). In that case I would probably imagine something that was in fact a plausible visual event, such as a zooming-in sequence (and indeed this is what many of Kosslyn's subjects reported). If that were the case then we would naturally expect the time relations to be as actually observed.

Although the above story may sound quite a bit like the one Kosslyn (1975) himself gives, there is one difference that is crucial from a theoretical standpoint. In this version of the account, no appeals need to be made to knowledge-independent functional properties of a medium, and especially to properties of a geometrical sort. The representational medium, although it no doubt has some relevant intrinsic properties that restrict how things can be represented, plays no role in accounting for any of the particular phenomena we have been examining. These phenomena are seen as arising from (a) subjects' tacit knowledge of how things typically happen in reality and (b) their ability to carry out such psychophysical tasks as to generate time intervals corresponding to inferred durations of certain possible physical events. This is not to deny the importance of different forms of representation, of the nature of such inferential capacities as alluded to above, or of the nature of the underlying mechanisms. It is simply to suggest that the particular findings we have been discussing do not necessarily tell us anything about such matters.

Although we intuitively feel that the visual image modality (or format, or medium) severely constrains both the form and the content of potential representations, it is no easy matter to say exactly what these constraints are (and the informal examples given earlier should cast at least some suspicions on the validity of such intuitions in general). It seems clear, for example, that we cannot image any arbitrary object whose properties we can describe, and this does give credence to the view that images are more constrained than descriptions. Although it is doubtlessly true that imagery is in some sense not as flexible as discursive symbol systems (such as language), it is crucial to know the nature of this constraint before we can say whether it is a constraint imposed by the medium or merely a habitual way of doing things or of interpreting the task demands, or whether it might even be a limitation attributable to the absence of certain knowledge or a failure to draw certain inferences. Once again I would argue that we cannot say a priori whether certain constraints implicated in the use of imagery ought to be attributed to the functional character of the biological medium of representation (the analogue view) or to the subject's possession and use (either voluntarily or habitually) of certain tacit knowledge.

Consider the following proposals made by Kosslyn et al. (1979) concerning the nature of the constraints on imagery. The authors clearly take such constraints to be given by the intrinsic nature of the representational medium. They suggest that something they call the "surface display" (a reference to their cathode ray tube proto-model) gives imagery certain fixed characteristics. For example, they state,

We predict that this component will not allow cognitive penetration: that a person's knowledge, beliefs, intentions, and so on will not alter the spatial structure that we believe the display has. Thus we predict that a person cannot at will make his surface display four-dimensional, or non-Euclidean. (p. 549)

Now it does seem to be obviously true that one cannot image a four-dimensional or non-Euclidean space. Yet the very oddness of the supposition that we might be able to do so should make us suspicious as to the reason for this.

To see why little can be concluded from this fact, consider the following. Suppose a subject insisted that he or she could imagine a non-Euclidean space. Suppose further that mental scanning experiments were consistent with this claim (e.g., scan time conformed to, say, a city block metric). Would we believe this subject or would we conclude that what the subject really did was to

simulate such properties in imagery by filling in the surface display with patterns of a certain sort in the same way that projections of non-Euclidean surfaces can be depicted on two-dimensional Euclidean paper? (Kosslyn et al., 1979, p. 547)

Of course we would conclude the latter. But the reason for doing so is exactly the reason we gave earlier for discounting one possible interpretation of what Mozart might have meant when he claimed to be able to imagine a whole symphony instantaneously. That reason, you will recall, had to do entirely with the implications of one particular sense of the phrase imagine a symphony, namely that the Task 2b sense demands that certain conditions be fulfilled. If we transpose this to the case of the spatial property of visual imagery, we can see that this is also the reason why the notion of imagining four-dimensional space in the sense of Task 2b is incoherent. The point is sufficiently central that it merits a brief elaboration.

Let us first distinguish, as I have been insisting we should, the sense of imagining (call it imagine_t X) that means to think of X or to consider the hypothetical situation that X is the case (or mentally construct a symbolic model or a mental description of a possible world in which X is the case) from the sense of imagining (call this one imagine_s X) that means to imagine that you are seeing X or to imagine yourself observing the actual event X happening. Then the reason for the inadmissibility of four-dimensional or non-Euclidean imaginal space becomes clear, as does its irrelevance to the question of what the properties of an imaginal medium are. The reason we cannot imagine_s such spaces is that they are not the sorts of things that could be seen. Our inability to imagine_s such things has nothing to do with intrinsic properties of a surface display, but with a lack of a certain sort of knowledge: We do not know what it would be like to see such a thing. We have no idea, for example, what kind of configuration of light and dark contours there would have to be, what sorts of visual features would need to appear, and so on. Presumably congenitally color-blind people cannot imagine_s a colored scene for similar reasons. In this case it would hardly seem appropriate to attribute this failure to something's being wrong with their surface display. On the other hand, we do know, in nonvisual (i.e., nonoptical) terms, what a non-Euclidean space is like, and we can imagine_t there being such a space in reality (certainly Einstein did) and thus solve problems about it. Perhaps, given sufficient familiarity with the facts of such spaces, we could even produce mental scanning results in conformity with non-Euclidean geometries. There have frequently been reports of people who claimed to have an intuitive grasp of four-dimensional space in the sense that they could do such things as mentally rotate a four-dimensional tesseract and imagine_s its three-dimensional projection from a new four-dimensional orientation (for example, Hinton, 1906, has an interesting discussion of what is involved). If this were true, then they might be able to do a four-dimensional version of the Shepard mental rotation task.

Of course if we drop all this talk about the geometry of the display and consider the general point regarding the common conceptual constraints imposed on vision and imagery, there can be no argument: Something is responsible for the way we cognize the world. Whatever it is probably also explains both the way we see it and the way we image it. But that is as far as we can go. From this we can no more draw conclusions about the geometry, topology, or other structural property of a representational medium than we can draw conclusions about the structure of a language by considering the structure of things that can be described in that language. There is no reason to believe that the relation is anything but conventional, which is precisely what the formalist (or computational) version of functionalism claims (see Fodor, 1980).

Incidentally, the distinction between the two senses of imagine discussed above also clarifies why various empirical findings involving imagery might tend to occur together. For example, there is a brief report in the authors' response section of Kosslyn et al. (1979) of a study by Kosslyn, Jolicoeur, and Fliegel showing that when stimuli are sorted according to whether subjects tend to visualize them in reporting certain of their properties (i.e., whether subjects typically imagine_s them in such tasks), then it is only those stimulus-property pairs that are classified as mental image evokers that yield the characteristic reaction time functions in mental scanning experiments. But that is hardly surprising, since anything that leads certain stimuli to be habitually processed in the imagine_s mode will tend to exhibit all sorts of other characteristics associated with imagine_s processing, including the scanning time results and such phenomena as the "visual angle of the mind's eye" or the relation between latency and imagined size of objects (see the summary in Kosslyn et al., 1979). Of course nobody knows which features of a stimulus or task tend to elicit the imagine_s habit or why some stimuli should do so more than others, but that is not a problem that distinguishes the analogue from the tacit knowledge views.

Some Empirical Evidence

Finally I shall consider some provisional evidence suggesting that subjects can be induced to use their visual image to carry out a task such as 2a that does not entail imagining oneself seeing a natural sequence of events happening. Recall that the question was whether mental scanning effects (i.e., the linear relation between time and distance) should be viewed as evidence for an intrinsic property of a representational medium or as evidence for such things as what tacit knowledge (of geometry and dynamics) people have and what they take the task to be. If the former were the correct interpretation, then it must not merely be the case that people usually take more time for retrieving information about more distant objects in an imagined scene. That could arise, as we have already noted, merely from some habitual or preferred way of imagining or a preferred interpretation of the task demands. If the phenomenon is due to an intrinsic property of the imaginal medium, then it must be a necessary consequence of using this medium; that is, the linear (or at least monotonic) relation between time and represented distance must hold whenever information is being accessed through the medium of imagery.

As it happens, there exists a strong preference for interpreting tasks involving doing something imaginally as tasks of type 1b, that is, as requiring one to imagine_s an actual physically realizable event happening over time. In most of the mental scanning cases, it is the event of moving one's attention from place to place or of witnessing something moving between two points. It could also involve imagining such episodes as drawing or extrapolating a line and watching its progression (which may be what was involved, for example, in the Spoehr & Williams study, Note 2). But the question remains: Must a subject imagine such a physically realizable event in order to access information from an image, or more precisely, in order to produce an answer which the subject claims is based on examining the image?

A number of studies have been carried out in our laboratory which suggest that conditions can be set up so that a subject uses an image to access information, yet does so without having to imagine the occurrence of some particular real life temporal event (i.e., the subject can be induced to imagine_t rather than imagine_s). I will mention only two of these studies for purposes of illustration. The design of the experiments follows very closely that of experiments reported in Kosslyn, Ball, and Reiser (1978; see Bannon, Note 1, for more details). Subjects had to memorize a map containing approximately seven visually distinct places (e.g., a church, a castle, a beach) up to the criterion of being able to reproduce it with the relative location of places within 6 mm of the correct location. Then they were asked to image the map in front of them and to focus their attention on a particular named place, while keeping the rest of the map in view in their mind's eye. We then investigated various conditions in which they were given different instructions for what to do next, all of which (a) emphasized that the task was to be carried out exclusively by consulting their image and (b) required them to notice, on cue, a second named place on the map and to make some discriminatory response with respect to that place as quickly and as accurately as possible.


So far this description of the method is compatible with the Kosslyn et al. (1978) experiments. Indeed, when we instructed subjects to imagine a speck moving from the place of initial focus to the second named place, we obtained the same kind of strongly linear relation between distance and reaction time as did Kosslyn et al. When, however, the instructions specified merely that subjects should give the compass bearing of the second place, that is, to say whether the second place was N, NE, E, SE, and so forth of the first, there was no relation between distance and reaction time. (In this experiment subjects were first given practice in the use of the compass direction responses and were instructed to be as fast and accurate as possible within the resolution of the eight available categories. In postexperiment interviews, subjects reported that they carried out the task by consulting their image, as they had been instructed.)

This result suggests that it is possible to arrange a situation in which subjects use their images to retrieve information and yet do not feel compelled to imagine the occurrence of an event that would be described as scanning their attention between the two points (i.e., to imagine_s). Although this result was suggestive, it lacked controls for a number of alternative explanations. In particular, since a subject must in any case know the bearing of a second place on the map before scanning to it (even in Kosslyn's experiments), one might wish to claim, for independent reasons, that in this experiment the relative bearing of pairs of points on the map was retrieved from a symbolic, as opposed to imaginal, representation, in spite of subjects' insistence that they did use their image in making their judgements. Although this tends to weaken the imagery story somewhat, since it allows a crucial spatial property to be represented off the display (and so raises the question, Why not represent other spatial properties this way?) and because it discounts subjects' reports of how they were carrying out the task in this case while accepting such reports in other comparable situations, it is nonetheless one possible avenue of retreat.

Consequently, a second instructional condition was investigated, aimed at making it more plausible that subjects had to consult their image in order to make the response, and to make it more compelling that they must have been focused on the second place and mentally seeing both the original and the second place at the time of the response. The only change in the instructions that was made for this purpose was to explicitly require subjects to focus on the second place after they heard its name (e.g., church) and, using it as the origin, give the orientation of the first place (the place initially focused on) relative to the second. Thus the instructions strongly emphasized the necessity of focusing on the second place and of actually seeing both places before making the orientation judgment. Subjects were not told how to get to the second place from the first, but only to keep the image before their "mind's eye" and to use this image to read off the correct answer. In addition, for reasons to be mentioned shortly, the identical experiment was run (using a different group of subjects) entirely in the visual modality, so instead of having to image the map, subjects could actually examine the map in front of them. Eight subjects were run in the image condition and eight in the vision one. Each subject was given 84 trials, thus providing four times for each of the 21 interpoint distances.
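The trial count above can be checked with a short sketch. It assumes the map held exactly seven places (the text says "approximately seven"), which is what yields 21 unordered pairs; the place names beyond the three mentioned in the text are hypothetical placeholders.

```python
from itertools import combinations

# Assumption: exactly seven places. Only "church", "castle" and "beach"
# appear in the text; the remaining names are hypothetical placeholders.
places = ["church", "castle", "beach",
          "place_4", "place_5", "place_6", "place_7"]

# Every unordered pair of places defines one interpoint distance.
pairs = list(combinations(places, 2))
n_pairs = len(pairs)

# 84 trials per subject spread evenly over the interpoint distances.
trials_per_subject = 84
repetitions_per_distance = trials_per_subject // n_pairs

print(n_pairs, repetitions_per_distance)
```

With seven places there are 7 x 6 / 2 = 21 pairs, so 84 trials give each interpoint distance four presentations, in agreement with the design as described.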

What we found was that in the visual condition, there was a significant correlation between response time (measured from the presentation of the name of the second place) and the distance between places, whereas no such relation held in the imaginal condition. In doing the analysis, distances were grouped into small, medium, and large, and a linear regression was carried out on the grouped data. In the visual condition there was a significant correlation between distance and reaction time (r = .50, p < .05). In the imaginal condition there was no significant correlation (r = -.03, ns). The mean reaction time in the visual condition was 2.60 sec and in the imaginal condition was 2.90 sec. Such results indicate quite clearly that even though the linear relation between distance and time (the scanning phenomenon) is a frequent concomitant of imaging a transition between seeing two places on an image, it is not a necessary consequence of using the visual imagery modality and consequently that it is not due to an intrinsic (hence knowledge- and goal-independent) functional property of the representational medium for visual images.
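The contrast between a scanning-like pattern and a flat pattern can be illustrated with a minimal sketch of the Pearson correlation. The numbers below are invented purely for illustration and are not the data from these experiments; the function name `pearson_r` is likewise an assumption of this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical distances (cm) and reaction times (sec); NOT the study's data.
distances = [2, 4, 6, 8, 10, 12]
rt_scanning_like = [2.0, 2.3, 2.5, 2.9, 3.1, 3.4]  # rises with distance
rt_flat = [2.9, 3.0, 2.8, 2.8, 3.0, 2.9]           # unrelated to distance

r_scanning_like = pearson_r(distances, rt_scanning_like)
r_flat = pearson_r(distances, rt_flat)
print(round(r_scanning_like, 2), round(r_flat, 2))
```

A coefficient near 1 corresponds to the scanning phenomenon, and a coefficient near 0 to its absence; the reported values (r = .50 and r = -.03) fall on either side of that contrast.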

Yet perhaps not surprisingly, results such as these can be accommodated without too much trouble by the Kosslyn et al. model. That model has been conveniently provided with the option of "blinking" its way to a second location, or of regenerating a new image from symbolic information. In that case it would clearly be able to respond in fixed time, regardless of the distance between places. Several remarks can be made concerning this alternative.

First, the existence of both scan and blink transforms can be used simply to ensure that no empirical data could falsify the assumption of an intrinsic medium of representation. Whether or not this is the case depends on what, if any, additional constraints are placed on the use of these transforms. Kosslyn et al. (1979) do suggest that people will use whichever transform is most efficient. Thus they ought to scan through short distances but blink over longer ones. This, however, presupposes that they know in advance how far away they will have to move over the image, and hence that distance information is available for arbitrary pairs of places without consulting the image and without requiring scanning. Clearly this assumption is inconsistent with the original assumption regarding how spatial information is accessed from images. We shall return to this point briefly in the concluding section, when we consider where the predictive power of such imagery models comes from.
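The circularity in the "most efficient transform" proposal can be made concrete with a toy sketch. Everything here is hypothetical: the function names, the linear scan cost, and the fixed blink cost are illustrative assumptions and are not taken from the Kosslyn et al. model.

```python
# Hypothetical cost parameters, chosen only for illustration.
SCAN_BASE_S = 1.0      # fixed overhead of starting a scan (sec)
SCAN_RATE_S_PER_CM = 0.1  # extra time per unit of imaged distance
BLINK_COST_S = 2.0     # fixed cost of regenerating the image (sec)

def scan_time(distance_cm):
    """Time to scan: grows linearly with the represented distance."""
    return SCAN_BASE_S + SCAN_RATE_S_PER_CM * distance_cm

def blink_time():
    """Time to blink: constant, independent of distance."""
    return BLINK_COST_S

def choose_transform(distance_cm):
    """Pick the cheaper transform.

    Note that the distance is an INPUT here: the choice cannot be made
    unless the distance is already known before the image is scanned.
    """
    return "scan" if scan_time(distance_cm) <= blink_time() else "blink"

print(choose_transform(2), choose_transform(20))
```

Under these assumed costs a short distance selects the scan and a long one selects the blink, but only because `choose_transform` is handed the distance in advance, which is precisely the information that scanning the image was supposed to be needed to obtain.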

Second, if the correct explanation for our results is that subjects used the blink transformation and hence generated new images instead of using their initial ones to locate the second place (as they were instructed to do, and as they reported having done), then we should be able to see the effect of this in the overall response times. Since it took our subjects 1 or 2 sec to generate the initial image, it is very unlikely that they were regenerating a completely new image and making the required orientation judgment in the 2.9 sec it took for them to respond. Perhaps they were only regenerating the two critical places within the existing outline in their image. But even that seems implausible for the following reason. The average reaction time to make orientation judgments in the visual condition, where no image had to be generated, was only 300 msec shorter than the average time to make the judgment in the imagery condition. This indicates that if an image had to be regenerated in the imagery condition, as assumed by the blink transformation explanation, it would have taken less than 300 msec to regenerate such an image. Since, according to Kosslyn, Reiser, Farah, and Fliegel (Note 3), it usually takes several seconds to generate even simple images (and never less than 1 sec even for images containing only one simple part), there is insufficient time to both regenerate parts of an image and make an orientation judgment in the total 2.9 sec it took subjects to respond. Hence subjects could not have been using a blink transformation to regenerate their image in that case.
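The timing argument reduces to a small piece of arithmetic, sketched here. The variable names are assumptions of this sketch; the values are the mean reaction times reported above and the 1 sec lower bound attributed to Kosslyn et al. (Note 3). Times are held in integer milliseconds to avoid floating-point rounding.

```python
# Mean reaction times reported in the text, in milliseconds.
mean_rt_imagery_ms = 2900
mean_rt_vision_ms = 2600

# Lower bound on image generation time attributed to Kosslyn et al. (Note 3):
# "never less than 1 sec" even for the simplest images.
min_generation_ms = 1000

# Time left over for regenerating an image, on the assumption that the
# orientation judgment itself takes as long in imagery as in vision.
available_for_regeneration_ms = mean_rt_imagery_ms - mean_rt_vision_ms

# The blink account needs at least the minimum generation time.
blink_feasible = available_for_regeneration_ms >= min_generation_ms

print(available_for_regeneration_ms, blink_feasible)
```

Only 300 msec separates the two conditions, well short of the 1000 msec minimum, so under these figures the blink account does not fit.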

These experiments demonstrate that, at least in the one situation investigated, images can be examined without the putative constraints of the surface display postulated by Kosslyn and others. It is also reasonable to expect that other systematic relations between reaction time and image properties may disappear when appropriate instructions are given that are designed to encourage subjects to interpret the task as in 1a instead of 1b. For example, if subjects could be induced to generate what they considered small but highly detailed and clear images, then the effect of image size on time to report the presence of features (e.g., Kosslyn, 1975) might disappear as well. There is even some evidence that this might be the case from one of Kosslyn's own studies. In one of the studies reported in Kosslyn et al. (Note 3), the time to retrieve information from images was found to be independent of the size of the image. From the description of this experiment, it seems that a critical difference between it and the earlier ones (Kosslyn, 1975), in which an effect of image size was found, is that in this case subjects had time to study the actual objects, with instructions to practice generating equally clear images of each of them, and were also tested with these same instructions (which I assume encouraged them to entertain equally detailed images at all sizes). Thus it seems that it is possible, when subjects are encouraged to have detailed information readily available, for subjects to put as fine a grain of detail as they wish into their imaginal constructions (though presumably the total amount of information in the image is still limited along some dimension, even if not the dimension of resolution). Unlike the case of real vision, however, such imaginal vision need not be limited by problems of grain or resolution or any other difficulty associated with making visual discriminations. Of course, as we have already noted, subjects can exhibit some of the behavioral characteristics associated with such limitations (e.g., taking longer to recall fine details), but that may very well be because they know what real vision is like and are simulating the relevant behavior as best they can, rather than because of the intrinsic nature of the imaginal medium.

Conclusions: What Is the Theoretical Claim?

It has often been said that imagery models (such as that of Kosslyn & Shwartz, 1977, or Shepard, 1975) contribute to scientific progress because they make correct predictions and because they motivate further research. Although I would not want to deny this claim, it is important to ask what it is about such imagery models that carries the predictive force. It is my view that there is only one empirical hypothesis responsible for the predictive success of the whole range of imagistic models and that nearly everything else about such models consists of free empirical parameters added ad hoc to accommodate particular experimental results. The one empirical hypothesis is just this: When people imagine a scene or an event, what goes on in their minds is in many ways similar to what goes on when they observe the corresponding event actually happening.

It is to the credit of both Shepard (1978) and Paivio (1977) that they recognize the central contribution of the perceptual metaphor. For example, Shepard (1978) states,

Most basically, what I am arguing for here is the notion that the internal process that represents the transformation of an external object, just as much as the internal process that represents the object itself, is in large part the same whether the transformation, or the object, is merely imagined or actually perceived. (p. 135)

Paivio (1977) has been even more direct in recognizing and approving of the metaphorical nature of this class of models when he asserts,

The criteria for a psychological model should be what the mind can do, so why not begin with a psychological metaphor in which we try to extend our present knowledge about perception and behavior to the inner world of memory and thought. . . . The perceptual metaphor . . . holds the mirror up to nature and makes human competence itself the model of mind. (p. 71)

One difficulty with metaphorical explanation in general is that by leaving open the question of what the similarities are between the primary and secondary objects of the metaphor, it remains flexible enough to encompass most eventualities. Of course this open-endedness is also what gives metaphors their heuristic and motivational value and is what provides the feeling of having captured a system of regularities. But in the case of the perceptual metaphor for imagery, this sort of capturing of regularities is, to a large extent, illusory, because it is parasitic upon our informal commonsense knowledge of psychology and our tacit knowledge of the natural world. For example, I have argued that the reason I imagine things happening more or less the way that they actually do happen in the world is not because my brain or my cognitive endowments are structured to somehow correspond to nature but simply because I know how things generally happen, because I have been told, or have induced, what some of the general principles are. In other words I have a tacit physical theory which is good enough to predict most ordinary everyday natural events correctly most of the time. Now the claim that our imagery unfolds the same way as our perceptual process trades on this tacit knowledge in an even more insidious way, because it does so in the name of scientific explanation.

The story goes like this. The claim that imagery is (in some ways) like perception has predictive value because it enables us to predict that, say, it will take longer to mentally scan longer distances, to report the visual characteristics of smaller imagined objects, to rotate images through larger angles, to mentally compare more similar images, and so on. It does this because we know that these generalizations hold in the corresponding visual cases. But notice that the reason we can make such predictions is not that we have a corresponding theory of the visual cases. It is simply that our tacit commonsense knowledge is sufficiently accurate to provide us with the correct expectations in such cases. An accurate analogy would be to give, as a theory of Mary's behavior, the statement that Mary is a lot like Susan, whom we know very well. This would enable us to perhaps make very accurate predictions of Mary's behavior, but it would scarcely qualify as an adequate explanation of why she behaves as she does. Another parallel, even closer in spirit to the metaphorical explanation of imagery, would be if we gave as the explanation of why it takes longer to rotate a real object through a greater angle that this is the way we typically perceive it happen, or if we explained why it takes more time to visually compare two objects of similar size than two objects of very different sizes by saying that this is what happens in the mental comparison case. In both these cases we would be able to make the correct predictions as long as we were informally well enough acquainted with the second of each of these pairs of situations. Furthermore, in both cases there would be some nontrivial empirical claim involved. For instance, in these cases it would be the claim that perception is generally veridical or that what we see generally corresponds to what we know to be the case. Although these are real empirical claims, no one takes them to have the theoretical significance that is attributed to corresponding theories of imagery, even though both may in fact have the same underlying basis.

Of course some models of imagery appear to go beyond such mere metaphors. For example, Kosslyn and Shwartz (1977) actually have a computer model of imagery that accounts for a very wide range of experimental findings. However, as we suggested in referring to its use of the blink transformation, unless the model incorporated a greater number of principled constraints, it is much too easy for it to accommodate any finding. It can do this because the model is, in fact, just a simulation of some largely commonsense ideas about what happens when we image. That is why the principles it appeals to are invariably stated in terms of properties of the represented domain (e.g., a principle such as "images must be transformed through small angles" clearly refers to what is being represented, since images themselves do not actually have orientations, as opposed to representing them). Yet this is how we often explain things in informal, everyday terms. We say that we imagine things in a certain way, because that is the way they really are. As I have already remarked, a theory of the underlying process should account for how imagery can come to have this character, not use this very property as an explanatory principle.

Another consequence of the model's being a simulation of commonsense views is that anything that could be stated informally as a description of what happens in the mind's eye can easily and naturally be accommodated in the model. For example, in arguing against the expectation or demand explanation of their scanning results, Kosslyn et al. (1979) cite one subject who said that he or she thought objects close together would take longer to image because it would be harder to see them or tell them apart. This is exactly the sort of process that could very easily be accommodated by the model. All that has to be done is to make the grain of the surface display whatever size is required to produce this effect. There is nothing to prevent this sort of tuning of the model to fit each situation. Such properties therefore have the status of free parameters, and unfortunately there is no limit to how many such parameters may be implicit in the model. Similarly, in our experiment (in which subjects judged the compass bearing of one place relative to a second) we found a negative correlation for some of the subjects between reaction time and distance between focal points on the imagined map. The computer model would have little difficulty accommodating that result if it turned out to be a general finding. In fact it is hard to think of any result which could not be naturally accommodated, including, as Kosslyn et al. (1980) themselves suggest, the possibility of representing non-Euclidean space. What is crucial is not merely that such results could be accommodated, but that this could be done without violating any fundamental design criteria of the model and without threatening any basic principle of its operation, without, for example, violating any constraints imposed by the hypothesized surface display. What this amounts to is that the really crucial aspects of the model have the status of free parameters rather than structural constants.

Now to some extent Kosslyn appears to recognize this flexibility in his model. In the last section of Kosslyn et al. (1979), the authors insist that the model ought to be evaluated on the basis of its heuristic value. I agree with that proposal. The ad hoc quality characteristic of early stages of some scientific modeling may well be unavoidable. However, we should make every effort to be realistic and rid ourselves of all the attending illusions. One of the illusions that goes with this way of thinking about imagery is that there is an essential core of the model that is not merely heuristic but is highly principled and highly constrained. That core is contained in the postulated properties of the surface display (or in the "cathode ray tube proto-model"). This immutable core involves the assumption that there is an internal display medium with intrinsic geometrical (or geometry-analogue) properties. For example, Pinker (1980) claims that the "array structure" captures a set of generalizations about images. But without knowing which properties of the array structure are doing the work, and whether such properties can be altered by changes in what the subject believes, this sort of capturing of generalizations may be no better than merely listing them. We need to know what it is about the intrinsic character of arrays that requires them to have the properties that Pinker suggests (e.g., "that they represent shape, size, and locations implicitly in an integral fashion, that they are bounded in size and grain, that they preserve interpoint distances," p. 148). If there is nothing apart from stipulation that requires arrays to have these properties, then each of these properties is precisely a free parameter.

For example, if the facts supported the conclusion that size, shape, and orientation were naturally factored apart in one's mental representation (as I believe they do), or that grain size is nonhomogeneous and varies as a function of what the subject believes the referent situation to be like (e.g., how brightly lit, how detailed in its design, how important different features are to the task at hand), does anybody believe for one moment that this would undermine the claim that an array structure was being used? Clearly all that it would require is some minor adjustment in the system (e.g., making resolution depend on additional features of the image, allowing the cognitive process to access an orientation parameter in memory and so on). But if that is the case, then it is apparent that claiming an array structure places no constraints on the sorts of phenomena that can be accommodated, that is, that properties of this structure are free empirical parameters. Thus, although one may have the impression that there is a highly constraining core assumption that is crucial to the predictive success of the model, the way this impression is maintained is simply by giving the rest of the system enough degrees of freedom to overcome any effort to empirically reject that core assumption. So, whereas the intuitive appeal of the system continues to hang on the unsupported view that properties of imagery are determined by the intrinsic properties of an internal display medium, its predictive power may in fact come entirely from a single empirical hypothesis of imagery theory (viz., the perception metaphor).

The phenomenon of having the real appeal of a theoretical system come from a simplified (and strictly false) view of the system while its predictions come from more complex (and more ad hoc) aspects is commonplace in science. In fact, even the initial success of the Copernican world view might be attributable to such a characteristic. Copernicus published his epoch-making proposal, de Revolutionibus, in two volumes. The first volume showed how the solar-centered system could in principle elegantly handle certain aspects of stellar and planetary motions (involving reverse movements) without the necessity of such ad hoc devices as epicycles. In the second volume Copernicus worked out the details of his system for the more comprehensive case. That required reintroducing the ad hoc mechanisms of epicycles. In fact, Copernicus's system only did away with the five major Ptolemaic epicycles and retained all the complexity associated with the larger number of minor ones needed to make the theory fit the observations. Of course in time his system was vindicated because the discovery of the general principle of gravitation made it possible to subsume the otherwise ad hoc mechanisms under a universal law and hence to remove that degree of freedom from the theory.

The lesson for imagery is clear enough. In order for a theory of imagery to be principled it is necessary to locate the knowledge-independent functional properties correctly. We must be critical in laying a foundation of cognitively impenetrable functions to serve as the basic architecture of a formal model. The reason is not simply that in this way we can get to the most primitive level of explanation: It is rather that we can only get a principled and constrained (and therefore not ad hoc) model if we first fix those properties which are the basic functional capacities of the system. This does not mean, of course, that we must look to biology to provide us with a solution (though we can use help from all quarters), because the fixed functional capacities can be inferred behaviorally and specified functionally, as they are when the architecture of computers is specified. But it does mean that unless we set ourselves the goal of establishing the correct functional architecture or medium in order to properly constrain our models in the first place, we could well find ourselves in the position of having as many free parameters as we have independent observations.

Reference Notes

1. Bannon, L. An investigation of image scanning. Unpublished doctoral dissertation, Department of Psychology, University of Western Ontario, London, Canada, 1981.

2. Spoehr, K. T., & Williams, B. E. Retrieving distance and location information from mental maps. Paper presented at the 19th annual meeting of the Psychonomic Society, San Antonio, Texas, November 1978.

3. Kosslyn, S. M., Reiser, B. J., Farah, M., & Fliegel, S. L. Generating visual images. Unpublished manuscript (submitted for publication).

References

Anderson, J. R. Arguments concerning representations for mental imagery. Psychological Review, 1978, 85, 249-277.

Arbib, M. The metaphorical brain. New York: Wiley, 1972.

Attneave, F. How do you know? American Psychologist, 1974, 29, 493-499.

Chomsky, N. Rules and representations. The Behavioral and Brain Sciences, 1980, 3(1), 1-62.

Finke, R. A. The functional equivalence of mental images and errors of movement. Cognitive Psychology, 1979, 11, 235-264.

Fodor, J. A. The appeal to tacit knowledge in psychological explanation. Journal of Philosophy, 1968, 65, 627-640.

Fodor, J. A. The language of thought. New York: Thomas Y. Crowell, 1975.

Fodor, J. A. Methodological solipsism considered as a research strategy for cognitive psychology. The Behavioral and Brain Sciences, 1980, 3(1), 63-110.

Forster, K. I. Accessing the mental lexicon. In R. J. Wales & E. Walker (Eds.), New approaches to language mechanisms. Amsterdam: North-Holland, 1976.

Fraisse, P. The psychology of time. New York: Harper & Row, 1963.

Ghiselin, B. The creative process. New York: New American Library, 1952.

Hayes, J. R. On the function of visual imagery in elementary mathematics. In W. G. Chase (Ed.), Visual information processing. New York: Academic Press, 1973.

Harman, G. Thought. Princeton, N.J.: Princeton University Press, 1973.

Haugeland, J. The nature and plausibility of cognitivism. The Behavioral and Brain Sciences, 1978, 2, 215-260.

Hebb, D. O. Concerning imagery. Psychological Review, 1968, 75, 466-477.

Hinton, C. H. The fourth dimension. London: George Allen & Unwin, 1906.

Howard, I. P. Recognition and knowledge of the water-level principle. Perception, 1978, 7, 151-160.

Kosslyn, S. M. Scanning visual images: Some structural implications. Perception & Psychophysics, 1973, 14, 90-94.

Kosslyn, S. M. The information represented in visual images. Cognitive Psychology, 1975, 7, 341-370.

Kosslyn, S. M., Ball, T. M., & Reiser, B. J. Visual images preserve metric spatial information: Evidence from studies of image scanning. Journal of Experimental Psychology: Human Perception and Performance, 1978, 4, 46-60.

Kosslyn, S. M., & Shwartz, S. P. A data driven simulation of visual imagery. Cognitive Science, 1977, 1, 265-296.

Kosslyn, S. M., Pinker, S., Smith, G., & Shwartz, S. P. On the demystification of mental imagery. The Behavioral and Brain Sciences, 1979, 2(3), 535-581.

Kosslyn, S. M., & Pomerantz, J. R. Imagery, propositions, and the form of internal representations. Cognitive Psychology, 1977, 9, 52-76.

Mitchell, D. B., & Richman, C. L. Confirmed reservations: Mental travel. Journal of Experimental Psychology: Human Perception and Performance, 1980, 6, 58-66.

Newell, A., & Simon, H. A. Human problem solving. Englewood Cliffs, N.J.: Prentice-Hall, 1972.

Newell, A., & Simon, H. A. Computer science as empirical inquiry. Communications of the Association for Computing Machinery, 1976, 19, 113-126.

Paivio, A. U. Images, propositions and knowledge. In J. M. Nicholas (Ed.), Images, perception, and knowledge. Dordrecht, Holland: Reidel, 1977.

Palmer, S. E. Fundamental aspects of cognitive representation. In E. H. Rosch & B. B. Lloyd (Eds.), Cognition and categorization. Hillsdale, N.J.: Erlbaum, 1978.

Pinker, S. Explanations in theories of language and of imagery. The Behavioral and Brain Sciences, 1980, 3(1), 147-148.

Pylyshyn, Z. W. What the mind's eye tells the mind's brain: A critique of mental imagery. Psychological Bulletin, 1973, 80, 1-24.

Pylyshyn, Z. W. Imagery and artificial intelligence. In W. Savage (Ed.), Perception and cognition: Issues in the foundations of psychology. Minneapolis: University of Minnesota Press, 1978.

Pylyshyn, Z. W. The rate of "mental rotation" of images: A test of a holistic analogue hypothesis. Memory & Cognition, 1979, 7, 19-28. (a)

Pylyshyn, Z. W. Validating computational models: A critique of Anderson's indeterminacy of representation claim. Psychological Review, 1979, 86, 383-394. (b)

Pylyshyn, Z. W. Cognitive representation and the process-architecture distinction. The Behavioral and Brain Sciences, 1980, 3(1), 154-169. (a)

Pylyshyn, Z. W. Computation and cognition: Issues in the foundations of cognitive science. The Behavioral and Brain Sciences, 1980, 3(1), 111-132. (b)

Pylyshyn, Z. W., Elcock, E. W., Marmor, M., & Sander, P. Explorations in perceptual-motor spaces. In Proceedings of the second international conference of the Canadian Society for Computational Studies of Intelligence. Toronto, Canada: University of Toronto, Department of Computer Science, 1978.

Richman, C. L., Mitchell, D. B., & Reznick, J. S. Mental travel: Some reservations. Journal of Experimental Psychology: Human Perception and Performance, 1979, 5, 13-18.

Shepard, R. N. Form, formation, and transformation of internal representations. In R. L. Solso (Ed.), Information processing in cognition: The Loyola Symposium. Hillsdale, N.J.: Erlbaum, 1975.

Shepard, R. N. The mental image. American Psychologist, 1978, 33, 125-137.

Shulman, G. L., Remington, R. W., & McLean, J. P. Moving attention through visual space. Journal of Experimental Psychology: Human Perception and Performance, 1979, 5, 522-526.

Received October 11, 1979

Revision received January 20, 1980