
Int. J. Man-Machine Studies (1985) 23, 313-334

A model of mental imagery

BRYANT A. JULSTROM AND ROBERT J. BARON

Department of Computer Science, The University of Iowa, Iowa City, Iowa 52242, U.S.A.

(Received 23 February 1985)

A spatial image is a representation of a scene which encodes the spatial location, distance away, surface orientation and movement of each visible surface in the scene. Mental images are spatial images which are held and transformed by a neural network called spatial memory. Spatial memory is a large two-dimensional array of processors called spatial locations which operate in parallel under the control of a single supervising processor. Though held in spatial memory, mental images are independent of it, and can be transformed, shifted and rotated by transforming and moving image parts among the spatial locations. The architecture and control structure of spatial memory are presented, as are details of its operation in translating, scaling and rotating mental images of three-dimensional objects. Computer simulations of spatial memory are summarized, and spatial memory is compared with other models of mental imagery.

Introduction

Mental images are images in the mind which can be modified, transformed, and examined by the mind's eye, while mental imagery refers collectively to mental images and to the mental processes which manipulate and inspect them. Many researchers have investigated the physiology and performance of mental imagery (e.g. Block, 1981; Kosslyn, 1980; Kosslyn et al., 1979; Paivio, 1979); others have constructed models for its underlying processes (e.g. MacGregor and Lewis, 1977; Wooldridge, 1979). This paper presents a neural network model, called spatial memory, for the network which underlies mental imagery.

The images which spatial memory maintains and transforms are called mental images. They are essentially pictorial--geometrically similar to projections of the objects or visual scenes they represent. The components of each mental image include not only colour and brightness information but also surface orientation, visual flow (movement) and depth (distance from viewer) for the visible surfaces in the image. Spatial memory is a two-dimensional, highly parallel array of processors, a design which is consistent with the computational demands of image processing and with the physiology of the visual cortex. A supervising processor, not presented here, controls the operation of spatial memory as it transforms the image. Transformations which this model can perform on an image include translation, scaling, and rotation in both two and three dimensions.

Using digitized images to represent the contents of spatial memory, a set of FORTRAN programs has simulated the behaviour of this model and produced several of the accompanying illustrations. We compare spatial memory with three other models of mental imagery.


0020-7373/85/090313 + 22 $03.00/0 © 1985 Academic Press Inc. (London) Limited


The spatial image

When an image of a scene is projected on a surface, the resulting image is a two-dimensional representation of the scene. If the surface is composed of receptors which encode the brightness of each part of the projected image, we call the pattern of activity generated by the receptors a visual image. If the visual image is augmented to include the spatial location of each object surface within it, the resulting representation is called a spatial image. Since this representation encodes information about the depth (distance away) of the visible surfaces, it is said to be 2(1/2)-dimensional. A visual image may also include other information derived from the visual scene such as movement (visual flow) and the orientations of visible surfaces in the scene.

The above discussion motivates the following definition. A spatial image is a 2(1/2)-dimensional representation of a visual scene in which each element includes four vectors: C, P, O and F. The vector C = (r, b, g) represents the colour and brightness of the corresponding field in the image. The magnitude of C, denoted |C|, represents the brightness of the field, while r, b, and g encode the magnitudes of the red, blue, and green components of the light received from the field. The vector P has three components which indicate the position of the encoded surface relative to a suitably chosen set of viewer-centred co-ordinates. The vector O indicates the orientation of the encoded surface and the vector F represents the visual flow of that surface. Figure 1 illustrates the representation components for a piece of a visible surface.


FIG. 1. The vectors P (position), O (surface orientation), and F (visual flow), which are components of each surface patch. The vector C encodes colour and brightness and is also part of each surface patch. (Adapted from a figure in Schmidt, 1982.)
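For concreteness, the sketch below (in free-form Fortran, our own illustration rather than part of the model's specification) shows one way a surface patch and a digitized spatial image might be declared; the type and field names are hypothetical.

module spatial_image_types
  implicit none

  ! Illustrative sketch only: the field names and sizes are our own choices.
  type :: surface_patch
     real :: c(3) = 0.0   ! colour components (r, b, g); |C| gives brightness
     real :: p(3) = 0.0   ! position in viewer-centred co-ordinates
     real :: o(3) = 0.0   ! orientation of the encoded surface
     real :: f(3) = 0.0   ! visual flow (apparent movement) of the surface
  end type surface_patch

  ! A digitized spatial image: a two-dimensional array of surface patches.
  type :: spatial_image
     type(surface_patch), allocatable :: patch(:,:)
  end type spatial_image

end module spatial_image_types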


When a digitizing process divides a spatial image into discrete segments, the representation generated for each segment is called a surface patch. Thus, a digitized spatial image consists of an array of surface patches which include the vectors C, P, O and F for the corresponding fields in the visual scene. The distribution of surface patches in a spatial image may be uniform but need not be so (Fig. 2).


FIG. 2. A non-uniform distribution of surface patches on a visual field. This arrangement provides finer resolution in the centre of the field than towards its edges.

The structure of spatial memory

Spatial memory is a uniform two-dimensional array of processing elements called spatial locations. Each spatial location has its own local memory and computational circuitry, and each spatial location holds and manipulates a surface patch. When spatial memory processes the representation of an object, a spatial location which does not correspond to part of the object holds a default "background" surface patch.

Mental images are images held in spatial memory. Mental images are iconic: each is geometrically similar to a projection of the object it represents.

Each spatial location is labelled with its geometric co-ordinates on the two-dimensional surface which makes up spatial memory and each spatial location knows its own co-ordinates. Each spatial location can send its current surface patch and other information to all other locations within a fixed radius of communication. In consequence, each spatial location can receive information from all other spatial locations within that radius.

The pathways from surrounding locations into each spatial location terminate in concentric rings of communications terminals. The inputs from a neighbouring location terminate in the ring of terminals corresponding to its distance away, and in the particular terminal most nearly in line with the direction of that neighbouring location. Figure 3 illustrates the input connections to one spatial location.

FIG. 3. The input connections to one spatial location. Within the concentric rings are the location's communications terminals, and each terminal receives connections from spatial locations whose directions are in line with the terminal and whose distances are proportional to the terminal's ring.

The terminals in each spatial location determine the source of incoming information; each spatial location can select, through its terminals, neighbouring spatial locations as sources from which to accept information. Since surface patches can be transmitted over these communication links, spatial memory can alter and transform the image which it holds.
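As a rough illustration of this addressing scheme, the following sketch (our own; the paper does not specify the discretization) maps the relative offset of a neighbouring location onto a ring number and a direction sector within that ring.

program terminal_index
  implicit none
  ! Hypothetical discretization: ring number grows with distance, and each
  ! ring is divided into a fixed number of direction sectors.
  integer, parameter :: sectors_per_ring = 16
  real, parameter :: pi = 3.14159265
  integer :: dx, dy, ring, sector
  real :: angle

  dx = 4
  dy = -2                                   ! source lies 4 right, 2 down
  ring = nint(sqrt(real(dx*dx + dy*dy)))    ! ring indexed by distance
  angle = atan2(real(dy), real(dx))         ! direction of the source
  if (angle < 0.0) angle = angle + 2.0*pi
  sector = min(int(angle/(2.0*pi)*sectors_per_ring) + 1, sectors_per_ring)

  print '(a,i0,a,i0)', 'ring = ', ring, ', sector = ', sector
end program terminal_index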

Two communication pathways link each spatial location with permanent visual memory and with other parts of the visual system, which include permanent visual storage and networks which generate spatial images from sensory inputs. The pathways form two channels over which new images may be sent to spatial memory and images resident in spatial memory may be sent to permanent memory. Figure 4 illustrates the data paths between spatial memory and other components of the visual system.

FIG. 4. The data paths between spatial memory and other components of the visual system.

We will not describe either permanent visual memory or the networks which generate spatial images from input sensory data. Baron (1985) describes these processes in detail.


The control of spatial memory

A single supervising processor controls the operation of spatial memory. The supervising processor informs the spatial locations of the transformation they must perform; it initiates the steps of the transformation, and it directs exchanges of data with permanent visual memory. Communications pathways connect the supervising processor directly to each spatial location. The supervising processor always broadcasts control information to all spatial locations simultaneously; it does not select any particular locations to receive messages or to be excluded from receiving messages.

The supervising processor broadcasts to the spatial locations two types of messages: instructions and descriptions. An instruction directs all spatial locations to perform, in unison, one of seven actions. Two of these instructions, "Accept Image" and "Transmit Image", mediate the exchange of image representations (spatial images) between spatial memory and permanent visual memory. The "Accept Image" instruction causes spatial memory to replace its current spatial image with one from permanent visual memory or from the image-generating parts of the visual system. In response, each spatial location accepts a new surface patch, part of the new spatial image, to replace its current surface patch. Any spatial location which does not receive a surface patch following the "Accept Image" instruction holds the default background patch. On receiving the "Transmit Image" instruction, each spatial location transmits its current surface patch to permanent memory. This "reading" of spatial memory is non-destructive; the spatial image persists in spatial memory and may be further manipulated.

The remaining five instructions and all the descriptions deal with manipulating the image currently held in spatial memory. The first of these instructions, "Prepare to Transform", notifies the spatial locations that they will be performing a transformation. It also specifies the type of that transformation: translation, scaling or rotation in two or three dimensions. The "Compute Destinations" instruction tells each spatial location to determine which spatial location should receive its current surface patch (transformed according to the pending transformation). The "Broadcast Displacements" instruction causes each location to broadcast an indication of which location will receive its transformed surface patch, and also instructs each spatial location to analyse the broadcasts of other spatial locations and determine from which one it will receive information. The "Transform Contents" instruction tells each location to transform its surface patch according to the pending transformation, and the "Broadcast Patch" instruction commands each location to broadcast its transformed surface patch and to receive a new surface patch from the location determined in response to the "Broadcast Displacements" instruction.

The four descriptions give the parameters of the transformations spatial memory will perform on the spatial image it holds. The description of a translation contains two parameters, the horizontal and vertical components of the shift. The description of a scaling transformation contains three parameters. Two specify the centre of the scale change on spatial memory and the third indicates the magnitude of the change. Three parameters also describe a two-dimensional rotation. The first two indicate the centre of the rotation and the third indicates its magnitude. The description of a three-dimensional rotation contains seven parameters. Three give the axis of rotation in the 2(1/2)-D co-ordinates of the spatial image, the fourth, fifth and sixth components describe a point through which the axis passes, in those same co-ordinates, and the last component gives the magnitude of the rotation. Figure 5 illustrates the parameters of this description.

FIG. 5. The parameters of the description of a three-dimensional rotation. They include the axis a of rotation and a point p on that axis in the 2(1/2)-dimensional co-ordinates of spatial memory, and the magnitude θ of the rotation.

These seven instructions and four descriptions suffice for spatial memory to translate, scale and rotate the spatial image of an object in two dimensions, and to rotate it in three dimensions.
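One way to picture these messages (a sketch of our own devising, not the authors' encoding) is as an integer code for each instruction together with a small parameter record for the descriptions:

module supervisor_messages
  implicit none

  ! The seven instructions broadcast to every spatial location.
  ! The numeric codes are our own illustrative assumption.
  integer, parameter :: ACCEPT_IMAGE            = 1
  integer, parameter :: TRANSMIT_IMAGE          = 2
  integer, parameter :: PREPARE_TO_TRANSFORM    = 3
  integer, parameter :: COMPUTE_DESTINATIONS    = 4
  integer, parameter :: BROADCAST_DISPLACEMENTS = 5
  integer, parameter :: TRANSFORM_CONTENTS      = 6
  integer, parameter :: BROADCAST_PATCH         = 7

  ! Kinds of transformation named by "Prepare to Transform".
  integer, parameter :: TRANSLATION = 1, SCALING = 2
  integer, parameter :: ROTATION_2D = 3, ROTATION_3D = 4

  ! A description holds at most seven parameters; the three-dimensional
  ! rotation uses all of them (axis, point on the axis, magnitude).
  type :: description
     integer :: kind = 0
     real    :: param(7) = 0.0
  end type description

end module supervisor_messages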

Transforming the image held in spatial memory

If spatial memory encodes an up-to-date spatial image of an object and the object is moved in space--shifted relative to the viewer or rotated about an axis--the representations of the visible surfaces of the object, the surface patches, must shift among the spatial locations of the spatial memory. In addition, the vectors making up each surface patch must be modified depending on the transformation. For example, if the image rotates in the image plane, all surface orientation and visual flow vectors must undergo the same rotation (Fig. 6).

We define a mental transformation to be a transformation performed by spatial memory directly on the representation it holds. In a mental transformation, spatial memory modifies the spatial image of a visible object so that the representation is the same as if a new image were created after a physical modification of the visible object.


FIG. 6. Transforming surface patches when spatial memory rotates an image. The patches are moved corresponding to the rotation, and the vectors F (visual flow) and O (surface orientation) in the patches undergo a similar rotation.

For example, a mental rotation of an object's spatial image is a transformation which directly modifies the image as if the object itself were rotated about some axis in space. Each visible surface element on the object is represented by the surface patch at one spatial location before the transformation and generally at a different spatial location after the transformation. For a particular surface patch, the spatial location which holds it before a transformation is its source location, and the spatial location which holds it after a transformation is its destination location.

Mental transformations in spatial memory proceed in several steps which are governed by a sequence of messages broadcast by the supervising processor to the spatial locations. The supervisor begins a mental transformation by broadcasting to all locations the "Prepare to Transform" instruction, which tells them that a transformation will be performed. The supervisor follows this instruction with a description of the transformation. The description informs each spatial location of the parameters of the transformation to be performed.

Next, the supervisor transmits the "Compute Destinations" instruction. Upon receiving this instruction, each spatial location computes the destination of its current surface patch under the transformation whose description it has just received. The supervisor next transmits the "Broadcast Displacements" instruction, in response to which each spatial location broadcasts the relative position of its destination to all spatial locations within its communications radius.

As Fig. 3 illustrated, input connections from neighbouring locations lead to characteristic terminals within each spatial location. These connections carry the relative displacement announcements. Within a spatial location, each communications terminal connects to specific other locations whose direction and distance correspond to the terminal's position in its own spatial location. Each terminal knows the displacements of the spatial locations to which it connects. When a spatial location receives displacement announcements, each terminal compares the broadcast displacement information which it receives with the relative displacement between its own spatial location and the locations to which it connects. If the displacement announcement and the connection source match, the terminal establishes a communications link from the broadcasting source location into its own location; that is, it prepares to receive a message from that source location. For example, if one terminal receives a connection from the spatial location "two units up and four to the right", it will only accept information preceded by the announcement "two down and four to the left" (Fig. 7 illustrates this example).

FIG. 7. Communications lines into a spatial location's terminals from three neighbouring locations. A terminal will establish a communications channel only if the displacement announcement it receives matches the displacement of the spatial location from which the announcement comes. Here, if all the locations broadcast the announcement "two down and four to the left", the terminals of spatial location a will establish a channel only to location c, and not to locations b or d.

When a terminal does not recognize the displacement announcement it receives, it does not establish a communications link; the terminal will not accept the message. Thus, in response to the "Broadcast Displacements" instruction, the spatial locations prepare the communications pattern necessary to carry out the transformation.

Next, the supervisor transmits the "Transform Contents" instruction. Upon receiving this instruction, each spatial location transforms its current surface patch in accordance with the specified image transformation. For example, if the image is to be rotated in the image plane, each surface orientation and visual flow vector in the surface patches must be similarly rotated. In general, these transformations are similar to those which compute the destination locations, though they are often simpler. Thus, they can be performed by the same circuitry in each spatial location. If the origin of the illumination in the represented scene is known, the brightness values can also be appropriately transformed; we omit those computations here.

Finally, initiated by the "Broadcast Patch" instruction from the supervising processor, all spatial locations broadcast their transformed surface patches. All these messages are received simultaneously, and each spatial location now holds a surface patch in the transformed image. Because the terminals have established a communications pattern corresponding to the transformation, each part of the image moves from its current spatial location to its appropriate destination location. Any spatial location which is not the destination for any surface patch replaces its current surface patch with the default background patch. The entire spatial image is transformed.

Thus, this sequence of commands from the supervising processor controls every mental transformation in spatial memory:

"Prepare to Transform" "Description of the Transformation" "Compute Destinations" "Broadcast Displacements" "Transform Contents" "Broadcast Patch"

Note that using these commands, spatial memory can shift the spatial image of an object within spatial memory without fundamentally altering that image. Thus, the representation of an object is independent of its location in spatial memory.

Translation, scaling and rotation in the image plane

The sequence of messages described above and issued by the supervising processor initiates and controls all transformations of an iconic image held in spatial memory. The "Prepare to Transform" instruction tells the spatial locations the kind of transformation they will implement, and the description that follows immediately provides the parameters of the transformation. Within the spatial locations, different transformations are distinguished in two ways: (i) by the computations the spatial locations perform to determine the destination locations, and (ii) by the computations used to transform the surface patches.

The simplest transformation performed by spatial memory is translation, a rigid shift of the image within spatial memory. The description of a translation consists of its horizontal and vertical displacements in the co-ordinates of spatial memory, and these values are the displacement values broadcast by the spatial locations. No further computations are necessary, and the components of the surface patches are left unchanged. Figure 8 shows an image shifted down and to the right; if each pixel in the figure corresponds to a spatial location, the description (10, 10) represents this transformation. The uniformly grey pixels in Fig. 8 have been left empty by the translation; they represent spatial locations which did not receive surface patches under the translation and now hold the background patch.

In a scaling transformation, the image in spatial memory is shrunk or expanded about a given spatial location and the spatial locations must compute the destinations for their surface patches. The description of a scaling transformation contains the centre of the transformation (xc, yc) and the scaling factor f. The scaling factor determines how much the image shrinks or expands, and the surface patch held in a spatial location (x0, y0) has as its destination the location (x, y), where

x = f(x0 - xc) + xc = x0 f + xc(1 - f),

and

y = f(y0 - yc) + yc = y0 f + yc(1 - f).

The geometry of Fig. 9 justifies these formulas.
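A small sketch (our own, with arbitrary example numbers) of the destination computation a spatial location would perform for a scaling follows; rounding the result to the nearest location is our assumption.

program scaling_destination
  implicit none
  real :: x0, y0, xc, yc, f, x, y
  integer :: ix, iy

  x0 = 40.0;  y0 = 70.0        ! source location of the surface patch
  xc = 64.0;  yc = 64.0        ! centre of the scale change
  f  = 1.5                     ! scaling factor (>1 expands, <1 shrinks)

  x = f*(x0 - xc) + xc         ! equivalently x0*f + xc*(1 - f)
  y = f*(y0 - yc) + yc         ! equivalently y0*f + yc*(1 - f)

  ix = nint(x)
  iy = nint(y)                 ! nearest spatial location
  print '(a,2i5)', 'destination location:', ix, iy
end program scaling_destination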


FIG. 8. (a) An image held in spatial memory; (b) that image subjected to a translation in spatial memory. The shift of the image down and to the right has left some spatial locations empty--they now hold the background patch.

FIG. 9. The geometry of the scaling transformation. As the image expands (or shrinks) by a factor of f around the surface patch at spatial location (xc, yc), the surface patch at (x0, y0) moves to (x, y). Thus x - xc = f(x0 - xc) and y - yc = f(y0 - yc).

In each surface patch, if the object is expanding or shrinking but not otherwise moving, the values of the visual flow vector F must be multiplied by the scaling factor f, but the surface orientations remain unchanged. Figure 10 shows a scaling transformation which expands the image. In this example, every location is a destination--none holds the background patch.

FIG. 10. A scaling transformation which expands an image about its centre.

The description of a rotation in the image plane specifies the centre of the rotation (xc, yc) and its magnitude θ. Under such a transformation, the destination location for the surface patch held in the spatial location at (x0, y0) is (x, y), where:

x = (x0 - xc) cos θ - (y0 - yc) sin θ + xc,

and

y = (x0 - xc) sin θ + (y0 - yc) cos θ + yc.

In this case, the components of both the visual flow vector F and the surface orientation vector O which lie in the image plane must undergo a similar transformation, as seen in Fig. 6. Figure 11 shows a rotation in the image plane; spatial locations represented by many corner pixels now hold the background patch.

FIG. 11. A two-dimensional rotation which rotates the image in the image plane. Here the image rotates about its centre.
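The corresponding computation for a rotation in the image plane might look like the following sketch (our own code and example values):

program rotation_destination
  implicit none
  real, parameter :: deg = 3.14159265/180.0
  real :: x0, y0, xc, yc, theta, x, y

  x0 = 100.0;  y0 = 64.0       ! source location of the surface patch
  xc = 64.0;   yc = 64.0       ! centre of the rotation
  theta = 30.0*deg             ! magnitude of the rotation

  x = (x0 - xc)*cos(theta) - (y0 - yc)*sin(theta) + xc
  y = (x0 - xc)*sin(theta) + (y0 - yc)*cos(theta) + yc

  ! The in-plane components of the patch's O and F vectors would be
  ! rotated through the same angle during the "Transform Contents" step.
  print '(a,2f8.2)', 'destination: ', x, y
end program rotation_destination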


Mental representation of three-dimensional rotation

Rotations in three dimensions can be computed in several ways. Many texts describe the use of three-by-three matrices to represent rotations (e.g. Minsky, 1955). When four-component homogeneous co-ordinates describe locations in three-space, four-by-four matrices represent both rotations and translations (Newman & Sproull, 1973; Paul, 1981). However, quaternions provide a more efficient representation of rotations (Hamilton, 1969; Pervin and Webb, 1982) and will be used here.

Quaternions are expressions of the form q = (a, b, c, d), where a, b, c and d are real numbers. The sum of two quaternions is computed component-by-component: if q1 = (a1, b1, c1, d1) and q2 = (a2, b2, c2, d2), then q1 + q2 = (a1 + a2, b1 + b2, c1 + c2, d1 + d2). If we let q = (a, b, c, d) = a + bi + cj + dk and define multiplication by

i² = j² = k² = ijk = -1,

then quaternions can be multiplied as polynomials in i, j, and k: q1q2 = (r, s, t, u), where:

r = a1a2 - b1b2 - c1c2 - d1d2

s = a1b2 + b1a2 + c1d2 - d1c2

t = a1c2 - b1d2 + c1a2 + d1b2

u = a1d2 + b1c2 - c1b2 + d1a2

With the operations of addition and multiplication, the quaternions form a four-dimensional system which preserves all the familiar arithmetic properties of the real and complex numbers except multiplicative commutativity.

A quaternion whose real (first) term is zero can represent a three-dimensional vector under the usual identification of i, j and k with the unit vectors along the three co-ordinate axes. With this association, a rotation through an angle θ about an axis a (a unit vector through the origin) is represented by the quaternion R, where:

R = cos (θ/2) + sin (θ/2)a.

A location vector v rotates to RvR⁻¹, where R⁻¹ is the inverse of R under quaternion multiplication. If the axis a passes through a point p, not necessarily the origin, v rotates to R(v - p)R⁻¹ + p. (See Appendix A for a fuller discussion of quaternions and rotation.)

Each spatial location holds the two-dimensional co-ordinates and the relative depth values of its current surface patch. Thus spatial memory holds a 2(1/2)-D representation of the image's visible surfaces. Upon receiving the "Compute Destinations" instruction from the supervising processor, each spatial location employs quaternion operations to compute the 2(1/2)-D position in spatial memory which will encode the same surface patch after rotation. The first two co-ordinates of this position indicate the spatial location which is the destination of the surface patch. Since each spatial location knows its own co-ordinates, it can subtract them from the co-ordinates of the destination location to obtain the displacement over which it must send its image information, its surface patch, according to the rotation. This displacement is identical to the visual flow of the represented surface, assuming the object were actually rotating in space.

For example, suppose the surface patch held in the spatial location at (100, 100) has relative depth value 300; its 2(1/2)-D co-ordinates are then (100, 100, 300). The image of which this patch is a part will undergo a rotation through five degrees about the axis (1, 1, 1), which passes through the point (200, 200, 250). The quaternion R represents the rotation, where:

R = cos (5/2)° + sin (5/2)° (1, 1, 1)

= (0.999, 0.044, 0.044, 0.044).

The inverse of R under quaternion multiplication is

R⁻¹ = (0.995, -0.044, -0.044, -0.044),

so the surface patch at (100, 100, 300) must move under the rotation to the 2(1/2)-D co-ordinates (113.6, 87.6, 298.9):

R[(0, 100, 100, 300) - (0, 200, 200, 250)]R⁻¹ + (0, 200, 200, 250)

= (0, 113.6, 87.6, 298.9).

The relative displacement in spatial memory over which the surface patch must be transmitted is then:

(113.6 - 100, 87.6 - 100) = (13.6, -12.4).

This last pair is broadcast by the spatial location at (100, 100) in response to the "Broadcast Displacements" instruction.
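The quaternion arithmetic in this example can be checked with a short sketch (our own free-form Fortran, not the original simulation code). It forms R from the axis (1, 1, 1) exactly as written above; because the scale of R cancels in RvR⁻¹, the result agrees with the destination quoted in the text to within rounding.

program quaternion_rotation
  implicit none
  ! A quaternion is held as q(0:3) = (a, b, c, d) = a + bi + cj + dk.
  real, parameter :: deg = 3.14159265/180.0
  real :: r(0:3), v(0:3), p(0:3), dest(0:3)

  r = (/ cos(2.5*deg), sin(2.5*deg), sin(2.5*deg), sin(2.5*deg) /)
  v = (/ 0.0, 100.0, 100.0, 300.0 /)    ! surface patch at (100, 100, 300)
  p = (/ 0.0, 200.0, 200.0, 250.0 /)    ! point on the axis of rotation

  dest = qmul(qmul(r, v - p), qinv(r)) + p
  print '(a,3f8.1)', 'rotated co-ordinates: ', dest(1:3)

contains

  function qmul(q1, q2) result(q)        ! quaternion product
    real, intent(in) :: q1(0:3), q2(0:3)
    real :: q(0:3)
    q(0) = q1(0)*q2(0) - q1(1)*q2(1) - q1(2)*q2(2) - q1(3)*q2(3)
    q(1) = q1(0)*q2(1) + q1(1)*q2(0) + q1(2)*q2(3) - q1(3)*q2(2)
    q(2) = q1(0)*q2(2) - q1(1)*q2(3) + q1(2)*q2(0) + q1(3)*q2(1)
    q(3) = q1(0)*q2(3) + q1(1)*q2(2) - q1(2)*q2(1) + q1(3)*q2(0)
  end function qmul

  function qinv(q) result(qi)            ! inverse: conjugate over |q|**2
    real, intent(in) :: q(0:3)
    real :: qi(0:3)
    qi = (/ q(0), -q(1), -q(2), -q(3) /) / sum(q*q)
  end function qinv

end program quaternion_rotation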

Figure 12 illustrates such a rotation. The arrows on the box indicate the destination locations for a selection of surface patches. The right half of the figure shows the image of the rotated box.

FIG. 12. A three-dimensional mental rotation carried out in spatial memory. The rotation is accomplished by moving the parts of the image to their appropriate destinations.

Visual flow represented in spatial memory

Each surface patch held in spatial memory includes a vector F which represents the visual flow--the apparent movement--of the corresponding visible surface in the original scene. To transform the image according to this visual flow pattern, the spatial locations need only apply as relative displacements the components of the visual flow vectors which lie in the image plane.

The supervisor transmits no description for the visual flow transformation--none is needed. Spatial memory implements visual flow by responding to the supervising processor's "Compute Destinations" instruction by applying the visual flow vectors of its current surface patches. Figure 13 shows an image with a sampling of its visual flow vectors and the image as transformed by applying the visual flow vectors.

FIG. 13. The application of visual flow vectors to an image in spatial memory. (a) An image and the visual flow vectors for a sample of surface patches in the image; (b) the image which results when spatial memory applies the visual flow vectors.

The model does not describe how spatial memory transforms the surface patches of an image during the visual flow transformation. In contrast to the other transformations described here, in which each spatial location can transform its surface patch appropriately by applying only the information in the transformation's description, the visual flow transformation requires the spatial locations to have more global information to transform their surface patches correctly. Each spatial location might require information on the visual flow being imposed by its neighbours, or even knowledge of the object whose image is being transformed; consider the difference between a blimp inflating and an object exploding.

The computer simulation

The preceding sections have described how spatial memory performs five different transformations on the spatial image which it holds. A set of FORTRAN programs simulated the operation of spatial memory in these transformations and generated the illustrations used above.

In these programs, two two-dimensional arrays represented the contents of spatial memory. The arrays represented the surface patches before and after a transformation. Each array contained integer values from 0 to 255 which encoded grey-scale values of the digitized image. Each individual value stood for one surface patch--the contents of one spatial location. The entire digitized image thus represented the contents of spatial memory.

In the transformations which spatial memory performs, the crucial step is the transfer of surface patches from the source spatial locations to the corresponding destination locations. Since each location, acting as a source, broadcasts its transformed contents to every location within its communication radius, each location, acting as a destination, must select its source and accept a surface patch from it. That is, in constructing the image which represents the contents of spatial memory after a transformation, each spatial location must know where to look in the original image for its new surface patch.

In the model, this connecting function is performed by the communication terminals in each spatial location. By recognizing displacement broadcasts, they establish the communications pattern through which spatial memory carries out a transformation of the image which it holds. In the programs the action of the terminals is simulated by an array DISPL.

DISPL is best thought of as a two-dimensional array, each of whose elements contains two components. These elements hold the relative displacements computed for each pixel (spatial location) in the untransformed image. For each pixel in the transformed image, the subroutine APPLY (given in Appendix B) scans DISPL within a specified radius of that pixel's co-ordinates for the displacement that would send a surface patch closest to that pixel. This displacement determines the source in the original image for that pixel and corresponds to the selection of a source by the spatial location's terminals. For the transformations described in "Translation, scaling and rotation in the image plane", these displacements are computed directly from the formulas given in that section. For example, in a scaling by a factor f about a point (xc, yc), the displacement associated with the location at (x, y)--that is, the displacement over which it should send its surface patch--is (dx, dy), where:

dx = f(x - xc) + xc - x,

and

dy = f(y - yc) + yc - y.
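For example, the scaling displacements could be filled in with a routine like the following sketch (our own, not one of the original programs); the routine name and the rounding to integer displacements are assumptions.

subroutine scale_displacements(displ, nx, ny, xc, yc, f)
  ! Fill DISPL with the relative displacement each pixel would broadcast
  ! for a scaling by factor f about (xc, yc).
  implicit none
  integer, intent(in)  :: nx, ny
  real,    intent(in)  :: xc, yc, f
  integer, intent(out) :: displ(2, nx, ny)
  integer :: i, j

  do j = 1, ny
     do i = 1, nx
        displ(1, i, j) = nint(f*(real(i) - xc) + xc - real(i))
        displ(2, i, j) = nint(f*(real(j) - yc) + yc - real(j))
     end do
  end do
end subroutine scale_displacements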

Rotation in three dimensions is slightly more difficult. Each surface patch must be rotated corresponding to a rotation in three-space. The three-dimensional rotation program applies the quaternion operations described in "Mental representation of three-dimensional rotation" and in Appendix A. The program then stores the two-dimensional displacements corresponding to the rotation in the array DISPL. The subroutine APPLY scans DISPL for each pixel in the transformed image to find that pixel's source location exactly as in the two-dimensional transformations.

The process is similar for the application of visual flow, but the representation of the initial image was extended to include an array of visual flow vectors. The horizontal and vertical components of these vectors become the relative displacements in DISPL, which the visual flow program scans as in the other transformation programs.

Discussion

Maps, diagrams and scale models are iconic representations of the structures or objects which they depict. Because spatial relationships are represented explicitly, manipulating the representation corresponds to manipulating the represented object, and valid conclusions can be drawn quickly about the modelled entity. This observation led Sloman (1971) to describe analogical representations, in which "the structure of the representation gives information about the structure of what is represented". For example, a schematic diagram of a subway system, while distorting the relative geographic locations of its stations, represents their order and direction explicitly; the passenger can easily discover which train to board. Similarly, a family tree reveals at a glance that grandmother's brother is her grand-daughter's great-uncle.

A two-dimensional picture of a three-dimensional scene is, as Sloman observes, also an analogical representation. Spatial memory maintains such a picture; each spatial location within the memory holds a surface patch, and these patches together form a spatial--and therefore analogical--representation.

Appropriate transformations in spatial memory then correspond to motions in the represented scene; conclusions about the scene can be drawn from examination of the transformed image. (N.B. the current model does not represent such examinations.)

Other models of mental transformations, in particular, of mental rotation, have employed analogical representations of the visual scene. Kosslyn and Shwartz (1977) suggest that mental images operate as "surrogate percepts", which can be examined by the "mind's eye". They model this process by constructing, transforming and examining a "surface image", which is a digitized line drawing. These images are constructed from "deep representations"--lists of point locations in polar co-ordinates and lists of assertions about the represented object--which form the model's data base. The resolution of a surface image is greatest at the centre of a region, less detail being maintained toward its edges. Thus, an object representation is not independent of its location in the surface image. This contrasts with spatial memory, which maintains uniform resolution throughout, and on which the location of an object representation does not matter.

The model of Kosslyn and Shwartz is able to rotate an object in the image plane but cannot simulate three-dimensional rotation; the deep representations encode no depth information, only two-dimensional outlines.

Baker (1973) describes a "highly parallel" two-dimensional array of processing cells, each connected only to its neighbors, which, like the model of Kosslyn and Shwartz, encodes a digitized line drawing of a scene. For finer resolution each processor records the location of a point within its cell. Each cell in the array knows its own location, and under the supervision of an external computer the array can simulate translation, scaling and rotation of the represented object. Since each point in the image can be labelled with depth information, this model can simulate rotation in three dimensions, and by retaining the object-identity of each point, a transformation can be restricted to one, or a set of, objects.

Funt (1983) constructs a very different model to implement mental rotation, though it, too, employs an analogical image representation. He proposes a logical hollow sphere of processors, arranged by geodesic subdivision of an icosahedron (see Kenner, 1976), with each processor connected to its five or six immediate neighbours. The represented object lies logically inside the sphere and each processor maintains a representation of that portion of the object's surface which falls within the radial "cone" of the processor. Under the control of its supervisory processor, this model rotates the object representation around any axis by passing representation parts among the processors. This model performs rotations easily (and scalings trivially), but cannot undertake translations or represent more than one object at a time. Funt also observed that the three-dimensional logical arrangement of the processors on the sphere (their interconnections) does not prevent a two-dimensional physical arrangement of the processors.

The parallelism and two-dimensional processor arrangements of all these models reflect the physiology of visual areas in the brain. Hubel and Wiesel (1979) have described in detail the functional organization of the primary visual cortex, whose hundreds of millions of cells are functionally divided into hypercolumns. Each hypercolumn contains (approximately) the same computational machinery, and the cells within each respond to stimuli in a specific part of the visual field. The hypercolumns form an essentially two-dimensional structure, and most connections between cells in the visual cortex are vertical within the hypercolumns; few connections run horizontally.

With reference to our model, the physiology of spatial memory resembles that of the primary visual cortex. Spatial memory is a two-dimensional array of spatial locations, each of which performs computations related to a small portion of the visible image, and these locations share limited, and strictly arranged, connections with their neighbours. It is tempting to associate the spatial locations of spatial memory with the hypercolumns of the visual cortex.

Consider again the operation of spatial memory. Spatial memory transforms the spatial image of an object by performing a series of four major steps, which are initiated and controlled by the spatial memory's supervising processor. First, each spatial location computes the relative displacement over which it should transmit its part of the representation, and this information is broadcast to all the spatial locations within its radius of communication. Each spatial location analyses these displacement announcements and prepares to receive a surface patch from the appropriate source location. Each spatial location transforms the components of its current surface patch in accordance with the transformation. Finally, each spatial location broadcasts its transformed surface patch. The appropriate destinations receive the patches and the object image is transformed.

This mechanism can only implement transformations which do not expose previously hidden surfaces of the object. To perform transformations which reveal hidden surfaces, information about the object's shape must be applied to describe correctly the emerging surfaces. Transformations which only hide surfaces initially visible can be performed in spatial memory if a spatial location receiving two or more representation patches in a shift operation can select the one with the smallest "depth" component. When one part of the object image "passes behind" another, both will be sent to the same receiving spatial location. The receiving location must keep only the surface patch whose relative depth value is smaller.

This model performs all possible mental rotations in the same time period. Studies with human subjects indicate that the time required to perform a mental rotation varies linearly with the magnitude of the rotation (Shepard & Cooper, 1982; Shepard & Metzler, 1971). This model will exhibit such behaviour if the radius of communication in the spatial memory is small--perhaps only to immediate neighbours--and the supervising processor breaks rotations into uniform steps corresponding to that radius.

Conclusions

We have suggested a model for the representation of objects and visual scenes and for performing a restricted set of manipulations on those representations, corresponding to translation, scaling and rotations in the original scene. The model is highly parallel and maintains an analogical representation of the object or visual scene.

This model does not implement the identification of objects, the control of more complex transformations than those mentioned, or the integration of this system with others, particularly the verbal systems which can lend meaning to the viewed image. Baron (1985) discusses several of these issues, and they are a fertile field for further inquiry.


References

BAKER, R. (1973). A spatially-oriented information processor which simulates the motion of rigid objects. Artificial Intelligence, 4, 29-40.

BARON, R. J. (1985). Visual memories and mental images. International Journal of Man-Machine Studies, 23, 275-311.

BLOCK, N., Ed. (1981). Imagery. Cambridge, Massachusetts: MIT Press.

FUNT, B. V. (1983). A parallel-process model of mental rotation. Cognitive Science, 7, 67-93.

HAMILTON, W. R. (1969). Elements of Quaternions. New York: Chelsea.

HUBEL, D. H. & WIESEL, T. N. (1979). Brain mechanisms of vision. Scientific American, 241, 150-162.

KENNER, H. (1976). Geodesic Math and How to Use It. Berkeley, California: University of California Press.

KOSSLYN, S. M. (1980). Image and Mind. Cambridge, Massachusetts: Harvard University Press.

KOSSLYN, S. M. & SHWARTZ, S. P. (1977). A simulation of visual imagery. Cognitive Science, 1, 265-295.

KOSSLYN, S. M., PINKER, S., SMITH, G. & SHWARTZ, S. P. (1979). On the demystification of mental imagery. The Behavioral and Brain Sciences, 2, 535-581.

MACGREGOR, R. J. & LEWIS, E. R. (1977). Neural Modeling. New York: Plenum Press.

MINSKY, L. (1955). An Introduction to Linear Algebra. Oxford: The Clarendon Press.

NEWMAN, W. M. & SPROULL, R. F. (1973). Principles of Interactive Computer Graphics. New York: McGraw-Hill.

PAIVIO, A. (1979). Imagery and Verbal Processes. Hillsdale, New Jersey: Lawrence Erlbaum Associates.

PAUL, R. P. (1981). Robot Manipulators. Cambridge, Massachusetts: MIT Press.

PERVIN, E. & WEBB, J. A. (1982). Quaternions in Computer Vision and Robotics. Technical Report CMU-CS-82-150. Department of Computer Science, Carnegie-Mellon University.

SCHMIDT, R. A. (1982). Motor Control and Learning. Champaign, Illinois: Human Kinetics Publishers.

SHEPARD, R. N. & COOPER, L. A. (1982). Mental Images and Their Transformations. Cambridge, Massachusetts: MIT Press.

SHEPARD, R. N. & METZLER, J. (1971). Mental rotation of three-dimensional objects. Science, 171, 701-703.

SLOMAN, A. (1971). Interactions between philosophy and artificial intelligence: the role of intuition and non-logical reasoning in intelligence. Artificial Intelligence, 2, 209-225.

WOOLDRIDGE, D. E. (1979). Sensory Processing in the Brain: An Exercise in Neuroconnective Modeling. New York: John Wiley and Sons.

Appendix A. Quaternions and rotation

Quaternions, invented and described by Hamilton in the mid-nineteenth century, form a four-dimensional algebraic system with addition and multiplication. This system preserves all properties of the real and complex numbers except the commutativity of multiplication. Each quaternion is an expression of the form

a + bi + cj + dk,

where a, b, c and d are real numbers. Quaternion addition proceeds in the obvious way, term-by-term, while the following relations govern quaternion multiplication:

i² = j² = k² = ijk = -1.

As consequences of these relations, ij = k, ji = -k, jk = i, kj = -i, ki = j, and ik = -j.


With these rules, we can multiply quaternions as polynomials in i, j and k. For example,

(2 + 3i - k)(2i + j) = 4i + 2j + 6i² + 3ij - 2ki - kj

= 4i + 2j - 6 + 3k - 2j + i

= -6 + 5i + 3k.

Quaternion multiplication does not commute, but quaternions preserve the other arithmetic properties of real and complex numbers. Indeed, the set of quaternions for which the coefficients of i, j and k are zero is isomorphic to the reals, and the set of quaternions formed of a real term and just one of i, j or k, for example {a + cj : a, c real}, is isomorphic to the complex numbers.

The identity element of quaternion multiplication is 1 + 0i + 0j + 0k, and every non-zero quaternion q has a multiplicative inverse q⁻¹ such that qq⁻¹ and q⁻¹q both equal the identity. The magnitude |q| of a quaternion q is, analogously to vectors, the square root of the sum of the squares of the four real coefficients in q:

|a + bi + cj + dk| = √(a² + b² + c² + d²),

and the inverse q⁻¹ of a quaternion q = a + bi + cj + dk is given by this relation:

q⁻¹ = (1/|q|²)(a - bi - cj - dk).

For example, if q = 2 + i - 2k, then |q|² = 9, and

q⁻¹ = (1/9)(2 - i + 2k) = 2/9 - i/9 + 2k/9.

A quaternion whose real term is zero can represent a three-dimensional vector, under the usual association of i, j and k with unit vectors on the three co-ordinate axes. Interestingly, the quaternion product of two such vectors q and r is:

qr = -(q · r) + (q × r),

where q · r and q × r are the usual inner and cross vector products.

As Pervin and Webb (1982) observe, however, "The greatest strength of quaternions is their ability to represent rotations". A rotation through an angle θ about an axis a (a unit vector) which intersects the co-ordinate origin is represented by the quaternion:

R = cos (θ/2) + sin (θ/2)a.

The new location of a vector v following the rotation is RvR⁻¹, where R⁻¹ is the inverse of R under quaternion multiplication (Fig. 14).

Thus, a rotation through 90° about the z-axis is represented by:

R = cos 45° + sin 45° (0 + 0i + 0j + k)

= 0.707 + 0.707k.

The inverse of R is 0.707 - 0.707k, and the new location of the point (1, 1, 1) under the rotation which R represents is:

RvR⁻¹ = (0.707 + 0.707k)(i + j + k)(0.707 - 0.707k)

= -i + j + k.


FIG. 14. If the quaternion R represents a rotation through θ about an axis a which passes through the origin, the location v rotates to RvR⁻¹.

FIG. 15. If the quaternion R represents a rotation through an angle θ about an axis a which passes through a point p, the location v rotates about a to R(v - p)R⁻¹ + p.


For a point v to rotate to RvR⁻¹, the axis of the rotation which R represents must pass through the origin of the co-ordinate frame. If the axis of rotation passes through some point p but not through the origin, we must add two simple steps to the process just described to find the new location of v. In particular, we move the origin to p, rotate, and restore the origin to its initial position (Fig. 15).

Let p indicate a point on the axis a of a rotation through an angle θ. The point v will rotate about a. The vector (v - p) has its tail at p and its head at v, and so indicates the position of v relative to an origin at p. The axis a intersects this origin, so the rotated position of v relative to p is R(v - p)R⁻¹. To restore the origin to its initial location, simply add p; v rotates to R(v - p)R⁻¹ + p.

Our model of mental rotation applies precisely these calculations to find the destination location of a point on a rotating object.

Appendix B. The subroutine APPLY

In the FORTRAN programs which represent the operation of spatial memory, the subroutine APPLY simulates the actions of the spatial locations in spatial memory as they perform a transformation. The array PIC represents the initial image in spatial memory and the array NPIC represents the transformed image. The array DISPL contains pairs of values which represent the relative displacement announcements broadcast by the spatial locations.

For each element of NPIC--that is, for each location in spatial memory--APPLY scans the entries of DISPL within a specified RADIUS of that element's position. The subroutine looks in DISPL for the displacement which would move the position in which it is found closest to the position of the element in NPIC. The corresponding value from PIC is then assigned to that element of NPIC. Figure 16 illustrates this procedure, which represents the actions of the terminals establishing communications channels and receiving surface patches over them.

FIG. 16. Constructing the array NPIC, which represents the transformed image in spatial memory, from PIC, which represents the initial image. The array DISPL contains relative displacements which correspond to the displacement announcements broadcast by the spatial locations. For example, the entry in DISPL at location a is (-4, -2), representing a displacement of two down and four to the left. For each element of NPIC, APPLY scans the circular region of DISPL around b's position to find the source position a whose displacement would move a closest to b. APPLY then assigns the value of PIC at a to position b in NPIC. If the scan for position c reveals that the displacement at position a represents a shift closest to c as well, then position c in NPIC will also receive the value from position a in PIC.

      SUBROUTINE APPLY(NPIC,PIC,DISPL,NX,NY,RADIUS)
C     ******************************************************
C     * This subroutine applies the displacements in DISPL *
C     * to PIC to generate NPIC.  It checks all points     *
C     * around the current one within a distance of RADIUS.*
C     ******************************************************
      INTEGER*2 PIC(NX,NY),NPIC(NX,NY)
      INTEGER DISPL(2,NX,NY),RADIUS,RSQ,DSQ
C
      RSQ=RADIUS*RADIUS
C
      DO 300 J=1,NY
        JJBOT=MAX0(1,J-RADIUS)
        JJTOP=MIN0(NY,J+RADIUS)
        DO 290 I=1,NX
          IIBOT=MAX0(1,I-RADIUS)
          IITOP=MIN0(NX,I+RADIUS)
C
          MINDSQ=10000
          DO 250 JJ=JJBOT,JJTOP
            JDSQ=(J-JJ)*(J-JJ)
            DO 240 II=IIBOT,IITOP
              IDSQ=(I-II)*(I-II)
              IF (IDSQ+JDSQ.GT.RSQ) GO TO 240
              DSQ=(II+DISPL(1,II,JJ)-I)**2+(JJ+DISPL(2,II,JJ)-J)**2
              IF (DSQ.GE.MINDSQ) GO TO 240
              IKEEP=II
              JKEEP=JJ
              IF (DSQ.LE.1) GO TO 280
              MINDSQ=DSQ
  240       CONTINUE
  250     CONTINUE
C
          IF (MINDSQ.LE.3) GO TO 280
          NPIC(I,J)=180
          GO TO 290
  280     NPIC(I,J)=PIC(IKEEP,JKEEP)
  290   CONTINUE
  300 CONTINUE
C
      RETURN
      END