Perception & Psychophysics 1984, 36 (2), 97-103
The perception of three-dimensional structure from rigid and nonrigid motion
JAMES T. TODD
Brandeis University, Waltham, Massachusetts
The ability of observers to perceive structure from motion was examined for rigid and nonrigid transformations under both parallel and polar projection. The accuracy of form perception was evaluated by asking observers to discriminate among a series of computer-generated surfaces that varied in curvature. The results demonstrated that the accuracy of an observer's judgments is unaffected by the rigidity of an object's motion or the type of projection with which it is presented. These results are discussed in relation to current algorithms for computing structure from motion that have recently been described in the literature.
Thirty years ago, Wallach and O'Connell (1953) reported a phenomenon they called the kinetic depth effect, which dramatically demonstrated the importance of motion for the perception of three-dimensional form. The methodology of this demonstration involved a shadow projection technique developed previously by Miles (1931). The shadows were produced by placing a series of objects on a turntable between a point light source and a translucent display screen. When an observer on the other side of the screen saw the shadow of a stationary object, it always appeared as a two-dimensional pattern. However, in many instances when an object was rotated on the turntable, its shadow would suddenly jump out in depth. Almost all of the observers in these moving conditions spontaneously reported that the shadow appeared as a solid object rotating in three-dimensional space.1
During the past decade there have been numerous attempts to explain the kinetic depth effect by developing formal algorithms for computing the three-dimensional structure of an object from its two-dimensional pattern of projected motion (e.g., Clocksin, 1980; Koenderink & van Doorn, 1975, 1977; Lee, 1974; Longuet-Higgins & Prazdny, 1980; Nakayama & Loomis, 1974; Prazdny, 1980; Todd, 1981, 1982; Ullman, 1979; Webb & Aggarwal, 1981). The primary difficulty for these mathematical analyses is to find a unique interpretation of an object's three-dimensional form. As pointed out by Bishop George Berkeley almost 300 years ago, any given structure on a two-dimensional surface is projectively equivalent to an infinite number of possible structures in three-dimensional space. This is also true for moving images; that is, there are an infinite number of object transformations in three-dimensional space that could produce any given moving image on a two-dimensional display surface. One might suppose, on the basis of this
Requests for reprints should be sent to James T. Todd, Department of Psychology, Brandeis University, Waltham, MA 02254.
mathematical fact, that an observer's experience during the kinetic depth effect would be extremely variable. That is not what happens, however. In the situation described by Wallach and O'Connell, observers invariably report that the perceived object appears to be undergoing a rigid transformation (i.e., one that does not alter the object's three-dimensional size or shape). This finding has led some researchers to argue that human observers have an inherent predisposition toward the perception of rigid motion in three-dimensional space (e.g., see Johansson, 1964).
The perception of rigid motion during the kinetic depth effect is of fundamental importance to the mathematical analyses that have been proposed in the literature. Although a moving two-dimensional image may have an infinite number of possible three-dimensional interpretations, there is generally only one interpretation in which the object would be undergoing a rigid transformation. Thus, one possible strategy for analyzing moving images is to assume the observed object is rigid, and then to construct a three-dimensional interpretation that is consistent with that assumption. This strategy has been adopted by all of the mathematical analyses that have been proposed to date.
It is a common belief among perceptual psychologists that a similar strategy is also used by the human visual system to analyze moving images. There are some problems with this approach, however. For example, there are a large number of easily recognizable styles of change, such as bending, stretching, twisting, or flowing, that do not preserve an object's rigidity. Research has shown that observers can recognize the bending or stretching of a line of points (Jansson, 1977), the elastic compression of a fishnet pattern (von Fieandt & Gibson, 1959), or the bending of a rectangle in depth (Jansson & Johansson, 1973; Jansson & Runeson, 1977). Some varieties of nonrigid change can be correctly identified as the movements of biological organisms. From the relative motions of an unrecognizable configuration of elements, observers can recognize a human form that is
Copyright 1984 Psychonomic Society, Inc.
walking, jogging, or dancing (e.g., Cutting, Proffitt, & Kozlowski, 1978; Johansson, 1973, 1975, 1976; Todd, 1983), the expressions of a human face (Bassili, 1978), or intentional social interactions such as affection or aggression (Bassili, 1976; Heider & Simmel, 1944).
Can analyses of motion based on the rigidity assumption still be considered psychologically valid in the face of these observations? One way of allowing the perception of nonrigid motion within the context of existing theory has recently been suggested by Lee (1974), Todd (1982), and Ullman (1979). According to this view, the visual system would first test whether a moving image had a possible rigid interpretation. If so, it would apply an analysis based on the rigidity assumption. If not, it would apply some other analysis that was appropriate for nonrigid motion. This hypothesis makes two important predictions: First, observers should be able to accurately distinguish between rigid and nonrigid motion. Moreover, since rigid motion is really a special case of nonrigid motion, the existence of a separate strategy to deal with that case could be justified only if it resulted in a measurable increase in performance. Thus, we should expect that an object's three-dimensional structure should be easier to determine when it is undergoing rigid (as opposed to nonrigid) motion. The first of these predictions has recently been confirmed by Todd (1982), but the second has not yet been evaluated.
Another important property of current algorithms for computing structure from motion is that they are all sensitive to the effects of perspective. For example, the analyses proposed by Clocksin (1980), Koenderink and van Doorn (1975, 1977), Lee (1974), Longuet-Higgins and Prazdny (1980), Nakayama and Loomis (1974), and Prazdny (1980) all require a large amount of perspective (i.e., a polar projection) in order to be effective. The analyses proposed by Todd (1982), Ullman (1979), and Webb and Aggarwal (1981), on the other hand, are most effective when displays are generated with small amounts of perspective (i.e., when they approximate a parallel projection). There is little evidence to suggest how closely these limitations correspond to those of actual human observers. Although there have been several demonstrations that human observers will experience the kinetic depth effect with either parallel or polar projection, it has yet to be determined which degree of perspective results in the most precise perceptual specification of an object's three-dimensional structure.
Given the importance of rigidity and perspective for existing theoretical analyses, it is interesting to speculate why the perceptual effects of these variables are still unknown. One reason, I suspect, is that the psychophysical methods currently in use are extremely crude. In most demonstrations of the perception of structure from motion, observers are asked to rate the perceived rigidity or three-dimensionality of a moving configuration of elements, but there is generally no attempt to evaluate the precise details of its perceived three-dimensional structure (see, however, Lappin &
Fuqua, 1983, and Braunstein & Andersen, in press). The research reported in the present article was specifically designed to overcome this difficulty. The accuracy of form perception was evaluated by asking observers to discriminate among a series of computer-generated surfaces that varied in curvature. The goal of this research was to determine empirically whether the accuracy of observers' curvature judgments was significantly affected by the rigidity of an object's motion or the type of projection with which it was presented.
Six graduate students at Brandeis University participated in the experiment. None of the observers was familiar with the theoretical issues being investigated or with the specific details of how the displays were generated.
Apparatus
The stimuli were produced using an LSI-11/23 microprocessor and displayed on a Terak 8600 color graphics system with a spatial resolution of 320 x 240 pixels. The stimuli were presented within a rectangular window of the display screen that was 18 cm along the vertical axis and 25 cm along the horizontal axis. The displays were viewed binocularly at a distance of approximately 50 cm. Head and body movements were not restricted.
Stimuli
Observers were presented with computer simulations of cylindrical surfaces that could vary in curvature and could be generated under either parallel or polar projection. The five different curvatures used in the experiment had circular arcs of 159.2, 134.8, 106.3, 73.7, and 37.8 deg. These are depicted in Figure 1. Each curve in the figure represents how a simulated surface would appear if viewed from the side (i.e., parallel to the display screen). In the actual experiment, the surfaces were displayed individually and viewed from the front, as shown in Figure 2. Perspective is defined in this context as a ratio between the simulated viewing distance E and the distance D that the simulated surface extends away from the display screen. It is important to note that D covaries with curvature. (The values of D for the five curvatures in Figure 1 were 1.5, 3, 4.5, 6, and 7.5 cm.) Thus, to maintain a constant amount of perspective, it was necessary to use a different simulated viewing distance for each surface. The perspective ratios were set at 8:1 to simulate a polar projection and at 100:1 to approximate a parallel projection. This resulted in simulated viewing distances of 12, 24, 36, 48, and 60 cm, respectively, for the five surfaces under polar projection, and 150, 300, 450, 600, and 750 cm for the same surfaces under parallel projection. (The actual viewing distance was approximately 50 cm.)
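The arithmetic behind these viewing distances can be sketched in a few lines of Python. The depths and perspective ratios are taken from the text; the function names are hypothetical. The arc check rests on an assumption the text does not state, namely that each arc's chord spans the 18-cm vertical extent of the display window. Under that assumption the standard circular-segment relation gives arcs that agree with the five listed values to one decimal place, with the largest arc paired with the largest depth.

```python
import math

# Values quoted in the text.
DEPTH_CM = [1.5, 3.0, 4.5, 6.0, 7.5]       # D for the five surfaces
RATIOS = {"polar": 8, "parallel": 100}     # perspective ratio E:D

# ASSUMPTION (not stated in the source): each arc's chord spans the
# 18-cm vertical extent of the display window.
CHORD_CM = 18.0


def viewing_distance(depth_cm, ratio):
    """Simulated viewing distance E that yields the requested E:D ratio."""
    return ratio * depth_cm


def arc_from_depth(depth_cm, chord_cm=CHORD_CM):
    """Arc angle (deg) of a circular segment with the given depth and chord.

    Uses the segment relation depth = (chord / 2) * tan(arc / 4).
    """
    return math.degrees(4.0 * math.atan(depth_cm / (chord_cm / 2.0)))


if __name__ == "__main__":
    for d in DEPTH_CM:
        print(
            f"D = {d:3.1f} cm  "
            f"E(polar) = {viewing_distance(d, RATIOS['polar']):5.1f} cm  "
            f"E(parallel) = {viewing_distance(d, RATIOS['parallel']):5.1f} cm  "
            f"arc = {arc_from_depth(d):5.1f} deg"
        )
```

Because E is simply a fixed multiple of D, holding the ratio constant is what forces a different simulated viewing distance for each surface.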
It may appear at first glance that the 8:1 perspective ratio used in this experiment was considerably less severe than what has been used by other researchers. For example, Lappin,
Figure 1. The five different curvatures used in Experiment 1.
Figure 2. A section of a cylindrical surface as viewed by the observers in Experiment 1. Perspective is defined as a ratio between the viewing distance (E) and the extension of the surface in depth (D).
structure would become immediately apparent. It is possible to obtain a clearer impression of how the displays changed over time by examining the trajectory patterns depicted in Figure 4. This figure was created by superimposing all 24 frames for a No. 5 surface under the different combinations of perspective and motion. Note that the display elements in the rigid conditions moved along vertical trajectories. (These trajectories were slightly bowed under polar projection.) In the nonrigid conditions, in contrast, there was a clearly noticeable horizontal component to the projected motion and the resulting trajectories had a sinusoidal shape.
As is evident in Figure 4, the instantaneous velocity of a given dot was influenced by many different variables, including its position on the display screen, the curvature of the depicted surface, the type of transformation to which the surface was subjected, and the amount of perspective. Across all of the displays, the velocities of the moving elements ranged from 0 to 7 cm/sec in both the vertical and the horizontal dimensions. In the vertical dimension the maximum velocities were always observed in the center of the display screen, whereas in the horizontal dimension they were always observed at the display boundaries.
Figure 3. A static view of a No. 5 surface used in Experiment 1. If the same display were observed in motion, it would appear to be highly curved.
Figure 4. Some projected trajectory patterns used in Experiment 1. These patterns were created by superimposing all 24 frames of a No. 5 surface under the different combinations of perspective and motion.
Doner, and Kottas (1980) have shown that a perspective ratio of 3:1 may be necessary to facilitate the perception of structure from motion under certain minimal conditions. Such high levels of perspective, however, are possible only for visual displays of transparent objects. For opaque surfaces, there is a minimum distance from which a given curvature can be viewed without occluding itself (e.g., the viewing distance would have to be infinite to observe a full 180 deg of a circular arc). The minimum viewing distance for a No. 5 (maximally curved) surface in the present experiment would result in a perspective ratio of approximately 7:1. Thus, the 8:1 ratio used in the polar conditions was almost as severe as is mathematically possible.
In addition to variations in curvature and perspective, each surface could undergo two types of motion. In half of the displays, the depicted motion was a rigid transformation. Each of the cylindrical surfaces was rotated back and forth around its central axis at a rate of 1/2 Hz. The angular extent of this rotation equaled 23% of the surface's visible arc. (When an element disappeared at a screen boundary, it reappeared on the opposite side.) In the other half of the displays, the surfaces were stretched nonrigidly along their central axes in addition to the rotation. The length of the simulated cylinder was increased sinusoidally by 25% at a frequency of 1 Hz. The size of the display window remained constant, however, so that some elements would disappear from view during portions of the oscillation cycle. These stretching motions had two important properties: First, they were sufficiently nonrigid to foul up any of the techniques for determining structure from motion that have been proposed in the literature; and second, they did not affect surface curvature. This latter property is particularly important, because the observers' task was to judge which of the five possible curvatures was depicted in each display. These judgments would have been impossible to interpret if some of the simulated curvatures had been changing over time.
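The two motion conditions can be illustrated with a minimal Python sketch of a single dot on a No. 5 surface. It is not the original display program, and several details are assumptions that the text leaves open: the rotation is taken to be sinusoidal, the 23% extent is read as peak-to-peak, the cylinder length is taken to oscillate between 100% and 125%, the apex of the surface is placed on the screen plane with the edges receding to depth D, and the 24 frames are spread over one 2-sec rotation cycle. All function names are hypothetical.

```python
import math

ROTATION_HZ = 0.5        # from the text
STRETCH_HZ = 1.0         # from the text
ROTATION_EXTENT = 0.23   # fraction of the visible arc (from the text)
STRETCH_GAIN = 0.25      # 25% increase in cylinder length (from the text)


def radius_from(arc_deg, depth_cm):
    """Radius of a circular arc with the given angle and depth (sagitta)."""
    return depth_cm / (1.0 - math.cos(math.radians(arc_deg) / 2.0))


def dot_position(phi0, x0, t, arc_deg, depth_cm, nonrigid):
    """3-D position (x, y, z) of one dot at time t (sec).

    phi0 is the dot's starting angle on the arc (rad); x0 is its starting
    position along the horizontal cylinder axis (cm).
    """
    arc = math.radians(arc_deg)
    radius = radius_from(arc_deg, depth_cm)
    # ASSUMPTION: sinusoidal rotation; 23% is treated as peak-to-peak extent.
    swing = 0.5 * ROTATION_EXTENT * arc * math.sin(2 * math.pi * ROTATION_HZ * t)
    # Dots leaving one edge of the visible arc re-enter at the opposite edge.
    phi = (phi0 + swing + arc / 2.0) % arc - arc / 2.0
    # ASSUMPTION: length oscillates between 100% and 125% of its base value.
    stretch = 1.0
    if nonrigid:
        stretch += 0.5 * STRETCH_GAIN * (1.0 - math.cos(2 * math.pi * STRETCH_HZ * t))
    # ASSUMPTION: apex lies on the screen plane (z = 0); edges recede to z = D.
    return (x0 * stretch, radius * math.sin(phi), radius * (1.0 - math.cos(phi)))


def project(point, eye_distance_cm):
    """Polar projection onto the screen from an eye at the given distance."""
    x, y, z = point
    scale = eye_distance_cm / (eye_distance_cm + z)
    return (x * scale, y * scale)


def trajectory(phi0, x0, eye_distance_cm, nonrigid,
               arc_deg=159.2, depth_cm=7.5, n_frames=24, duration_s=2.0):
    """Projected (x, y) screen positions of one dot across all frames."""
    return [
        project(dot_position(phi0, x0, i * duration_s / n_frames,
                             arc_deg, depth_cm, nonrigid), eye_distance_cm)
        for i in range(n_frames)
    ]


def extent(values):
    """Range covered by a sequence of coordinates."""
    return max(values) - min(values)
```

In this sketch, a rigid dot viewed at the 8:1 ratio (eye distance 60 cm) drifts horizontally by only a small fraction of a centimeter, which corresponds to the slight bowing described for polar projection, and the drift nearly vanishes at the 100:1 ratio. Adding the stretch produces a horizontal excursion of more than a centimeter for the same dot.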
The moving displays were created by stepping...