Apparent motion and reference frames
Michael H. Herzog1 and Haluk Öğmen2
To appear in:
Oxford Handbook of Perceptual Organization
Oxford University Press
Edited by Johan Wagemans
1Brain Mind Institute
École Polytechnique Fédérale de Lausanne
2Center for Neuro-Engineering and Cognitive Science
Department of Electrical & Computer Engineering
University of Houston
Abstract
This article presents a selective overview of motion perception starting with its early
philosophical underpinnings. Its role in Gestalt psychology is highlighted including the discovery
of the relativity of motion perception and form-motion interactions. The use of reference
frames in the computation of motion is illustrated with examples, leading to its current
implications for non-retinotopic processing.
Keywords: Apparent motion, motion perception, induced motion, relativity, reference frames,
non-retinotopic processing.
1. History of apparent motion and its role in Gestalt psychology
1.1. Mathematical foundations of space and time, Zeno’s paradoxes and the implied
psychological theory
By definition, motion is change of position over time. To understand motion from a
psychological perspective, one needs to appeal to the concepts whereby space and time are
defined from the perspective of physics (to express the stimulus) and from the perspective of
psychology (to express the percept). Around 450 B.C., Zeno studied how motion can be
expressed using the concepts of space and time available at that time (Kolers, 1972). Zeno’s
analysis of physical motion led him to paradoxes that he could resolve by suggesting that motion is
a purely psychological construct. In one of these paradoxes, Achilles is trying to catch up with Tortoise
in a race where Tortoise starts with an initial advantage. Zeno argues that Achilles will never be
able to catch up with Tortoise because by the time Achilles reaches Tortoise’s starting point, Tortoise
will have advanced to a new position; by the time Achilles reaches this new position, Tortoise
will be at yet another position further down the road, and so on. Zeno thought that even if
Achilles moves faster than Tortoise and reduces the distance at every iteration, he will still have
to do this infinitely many times. Lacking the concepts of infinity and convergent series, he
concluded that Achilles will never be able to catch Tortoise. A similar paradox arises if one
wants to move from point A to point B. Zeno reasoned that infinitely many points need to be
crossed and that one can never move between two points. When time is conceived as a
continuous variable composed of infinitely short (i.e. duration-less) instants, one cannot be in
motion because, by definition, the instant has no duration to allow change in position. If motion
is not physically possible, what then explains our percepts of moving objects? Zeno thought that
objects exist at different locations at different time instants. These percepts are stored in
memory and compared over time. When a disparity in spatial position is detected, we create an
illusion of motion to resolve this disparity. Progress in mathematics (the development of the
concept of convergent series) removed the conceptual barriers in expressing motion as a
physical stimulus. Armed with this new mathematics, naïve realistic approaches focused on how
this real motion can be perceived as a veridical, as opposed to an illusory, percept. Nevertheless,
the psychological implications of Zeno’s analysis have been enduring.
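The role of convergent series in dissolving the paradox can be illustrated numerically. The following is a minimal sketch; the head start and the two speeds are illustrative assumptions, not values from the historical sources. It sums the durations of Zeno's successive stages and compares the partial sum with the closed-form catch-up time.

```python
def achilles_catch_up(head_start=100.0, v_achilles=10.0, v_tortoise=1.0, steps=60):
    """Sum the durations of Zeno's stages (all parameter values are assumed).

    In each stage Achilles runs to the position Tortoise occupied at the
    start of that stage, while Tortoise opens a new, smaller gap.
    """
    gap = head_start
    elapsed = 0.0
    for _ in range(steps):
        stage_time = gap / v_achilles      # time needed to cover the current gap
        elapsed += stage_time
        gap = v_tortoise * stage_time      # Tortoise's advance during that stage
    return elapsed


# Closed form of the geometric series: head_start / (v_achilles - v_tortoise)
closed_form = 100.0 / (10.0 - 1.0)
partial_sum = achilles_catch_up()
```

Although the number of stages grows without bound, their durations shrink geometrically, so the total time stays finite and the partial sum approaches the closed-form value.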
1.2. Exner’s and Wertheimer’s contributions, types of apparent motion, and Korte’s laws
About 2500 years later, an important advance occurred when Exner (1875) created a stimulus
consisting of two brief flashes presented at two spatially neighboring locations. With proper
selection of timing and separation parameters, this stimulus generated the perception of
motion, the first flash appearing to move smoothly to the location of the second flash. Since
there was no stimulation of the intermediate points between the two flashes, this was indeed
an illusion created by the perceptual system. More generally, Exner found that when the inter-
stimulus interval (ISI) between the flashes was 10 ms or shorter, the two flashes were perceived
as simultaneous; subjects could not reliably report their temporal order. When the ISI increased, the
perception was that of a single object moving from one position to the other. At longer ISIs, the
stimuli appeared as two temporally successive flashes without the perception of motion. The
finding that the perception of motion occurred at ISIs at which the temporal order of stimuli
cannot be resolved led Exner to reject Zeno’s memory explanation. Since the temporal order of
the two stimuli cannot be determined, the contents of memory should appear simultaneous and
no motion should be perceived. Hence, Exner defended the view that motion is not an indirect
property inferred from the analyses of objects over time, but instead it is a basic dimension of
perception.
The experimental technique developed by Exner was essential in Max Wertheimer’s influential
study that led to the development of Gestalt psychology (Wertheimer, 1912; for a review of the
development of Gestalt psychology, see Wagemans, this volume). Using a borrowed
tachistoscope and with Wolfgang Köhler and Kurt Koffka as his subjects, Wertheimer extended
Exner’s study by creating a richer and more nuanced phenomenology. Exner’s three stages
(simultaneity, motion, succession) were refined further by describing different types of
perceived motion: one type of perceived motion was smooth movement of the object as
described by Exner. This was called beta motion. A second type is partial movement, i.e., the
object appears to move up to a certain point along the trajectory between the flashes,
disappears, and reappears in movement again at a further point along the trajectory. Finally, a
third type of movement, called the phi motion, corresponded to the percept of movement
without any specific form, i.e., “figureless movement”. Wertheimer used phi motion to argue
that the perception of motion does not emerge from the comparison of objects in memory but
is a fundamental dimension of perception in its own right, separate from form perception.
In terms of terminology, the perception of motion generated by two flashes is called apparent
motion. Phi and beta motions are sub-types of apparent motion. They are distinguished from
real motion, which refers to the perception of motion generated by a smoothly moving object1.
Following Wertheimer’s study, Gestalt psychologists Korte and Neuhaus explored further the
effect of various stimulus parameters leading to the so-called “Korte’s laws” (Korte, 1915;
Neuhaus, 1930). These “laws” are better viewed as rules of thumb since the relationship of
the percept with the parameters is rather complex (e.g., Kolers, 1972; Gepshtein & Kubovy,
2007). In short, Korte’s laws state that to obtain the percept of apparent motion between
flashes: 1) larger separations require higher intensities, 2) slower presentation rates require
higher intensities, and 3) larger separations require slower presentation rates.
Since this early work, there have been a large number of studies investigating systematically the
dependence of motion perception on a broader range of stimulus parameters2. Around the 1980s,
the research focus shifted from explaining the complex phenomenology of motion to the more basic
question of how we detect motion. Several computational models have been proposed and
eventually united under a broad umbrella. In the next section, we briefly review these models
after which we will return to the main theme of our chapter, viz., phenomenal and
organizational aspects of motion.
2. Computational basis of motion detection
2.1. Motion detection as orientation detection in space-time
As shown in Fig. 1A, the real (continuous) motion of an object with a constant speed can be
described by an oriented line in a space-time diagram. An apparent motion stimulus is a
sampled version of this stimulus consisting of two (or more) discrete points on the pathway (Fig.
1B). Motion detection mechanisms have been described as filters tuned to orientation in space-
time. Among the earliest models, the Barlow-Levick model (Barlow & Levick, 1965) takes its
input from one point in space, delays it, and compares it (with a Boolean “AND” operation) to the
input from another point in space. The Hassenstein-Reichardt correlation model (Hassenstein &
Reichardt, 1956) works on a similar principle but the comparison is carried out by the correlation
integral (Fig. 1C). Since these models sample space at two discrete spatial and temporal
positions, they respond to apparent and real motion in the same way. The more elaborate
1 Note that the terms apparent/real motion may refer to the stimulus or to the percept generated by the stimulus depending on context. Stroboscopic motion and sampled motion are synonymous terms for apparent motion; the former is derived from the equipment used to generate it (stroboscope), while the latter highlights its relation to real motion (see Section 2.1).
2 Several demos can be found on the movies page. We have also included the source Powerpoint files. We encourage the reader to change various parameters (spatial separation between stimuli, shapes, geometrical organization) and experiment. As mentioned in the text, time parameters are important in determining the percept. An easy way to modify time parameters is to go to the “Transition” tab of Powerpoint and change the time parameter under “Advance Slide”. The modified file can be saved as a movie and played as a movie file.
versions of these models include denser sampling to build a space-time receptive field as shown
in Fig. 1D. These spatio-temporal models have been further extended by introducing
nonlinearities at early stages so that they can respond to second-order stimuli (i.e., defined by
stimulus dimensions other than luminance, such as texture). Finally, a third-order motion system
has been proposed that requires attention (for review, see Lu & Sperling, 2001). Salient features
are detected and tracked over time. One implication of spatio-temporally localized receptive
fields is that each motion detecting neuron “views” a small part of the space via its receptive
field which acts as an “aperture”. When a uniform surface or edge moves across the viewing
aperture, only the motion component perpendicular to the edge can be measured by a local
motion detector, a problem known as the aperture problem (for a review, see Bruno &
Bertamini, this volume). The solution of the aperture problem requires integration of motion
signals across space. The motion integration problem will be discussed in the following sections
within a broader context, viz., even when each local measurement is accurate.
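The aperture problem can be made concrete with a short sketch. The function name and the example velocities are illustrative assumptions. A local detector viewing a straight edge can recover only the velocity component perpendicular to that edge, so different true velocities yield identical local measurements.

```python
import math


def normal_component(velocity, edge_direction):
    """Return the part of `velocity` perpendicular to an edge.

    The component along the edge produces no change inside a small
    aperture, so it is removed from the measurement.
    """
    vx, vy = velocity
    ex, ey = edge_direction
    norm = math.hypot(ex, ey)
    ex, ey = ex / norm, ey / norm          # unit vector along the edge
    along = vx * ex + vy * ey              # speed along the edge (unobservable)
    return (vx - along * ex, vy - along * ey)


# A vertical edge moving up and to the right: only the horizontal part is measurable.
measured = normal_component((2.0, 3.0), edge_direction=(0.0, 1.0))

# A different true velocity with the same horizontal part gives the same measurement.
other = normal_component((2.0, -5.0), edge_direction=(0.0, 1.0))
```

Because `measured` and `other` coincide, a single local measurement cannot distinguish the two motions, which is why integration across space is required.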
[Figure 1: panels A to D, each with axes labeled space and time; panel C shows the Delay and Compare stages.]
Figure 1. A. Trajectory of a stimulus moving with a constant speed can be
described as an oriented line in a space-time diagram. B. Apparent motion
stimulus is a sampled version of continuous motion. C. A motion detector samples
the input at two spatial locations and carries out a delay-and-compare operation.
D. The denser sampling in space-time yields an oriented receptive field for the
motion detector. This detector will become maximally active when the space-time
orientation of the motion stimulus matches the orientation of its receptive field.
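The delay-and-compare operation of panel C can be sketched in discrete time. This is a minimal illustration in the spirit of the correlation model, not a reproduction of any published implementation: the opponent subtraction, the sample-based delay, and the flash timings are all assumptions made for the example.

```python
def delayed(signal, delay):
    """Shift a discrete-time signal later by `delay` samples, padding with zeros."""
    return [0.0] * delay + list(signal[:len(signal) - delay])


def correlator_response(left, right, delay):
    """Opponent delay-and-compare detector (assumed form).

    Positive output signals left-to-right motion, negative output
    signals right-to-left motion.
    """
    rightward = sum(a * b for a, b in zip(delayed(left, delay), right))
    leftward = sum(a * b for a, b in zip(delayed(right, delay), left))
    return rightward - leftward


def flash(onset, duration, length=20):
    """A brief flash: 1.0 while the stimulus is on, 0.0 otherwise."""
    return [1.0 if onset <= t < onset + duration else 0.0 for t in range(length)]


# Two-flash apparent motion: the left location flashes first,
# the right location flashes 3 samples later.
left = flash(onset=2, duration=2)
right = flash(onset=5, duration=2)

response_lr = correlator_response(left, right, delay=3)    # stimulus order left, right
response_rl = correlator_response(right, left, delay=3)    # same flashes, inputs swapped
response_sync = correlator_response(left, left, delay=3)   # simultaneous flashes
```

The detector responds to the two-flash stimulus although nothing is presented between the two locations, which is the sense in which such models treat apparent and real motion alike.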
2.2. Is motion an independent perceptual dimension?
Given this background, we can now return to one of the original questions about motion
perception: is it derived from object comparisons over time through memory or is it a
fundamental dimension of perception? At first glance, all models discussed above involve
memory (e.g., delay or temporal filtering operations) and carry out comparisons (e.g., AND gate
or correlation). However, first-order and second-order models compare relatively raw inputs
without prior computation of form. As such, they constitute models that represent motion as an
independent dimension. The third-order motion system, however, identifies and tracks features;
this system is, at least partially, built on form analyzers.
From the neurophysiological perspective, motion sensitive neurons have been found in many
cortical areas. In particular, visual areas MT and MST are highly specialized in motion processing
(for review, see Albright & Stoner, 1995). These areas are located in the dorsal stream as
opposed to form related areas located in the ventral stream. In sum, there is a broad range of
evidence for different systems dedicated to the processing of motion and form and that motion
constitutes an independent perceptual dimension. However, there is also evidence that these
systems are not strictly independent, but rather interact.
3. The problem of phenomenal identity and the correspondence problem
After Wertheimer’s pioneering work on apparent motion, while the major focus of Gestalt
psychology shifted to static images, there was still a strong emphasis on motion. In his 1925
dissertation, with Wertheimer as his second reader, Joseph Ternus took on the task of studying
how grouping principles can be applied to stimuli in motion. The fundamental question he
posed was what he termed the problem of phenomenal identity: “Experience consists far less in
haphazard multiplicity than in the temporal sequence of self-identical objects. We see a moving
object, and we say that “this object moves” even though our retinal images are changing at each
instant of time and for each place it occupies in space. Phenomenally the object retains its
identity” (Ternus, 1926). He adopted a stimulus previously used by Pikler (1917), shown in Fig.
2A.
Figure 2. A. A simple Ternus-Pikler display. B. An apparent motion stimulus with
two different shapes. C. The influence of shape is strong in correspondence
matching when there is overlap between stimuli (left) and becomes weaker as the
overlap is eliminated (right). D. A stimulus configuration used by Ternus to
investigate the relationship between local motion matches and global shape
configurations.
The first frame contains three identical elements. In the second frame, these elements are
displaced so that some of them overlap spatially with the elements in the previous frame. In the
example of Fig. 2A, the three disks are shifted by one inter-disk distance so that two of the disks
overlap across the two frames. Given all identical elements in the two frames, one can then ask
how the elements will be grouped across the two frames. This question was later termed
the “motion correspondence” problem. If we consider the central disk in Frame 2 (Fig. 2A), will
this disk be grouped with the rightmost disk of the first frame based on their common absolute
spatial location, i.e., the same retinal position, or will it be grouped with the central disk of the
first frame based on their relative position as the central elements of spatial groups of three
elements? The answer to this question turned out to be quite complex with several variables
influencing the outcome. For example, when the ISI between the two frames is short, the
leftmost element in the first frame appears to move to the rightmost element in the second
frame while the spatially overlapping elements in the center appear stationary (i.e., they are
grouped together). For longer ISIs, a completely different organization emerges: the three
elements appear to move in tandem as a group, i.e., their relative spatial organization prevails in
the spatiotemporal organization. These two distinct percepts are called element and group
motion, respectively. Many other variables, such as inter-element separation, element size,
spatial frequency, contrast, inter-stimulus interval (ISI), luminance, frame duration, eccentricity,
and attention influence which specific organization emerges as the prevailing percept (e.g., Alais
& Lorenceau, 2002; Aydin et al., 2011; Breitmeyer & Ritter, 1986a, 1986b; Casco & Spinelli,
1988; Dawson, Nevin-Meadows, & Wright, 1994; He & Ooi, 1999; Hein & Moore, 2012;
Ma-Wyatt, Clifford, & Wenderoth, 2005; Pantle & Petersik, 1980; Pantle & Picciano, 1976). Like many
other Gestalt grouping phenomena, spatiotemporal grouping is governed by complex
multivariate processes.
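The two correspondence solutions for the Ternus-Pikler display of Fig. 2A can be written out explicitly. The sketch below only enumerates the two organizations; it does not model which one prevails as a function of ISI or the other variables listed above. Element positions are expressed on an assumed one-dimensional grid in units of the inter-disk distance.

```python
def element_motion(frame1, frame2):
    """Match elements that share a position; pair the leftover elements."""
    shared = sorted(set(frame1) & set(frame2))
    matches = [(p, p) for p in shared]                 # overlapping elements stay put
    rest1 = sorted(set(frame1) - set(shared))
    rest2 = sorted(set(frame2) - set(shared))
    matches += list(zip(rest1, rest2))                 # the outer element jumps across
    return sorted(matches)


def group_motion(frame1, frame2):
    """Match elements by their relative position within the group."""
    return list(zip(sorted(frame1), sorted(frame2)))


frame1 = [0, 1, 2]   # three disks in the first frame
frame2 = [1, 2, 3]   # the same disks shifted by one inter-disk distance

element_pairs = element_motion(frame1, frame2)
group_pairs = group_motion(frame1, frame2)
```

In `element_pairs` the two overlapping disks are matched to themselves and the leftmost disk is matched to the rightmost one, whereas in `group_pairs` every disk is displaced by one unit.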
4. Form-motion interactions
4.1. How local form information influences motion perception
The apparent motion stimulus lends itself nicely to the study of form-motion interactions (for
other examples of form-motion interactions, see Blair et al., this volume). Remember that Zeno
claimed that motion is an illusion created by the observer in order to reconcile the existence of
an object at two different spatial locations at two different time instants. The observer would
compare the two stimuli from memory and if a suitable match is found a phenomenal identity
will be attributed to these two stimuli as two instances of the same object. Perceived motion
from one object to the other would signal the conclusion that these two objects are one and the
same. Thus, according to this view, form analysis is a precursor of motion perception and the
match of the form of the two objects is a prerequisite for motion perception. This can be tested
directly by creating an apparent motion stimulus where the shapes presented in the two frames
are different (Fig. 2B; see also the demo “AM – different shapes”). Many such experiments have
been carried out showing that form has little effect on the perception of apparent motion, i.e.,
motion percepts between the two stimuli are strong (Kolers, 1972). In the example of Fig. 2B, one
perceives the square morphing into a circle along the path of apparent motion. That the shape
of an object in apparent motion should remain constant can, in general, be expected to hold
only for small displacements. This is because the proximal stimulus is a two-dimensional
projection of a three-dimensional object and during motion, one experiences perspective
changes resulting in different views of the object. It is this very fact that Ternus used in defining
the problem of phenomenal identity.
In the case of the example shown in Fig. 2B, there is no motion ambiguity and the interpretation
of an object whose form changes (presumably due to perspective change) appears to be a
natural solution. What happens, however, if the correspondences in the display are more
complex and represent ambiguities such as the ones shown in Fig. 2C? Results indicate that form
information (or in general feature information such as color, texture) can be used in resolving
ambiguities in the case where there is physical overlap between elements of the two frames
(Ternus-Pikler displays; see for example the demo “TP – feature bias”) but this influence
becomes weaker when the overlap is reduced and the distance between the elements is
increased (Hein & Cavanagh, 2012). Taken together, all these results indicate that motion and
form are separate but interacting systems.
4.2. How local motion information influences form perception
Having answered the question of how local form information can influence motion perception,
one can ask the converse question, viz., how local motion information can influence form
perception. Figure 2D shows one of Ternus’ displays where in each static display dots group into
global shapes. One can see a vertical line and a diamond shape, which are moved rightwards
and leftwards, respectively. However, the strength of static groups cannot predict the perceived
forms in motion; i.e., the percept in Fig. 2D does not correspond to a line moving right and a
diamond moving left. Instead, at short ISIs, the three horizontally-aligned central dots appear
stationary while the outer dots appear to move rightwards. For longer ISIs, the percept appears
to be that of a single object rotating 180 degrees in 3D (Ternus, 1926). Note that in these
complex displays, multiple possible motion correspondences exist (e.g., Dawson & Wright, 1994;
Otto, Ogmen, & Herzog, 2008) and the percept may vary from subject to subject, or even from
trial to trial for the same subject. The reader can experiment with the demo “TP complex
configuration”.
Having established that form and motion information interact, the next question is to
understand how. Combining signals from form and motion systems requires a common basis
upon which they can be expressed. In other words, what is the reference frame that allows
interactions between these two systems? We will proceed first by discussing reference frames
within the motion system and then by extending these reference frames to form computations.
5. Reference frames
5.1. Relativity of motion and reference frames
The Gestalt psychologist Karl Duncker’s work was instrumental in highlighting the importance of
reference frames in perception (Duncker, 1929; for review, see Wallach, 1959; Mack, 1986). In
one of his experiments, he presented a small stimulus embedded in a larger one (Fig. 3A, left
panel). He moved the large surrounding stimulus while keeping the smaller one stationary.
Observers perceived the surrounding stimulus as stationary and the smaller stimulus as moving
in the direction opposite to the physical motion of the surrounding stimulus (for a recent paper
with demos, see Anstis & Casco, 2006). To account for this illusory induced motion, he proposed
that the larger surrounding stimulus served as the reference frame against which the position of
the embedded stimulus is computed. The right panel of Fig. 3A shows another configuration
studied by Duncker, the “rolling wheel”. If a light dot stimulus is placed on the rim of a wheel
rolling in the dark, the perceived trajectory of this dot is cycloidal. If a second dot at the center of
the wheel is added to the display, one perceives the central dot to move in a linear trajectory
and the dot on the rim is perceived to rotate around the central dot. In other words, the central
dot serves as a reference against which the motion of the second dot is computed (for demos on
the relativity of motion using the Ternus-Pikler paradigm, the reader is referred to demos in Boi
et al., 2009).
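The rolling-wheel configuration can be expressed in a few lines. The sketch below assumes a wheel of unit radius rolling without slipping at unit speed; these values and the function names are illustrative. It shows that the cycloidal path of the rim dot becomes a pure rotation once the motion of the central dot is subtracted.

```python
import math


def center_dot(t, radius=1.0, speed=1.0):
    """The central dot translates along a straight line."""
    return (speed * t, radius)


def rim_dot(t, radius=1.0, speed=1.0):
    """Position of a dot on the rim of a wheel rolling without slipping (a cycloid)."""
    angle = speed * t / radius
    cx, cy = center_dot(t, radius, speed)
    return (cx - radius * math.sin(angle), cy - radius * math.cos(angle))


def relative_to_center(t, radius=1.0, speed=1.0):
    """Rim-dot position expressed in the reference frame of the central dot."""
    x, y = rim_dot(t, radius, speed)
    cx, cy = center_dot(t, radius, speed)
    return (x - cx, y - cy)


# Distance of the rim dot from the central dot at several instants.
distances = [math.hypot(*relative_to_center(0.3 * k)) for k in range(10)]
```

All entries of `distances` equal the wheel radius: relative to the central dot, the rim dot simply moves on a circle.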
Figure 3. A. Two stimulus configurations studied by Duncker. The top panels
represent the stimuli and the bottom panels depict the corresponding percepts.
Left panels: Induced motion, Right panels: Rolling wheel illusion. B. An example
illustrating Johansson’s vector decomposition principles. a. The stimulus. b. The
decomposition of the motion of the central dot so as to identify common vector
components for all three dots. c. The resulting percept.
To explain these effects, Johansson (1973) proposed a theory of vector analysis based on three
principles. The first principle states that elements in motion are always perceptually related to
each other. According to his second principle, simultaneous motions in a series of proximal
elements perceptually connect these elements into rigid perceptual units. Finally, when the
motion vectors of proximal elements can be decomposed to produce equal and simultaneous
motion components, per the second principle, these components will be perceptually united
into the percept of common motion. Fig. 3B illustrates these concepts. Fig. 3B-a shows the
stimulus. By the first principle, the motions of these dots are not perceived in isolation but are
related to each other. By the second principle, the top and bottom dots are connected together
as a single rigid unit moving together horizontally. By the third principle, a horizontal component
equal and simultaneous with the horizontal motion of the top and bottom dots is extracted from
the motion of the central dot (Fig. 3B-b). The resulting percept is the horizontal movement of
three dots during which the central dot moves up and down between the two flanking dots (Fig.
3B-c) (Johansson, 1973).
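The decomposition for the three-dot stimulus can be sketched as follows. Taking the velocity of the flanking dots as the common component is an assumption that fits this particular stimulus; the velocity values are illustrative.

```python
def decompose(flank_velocity, center_velocity):
    """Split the central dot's velocity into common and relative components.

    The common component is assumed to be the velocity shared by the
    flanking dots; the relative component is what remains.
    """
    common = flank_velocity
    relative = (center_velocity[0] - common[0], center_velocity[1] - common[1])
    return common, relative


# Flanking dots move horizontally; the central dot moves diagonally.
common, relative = decompose(flank_velocity=(1.0, 0.0), center_velocity=(1.0, 1.0))
```

Here `common` is the horizontal motion shared by all three dots and `relative` is the purely vertical motion of the central dot between the flankers.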
In a more natural setting, the distal stimulus generates a complex optic flow pattern on the
retina. For example, while watching a street scene, one perceives the background (shops,
houses, etc.) as stationary, cars and pedestrians moving with respect to this stationary
background, the legs and arms of pedestrians undergoing periodic motion with respect to their
body, their hands moving with respect to the moving arms, etc. Thus, the stimulus can be
analyzed as a hierarchical series of moving reference frames and motions are perceived with
respect to the appropriate reference frame in the hierarchy (e.g., hand with respect to the arm,
the arm with respect to the body). While powerful and intuitively appealing, the basic principles
of this theory are not sufficient to specify unambiguously how vectors will be decomposed in
complex naturalistic stimuli. In fact, a vector can be expressed as the sum of infinitely many
pairs of vectors, and it is not clear a priori how to predict which combination will prevail for
complex stimuli. The difficulty faced here is similar to the one when we attempt to apply the
Gestalt “laws” derived from simple stimuli to complex stimuli. To address this issue, Gestaltists
put forth the “law of Prägnanz” (or the law of good Gestalt) which states that among the
different possible organizations, the one that is the “simplest” is the one that will prevail
(Koffka, 1935; Cutting & Proffitt, 1982; for a review, see van der Helm, this volume). However,
the criterion for “simplest” remains arbitrary and elusive. The same concept has been adopted
by other researchers who tried to quantify the simplicity of organizations. For example, Restle
(1979) adopted the coding theory where different solutions are expressed as quantifiable
“codes”. A stimulus undergoing circular motion can be described by three parameters:
amplitude, phase, and wavelength. Restle used the number of parameters describing a
configuration as the “information load” and predicted that the configuration with lowest
information load would be the preferred (i.e., perceived) configuration. Dawson (1991) used a
neural network to combine three heuristics in solving the correspondence problem. However,
these approaches all suffer from the same general problems: As acknowledged by Restle, the
method does not have an automatic way of generating all possible interpretations. Moreover,
the choice of parametrization and its generality, the heuristics, their benefits and costs, as well as
the optimization criteria remain arbitrary.
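The non-uniqueness of vector decomposition noted above is easy to verify. In this small sketch the candidate common components are arbitrary illustrative choices; for any candidate a, the pair (a, v - a) sums back to v.

```python
def decompositions(v, candidates):
    """For each candidate common component a, return the pair (a, v - a)."""
    return [(a, (v[0] - a[0], v[1] - a[1])) for a in candidates]


v = (1.0, 1.0)
pairs = decompositions(v, [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (2.0, -3.0)])

# Every pair reconstructs the same motion vector.
all_valid = all((a[0] + b[0], a[1] + b[1]) == v for a, b in pairs)
```

Since every pair is equally consistent with the stimulus, an additional criterion is needed to select one of them, and that criterion is exactly what remains unspecified.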
5.2. Object file theory
Kahneman and colleagues addressed the problem of phenomenal identity by adapting two
concepts from computer science, viz., addresses and files (Kahneman et al., 1992). The
fundamental building blocks of their theory are “object files”, each containing information about
a given object. These files establish and maintain the identities of objects. According to their
theory, an object file is addressed, not by its contents, but by the location of the object at a
given time3. This location-based index is a type of reference frame discussed in the previous
section. However, by restricting the file addressing mechanism to a spatial location, this theory
faces many shortcomings. In the object file theory, features are available on an instant-by-
instant basis and get inserted into appropriate files. On the other hand, feature processing takes
time. Without specifying the dynamics of feature processing, the theory ends up in a
bootstrapping vicious circle. When and how is the opening of an object file triggered? Since an
object is defined by features, initial evidence for opening a file for an object necessitates that
at least some of the relevant features of the object are already processed; however, the
processing of features for a specific object requires that a file for that object is already opened.
Typical experiments used within the context of the object file theory include static preview
conditions whose “main end product (…) is a set of object files” (Kahneman et al., 1992).
3 A similar concept was also proposed by Pylyshyn in his FINST theory (Pylyshyn, 1989). Several extensions and variants of the object file theory have been proposed, including the detailed analysis of object updating (Moore & Enns, 2004; Moore et al., 2007) and hierarchies in object structures (Lin & He, 2012).
However, under normal viewing conditions objects often appear from our peripheral field or
behind occlusions necessitating mechanisms that can operate in the absence of static preview
conditions. Another problem with the object file theory is that while vision has geometry, “files”
do not specify a geometric structure. Objects have spatial extent and thus the location of an
object cannot be abstracted from its features. Assume that the centroid of an object is used as
its location index. To put features in the file indexed by this location, one needs to know not just
one location index but the retinotopic extent of the object, which in turn necessitates surface
and boundary features. Moreover, as we will discuss below (feature attribution and occlusion
problems), objects may occlude each other. The insertion of the correct features into the correct object
files cannot be accomplished by location indices alone; spatial extent and occlusion information
need to be represented as well.
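The difficulty with purely location-based addressing can be illustrated with a toy data structure. Everything in this sketch is hypothetical: the class names, the nearest-address lookup rule, and the example coordinates are assumptions introduced for illustration and are not part of the theory as stated by Kahneman et al. (1992).

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    location: tuple                               # the file's address (e.g., a centroid)
    features: dict = field(default_factory=dict)


class ObjectFileStore:
    """Hypothetical store in which files are addressed by location only."""

    def __init__(self):
        self.files = []

    def open_file(self, location):
        self.files.append(ObjectFile(location))
        return self.files[-1]

    def file_at(self, location):
        """Return the file whose address is nearest to `location` (assumed rule)."""
        return min(
            self.files,
            key=lambda f: (f.location[0] - location[0]) ** 2
            + (f.location[1] - location[1]) ** 2,
        )

    def insert_feature(self, location, name, value):
        self.file_at(location).features[name] = value


store = ObjectFileStore()
small = store.open_file((0.0, 0.0))    # small object centered at the origin
large = store.open_file((10.0, 0.0))   # large object whose surface extends to x = 4

# A feature sampled on the large object's edge lies closer to the small object's address.
store.insert_feature((4.0, 0.0), "color", "red")
```

With a single location index per object, the feature ends up in the file of the small object, although it belongs to the large one. Avoiding this misattribution requires a representation of spatial extent, as argued above.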
In sum, while all this work highlights the importance of motion grouping and motion based
reference frames, a deeper understanding of why the visual system needs reference frames may
provide the constraints necessary to determine how and why reference frames are established.
6. The need for reference frames
6.1. The problems of motion blur and moving ghosts
In order to appreciate why reference frames are needed, consider first the fact that humans are
mobile explorers and interact constantly with other moving objects. The input to our visual
system is conveyed following the optics of the eye. The geometry of image formation can be
described by projective geometry. Neighboring points in the environment are imaged on
neighboring photoreceptors in the retina. The projections from retina to early visual cortical
areas preserve these neighborhood relationships creating a retinotopic representation of the
environment. To analyze the impact of motion on these representations, we need to consider
the dynamical properties of the visual system.
A fundamental dynamical property of vision is visible persistence: Under normal viewing
conditions, a briefly presented stationary stimulus remains visible approximately 120 ms after its
physical offset (e.g., Haber & Standing, 1970; Coltheart, 1980). Based on this duration of visible
persistence, we would expect moving objects to appear highly blurred. For example, a target
moving at 10 deg/s would generate a trailing smear of 1.2 deg. The situation is similar to taking
pictures of moving objects with a camera at an exposure duration that mimics visible
persistence. Not only do the moving objects exhibit extensive motion smear, they also have a
ghost-like appearance without any significant form information. This is because static objects
remain long enough on a fixed region of the film to expose sufficiently the chemicals while
moving objects expose each part of the film only briefly thus failing to provide sufficient
exposure to any specific part of the film. Similarly, in retinotopic representations, a moving
object will stimulate each retinotopically localized receptive field only briefly, and incompletely
processed form information would spread across retinotopic space just like the ghost-like
appearances in pictures (Ogmen, 2007). Unlike photographic images, however, in human vision
objects in motion typically appear relatively sharp and clear (Bex et al., 1995; Burr, 1980; Burr &
Morgan, 1997; Burr et al., 1986; Hammett, 1997; Ramachandran et al., 1974; Westerink &
Teunissen, 1995).
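The smear estimate quoted above follows from multiplying target speed by the duration of visible persistence. The following minimal sketch reproduces that arithmetic; the function name and the additional example speeds are illustrative assumptions, and the 120 ms default is simply the approximate persistence value cited in the text.

```python
def smear_extent_deg(speed_deg_per_s: float, persistence_ms: float = 120.0) -> float:
    """Expected trailing smear (deg) if every retinotopic location stays
    visible for `persistence_ms` after the moving target has left it."""
    return speed_deg_per_s * persistence_ms / 1000.0


# Illustrative speeds (assumed); 10 deg/s is the example used in the text.
for speed in (1.0, 10.0, 30.0):
    print(f"{speed:5.1f} deg/s -> {smear_extent_deg(speed):.2f} deg of smear")
```

Under this simple linear assumption the smear grows in proportion to speed, which is why the relatively sharp appearance of moving objects calls for an explanation.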
In normal viewing, we tend to track moving stimuli with pursuit eye movements and thereby
stabilize them on the retina. While pursuit eye movements can help reduce the perceived blur
of a moving object (Bedell & Lott, 1996), the problem of motion blur remains for other objects
present in the scene, since we can pursue only one object at a time. Eye movements also cause
retinotopic movement of the stationary background, creating a blur problem for the
background as well. Furthermore, the initiation of an eye movement can take about 150-200 ms during
which a moving object can generate considerable blur. How does the visual system solve the
problems of motion blur and moving ghosts? A potential solution to the motion blur problem is
the use of mechanisms that inhibit motion smear in retinotopic representations (Chen et al.,
1995; Ogmen, 1993, 2007; Purushothaman et al., 1998). A potential solution to the moving
ghosts problem is the use of reference frames that move along with moving objects rather than
being anchored in retinotopic coordinates (Ogmen, 2007).
6.2. The problems of dynamic occlusions and feature attribution
When an object moves, a variety of dynamic occlusions occur. The object occludes different
parts of the background and, depending on depth relations, either occludes or gets occluded by
other objects in the scene. Moreover, as its perspective view changes with respect to the
observer, its visible features also change due to self-occlusion. All these dynamic considerations
lead to two inter-related questions: First, as highlighted by Ternus, how does the object
maintain its identity despite the changes in its features? Second, due to these occlusions,
features of different objects become dynamically entangled. How does the visual system
attribute features to the various objects in a consistent manner? As discussed in the previous
sections, a possible solution to maintain object identities is to establish motion correspondences
and to arrange the resulting motion vectors as a hierarchical set of reference frames. These exo-
centered reference frames4 establish and maintain the identity of objects in space and time. As
we discuss below, these reference-frames can also provide the basis for feature attribution.
7. Non-retinotopic feature attribution
7.1. Sequential metacontrast and non-retinotopic feature attribution
The earliest studies of motion blur and deblurring can be traced back to McDougall (1904) and
Piéron (1935). Figure 4 depicts the stimulus arrangements used by these researchers. As
mentioned in Section 6.1, the motion blur generated by a moving stimulus can be “deblurred”
by inhibitory mechanisms in retinotopic representations. In fact, McDougall reported that the
blur generated by the leading stimulus “a” in Fig. 4A could be curtailed by adding a second
stimulus, labeled “b” in Fig. 4A in spatiotemporal proximity. The specific type of masking where
the visibility of a target stimulus is suppressed by a spatially non-overlapping and temporally
lagging stimulus is called metacontrast (Bachmann, 1994; Breitmeyer & Ogmen, 2006).
4 Reference frames can be broadly classified into two types: Ego-centered reference frames are those centered on the observer (e.g., eye-centered, head-centered, limb-centered). Exo-centered reference frames are those centered outside the observer (e.g., centered on an object in a scene).
Figure 4. Stimulus arrangement used by A. McDougall (1904) corresponding to
metacontrast, B. Piéron (1935) corresponding to sequential metacontrast and by
C. Otto et al. (2006) to analyze feature attribution in sequential metacontrast.
Piéron (1935) modified McDougall’s stimulus to devise a “sequential” version as shown in Figure
4B. This sequential stimulus provides a temporally extended apparent motion and metacontrast
stimulus that can be used to illustrate the motion deblurring phenomenon. It can also be used
to study the feature attribution problem. Fig. 4C shows a version of sequential metacontrast
where the central line contains a form feature: A small Vernier offset is introduced by shifting
the upper segment of the line horizontally with respect to the lower segment (Otto et al., 2006).
In this stimulus, the central line containing the Vernier offset is invisible to the observer because
it is masked by the two flanking lines. One perceives two streams of motion, one to the left and
one to the right. The question of feature attribution is the following: What happens to the
feature presented in the central invisible element of the display? Will it also be invisible, or will
it be attributed to motion streams? The results of experiments using various versions of this
sequential metacontrast stimulus show that features of the invisible stimuli are attributed to
motion streams and integrated with other features presented within each individual motion
stream. In other words, features are processed according to reference frames that move
according to the motion vector of each stream (Otto et al., 2006, 2008, 2009, 2010a,b).
7.2. Ternus-Pikler display and non-retinotopic feature attribution in the presence of retinotopic
conflict
Ternus-Pikler displays are designed to pit retinotopic relations directly against non-retinotopic
grouping relations. This property offers the advantage of directly assessing whether features are
processed according to retinotopic or grouping relations (Ogmen et al., 2006). Fig. 5 shows an
example of how the Ternus-Pikler display is used for studying feature attribution. As a feature, a
Vernier offset, called the “probe Vernier”, is inserted into the central element of the first frame
(Fig. 5). Observers were asked to report the perceived offset direction for elements in the
second frame, numbered 1, 2, and 3 in Fig. 5D-left. None of these elements contained a Vernier
offset and naïve observers did not know where the probe Vernier was located. Consider first the
control condition in Fig. 5E, obtained by removing the flanking elements from the two frames. In
this case no motion is perceived. Based on retinotopic relations, the probe Vernier should be
integrated with element 1 in the second frame and the agreement of observers’ responses with
the direction of the probe-Vernier offset should be high for element 1 and low for element 2. If
processing of the Vernier were to take place according to retinotopic relations, one would predict the
same outcome for the Ternus-Pikler display regardless of whether element or group motion is
perceived. On the other hand, if feature processing and integration take place according to
motion grouping relations (Fig. 5B and 5C), instead of retinotopic relations, one would expect
the probe Vernier to integrate with element 1 in the case of element motion (Fig. 5B) and with
element 2 in the case of group motion (Fig. 5C). The results of this experiment, along with those
conducted with more complex combinations of features, show that form features are computed
according to motion grouping relations, in other words, according to a reference frame that
moves according to prevailing motion groupings in the display (Ogmen et al., 2006).
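The competing predictions described above can be written out as a small lookup. This is only an illustrative sketch: the integer positions, the one-spacing shift between frames, and all names are assumptions chosen to mirror the verbal description, not the actual stimulus parameters of the experiment.

```python
# Assumed layout: three elements per frame on an integer grid, with the
# second frame shifted by one inter-element spacing.
FRAME1 = [0, 1, 2]   # positions of the elements in the first frame
FRAME2 = [1, 2, 3]   # positions of elements 1, 2, 3 in the second frame
PROBE_INDEX = 1      # the probe Vernier sits in the central element of frame 1


def predicted_target(rule: str) -> int:
    """Return the label (1, 2, or 3) of the second-frame element that is
    predicted to inherit the probe Vernier under the given rule."""
    probe_position = FRAME1[PROBE_INDEX]
    if rule in ("retinotopic", "element_motion"):
        # Same location: under element motion the overlapping elements are
        # perceived as stationary, so the prediction matches the retinotopic one.
        return FRAME2.index(probe_position) + 1
    if rule == "group_motion":
        # The group moves as a whole: the i-th element maps onto the i-th element.
        return PROBE_INDEX + 1
    raise ValueError(f"unknown rule: {rule}")


for rule in ("retinotopic", "element_motion", "group_motion"):
    print(rule, "-> element", predicted_target(rule))
```

Only the group-motion case separates the two accounts: a purely retinotopic rule predicts element 1 regardless of the percept, whereas a grouping-based rule predicts element 2 when group motion is perceived.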
Figure 5. The Ternus-Pikler Display (A) and the associated percepts of “Element
Motion” (B) and “Group Motion” (C). Dashed arrows in panels B and C depict the
perceived motion correspondences between the elements in the two frames.
Experimental results for Ternus-Pikler stimulus (D) and the control stimulus (E).
From (Ogmen et al., 2006).
In follow-up studies, this paradigm has been applied to other visual computations and it has
been shown that form, motion, visual search, attention, and binocular rivalry all have
non-retinotopic bases (Boi et al., 2009; 2011b). Non-retinotopic computation of various stimulus
features has also been supported by other paradigms using motion stimuli (Shimozaki et al.,
1999; Nishida, 2004; Nishida et al., 2007; Kawabe, 2008) or attentional tracking (Cavanagh et al.,
2008). On the other hand, not all processes are non-retinotopic; motion and tilt adaptation have
been found to be retinotopic (Boi et al., 2011a; Knapen et al., 2009; Wenderoth & Wiese, 2008)
indicating that they are by-products of computations occurring prior to the transfer of
information from retinotopic to non-retinotopic representations.
8. Concluding remarks
Motion is ubiquitous in the ecological environment and most biological systems devote
extensive neural processing to its analysis. This importance has been recognized by philosophers
and scientists who carried out extensive studies on how motion is processed and perceived.
While there has been convergence in the types of computational models that can detect
motion, the broader issue of how motion is organized as a spatiotemporal Gestalt remains a
challenging question. The discovery of the relativity of motion led to the introduction of
hierarchical reference frames according to which part-whole relations can be constructed. This
chapter provided a review of why reference frames are needed from ecological and
neurophysiological (retinotopic organization) perspectives. These analyses show that reference
frames are needed not just for motion computation but for all stimulus attributes. We expect
future research to characterize the properties of these reference frames in more depth, which will
provide a common geometry wherein all stimulus attributes can be processed jointly.
9. References
Alais, D., & Lorenceau, J. (2002). Perceptual grouping in the Ternus display: Evidence for an
‘association field’ in apparent motion. Vision Research, 42, 1005-1016.
Albright, T. D., & Stoner, G. R. (1995). Visual Motion Perception. PNAS, 92, 2433-2440.
Anstis, S., & Casco, C. (2006). Induced movement: The flying bluebottle illusion. Journal of
Vision, 6(10):8, 1087-1092, http://www.journalofvision.org/6/10/8/.
Aydin, M., Herzog, M. H., & Öğmen, H. (2011). Attention modulates spatio-temporal grouping.
Vision Research, 51, 435-446.
Bachmann, T. (1994). Psychophysiology of Visual Masking: The Fine Structure of Conscious
Experience. New York: Nova Science Publishers.
Barlow, H. B., & Levick, W. R. (1965). The mechanism of directionally selective units in rabbit's
retina. The Journal of Physiology, 178, 477-504.
Bedell, H. E. & Lott, L. A. (1996). Suppression of motion-produced smear during smooth-pursuit
eye-movements. Current Biology, 6, 1032-1034.
Bex, P. J., Edgar, G. K. & Smith, A. T. (1995). Sharpening of blurred drifting images. Vision
Research, 35, 2539-2546.
Boi, M., Ogmen, H., Krummenacher, J., Otto, T. U., & Herzog, M. H. (2009). A (fascinating) litmus
test for human retino- vs. non-retinotopic processing. Journal of Vision, 9(13): 5;
doi:10.1167/9.13.5
Boi, M., Ogmen, H., & Herzog, M. H. (2011a). Motion and tilt aftereffects occur largely in retinal,
not in object coordinates, in the Ternus-Pikler display, Journal of Vision, 11(3):7, 1–11, doi:
10.1167/11.3.7, 2011.
Boi M., Vergeer M., Öğmen H., Herzog M.H. (2011b). Nonretinotopic exogenous attention.
Current Biology, 21, 1732-1737.
Breitmeyer, B. G., & Ritter, A. (1986a). The role of visual pattern persistence in bistable
stroboscopic motion. Vision Research, 26, 1801-1806.
Breitmeyer, B. G., & Ritter, A. (1986b). Visual persistence and the effect of eccentric viewing,
element size, and frame duration on bistable stroboscopic motion percepts. Perception &
Psychophysics, 39, 275-280.
Breitmeyer, B. G., & Ogmen, H. (2006). Visual Masking: Time Slices through Conscious and
Unconscious Vision, (2nd Edition), Oxford University Press, Oxford, UK.
Burr, D. (1980). Motion Smear. Nature, 284, 164-165.
Burr, D. C. & Morgan, M. J. (1997). Motion deblurring in human vision. Proc. R. Soc. Lond. B,
264, 431-436.
Burr, D. C., Ross, J., & Morrone, M. C. (1986). Seeing objects in motion. Proc. R. Soc. Lond. B,
227, 249-265.
Casco, C., & Spinelli, D. (1988). Left-right visual field asymmetry in bistable motion perception.
Perception, 17, 721-727.
Cavanagh, P., Holcombe, A. O., & Chou, W. (2008). Mobile computation: Spatiotemporal
integration of the properties of objects in motion. Journal of Vision, vol. 8 no. 12 article 1, doi:
10.1167/8.12.
Chen, S., Bedell, H. E., & Ogmen, H. (1995). A target in real motion appears blurred in the
absence of other proximal moving targets. Vision Research, 35, 2315-2328.
Coltheart, M. (1980). Iconic memory and visible persistence. Perception & Psychophysics, 27,
183-228.
Cutting, J.E., & Proffitt, D.R., (1982). The minimum principle and the perception of absolute,
common, and relative motions, Cognitive Psychology, 14, 211-246.
Dawson, M. R. W. (1991). The how and why of what went where in apparent motion: modeling
solutions to the motion correspondence problem. Psychological Review, 98, 569-603.
Dawson, M. R. W. & Wright, R. D. (1994). Simultaneity in the Ternus configuration:
Psychophysical data and a computer model. Vision Research, 34, 397-407.
Dawson, M. R. W., Nevin-Meadows, N., & Wright, R. D. (1994). Polarity matching in the Ternus
configuration. Vision Research, 34, 3347-3359.
Duncker, K. (1929). Über induzierte Bewegung (Ein Beitrag zur Theorie optisch
wahrgenommener Bewegung). Psychologische Forschung, 12, 180–259.
Exner, S. (1875). Experimentelle Untersuchungen der einfachsten psychischen Prozesse. Pflugers
Arch Gesamte Physiol. 11, 403-432.
Gepshtein, S. & Kubovy, M. (2007). The lawful perception of apparent motion. Journal of Vision,
7(8):9, 1-15.
Haber, R. N. & Standing, L. (1970). Direct estimates of the apparent duration of a flash. Canadian
Journal of Psychology, 24, 216-229.
Hammett, S. T. (1997). Motion blur and motion sharpening in the human visual system. Vision
Research, 37, 2505-2510.
Hassenstein, B. and Reichardt, W. (1956). Systemtheoretische Analyse der Zeit-, Reihenfolgen-
und Vorzeichenauswertung bei der Bewegungsperzeption des Rüsselkäfers Chlorophanus. Z.
Naturforsch. 11b, 513-524.
He, Z. J. & Ooi, T. L. (1999). Perceptual organization of apparent motion in the Ternus display.
Perception, 28, 877-892.
Hein E., & Moore C. M. (2012). Spatio-temporal priority revisited: The role of feature identity
and similarity for object correspondence in apparent motion. Journal of Experimental
Psychology: Human Perception and Performance, 38, 975-988.
Hein, E. & Cavanagh, P. (2012). Motion correspondence in the Ternus display shows feature bias
in spatiotopic coordinates. Journal of Vision, 12(7). pii: 16. doi: 10.1167/12.7.16.
Johansson, G. (1973). Visual perception of biological motion and a model for its analysis.
Perception & Psychophysics, 14, 201–211.
Johansson, G. (1975). Visual motion perception. Scientific American, 232, 76-88.
Johansson, G. (1976). Spatio-temporal differentiation and integration in visual motion
perception. Psychol. Res., 38, 379-393.
Kahneman, D., Treisman, A. & Gibbs, B.J. (1992). The reviewing of object files: Object-specific
integration of information. Cognitive Psychology 24, 174–219.
Kawabe T. (2008). Spatiotemporal feature attribution for the perception of visual size. Journal of
Vision, 8(8):7, 1–9. doi: 10.1167/8.8.7.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt.
Kolers, P. A. (1972). Aspects of Motion Perception. Oxford: Pergamon Press.
Korte, A. (1915). Kinematoskopische Untersuchungen. Zeitschrift für Psychologie, 72, 194-296.
Lin, Z., & He, S. (2012). Automatic frame-centered object representation and integration
revealed by iconic memory, visual priming, and backward masking. Journal of Vision, 12(11):24,
1–18.
Lu, Z.-L., & Sperling, G. (2001). Three-systems theory of human visual motion perception: Review
and update. J. Opt. Soc. of Amer. A, 18, 2331-2370
Ma-Wyatt, A., Clifford, C. W. G., & Wenderoth, P. (2005). Contrast configuration influences
grouping in apparent motion. Perception, 34, 669-685.
Mack, A. (1986). Perceptual aspects of motion in the frontal plane. In Boff, K.R., Kaufman, L., &
Thomas, J.P. (Eds). Handbook of Perception and Human Performance. New York: Wiley.
Moore, C. M., & Enns, J. T. (2004). Object updating and the flash-lag effect. Psychological
Science, 15, 866-871.
Moore, C. M., Mordkoff, J. T., & Enns, J. T. (2007). The path of least persistence: object status
mediates visual updating. Vision Research, 47, 1624-1630.
Neuhaus, W. (1930). Experimentelle Untersuchung der Scheinbewegung. Archiv Für die Gesamte
Psychologie, 75, 315–458.
Nishida, S. (2004). Motion-based analysis of spatial patterns by the human visual system.
Current Biology, 14, 830-839.
Nishida, S., Watanabe, J., Kuriki, I., & Tokimoto, T. (2007). Human visual system integrates color
signals along a motion trajectory. Current Biology, 17, 366-372.
Ogmen, H., Otto, T., & Herzog, M. H. (2006). Perceptual grouping induces non-retinotopic
feature attribution in human vision. Vision Research, 46, 3234–3242.
Öğmen H. (2007). A theory of moving form perception: Synergy between masking, perceptual
grouping, and motion computation in retinotopic and non-retinotopic representations.
Advances in Cognitive Psychology, 3, 67–84.
Otto, T.U., Öğmen, H., and Herzog, M.H. (2006). The flight path of the phoenix-the visible trace
of invisible elements in human vision. Journal of Vision 6, 1079-1086.
Otto, T. U., Ogmen, H., & Herzog, M. H. (2008). Assessing the microstructure of motion
correspondences with non-retinotopic feature attribution. Journal of Vision, 8(7):16, 1–15,
http://journalofvision.org/8/7/16/, doi:10.1167/8.7.16.
Otto, T.U., Öğmen, H., and Herzog, M.H. (2009). Feature integration across space, time, and
orientation. Journal of Experimental Psychology: Human Perception and Performance 35, 1670-
1686.
Otto, T.U., Öğmen, H., and Herzog, M.H. (2010a). Attention and non-retinotopic feature
integration. Journal of Vision 10, 8, 1-13.
Otto, T.U., Öğmen, H., and Herzog, M.H. (2010b). Perceptual learning in a nonretinotopic frame
of reference. Psychological Science, 21(8), 1058-1063.
Pantle, A. J., & Petersik, J. T. (1980). Effects of spatial parameters on the perceptual organization
of a bistable motion display. Perception & Psychophysics, 27, 307-312.
Pantle, A. & Picciano, L. (1976). A multistable movement display: Evidence for two separate
motion systems in human vision. Science, 193, 500-502.
Piéron, H. (1935). Le processus du métacontraste. Journal de Psychologie Normale et
Pathologique, 32, 1-24.
Pikler, J. (1917). Sinnesphysiologische Untersuchungen. Leipzig: Barth.
Purushothaman, G., Ogmen, H., Chen, S., & Bedell, H. E. (1998). Motion deblurring in a neural
network model of retino-cortical dynamics. Vision Research, 38, 1827-1842.
Pylyshyn, Z. (1989). The role of location indexes in spatial perception: A sketch of the FINST
spatial-index model. Cognition, 32, 65-97.
Ramachandran, V. S., Rao, V. M., & Vidyasagar, T. R. (1974). Sharpness constancy during
movement perception. Perception, 3, 97-98.
Restle, F. (1979). Coding theory of the perception of motion configurations. Psychological
Review, 86, 1-24.
Shimozaki S.S., Eckstein M.P., Thomas J.P. (1999). The maintenance of apparent luminance of an
object. Journal of Experimental Psychology: Human Perception and Performance. 25, 1433–
1453.
Ternus, J. (1926). Experimentelle Untersuchung über phänomenale Identität. Psychologische
Forschung, 7, 81-136.
Wallach, H. (1959). The perception of motion. Scientific American, 201, 56-60.
Wertheimer, M. (1912). Experimentelle Studien über das Sehen von Bewegung. Zeitschrift für
Psychologie, 61, 161-265.
Westerink J. H. D. M., Teunissen K. (1995). Perceived sharpness in complex moving images.
Displays, 16, 89–97.