8/14/2019 Art and Visual Perception Thesis
THE ART OF SEEING: VISUAL PERCEPTION IN DESIGN AND EVALUATION OF
NON-PHOTOREALISTIC RENDERING
BY ANTHONY SANTELLA
A Dissertation submitted to the
Graduate School-New Brunswick
Rutgers, The State University of New Jersey
in partial fulfillment of the requirements
for the degree of
Doctor of Philosophy
Graduate Program in Computer Science
Written under the direction of
Doug DeCarlo
and approved by
New Brunswick, New Jersey
May, 2005
ABSTRACT OF THE DISSERTATION
The Art of Seeing: Visual Perception in Design and
Evaluation of Non-Photorealistic Rendering
by Anthony Santella
Dissertation Director: Doug DeCarlo
Visual displays such as art and illustration benefit from concise presentation of information. We present several approaches for simplifying photographs to create such concise, artistically abstracted images. The difficulty of abstraction lies in selecting what is important. These approaches apply models of human vision, models of image structure, and new methods of interaction to select important content. Important locations are identified from eye movement recordings. Using a perceptual model, features are then preserved where the viewer looked, and removed elsewhere. Several visual styles using this method are presented. The perceptual motivation for these techniques makes predictions about how they should affect viewers. In this context, we validate our approach using experiments that measure eye movements over these images. Results also provide some interesting insights into artistic abstraction and human visual perception.
Acknowledgements
Thanks go to the many people whose help and support were essential in making this work possible. None of this would have happened without my advisor Doug DeCarlo. Thanks go also to my other committee members: Adam Finkelstein, Eileen Kowler, Casimir Kulikowski and Peter Meer for their advice and encouragement at various (in some cases many) stages of this process.
Thanks go also to the many friends and family members who have supported and kept me sane through this long process. I wouldn't have survived it without my parents and brothers Nick and Dennis. Special thanks go to Bethany Weber. Thanks also to Jim Housell, all the old NYU crowd, the grad group at St. Peter's and all the supportive souls in the CS Department, RuCCS and the VILLAGE.
Finally, thanks go to Phillip Greenspun for photos used in several renderings that
appear in chapters 7 and 9, as well as models Marybeth Thomas, Adeline Yeo and
Franco Figliozzi. Special thanks to Georgio Dellachiesa for looking equally thoughtful
in countless illustrative examples.
Table of Contents
Abstract
Acknowledgements
List of Figures
1. Introduction
1.1. Inspirations
1.1.1. Artistic Practice
1.1.2. Psychology
1.1.3. Computer Graphics
1.2. Our Goal
2. Abstraction in Computer Graphics
2.1. Manual Annotation
2.2. Automatic Methods
2.3. Level Of Detail
3. Human Vision
3.1. Eye Movements
3.1.1. Eye Movement Control
3.1.2. Salience Models
3.2. Eye Tracking
3.3. Limits of Vision
3.3.1. Models of Sensitivity
3.3.2. Sensitivity Away from the Visual Center
3.3.3. Applicability to Natural Imagery
4. Vision and Image Processing
4.1. Image Structure Features and Representation
4.2. Segmentation
4.3. Edge Detection
5. Our Approach
5.1. Eye tracking as Interaction
5.2. Using Visibility for Abstraction
6. Painterly Rendering
6.1. Image Structure
6.2. Applying the Limits of Vision
6.3. Rendering
6.4. Results
7. Colored Drawings
7.1. Feature Representation
7.1.1. Segmentation
7.1.2. Edges
7.2. Perceptual Model
7.3. Rendering
7.4. Results
8. Photorealistic Abstraction
8.1. Image Structure
8.2. Measuring Importance
8.3. Results and Discussion
9. Evaluation
9.1. Evaluation of NPR
9.1.1. Analysis of Eye Movement Data
9.2. Experiment
9.2.1. Stimuli
9.2.2. Subjects
9.2.3. Physical Setup
9.2.4. Calibration and Presentation
9.3. Analysis
9.3.1. Data Analysis
9.3.2. Statistical Analysis
9.4. Results
9.4.1. Quantitative Results
9.4.2. Discussion
9.5. Evaluation Conclusion
10. Future Work
10.1. Image Modeling
10.1.1. Segmentation
10.1.2. Edges
10.2. Perceptual Models
10.3. Applications
11. Conclusion
References
Curriculum Vita
List of Figures
1.1. (a) Henri de Toulouse-Lautrec's Moulin Rouge: La Goulue (Lithographic print in four colors, 1891). (b) Odd Nerdrum's Self-portrait as Baby (Oil, 2000). Artists control detail as well as other features such as color and texture to focus a viewer on important features and create a mood. La Goulue's swirling under-dress is a highly detailed focal point of the image, and contributes to the picture's air of reckless excitement. Artists have a fair amount of latitude in how they allocate detail to create an effect. Nerdrum renders his eyes (usually one of the most prominent features in a portrait) in a sfumato style that makes them almost nonexistent. Detail is instead allocated to the child's prophetic gesture. These choices change a common baby picture into something mysterious and unsettling.
1.2. Judith Schaechter's Corona Borealis (Stained glass, 2001). Skillful artists use the formal properties and constraints of a medium for expressive purposes. The high dynamic range provided by transmitted light and the heavy black outlines of the lead caming that holds the glass together are used to set the figure off from the background, creating a powerful image of joy in isolation.
2.1. Direct placement of strokes. Complete control of abstraction is possible when a user provides actual strokes that are rendered in a given style. Reproduced from [Durand et al., 2001].
2.2. Manual annotation for textural indication. Important edges on a 3D model are marked and have texture rendered near them, while it is omitted in the interior. Reproduced from [Winkenbach and Salesin, 1994].
4.1. (a) Scale space of one dimensional signal. Features disappear through scale space but no new features appear. (b) Plot of inflection points of another one dimensional signal through scale space. Reproduced from [Witkin 1983].
4.2. Interval tree for 1D signal illustrating decomposition of the signal into a hierarchy. Reproduced from [Witkin 1983].
5.1. (a) Computing eccentricities with respect to a particular fixation at p. (b) A simple attention model defined as a piecewise-linear function for determining the scaling factor a_i for fixation f_i based on its duration t_i. Very brief fixations (below t_min) are ignored, with a ramping up (at t_max) to a maximum level of a_max.
6.1. Painterly rendering results. The first column shows the fixations made by a viewer. Circles are fixations, size is proportional to duration, the bar at the lower left is the diameter that corresponds to one second. The second column illustrates the painterly renderings built based on that fixation data.
6.2. Detail in background adjacent to important features can be inappropriately emphasized. The main subject has a halo of detailed shutter slats.
6.3. Sampling strokes from an anisotropic scale space avoids giving the image an overall blurred look, but produces a somewhat jagged look in background areas.
6.4. Color and contrast manipulation. Side by side comparison of rendering with and without color and contrast manipulation (precise stroke placement varies between the two images due to randomness).
7.1. Slices through several successive levels of a hierarchical segmentation tree generated using our method.
7.2. Line drawing style results.
7.3. Stylistic decisions. Lines in isolation (a) are largely uninteresting. Unsmoothed regions (b) can look jagged. Smoothed regions (c) have a somewhat vague and bloated look without the black edges superimposed.
7.4. Renderings with uniform high and low detail.
7.5. Several derivative styles of the same line drawing transformation. (a) Fully colored, (b) color comic, (c) black and white comic.
8.1. Mean shift filtering tends to create images that no longer look like photographs.
8.2. Photo abstraction results.
8.3. Photo in (a) is abstracted using fixations in (b) in a variety of different styles. (c) Painterly rendering, (d) line drawing, (e) locally disordered [Koenderink and van Doorn, 1999], (f) blurred, (g) anisotropically blurred.
8.4. (a) Detail of our approach, (b) the same algorithm using an importance map where total dwell is measured locally. Notice in (b) the leaking of detail to the wood texture from the object on the desk. Here differences are relatively subtle; but in general it is preferable to allocate detail in a way that respects region boundaries.
8.5. The range of abstraction possible with this technique is limited. With greater abstraction the scene begins to appear foggy. In some sense it no longer looks like the same scene.
9.1. Example stimuli. Detail points in white are from eye tracking, black detail points are from an automatic salience algorithm.
9.2. Illustration of data analysis, per image condition. Each colored collection of points is a cluster. Ellipses mark 99% of variance. Large black dots are detail points. We measure the number of clusters, distance between clusters and nearest detail point, and distance between detail points and nearest cluster.
9.3. Statistical significance is achieved for number of clusters over a wide range of clustering scales. The magnitude of the effect decreases, but its significance remains quite constant over a wide interval. Our results do not hinge on the scale value selected.
9.4. Average results for all analyses per image.
9.5. Average results for all analyses per viewer.
9.6. Original photo and high detail NPR image with viewers' filtered eye tracking data. Though we found no global effect across these image types, there are sometimes significantly different viewing patterns, as can be seen here.
10.1. A rendering from our line drawing system (b), can be compared to an alternate locally varying segmentation (c). This segmentation more closely follows the shape of shading contours.
10.2. Locally varying segmentation cannot replace a segmentation hierarchy. Another example of a locally varying segmentation controlled by a perceptual model (c), compared to a rendering from our line drawing system. Note fine detail in the brick preserved near the subject's head in (c). This is a consequence of the threshold varying continuously as a function of distance from the fixations on the face.
10.3. A rendering from our line drawing system demonstrates how long but unimportant edges can be inappropriately emphasized. Also, prominent lower frequency edges like creases in clothing are detected in fragments and filtered out because edges are detected at only one scale.
10.4. Attempting technical illustration of mechanical parts pushes our image analysis techniques close to (if not over) their limits.
Chapter 1
Introduction
In all eras and visual styles, artists control the amount of detail in the images they create, both locally and globally. This is not just a technique to limit the effort involved in rendering a scene. It makes a definite statement about what is important and streamlines understanding. Our goal is to largely automate this artistic abstraction in computer renderings. The hope is to remove detail in a meaningful way, while automating individual decisions about what features to include. Eye tracking allows the capture of what a viewer looks at and, indirectly, what they find important. We demonstrate that this information alone is sufficient to control detail in an image-based rendering, and change the way successive viewers look at the resulting image. Our method is grounded in the mechanisms and nature of vision: how we see and understand the world. This is an intuitive idea, if often overlooked. Artists must first be viewers [Ruskin, 1858] and viewers ultimately consume the resulting images. So, vision must be central in the design of algorithms for creating imagery.
Vision appears simple and effortless. Because under most circumstances it requires no conscious effort or exertion, it seems like a trivial operation, something that just happens, as if the light falling on the eye made one see in the same way it warms a stone. But sight is the product of an extraordinarily developed and complicated visual system. In seeing we are all experts, and experts make things seem easy. Without any effort we can navigate and act in the world and recognize objects even under difficult conditions. The abilities of our sight outreach even our awareness of them. Experiments have shown that the eyes of radiologists searching for tumors linger longer over tumors that they fail to notice and report [Mello-Thoms et al., 2002]. The limited success of attempts to mimic these human abilities in computer vision systems highlights both the difficulty of the computations involved, and our phenomenal success at them.
The apparent ease with which we see slips when our vision is stressed: struggling to keep a written page in focus as we fall asleep, searching for a loved one's face in the shifting crowd of an airport. At these times we become conscious of sight as a struggle to organize and make sense of the world. This struggle has continual victories, but also failures. An old friend waving to us on the street is passed by, a typo makes its way into an important document. The apparent ease of vision also masks our limitations. We miss much, and are easily overloaded. Sometimes our failures are engineered: a camouflaged soldier, the proverbial fine print. More often, however, they are accidental. Some information was present, or presented, and we failed to notice it.
Well-designed displays of visual information ensure we don't miss anything important by careful arrangement and manipulation. A wide variety of techniques are used to make meaning clear. Detail is put just where it is important, shapes can be changed or removed, colors and textures enhanced or suppressed. Paintings, sketches, technical illustrations, and even the most apparently photorealistic of art, all products of the human hand, have been simplified and manipulated for ease of understanding. Reality is complicated and messy. Rather than realism, what is more often desired is verisimilitude. We want the appearance of reality which has been organized and structured to make its meaning clearer, if necessarily more limited than the infinite complexity of reality.
Achieving this kind of clarity has always been the job of artists and designers who make subjective, but not arbitrary, decisions about what is important, and how to convey it. The ubiquity of digital media creates a need for automation in achieving this kind of good design. The goal is not to replace the artist who creates a carefully crafted one-off display, but instead to create a potentially vast number of adaptive displays, tailored to particular situations and viewers. This information would otherwise be displayed in some less well-designed manner, laying more of a cognitive burden on the user. It has been argued, in fact, that avoiding this burden is one of the primary characteristics of powerful art [Zeki, 1999]. If good design can be formalized, this will enhance understanding and aid effective communication, as well as improve our own understanding of the workings of visual communication. This thesis presents some initial steps toward this goal.
1.1 Inspirations
There are many techniques proposed by various artists, and perhaps even more theory proposed by various researchers and critics on how to achieve good visual design. Yet it remains imperfectly understood in all of the fields where it has been studied. Because of this, a successful practical approach must necessarily draw on elements from many areas of practice and theory. If a practical system is designed to be as general as possible, its creation can improve understanding of what visual clarity means, and how it relates to communication. It can also provide a framework in which to unify concepts and techniques from many fields.
1.1.1 Artistic Practice
One important source of inspiration for this work is artistic practice and practical theory. Artists have always had strong motivation to capture the attention and interest of uninterested, sometimes hostile viewers. Much ingenuity has been applied to creating images that are as gripping and clearly communicative as possible. Careful observation of such images can yield interesting insights (see Figure 1.1). Similarly, artists have throughout history given advice on the practice of their craft. Theorists and art historians have tried to make generalizations and analyze techniques [Ruskin, 1857, Gombrich et al., 1970, Graham, 1970, Arnheim, 1988]. This is true in graphic design as well as fine art. Classical texts like Tufte [1990] try to explore the qualities of good and bad presentations of information and make generalizations from carefully chosen examples.
However, these instructions and recommendations are often difficult to apply. They are sometimes limited in scope, providing specific instructions for a particular narrow problem. More often, guidelines are too broad and vague in their application. They count for their functioning on the judgment of the artist. The advice of artists and designers often comes in the form of heuristics, rules of thumb to be taken with a grain of salt, kept in the back of one's mind, and applied when the moment seems right. Becoming an expert in a visual field is often a question of cultivating, through practice and observation, an instinctive sense of when to apply such rules, and conversely when to break them.
Figure 1.1: (a) Henri de Toulouse-Lautrec's Moulin Rouge: La Goulue (Lithographic print in four colors, 1891). (b) Odd Nerdrum's Self-portrait as Baby (Oil, 2000). Artists control detail as well as other features such as color and texture to focus a viewer on important features and create a mood. La Goulue's swirling under-dress is a highly detailed focal point of the image, and contributes to the picture's air of reckless excitement. Artists have a fair amount of latitude in how they allocate detail to create an effect. Nerdrum renders his eyes (usually one of the most prominent features in a portrait) in a sfumato style that makes them almost nonexistent. Detail is instead allocated to the child's prophetic gesture. These choices change a common baby picture into something mysterious and unsettling.
Figure 1.2: Judith Schaechter's Corona Borealis (Stained glass, 2001). Skillful artists use the formal properties and constraints of a medium for expressive purposes. The high dynamic range provided by transmitted light and the heavy black outlines of the lead caming that holds the glass together are used to set the figure off from the background, creating a powerful image of joy in isolation.
1.1.2 Psychology
A somewhat different approach is to study good design with the methodologies of psychology, psychophysics and neuroscience. This is in essence an attempt to understand good design from first principles: the functioning of the human mind and visual system. Visual perception obviously mediates all information that passes from a display to a user. So, as a form of visual communication, art must be constrained by the laws of psychology and the visual system [Arnheim, 1988, Zeki, 1999, Ramachandran and Hirstein, 1999]. This is an attractive idea. By understanding the strengths and weaknesses of the process that allows us to see, it should be possible to maximize use of the limited cognitive bandwidth between a display and viewer.
This is perhaps not so far from what artists have done all along. One could view every daub of paint, every pen stroke as an informal experiment in vision. Artists test their actions against the evidence of their own visual systems, and make predictions about how they will affect others. Formal attempts to understand perception and art are simply more conscious, more systematic, and more interested in understanding the creative process itself than making a statement through it. A number of psychologists have speculated on this, and pointed to specific examples from art history [Arnheim, 1988, Leyton, 1992, Zeki, 1999, Ramachandran and Hirstein, 1999]. Studies have indeed found empirical evidence of perceptual effects resulting from artistic style or composition [Ryan and Schwartz, 1956, Locher, 1996].
Like most attempts to do anything complicated from first principles, looking at art and design using cognition is hard. There is much that has been understood about the visual system, but also much that is not. The more basic and low level the area of visual function is, the more we know about it, and the less useful that information is for design. Much, for example, is known about the physical mechanism of how we perceive color; substantially less is known about how we parse shapes out of a background and assemble them into objects. It's not surprising that many researchers looking at art from a cognitive standpoint consider primarily 20th century painters, like Mondrian, Kandinsky, or even Picasso at his more abstract, who themselves were largely concerned with the purely formal aspects of pictorial space rather than the semantics of subject matter. The semantic aspects of vision which reference the rest of the world and its non-visual aspects are ill understood, so little cognitive research can be brought to bear on the semantics of art.
Given the limited basic knowledge, general theories of how art functions cognitively are, almost of necessity, rather vague in their application. Ramachandran [1999], for example, suggests that all art is guided by the peak shift principle. This principle, found in a number of situations in psychology, says that if a response is trained to some stimulus, the greatest, or peak, response will be found with a stimulus that is greater than the one used in training. A depiction functions by emphasizing the features that normally let one know what it is. In this view all art is a form of caricature. However, this does not tell us the qualities of a successful caricature. In another example, Leyton [1992] argues that art maximally encodes a causal history that can be read by viewers. Good art should contain as much information in the form of asymmetry as possible to stimulate viewers, but not too much, which will disturb them. Though a reasonable sounding standard, this only hints at what the correct level of complexity is.
The application of psychology to design is difficult. However, we do not need to build a system directly on these principles. Inspired by them, we can apply knowledge from low-level vision and computer graphics techniques to build practical systems.
1.1.3 Computer Graphics
A large body of work in computer graphics ignores all these difficulties and sets out to create attractive synthetic art and illustration. Attempts at algorithmic definitions of good design surface in a number of areas in computer science, graphics, scientific visualization, document layout, human-computer interaction, and interface design. Concerns of effective art-like visual communication have particularly come to the forefront in the realm of non-photorealistic rendering, or NPR. This area is perhaps excessively broad. It includes almost any part of graphics that aims to create images that are not an imitation of reality. It includes things as diverse as computer generation of geometrical patterns, instructional diagrams and impressionist paintings. NPR images run a gamut between the purely ornamental and those designed to convey very specific information. A large area of research in NPR has been the production of many, often quite impressive, phenomenological models for rendering in various traditional media and styles. There is, however, an increasing interest in NPR as not just a way to imitate traditional visual styles, but also as a set of techniques for trying to display visual information in a concise and abstract way.
The link between concise presentation and imitating traditional artistic styles is not accidental. Almost all the visual styles of traditional media (line drawings, wood-block prints, comics, expressionist or impressionist paintings, pencil sketches) necessarily discard vast amounts of information as a direct consequence of their visual style. There is, for example, no color or shading in a pure line drawing. However, these images still carry the essential content that the artist (and viewer) requires of them. Skillful artists can use the properties and constraints of a medium to enhance the expressiveness of a work (see Figure 1.2). A brief time spent working with photo filters in a program like Adobe Photoshop suggests that computer implementations of these styles capture some of the effects of traditional media, but often in a way that does not adapt to particular situations with an artist's flexibility. Artists ultimately can judge their results as they go. Applying a technique in a blanket manner is often less satisfactory. What is acceptable as reality in a photograph can look fussy and crowded as a painting.
1.2 Our Goal
Though today's algorithms cannot model the general intelligence of an artist, we argue that carefully designed systems can make use of minimal user interaction to create much more expressive images. Specifically, we look at modulation of local detail, an important cue used in traditional art and visualization. Including detail only where it is needed focuses viewer interest and can help clarify the point of an image. As well as being a feature of art and illustration, applications in visualization could benefit from this. It would allow the computer to hand-craft displays for clarity and efficient understanding in a particular situation.
This work does not directly address specific visualization applications. Rather than
exploring visualization directly, art remains the focus, and this thesis remains firmly in
the realm of artistic NPR. Our hope, however, is that insights gained in this way should
be applicable to a number of areas in visualization. Art is a particularly good place to
explore the link between cognition and design of displays. Specific applications tend
to distract with their own implementation details and domain constraints. Radiology,
for example, is a domain where complexity and high stakes greatly constrain practical
applications. Art encourages a wider view, in which it is easier to look at general
techniques and patterns that are widely useful. Similarly, in evaluation, validation of
a particular system is of limited interest, while evaluation of more general techniques
can provide insights into cognition and be more widely relevant.
Grounding our work in knowledge of visual perception also helps focus attention
away from application engineering and towards general concepts. We are interested in
interactively efficient methods for achieving expressive NPR images. Knowledge of
visual perception suggests that by exploiting the visual system we can reserve human
effort for just the hardest parts of the process of crafting images, and pass the major-
ity of the work over to a computer. For a computer application, the hardest part of
abstraction is deciding what is important. This is not hard for people, since it is done
instinctively. Deciding what to paint a picture of is the easy part for an artist. It is the
mechanics of turning that intention into an image that takes training, time and effort.
This leads us to a simple, minimally interactive method for controlling detail via eye
tracking. As we will soon see, vision research leads us to believe that where people
look indicates importance. Such areas should be portrayed in detail. Conversely, what
viewers don't look at is unimportant to them and can be removed or de-emphasized.
The same insights about vision that lead to this methodology also lead us to quantitative
methods for evaluation. If our approach is successful, increased interest in areas
highlighted with detail should be reflected in eye movements. This methodology holds
the promise of images that are carefully crafted for understanding on sound principles,
and can be formally evaluated for effectiveness. Such images and techniques can in
turn serve as a tool for further investigating human vision in a way targeted toward the
questions that are important for crafting images. With more information, even better
techniques and images can be built.
In this thesis we begin in Chapter 2 by laying out the basic problem of controlling
detail in NPR imagery, and look at the range of techniques that have been used
to address it. In Chapters 3 and 4 we then review the basic background in human and
computer vision underlying our approach to this problem. The nature of vision leads
us to an approach of capturing the intentionality central to design via eye tracking.
Information about where people look alone is sufficient to control detail in a directed
way, allowing us to craft semi-automatic NPR images with much of the attractive and
engaging intentionality of completely hand made art. The basic nature of this interaction
is described in Chapter 5. In Chapters 6, 7 and 8 we then present several systems
for creating NPR renderings built on this idea, and discuss their strengths and weaknesses.
An evaluation of one of these systems is presented in Chapter 9, which not only
validates the general approach but gives some interesting insights into abstraction and
human vision. Finally, in Chapter 10 we discuss some directions for future research.
Chapter 2
Abstraction in Computer Graphics
In any work of art, not all parts of the picture plane receive equal attention from the
artist. Critical areas are more detailed, while others are left relatively abstract. This is
the case even in quite realistic styles, and in technical illustration. Such effects have not
been ignored in computer graphics and NPR. Local control of detail has been addressed
in several visual styles. Whatever the rendering techniques used, important areas can
be identified and depicted with greater detail, or emphasis on fidelity. Deciding what is
important is difficult to do automatically. Two broad approaches to selecting important
areas can be characterized: manual user annotation, and simple heuristics.
Figure 2.1: Direct placement of strokes. Complete control of abstraction is possible when a user provides actual strokes that are rendered in a given style. Reproduced from [Durand et al., 2001].
2.1 Manual Annotation
At one extreme, near complete control of detail can remain in the hands of a user.
This provides many expressive possibilities at the expense of much interaction. At its
Figure 2.2: Manual annotation for textural indication. Important edges on a 3D model are marked and have texture rendered near them, while it is omitted in the interior. Reproduced from [Winkenbach and Salesin, 1994].
Figure 2.3: Manual local importance images. Hand painted images can indicate important areas to be rendered in greater detail or fidelity. Reproduced from [Hertzmann, 2001].
furthest extreme the computer becomes merely a digital paintbrush the user directly
manipulates [Baxter et al., 2001]. A number of intermediate approaches exist that aid
the user in the technicalities of creating an image while still giving them complete
control over detail. The earliest work creating a painting-like appearance, or painterly
rendering effect [Haeberli, 1990] took this approach. A user places strokes entirely
by hand, their color being sampled from an underlying source image. The approach
is in effect a form of tracing, where the user ultimately remains in control of stroke
placement and size while, like a traditional media artist, making their own decisions
about which details are important as they go. A similar kind of interaction has been
used [Durand et al., 2001] in generating pencil renderings (see Figure 2.1). The user
places strokes which are shaded and shaped automatically to create a final drawing.
The same stroke based interactive methods are applicable in 3D [Kalnins et al., 2002].
One step distant from actually drawing strokes, it is also possible to indicate increased
importance for some areas of a rendering using an importance map, where
higher intensity indicates the need for more attention or detail in that area. For example,
in a painterly rendering framework [Hertzmann, 2001], a hand drawn importance
map was used to indicate that a source image should be more closely approximated in
certain locations (see Figure 2.3). Similarly, in 3D, hand drawn lines have been used
[Winkenbach and Salesin, 1994] to indicate locations near which textural detail should
be included (see Figure 2.2). In another painterly rendering application [Gooch and
Willemsen, 2002] rectangles to be painted in greater detail could be drawn by hand.
Various digital versions of other media, such as pen and ink [Salisbury et al., 1994]
and watercolor [Curtis et al., 1997] have been developed that provide the user with
significant control over the detail present in different areas. Such approaches can yield
attractive results, but require careful attention on the part of a user.
Figure 2.4: (a) Original image. (b) Corresponding salience map [Itti et al., 1998]. (c) Corresponding salience map [Itti and Koch, 2000]. Salience methods pick out potentially important areas on the basis of contrast in some space (not limited to intensity). The two methods pictured here differ in the method of normalization used to enhance contrast between salient and nonsalient regions.
2.2 Automatic Methods
More common in NPR have been purely automatic methods. Automatic methods also
run a gamut, from approaches that process an image in a completely local, uniform
manner to those that automatically extract some quantity from an image as a proxy for
importance. Uniform approaches perform some (not necessarily local) operation uniformly
across an image, and have been used extensively in painterly rendering [Hertzmann,
1998, Litwinowicz, 1997, Shiraishi and Yamaguchi, 2000]. A global effect provides
users with only limited control. Rather than being truly uniform, some of these
approaches make a (largely implicit) simple assumption that some low level features
are important and worth preserving. Automatic painterly rendering methods, for example,
largely assume strong high frequency features are important and should be
preserved in a rendering. In fact, painterly techniques vary largely in their method
for respecting these boundaries: aligning strokes perpendicular to the image gradi-
ent [Haeberli, 1990], terminating strokes at edges [Litwinowicz, 1997], or drawing in
a coarse-to-fine fashion [Hertzmann, 1998, Shiraishi and Yamaguchi, 2000, Hays and
Essa, 2004]. Similarly, automatic line drawing approaches (both 2D and 3D) assume
the importance of all lines that meet certain purely geometrical definitions: occluding
contours, creases [Saito and Takahashi, 1990, Interrante, 1996, Markosian et al., 1997],
and suggestive contours [DeCarlo et al., 2003]. Such techniques can create attractive
images, but lack the selective omission which gives art much of its expressive power.
The kind of omission commonly used in depicting specific objects can sometimes
be explicitly stated. In drawing trees, for example [Kowalski et al., 1999, Deussen and
Strothotte, 2000], one can avoid drawing detail in the center of the tree, especially as the
tree is drawn smaller. Though this may be an accurate characterization of a particular
common style of depiction, it is not generally applicable to any subject.
For general images, there are relatively few options for automatically selecting
important areas. Some attempts have been made to predict importance using various
image analysis techniques. In 3D, image pyramids have been applied to omit detail in
the interior of a shape [Grabli et al., 2004]. In 2D, drawing on vision research, some
approaches have attempted to use salience measures to capture importance. Salience
measures are a guess at the ability of a feature to capture interest based on its low level
properties [Itti et al., 1998, Itti and Koch, 2000]. Similarly motivated salience measures
have been applied to attempt to predict features worth preserving in painterly rendering
[Collomosse and Hall, 2003]. Because faces are often an important component of
images, detecting them also provides a useful (though not always reliable) automatic
cue for what areas are important. Face detection has been used alongside salience
methods in other areas of graphics loosely related to NPR where identifying important
features is useful, such as automatic cropping [Chen et al., 2002, Suh et al., 2003] and
recomposing of photographs [Setlur et al., 2004].
2.3 Level Of Detail
An area of computer graphics left out in the above discussion has dealt with many of
these same issues. Various adaptive rendering and level of detail (LOD) schemes have
used the visibility or potential interest of features to skip computations that are unlikely
to be noticed. This is different from our goal. We are interested in detail modulation for
stylistic and expressive reasons. Level of detail seeks to control the computational cost
of rendering through approximation, not abstraction. Though both are concerned with
simplification, LOD and various other forms of corner cutting are usually meant to be invisible,
or nearly so, while expressive abstraction is meant to be seen and indeed have a strong
effect on the way a viewer looks at an image. Though the goals are different, some
of the methodologies overlap. The goal of imperceptible omission has encouraged
researchers to look at perceptually motivated methods. Salience measures have been
applied to concentrate computation on noticeable areas [Yee et al., 2001, Cater et al.,
2003]. In addition, a variety of low level perceptual models have been applied to try
to quantify the visibility of features and guarantee that simplification is invisible, or
minimize visibility. We adopt several of these metrics in our own efforts. One of
our contributions can be seen as applying and expanding perceptual models originally
adopted in LOD to create expressive artistic abstraction.
Both perceptually motivated LOD methods and the methods we present in this
thesis use models of vision to identify expendable areas of an image. It is the functional
definition of an expendable area that differs between the two. In the following chapter
we present the relevant background in human vision necessary for understanding why
such areas exist, and how they may be identied.
Chapter 3
Human Vision
A background in human vision is essential in computationally defining artistic abstraction.
We have extraordinarily complex abilities to analyze images, and these abilities have
weaknesses and strengths. Level of detail simplification methods seek to exploit the
limits of vision to cut corners in an unnoticeable way. In contrast, we hope to use the
related strengths of the visual system to improve visual design, clarifying content and
making things that need to pop out, pop out. Our interactive technique uses eye movements
and the limits of vision to indirectly measure the importance of features. Some
background will clarify the motivation for this approach.
3.1 Eye Movements
The human eye is maximally sensitive over a relatively small central area called the
macula. This area of relatively high resolution is approximately 5 degrees across, while
the most sensitive region (the fovea) is only 1.3 degrees (from a total visual angle of
about 160 degrees) [Wandell, 1995]. Sensitivity rapidly degrades outside of this central
region. Our perception of uniform detail throughout space is a result of continually
switching the point at which our eyes are looking (the point of regard or POR).
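To give a feel for how small these regions are on a display, the sketch below converts between visual angle and physical size at a given viewing distance using the standard relation size = 2 d tan(angle / 2). The function names and the 60 cm viewing distance are illustrative assumptions, not values taken from this work.

```python
import math


def size_on_screen(angle_deg, viewing_distance):
    """Physical extent (same units as viewing_distance) subtending angle_deg."""
    return 2.0 * viewing_distance * math.tan(math.radians(angle_deg) / 2.0)


def visual_angle(size, viewing_distance):
    """Visual angle in degrees subtended by an object of the given size."""
    return math.degrees(2.0 * math.atan(size / (2.0 * viewing_distance)))


if __name__ == "__main__":
    distance_cm = 60.0  # hypothetical viewing distance
    # 1.3 degrees (fovea) and 5 degrees (macula) are the figures quoted in the text.
    for name, angle in (("fovea", 1.3), ("macula", 5.0)):
        print(name, round(size_on_screen(angle, distance_cm), 2), "cm")
```

Under these assumptions the fovea covers only a little over a centimeter of the screen, which is why many fixations are needed to inspect an image in detail.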
This process involves two important types of eye motions: fixations, relatively long
periods spent looking at a particular spot, and saccades, very rapid changes of eye position.
These are not the only kinds of motion of which the eye is capable. In smooth
pursuit the eye follows a moving object, and even when fixated the eye continually
makes very small jittery motions. Fixations and saccades, however, are the most significant
motions when viewing static scenery. Saccades can be initiated consciously, but
for the most part occur naturally as we explore a scene. Though fixating on a location
Figure 3.1: Patterns of eye movements of a single subject over an image when given different instructions. Note (1), free observation, which shows fixations that are relatively dispersed yet still focused on relevant areas. Contrast it with (3), where the viewer is instructed to estimate the figures' ages. Reproduced from [Yarbus, 1967].
is not identical to attending to it, for the most part an attended location is fixated (i.e. if
we pay attention to something, we strongly tend to look at it directly) [Underwood and
Radach, 1998].
Figure 3.2: Effects similar to those of [Yarbus, 1967] are easily (even unintentionally) achieved when using eye tracking for interaction. Circles are fixations; their diameter is proportional to duration. The first viewer was instructed to find the important subject matter in the image. The second viewer was told to just look at the image. This second viewer assumed, from prior experience in perceptual experiments, that he would later be asked detailed questions about the contents of the scene. This resulted in a much more diffuse pattern of viewing.
3.1.1 Eye Movement Control
Qualitatively, a great deal is known about fixations. Eye movements are highly goal
directed. Viewers don't just look around at random. Instead, they fixate meaningful
parts of images [Mackworth and Morandi, 1967, Underwood and Radach, 1998, Henderson
and Hollingworth, 1998], and fixation duration is related to processing [Just
and Carpenter, 1976, Henderson and Hollingworth, 1998]. Viewing is highly influenced
by task. The classic example of this [Yarbus, 1967] showed that viewers examining
the same image, with different tasks to perform, showed drastically different
patterns of viewing, in which they focused on the features relevant to their task
(see Figure 3.1). Given the same task, the motions of a particular viewer over an
image at different viewings can be quite different, yet the overall distribution of fixations
remains similar [Yarbus, 1967]. In real activities, actions, even those thought
of as automatic, are usually preceded by (largely unperceived) fixations of relevant
features [Land et al., 1999]. These effects have been noted from some of the earliest
research in the field [Yarbus, 1967], but the mechanisms involved remain for the most
part informally understood.
In general, understanding of most higher-level aspects of eye movement control
is largely qualitative. In limited domains such as reading, attempts have been made
to formulate mathematical models of viewing behavior. For complex natural scenes,
much less is known [Henderson and Hollingworth, 1998]. Clearly any information
used in guiding eye movements must come from the scene. Likewise, the process of
selecting a new location to view must be guided in part by low frequency information
gathered from the periphery during earlier fixations. A matter of debate is whether low-level
visual information gained like this is a direct control of behavior or whether it is
primarily used when integrated into a higher level understanding. The precise factors
involved in control and planning of eye movements are an active and highly debated
topic [Kowler, 1990].
3.1.2 Salience Models
Much effort has gone into attempts to identify purely low-level image measurements
that can account for a significant amount of viewing behavior. Clearly it would be interesting
if what appears to be a highly complex behavior requiring general understanding
could be modeled or at least reasonably predicted by a simple approach. Results have
been mixed. Fixation locations do not correlate very well over time with the presence
of simple low level image features such as areas of high contrast or junctions [Underwood
and Radach, 1998].
More complex models have been formulated, such as the salience methods men-
tioned earlier. All measure contrast in one sense or another. In general, salience meth-
ods embody the assumption that unusual features are likely to be important and looked
at. Choice of feature space, and scale of measurement and comparison differ. One
popular approach [Itti et al., 1998, Itti and Koch, 2000] uses center surround filters to
measure local contrast in color, orientation and intensity to model general viewing behavior.
[Rosenholtz, 2001] uses a probabilistic framework to measure the probability of
a feature given a Gaussian model of color or velocity in the surround. This was used to
predict visual search performance. A related salience framework was proposed [Walker
et al., 1998] to select unique image locations to match for image alignment. This approach
used kernel estimation to measure the rarity of local differential features in the
global image wide distribution of those features.
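The center surround idea can be illustrated with a deliberately simplified sketch. It is not the implementation of any of the cited models: it assumes a single intensity channel, a single scale, and box-shaped windows in place of the filters those models use, and the function names and radii are hypothetical. Each pixel's response is the absolute difference between the mean of a small neighborhood and the mean of a larger one.

```python
def box_mean(img, r, c, radius):
    """Mean intensity in a square window around (r, c), clipped to the image."""
    rows, cols = len(img), len(img[0])
    total, count = 0.0, 0
    for i in range(max(0, r - radius), min(rows, r + radius + 1)):
        for j in range(max(0, c - radius), min(cols, c + radius + 1)):
            total += img[i][j]
            count += 1
    return total / count


def center_surround_map(img, center_radius=1, surround_radius=4):
    """Absolute difference between center and surround means at every pixel."""
    return [
        [
            abs(box_mean(img, r, c, center_radius)
                - box_mean(img, r, c, surround_radius))
            for c in range(len(img[0]))
        ]
        for r in range(len(img))
    ]


if __name__ == "__main__":
    # A dark 11x11 image with a bright 3x3 patch in the middle.
    img = [[0.0] * 11 for _ in range(11)]
    for r in range(4, 7):
        for c in range(4, 7):
            img[r][c] = 1.0
    sal = center_surround_map(img)
    print(round(sal[5][5], 3), round(sal[0][0], 3))
```

In this toy image the response is largest at the middle of the bright patch and small in the uniform corners, which matches the intuition that unusual regions stand out. The published models add multiple feature channels, multiple scales and a normalization step.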
These approaches share the same basic idea but vary in what they attempt to model.
This raises the question of what one is really trying to capture with salience. One can
look at salience as simply a quantitative method of deciding whether something is
present in a particular location in the visual field. In this context, salience doesn't actually
state the location is important, just that it might be because something is there. It
seems quite plausible that a measure like this plays a role in perception. However, more
is usually claimed for salience, for example that it predicts most of viewing behavior
or the valuable content in an image.
Salience would seem to have some additional predictive power because in a wide
class of images the semantically important subject does contrast with the rest of the
scene. Relatively few people take pictures of their family members dressed in camouflage
and lurking in the bushes. Nobody takes a picture of a blade of grass in a field. The
tendency of meaningful features to be visually prominent is by no means universal. It
is also unclear if this is really a property of the world, or a property of pictures people
take, but it does seem to underlie some of the success of salience as an engineering tool
in graphics.
Salience models have also been used to model viewing in narrower domains where
their applicability is more clear. The presence or absence of pop out effects in search
for example [Rosenholtz, 1999, Rosenholtz, 2001] is effectively modeled by simple
salience models that measure how distracting a distracter actually is.
Debate about how useful salience is in understanding general viewing is ongoing.
Some optimistically state that salience predictions correlate well with real eye motions
of subjects free viewing images [Privitera and Stark, 2000, Parkhurst et al., 2002]. Others
are more doubtful and claim that when measured more carefully and in the context
of a goal driven activity, the correlation is quite poor [Land et al., 1999, Turano et al.,
2003]. This mismatch in experimental results fits the intuition that visually prominent,
eye catching features might be more correlated with idle exploration of a scene,
and much less related to eye movements made during a task. In spite of this contro-
versy, salience methods are quite popular and have seen a fair amount of application
in computer graphics. They show some correlation with visually prominent features
and are fairly simple to implement. Code for some is publicly available. Clearly both
semantics and low-level features play a part in eye movements. Further investigation
is necessary to clarify the contributions to viewing behavior of salience and scene se-
mantics. Though they seem unable to model important aspects of viewing behavior,
salience models may provide important measures of visual prominence.
3.2 Eye Tracking
Much of the knowledge above about human eye motion has been gained through the
use of eye-tracking. A system measures a viewer's eye in one of several manners
and records the point where it is looking, termed the point of regard or POR. One
common approach involves a video camera and an infrared light source. The relative
positions of the pupil and corneal reflection in the resulting image are used to calculate
point of regard [Duchowski, 2000]. These systems are reasonably reliable and accurate
and improve with each generation, though they are still subject to drift over time and
variability between viewers. The same technology is used in producing units that sit
in front of a fixed display, and in head mounted units for use in more general scenes.
Video based trackers have the virtue of not interfering directly with a viewer, making
them useful as both a natural interactive method and a research tool.
Outside of research in human vision, eye-trackers have seen increasing use as a
mode of human computer interaction. Eye tracking has also enabled the use of eye movements
as a gauge of cognitive activity for psychological investigations and for evaluation of
visual displays.
Eye position has been used as a cursor for selection tasks in a GUI [Sibert and Jacob,
2000]. It has also been used to indicate a user's attention to others in a videoconferencing
environment [Vertegaal, 1999]. Another class of use, related to ours, uses
POR to control simplifying images or scenes for efficiency purposes. Knowing where
a user looks enables pruning of information that is not perceptible, and need not be
transmitted in a video stream [Duchowski, 2000]. Similarly, unexamined content need
not be rendered in a 3D environment. In practice, few current systems that make use
of such simplification actually use eye tracking, presumably because of limited availability;
head tracking is typically used instead [Reddy, 2001].
On the whole, eye tracking has been found more useful in interaction where it
serves as an indirect measure of user interest. Eye movements are not under full vol-
untary control. Because of this, when viewers attempt to explicitly point with their
eyes the result tends to lack control and suffer from the so called Midas Touch prob-
lem [Jacob, 1993] where struggling to control eye position, like a cursor, based on
visual feedback creates even more uncontrolled looking, touching on many irrelevant
or undesirable locations.
The same involuntary link of eye movement to thought processes that makes eye
tracking a bad mouse has made it useful as an indirect measure of interest and cognitive
activity. Eye tracking has been used to evaluate the effectiveness of informational
displays including application interfaces [Crowe and Narayanan, 2000], web
pages [Goldberg et al., 2002], and air traffic control systems [Mulligan, 2002]. As
mentioned earlier, eye movements may even reveal information that viewers are trying
to report, but cannot, because it is not consciously available. Experiments have shown
that professional radiologists examining slides look longer at locations where tumors
are present, even when they fail to identify and report them [Mello-Thoms et al., 2002].
In the future, this might hold the promise of computer assisted technologies to avoid
such mistakes. Several consulting companies currently sell evaluation services using
eye tracking to graphic design houses and web content creators among others.¹
3.3 Limits of Vision
Eye movements are related to the resolutional limitations of the eye. At any of the fixations
with which a viewer explores a scene, the most detailed information is received
only in the fovea, but lower frequency information is received throughout the visual
field. These limits on sensitivity within the visual field are not a weakness of the visual
system. On the contrary, they are part of our ability to efficiently process wide fields
of view and integrate information across eye movements and changes in viewpoint.
3.3.1 Models of Sensitivity
Quantitative models of visual acuity and contrast sensitivity have been developed to
model sensitivity to stimuli with different properties. Models of acuity predict whether
an observer can detect a black feature of a particular size on a white background. Contrast
sensitivity measures an observer's ability to discriminate a repeating pattern of a
particular contrast and frequency from a uniform gray field. The drop-off in these sensitivities
away from the visual center is modeled as a function of eccentricity, location
relative to the point of fixation.
Contrast sensitivity has been studied extensively in a variety of conditions usually
using monochromatic sinusoidal gratings (smoothly varying, repeating patterns of light
and dark bands). This sensitivity declines sharply with eccentricity [Kelly, 1984, Mannos
and Sakrison, 1974, Koenderink et al., 1978]. Contrast threshold is defined as the
¹ http://www.eyetools.com, http://www.factone.com, http://www.veridicalresearch.com
(unitless) contrast value (0 to 1 with 1 being maximal contrast) at which a grating and
uniform gray become indistinguishable. Contrast sensitivity is the reciprocal of this
value.
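The relationship between threshold and sensitivity is simple enough to state in a few lines. The sketch below assumes the Michelson definition of grating contrast, (Lmax - Lmin) / (Lmax + Lmin), which yields the unitless 0 to 1 range described above; the text does not name the contrast definition, so this choice and the function names are assumptions.

```python
def michelson_contrast(l_max, l_min):
    """Unitless contrast of a grating from its peak and trough luminance."""
    if l_max < l_min or l_min < 0:
        raise ValueError("expected 0 <= l_min <= l_max")
    if l_max + l_min == 0:
        return 0.0  # a completely black field has no contrast
    return (l_max - l_min) / (l_max + l_min)


def contrast_sensitivity(threshold):
    """Sensitivity is the reciprocal of the contrast threshold."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must lie in (0, 1]")
    return 1.0 / threshold


if __name__ == "__main__":
    print(michelson_contrast(0.6, 0.4))   # a low contrast grating
    print(contrast_sensitivity(0.0025))   # threshold of 0.25% contrast
```

A viewer who can just distinguish a grating of contrast 0.0025 from uniform gray therefore has a sensitivity of about 400 at that frequency.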
Figure 3.3: Log-log plot of contrast sensitivity from equation (3.2), with frequency (cycles/degree) on the horizontal axis and inverse contrast on the vertical axis; the curve separates the visible and invisible regions. This function is used to define a threshold between visible and invisible features.
Many researchers have empirically studied human contrast sensitivity and several
have developed mathematical models from their data. Researchers in computer science
have also used existing data and models in applications. Different aspects of a stimulus
are important in different situations. Fitting models to data collected from different
viewers under different circumstances gives somewhat different results. Two examples
are given here to illustrate the form these mathematical models take.
Kelly [1984] developed a mathematical model for the contrast sensitivity curve (at
the center of the visual field) including appropriate scaling factors describing the effects
of velocity (v) as well as frequency (f in cycles/degree) of a grating on sensitivity.
\[ A(f, v) = \left(6.1 + 7.3\,\left|\log_{10}(v/3)\right|^{3}\right) v f^{2} e^{-2 f (v + 2)/4.59} \tag{3.1} \]
Mannos and Sakrison [1974] fit a mathematical model appropriate to still imagery
to results of prior empirical studies for use as a metric in evaluating image compression.
\[ A(f) = s_{\max} \cdot 2.6\,(0.0192 + 0.144 f)\, e^{-(0.144 f)^{1.1}} \tag{3.2} \]
where $s_{\max}$ is the peak contrast sensitivity (this is around 400, but varies from
person to person).
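As a rough illustration of how equation (3.2) can serve as a visibility threshold, the sketch below evaluates the curve with the constants as printed in this section and compares a grating's contrast against the reciprocal of the sensitivity. The function names are hypothetical, and the default peak sensitivity of 400 is simply the approximate value quoted above.

```python
import math


def still_image_csf(f, s_max=400.0):
    """Contrast sensitivity from equation (3.2); f is in cycles/degree."""
    if f < 0:
        raise ValueError("frequency must be non-negative")
    return s_max * 2.6 * (0.0192 + 0.144 * f) * math.exp(-((0.144 * f) ** 1.1))


def grating_is_visible(contrast, f, s_max=400.0):
    """A grating is treated as visible when its contrast exceeds the threshold,
    where the threshold is the reciprocal of the sensitivity at frequency f."""
    return contrast > 1.0 / still_image_csf(f, s_max)


if __name__ == "__main__":
    for f in (0.5, 2.0, 6.0, 20.0, 60.0):
        print(f, round(still_image_csf(f), 2), grating_is_visible(0.01, f))
```

With these constants the curve has the hump shape shown in Figure 3.3: sensitivity is low at very low frequencies, peaks at a few cycles per degree, and falls off steeply at high frequencies, so that a sufficiently fine grating is predicted to be invisible even at full contrast.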
3.3.2 Sensitivity Away from the Visual Center
A number of researchers have explored how sensitivity varies with eccentricity [Kelly,
1984,Rovamo and Virsu, 1979]. At larger eccentricities (expressed in degrees of visual
angle) the contrast sensitivity function is multiplied by another function which models
the drop-off of sensitivity in the visual periphery. This function is termed the cortical
magnification factor. It is not radially symmetric, but drops off faster vertically than
horizontally. It can be approximated [Rovamo and Virsu, 1979] with separate formulas
for decrease in sensitivity in four areas. For simplicity a bound from the most sensitive
area can be used in estimating visibility [Reddy, 2001, Reddy, 1997].
\[ M(e) = \frac{1}{1 + 0.29 e + 0.000012 e^{3}} \tag{3.3} \]
The cubic term can usually be ignored, as its contribution in the range of eccentricities
normal in a screen display is negligible [Reddy, 1997]. The contrast sensitivity is then
M(e) A(f).
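The effect of eccentricity can be sketched directly from equation (3.3). The code below is illustrative only: the function names are hypothetical, and the foveal sensitivity is passed in as a plain number so that any model of A(f) can supply it.

```python
def cortical_magnification(e, include_cubic=True):
    """Equation (3.3): relative sensitivity at eccentricity e (degrees)."""
    if e < 0:
        raise ValueError("eccentricity must be non-negative")
    denominator = 1.0 + 0.29 * e
    if include_cubic:
        denominator += 0.000012 * e ** 3
    return 1.0 / denominator


def peripheral_sensitivity(foveal_sensitivity, e):
    """Contrast sensitivity M(e) * A(f), given the foveal value A(f)."""
    return cortical_magnification(e) * foveal_sensitivity


if __name__ == "__main__":
    for e in (0.0, 5.0, 10.0, 20.0):
        full = cortical_magnification(e)
        linear = cortical_magnification(e, include_cubic=False)
        print(e, round(full, 4), round(linear, 4))
```

At 10 degrees from the point of fixation sensitivity has already fallen to roughly a quarter of its central value, and at 20 degrees dropping the cubic term changes the result by less than two percent, which is consistent with the remark that the term can usually be ignored.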
3.3.3 Applicability to Natural Imagery
Some caution is necessary in applying these models derived from simple monochro-
matic repeating patterns to complex natural imagery. Though these models have been
applied with good results in graphics [Reddy, 2001], our goal of creating visible ab-
straction rather than conservative level of detail is more ambitious, and more likely to
stress the models involved.
Figure 3.4: Cortical magnification plotted against eccentricity (degrees). Cortical magnification describes the drop-off of visual sensitivity with angular distance from the visual center.
How to measure contrast is relatively obvious in gratings: there are only two extrema,
so a single contrast exists for the entire grating. Between two regions in a scene
the meaning of contrast is less clear. Regions are neither uniform in color nor uni-
formly varying. No strong perceptually motivated approach to this problem appears to
have been formulated. Lillesaeter [1993] attempts to address this by defining a contrast
between a nonuniform figure and ground. This contrast measure is a weighted average
of the contrast between the region and background and the integral of the contrast
along the edge of the region. This is demonstrated to provide more intuitive results
than simpler alternatives on regions with flat colors. Issues related to sampling in real
images are not addressed. Measuring contrast in a color image presents another prob-
lem. Contrast in colored gratings has been studied, and much work has been done in
general on color perception. However, there does not appear to be a simple general
contrast sensitivity model dened in color space [Regan, 2000]. Adapting a luminance
based model therefore remains a plausible course of action in designing a model for a
practical application.
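To make the grating case concrete, the sketch below assumes the standard Michelson definition of contrast (this section does not itself state a definition) and applies it first to a sinusoidal grating and then to a flat region containing one bright sample. The function name and the sample values are hypothetical; the second case illustrates how a single outlier can dominate an extrema-based measure on a non-grating region.

```python
import numpy as np


def michelson_contrast(luminance):
    """(Lmax - Lmin) / (Lmax + Lmin); assumes strictly positive luminance."""
    lum = np.asarray(luminance, dtype=float)
    l_max, l_min = lum.max(), lum.min()
    return (l_max - l_min) / (l_max + l_min)


# A sinusoidal grating: mean luminance 0.5, amplitude 0.2, four cycles.
x = np.arange(256) / 256.0
grating = 0.5 + 0.2 * np.sin(2.0 * np.pi * 4.0 * x)
print("grating:", michelson_contrast(grating))

# A flat region with one bright sample: the extrema no longer describe the region.
region = np.full(256, 0.5)
region[10] = 0.9
print("region with outlier:", michelson_contrast(region))
```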
Applying the notion of visibility for a grating to a non-repeating pattern of regions
also presents problems. The hump-like shape of the contrast sensitivity curve tells us
something counterintuitive if the size of an area is treated as proportional to an inverse
frequency [Reddy, 2001]. Very low frequencies are much less visible than some higher
ones at a given contrast. This is because detectability of a grating is related to the
density of receptive fields of corresponding size. There are upper bounds on the size of
human receptive fields. Intuitively, a large slowly varying sine wave may be difficult
to see.
This has been less of a concern in previous work where judgments were being made
mostly about high frequency parts of the curve [Reddy, 2001], but will be noticeable
when visibly abstracting images.
It can be argued [Reddy, 1997] that natural images, at least in places (and certainly
the uniform color regions that we will ultimately use in rendering), more closely
resemble square wave, rather than sine, gratings. Since a square wave can be
approximated by the sum of an infinite sequence of sine waves, and sensitivity to
combined sinusoidal patterns is closely related to that of the independent components
[Campbell and Robson, 1968], one might think the visibility for low frequency square
waves would be higher than that for equal frequency sine waves. The actual relation
has been studied empirically [Campbell and Robson, 1968] and confirms this intuition.
For square waves at frequencies below about 1 cycle/degree, sensitivity levels off
rather than dropping. A theoretical derivation of the difference is presented in
[Campbell and Robson, 1968]. It matches some but not all features of the empirical data.
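The decomposition of a square wave into sine waves can be checked numerically. The sketch below sums the odd harmonics of the standard Fourier series of a unit square wave and reports the mean squared error against the ideal wave; the function name and the numbers of terms are illustrative choices only.

```python
import numpy as np


def square_wave_partial_sum(x, n_terms):
    """Sum of the first n_terms odd harmonics of a unit-amplitude square wave."""
    total = np.zeros_like(x, dtype=float)
    for k in range(n_terms):
        n = 2 * k + 1                      # only odd harmonics contribute
        total += np.sin(n * x) / n
    return 4.0 / np.pi * total


x = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
target = np.sign(np.sin(x))               # ideal square wave

for n_terms in (1, 5, 50):
    approx = square_wave_partial_sum(x, n_terms)
    mse = np.mean((approx - target) ** 2)
    print(f"{n_terms:3d} terms: mean squared error = {mse:.4f}")
```

Note that the fundamental alone has amplitude 4/pi, about 1.27 times the amplitude of the square wave itself, so a square wave carries more energy at its fundamental frequency than a sine wave of equal amplitude.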
These concerns remind us that when applying these models to real images they
cannot serve as an accurate absolute perceptual measure of visibility. Rather, they
provide a plausible relative sense of the visibility of different features. The absolute
contrast or acuity threshold at which a feature becomes visible is not necessary for our
application. What is important is the relative ordering of feature visibility, which allows
us to create a prioritization. It is necessary to model visual sensitivity only up to the
level where results correspond to our intuitions about this prioritization.
To apply these models in actual scenes, we need to decide on a definition of the
features whose visibility we are judging with these methods. For example, these
models have been used in 3D level of detail [Reddy, 1997] to avoid rendering invisible
features. In this context the obvious choice of feature is a polygon, which may or may
not be included in the rendering. For images the choice is less clear, as image
properties can be measured in an unstructured, local way or an image can be partitioned
into a more structured representation. We review some of the possibilities for image
representation in the following chapter.
Chapter 4
Vision and Image Processing
4.1 Image Structure Features and Representation
Figure 4.1: (a) Scale space of a one-dimensional signal. Features disappear through scale space but no new features appear. (b) Plot of inflection points of another one-dimensional signal through scale space. Reproduced from [Witkin 1983].
Image representation and processing is a large field of relevance in both human and
computer vision. We concentrate on some basic concepts relevant to the task of
simplifying images. Scale space theory provides a way of characterizing the different
scales of information present in an image and making correspondences between features
at different scales. Segmentation divides an image into distinct regions, enabling an
explicit, non-local representation of image content. Edge detection provides a measure
of the prominent boundaries in an image.
An important unifying concept in image analysis is that the same image data can
be represented in many forms. In any of these, certain information in the image is
explicit and other information is less easy to access [Marr, 1982]. Which information
and representation are appropriate depends on the task. A variety of representations with different
properties are available. With the exception of 3D techniques, NPR applications have
largely used low-level representations, often functioning locally on the original image
itself. However, human artistic processes operate on richer representations. Ruskin,
one of the 19th century's most prominent art historians and theorists, famously argued
that in teaching art technique, the most important lesson was teaching the student to
see [Ruskin, 1858]. There seems to be an assumption in image based NPR that seeing
is simply capturing a bitmap representation of the scene, and that it can be considered
accomplished in the presence of a source photo. Human vision, however, is much more
than simply capturing an image. If a computer is to produce artistic renderings that
capture some of the expressiveness of real art, especially in highly abstracted styles,
some higher level representation is necessary, analogous to those created in the artist's
head as she understands the scene before her and begins to paint. The better suited
this representation is to the task, the easier it should be to drastically simplify an image
while retaining its important features.
The lowest level representation is the image itself, analogous to the retinal image.
This is the starting point of any further representation, making explicit the light inten-
sities at each pixel. There is structure here that can be more explicitly represented in
other ways. Information in the image exists over a variety of scales, small and large
features, making up parts and whole objects in the scene.
One common way to come to terms with the multiple scales of information in an
image is through its scale space. From a single image, a three dimensional stack of
images is generated in which each contains progressively coarser scale information.
Again, this representation has an analogue in human vision, where neurons have
receptive fields of different sizes, in effect generating a multiscale representation from
the retinal image.
Scale space has come to refer to such a space of increasingly simple images gener-
ated by a range of processes. Generically this can be thought of as a stack of images
with decreasing information contained at each level as scale increases. This stack is
in theory continuous, in practice sampled at some discrete interval. Starting with the
original image, detail is progressively lost until a uniform color is all that remains (see
Figure 4.1).
A number of constructions for such a space have been developed. Perhaps the
simplest approach creates something like an image pyramid, successively downsampling
the image so it is more coarsely pixelated. This approach has a problem in that
detailed, high frequency information (the edges between the new larger pixels) may have
been introduced which was not in the original image. This is the problem of spurious
resolution [Koenderink, 1984]. New information has been hallucinated into existence
by imposing a coarser grid structure on the data. Convolution with a Gaussian kernel
(blurring) generates a space that avoids this problem [Witkin, 1983, Koenderink, 1984].
In fact this blurring has been proven [Koenderink, 1984] to be the unique way to
generate a scale space which is both uniform or uncommitted (i.e., the process is uniform
across image space and through the scale dimension) and also avoids spurious
resolution. Information disappears but cannot be created. In one dimension, this ensures
that any feature will only disappear as scale increases. In two dimensions new features,
maxima for example, can appear. However, in both cases clear judgments can be made
about what features exist at what range of scales.
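The difference between pixelation and Gaussian blurring can be illustrated on a one-dimensional periodic signal. The sketch below is a minimal illustration under assumed parameters (block size 16, kernel standard deviation 8 samples, circular boundary handling); the helper names are hypothetical. It measures the largest jump between adjacent samples as a crude proxy for high frequency content.

```python
import numpy as np


def pixelate(signal, block):
    """Replace each block of samples by its mean (coarser 'pixels')."""
    means = signal.reshape(-1, block).mean(axis=1)
    return np.repeat(means, block)


def gaussian_blur_periodic(signal, sigma):
    """Circular convolution with a normalized, sampled Gaussian kernel."""
    n = signal.size
    idx = np.arange(n)
    offsets = np.minimum(idx, n - idx)          # circular distance to sample 0
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    return np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel)))


def max_step(signal):
    """Largest jump between circularly adjacent samples."""
    return np.max(np.abs(np.roll(signal, -1) - signal))


n = 256
signal = np.sin(2.0 * np.pi * np.arange(n) / n)     # one smooth period

print("original :", max_step(signal))
print("pixelated:", max_step(pixelate(signal, 16)))
print("blurred  :", max_step(gaussian_blur_periodic(signal, 8.0)))
```

The pixelated signal contains jumps far larger than any in the original, which is the spurious resolution described above, while the blurred signal never exceeds the original's largest jump because each output sample is a weighted average with nonnegative weights.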
That the process of blurring is uniform is an advantage in that filtering can be
applied to any signal; one doesn't need to have a model of what the important features
present are. A disadvantage is that coarser features are more coarsely located: the
blurring process that reveals them distorts their spatial extent.
If you know what you're looking for, there is no reason why the blurring operation
must be uniform or uncommitted. A number of nonuniform or nonlinear scale spaces
have been formulated which do not introduce false content but remove information
selectively in certain locations. One of the best known of such methods is anisotropic
diffusion [Perona and Malik, 1990]. Here the diffusion process is not uniform but rather
inversely proportional to the magnitude of the gradient at any position. This results in
an edge preserving blurring which removes low contrast detail while preserving strong
edges. This has the advantage that edges are better preserved in their initial location
until the point at which they disappear. Niessen et al. [1997] compare this and several
other nonlinear methods in the context of segmentation. Nonlinear methods perform
well but are significantly more expensive.
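A one-dimensional sketch of edge preserving diffusion in the spirit of Perona and Malik is given below. It assumes their exponential conductance function, an explicit update with zero-flux boundaries, and illustrative parameter values (kappa, step, iteration count); it is not the implementation used in any of the cited work.

```python
import numpy as np


def perona_malik_1d(signal, n_iter=200, kappa=0.2, step=0.25):
    """Explicit 1D anisotropic diffusion with conductance exp(-(d / kappa)^2)."""
    u = np.asarray(signal, dtype=float).copy()
    for _ in range(n_iter):
        d = np.diff(u)                       # differences u[i+1] - u[i]
        g = np.exp(-(d / kappa) ** 2)        # near 0 across strong edges
        flux = g * d
        u[:-1] += step * flux                # each sample gains from its right neighbor
        u[1:] -= step * flux                 # and the neighbor loses the same amount
    return u


# A noisy step edge: low contrast noise on either side of a unit jump.
rng = np.random.default_rng(0)
clean = np.where(np.arange(200) < 100, 0.0, 1.0)
noisy = clean + 0.02 * rng.standard_normal(200)
smoothed = perona_malik_1d(noisy)

print("noise std before:", noisy[:100].std())
print("noise std after :", smoothed[:100].std())
print("edge height     :", smoothed[100] - smoothed[99])
```

Because the conductance is close to zero where the local difference is large compared with kappa, the step keeps its position and height while the low contrast variation on either side is smoothed away.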
A practical application must sample the continuous scale space at some discrete
intervals. One would like to sample sufficiently finely to capture interesting events, the order of disappearance of different features, but not more densely than need be.
Looking at the linear scale space, Koenderink [1984] derives an appropriate sampling
that is uniform in the logarithm of the scale parameter t, the standard deviation of the
Gaussian kernel used in blurring, and hence logarithmic along the scale axis. This is
intuitive. At small scales many tiny regions are merging quite often, requiring dense
sampling. At higher scales, there are fewer regions, fewer events to capture, and much
less dense sampling in t is required. The issue is the same for nonlinear spaces. Re-
lating scales in different spaces is not straightforward. Some attempt at doing this has
been made in [Niessen, 1997].
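A logarithmic sampling of the scale axis amounts to spacing the blur widths geometrically, so that samples are dense at fine scales and sparse at coarse ones. The sketch below assumes hypothetical limits of 1 and 32 samples for the kernel standard deviation and six levels; the function name is illustrative.

```python
import numpy as np


def scale_samples(sigma_min, sigma_max, n_levels):
    """Kernel standard deviations spaced uniformly in log(sigma)."""
    return np.geomspace(sigma_min, sigma_max, n_levels)


sigmas = scale_samples(1.0, 32.0, 6)
print("sigmas           :", sigmas)                   # successive levels differ by a constant ratio
print("steps in log(sig):", np.diff(np.log(sigmas)))  # constant spacing in log scale
```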
Figure 4.2: Interval tree for a 1D signal illustrating decomposition of the signal into a hierarchy. Reproduced from [Witkin 1983].
While a scale space such as this begins to capture structural relations of features
across scales, this is still largely an implicit representation. To make this explicit,
features at different scales need to be directly related to each other. Witkin [1983]
addresses this problem in 1D signals. In the scale space of a one-dimensional signal,
new features never appear as scale becomes coarser. So, any features found at a coarse
scale can (if the sampling is dense enough) be traced directly back to their fine scale
origin. This allows localization of features found at a coarse level. Witkin demonstrates
this, choosing as features the zero crossings of the second derivative, that is, the
inflection points of the signal (Figure 4.1).
Similarly, using these correspondences across scale it is also possible to create
a structure that captures the relationship between all features at all scales. Intervals
between two zero crossings (which again correspond to sections of the signal between
two inflection points) disappear in only one way. Two successive zero crossings merge
together, with the result that three intervals, the one between the crossings and those on
either side, merge into one. These three intervals can be made children of the resulting
interval to create an interval tree which characterizes the structure of the signal at all
scales. Witkin observes that those intervals which have longer persistence through
scale space appear to be those identified by human observers as subjectively salient or
important in the signal.
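The behavior of zero crossings through scale space can be observed with a short experiment. The sketch below blurs a periodic random signal at several scales and counts sign changes of its discrete second derivative. It does not build the interval tree itself, and the signal, the scales, and the helper names are assumptions made for illustration.

```python
import numpy as np


def gaussian_blur_periodic(signal, sigma):
    """Circular convolution with a normalized, sampled Gaussian kernel."""
    n = signal.size
    idx = np.arange(n)
    offsets = np.minimum(idx, n - idx)          # circular distance to sample 0
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    return np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel)))


def count_inflections(signal):
    """Number of sign changes of the circular second difference."""
    second = np.roll(signal, -1) - 2.0 * signal + np.roll(signal, 1)
    signs = np.sign(second)
    signs = signs[signs != 0]                   # ignore exact zeros
    return int(np.sum(signs != np.roll(signs, 1)))


rng = np.random.default_rng(1)
signal = rng.standard_normal(512)
sigmas = np.geomspace(1.0, 32.0, 6)
counts = [count_inflections(gaussian_blur_periodic(signal, s)) for s in sigmas]

for s, c in zip(sigmas, counts):
    print(f"sigma = {s:5.1f}   zero crossings = {c}")
```

On a periodic signal the count is always even, which is consistent with zero crossings disappearing in pairs; tracking which pair merges at which scale is what would be needed to build the interval tree.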
Extending this nice analytical derivation to a practical application in 2D is not
trivial. In 2D, features such as maxima, or curves defined by inflection points, may split
into two at coarser scales. Koenderink [1984] suggests the use of equiluminance curves
in the image as a 2D equivalent to Witkin's intervals. Generic equiluminance curves
form a single closed curve. There are two singularities: extrema, where the curve is
just a point, and saddle points, where the curve forms multiple loops which intersect at
one point. Each loop may contain other saddle points and has to contain at least one
extremum [Koenderink and van Doorn, 1979]. The nesting of these saddle points gives
the structure of the image regions. Though new saddle points may appear inside a loop,
centermost saddle points must disappear before outer ones. Because of this, the saddle
points present at all scales can be represented as a tree. Such a structure is difficult to
calculate in practice. It is not obvious how to find these saddle points efficiently or if
they provide a subjectively intuitive partitioning of the image. In addition, it is not clear
how color could be handled. In a naive approach, each band would produce its own
surface with its own saddle points, resulting in three separate scale space trees that would
need to be unified in some way.
4.2 Segmentation
The process described above of dividing up a signal based on the intervals between
features is a particular approach to the general problem of segmentation. This problem
again occurs in both computer and human vision. Segmentation makes explicit the
association (or disassociation) between different areas of an image. It produces an
explicit representation of parts of the image that are associated with each other, assigning
each pixel to one, usually connected, group or region. These regions should be uniform
by some measure. Separate regions, at least the adjoining ones, should be markedly
different. How people do this, parsing shapes and objects from the background, is only
partially understood. In computer vision, a tremendous variety of methods have been
devised to define similarity measures for this using color, gray scale intensity, texture,
etc. This segmentation is usually a partitioning of an image at a single scale. However,
it is sometimes desirable to dene a segmentation over a range of scales.
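As a minimal illustration of assigning each pixel to one connected region that is uniform by some measure, the sketch below grows 4-connected regions in a gray scale image, joining neighboring pixels whose intensities differ by at most a tolerance. This is a deliberately simple stand-in, not one of the methods cited in this chapter; the function name, the tolerance, and the test image are hypothetical.

```python
from collections import deque

import numpy as np


def segment_by_similarity(image, tolerance):
    """Label 4-connected regions whose adjacent pixels differ by <= tolerance."""
    height, width = image.shape
    labels = np.full((height, width), -1, dtype=int)
    next_label = 0
    for sy in range(height):
        for sx in range(width):
            if labels[sy, sx] != -1:
                continue                          # pixel already belongs to a region
            labels[sy, sx] = next_label
            queue = deque([(sy, sx)])
            while queue:                          # breadth-first region growing
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and labels[ny, nx] == -1
                            and abs(image[ny, nx] - image[y, x]) <= tolerance):
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels


image = np.array([[0.1, 0.1, 0.9],
                  [0.1, 0.2, 0.9],
                  [0.8, 0.8, 0.9]])
labels = segment_by_similarity(image, tolerance=0.15)
print(labels)
```

Because similarity is tested only between adjacent pixels, a region can drift gradually in intensity, so the result is uniform only in a local sense. This is one reason practical methods use more global similarity measures.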
Scale space has been considered in segmentation. It is typically used to make
segmentations produced with other methods more robust. Niessen et al. [1997] link pixels
with their neighbors that have similar color in both the spatial and scale dimensions
to create a hierarchy. The end product is a single flat segmentation taking its set of
regions from a coarse scale and their spatial extent from a fine scale. A similar
approach is taken in Bangham et al. [1998]. Here, the desire is to create a hierarchical
segmentation tree that describes the image over a variety of scales. An alternate
approach [Ahuja, 1996] creates a multi-scale representation without explicitly generating
a scale space.
Each of these methods computes a hierarchical representation of image structure.
However, there is no clear relation between the hierarchy and the theoretical hierarchy
induced by scale space. This is not a major concern; scale space structure is attractive
because of its simple formal definition, but is not the single correct answer in any
meaningful sense for a given practical application. Hierarchical representations are
not general purpose; desirable properties depend on the application. For the purposes
of image abstraction, an important question is whether each subtree in the structure
represents some coherent area or region. This is guaranteed in some geometric sense
by scale space proper, since nodes occur in the tree only when features disappear. In
contrast, methods for building a hierarchy that i