Exploring Human Sketching Process - UC Berkeley (vis.berkeley.edu/courses/cs294-10-fa13/wiki/images/4/4c/Main.pdf)

Online Submission ID: 0000

Exploring Human Sketching Process

Figure 1: We visualize the temporal trend of common sketching practice across multiple human subjects. In each row, we show the average sketching results at 3%, 10%, 25%, 42%, 70%, and 100% completion for a particular object category.

Abstract

In this paper, we build an interactive system for visualizing and understanding how humans sketch objects. We implement two key techniques, Video Averaging and Generalized Selection, which allow users to (a) visualize the temporal information of drawing sequences, (b) understand common sketching practice across different human subjects, and (c) identify multiple modes in the data and detect unusual sketching behavior.

1 Introduction

Humans have used sketching to describe visual concepts and tell visual stories from antiquity until today. Recently, Eitz et al. compiled a sketched object dataset of 20,000 unique sketches evenly distributed over 250 object categories [Eitz et al. 2012a], which is the first large-scale study of non-expert sketches of everyday objects. Although the authors ask humans to identify the sketched object category and compare human performance against computational recognition methods, they focus mainly on the object recognition problem rather than on the sketching process itself. It is still unclear how humans create their sketches from scratch, and whether common practice exists across multiple human subjects and across different object categories. In this project, we are particularly interested in visualizing the sketching process, since it shows some consistent patterns in stroke order. For example, as shown in Figure 1, if we ask people to draw a potted plant, they usually first draw the pot, then sketch the stem, then move on to the leaves and flowers, and finally add more details. For a face, people typically start from the face outline, then sketch facial parts such as the eyes, nose, and mouth, and finally sketch the ears and hair in different styles. Our goal is to build an interactive system for visualizing and understanding “what do sketching processes look like” rather than “what do sketched objects look like”, which is where our work differs from the original paper.

At first sight, sketching sequence data look similar to 1D time-series data (e.g. a stock trend). However, a traditional line chart can only display a 1D signal at each timestamp, whereas we are faced with a 2D image at each frame. This makes it non-trivial to reveal the regularities in the common intensity patterns across the spatio-temporal domain. We pursue two ideas to address this problem:

• Aggregate the images at each frame to depict a global trend using the “Image Averaging” technique [Viegas and Wattenberg 2007]. Instead of showing multiple image sequences on one large display (i.e. playing multiple sketching videos), we average the strokes drawn by multiple subjects and show users only one average sketching sequence, which eases the burden on the users’ perceptual system.

• In addition to presenting commonalities, we also need to find differences and group different sketching processes into several subcategories. This requires techniques that allow users to further filter, select, and slice the data based on their observations and questions. We implement an interactive interface that provides a selection operation similar to [Heer et al. 2008] to help users quickly select interesting groups out of the original messy data.

2 Prior Work

Our work is inspired by, and builds on, ideas from a number of different areas:

Sketch-based Interaction: Unlike keyboard typing or mouse clicks, sketching has been used to describe visual concepts and tell visual stories since antiquity. It is therefore very natural for humans to convey semantic meaning to a computer by sketching, which makes sketching a powerful and intuitive tool for many purposes, including: (1) Sketch-based Modeling and Design: Igarashi and others built sketching interfaces for computer-aided design and modeling, such as the Teddy system [Igarashi et al. 2007] and FiberMesh [Nealen et al. 2007]. Sketching can be applied not only to editing and creating 3D models, but also to synthesizing realistic images. Recent semi-automatic image compositing systems (such as Sketch2Photo [Chen et al. 2009] and PhotoSketch [Eitz et al. 2009]) allow users to create novel photos from sketches annotated with text labels. (2) Sketch-based Retrieval: Sketching is also an efficient tool for exploring huge amounts of visual data (e.g. images, videos, 3D models, etc.), since visual data are universally easy to render but relatively difficult to describe and explain in words. Several sketch-based retrieval systems have been proposed to retrieve images [Eitz et al. 2011] [Cao et al. 2010], paintings [Shrivastava et al. 2011], 3D models [Eitz et al. 2012b], and even complex scenes [Xu et al. 2013]. We believe a better understanding of the human sketching process could help researchers in this field design new sketch-based interfaces that are friendlier to novice users.

Figure 2: Sampled images from the human sketched object dataset.

Figure 3: Examples of drawing sequences, with color encoding the temporal order of strokes.

Average Image: The average image is a data analysis and visualization technique that appears in contemporary art [Viegas and Wattenberg 2007]. In particular, the simple technique of image averaging has been used extensively, and to great effect, by several well-known visual media artists such as Jason Salavon [Salavon 2004], Jim Campbell [Campbell 2002], and Idris Khan [Khan 2005]. Whereas an individual image provides one view of the visual data, an average image aims to capture the data as a whole. We propose two extensions to this popular technique. First, we build an interactive interface that lets users update the average image in real time, so that the dynamic change of the average image directly and explicitly reflects the users’ data exploration process. Previous average image results are typically static images produced manually by artists, without any interactive design. Second, we extend the average image into the spatio-temporal domain and propose the idea of “Video Averaging”. In particular, we average all the strokes drawn by multiple human subjects frame by frame, where each frame stands for one stroke. In this way, the user can not only inspect one average image as the summary of the sketched objects, but can also observe the dynamic change of the average image over time.

Drawing Assistance System: Researchers have put great effort into teaching people to sketch with various types of guidance. The iCanDraw system [Dixon et al. 2010] and the Drawing Assistant [Iarussi et al. 2013] both display exemplar realistic images and compare users’ sketches with the reference image. ShadowDraw [Lee et al. 2011] pushes this idea further and displays a shadow image underlying the user’s strokes with real-time feedback. [Limpaecher et al. 2013] and [Zitnick 2013] use collected human drawing data to beautify and correct sketches in a data-driven fashion. While both we and these previous systems work on human sketch data, the main goal of our project is to understand, explore, and visualize existing human sketching behavior rather than to beautify a specific sketch. We try to reveal commonalities and differences across multiple human subjects, although a deeper understanding of human sketch data could also contribute to the development and design of new drawing assistance systems.

3 Data

Recently, Eitz et al. compiled a large-scale sketched object dataset of 20,000 unique sketches evenly distributed over 250 object categories [Eitz et al. 2012a]. Figure 2 shows sampled images from multiple categories. Even within the same category, such as bear, human subjects produce bears with different shapes, poses, and details, which shows the diversity of the dataset. The authors of the dataset ask Amazon Mechanical Turk workers to draw one sketch at a time given an object category name. They publish 90 × 250 = 22,500 Human Intelligence Tasks and collect sketches from 1,350 unique workers. The workers draw a total of 351,060 strokes, with each sketch containing a median of 13 strokes. After manual inspection and cleanup, the authors truncate the dataset to contain exactly 80 sketches per category, yielding 20,000 sketches.

The dataset stores not only the final sketched objects, but also the temporal information of each stroke as a Bezier spline in SVG format, which allows us to analyze the sketching process. Figure 3 shows examples of drawing sequences, with color encoding the temporal order of strokes. Green strokes were made at the beginning of the sketching process, while red strokes were drawn later. The sketches reveal more interesting patterns and become much more vivid after this simple coloring. For example, we can now see that people usually first sketch the overall structure of a piano and then draw the keys one by one, which cannot be inferred from the static images in Figure 2.

Since our video averaging and interactive tool require the pixel positions of each stroke, we need to extract this information from the original SVG data. We first split each sketch’s SVG file into separate SVG files, each of which represents a single stroke. We then convert each stroke from SVG format to bitmap format using the “mogrify” command in the ImageMagick package [ImageMagick 2008], which leads to 2.84 GB of stroke data. After compressing the data by storing the stroke images as sparse matrices, we save all the stroke data as a 673 MB MATLAB file.
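The splitting step described above can be sketched as follows. This is a minimal illustration rather than our actual preprocessing script: the function name `split_strokes` and the example SVG are hypothetical, and the sketch assumes that each stroke is stored as one `<path>` element listed in drawing order. Rasterizing the resulting files (done with ImageMagick in our pipeline) is not shown.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def split_strokes(svg_text):
    """Return one standalone SVG document (as a string) per <path> element.

    Assumption: paths appear in drawing order, so the list index can be
    used as the stroke's temporal index. Transforms on enclosing <g>
    groups are NOT carried over in this simplified version.
    """
    ET.register_namespace("", SVG_NS)
    root = ET.fromstring(svg_text)
    strokes = []
    for path in root.iter("{%s}path" % SVG_NS):
        # Copy the root attributes (width, height, viewBox, ...) so that
        # every per-stroke file is drawn on the same canvas.
        single = ET.Element("{%s}svg" % SVG_NS, dict(root.attrib))
        single.append(path)
        strokes.append(ET.tostring(single, encoding="unicode"))
    return strokes


# Hypothetical two-stroke sketch used only for illustration.
example = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="800" height="800">'
    '<g><path id="0" d="M10,10 C20,20 30,20 40,10"/>'
    '<path id="1" d="M50,50 C60,60 70,60 80,50"/></g></svg>'
)
stroke_files = split_strokes(example)
```

Each string in `stroke_files` can then be written to its own file and rasterized, so that stroke k of every sketch ends up as a separate bitmap.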

4 Approach

4.1 Visualizing Drawing Sequence

An earlier study [Eitz et al. 2012a] focuses mainly on recognizing static sketched objects without taking advantage of temporal information. However, exploring such time-series data can reveal many interesting patterns in how people draw a specific object. We therefore use an animated approach to let users observe how a drawing changes over time.

Figure 5: Color Encoding for Average Video

Figure 6: Demonstration of the Interactive Brush Tool. The first image is the original average image; after brushing several strokes (as shown in the second and third images), we obtain the final result (the fourth image).

As shown in Figure 5, our interface allows users to view the drawing progress at the current timestamp. Users can drag the timeline bar to navigate the entire sketching process and compare the sketched results at different frames. Our toolkit can also advance the timestamp automatically and update the result to simulate the original drawing process. In addition, we use color to encode temporal information: green indicates strokes drawn at the beginning, while red indicates strokes drawn later.
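One simple way to realize such a green-to-red encoding is a linear interpolation over the stroke index. The sketch below is an illustration under that assumption; the function name `stroke_color` and the linear ramp are hypothetical choices, not a description of the exact color map used in our tool.

```python
def stroke_color(index, total):
    """Map a stroke's position in the drawing order to an (R, G, B) tuple.

    index 0 (the first stroke) maps to pure green and index total - 1
    (the last stroke) to pure red, with linear interpolation in between.
    The linear ramp is an assumption made for illustration.
    """
    if total <= 1:
        return (0, 255, 0)
    t = index / (total - 1)  # 0.0 for the first stroke, 1.0 for the last
    return (round(255 * t), round(255 * (1 - t)), 0)


# Colors for a five-stroke sketch, from earliest to latest stroke.
colors = [stroke_color(i, 5) for i in range(5)]
```

Normalizing by the number of strokes in each sketch means that the first and last strokes get the same colors regardless of how many strokes a subject used.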

4.2 Video Averaging

Although animation can help us explore time-series data, animating multiple drawing sequences is not an easy task. The simplest approach would be to display them individually in a grid (Figure 3). However, as mentioned in class, humans can only track a limited number of moving objects (typically fewer than 6). Thus, although this kind of visualization clearly shows each sketching sequence, it is hard for users to compare different drawings and identify the general trend of the sketching processes for the same object category.

Figure 4: Video Averaging Results. We visualize the temporal trend of common sketching practice across multiple human subjects. In each row, we show the average sketching results at 3%, 10%, 25%, 42%, 70%, and 100% completion for a particular object category.

To solve this problem, we extend the image averaging approach [Viegas and Wattenberg 2007] to video averaging by computing, stroke by stroke, the mean image of all the strokes drawn by the 80 human subjects for the same object category. This approach concentrates users’ attention on a small region of the screen. The aggregated result also reveals general trends in people’s drawings, as shown in Figure 1 and Figure 4. Note that in Figure 5 we continue to use the color encoding when averaging videos, so temporal information can be shown in a single average image as well.
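The stroke-by-stroke averaging described above can be sketched as follows. This is a minimal illustration, not our MATLAB implementation: the function names and the data layout (each stroke as a set of pixel coordinates) are hypothetical, and it assumes that a sketch with fewer strokes than the current frame index simply stays at its finished state.

```python
def average_frame(sketches, k, height, width):
    """Mean image after every subject has drawn their first k + 1 strokes.

    sketches: list of sketches; each sketch is a list of strokes in
              drawing order; each stroke is a set of (row, col) pixels.
    Returns a height x width grid of values in [0, 1]: the fraction of
    sketches that have ink at each pixel by frame k.
    """
    total = [[0.0] * width for _ in range(height)]
    for strokes in sketches:
        inked = set()
        # Slicing past the end is safe: finished sketches stay complete.
        for stroke in strokes[: k + 1]:
            inked |= stroke
        for r, c in inked:
            total[r][c] += 1.0
    n = len(sketches)
    return [[value / n for value in row] for row in total]


def average_video(sketches, height, width):
    """One averaged frame per stroke index, up to the longest sketch."""
    n_frames = max(len(strokes) for strokes in sketches)
    return [average_frame(sketches, k, height, width) for k in range(n_frames)]


# Two hypothetical sketches on a 2 x 2 canvas.
sketch_a = [{(0, 0)}, {(1, 1)}]
sketch_b = [{(0, 0)}, {(0, 1)}, {(1, 0)}]
video = average_video([sketch_a, sketch_b], height=2, width=2)
```

In this toy example both subjects start at the same pixel, so the first averaged frame is fully dark there, while later frames show lighter values where the subjects diverge. Indexing frames by completion percentage rather than by stroke index would be an alternative alignment.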

4.3 Generalized selection

Although video averaging can show the main modes of the sketching process, there is still plenty of noise and there are many outliers in the data. In some cases, the orientations of the sketched objects differ greatly from one another (e.g. giraffe). Due to cultural and educational background, humans can also have completely different knowledge and interpretations of the same object category (e.g. western dragon vs. eastern dragon). It is therefore desirable to filter out noisy data, or to divide the data into different subcategories.

To achieve this goal, we provide an interactive eraser brush that lets users remove and shape drawings. Inspired by Generalized Selection [Heer et al. 2008], our brush differs from a traditional eraser brush. Generalized selection uses manipulation techniques that couple declarative selection queries with a query relaxation engine, enabling users to interactively generalize their selections. Our system generalizes the selection in two ways. First, a whole sketch is selected and removed if any part of it is touched by our brush. Second, users can apply the brush at an arbitrary time frame, and the whole drawing is removed if it is touched at that frame. Thus our system removes not only what the brush directly touches, but also all the elements related to it. As demonstrated in Figure 6, after brushing several strokes, we obtain much cleaner average results.
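The two generalization rules can be sketched in a few lines. This is an illustrative sketch only: the function name `brush_remove` and the data layout are hypothetical, and it assumes that "touched at that frame" means the brush overlaps any ink that is visible (drawn so far) at the frame being viewed.

```python
def brush_remove(sketches, brush_pixels, frame):
    """Drop every sketch that has ink under the brush at the given frame.

    sketches:     dict mapping a sketch id to its list of strokes, where
                  each stroke is a set of (row, col) pixels.
    brush_pixels: set of (row, col) pixels covered by the brush.
    frame:        index of the frame (stroke) being viewed when brushing.
    Returns a new dict holding only the sketches that were not touched;
    a touched sketch is removed as a whole, at all frames.
    """
    kept = {}
    for sketch_id, strokes in sketches.items():
        visible = set()
        for stroke in strokes[: frame + 1]:
            visible |= stroke
        if visible.isdisjoint(brush_pixels):
            kept[sketch_id] = strokes
    return kept


# Hypothetical data: sketches "a" and "b" reach pixel (1, 1) on their
# second stroke, sketch "c" never does.
data = {
    "a": [{(0, 0)}, {(1, 1)}],
    "b": [{(0, 1)}, {(1, 1)}],
    "c": [{(2, 2)}],
}
after_early_brush = brush_remove(data, {(1, 1)}, frame=0)
after_late_brush = brush_remove(data, {(1, 1)}, frame=1)
```

Brushing the same pixel at different frames gives different results: at frame 0 nothing is under the brush, while at frame 1 both "a" and "b" are removed entirely. Recomputing the average over the returned sketches yields the updated average video.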

This tool is a powerful way for users to filter drawings, since they can navigate to different timestamps and remove whatever outliers they see. We describe more concrete applications and findings in the next section.

5 Results and Applications

5.1 Patterns and Trends

By exploring our dataset, we can see some clear patterns and trends in the drawing sequences for several objects in Figure 1 and Figure 4. For example, when drawing bicycles, most people draw the front wheel first. After that, the drawing order becomes more diverse: either the back wheel or the frame is drawn next. Similarly, in the potted plant example, most people draw the pot first, and then draw the plant from stem to leaf.

5.2 Identifying Clusters

With our brush tool, users can easily shape the averaged sketching results. Instead of simply brushing over noisy regions to clean up the data, one can also discover subcategory structure. For example, in the “giraffe” case (Figure 6 and Figure 7), two different head orientations exist. Users can simply brush over one head to keep the other cluster. Similar results are shown for the “key” category, where keys with different rotations fall into different clusters (Figure 7).

5.3 Outlier detection

The brush can also be used to detect outliers. By brushing a dense region, the common drawings are removed, and the drawings that differ from the common pattern are shown alone. In the hourglass case, when we brush the middle region of the hourglass, the remaining drawings all have a very wide neck, which seems implausible for a real hourglass (Figure 8).
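The idea of brushing a dense region to expose outliers can be sketched as follows. In our tool the user chooses the region by hand; picking the densest pixel automatically, the square brush shape, and all names below are assumptions made only for this illustration.

```python
from collections import Counter


def densest_pixel(drawings):
    """Pixel covered by the largest number of drawings."""
    counts = Counter(p for pixels in drawings.values() for p in pixels)
    return counts.most_common(1)[0][0]


def remaining_after_brush(drawings, center, radius=0):
    """Ids of drawings with no ink within `radius` pixels of `center`.

    drawings: dict mapping a sketch id to the set of (row, col) pixels
              of the finished drawing.
    Distance is measured per axis (a square brush), which is an
    illustrative choice rather than the tool's actual brush shape.
    """
    cr, cc = center
    kept = []
    for sketch_id, pixels in drawings.items():
        touched = any(abs(r - cr) <= radius and abs(c - cc) <= radius
                      for r, c in pixels)
        if not touched:
            kept.append(sketch_id)
    return kept


# Hypothetical hourglass drawings: three pass through a narrow neck at
# (2, 2), while one has a wide neck and never touches the center.
hourglasses = {
    "h1": {(0, 0), (2, 2), (4, 4)},
    "h2": {(0, 4), (2, 2), (4, 0)},
    "h3": {(0, 1), (2, 2), (4, 3)},
    "wide": {(0, 0), (2, 0), (4, 0)},
}
neck = densest_pixel(hourglasses)
outliers = remaining_after_brush(hourglasses, neck)
```

Here the brush at the neck removes the three common drawings and leaves only the wide-necked one, mirroring the hourglass example above.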



Figure 7: Clustering

Figure 8: Outlier Detection

6 Discussion

The sections above only discuss drawings that follow some general pattern. In those cases, the pattern is easy to see in the averaging results. However, there are also cases where no such general pattern exists and the drawings are very diverse, as shown in the dragon and panda cases (Figure 9).

In the dragon case, we can see that there are two kinds of dragon: one in the eastern style, and the other more like a western dragon. This inconsistency makes the data messy. Moreover, even within each subcategory, the shapes of the dragons vary greatly. As a result, the average of those drawings looks almost like random noise.

In the panda case, the result is also surprising: most people do not draw a recognizable panda at all. Recall that in the original data collection procedure [Eitz et al. 2012a], people are not given any examples, only textual instructions. Some people may simply not be familiar with the concept of a panda. Likewise, for the concept of a dragon, different people may have different things in mind.

These are not the only factors that cause inconsistency. In our exploration, we found that animals are typically drawn less consistently than other objects. A possible explanation is that real-world images of animals are also very diverse, since animals move and change pose. People therefore do not have a single static memory of an animal.

We also found that objects that are hard to rotate are drawn more consistently. Objects such as tomatoes, whose appearance does not change after rotation, are drawn consistently as well. These findings suggest that before people begin to draw, they make some choice about how to draw that specific object. Since we provide no examples, they have to think of a concrete object and choose one view of the object to draw. This decision process varies greatly between different concepts.

Figure 9: Inconsistent and Bad Sketches

7 Future work

Understanding this decision process would be very meaningful, but we currently do not have enough data to conduct a detailed analysis of it. In the future, we plan to collect more data and conduct controlled experiments to analyze how people make those decisions and what factors affect people’s drawings.

References

CAMPBELL, J., 2002. http://jimcampbell.tv/portfolio/still image works/.

CAO, Y., WANG, H., WANG, C., LI, Z., ZHANG, L., AND ZHANG, L. 2010. Mindfinder: interactive sketch-based image search on millions of images. In Proceedings of the international conference on Multimedia, ACM, 1605–1608.

CHEN, T., CHENG, M.-M., TAN, P., SHAMIR, A., AND HU, S.-M. 2009. Sketch2photo: internet image montage. In ACM Transactions on Graphics (SIGGRAPH Asia), vol. 28, 124.

DIXON, D., PRASAD, M., AND HAMMOND, T. 2010. icandraw: using sketch recognition and corrective feedback to assist a user in drawing human faces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, 897–906.

EITZ, M., HILDEBRAND, K., BOUBEKEUR, T., AND ALEXA, M. 2009. Photosketch: A sketch based image query and compositing system. In SIGGRAPH 2009: Talks, ACM, 60.

EITZ, M., HILDEBRAND, K., BOUBEKEUR, T., AND ALEXA, M. 2011. Sketch-based image retrieval: Benchmark and bag-of-features descriptors. Visualization and Computer Graphics, IEEE Transactions on 17, 11, 1624–1636.

EITZ, M., HAYS, J., AND ALEXA, M. 2012a. How do humans sketch objects? ACM Transactions on Graphics (TOG) 31, 4, 44.

EITZ, M., RICHTER, R., BOUBEKEUR, T., HILDEBRAND, K., AND ALEXA, M. 2012b. Sketch-based shape retrieval. ACM Transactions on Graphics (TOG) 31, 4, 31.

HEER, J., AGRAWALA, M., AND WILLETT, W. 2008. Generalized selection via interactive query relaxation. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, 959–968.

IARUSSI, E., BOUSSEAU, A., TSANDILAS, T., ET AL. 2013. The drawing assistant: automated drawing guidance and feedback from photographs. In ACM Symposium on User Interface Software and Technology (UIST).

IGARASHI, T., MATSUOKA, S., AND TANAKA, H. 2007. Teddy: a sketching interface for 3d freeform design. In ACM SIGGRAPH 2007 courses, ACM, 21.

IMAGEMAGICK, 2008. www.imagemagick.org/script/index.php.

KHAN, I., 2005. www.skny.com/artists/idris-khan/images/.

LEE, Y. J., ZITNICK, C. L., AND COHEN, M. F. 2011. Shadowdraw: real-time user guidance for freehand drawing. In ACM Transactions on Graphics (SIGGRAPH), vol. 30, 27.

LIMPAECHER, A., FELTMAN, N., TREUILLE, A., AND COHEN, M. 2013. Real-time drawing assistance through crowdsourcing. ACM Transactions on Graphics (TOG) 32, 4, 54.

NEALEN, A., IGARASHI, T., SORKINE, O., AND ALEXA, M. 2007. Fibermesh: designing freeform surfaces with 3d curves. ACM Transactions on Graphics (TOG) 26, 3, 41.

SALAVON, J., 2004. www.salavon.com/work/specialmoments/.

SHRIVASTAVA, A., MALISIEWICZ, T., GUPTA, A., AND EFROS, A. A. 2011. Data-driven visual similarity for cross-domain image matching. In ACM Transactions on Graphics (TOG), vol. 30, ACM, 154.

VIEGAS, F. B., AND WATTENBERG, M. 2007. Artistic data visualization: Beyond visual analytics. In Online Communities and Social Computing. Springer, 182–191.

XU, K., CHEN, K., FU, H., SUN, W.-L., AND HU, S.-M. 2013. Sketch2scene: Sketch-based co-retrieval and co-placement of 3d models. ACM Trans. Graph. 32, 4 (July), 123:1–123:15.

ZITNICK, C. L. 2013. Handwriting beautification using token means. ACM Trans. Graph. 32, 4 (July), 53:1–53:8.
