
    Autonomous pedestrians

    Wei Shao a, Demetri Terzopoulos b,*

a Google Inc., Kirkland, WA, USA
b Computer Science Department, University of California, Los Angeles, CA, USA

Received 12 April 2006; received in revised form 18 September 2007; accepted 24 September 2007. Available online 18 October 2007.

    Abstract

We address the challenging problem of emulating the rich complexity of real pedestrians in urban environments. Our artificial life approach integrates motor, perceptual, behavioral, and cognitive components within a comprehensive model of pedestrians as individuals. Featuring innovations in these components, as well as in their combination, our model yields results of unprecedented fidelity and complexity for fully autonomous multihuman simulation in a large urban environment. We represent the environment using hierarchical data structures, which efficiently support the perceptual queries that influence the behavioral responses of the autonomous pedestrians and sustain their ability to plan their actions on local and global scales.
© 2007 Elsevier Inc. All rights reserved.

    Keywords: Artificial life; Virtual humans; Autonomous characters; Behavioral animation; Cognitive modeling

    1. Introduction

Forty years ago today at 9 a.m., in a light rain, jack-hammers began tearing at the granite walls of the soon-to-be-demolished Pennsylvania Station, an event that the editorial page of The New York Times termed "a monumental act of vandalism" that was "the shame of New York."

(Glenn Collins, The New York Times, 10/28/03)

The demolition of New York City's original Pennsylvania Station (Fig. 2), which had opened to the public in 1910, in order to make way for the Penn Plaza complex and Madison Square Garden, was a tragic loss of architectural grandeur. Although state-of-the-art computer graphics enables a virtual reconstruction of the train station with impressive geometric and photometric detail, it does not yet enable the automated animation of its human occupants with anywhere near as much fidelity. Our research addresses this difficult, long-term challenge.

In a departure from the substantial literature on so-called crowd simulation, we develop a decentralized, comprehensive model of pedestrians as autonomous individuals capable of a broad variety of activities in large-scale synthetic urban spaces. Our artificial life approach to modeling humans spans the modeling of pedestrian appearance, locomotion, perception, behavior, and cognition [35]. We deploy a multitude of self-animated virtual pedestrians within a large environment model, a


VR reconstruction of the original Penn Station (Fig. 1). The environment model includes hierarchical data structures that support the efficient interaction between numerous pedestrians and their complex virtual world through fast perceptual query algorithms that sustain pedestrian navigation on local and global scales.

We continue with a review of related work in Section 2. Section 3 briefly reviews our virtual environment model. In Section 4, we present our autonomous pedestrian model, mostly focusing on its (reactive) behavioral and (deliberative) cognitive components. Additional details regarding the autonomous pedestrian and environmental models are provided in Appendices A, B, C, and D. Section 5 describes the simulation of the models. Section 6 presents results comprising long-term simulations with well over 1000 pedestrians and reports on performance. Finally, Section 7 draws conclusions and discusses future work.

    2. Related work

Human animation is an important and challenging problem in computer graphics [2]. Psychologists and sociologists have been studying the behavior and activities of people for many years. Closer to home, pedestrian simulation has recently begun to capture the attention of CG researchers [1,20]. The topic has also been of some interest in the field of artificial life [4], as well as in architecture and urban planning [16,28], where graphics researchers have assisted in visualizing planned construction projects, including pedestrian animation [8,19].

In pedestrian animation, the bulk of prior research has focused on synthesizing natural locomotion (a problem that we do not consider in this paper) and on path planning (one that we do).

The seminal work of Reynolds [23] on behavioral animation is certainly relevant to our effort, as is its further development in work by other researchers [37,36,17]. Behavioral animation has given impetus to an entire industry of applications for distributed (multiagent) behavioral systems that are capable of synthesizing flocking, schooling, herding, etc., behaviors for lower animals or, in the case of human characters, crowd behavior. Low-level crowd interaction models have been developed in the sciences [10,12,3,27] and by animation researchers [11,15,33,38,14], and also in the movie industry by Disney and many other production houses, most notably in recent years for horde battle scenes in feature films (see www.massivesoftware.com).

While our work is innovative in the context of behavioral animation, it is very different from so-called crowd animation. As the aforementioned literature shows, animating large crowds, where one character algorithmically follows another in a stolid manner, is relatively easy. We are uninterested in crowds per se.

Fig. 1. A large-scale simulation of a virtual train station populated by self-animated virtual humans. From left to right are rendered images of the main waiting room, concourses, and arcade.

    Fig. 2. Historical photos of the original Pennsylvania Station in New York City.


Rather, the goal of our work is to develop a comprehensive, self-animated model of individual human beings that incorporates nontrivial human-like abilities suited to the purposes of animating virtual pedestrians in urban environments. Our approach is inspired most heavily by the work of [37] on artificial animals and by [9] on cognitive modeling for intelligent characters that can reason and plan their actions. We further develop their comprehensive artificial life approach and adapt it for the first time to the case of autonomous virtual humans that can populate extensive urban spaces. In particular, we pay serious attention to deliberative human activities over and above the reactive behavior level.

    3. Virtual environment model

The interaction between a pedestrian and his/her environment plays a major role in the animation of autonomous virtual humans in synthetic urban spaces. This, in turn, depends heavily on the representation and (perceptual) interpretation of the environment. Recently, Lamarche and Donikian [14] proposed a suitable structuring of the geometric environment and reactive navigation algorithms for pedestrian simulation. While this part of our work is conceptually similar, our methods differ. We have devoted considerable effort to developing a large-scale (indoor) urban environment model, which is described in detail in Appendix A and elsewhere [29], and which we summarize next.

We represent the virtual environment by a hierarchical collection of maps. As illustrated in Fig. 3, the model comprises (i) a topological map, which represents the topological structure between different parts of the virtual world. Linked within this map are (ii) perception maps, which provide relevant information to perceptual queries, and (iii) path maps, which enable online path planning for navigation.

In the topological map, nodes correspond to environmental regions and edges represent accessibility between regions. A region is a bounded volume in 3D space (such as a room, a corridor, a flight of stairs, or even an entire floor) together with all the objects inside that volume (e.g., ground, walls, benches). The representation assumes that the walkable surface in a region may be mapped onto a horizontal plane without loss of essential geometric information. Consequently, the 3D space may be adequately represented within the topological map by several 2D, planar maps, thereby enhancing the simplicity and efficiency of environmental queries.

The perception maps include grid maps that represent stationary environmental objects on a local, per-region basis, as well as a global grid map that keeps track of mobile objects, usually other pedestrians. These uniform grid maps store information within each of their cells that identifies all of the objects occupying that cellular area. The typical cell size of the grid maps for stationary object perception is 0.2–0.3 m. Each cell of the mobile grid map stores and updates identifiers of all the agents currently within its cellular area. Since it serves simply to identify the nearby agents, rather than to determine their exact positions, it employs cells whose size is commensurate with the pedestrians' visual sensing range (currently set to 5 m). The perception process will be discussed in more detail in Section 4.2.
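To make the grid-map machinery concrete, the following C++ sketch shows a minimal uniform grid whose cells store the identifiers of the objects occupying them; a stationary-object map would use 0.2–0.3 m cells, while the mobile map would use cells on the order of the 5 m sensing range. This is an illustrative reconstruction of ours, not the authors' code; names such as GridMap and ObjectId do not appear in the paper.

#include <cmath>
#include <cstddef>
#include <vector>

using ObjectId = int;

// Minimal uniform grid over a planar region. Each cell records the objects
// overlapping that cellular area, so a perceptual query reduces to a
// constant-time cell lookup. (Illustrative sketch only.)
class GridMap {
public:
    GridMap(double width, double height, double cellSize)
        : cellSize_(cellSize),
          cols_(static_cast<int>(std::ceil(width / cellSize))),
          rows_(static_cast<int>(std::ceil(height / cellSize))),
          cells_(static_cast<std::size_t>(cols_) * rows_) {}

    // Register an object as occupying the cell containing (x, y).
    void insert(double x, double y, ObjectId id) {
        cells_[index(x, y)].push_back(id);
    }

    // All objects registered in the cell containing (x, y).
    const std::vector<ObjectId>& query(double x, double y) const {
        return cells_[index(x, y)];
    }

private:
    std::size_t index(double x, double y) const {
        int c = static_cast<int>(x / cellSize_);
        int r = static_cast<int>(y / cellSize_);
        return static_cast<std::size_t>(r) * cols_ + c;
    }

    double cellSize_;
    int cols_, rows_;
    std::vector<std::vector<ObjectId>> cells_;
};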

The path maps include a quadtree map, which supports global, long-range path planning, and a grid map, which supports short-range path planning. Each node of the quadtree map stores information about its level in the quadtree, the position of the area covered by the node, the occupancy type (ground, obstacle, seat, etc.), and pointers to neighboring nodes, as well as information for use in path planning, such as a distance variable (i.e., how far the node is from a given start point) and a congestion factor (the portion of the area of the node that is occupied by pedestrians). The quadtree map supports the execution of several variants of the A* graph search algorithm, which are employed to compute quasi-optimal paths to desired goals (cf. [5]). Our simulations with numerous pedestrians indicate that the quadtree map is used for planning about 94% of their paths.
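The per-node payload just described maps naturally onto a small structure, as in the following sketch. The layout and field names are our own illustrative reconstruction; the paper specifies the information content, not this encoding.

#include <array>

enum class Occupancy { Ground, Obstacle, Seat };

// One node of the path-planning quadtree (assumed layout).
struct QuadtreeNode {
    int level;                                 // depth within the quadtree
    double x, y, size;                         // position and extent of the covered area
    Occupancy occupancy;                       // what occupies the area
    std::array<QuadtreeNode*, 4> neighbors{};  // pointers to adjacent nodes
    double distance = 0.0;                     // A* distance from a given start point
    double congestion = 0.0;                   // fraction of the area occupied by pedestrians
};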

Fig. 3. Hierarchical environment model: a topological map of regions (waiting room, arcade, concourse, hall) linked to perception maps (stationary, mobile), path maps (grid, quadtree), and specialized objects (ground, seat, waiting-line).


The remaining 6% of the paths are planned using the grid path map, which also supports the execution of A* and provides detailed, short-range paths to goals in the presence of obstacles, as necessary. A typical example of its use is when a pedestrian is behind a chair or bench and must navigate around it in order to sit down.

With each of the aforementioned maps specialized to a different purpose, our environment model is efficient enough to support the real-time (30 fps) simulation of about 1400 pedestrians on a 2.8 GHz Xeon PC with 1 GB of memory. For the details about the construction and update of our environment model, and associated performance statistics regarding its use in perception and path planning, we refer the reader to Appendix A and [29,31].

    4. Autonomous pedestrian model

Like real humans, our synthetic pedestrians are fully autonomous. They perceive their virtual environment, analyze environmental situations, make decisions, and behave naturally. Our autonomous human characters are architected as a hierarchical artificial life model. Progressing upward through the levels of abstraction, our model incorporates appearance, motor, perception, behavior, and cognition sub-models. The following sections discuss each of these components in turn.

    4.1. Human appearance, movement, & motor control

As an implementation of the low-level appearance and motor levels, we employ a human animation software package called DI-Guy, which is commercially available from Boston Dynamics Inc. It provides textured 3D human characters with basic motor skills, such as standing, strolling, walking, running, sitting, etc. [13]. DI-Guy characters are by no means autonomous, but their actions may be scripted manually using an interactive tool called DI-Guy Scenario, which we do not use. DI-Guy also includes an SDK that allows external C/C++ programs to control a character's basic motor repertoire. This SDK enables us to interface DI-Guy to our extensive, high-level perceptual, behavioral, and cognitive control software, which will be described in subsequent sections, thereby achieving fully autonomous pedestrians.

Emulating the natural appearance and movement of human beings is a difficult problem and, not surprisingly, DI-Guy suffers from several limitations. The 3D character appearance models are insufficiently detailed. More importantly, DI-Guy characters cannot synthesize the full range of motions needed to cope with a highly dynamic urban environment. With the help of the DI-Guy Motion Editor, we have modified and supplemented the motion repertoire, enabling faster action transitions, which better enables our pedestrians to deal with busy urban environments.

Moreover, we have implemented a motor control interface between the kinematic layer of DI-Guy and our higher-level behavioral controllers. The interface accepts motor control commands from behavior modules, and it verifies and corrects them in accordance with the pedestrian's kinematic limits. It then selects an appropriate motion sequence or posture and calls upon the kinematic layer to update the state of the character. Our seamless interface hides the details of the underlying kinematic layer from our higher-level behavior routines, allowing the latter to be developed largely independently. Hence, in principle, any suitable low-level human animation API can easily replace DI-Guy in our future work.

    4.2. Perception

An autonomous and highly mobile virtual human must have a perceptive regard of its environment. Our environment model (Section 3) efficiently provides accurate perceptual data in response to the queries of autonomous pedestrians.

    4.2.1. Sensing ground height

In the static object perception map, each map cell contains the height functions of usually a single, though sometimes multiple, ground objects, such as the floor, stairs, etc. The highest object at the desired foot location of a pedestrian is returned in constant time and is processed within the pedestrian's motor layer, which plants the foot at the appropriate height.

    4.2.2. Sensing static objects

The visual sensing computation shoots out a fan of line segments, with length determining the desired perceptual range and density determining the desired perceptual acuity (Fig. 4(a) and (b)). Grid cells on the perception map along each line are interrogated for their associated object information. This perceptual query takes time that grows linearly with the length of each line times the number of lines but, most importantly, it does not depend on the number of objects in the virtual environment.
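A sketch of this fan-of-rays query appears below. It samples cells at sub-cell steps rather than using an exact grid traversal, and every identifier is our own invention; the point is only that the cost is proportional to rays times range and is independent of how many objects populate the scene.

#include <cmath>
#include <functional>

// March along one sensing ray, invoking visit(cellX, cellY) for each grid
// cell encountered until visit returns false (an occupied cell) or the
// perceptual range is exhausted. (Illustrative sketch.)
void castRay(double ox, double oy, double heading, double range,
             double cellSize, const std::function<bool(int, int)>& visit) {
    const double step = 0.5 * cellSize;  // fine enough to touch every cell
    for (double t = 0.0; t <= range; t += step) {
        int cx = static_cast<int>((ox + t * std::cos(heading)) / cellSize);
        int cy = static_cast<int>((oy + t * std::sin(heading)) / cellSize);
        if (!visit(cx, cy)) return;      // hit: stop this ray
    }
}

// Shoot the fan: the ray count sets acuity, the length sets range (rays >= 2
// assumed). Cost is O(rays * range / cellSize), independent of object count.
void senseFan(double ox, double oy, double facing, double fov, int rays,
              double range, double cellSize,
              const std::function<bool(int, int)>& visit) {
    for (int i = 0; i < rays; ++i) {
        double heading = facing - 0.5 * fov + fov * i / (rays - 1);
        castRay(ox, oy, heading, range, cellSize, visit);
    }
}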

    4.2.3. Sensing mobile objects

To sense mobile objects (mostly other humans), a pedestrian must first identify nearby pedestrians within the sensing range. The range here is defined by a fan, as illustrated in Fig. 4(c). On the mobile object perception map, the cells wholly or partly within the fan are divided into tiers based on their distance to the pedestrian. Closer tiers are examined earlier. Once a predefined number (currently set to 16) of nearby pedestrians are perceived, the sensing is terminated. This is motivated by the fact that, at any given time, people usually pay attention only to a limited number of other people, usually those that are most proximal.1 Once the set of nearby pedestrians is sensed, additional information can be obtained by referring to finer maps, estimation, or simply querying some pedestrian of particular interest. Given the sensing fan and the upper bound on the number of sensed pedestrians, sensing is a constant-time operation.
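In code, the tiered early-terminating scan might look as follows, with the tiers assumed to be precomputed from the fan geometry; the cap of 16 mirrors the attention bound quoted above, and all names are illustrative rather than taken from the paper.

#include <cstddef>
#include <vector>

struct Agent { int id; double x, y; };

// Examine tiers of fan cells in order of increasing distance, stopping as
// soon as the attention budget is exhausted. tiers[k] holds the agents
// registered in the cells of the k-th tier. (Illustrative sketch.)
std::vector<const Agent*> senseNearbyAgents(
    const std::vector<std::vector<Agent>>& tiers,
    std::size_t maxPerceived = 16) {
    std::vector<const Agent*> perceived;
    for (const auto& tier : tiers) {          // closer tiers first
        for (const Agent& a : tier) {
            perceived.push_back(&a);
            if (perceived.size() >= maxPerceived)
                return perceived;             // attention bound reached
        }
    }
    return perceived;
}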

    4.2.4. Locating an object

Given a location identifier (e.g., "Track 9"), a search at the object level can find the virtual object. This is accomplished in constant time using a hash map with location names as keys. As the virtual object has an upward reference to its region (e.g., under the lower concourse), it can be quickly located by referring to the node in the topological graph, as can nearby objects in that region (say, "Platform 9" and "Platform 10") by referring to the perception maps linked within the node.
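The constant-time name-to-object lookup can be realized with a standard hash table, as in this sketch; the types LocatedObject and Region are placeholders of ours, standing in for whatever the actual object and topological-node classes are.

#include <string>
#include <unordered_map>

struct Region;                  // topological-map node (opaque here)

struct LocatedObject {
    std::string name;           // location identifier, e.g., "Track 9"
    Region* region = nullptr;   // upward reference to the containing region
};

// Directory built once during environment preprocessing; find() is an
// average constant-time hash lookup keyed on the location name.
class LocationDirectory {
public:
    void add(LocatedObject* obj) { byName_[obj->name] = obj; }

    LocatedObject* find(const std::string& name) const {
        auto it = byName_.find(name);
        return it == byName_.end() ? nullptr : it->second;
    }

private:
    std::unordered_map<std::string, LocatedObject*> byName_;
};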

    4.2.5. Interpreting complex situations

Abstract interpretation of the environment is indispensable for performing higher-level, motivational behaviors (Section 4.3). For example, in order to get a ticket, a pedestrian should assess (1) the length of the queues at ticketing areas and pick a short one, (2) the last pedestrian waiting in line in order to join the queue, (3) when (s)he has become the first person in line, and (4) whether a ticket booth has become available to make the purchase. These perceptual queries are efficiently answered by the specialized environmental objects that keep track of the evolving situation; in the ticketing example, these include a queue object and several purchase-point objects, each associated with a ticket booth.

    4.3. Behavioral control

Realistic behavioral modeling, whose purpose is to link perception to appropriate actions, is a big challenge in the case of autonomous virtual humans. Even for pedestrians, the complexity of any substantive behavioral repertoire is high. Considerable literature in psychology, ethology, artificial intelligence, robotics, and artificial life is devoted to the subject. Following [37], we adopt a bottom-up strategy that uses primitive reactive behaviors as building blocks that in turn support more complex motivational behaviors, all controlled by an action selection mechanism.

    4.3.1. Basic reactive behaviors

Reactive behaviors appropriately connect perceptions to immediate actions.

Fig. 4. Perception. (a) and (b) A pedestrian (denoted by the red circle) perceives stationary objects, (a) by examining map cells along the rasterized eye ray, while (b) perceiving the broader situation by shooting out a fan of eye rays (rasterization not shown). (c) Sensing mobile objects by examining (color-coded) tiers of the sensing fan. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

1 Because of the limited processing capacity of the brain, people become consciously aware of but a small fraction of the available sensory input, as determined by their focus of attention, which is partly a subconscious and partly a voluntary selection process [6].


We have developed six key reactive behavior routines, each suitable for a specific set of situations in a densely populated and highly dynamic environment (Fig. 5). Given that a pedestrian possesses a set of motor skills, such as standing in place, moving forward, turning in different directions, speeding up and slowing down, etc., these routines are responsible for initiating, terminating, and sequencing the motor skills on a short-term basis, guided by sensory stimuli and internal percepts. The details of the six routines, denoted Routines A–F, are provided in Appendix B.

Several remarks regarding the routines are in order. Obviously, the fail-safe strategy of Routine E suffices in and of itself to avoid nearly all collisions between pedestrians. However, our experiments show that in the absence of Routines C and D, Routine E makes the dynamic obstacle avoidance behavior appear very awkward: pedestrians stop and turn too frequently and they make slow progress. Enabling Routines C and D, the obstacle avoidance behavior looks increasingly more natural. Interesting multiagent behavior patterns emerge when all the routines are enabled. For example, pedestrians will queue to go through a narrow portal. In a busy area, lanes of opposing pedestrian traffic will tend to form spontaneously after a short while.

Fig. 5. Reactive behaviors. (a) (left to right) Original travel direction of pedestrian; detect stationary obstacle ahead by examining grid entries along the rasterized eye ray; perceive other nearby stationary obstacles by shooting additional eye rays; change direction and proceed. (b) Pedestrians choose best turning curves (gray) to turn southward. (c) Pedestrians within pedestrian H's forward parabolic region traveling in similar directions as H (labeled C) are in H's temporary crowd. (d1) To avoid cross-collision, (left) H slows down and turns toward C while C does the opposite until the collision is cleared (right). (d2) To avoid head-on collision, both pedestrians turn slightly away from each other. (e) The dotted rectangle defines H's forward safe area; w and d depend on H's bounding box size, and d is also determined by H's current speed s. (f) Confronted by static and dynamic threats, H picks an obstacle-free direction (light gray arrow) and slows down (black arrow) to let others pass before proceeding.


A remaining issue is how best to activate the six reactive behavior routines. Since the situation encountered by a pedestrian is always some combination of the key situations that are covered by the six routines, we have chosen to activate them in a sequential manner (Fig. 6), giving each an opportunity to alter the currently active motor control command, comprising speed, turning angle, etc. For each routine, the input is the motor command issued by its predecessor, either a higher-level behavior module (possibly goal-directed navigation) or another reactive behavior routine. The sequential flow of control affords later routines the chance to override motor commands issued by earlier routines, but this may cause the pedestrian to ignore some aspect of the situation, resulting in a collision. The problem can be mitigated by finding a best permutation ordering for activating the six routines. We have run many extensive simulations (longer than 20 min in virtual time) in the Penn Station environment with different numbers of pedestrians (333, 666, and 1000), exhaustively evaluating the performance of all 720 possible permutation orderings. The best permutation of the six routines, in the sense that it results in the fewest collisions while reasonable progress is still maintained in navigation, is C–A–B–F–E–D. Appendix C explains how we determined this best permutation.
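The sequential activation scheme amounts to folding a motor command through the six routines in the fixed order C, A, B, F, E, D, as in the sketch below; the MotorCommand fields and the routine signature are assumptions of ours, since the paper specifies the ordering and the data flow but not the interface.

#include <array>
#include <functional>

// A motor control command as described in the text: speed, turning angle, etc.
struct MotorCommand {
    double speed;      // m/s
    double turnAngle;  // radians relative to the current heading
};

using ReactiveRoutine = std::function<MotorCommand(const MotorCommand&)>;

// Each routine receives the command issued by its predecessor and may alter
// it; running them in the best order C-A-B-F-E-D lets later routines
// override earlier ones. (Illustrative sketch.)
MotorCommand applyReactiveBehaviors(
    MotorCommand cmd, const std::array<ReactiveRoutine, 6>& routinesCABFED) {
    for (const ReactiveRoutine& routine : routinesCABFED)
        cmd = routine(cmd);  // sequential flow of motor control commands
    return cmd;
}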

    4.3.2. Navigational behaviors

While the reactive behaviors enable pedestrians to move around freely, almost always avoiding collisions, navigational behaviors enable them to go where they desire, which is crucial for pedestrians. A pioneering effort on autonomous navigation is that by Noser et al. [21]. Metoyer and Hodgins [19] propose a model for reactive path planning in which the user can refine the motion by directing the characters with navigation primitives. We prefer to have our pedestrians navigate entirely on their own, as normal biological humans are capable of doing.

The online simulation of numerous pedestrians within large, complex environments confronts us with many navigational issues, such as the realism of paths taken, the speed and scale of path planning, and pedestrian flow control through and around bottlenecks. We have found it necessary to develop a number of novel navigational behavior routines to address these issues. These behaviors rely in turn on a set of conventional navigational behavior routines, including moving forward, turning (in place or while moving), proceeding toward a target, and arriving at a target (see [25] for details).

In the Penn Station environment, large regions are connected by narrow portals and stairways, some of which allow only two or three people to advance comfortably side by side. These bottlenecks can easily cause extensive queueing, leading to lengthy delays. In our experience, available techniques, such as queuing in [25], self-organization in [12], and global crowd control in [20], cannot tackle the problem, as it involves highly dynamic two-way traffic and requires quick and flexible responses from pedestrians. In our solution, we employ two behavioral heuristics. First, pedestrians inside a bottleneck should move with traffic while trying not to impede oncoming pedestrians. Second, all connecting passageways between two places should be used in balance. The two behaviors that are detailed next enabled us to increase the number of pedestrians within the Penn Station model from under 400 to well over 1000 without any long-term blockage in bottlenecks.

4.3.2.1. Passageway navigation. In real life, if pedestrians are traveling in the same direction inside a narrow passageway, they will tend to spread out in order to see further ahead and maximize their pace. However, once oncoming traffic is encountered, people will tend to form opposing lanes to maximize the two-way throughput. Our virtual pedestrians incorporate a similar behavior. First, two imaginary boundaries are computed parallel to the walls with an offset of about half the pedestrian H's bounding box size (Fig. 7(a)). Restricting H's travel direction within a safety fan defined by the boundaries, as shown in the figure, guarantees that H stays clear of the walls. Second, if H detects that its current direction is blocked by oncoming pedestrians, it will search within the safety fan for a safe interval to get through (Fig. 7(b)). The search starts from H's current direction and continues clockwise.

Fig. 6. Activation of the reactive behavior routines in the best permutation order C–A–B–F–E–D; the motor control command flows from higher-level behavior control through the six routines to motor control.


If the search succeeds, H will move in the safe direction found. Otherwise, H will slow down and proceed in the rightmost direction within the safety fan. This strategy allows non-blocking traffic to intermingle without resistance. However, in a manner that reflects the preference of real people in many countries, a virtual pedestrian will tend to squeeze to the right if it is impeding or impeded by oncoming traffic (Fig. 7(d)). Finally, Routine C (see Section B.3 in Appendix B) is used to maintain a safe separation between pedestrians travelling in the same direction. By altering their crowding factor w_i based on the observation of oncoming traffic, pedestrians can spread out or draw tightly to adapt to the situation (Fig. 7(c)).
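The clockwise search within the safety fan can be sketched as a simple sweep over candidate headings; the blockage test, the angular step, and all names here are our assumptions rather than details from the paper.

#include <functional>

// Sweep clockwise from the current heading to the fan's rightmost boundary,
// returning the first unblocked heading. If none is found, the caller slows
// down and proceeds in the rightmost direction within the fan. (Sketch.)
bool findSafeDirection(double current, double rightBoundary, double step,
                       const std::function<bool(double)>& blocked,
                       double& safeHeading) {
    for (double h = current; h >= rightBoundary; h -= step) {  // clockwise
        if (!blocked(h)) {
            safeHeading = h;
            return true;
        }
    }
    safeHeading = rightBoundary;  // rightmost direction in the safety fan
    return false;
}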

4.3.2.2. Passageway selection. People are usually motivated enough to pick the best option from several available access routes, depending on both personal preferences and the real-time situation in and around those routes. Likewise, our pedestrians will assess the situation around stairways and portals, pick a preferred one based on proximity and density of pedestrians near it, and proceed toward it. They will persist in the choice they make, unless a significantly more favorable condition is detected elsewhere. This behavior, although executed independently by each individual, has the global effect of balancing the loads at different passageways.

Visually guided navigation among static obstacles is another important behavior for pedestrians. The following two behavioral routines accomplish this task on a local scale.

4.3.2.3. Perception-guided navigation among static obstacles. Given a path P (the global planning of paths will be explained in the next section), a farthest visible point p on P, i.e., the farthest point along P such that there is no obstacle on the line between p and the pedestrian H's current position, is determined and set as an intermediate target (Fig. 8). As H progresses toward p, it may detect a new farthest visible point that is even further along the path. This enables the pedestrian to approach the final target in a natural, incremental fashion. During navigation, motor control commands for each footstep are verified sequentially by the entire set of reactive behavior routines in their aforementioned order so as to keep the pedestrian safe from collisions.
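A minimal sketch of the farthest-visible-point computation follows; the visibility callback stands in for the rasterized eye-ray query of Section 4.2.2, the scan conservatively stops at the first occluded waypoint, and the names are ours.

#include <functional>
#include <vector>

struct Point { double x, y; };

// Return the farthest waypoint of the planned path (assumed non-empty) that
// is visible from the pedestrian's position; it becomes the intermediate
// target and is advanced as the pedestrian progresses. (Illustrative sketch.)
Point farthestVisiblePoint(
    const Point& position, const std::vector<Point>& path,
    const std::function<bool(const Point&, const Point&)>& lineOfSightClear) {
    Point best = path.front();
    for (const Point& p : path) {
        if (!lineOfSightClear(position, p))
            break;   // conservatively stop at the first occluded waypoint
        best = p;    // keep advancing the intermediate target
    }
    return best;
}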


Fig. 7. Passageway navigation. (a) Two imaginary boundaries (dashed lines) and the safety fan. (b) Pedestrian H searches for a safe direction interval when confronted by oncoming traffic. (c) Spread out when no oncoming traffic is observed. (d) Typical flow of pedestrians in a passageway: big flows on the sides with small unblocking streams intermingling in the middle.

Fig. 8. Perception-guided navigation. (a) To reach target T, pedestrian H will (b) plan a jagged path on a path map (either grid or quadtree), (c) pick the farthest visible point (blue circle marked F) along the path, and proceed toward it. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


4.3.2.4. Detailed arrival at target navigation. Before a pedestrian arrives at a target, a detailed path will be needed if small obstacles intervene. Such paths can be computed on a fine-scale grid path map. The pedestrian will follow the detailed path strictly as it approaches the target, because navigational accuracy becomes increasingly important as the distance to the target diminishes. As some part of an obstacle may also be a part of, or very close to, the target, indiscriminately employing the reactive behaviors for static obstacle avoidance (Routines A and B; refer to Sections B.1 and B.2 in Appendix B) will cause the pedestrian to avoid the obstacle as well as the target, thereby hindering or even preventing the pedestrian from reaching the target. We deal with this by temporarily disabling the two routines and letting the pedestrian accurately follow the detailed path, which already avoids obstacles. Note that the other reactive behaviors, Routines C, D, and E, remain active, as does Routine F, which will continue to play the important role of verifying that modified motor control commands never lead the pedestrian into obstacles.

    4.3.3. Motivational behaviors

The previously described behaviors comprise an essential aspect of the pedestrian's behavioral repertoire. To make our pedestrians more interesting, however, we have augmented the repertoire with a set of non-navigational, motivational behavior routines including, among others, the following:

• Meet with friends and chat;
• Select an unoccupied seat and sit down when tired;
• Approach and observe a performance when interested;
• Queue at a ticketing area or vending machine and make a purchase.

In the latter behavior, for example, a pedestrian joins a ticket purchase queue and stands behind its precursor pedestrian, proceeding forward until coming to the head of the queue. Then, the pedestrian will approach the first ticket counter associated with this queue that becomes available. Appendix D presents the details of several of the above behavior routines.

Note that these motivational behaviors depend on the basic reactive behaviors and navigational behaviors to enable the pedestrian to reach targets in a collision-free manner. Furthermore, these routines are representative of higher-level behaviors that involve potential competition among pedestrians for public resources. While each pedestrian generally tries to maximize their personal benefit, they also follow social conventions designed to maximize collective benefits, leading to natural, rational pedestrian behavior.

    4.3.4. Mental state and action selection

Each pedestrian maintains a set of internal mental state variables that encodes the pedestrian's current physiological, psychological, or social needs. These variables include tiredness, thirst, curiosity, the propensity to be attracted by performances, the need to acquire a ticket, etc. When the value of a mental state variable exceeds a specified threshold, an action selection mechanism chooses the appropriate behavior to fulfill the need. Once a need is fulfilled, the value of the associated mental state variable begins to decrease asymptotically to zero.
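The threshold-and-decay dynamics of a mental state variable can be captured in a few lines, as below; the class, its method names, and the growth and decay rates are invented here for illustration.

#include <cmath>
#include <map>
#include <string>

// Mental state variables grow until they cross a behavior-triggering
// threshold; after the need is fulfilled, the value decays asymptotically
// toward zero, as described above. (Illustrative sketch.)
class MentalState {
public:
    void grow(const std::string& need, double rate, double dt) {
        value_[need] += rate * dt;
    }

    bool needsAction(const std::string& need, double threshold) const {
        auto it = value_.find(need);
        return it != value_.end() && it->second > threshold;
    }

    // Exponential decay toward zero once the need has been fulfilled.
    void decay(const std::string& need, double rate, double dt) {
        value_[need] *= std::exp(-rate * dt);
    }

private:
    std::map<std::string, double> value_;  // e.g., "thirst", "tiredness"
};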

We classify pedestrians in the virtual train station environment as commuters, tourists, law enforcement officers, performers, etc. Each pedestrian type has an associated action selection mechanism with appropriately set behavior-triggering thresholds associated with mental state variables. For instance, law enforcement officers on guard will never attempt to buy a train ticket, and commuters will never act like performers. As a representative example, Fig. 9 illustrates the action selection mechanism of a commuter.

    4.4. Cognitive control

At the highest level of autonomous control, a cognitive model is responsible for creating and executing plans suitable for autonomous pedestrians.

Fig. 9. Action selection in a commuter. The mechanism updates the internal state values and asks whether some desire D is currently being fulfilled, decreasing D's value and removing the current goal from the stack once D is fulfilled; otherwise it picks the most urgent desire U (hurried, tired, thirsty, or attracted), sets a corresponding goal g (go to the platform, take a rest, buy a drink, go and watch, or, if a ticket is needed, buy a ticket), puts g on the stack, and passes control to the behavior module.


Whereas the behavioral substrate described in the previous section is mostly reactive, the cognitive control layer [9] makes our pedestrian a deliberative autonomous agent that can exploit knowledge, reason about the world, and conceive and execute long-term plans, as real humans do.

The realism of pedestrian animation depends on the execution of rational actions at different levels of abstraction and at different spatiotemporal scales. At a high level of abstraction and over a large spatiotemporal scale, a pedestrian must be able to make reasonable global navigation plans that can enable it to travel purposefully and with suitable perseverance between widely separated regions of its environment, say, from the end of the arcade through the waiting room, through the concourses, and down the stairs to a specific train platform. Such plans should exhibit desirable properties, such as being relatively direct and saving time when appropriate. During the actual navigation, however, the pedestrian must have the freedom to decide to what extent to follow the plan, depending on the evolving situation, as we discussed when explaining the behaviors in Section 4.3.2. Priority must be given to the local (in both space and time) situation in the dynamic environment in order to keep the pedestrian safe while still permitting progress towards the long-range goal. Moreover, in a large, rich, and highly dynamic environment, an autonomous pedestrian should have the ability to fulfill not just a single goal, but possibly multiple goals. The pedestrian also needs the ability to decide whether and when a new plan is needed, which requires a bidirectional coupling between the behavioral layer and the cognitive layer. With these insights, we apply three heuristics in the design of our pedestrian cognitive model:

1. Divide and conquer: decompose complex tasks into multiple simpler ones;
2. Think globally, but act locally: a plan is needed at the global level, but on the local level it will serve only as a rough guide; and
3. Be flexible: always be prepared to modify local subplans in response to the real-time situation.

The subsequent sections present additional details of our cognitive model, starting with a brief description of the internal knowledge representation.

    4.4.1. Knowledge of the virtual world

To make plans, especially path plans, a pedestrian must have a representation of the environment from its own egocentric point of view. For efficiency, the environment model serves as the internal knowledge base of the static portion of the virtual world for each pedestrian. Hence, except for tourists, every pedestrian knows the layout of the environment and the position of relevant static objects (such as walls, stairs, tracks, ticket booths, etc.), as well as the pedestrian's own current position within it (localization, a fundamental problem in robotics [7]). However, each pedestrian also maintains, and updates at every simulation step, a representation of the dynamic aspect of the current world in the form of a list of perceived objects, including nearby pedestrians, and current situation-relevant events (e.g., that ticket window is available, that seat is taken, etc.).

This approach allows us to maintain only a single copy of the huge representation of the static world and still be able to support diverse cognitive and behavioral control for different pedestrians. While an oversimplification, this is reasonable for modeling pedestrians that are generally familiar with the urban space around them, and is a sensible strategy for the real-time simulation of hundreds of pedestrians in extensive environments.

    4.4.2. Planning

Planning is central to the cognitive model. Global path planning is an important planning task for pedestrians. Path planning directs a pedestrian to proceed through intermediate areas and reach ultimate destinations. In our pedestrian model, path planning is decomposed into subtasks at different spatiotemporal scales (Section A.4). The divide-and-conquer strategy allows the problem to be solved in a top-down manner. Each subtask can address its own portion of the overall problem in a suitable and efficient way, contributing to the final solution. Thus, the complete path planning process has global scale, yet reasonable local accuracy and economy.

For its knowledge, the path planner instantiated within each individual pedestrian exploits the topological map at the top level of the environment model (Fig. 3). Given a pedestrian's current location and a target region, this map provides a set of optimal neighboring regions where the pedestrian can go. By applying path search algorithms within the path maps associated with each region known to the pedestrian, the pedestrian can plan a path from the current location to the boundary or portal between the current region and the next. The process is repeated in the next region, and so on, until it terminates at the target location. In this way, although the extent of the path is global, the processing is primarily local. Our path search algorithms (detailed in Appendix A and [31]), which are based on the well-known A* graph search algorithm, are very efficient, but they provide rough paths, i.e., paths that are either jagged (grid path maps) or contain spikes (quadtree path maps), as opposed to smooth, spline-like paths. Consequently, a pedestrian uses those rough navigational plans only as a guide and retains the freedom to locomote locally in as natural a manner as possible, as was described in Section 4.3.2.
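The region-by-region decomposition can be expressed as a short loop against an assumed interface onto the environment model, as sketched below; none of these names come from the paper, and the per-region search stands in for the A* variants run on the grid and quadtree path maps.

#include <vector>

struct Point { double x, y; };
struct Region;  // node of the topological map (opaque here)

// Assumed interface: the topological map yields the next region toward the
// target, and each region's path map runs a local A* search to the portal
// into that next region. (Illustrative sketch.)
struct EnvironmentModel {
    virtual Region* nextRegionToward(Region* current, Region* target) = 0;
    virtual Point portalBetween(Region* a, Region* b) = 0;
    virtual std::vector<Point> planWithinRegion(Region* r, Point from,
                                                Point to) = 0;
    virtual ~EnvironmentModel() = default;
};

// The extent of the path is global, but every search is local to one region.
std::vector<Point> planGlobalPath(EnvironmentModel& env, Region* start,
                                  Region* target, Point from, Point goal) {
    std::vector<Point> path;
    for (Region* current = start; current != target;) {
        Region* next = env.nextRegionToward(current, target);
        Point portal = env.portalBetween(current, next);
        std::vector<Point> leg = env.planWithinRegion(current, from, portal);
        path.insert(path.end(), leg.begin(), leg.end());
        from = portal;    // continue from the portal into the next region
        current = next;
    }
    std::vector<Point> last = env.planWithinRegion(target, from, goal);
    path.insert(path.end(), last.begin(), last.end());
    return path;
}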

In addition to the global path planner, the cognitive model incorporates other planning routines, or planners. Generally speaking, planners are like manuals that guide one through the steps needed to accomplish specific goals. Each step may also require further instructions. For instance, to meet a friend, a pedestrian must first reach the meeting point and then wait for the friend. Accomplishing the first step requires a path planner. Once the planner finishes its work, control is handed over to the behavioral level to pursue the plan either until it is complete or until a more urgent plan arises. Continuity is maintained by the memory model.

    4.4.3. Memory

A pedestrian can have many different activities inside a big train station. For instance, (s)he may have to purchase a ticket and meet with a friend before boarding a train. The route from station entrance to train track may be complicated and may require multiple strategies to navigate. On the way to the platform, (s)he may want to get a drink or to stop and watch a street artist's performance, and so on. To keep important things in mind while doing others, a pedestrian needs memory. The memory model enables a pedestrian to:

• Store information: memorize intermediate and ultimate tasks;
• Retrieve information: remember pending tasks; and
• Remove information: forget accomplished tasks.

Unlike the pedestrian's static world knowledge, the memory model stores internal results directly from thinking or planning, including a list of preplanned goals (e.g., first purchase a ticket, then meet a friend, then catch the train), a sequence of subtasks decomposed from a major task (e.g., to catch a train, one must exit the arcade, cross the waiting room, pass one of the gates, cross the upper concourse, and descend the stairs to the platform), interruption/resumption (e.g., after stopping at the vending machine and performance, continue to the train platform), and so on. Therefore, the memory model is highly dynamic and usually long-term, and can vary dramatically in size.

Taking a simple approach, we use a stack as the memory. Our stack memory model precludes pedestrians from accomplishing multiple tasks in flexible (non-deterministic) ways, but real pedestrians probably do not normally exhibit such flexibility. However, the stack data structure features simple constant-time operations which permit easy and fast maintenance, regardless of size. This offers a big advantage in the real-time simulation of hundreds or thousands of pedestrians.

Algorithm 1. Skeleton routine for processing memory items

1: if the goal is accomplished then
2:   pop the goal from the memory stack
3: else if the goal has expired then
4:   pop the goal from the memory stack
5: else if multiple subtasks are required to achieve the goal then
6:   create a plan containing these multiple subtasks based on the current situation
7:   push the first subtask on the memory stack marked with an appropriate expiration time
8: else
9:   determine the appropriate action needed to achieve the goal

Each item on the memory stack represents a goal and has a list of properties, including a descriptor of the goal type (e.g., catch a train, take a rest, reach a specific target point), the goal complexity (high or low), goal parameters (e.g., platform 10, position and orientation of the seat, position of the target point), preconditions (e.g., target must be within 0.5 m), and expiration time (e.g., for 3 s, until finished).
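A memory item with the listed properties, and the stack that holds it, might be declared as follows; the field names and types are our guesses at a reasonable encoding, not the authors' actual declarations.

#include <stack>
#include <string>

// One goal on the memory stack, carrying the properties listed above.
// (Illustrative sketch only.)
struct Goal {
    std::string type;              // e.g., "catch a train", "take a rest"
    bool complex = false;          // high complexity: must be decomposed
    std::string parameters;        // e.g., "platform 10", seat position
    double preconditionRange = 0;  // e.g., target must be within 0.5 m
    double expirationTime = 0;     // virtual time at which the item expires
};

using Memory = std::stack<Goal>;   // the top item is always the current goal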

The top item on the stack is always the current goal. It will be processed by a cognitive or behavioral routine designed specifically for its type. This is done according to Algorithm 1, which decomposes complex goals into multiple simpler ones.

Among these subtasks, only the first subtask is pushed onto the memory stack, because task decomposition is accomplished according to the real-time situation (reflected, for instance, in the choice of goal parameters in subtasks). By the time the first subtask is accomplished, the situation may have changed and the remaining subtasks of the original decomposition may no longer be optimal or appropriate. However, when the top subtask is completed and removed from the memory stack, the original goal rises to the top and is processed again, and a new decomposition can be determined at the current time. To make pedestrians persistent in their previous choices, after the decomposition but prior to the pushing of the first subtask, the original complex goal on the top of the memory stack is updated with concise information about the decomposition. Hence, when the original goal is exposed again, this information reminds the pedestrian of the previous choice, and the pedestrian can choose to adhere to it, depending on the cognitive or behavioral routine that will process the goal.

Sometimes, the time needed to finish the first subtask can be lengthy. The expiration time attached to the memory item can cause the subtask to expire, re-exposing the original goal, thus guaranteeing frequent plan updates. The expiration time also forces replanning when a task resumes after an interruption by another task. For example, if a pedestrian feels thirsty on the way to the train platform and decides to detour to a vending machine, (s)he needs to continue the trip to the platform after purchasing the drink. As the "get a drink" task is an interruption, there may be subgoals of "get to the platform" on the memory stack reflecting the original plan. Due to their expiration time, these subgoals may become invalid, which effectively forces a replanning of the "get to the platform" task. However, as it is an interruption, the "get a drink" task's expiration time is set in accordance with the spare time available before the pedestrian needs to proceed to the train platform.

    4.4.4. Coupling the cognitive and behavioral layers

The goal stack of the deliberative, cognitive layer is also accessible to the underlying reactive, behavioral layer, thus coupling the two layers. If a goal is beyond the scope of the behavioral controller (for example, some task that needs path planning), it will be further decomposed into subgoals, allowing the cognitive controller to handle those subgoals within its ability (such as planning a path) and the behavioral controller to handle the others by initiating appropriate behavior modules (such as local navigation). The behavioral controller can also insert directives according to the internal mental state and environmental situation (e.g., if thirsty and a vending machine is nearby, then push a plan to get a drink). This usually interrupts the execution of the current task and typically invalidates it. When it is time for the interrupted task to resume, a new plan is often needed. Intuitively, the goal stack remembers what needs doing, the mental state variables dictate why it should be done, the cognitive controller decides how to do it at a higher, abstract level, and the behavior controller determines how to do it at a lower, concrete level and ultimately attempts to get it done.

    5. Simulation

Fig. 10 shows the layered relationship of the various components within the pedestrian model, together with the world model. We will employ the figure to explain what happens at each human simulation step (HSS), which comprises several motion/rendering frames.

At the beginning of each HSS, the pedestrian's cognitive center retrieves the top memory item as the current goal g (arrow 2). If g is a complex goal (such as "meet a friend"), it will be decomposed by a planner into subtasks (e.g., "go to the meeting point" and "wait"). The first subtask will become the current task and will be pushed onto the memory stack (arrow 2). Knowledge (usually of the static world) will be used (arrow 1) during planning as needed (say, in planning a path). If the task is simple enough (arrows 6 and 7), the action selection mechanism will choose a suitable behavioral routine to handle it. The behavior center acquires information about the current environmental situation through the sensory processes (arrows 3 and 10). The sensory data are also used to update the internal knowledge representation (arrow 4). Additional behavior-relevant information is supplied by the cognitive center (arrow 5). The behavior center issues motor control commands (e.g., go forward with speed 1.2 m/s in the next step, turn 15 degrees left with walk speed 0.5 m/s in the next step, stand still, sit down, look at the clock, look left, etc.) down to the motor control interface (arrow 12), where they will first be verified and corrected in accordance with the motion abilities and kinematic limits of the pedestrian's body. They then manifest themselves as actual motions or motion transitions that the motor center selects from the motion repertoire (arrow 14), thus determining the pedestrian's pose sequence (arrow 15) for several subsequent frames. Feedback (arrow 13) from the motion level (such as a change of position/orientation, the current speed, whether in a seated state, etc.) is used to update the external world model (arrow 11), and it also becomes available proprioceptively to the perception center (arrow 8), thus resulting in a slightly different perception. This may modify the internal (mental) state variables (arrow 9), which may in turn trigger (arrow 10) the behavioral controller to initiate or terminate certain behaviors. The textured geometric human model is used to render the scene in each frame (arrows 16). After several frames, a new HSS will be processed.

Note from the figure that the motor control interface effectively hides the DI-Guy software from our higher-level controllers, which makes it easy, in principle, to replace it with other human animation packages.

    6. Results

Our pedestrian animation system, which comprises about 50,000 lines of C++ code, is capable, without manual intervention, of running long-term simulations of pedestrians in a large-scale urban environment, specifically the Penn Station environment. In our simulation experiments, we populate the virtual train station with five different types of pedestrians: commuters, tourists, performers, policemen, and patrolling soldiers. With every individual guided by his/her own autonomous control, these autonomous pedestrians imbue their virtual world with liveliness, social (dis)order, and a realistically complex dynamic.

Fig. 10. Architecture of the pedestrian animation system. The diagram links the world model (the static world, plus the dynamic world of nearby pedestrians and current events) to the pedestrian model's components: internal state variables (tiredness, thirst, curiosity, etc.), goals (get a drink, meet a friend, have a rest, get on a train), cognition, action selection and behavior routines, motor skills (navigational: walk, jog, run, turn, stop, etc.; non-navigational: sit down, wait, chat, etc.), the DI-Guy layer, and rendering, with numbered arrows (1–16) indicating the flow of information.

In preprocessing the Penn Station environment model, the entire 3D space of the train station (200 (l) × 150 (w) × 20 (h) m³), which contains hundreds of architectural and non-architectural objects, was manually divided into 43 regions. At run time, the environment model requires approximately 90 MB of memory to accommodate the station and all of its associated objects.

    6.1. Performance

We have run various simulation tests on a 2.8 GHz Intel Xeon system with 1 GB of main memory. The total length of each test is 20 min in virtual world time. Fig. 11 plots the computational load as the number of pedestrians in the simulation increases. The simulation times reported exclude rendering times and include only the requirements of our algorithms (environment model update and motor control, perceptual query, behavioral control, and cognitive control for each pedestrian), updating at 30 frames per virtual second. The figure shows that real-time simulation can be achieved for as many as 1400 autonomous pedestrians (i.e., 20 virtual world minutes takes 20 min to simulate). Although the relation is best fit by a quadratic function, the linear term dominates by a factor of 2200. The small quadratic term is likely due to the fact that the number of proximal pedestrians increases as the total number of pedestrians increases, but at a much lower rate. Fig. 12 breaks down the computational load for the various parts of the simulation, based on experiments with different numbers of pedestrians ranging from 100 to 1000 on the aforementioned PC. Fig. 13 tabulates the frame rates that our system achieves on the aforementioned PC with an NVIDIA GeForce 6800 GT AGP8X 256 MB graphics system. Due to the geometric complexity of the Penn Station model and its numerous pedestrians, rendering times dominate pedestrian simulation times.

    6.2. Animation examples

We will now describe several representative simulations that demonstrate specific functionalities. To help place the animation scenarios in context, Fig. 14 shows a plan view of the Penn Station model.

Fig. 11. Pure simulation time (s) vs. number of pedestrians. The linear fit to the data is y = 0.843x - 78.3; the quadratic fit is y = 0.000229x² + 0.522x + 1.97.

Fig. 12. Computational loads of the system's parts: frame update 43.6% (updating agents' velocity, position, etc., 28.8%; modifying maps due to agent updates, 13.7%); motivational/navigation behaviors 29.8% (passageway behaviors 5.1%; path planning 17.3%; plan-guided navigation 4.5%); reactive behaviors 17.4% (including the necessary perception: sensing obstacles 8.0%; sensing nearby humans 5.6%); motor control 4.7%; cognitive control 1.8%; others 2.7%.

Fig. 13. Frame rate (in frames/s) for pedestrian simulation only (including DI-Guy), rendering only (i.e., static pedestrians), and both simulation and rendering, with different numbers of pedestrians.

Fig. 14. Plan view of the Penn Station model with the roof not rendered, revealing the two-level concourses and the train tracks (left), the main waiting room (center), and the long shopping arcade (right).


    6.2.1. Following an individual commuter

As we claimed in the introduction, an important distinction between our system and existing crowd simulation systems is that we have implemented a comprehensive human model, which makes every pedestrian a complete individual with a richly broad behavioral and cognitive repertoire. Figs. 15(a)–(f) and (g)–(l) show selected frames from a typical animation in which we choose a commuter and follow our subject as he enters the station (a), proceeds to the ticket booths in the main waiting room (b), and waits in a queue to purchase a ticket at the first open booth (c). Having obtained a ticket, he then (d) proceeds to the concourses through a congested portal, avoiding collisions. Next, our subject feels thirsty (e) and spots a vending machine in the concourse (f). He walks toward it and waits his turn to get a drink (g). Feeling a bit tired (h), our subject finds a bench with an available seat, proceeds towards it, and sits down (i). Later, the clock chimes the hour (j) and it is time for our subject to rise from his seat and proceed to the correct train platform. He makes his way through a somewhat congested area by following, turning, and stopping as necessary in order to avoid colliding with other pedestrians. He passes by some dancers that are attracting interest from many other pedestrians (k),

    Fig. 15. Following an individual commuter.

    260 W. Shao, D. Terzopoulos / Graphical Models 69 (2007) 246274

  • 7/24/2019 sdarticle (40)

    16/29

    no time to watch the performance and descends thestairs to his train platform (l).

    6.2.2. Pedestrian activity in the train station

    Fig. 16(a)-(f) shows selected stills of a routine simulation, which includes over 600 autonomous pedestrians, demonstrating a variety of pedestrian activities that are typical for a train station. We can interactively vary our viewpoint through the station, directing the virtual camera on the main waiting room, concourse, and arcade areas in order to observe the rich variety of pedestrian activities that are simultaneously taking place in different parts of the station (see also Fig. 1). Some additional activities that were not mentioned above include pedestrians choosing portals and navigating through them (a), chatting in pairs (b) or small groups (e), congregating in the busy upper concourse (c) to watch a dance performance for amusement (d), and proceeding to the train platforms by navigating the rather narrow and oftentimes congested staircases (f).

    Fig. 16. Pedestrian activity in the train station.

    7. Conclusions

    We have developed a sophisticated human animation system whose major contribution is a comprehensive artificial life model of pedestrians as highly capable individuals that combines perceptual, behavioral, and cognitive control components. Incorporating a hierarchical environmental modeling framework that efficiently supports perceptual sensing, situational awareness, and multiscale planning, our novel system efficiently synthesizes numerous self-animated pedestrians performing a rich variety of activities in a large-scale indoor urban environment.

    Our results speak to the robustness of our human simulation system and its ability to produce prodigious quantities of intricate animation of numerous pedestrians carrying out various individual and group activities. Like real humans, our autonomous virtual pedestrians demonstrate rational behaviors reflective of their internal goals and compatible with their external world. Although motion artifacts are occasionally conspicuous in our animation results due to the limitations of the low-level DI-Guy software, our design facilitates the potential replacement of this software by a better character rendering and motion synthesis package should one become available.

    7.1. Future work

    With additional work, at least in principle there seems to be no reason why the appearance and performance of virtual humans cannot eventually become indistinguishable from those of real humans, though clearly not for many years to come. In future work, we plan to expand the behavioral and cognitive repertoires of our autonomous virtual pedestrians, as well as to include learning mechanisms, in a systematic effort to narrow the gap between their abilities and those of real people. Despite the satisfactory performance of our reactive behavior level for the time being, it may be beneficial to activate the behavior routines in some parallel fashion instead of sequentially. We also intend to develop a satisfactory set of reactive and deliberative head/eye movement behaviors for our virtual pedestrian model. Our autonomous pedestrians are currently too myopic for the sake of computational efficiency. Longer-range, biomimetic visual perception (e.g., [34,35]) in conjunction with appropriate head/eye behaviors would enable realistic visual scanning in which a mobile individual assesses the intentions of other nearby pedestrians, especially those to the front. It also remains to imbue our pedestrians with useful manipulation skills, including the upper body motions necessary for making purchases at ticket booths or vending machines. Also, our train station simulations would be more realistic if some virtual pedestrians toted luggage and if we could simulate families of pedestrians that move together in small groups. The simulation of interpersonal behavior in autonomous pedestrians is a challenging problem that has recently received attention through a probabilistic approach [39]. Finally, we are pursuing novel applications of our simulator to archaeology [30], computer vision [22], and other fields.

    Acknowledgments

    The research reported herein was carried out at the Media Research Laboratory of New York University's Courant Institute of Mathematical Sciences and was supported in part by grants from the Defense Advanced Research Projects Agency (DARPA) of the Department of Defense and from the National Science Foundation (NSF). We thank Dr. Tom Strat, formerly of DARPA, for his generous support and encouragement. We also thank Mauricio Plaza-Villegas for his invaluable contributions to the implementation and visualization of the Penn Station model (which was distributed to us by Boston Dynamics Inc.) and its integration with the DI-Guy API.

    Appendix A. Environmental modeling

    We represent the virtual environment by a hierarchical collection of data structures, including a topological map, two types of maps for perception, two types of maps for path planning, and a set of specialized environmental objects (Fig. 3). With each of these data structures specialized to a different purpose, the combination is able to support accurate and efficient environmental information storage and retrieval.

    A.1. Topological map

    At the highest level of abstraction, a graph represents the topological relations between different parts of the virtual world. In this graph, nodes correspond to environmental regions, and edges between nodes represent accessibility between regions.

    A region is a bounded volume in 3D space (such as a room, a corridor, a flight of stairs, or even an entire floor) together with all the objects inside that volume (for example, ground, walls, ticket booths, benches, vending machines, etc.). We assume that the walkable surface in a region may be mapped onto a horizontal plane without loss of essential geometric information, particularly the distance between two locations. Consequently, a 3D space may be adequately represented by several planar maps, thereby enhancing the simplicity and efficiency of environmental queries, as will be described shortly.

    Another type of connectivity information stored at each node in the graph is path-to-via information. Suppose that L(A, T) is the length, in number of edges, of the shortest path from a region A to a different target region T, and P(A, T) is the set of paths from A to T of length L(A, T) and L(A, T) + 1. Then the path-to-via of A associated with T is a set of pairs defined as V(A, T) = {(B, C_B)}, where C_B is the length of a path p in P(A, T) along which region B is next to A. As the name suggests, if (B, C_B) is in V(A, T), then there exists a path of length C_B from A to T via B. Informally, V(A, T) answers the question: "If I am currently in A and want to go to T, to which region shall I go and what will be the expected cost?"


    Algorithm 2. Computing path-to-via information

    Require: G(N, E), a graph with N nodes and E edges
    1: Initialization:
       for each node A do
          for each target node T do
             if A == T then
                V(A, T) <- {(A, 0)}
             else
                V(A, T) <- {}
    2: Collect the information associated with paths of length L based on the information associated with paths of length L - 1:
       for L = 1 to N - 1 do
          for each node A do
             for each target node T do
                for every neighbor node B of A and any node X in G do
                   if (X, L - 1) is in V(B, T) then
                      add (B, L) to V(A, T)
    3: Keep only minimal-cost entries:
       for each node A do
          for each target node T do
             let C_min be the minimal cost in V(A, T)
             for each entry E = (Y, C) in V(A, T) do
                if C > C_min + 1 then
                   remove E from V(A, T)
    Given a graph, the path-to-via information is computed offline, in advance, using the incremental Algorithm 2. Note that after Step 3 of the algorithm, only those entries are stored whose cost is C_min or C_min + 1. Thus, we avoid paths with cycles. To understand this, consider V(A, C) for the graph in Fig. 3. Region C is a direct neighbor of A, so (C, 1) is clearly an entry of V(A, C). As A-B-A-C is also a possible path from A to C, (B, 3) is another entry. Obviously, A-B-A-C is not desirable, as it contains a cycle. Such paths are automatically removed from the path-to-via set after Step 3.
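
    To make the procedure concrete, here is a minimal Python sketch of Algorithm 2 over an adjacency-list graph; the function name and the set-of-pairs representation are ours, not part of the original system:

        # Minimal sketch of Algorithm 2 (path-to-via) on an adjacency list.
        def compute_path_to_via(graph):
            nodes = list(graph)
            n = len(nodes)
            # Step 1: initialization.
            V = {(a, t): ({(a, 0)} if a == t else set())
                 for a in nodes for t in nodes}
            # Step 2: derive cost-L entries from cost-(L-1) entries.
            for L in range(1, n):
                for a in nodes:
                    for t in nodes:
                        for b in graph[a]:              # every neighbor B of A
                            if any(c == L - 1 for _, c in V[(b, t)]):
                                V[(a, t)].add((b, L))
            # Step 3: keep only entries of cost C_min or C_min + 1.
            for key, entries in V.items():
                if entries:
                    c_min = min(c for _, c in entries)
                    V[key] = {(b, c) for b, c in entries if c <= c_min + 1}
            return V

        # The three-region example from the text: C neighbors A directly.
        V = compute_path_to_via({'A': ['B', 'C'], 'B': ['A'], 'C': ['A']})
        print(V[('A', 'C')])   # {('C', 1)}: the cyclic entry ('B', 3) is pruned

    Running the sketch on this example graph reproduces the pruning described above: the direct entry (C, 1) survives Step 3, while the cyclic entry (B, 3) is removed.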

    Linked within each node of the topological map are perception maps and path maps together with a list of objects inside that region. The next three sections describe each in turn.

    A.2. Perception maps

    Mobile objects and stationary objects are stored in two separate perception maps, which form a composite grid map. Hence, objects that never need updating persist after the initialization step, and more freedom is afforded to the mobile object (usually virtual pedestrian) update process during simulation steps. Table 1 compares the perception maps, and the next two subsections present the details.

    A.2.1. Stationary objects

    Our definition of a region assumes that we can effectively map its 3D space onto a horizontal plane. By overlaying a uniform grid on that plane, we make each cell correspond to a small area of the region and store in that cell identifiers of all the objects that occupy that small area. The gridded floor plan simplifies visual sensing. The sensing query shoots out a fan of line segments whose length reflects the desired perceptual range and whose density reflects the desired perceptual acuity (cf. [37,18]). Each segment is rasterized onto the grid map (see the left and center panels of Fig. 4). Grid cells along each line are interrogated for their associated object information. This perceptual query takes time that grows linearly with the number of line segments times the number of cells on each line segment. More importantly, however, it does not depend on the number of objects in the virtual environment. Without grid maps, the necessary line-object intersection tests would be time consuming in a large, complex virtual environment populated by numerous pedestrians. For high sensing accuracy, small cells are used. In our simulations, the typical cell size of grid maps for stationary object perception is 0.2-0.3 m.
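
    As an illustration, a minimal Python sketch of such a query follows, marching each ray of the fan cell by cell across a dictionary-based grid; the grid encoding, the opaque-cell stopping rule, and all names are our assumptions:

        import math

        # Sketch of the stationary-object query: cast a fan of rays and
        # collect object ids from the grid cells they cross.
        def sense_stationary(grid, cell, pos, heading, fov, reach, n_rays):
            seen = set()
            for k in range(n_rays):
                angle = heading - fov / 2 + fov * k / (n_rays - 1)
                dx, dy = math.cos(angle), math.sin(angle)
                for s in range(1, int(reach / cell) + 1):  # march the ray
                    i = int((pos[0] + dx * s * cell) / cell)
                    j = int((pos[1] + dy * s * cell) / cell)
                    ids = grid.get((i, j), ())
                    seen.update(ids)
                    if ids:              # treat any occupied cell as opaque
                        break
            return seen

        # A 0.25 m grid with one wall cell, sensed over a 90-degree fan.
        grid = {(4, 0): ['wall_17']}
        print(sense_stationary(grid, 0.25, (0.0, 0.0), 0.0,
                               math.pi / 2, 5.0, 9))   # {'wall_17'}

    The query cost is the number of rays times the cells per ray, independent of the number of objects in the environment, matching the analysis above.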

    A.2.2. Mobile objects

    Similarly, a 2D grid map is used for sensing mobile objects (typically other pedestrians). In this map, each cell stores and updates a list of identifiers of all the pedestrians currently within its area. To update the map, for each pedestrian H (with C_old and C_new denoting the cells in which H was and now is, respectively), if C_old == C_new then we do nothing; otherwise, we remove H from C_old and add it to C_new. As the update for each pedestrian takes negligible, constant time, the update time cost for the entire map is linear in the total number of pedestrians, with a small coefficient.

    The main purpose of this perception map is to enable the efficient perceptual query by a given pedestrian of nearby pedestrians that are within its sensing range. The sensing range here is defined by a fan, as illustrated in the right part of Fig. 4. In the mobile object perception map, the set of cells wholly or partly within the fan is divided into subsets, called tiers, based on their distance to the pedestrian. Closer tiers are examined earlier. Once a maximum number (currently set to 16) of nearby pedestrians are perceived, the sensing is terminated. This strategy is intuitively motivated by the fact that usually people can simultaneously pay attention only to a limited number of other people, preferably proximal individuals. Once the set of nearby pedestrians is sensed, further information can be obtained by referring to finer maps, by estimation, or simply by querying a particular pedestrian of interest. Given the sensing fan and the upper bound on the number of sensed pedestrians, perception is a constant-time operation.
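
    A minimal sketch of this map and its tiered query, assuming a dictionary-based grid and caller-supplied tiers (computing the fan decomposition itself is omitted):

        # Sketch of the mobile-object map: each cell holds the ids of the
        # pedestrians inside it, and per-pedestrian updates are O(1).
        class MobileMap:
            def __init__(self, cell_size):
                self.cell_size = cell_size
                self.cells = {}      # (i, j) -> set of pedestrian ids
                self.where = {}      # pedestrian id -> current cell

            def update(self, ped_id, pos):
                c_new = (int(pos[0] / self.cell_size),
                         int(pos[1] / self.cell_size))
                c_old = self.where.get(ped_id)
                if c_old == c_new:
                    return                       # C_old == C_new: do nothing
                if c_old is not None:
                    self.cells[c_old].discard(ped_id)
                self.cells.setdefault(c_new, set()).add(ped_id)
                self.where[ped_id] = c_new

            def sense_nearby(self, tiers, max_seen=16):
                # 'tiers' lists the fan's cells, nearest tier first.
                seen = []
                for tier in tiers:
                    for c in tier:
                        seen.extend(self.cells.get(c, ()))
                    if len(seen) >= max_seen:    # attention limit reached
                        return seen[:max_seen]
                return seen

        m = MobileMap(1.0)
        m.update('H', (0.2, 0.4))
        m.update('C1', (1.3, 0.1))
        print(m.sense_nearby([[(1, 0)], [(2, 0), (1, 1)]]))   # ['C1']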

    A.3. Path maps

    Goal-directed navigation is one of the most important abilities of a pedestrian, and path planning enables a pedestrian to navigate a complex environment in a sensible manner. To facilitate fast and accurate online path planning, we use two types of maps with different data structures: grid maps and quadtree maps. These will be briefly presented in turn, before we discuss path planning.

    A.3.1. Grid path map

    Grid maps, which are useful in visual sensing, are also useful for path planning. We can find a shortest path on a grid map, if one exists, using the well-known A* graph search algorithm [32].

    In our system, grid path maps are used whenever a detailed path is needed. Suppose D is the direct distance between pedestrian H and its target T. Then, a detailed path is needed for H if D is smaller than a user-defined constant D_max and there are obstacles between H and T. This occurs, for instance, when one wants to move from behind a chair to the front and sit on it. Clearly, the accuracy of the path in this instance depends on the size of the cells in the grid path maps. A small cell size results in a large search space and, likely, low performance. However, detailed paths are usually not needed unless the target is close to the starting point. Therefore, chances are that paths can be found quickly, while the search has covered only a small portion of the entire search space. Roughly speaking, in most cases the space that must be searched is bounded by 4(D_max/c)^2, where c is the cell size. Typical values for these constants in our current system are 1 ≤ D_max ≤ 10 m and 10^-1 ≤ c ≤ 1 m.

    When creating grid maps, special care must be taken to facilitate efficient updates and queries. Polygonal bounding boxes of obstacle objects represented on grid maps are enlarged by half the size of a pedestrian's bounding circle. If the center of a pedestrian never enters this buffer area, collisions will be avoided. This enables us to simplify the representation of a virtual pedestrian to a single point, which makes most queries simpler and more efficient.
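
    The buffer construction might be sketched as follows; the occupancy encoding (1 = obstacle) and the square cell-padding approximation of the polygonal enlargement are our simplifications:

        import math

        # Sketch of the buffer trick: inflate obstacle cells by the
        # pedestrian's bounding radius so the planner can treat each
        # pedestrian as a single point.
        def inflate_obstacles(occ, cell_size, radius):
            pad = math.ceil(radius / cell_size)   # cells to pad on each side
            rows, cols = len(occ), len(occ[0])
            out = [row[:] for row in occ]
            for i in range(rows):
                for j in range(cols):
                    if occ[i][j]:                 # original obstacle cell
                        for ii in range(max(0, i - pad),
                                        min(rows, i + pad + 1)):
                            for jj in range(max(0, j - pad),
                                            min(cols, j + pad + 1)):
                                out[ii][jj] = 1
            return out

        occ = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
        print(inflate_obstacles(occ, 0.25, 0.25))  # obstacle grows to 3x3

    Planning then runs on the inflated map; a pedestrian whose center stays out of the inflated cells cannot collide with the original obstacle.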

    A.3.2. Quadtree path map

    Every region has a quadtree map, which supports fast online path planning [5]. Each quadtree map comprises

    1. A list of nodes N_i (i = 0, 1, 2, ..., m), which together cover the entire area of the region (see Step (3) in Fig. 17);
    2. C, the number of levels, i.e., the number of different node cell sizes appearing in the map (which is 3 for the quadtree map in Fig. 17); and
    3. A pointer to an associated grid map with small cell size (see Step (1) in Fig. 17).

    Each node N_i of the quadtree [26] stores the following variables:

    1. L_i, where 0 ≤ L_i < C, the level of the node;


    5. A congestion factor g_i, which is updated at every simulation step and indicates the portion of the area covered by this node that is occupied by pedestrians; and
    6. A distance variable, which indicates how far the area represented by the node is from a given start point, and which is used at path planning time, especially during back-tracking, as a gradient reference to find the shortest way back to the start point.

    As Fig. 17 illustrates, given a grid map with small cells, the algorithm for constructing the quadtree map first builds the list of map levels containing nodes representing increasing cell sizes, where the cell size of an upper-level node is twice as large as that of lower-level nodes. Higher-level nodes, which aggregate lower-level nodes, are created so long as the associated lower-level cells are of the same occupancy type, until a level is reached where no more cells can be aggregated. Quadtree maps typically contain a large number of lower-level nodes (usually over 85% of all nodes) that cover only a small portion (usually under 20%) of the entire region. Such nodes significantly increase the search space for path planning. Thus, in the final stage of construction, these nodes are excluded from the set of nodes that will participate in online path planning. As the area that they cover is small, their exclusion does not cause significant accuracy loss.
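
    The aggregation pass can be sketched as follows, assuming a square occupancy grid whose side is a power of two; None marks a quad whose four children disagree and therefore remain un-aggregated:

        # Sketch of bottom-up quadtree aggregation: four children of the
        # same occupancy type merge into one parent node, level by level.
        def build_levels(grid):
            levels = [grid]                  # level 0: the input grid
            while len(levels[-1]) > 1:
                prev = levels[-1]
                n = len(prev) // 2
                up = [[None] * n for _ in range(n)]
                for i in range(n):
                    for j in range(n):
                        quad = {prev[2*i][2*j], prev[2*i][2*j + 1],
                                prev[2*i + 1][2*j], prev[2*i + 1][2*j + 1]}
                        if len(quad) == 1:   # all four children share one type
                            up[i][j] = quad.pop()
                levels.append(up)
            return levels

        g = [[0, 0, 1, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]
        print(build_levels(g)[1])   # [[0, None], [0, 0]]: top-right stays split

    In the actual maps, the surviving aggregated nodes, together with their levels, congestion factors, and distance variables, form the node list used for online search.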

    A.4. Path planning

    We now overview the three phases of autonomous pedestrian path planning, which from global to local include planning global paths between regions, planning intermediate-length paths within a region, and planning detailed local paths. For the full details of the path planning algorithms, we refer the reader to [31].

    A.4.1. Planning global paths between regions

    In global path planning, the path-to-via information is useful in identifying intermediate regions that lead to the target region. Any intermediate region can be picked as the next region and, by applying the path-searching schemes described below, a path can be planned from the current location to the boundary between the current region and that next region. The process is repeated in the next region, and so on, until it can take place in the target region to terminate at the target location. Although the extent of the path is global, the processing is local.
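
    A sketch of this region-by-region process, with an assumed local planner plan_within and an assumed boundary_of lookup (both hypothetical helpers standing in for the intra-region planning described next):

        # Sketch of global planning: choose the next region from the
        # path-to-via set V(A, T), plan only to the shared boundary, repeat.
        def plan_global(V, region, target_region, pos, goal,
                        plan_within, boundary_of):
            path = []
            while region != target_region:
                # cheapest via-region for the (current, target) pair
                nxt, _ = min(V[(region, target_region)], key=lambda e: e[1])
                gate = boundary_of(region, nxt)  # crossing point of regions
                path += plan_within(region, pos, gate)
                pos, region = gate, nxt
            return path + plan_within(target_region, pos, goal)

        # Toy usage: regions 'A' -> 'C' joined by a gate at (5, 0).
        V = {('A', 'C'): {('C', 1)}}
        plan_within = lambda region, a, b: [b]   # stub local planner
        boundary_of = lambda a, b: (5, 0)
        print(plan_global(V, 'A', 'C', (0, 0), (9, 0),
                          plan_within, boundary_of))   # [(5, 0), (9, 0)]

    Only one region's maps are consulted at a time, which is what keeps the processing local even though the resulting path is global.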

    A.4.2. Planning intermediate-length paths within a region

    Fig. 17. Constructing a quadtree map: (1) input: a grid map; (2) construct the grid map list; (3) construct the node list; (4) find neighbors; (5) output: a quadtree map.

    To plan a path between two widely separated locations within a region, we first employ one of several variants of the A* algorithm to search for a path on the quadtree path map of the region, taking pedestrian congestion into consideration [31]. Fig. 18 compares paths computed by the four search variants of the path planning algorithm. Sometimes, however, quadtree maps may fail to find a path even though one exists. This is because low-level nodes on a quadtree map are ignored during path search in order to decrease the search space (see Section A.3.2). Therefore, paths that must go through some low-level nodes cannot be found on a quadtree map. To resolve this, we turn to grid path maps whenever search fails on a quadtree map. The standard A* algorithm is used to find a path on grid maps. For efficiency, a list of grid maps with decreasing cell sizes is kept for each region. Grid map path search starts at the coarsest level and progresses down the list so long as the search fails on coarser levels. Although grid maps with large cells are likely to merge separate objects due to aliasing, and thus cause the search to fail, the multiresolution search ensures that, as long as a path exists, it will eventually be found on maps with smaller cells.
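
    The coarse-to-fine fallback can be sketched as follows; astar is a textbook grid A* with a Manhattan heuristic, and the list of grids with matching start/goal cells is assumed to be precomputed per region:

        import heapq

        # Textbook A* on a 0/1 occupancy grid; returns None on failure.
        def astar(grid, start, goal):
            rows, cols = len(grid), len(grid[0])
            open_heap = [(0, start)]
            g = {start: 0}
            came = {}
            while open_heap:
                _, cur = heapq.heappop(open_heap)
                if cur == goal:
                    path = [cur]
                    while cur in came:
                        cur = came[cur]
                        path.append(cur)
                    return path[::-1]
                x, y = cur
                for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    nx, ny = nxt
                    if 0 <= nx < rows and 0 <= ny < cols and not grid[nx][ny]:
                        ng = g[cur] + 1
                        if ng < g.get(nxt, float('inf')):
                            g[nxt] = ng
                            came[nxt] = cur
                            h = abs(nx - goal[0]) + abs(ny - goal[1])
                            heapq.heappush(open_heap, (ng + h, nxt))
            return None

        # Sketch of the multiresolution fallback: try coarse grids first,
        # drop to finer ones only while the search keeps failing.
        def multires_search(grids, starts, goals):
            for grid, s, t in zip(grids, starts, goals):
                path = astar(grid, s, t)
                if path is not None:
                    return path          # found at this resolution
            return None

    Aliasing on a coarse grid can only add obstacles, never remove them, so a failure at one level is safely retried at the next finer level.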

    A.4.3. Planning detailed local paths

    If two locations are nearby but obstacles intervene, the finest grid path map of this region is used for finding a detailed path. The path is slightly smoothed, and the pedestrian follows the processed path strictly when approaching the target, as we describe in Section 4.3.2.

    Fig. 18. Comparison of path planning algorithms on quadtree maps. (1) Visualization of the quadtree map of the upper concourse in the Penn Station environment model. The white quads denote ground nodes and the blue ones denote obstacles. The green circle (left) is the start location and the orange circle (right) is the destination. With obstacle quads suppressed for clarity, (2)-(5) show the results of the four path planning schemes: (2) SortedQ, (3) SingleQ, (4) MultiQ, and (5) PmultiQ. The search space is color coded with the distance variable values increasing from green to orange. Note that, although the four paths are similar, the sizes of the search spaces differ. Refer to [31]. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

    A.5. Specialized environment objects

    At the lowest level of our environmental hierarchy are environmental objects. Cells of grid maps and quadtree maps maintain pointers to the set of objects that are partly or wholly within their covered area. Every object has a list of properties, such as name, type, geometry, color/texture, functionality, etc. Many of the objects are specialized to support quick perceptual queries. For instance, every ground object contains an altitude function that responds to ground height sensing queries. A bench object keeps track of how many people are sitting on it and where they are sitting. By querying nearby bench objects, weary pedestrians are able to determine the available seat positions and decide where to sit without further reference to the perceptual maps. Other types of specialized objects include queues (where pedestrians wait in line), purchase points (where pedestrians can make a purchase), entrances/exits, etc. In short, these objects provide a higher-level interpretation of the world that would be awkward to implement with perception maps alone, and this simplifies the situational analysis for pedestrians when they perform autonomous behaviors.
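
    For example, a bench object might expose its seat bookkeeping directly, along the following lines (a hypothetical interface, not the system's actual API):

        # Sketch of a specialized object: a bench that answers seat queries
        # itself, sparing pedestrians a search over the perception maps.
        class Bench:
            def __init__(self, seat_positions):
                self.seats = {pos: None for pos in seat_positions}

            def free_seats(self):
                return [p for p, who in self.seats.items() if who is None]

            def sit(self, ped_id, pos):
                if self.seats.get(pos) is None:
                    self.seats[pos] = ped_id
                    return True
                return False             # seat already taken

        bench = Bench([(0.0, 0.0), (0.6, 0.0)])
        bench.sit('H', (0.0, 0.0))
        print(bench.free_seats())        # [(0.6, 0.0)]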

    Appendix B. The basic reactive behavior routines

    This appendix describes the six basic reactive behavior routines.

    B.1. Routine A: Static obstacle avoidance

    If there is a nearby obstacle in the direction of locomotion, lateral directions to the left and right are tested until a less cluttered direction is found (Fig. 4(b)). If a large angle (currently set to 90 degrees) must be swept before a good direction is found, then the pedestrian will start to slow down, which mimics the behavior of a real person upon encountering a tough array of obstacles; i.e., slow down while turning the head to look around, then proceed.
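
    A sketch of the sweep, where is_clear is an assumed predicate backed by the perception maps, and the angular step and speed values are illustrative:

        # Sketch of Routine A: test directions alternately to the left and
        # right of the current heading; slow down if the sweep grows large.
        def avoid_static(heading, is_clear, step=0.1745, slow_after=1.5708):
            offset = 0.0
            while offset <= 3.1416:      # sweep up to 180 degrees total
                for h in (heading - offset, heading + offset):
                    if is_clear(h):
                        speed = 0.3 if offset > slow_after else 1.0
                        return h, speed  # slow if the sweep exceeded 90 deg
                offset += step
            return heading, 0.0          # fully boxed in: stop

        # Everything left of 0.5 rad is blocked in this toy predicate.
        print(avoid_static(0.0, lambda h: h > 0.5))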

    B.2. Routine B: Static obstacle avoidance in a complex turn

    When a pedestrian needs to make a turn that cannot be finished in one step, it will consider turns with increasing curvatures in both directions, starting with the side that permits the smaller turning angle, until a collision-free turn is found (Fig. 5(b)). If the surrounding space is too cluttered, the curve is likely to degenerate, causing the pedestrian to stop and turn on the spot. The turn test is implemented by checking sample points along a curve with interval equal to the distance of one step of the pedestrian moving with the anticipated turn speed.
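
    The turn test might be sketched as follows; the candidate curvature values, the step length, and the is_clear predicate are our assumptions:

        import math

        # Sketch of Routine B: sample points one step apart along arcs of
        # increasing curvature, on the smaller-turn side first, until an
        # obstacle-free arc is found.
        def find_turn(pos, heading, goal_angle, step_len, is_clear, n_steps=8):
            sides = (1.0, -1.0) if goal_angle >= 0 else (-1.0, 1.0)
            for curvature in (0.1, 0.2, 0.4, 0.8):  # sharper and sharper arcs
                for sign in sides:
                    x, y, h = pos[0], pos[1], heading
                    ok = True
                    for _ in range(n_steps):        # one sample per footstep
                        h += sign * curvature
                        x += step_len * math.cos(h)
                        y += step_len * math.sin(h)
                        if not is_clear((x, y)):
                            ok = False
                            break
                    if ok:
                        return sign * curvature
            return None      # degenerate case: stop and turn on the spot

        # Toy usage: only the half-plane y < 3 is free, so a sufficiently
        # sharp right turn is chosen when the goal lies to the right.
        print(find_turn((0, 0), math.pi / 2, -0.5, 0.6,
                        lambda p: p[1] < 3.0))      # -0.2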

    B.3. Routine C: Maintain separation in a moving crowd

    For a pedestrian H, other pedestrians are considered to be in H's temporary crowd if they are moving in a similar direction to H and are situated within a parabolic region in front of H defined by y = -(4/R)x^2 + R, where R is the sensing range, y is oriented in H's forward direction, and x is oriented laterally (Fig. 5(c)). To maintain a comfortable distance from each individual C_i in this temporary crowd, a directed repulsive force (cf. [12]) given by f_i = -r_i (d_i/|d_i|)/(|d_i| - d_min) is exerted on H, where d_i is the vector separation of C_i from H, and d_min is the predefined minimum distance allowed between H and other pedestrians (usually 2.5 times H's bounding box size). The constant r_i is C_i's perceived "repulsiveness" to H (currently set to 0.025 for all pedestrians). The repulsive acceleration due to H's temporary crowd is given by a = (sum_i f_i)/m, where m is the inertia of H. The acceleration vector is decomposed into a forward component a_f and a lateral component a_l. The components a_f Dt and a_l w_i Dt are added to H's current desired velocity. The crowding factor w_i determines H's willingness to follow the crowd, with a smaller value of w_i giving H a greater tendency to do so (currently 1.0 ≤ w_i ≤ 5.0).
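
    A numerical sketch of the resulting acceleration, assuming the sign conventions reconstructed above and omitting the forward/lateral decomposition and the crowding factor:

        import math

        # Sketch of Routine C's repulsion:
        # f_i = -r_i (d_i/|d_i|)/(|d_i| - d_min), summed over the temporary
        # crowd and divided by the inertia m.
        def crowd_acceleration(h_pos, crowd, m=1.0, r=0.025, d_min=1.0):
            ax = ay = 0.0
            for cx, cy in crowd:
                dx, dy = cx - h_pos[0], cy - h_pos[1]   # d_i: from H to C_i
                dist = math.hypot(dx, dy)
                if dist > d_min:                        # skip degenerate cases
                    mag = -r / (dist * (dist - d_min))  # -r unit(d)/(|d|-d_min)
                    ax += mag * dx
                    ay += mag * dy
            return ax / m, ay / m

        # H at the origin, one crowd member 2 m directly ahead: the
        # acceleration points backward, away from the crowd member.
        print(crowd_acceleration((0.0, 0.0), [(2.0, 0.0)]))   # (-0.025, 0.0)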

    B.4. Routine D: Avoid oncoming pedestrians

    To avoid pedestrians not in one's temporary crowd, a pedestrian H estimates its own velocity v and the velocities v_i of nearby pedestrians C_i. Two types of threats are considered here. By intersecting its own linearly extrapolated trajectory T with the trajectories T_i of each of the C_i, pedestrian H identifies potential collision threats of the first type: cross-collision (Fig. 5(d1)). In the case where the trajectories of H and C_i are almost parallel and will not intersect imminently, a head-on collision (Fig. 5(d2)) may still occur if their lateral separation is too small; hence, H measures its lateral separation from oncoming pedestrians.


    Among all collision threats, H will pick the most imminent one, C*. If C* poses a head-on collision threat, H will turn slightly away from C*. If C* poses a cross-collision threat, H will estimate who will arrive first at the anticipated intersection point p*. If H determines that it will arrive sooner, it will increase its speed and turn slightly away from C*; otherwise, it will decrease its speed and turn slightly towards C* (Fig. 5(d1)). This behavior will continue for several footsteps, until the potential collision has been averted.
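
    The cross-collision case reduces to a two-by-two linear solve for the trajectory crossing point p* followed by an arrival-time comparison; a sketch, with the responses encoded as strings for clarity:

        # Sketch of Routine D's cross-collision test: extrapolate both
        # trajectories, find the crossing, and compare arrival times.
        def cross_collision_response(h_pos, h_vel, c_pos, c_vel, eps=1e-9):
            # Solve h_pos + t1 * h_vel == c_pos + t2 * c_vel for t1, t2.
            det = -h_vel[0] * c_vel[1] + h_vel[1] * c_vel[0]
            if abs(det) < eps:
                return 'parallel'        # handled by the head-on test
            rx, ry = c_pos[0] - h_pos[0], c_pos[1] - h_pos[1]
            t1 = (-rx * c_vel[1] + ry * c_vel[0]) / det
            t2 = (h_vel[0] * ry - h_vel[1] * rx) / det
            if t1 < 0 or t2 < 0:
                return 'no threat'       # the crossing lies in the past
            if t1 < t2:
                return 'speed up, turn away'
            return 'slow down, turn toward'

        # H heads east at 1.5 m/s; C crosses northward and arrives later,
        # so H speeds up and turns slightly away.
        print(cross_collision_response((0, 0), (1.5, 0), (2, -4), (0, 1)))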

    B.5. Routine E: Avoid dangerously close pedestrians

    This is the fail-safe behavior routine, reserved for emergencies due to the occasional failure of Routines C and D, since in highly dynamic situations predictions have a nonzero probability of being incorrect. Once a pedestrian perceives another pedestrian within its forward safe area (Fig. 5(e)), it will resort to a simple but effective behavior: brake as soon as possible to a full stop, then try to turn to face away from the intruder, and proceed when the way ahead clears.

    B.6. Routine F: Verify new directions relative to obstacles

    Since the reactive behavior routines are executed sequentially (see Section 4.3.1), motor control commands issued by Routines C, D, or E to avoid pedestrians may counteract those issued by Routines A or B to avoid obstacles, thus steering the pedestrian towards obstacles again. To avoid this, the pedestrian checks the new direction against surrounding obstacles once more. If the way is clear, it proceeds. Otherwise, the original direction issued either by the higher-level path planning modules or by Routine A, whichever was executed most recently prior to the execution of Routine F, will be used instead. However, occasionally this could lead the pedestrian toward future collisions with other pedestrians (Fig. 5(f)) and, if so, it will simply slow down to a stop, let those threatening pedestrians pass, and proceed.

    Appendix C. Optimally ordering the reactive behavior routines

    This appendix presents the details of the exhaustive search procedure (cf. [24]) that we employed to find the best permutation ordering for activating the six reactive behavior routines listed in Appendix B, which is "CABFED".

    C.1. Fitness

    We evaluate the performance of the different activation permutations of the reactive behavior routines through a set of long-term simulations with different numbers of pedestrians. The performance measure is the overall fitness F = (1/N) sum_{i=1}^{N} F_i of a population of N pedestrians over the course of a simulation. Here, F_i = (S_i V_i)^4 is the individual pedestrian fitness, where the safety factor S_i = (1 - C_i/50)^2 if C_i < 50 and S_i = 0 otherwise, with C_i denoting the average number of frames in every 10,000 frames that pedestrian i is involved in a collision either with stationary obstacles or with other pedestrians, and where the vitality factor V_i = 1 if R_i > 0.5 and V_i = (2R_i)^8 otherwise, with the speed ratio R_i defined as the average speed of pedestrian i in the simulation divided by the pedestrian's preferred speed. Note that the safety and vitality factors, both of which take values in the range [0, 1], conflict in the sense that one can be guaranteed by sacrificing the other. Their product in F_i rewards the best tradeoff between safety and vitality.
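
    A direct transcription of the measure, with the exponents as reconstructed above (an assumption where the source typography is ambiguous):

        # Sketch of the fitness measure: C is the average number of
        # collision frames per 10,000 frames, R is the speed ratio.
        def pedestrian_fitness(C, R):
            S = (1 - C / 50.0) ** 2 if C < 50 else 0.0   # safety factor
            V = 1.0 if R > 0.5 else (2 * R) ** 8         # vitality factor
            return (S * V) ** 4

        def overall_fitness(population):
            return sum(pedestrian_fitness(C, R)
                       for C, R in population) / len(population)

        # Two pedestrians: one safe and brisk, one collision-prone and slow.
        print(overall_fitness([(5, 0.9), (30, 0.4)]))

    A pedestrian that is both safe (small C) and lively (R near the preferred speed) scores near 1, while sacrificing either factor collapses F_i toward 0.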

    C.2. Analysis of the results

    Fig. 19. Plot of the fitness values of all 720 activation permutations, with permutation index (0-720) on the x-axis and fitness (0.2-1.0) on the y-axis; the marked levels F_b and F_c correspond to permutations beginning with Routines B and C, respectively. The best activation permutation "CABFED" is at x = 246.

    Fig. 19 plots the fitness (y-axis) of all 720 possible activation permutations of the six reactive behavior routines (x-axis). Before carrying out this exhaustive evaluation, we had originally guessed that a sensible activation order of the reactive behavior routines is simply "ABCDEF", which occurs at x = 1 in the plot. Clearly, this choice yields a performance no better than average. Although the plot at first appears chaotic, we see a significant increase in the fitness function after x = 240, which marks the transition from permutations that begin with reactive behavior Routine B (in the range 121 ≤ x ≤ 240) to those that begin with Routine C (in the range 241 ≤ x ≤ 360). Comparing the fitness values F_b in the former range to those F_c in the latter, we see that the mean fitness value of activation permutations that begin with Routine C is higher than the maximum fitness value of those that begin with Routine B. Sorting permutations by partial ordering and comparing the shape of different parts of the plot, etc., we made the following observations:

    • Permutations P starting with A and B usually give poor performance.
    • The later that D appears in P, the better the performance.
    • It is better for C to appear before A, but they need not be adjacent.
    • It is better for A to appear before B, but they need not be adjacent.
    • If F appears earlier than C, D, and E, it results in the same performance as when F is omitted.
    • Almost all of the high-performance permutations have A before B and after C, and end with D.

    It is difficult to fully explain the above results, but we offer the following four plausible heuristics:

    1. Routine C should be activated befor