
Constraints and Design Choices in Building Intelligent Pilots for Simulated Aircraft: Extended Abstract

Milind Tambe, Paul S. Rosenbloom and Karl Schwamb
Information Sciences Institute

University of Southern California
4676 Admiralty Way, Marina del Rey, CA 90292
email: {tambe, rosenbloom, schwamb}@isi.edu

1. Introduction

This paper focuses on our recent research effort aimed at developing human-like, intelligent agents (virtual humans) for large-scale, interactive simulation environments (virtual reality). These simulated environments have sufficiently high fidelity and realism [11, 23] that constructing intelligent agents requires us to face many of the hard research challenges faced by physical agents in the real world -- in particular, the integration of a variety of intelligent capabilities, including goal-driven behavior, reactivity, real-time performance, planning, learning, spatial and temporal reasoning, and natural language communication. However, since this is a synthetic environment, these intelligent agents do not have to deal with issues of low-level perception and robotic control. Important applications of this agent technology can be found in areas such as education [14], manufacturing [11], entertainment [2, 12] and training [7, 24].

To begin our effort, we have focused on building intelligent pilot agents to control simulated military aircraft in battlefield simulation environments. These environments are based on Distributed Interactive Simulation (DIS) technology [11], in which large-scale interactive simulations, potentially involving hundreds or even thousands of simulated entities, are built from a set of independent simulators linked together via a network. These simulations provide cost-effective and realistic settings for training and rehearsal, as well as testing new doctrine, tactics and weapon system concepts. Our intelligent pilot (henceforth IP) agents are to engage in simulated combat with other automated agents and humans in these environments. The IPs interact with their environments via simulated aircraft provided by ModSAF [4], a distributed simulator that has been commercially developed for the military.

Most of our early research effort, since the beginning of this project in the Fall of 1992, was focused on the creation of IPs for simulated fighter jets. These IPs, either individually or in groups, can now engage in simulated "air-to-air" combat with other individual fighters or groups of fighters.¹ Since the summer of 1994, we have begun developing IPs for other types of aircraft, such as bombers and attack helicopters, involved in very different types of missions, such as "air-to-ground" missions. Figure 1 shows a snapshot taken from ModSAF’s plan-view display of a combat situation. Here, IPs are controlling simulated AH-64 Apache attack helicopters. The other vehicles in the figure are a convoy of "enemy" ground vehicles such as tanks, and anti-aircraft vehicles. The helicopters are approximately 2.5 miles from the convoy of ground vehicles. (All of the vehicle icons in the figure have been artificially magnified for visibility.) The background shading delineates the terrain and contour lines. The contour lines indicate that the two helicopters are on a hill, just behind a ridge, while the convoy of enemy vehicles is in a valley. This is an ideal location for the IPs to hide their helicopters. The IPs unmask their helicopters by popping out from behind the ridge to launch missiles at the enemy vehicles, and quickly remask (hide) by dipping behind the ridge to survive retaliatory attacks. The smaller windows along the side of the figure display information about the active goal hierarchies of the IPs controlling the helicopters. In this case, one of the IPs has seen the enemy vehicles, and is firing a missile at them. The second IP is unmasking its helicopter. (The small window at the bottom left is for simulation control.)

The eventual target of our effort is to deliver to ARPA IPs for a large variety of simulated military aircraft, including fighter jets, helicopters, bombers, troop transports, refueling aircraft and reconnaissance aircraft. These IPs are to participate in a series of demonstrations called the simulated theater of war (STOW) demonstrations. One such demonstration, called STOW-E, took place in November 1994. It involved about two thousand simulated military vehicles -- some of them controlled by humans and others by automated agents. IPs participated successfully in this demonstration. The next demonstration, in 1997, is planned to be on a much larger scale, involving the participation of ten to fifty thousand automated agents, and up to one thousand humans.

¹The work on fighter jet IPs was a collaborative effort with Randolph Jones, John Laird, Frank Koss and Paul Nielsen of the University of Michigan.


From: AAAI Technical Report SS-95-02. Compilation copyright © 1995, AAAI (www.aaai.org). All rights reserved.


Figure 1: A snapshot of ModSAF’s simulation of an air-to-ground combat situation.

For research and development of the IPs, we have focused on Soar, a software architecture being developed as a basis for integrated intelligent systems [10, 18], and as a unified theory of cognition [15, 17]. (For readers unfamiliar with Soar, a brief description appears in the Appendix.) The system we have developed within Soar, TacAir-Soar [6, 22], represents a generic IP. A specific IP is created by specializing it with specific parameters and domain knowledge. TacAir-Soar-based IPs can fly fighter jets, bombers and attack helicopters, on different types of "aircraft vs aircraft", as well as "aircraft vs ground target" combat missions. The creation of these IPs has consumed about two years’ worth of effort from our research team of about six people.

This paper outlines some of the constraints and design choices in our effort to implement IPs in the form of TacAir-Soar. Overall, the design of TacAir-Soar IPs is guided by two general sets of constraints. The first set of constraints, which we will refer to as top-down constraints, arises from the combat simulation environment. Figure 2 lists these constraints in the form of capabilities that an IP must possess if it is to succeed in this environment. We have attempted to match these capabilities with the list of issues provided by the symposium organizers in their call for papers. The match is not one-to-one, and in some cases, there are multiple capabilities that are relevant to an issue. Two of the issues -- agent generality and awareness of space and time -- are our additions to the list of issues. Note that this simulation environment has been commercially developed based on the specifications of the military, and as agent designers, we have little control over it. Thus, the list of capabilities cannot be modified to suit our (agent designers’) needs.



Issue 1: Knowledge representation and organization: Knowledge must be organized and represented such that IPs possess the following capabilities:

a. Goal-driven behavior: IPs need to accomplish the missions specified to them, while remaining faithful to the mission-related constraints. In Figure 1, the IP’s mission is to provide support to friendly ground forces while being stationed behind a hill. These missions and mission-related parameters help to specify a pilot’s goals, including high-level goals such as execute mission and survive, and lower-level goals such as fire a missile at an opponent, hide from an opponent to evade his/her missile, etc.

b. Knowledge-intensive behavior: For an IP to be able to generate goal-driven behavior, it needs a large amount of knowledge related to the various missions, mission-related parameters, tactics, maneuvers, aircraft, missile and radar types and their parameters, etc.

Issue 2: Coordination of actions and behaviors: In addition to the capability of goal-driven behavior mentioned above, the capabilities relevant to this issue are:

a. Reactivity: Situations in combat environments can change very rapidly. Unexpected events can occur, e.g., an on-target missile may fail to explode, or an aggressive adversary may take some preemptive action disrupting an on-going maneuver. An IP must be able to react to such a rapidly evolving situation.

b. Overlapping of performance of multiple high-level tasks: IPs often have to perform multiple tasks simultaneously, e.g., an IP may be increasing the altitude of its helicopter, while communicating with an IP of a friendly helicopter, and simultaneously attempting to identify the vehicles in its field of view.

c. Planning: A helicopter’s IP may need to replan a route if it sees that the planned one is blocked; or it may need to plan a hiding location for its helicopter after launching missiles at the enemy as in Figure 1.

Issue 3: Learning: Although learning has not proven to be a critical requirement so far, we expect learning to eventually play an important role in improving performance both during and across engagements.

Issue 4: Human-like behavior: For realism in training and tactics development, an IP must exhibit human-like capabilities for goal-driven behavior, reactivity, planning, learning, and in fact all of the other capabilities listed in this figure. In addition, an IP must conform to human reaction times and other human limitations. Thus, an IP must not maneuver a simulated aircraft like either a "superhuman" or an idiot.

Issue 5: Real-time performance: IPs have to participate in combat simulations where they may interact with humans acting in real-time.

Issue 6: Generality: Although our effort began with implementing IPs for simulated F-14 fighter aircraft in a specific air-to-air combat situation, the effort has extended to other combat situations, other fighter aircraft, and now to air-to-ground combat situations faced by helicopters. This has forced a continuous movement towards design generality, so that capabilities can be shared and re-used.

Issue 7: Awareness of space and time: The capabilities involved here include:

a. Spatial reasoning: Helicopter pilots need to reason about the terrain, e.g., to hide their helicopters behind a hill, so that they can protect themselves from enemy fire.

b. Temporal reasoning: IPs need to reason about the effect of actions over time intervals, e.g., to locate the position of an opponent who may have become invisible for a minute.

Issue 8: Interface with other agents: The capabilities involved include:

a. Multi-agent co-ordination: Mission specifications for IPs normally require them to execute missions in groups of two, four or eight aircraft. Coordination among such agents is key to the success of such missions.

b. Agent modeling (especially opponent modeling): For effective interaction with other agents, IPs need to track other agents’ low-level actions, such as the movement of their aircraft, to infer higher-level plans, maneuvers and goals.

c. Communication: Communication is important for coordination among IPs, e.g., they may communicate in performing a coordinated maneuver or exchange information about enemy aircraft.

d. Explanation: An IP needs to explain the reasons for its actions to "domain experts", so that they may have some confidence in its decision-making capability.

Figure 2: List of capabilities for Intelligent Pilots (based on [22]).



The second set of constraints arises from the fact that in designing IPs, we have not engineered a new agent architecture from scratch. Instead, we have begun with Soar as the underlying architecture. Soar embodies a variety of constraints resulting from theoretical and psychological considerations, as well as refinements inspired by its application in a large number and variety of domains over the past dozen years. We will refer to these constraints as bottom-up constraints. A majority of them are positive in that they provide concrete integrated solutions, so that TacAir-Soar-based IPs can easily exhibit many of the required capabilities. However, for some of the capabilities, the constraints provide only a loose framework, and in fact, in some cases they appear to be a hindrance. Should Soar be modified in response to such hindrances? The answer is not obvious as yet. Given that the design decisions in Soar attempt to capture key insights from a variety of domains, architectural modifications are not to be taken lightly. The hindrances that Soar offers could actually be opening up new avenues for addressing the relevant issues.

We may consider the TacAir-Soar IPs, although based on Soar and constrained by Soar, as nonetheless possessing an architecture that is distinct from Soar. This paper will focus on the constraints -- both top-down and bottom-up -- on this IP architecture, as well as the design choices involved in it. The next section discusses these constraints and design choices in more detail. In doing so, it focuses on the issues listed in Figure 2, particularly issues 1-6. Although issues 7-8 are also important, most of the relevant capabilities are currently under investigation, and their interactions with the Soar architecture are as yet unclear.

2. Issues in Designing Intelligent Pilots

We begin with a discussion of the knowledge organization and representation in TacAir-Soar, since that illustrates the basic design of the system. The subsequent sequencing of issues is an attempt to best illustrate various design choices in TacAir-Soar.

2.1. Knowledge Representation and Organization

The top-down environmental constraint on knowledge organization and representation is that TacAir-Soar IPs must have the capabilities for goal-driven and knowledge-intensive behavior. The bottom-up constraint offered by Soar is in the form of its architectural primitives -- specifically, goals, problem spaces, states and operators at an abstract problem space level, and a rule-based system plus a preference language at a lower implementation level -- that aid in knowledge organization and representation (Appendix I presents a description of these levels). All of the knowledge in TacAir-Soar is represented in terms of these architectural primitives at the two levels. In this section, we will focus on the problem space level; we will return to the lower implementation level in the next section.

Given the above top-down and bottom-up constraints, the available design choice is to select the appropriate problem spaces -- with appropriate combinations of operators -- from the knowledge acquired from the domain experts. An appropriate problem space is one in which the relevant knowledge is quickly available to an IP when it attempts to achieve a particular goal. For example, a problem space combining operators for employing different weapon types enables an IP to quickly access relevant knowledge to achieve its goal of destroying an opponent. If the same set of operators were spread out in different problem spaces, an IP would not be able to easily and quickly access them. As another example, consider a complex tactic employed by an IP to avoid enemy fire in Figure 1. It masks its helicopter by dipping behind the ridge, flies at a very low altitude to a different location while still masked behind the ridge (so the enemy cannot predict its location) and then unmasks by popping out from behind the ridge. Representing this tactic as a set of appropriately conditioned operators that mask, select a new masking location, move to the new masking location and unmask, ensures that the knowledge is available in other situations where there may be a need to unmask. If the same tactic were represented as a single monolithic operator, that knowledge would be unavailable (a related motivation is that big monolithic operators are difficult to execute in this dynamic environment).
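The design point above can be sketched in a few lines of toy code. This is a minimal sketch with invented operator names and a toy state representation, not the actual TacAir-Soar productions: each step of the pop-up tactic is a separately conditioned operator, so a piece like "unmask" remains available in any situation that calls for unmasking, rather than being buried inside one monolithic tactic.

```python
# Hypothetical sketch: the pop-up tactic as four independently conditioned
# operators.  Each operator pairs an applicability condition with an effect.

OPERATORS = {
    "mask": (
        lambda s: not s["masked"],
        lambda s: s.update(masked=True)),
    "select-masking-location": (
        lambda s: s["masked"] and s["location"] is None,
        lambda s: s.update(location="new-spot-behind-ridge")),
    "move-to-masking-location": (
        lambda s: s["masked"] and s["location"] is not None and not s["arrived"],
        lambda s: s.update(arrived=True)),
    "unmask": (
        lambda s: s["masked"] and s["arrived"],
        lambda s: s.update(masked=False)),
}

def step(state):
    """Apply the first operator whose condition holds; return its name."""
    for name, (applicable, apply_op) in OPERATORS.items():
        if applicable(state):
            apply_op(state)
            return name
    return None

state = {"masked": False, "location": None, "arrived": False}
trace = [step(state) for _ in range(4)]
print(trace)
# → ['mask', 'select-masking-location', 'move-to-masking-location', 'unmask']
```

Because each operator tests only the situation, any state in which the helicopter is masked and in position makes "unmask" applicable, whatever tactic produced that state.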

To describe the basics of the knowledge organization that results from the above considerations, we will focus on a concrete illustrative example of an IP as it controls a helicopter and fires a missile at a tank, as shown in Figure 1. Figure 3 depicts an IP’s hierarchy of problem spaces and operators. Goals are represented implicitly in this diagram as the desires to apply impassed operators. Boxes indicate problem spaces, with names in bold to their right. Names within boxes indicate operators available within problem spaces. An italicized name indicates a currently selected operator in a problem space; non-italicized names indicate the other available operators in the problem space.

This hierarchy gets generated as follows. In the top-most problem space (TOP-PS), the IP is attempting to accomplish its mission by applying the EXECUTE-MISSION operator. The termination condition of this operator is the completion of the IP’s mission (which is to support friendly ground forces for a mission-specified time period). Since this has not yet been achieved, a subgoal is generated to complete the application of EXECUTE-MISSION. The EXECUTE-MISSION problem space is selected for use in this subgoal. Operators in this problem space -- ATTRIT, RAID, FLY-WING and others -- implement



[Figure 3 (diagram): a hierarchy of problem spaces -- TOP-PS, EXECUTE-MISSION, ATTRIT, ENGAGE, EMPLOY-WEAPONS, EMPLOY-MISSILE and LAUNCH-MISSILE -- and their operators. TOP-PS contains EXECUTE-MISSION; EXECUTE-MISSION contains ATTRIT, RAID and FLY-WING; ATTRIT contains ENGAGE; ENGAGE contains FLY-ROUTE, MASK-&-OBSERVE and EMPLOY-WEAPONS; EMPLOY-WEAPONS contains EMPLOY-MISSILE; EMPLOY-MISSILE contains LAUNCH-MISSILE, SELECT-MISSILE and SUPPORT-MISSILE; LAUNCH-MISSILE contains PUSH-FIRE-BUTTON, POINT-AT-TARGET and LASER-AT-TARGET.]

Figure 3: An IP’s problem space hierarchy as it launches a missile.

the missions that IPs can engage in. The IP selects the ATTRIT operator to perform its mission. The termination condition of this operator -- that the opponent forces are reduced in strength -- is also not yet achieved, leading to a second subgoal of completing this mission. The ATTRIT problem space is selected for use in this subgoal. Within this problem space the IP selects the ENGAGE operator, to engage the opponents in combat. This leads to a third subgoal, into the ENGAGE problem space, within which EMPLOY-WEAPONS is applied to either destroy the opponents or to force them to run away. This leads to a fourth subgoal into the EMPLOY-WEAPONS problem space. Here, the EMPLOY-MISSILE operator indicates the choice of a missile as the weapon to use in the given situation. This leads to the next subgoal. The EMPLOY-MISSILE problem space is selected for use in this subgoal. The LAUNCH-MISSILE operator then indicates that the IP is about to launch a missile. Launching a missile includes the operations of turning to point the missile launcher at an enemy vehicle, providing initial guidance to the missile (e.g., by illuminating the target with a laser), and also pushing the fire button. The PUSH-FIRE-BUTTON operator in the LAUNCH-MISSILE problem space performs this latter function, by issuing a fire missile command to the simulator, thus causing a missile to be fired at the selected target. If the firing of the missile causes the termination conditions of any of the operators in the hierarchy to be satisfied, then that operator is terminated, and all of the subgoals generated in response to that operator are removed -- they are now irrelevant. In this case, since the firing of the missile causes the termination condition of the LAUNCH-MISSILE operator to be achieved, that operator is terminated and replaced by the SUPPORT-MISSILE operator, to provide remote guidance to the missile from the helicopter. The subgoals associated with the LAUNCH-MISSILE operator are flushed.
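The termination-and-flush behavior described above can be illustrated with a toy model. This is an assumed, simplified representation (operator names from Figure 3, invented condition labels), not Soar itself: each level of the hierarchy carries a termination condition, and when an event achieves the condition of some operator, that operator and every subgoal beneath it are removed.

```python
# Toy model of the goal/problem-space stack from Figure 3.  Each entry is
# (operator, termination condition); deeper entries are subgoals of shallower
# ones.  The condition labels are invented for illustration.

hierarchy = [
    ("EXECUTE-MISSION", "mission-complete"),
    ("ATTRIT",          "enemy-attrited"),
    ("ENGAGE",          "enemy-destroyed-or-fleeing"),
    ("EMPLOY-WEAPONS",  "enemy-destroyed-or-fleeing"),
    ("EMPLOY-MISSILE",  "missile-employed"),
    ("LAUNCH-MISSILE",  "missile-fired"),
]

def process_event(stack, achieved):
    """Terminate the highest operator whose condition is now achieved,
    removing it and every subgoal generated beneath it (now irrelevant)."""
    for i, (op, condition) in enumerate(stack):
        if condition in achieved:
            return stack[:i]
    return stack

# PUSH-FIRE-BUTTON issues the fire command, making "missile-fired" true:
after = process_event(hierarchy, {"missile-fired"})
print([op for op, _ in after])
# → ['EXECUTE-MISSION', 'ATTRIT', 'ENGAGE', 'EMPLOY-WEAPONS', 'EMPLOY-MISSILE']
```

In the real system the vacated slot in the EMPLOY-MISSILE space is then filled by selecting the SUPPORT-MISSILE operator.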

This goal/problem space hierarchy provides TacAir-Soar the capability of goal-driven behavior. In addition, it also provides the capability of knowledge-intensive behavior. The knowledge-intensive aspect arises because all of TacAir-Soar’s behavior is driven by accessing relevant knowledge about action. The generation of alternative operators occurs via rules that examine the goal, problem space, and state and create instantiated operators that may be relevant. Once these operators are generated, further rules fire to generate preferences that help select the operator most appropriate for the current situation. Once an operator is selected, additional rules fire to perform the appropriate actions.
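The three rule phases just described -- proposal, preference and application -- can be sketched schematically. The rules and situation features below are invented stand-ins for TacAir-Soar's actual productions, kept deliberately tiny:

```python
# Schematic sketch of the three rule phases (hypothetical rules, not the
# system's real productions).

def proposal_rules(situation):
    """Rules that examine the situation and generate candidate operators."""
    candidates = []
    if situation["enemy-in-range"]:
        candidates.append("employ-missile")
    if situation["under-fire"]:
        candidates.append("evade")
    return candidates

def preference_rules(situation, candidates):
    """Rules that rank candidates; here, survival outranks attack."""
    ranked = sorted(candidates, key=lambda op: 0 if op == "evade" else 1)
    return ranked[0] if ranked else None

def application_rules(operator):
    """Rules that carry out the selected operator's actions."""
    return {"employ-missile": "point-launcher-and-fire",
            "evade": "remask-behind-ridge"}.get(operator)

situation = {"enemy-in-range": True, "under-fire": True}
selected = preference_rules(situation, proposal_rules(situation))
print(selected, "->", application_rules(selected))
# → evade -> remask-behind-ridge
```

The point of the phasing is that knowledge about *what is possible*, *what is preferable*, and *how to act* is kept in separate rules, each of which can be added or specialized independently.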

Some evidence of the appropriateness of our initial problem space and operator design (from fighter jet IPs) came in the form of the extensive re-use of these problem spaces and operators for helicopter IPs. The air-to-air combat missions for fighter jet IPs and the air-to-ground combat missions for helicopter IPs are quite different. However, there are some skills that they do share, such as firing missiles; and indeed, this sharing is reflected in the reuse of problem spaces and operators from the fighter jet IPs in helicopter IPs. Table I illustrates this situation. It shows the total number of problem spaces, operators and rules in the helicopter IPs, the number reused from the fighter jet IPs, and the percentage reuse. Note that we are still in the early stages of developing a helicopter IP, and thus, the amount of re-use could change in the future.²

2.2. Coordination of Behaviors and Actions

The constraints here are as follows:

* Top-down: An IP must have the ability to combine goal-driven and reactive behaviors; it must also

²Since it is not straightforward to get a precise count of rule reuse, it is overestimated somewhat here. In particular, there are rules that are re-used from the fighter jet IPs which, although not useful for this task, are harmless, and have not been weeded out.



Entity           Total in        Reused from      Percentage
                 Helicopter IP   Fighter jet IP   reuse
Problem spaces   14              6                42
Operators        45              29               64
Rules            1196            1030             86

Table I: Reuse of problem spaces, operators and rules in IPs.

have the ability to simultaneously perform multiple high-level tasks. In addition, an IP must have the capability of planning, and constraining its behavior based on the resulting plans [3]. However, since we have only recently begun investigating the issue of planning in this environment, we will not discuss it here.

* Bottom-up: The relevant constraints arising from Soar are its decision procedure and its rule-based representation of knowledge, both of which facilitate TacAir-Soar’s ability to support the above capabilities; and its single goal hierarchy, which possibly hinders these capabilities.

Here, there are two design decisions for TacAir-Soar IPs. The first, rather obvious one, is to take advantage of the features of the Soar architecture that facilitate the combination of goal-driven and reactive behaviors. These features are its open and integrative decision procedure and its production (rule-based) representation. All of TacAir-Soar’s knowledge is encoded in terms of productions. These productions can test the current goals and operators being pursued (e.g., EMPLOY-MISSILE or EVADE-MISSILE), as well as relevant aspects of the current situation (e.g., sensors may indicate a sudden movement by an enemy vehicle). Based on either or both of these tests, the productions may suggest new operators to pursue at any level in the goal hierarchy, generate preferences among suggested operators, and terminate existing operators. When the productions have finished making their suggestions (generating their preferences), the decision procedure integrates all available suggestions/preferences together to determine what changes should actually be made to the goal/operator hierarchy. This enables TacAir-Soar-based IPs to be reactive, and also to combine this reactivity with goal-driven behavior. While using rules in this manner to provide reactivity is similar to the approach taken in pure situated-action systems [1], there is one important difference: the use of persistent internal state. TacAir-Soar IPs cannot rely on the external environment to provide them all of the required cues for effective performance. For instance, even if an IP evades enemy fire by masking its helicopter behind a hill, it needs to maintain information regarding the enemy position on its state. Not doing so would cause it to be repeatedly surprised by enemy fire when it unmasks, giving the enemy ample time to fire at the helicopter. Therefore, TacAir-Soar IPs do engage in a substantial effort to deliberately maintain internal state information.
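The persistent-state point can be made concrete with a minimal illustration. The state layout below is assumed for the sketch and is not TacAir-Soar's actual representation: while the helicopter is masked the sensors report nothing, but the IP's internal state keeps the last observed enemy position rather than being "surprised" each time it unmasks.

```python
# Minimal illustration of persistent internal state across sensor gaps
# (assumed state layout, not the real system's).

class PilotState:
    def __init__(self):
        self.known_enemy_pos = None          # persists across sensor gaps

    def sense(self, sensor_reading):
        """Update memory only when the sensors actually see the enemy."""
        if sensor_reading is not None:
            self.known_enemy_pos = sensor_reading

    def expected_threat_on_unmask(self):
        """Available even while masked, when the sensors report nothing."""
        return self.known_enemy_pos

ip = PilotState()
ip.sense((1200, 340))    # enemy convoy observed before masking
ip.sense(None)           # masked behind the hill: no sensor data
print(ip.expected_threat_on_unmask())
# → (1200, 340)
```

A purely situated-action agent, which rederives everything from current sensor input, would return nothing here and be surprised on every unmask.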

With respect to simultaneous performance of multiple high-level tasks, there is an apparent hindrance from the Soar architecture -- TacAir-Soar cannot directly construct multiple independent goal hierarchies in service of multiple high-level tasks. For instance, consider a situation where an IP needs to reason about a new vehicle in its field of view while simultaneously attempting to fire a missile. In this case, the IP constructs a goal hierarchy for firing the missile, as shown in Figure 3. The problem is that, though a second goal hierarchy is also required here -- to identify the new vehicle in the field of view, to check if it is a threat to the IP’s helicopter, etc. -- the IP cannot independently construct it because of the constraint imposed by Soar that there be only a single goal hierarchy.

There are two general approaches to resolving this apparent mismatch between Soar’s architectural features and the requirements of the environment [8]. The first approach is to view the single goal hierarchy as a negative constraint, and attempt to relax the constraint by proposing a change in the Soar architecture to allow the use of a forest of goal hierarchies. Such a forest would enable TacAir-Soar IPs to easily engage in simultaneous performance of multiple high-level tasks. However, changing the Soar architecture in this manner leads to important questions regarding the interactions of this approach with other aspects of the architecture. These questions are actively under investigation.

The second approach is to consider the mismatch caused by Soar’s single goal hierarchy as only an apparent one rather than a real one, and attempt to develop solutions that will conform to this constraint. These include our currently implemented solution in TacAir-Soar. It requires an IP to commit to just one of the goal hierarchies, and to install operators for the remaining high-level goals, as needed, at the bottom of this goal hierarchy. Thus, in the above example, the IP commits to a hierarchy for firing a missile, but then installs an identify-vehicle operator as a subgoal of the push-fire-button operator. While this solution can be made to work, one significant problem here is that any attempt to update the upper goal hierarchy eliminates the lower one, because Soar automatically interprets "lower" as meaning "dependent on", even though the hierarchies really are independent here. We generally attempt to minimize this problem by ensuring that the lower hierarchy is present for only a brief instant; however, a more general solution seems necessary. A second solution in this class is to attempt to merge operators across goal hierarchies so that a single hierarchy can represent the state of all of the hierarchies simultaneously [5]. However, doing automatic merging of operators is an open issue. A third solution attempts to flush the current goal hierarchy whenever a new one needs to be established, while depending on learning to compile goal hierarchies into single operators, and thus to reduce (or eliminate) the overhead of constantly

208

Page 7: Constraints and Design Choices in Building Intelligent Pilots for Simulated Aircraft ... · 2006-01-11 · To begin our effort, we have focused on building intelligent pilot agents

flushing and rebuilding large goal hierarchies [19].
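The commit-to-one-hierarchy workaround can be pictured as a single goal stack. The following is an illustrative sketch, not TacAir-Soar code, with hypothetical class and goal names; its point is that an independent goal installed at the bottom of the stack is lost whenever anything above it changes.

```python
# Sketch: a single goal stack in which an independent high-level goal
# ("identify-vehicle") must be installed below the committed hierarchy,
# and any update higher in the stack flushes it.

class GoalStack:
    def __init__(self):
        self.stack = []  # top-level goal first

    def push(self, goal):
        self.stack.append(goal)

    def update_at(self, depth, goal):
        # Replacing a goal at some depth discards everything below it,
        # mirroring Soar's treatment of lower goals as dependent ones.
        self.stack = self.stack[:depth] + [goal]

hierarchy = GoalStack()
for g in ["fire-missile", "push-fire-button", "identify-vehicle"]:
    hierarchy.push(g)

# Updating the upper hierarchy eliminates the independent lower goal,
# even though identify-vehicle did not actually depend on it.
hierarchy.update_at(1, "select-next-target")
assert "identify-vehicle" not in hierarchy.stack
```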

2.3. Support for Learning

The top-down constraint on learning is that it is an important capability for IPs, though not a critical one as yet; while the bottom-up constraint is that learning must occur via chunking, which is Soar’s only learning mechanism. In general, Soar systems tightly couple chunking with performance, and the benefits of this tight coupling have already been illustrated in a variety of Soar systems. However, TacAir-Soar does not perform extensive planning or search (or other non-execution-oriented cognitive activities), which is where learning can provide extensive speedups; as explained later, TacAir-Soar is for the most part a model of expert execution behavior. Nonetheless, chunking can be used within the current system to "collapse" the goal hierarchy and compile the knowledge even further for a linear speedup.

Experiments with chunking in TacAir-Soar have, however, uncovered three interesting issues. The first issue relates to the fact that external actions in this domain take relatively extensive amounts of time compared to the time it takes to expand the goal hierarchy. For instance, consider an IP controlling the helicopter by expanding the goal hierarchy to unmask from a position behind a hill (see Figure 1). Expanding this goal hierarchy takes 0.1 to 0.2 seconds. Chunking could cut down this time to, say, 0.01 seconds, by providing commands to unmask without requiring the expansion of the hierarchy. However, the execution of the commands in the simulated world takes about 100 seconds. Chunking simply cannot improve upon this execution time -- given the speed of the helicopter, it always takes about 100 seconds to unmask. Thus, if we compute the overall speedup in performance due to chunking for the task of unmasking, as is normally done in other Soar systems, it is negligible: 100.1/100.01. Second, typically, in Soar systems, chunks obviate subgoaling since they instantaneously produce results that originally required the subgoaling. However, in TacAir-Soar, the long execution times inhibit the instantaneous reproduction of results. For instance, even after a chunk issues a command to unmask, the helicopter takes 100 seconds to unmask. This delay causes Soar to expand the goal/subgoal hierarchy. Thus, even though chunking seeks to eliminate the expansion of the goal hierarchy, the long execution times prevent immediate feedback and cause the hierarchy to be expanded nonetheless. While neither of these two issues -- the negligible speedup, or the expansion of the hierarchy after chunking -- illustrates necessarily inappropriate behavior, they are contrary to the effects of chunking on most other Soar systems.
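The negligible-speedup figure works out as follows; this is just the arithmetic behind the 100.1/100.01 ratio quoted in the text, not a measurement.

```python
# Chunking shrinks the cognitive expansion time, but the 100-second
# external unmasking action dominates the total, so the overall
# speedup is negligible.

expansion_before = 0.1   # seconds to expand the goal hierarchy
expansion_after = 0.01   # seconds once a chunk issues the command directly
execution = 100.0        # seconds for the helicopter to actually unmask

speedup = (expansion_before + execution) / (expansion_after + execution)
print(round(speedup, 4))  # 1.0009 -- i.e., essentially no speedup
```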

A third issue that comes up in our experiments is the interaction between chunking and the performance of multiple high-level tasks (discussed in the previous section). In particular, chunking is based on the presence of the goal-subgoal hierarchy. Modifications to this hierarchy -- whether in the form of insertion of independent goals as subgoals of existing goals, or in the form of creation of a forest of goal hierarchies -- interact with chunking. There are many such issues, and addressing them is one of the major outstanding challenges.

There are again two approaches being investigated to address this challenge. The first attempts to modify chunking and its dependence on the single goal hierarchy. The second attempts to maintain chunking in its present form, by suggesting changes in other aspects of the architecture, such as subgoaling. There is much on-going research on this topic [20]. Since learning has not been a critical requirement so far -- particularly since expert-level performance is already being modeled -- the key design choice here has been to not introduce learning, at least as yet, in TacAir-Soar.

2.4. Psychology: Mimicking Human Capabilities in Design

Exhibiting human-like capabilities is key to the success of IPs in this environment. This includes human-like flexibility in behavior, as well as conformance to human reaction times and human limitations (such as the inability to make very hard turns). In particular, the motivation in creating IPs is not to win simulated combat by any and all means available. Rather, it is to aid in creating effective battlefield simulations that provide realistic settings for training humans, developing tactics, or evaluating products. For instance, if the IPs react too quickly (or not quickly enough), trainees may learn to act in an excessively defensive (or aggressive) manner. Training in these situations could actually be harmful. Additionally, IPs need to be able to explain their behavior to human experts so that the experts can gain confidence in the IPs’ capabilities. If experts realize that IPs are not performing their tasks as humans would, they may not have confidence in the IPs’ behavior.

Thus, closely mimicking human capabilities in designing IPs is not a choice in this environment, but a necessity. The Soar architecture provides us great support in this endeavor: it is a developing unified theory of cognition, and provides a number of constraints, e.g., on knowledge representation, reaction times, accuracy and learning, that start us off in the right ballpark with respect to psychological verisimilitude.

2.5. Real-time Performance

The top-down constraints on real-time performance are: (i) for human-like behavior, IPs must remain within plausible cognitive limits [15] (decisions must occur in less than 300 milliseconds); (ii) for reasonable performance, IPs must remain within simulator time limits (decisions must occur within 500 milliseconds). While these are not hard deadlines, missing them continuously can be fatal. In addition to these constraints on individual decisions, there is also a top-down constraint on the number of decisions before action. In particular, if IPs engage in extensive meta-level reasoning or planning before acting, that can be problematic in time-critical segments of combat. The bottom-up constraint for real-time performance is that, at a lower implementation level, all of the knowledge in Soar is represented in the form of rules. All of these rules, plus all of the changes that they cause, must be processed at every decision. Fortunately, the Soar architecture is already heavily optimized with respect to this processing, with a new C-based implementation [9] that is a factor of 15-20 faster than the previous Lisp-based implementation. Nonetheless, rule processing can be a significant burden for real-time performance.

Given these constraints, there were two key design decisions for TacAir-Soar. The first involved attacking the problem of IPs engaging in potentially large numbers of decisions before action. The design decision here was to focus on expert-level performance. This expert-level knowledge allows operators and actions to be selected without impasses occurring in Soar, and therefore without extensive numbers of decisions.

The second design decision here is with respect to the time for individual decisions in TacAir-Soar. We have attempted to minimize the time it takes to process rules by using rule coding styles that help ensure inexpensive rule match [21], and by detecting and eliminating unnecessary rule firings. In addition, we have attempted to avoid flooding an IP with unnecessary information, and thus avoid forcing it to perform a lot of low-level computation. For instance, consider an IP controlling a helicopter that is given the task of flying nap-of-the-earth (NOE). This involves flying very close to the terrain, about 25 feet above ground level. Here, the IP needs some information about the terrain, so that it does not crash into the (simulated) forests, hills, or other terrain features. Yet, providing the IP a description of every terrain feature in its field of view would be problematic. The solution we have adopted is to provide the IP information about the altitude and location of the highest point within a hundred meters of its direction of flight -- the IP then attempts to fly 25 feet above this highest point. Although this topic remains under investigation, preliminary results indicate that the information about this single point is sufficient to enable an IP to fly NOE. This approach may seem inappropriate. However, it turns out that designers of military aircraft use similar strategies to reduce a pilot’s cognitive load as much as possible by off-loading many of the low-level, bottom-up computations onto the plane’s instruments.
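The single-point NOE abstraction can be sketched as below. This is a hypothetical illustration (the function name, units, and sample terrain are assumptions, not TacAir-Soar interfaces): the IP is handed only the highest elevation within 100 m along its direction of flight, and commands an altitude 25 feet above it.

```python
# Sketch of the single-point NOE abstraction: rather than reasoning over
# the full terrain, command an altitude 25 feet above the highest point
# reported within 100 m ahead.

FEET_TO_METERS = 0.3048
CLEARANCE_M = 25 * FEET_TO_METERS  # fly about 25 feet above the terrain

def commanded_altitude(terrain_samples_m):
    """terrain_samples_m: elevations (meters) within 100 m of the
    direction of flight. Returns the commanded altitude in meters."""
    highest = max(terrain_samples_m)
    return highest + CLEARANCE_M

# A ridge at 312 m dominates the samples ahead of the aircraft.
print(round(commanded_altitude([290.0, 305.5, 312.0, 301.2]), 2))  # 319.62
```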

3. Conclusion

This paper described the lessons learned in designing some specific agents: IPs for simulated battlefield environments. For these IPs, the simulated environment is their target environment; we do not intend for them to fly real aircraft! Nonetheless, the IPs face the hard research challenge of integrating a broad range of capabilities (see Figure 2) so that they can participate in simulated combat, both with and against humans. To address this challenge, we have been building a system called TacAir-Soar that is based on the Soar integrated architecture. TacAir-Soar IPs may be considered as having their own architecture, which, although distinct from Soar, is nonetheless based on Soar, and constrained by both Soar and the combat simulation environment. The paper discussed the design decisions in TacAir-Soar IPs, and illustrated that: (i) the Soar architecture is capable of providing specific integrated solutions so that TacAir-Soar can easily exhibit a number of the required capabilities; (ii) there are other capabilities for which the Soar architecture provides only a loose framework for their development, or indeed proves somewhat of a hindrance to their development -- these have presented an interesting set of research issues, and they may well contribute to the further evolution of the Soar architecture.

Acknowledgment

We gratefully acknowledge the contribution of the other members of the TacAir-Soar team involved in the creation of the TacAir-Soar system, particularly Randolph Jones, John Laird, Frank Koss, Lewis Johnson, Jill Lehman, Paul Nielsen, and Robert Rubinoff. This research was supported under contract N00014-92-K-2015 from the Advanced Systems Technology Office (ASTO) of the Advanced Research Projects Agency (ARPA) and the Naval Research Laboratory (NRL) to the University of Michigan (and a subcontract to the University of Southern California Information Sciences Institute from the University of Michigan). This project would not have been possible without the critical support of Cmdr. Dennis McBride of ARPA/ASTO; and the technical support of Ed Harvey, Tom Brandt, Bob Richards, and Craig Petersen of BMH Inc.; Andy Ceranowicz and Joshua Smith of Loral Inc.; and David Keirsey of Hughes Aircraft Co.

Appendix I. Soar

The Soar integrated architecture forms the basis of our work on intelligent automated pilots. In the following, Soar is characterized as a two-level hierarchy of descriptions. The higher, problem space level is discussed briefly, followed by a more detailed architecture-level (symbol-level) description.

At the higher, problem space level of description, Soar can be described as a set of interacting problem spaces. All tasks are formulated as achieving goals in these problem spaces. A problem space consists of states and operators. Within a problem space, knowledge is used to repeatedly select and apply operators to a state to reach a goal. A state in a problem space is a Soar agent’s current situation, possibly including perceptions of the external environment as well as internally generated structures. An operator may directly modify this state or it may instigate an external action/event, such as causing an aircraft to turn. The perception of the effects of this external action can then indirectly modify the state. The operator application may be explicitly terminated either upon its successful completion (e.g., the aircraft has turned to the desired heading), or upon a need to abandon it (e.g., it is necessary to dive instead of continuing to turn the aircraft).

At the lower, architectural (implementation) level of description, Soar’s problem space operations can be described in more detail as follows. The knowledge for the selection, application and termination of operators is stored in the form of productions (condition-action rules) in Soar’s knowledge base, a production system. Changes in Soar’s state, including changes caused by perceptions, can trigger these productions to fire. Upon firing, the productions generate preferences for objects in the state as well as for operators. A preference is an absolute or relative statement of the worth of an entity, such as an operator, given the current situation. For instance, a preference might be that an operator to turn left is better than an operator to turn right, given the current situation. Following their generation, Soar’s decision mechanism examines the operator preferences and selects an operator as the current one. This selection can trigger productions for operator application, which, as mentioned above, can modify the state directly or indirectly (by instigating external actions). These modifications may in turn trigger additional productions that further modify the state or generate preferences for the termination of the selected operator and the selection of the next operator. Based on the new set of preferences, the current operator may then be replaced by a different operator. The cycle of operator selection, application and termination can thus continue. However, in some cases, productions generating appropriate preferences may fail to be triggered, e.g., no preferences may be generated for the termination of the selected operator, or for the selection of the next operator. In such cases, an impasse occurs.
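The decision step described above can be sketched as follows. This is a minimal illustration of preference-based selection, not Soar's actual decision procedure (which supports a richer set of preference types); the preference encoding here is an assumption.

```python
# Sketch: productions emit preferences for operators, and the decision
# procedure picks an operator consistent with them. If no single operator
# survives, no decision can be made -- an impasse.

def decide(preferences):
    """preferences: tuples ('acceptable', op) or ('better', op, worse_op)."""
    acceptable = {p[1] for p in preferences if p[0] == "acceptable"}
    dominated = {p[2] for p in preferences if p[0] == "better"}
    candidates = acceptable - dominated
    if len(candidates) != 1:
        return None  # no survivor, or an unresolved tie: an impasse occurs
    return candidates.pop()

prefs = [("acceptable", "turn-left"),
         ("acceptable", "turn-right"),
         ("better", "turn-left", "turn-right")]
print(decide(prefs))  # turn-left
```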

Soar responds to an impasse by creating a subgoal to resolve it. A new problem space is installed in service of this subgoal. The process of selection and application of operators recurs in this subgoal. If a further impasse occurs while resolving one impasse, a new subgoal is created. With this recursive subgoaling, a whole goal/problem-space hierarchy may be generated. An impasse is resolved if the processing in a subgoal can return the requisite result (leading to a decision in the supergoal) to the original space.
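The recursive structure of subgoaling can be sketched as below. This is a toy abstraction with hypothetical goal names, not the Soar implementation: each impasse spawns a subgoal in which the same cycle recurs, and a result found at the bottom is returned upward.

```python
# Sketch of impasse-driven subgoaling: when a goal cannot be decided
# directly, recurse into a subgoal and return its result to the supergoal.

def solve(goal, knowledge):
    """knowledge maps a goal to ('result', value) or ('subgoal', goal)."""
    entry = knowledge.get(goal)
    if entry is None:
        return None  # no knowledge at all: the impasse cannot be resolved
    kind, value = entry
    if kind == "result":
        return value  # a decision succeeds at this level
    # impasse: recurse in a subgoal, then pass its result upward
    return solve(value, knowledge)

kb = {"engage": ("subgoal", "select-weapon"),
      "select-weapon": ("subgoal", "check-range"),
      "check-range": ("result", "fire-missile")}
print(solve("engage", kb))  # fire-missile
```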

Subgoals and their results form the basis of Soar’s sole learning mechanism, chunking. Chunking acquires new productions, called chunks, that summarize the processing that leads to the results of subgoals. A chunk’s actions are based on the results of the subgoals. Its conditions are based on those aspects of the goals above the subgoal that are relevant to the determination of the results. Once a chunk is learned, it fires in relevantly similar future situations, directly producing the required result, possibly avoiding the impasse that led to its formation. This chunking process is a form of explanation-based learning (EBL) [13, 16].
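At a very coarse grain, a chunk behaves like a cached summary of subgoal processing. The following sketch (hypothetical names; real chunks are productions with generalized conditions, not dictionary entries) shows the key behavior: the first encounter requires subgoal processing, while a relevantly similar later situation is handled by the learned chunk directly.

```python
# Sketch of chunking as summarization: a chunk's conditions are the
# aspects of the situation that determined the subgoal's result, and its
# action reproduces that result directly in similar future situations.

chunks = {}

def with_chunking(situation, solve_in_subgoal):
    key = frozenset(situation.items())  # relevant aspects of the situation
    if key in chunks:
        return chunks[key], True        # chunk fires: no subgoal needed
    result = solve_in_subgoal(situation)
    chunks[key] = result                # learn a chunk from the subgoal
    return result, False

slow = lambda s: "unmask" if s["behind-ridge"] else "hold"
print(with_chunking({"behind-ridge": True}, slow))  # ('unmask', False)
print(with_chunking({"behind-ridge": True}, slow))  # ('unmask', True)
```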

References

1. Agre, P. E., and Chapman, D. Pengi: An implementation of a theory of activity. Proceedings of the National Conference on Artificial Intelligence, August, 1987.

2. Bates, J., Loyall, A. B., and Reilly, W. S. Integrating reactivity, goals and emotions in a broad agent. Tech. Rept. CMU-CS-92-142, School of Computer Science, Carnegie Mellon University, May, 1992.

3. Bratman, M. E., Israel, D. J., and Pollack, M. E. "Plans and resource-bounded practical reasoning". Computational Intelligence 4, 4 (1988), 349-355.

4. Calder, R. B., Smith, J. E., Courtemanche, A. J., Mar, J. M. F., and Ceranowicz, A. Z. ModSAF behavior simulation and control. Proceedings of the Conference on Computer Generated Forces and Behavioral Representation, March, 1993.

5. Covrigaru, A. Emergence of meta-level control in multi-tasking autonomous agents. Ph.D. Th., University of Michigan, 1992.

6. Johnson, W. L., Jones, R. M., Keirsey, D., Koss, F. V., Laird, J. E., Lehman, J. F., Nielsen, P. E., Rosenbloom, P. S., Rubinoff, R., Schwamb, K., Tambe, M., van Lent, M., and Wray, R. Collected Papers of the Soar/IFOR project, Spring 1994. Tech. Rept. CSE-TR-207-94, University of Michigan, Department of Electrical Engineering and Computer Science, 1994. Also available as Technical Report ISI/SR-94-379 from the University of Southern California Information Sciences Institute and CMU-CS-94-134 from Carnegie Mellon University; WWW access: http://ai.eecs.umich.edu/ifor/papers/index.html.

7. Jones, R. M., Tambe, M., Laird, J. E., and Rosenbloom, P. Intelligent automated agents for flight training simulators. Proceedings of the Third Conference on Computer Generated Forces and Behavioral Representation, March, 1993.

8. Jones, R., Laird, J. E., Tambe, M., and Rosenbloom, P. S. Generating behavior in response to interacting goals. Proceedings of the Fourth Conference on Computer Generated Forces and Behavioral Representation, May, 1994.

9. Laird, J. E., Congdon, C. B., Altmann, E., and Doorenbos, R. Soar User's Manual: Version 6. 1st edition, 1993.

10. Laird, J. E., Newell, A., and Rosenbloom, P. S. "Soar: An architecture for general intelligence". Artificial Intelligence 33, 1 (1987), 1-64.

11. The DIS Steering Committee. The DIS vision: A map to the future of distributed simulation. Tech. Rept. IST-SP-94-01, Institute for Simulation and Training, University of Central Florida, May, 1994.

12. Maes, P., Darrell, T., Blumberg, B., and Pentland, S. Interacting with animated autonomous agents. Proceedings of the AAAI Spring Symposium on Believable Agents, 1994.

13. Mitchell, T. M., Keller, R. M., and Kedar-Cabelli, S. T. "Explanation-based generalization: A unifying view". Machine Learning 1, 1 (1986), 47-80.

14. Moravec, H. Mind Children. Harvard University Press, Cambridge, Massachusetts, 1990.

15. Newell, A. Unified Theories of Cognition. Harvard University Press, Cambridge, Massachusetts, 1990.

16. Rosenbloom, P. S., and Laird, J. E. Mapping explanation-based generalization onto Soar. Proceedings of the Fifth National Conference on Artificial Intelligence, 1986, pp. 561-567.

17. Rosenbloom, P. S., and Laird, J. E. On Unified Theories of Cognition: A response to the reviews. In Contemplating Minds: A Forum for Artificial Intelligence, W. J. Clancey, S. W. Smoliar, and M. J. Stefik, Eds., MIT Press, Cambridge, MA, 1994, pp. 141-165. (Reprint of Rosenbloom and Laird, 1993, in Artificial Intelligence, vol. 59, pp. 389-413.)

18. Rosenbloom, P. S., Laird, J. E., Newell, A., and McCarl, R. "A preliminary analysis of the Soar architecture as a basis for general intelligence". Artificial Intelligence 47, 1-3 (1991), 289-325.

19. Rubinoff, R., and Lehman, J. Natural language processing in an IFOR pilot. Proceedings of the Fourth Conference on Computer Generated Forces and Behavioral Representation, 1994.

20. Laird, J. E. Discussion of external interaction. Panel discussion at Soar Workshop XIII.

21. Tambe, M., and Rosenbloom, P. Eliminating expensive chunks by restricting expressiveness. Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, 1989, pp. 731-737.

22. Tambe, M., Johnson, W. L., Jones, R., Koss, F., Laird, J. E., Rosenbloom, P. S., and Schwamb, K. "Intelligent agents for interactive simulation environments". AI Magazine (to appear) (1995).

23. Thorpe, J. A., Shiflett, J. E., Bloedorn, G. W., Hayes, M. F., and Miller, D. C. The SIMNET network and protocols. Tech. Rept. 7102, BBN Systems and Technologies Corporation, July, 1989.

24. Webber, B., and Badler, N. Virtual interactive collaborators for simulation and training. Proceedings of the Conference on Computer Generated Forces and Behavioral Representation, May, 1993.