Learning a Multiagent Behavior
Decision Tree Learning for Pass Evaluation
Pass Evaluation
passing requires action by two agents: the receiver's task is identical to that of the defender in Chapter 5 → use the learned ball-interception skill
It's easier to train a pass-evaluation function than to code such a function by hand
collecting data and using it to train the agents
Decision Tree Learning
using the C4.5 training algorithm
determining the relevant features when many features are available
handling missing features (i.e. player not visible)
assessing the likelihood that a pass will succeed
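As a minimal, hedged sketch of this setup: scikit-learn's DecisionTreeClassifier with the entropy criterion is used below as a stand-in for C4.5 (which scikit-learn does not implement), and the random data and the imputation of invisible players are illustrative assumptions only.

# Sketch: a C4.5-style decision tree for pass evaluation.
# DecisionTreeClassifier(criterion="entropy") approximates C4.5's
# information-gain splitting; unlike C4.5 it does not handle missing
# values the same way, so "player not visible" is imputed here.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((5000, 174))                   # 174 features per pass attempt
X[rng.random(X.shape) < 0.05] = np.nan        # missing feature: player not visible
y = rng.choice(["S", "F", "M"], size=5000)    # success / failure / miss

X_filled = SimpleImputer(strategy="mean").fit_transform(X)
tree = DecisionTreeClassifier(criterion="entropy").fit(X_filled, y)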
Training
constrained training scenario
omnipotent agent monitors the trials
training examples do not include full teams
5000 training examples
174 features (passer and receiver)
the features from the receiver's perspective are communicated to the passer
1. The players are placed randomly within a region
2. The passer announces its intention to pass
3. The teammates reply with their views of the field when ready to receive
4. The passer chooses a receiver randomly during training, or with a DT during testing
5. The passer collects the features of the training instance
6. The passer announces to whom it is passing
7. The receiver and four opponents attempt to get the ball
8. The training example is classified as a success if the receiver manages to advance the ball towards the opponent's goal; a failure if one of the opponents clears the ball in the opposite direction; or a miss if the receiver and the opponents all fail to intercept the ball
The Training Procedure
The Features
The trained Decision Tree
Pruned tree with 87 nodes
51% successes, 42% failures, 7% misses
26% error rate on the training set
Function Φ(passer, receiver) → [-1, 1]
the DT predicts class κ with confidence γ ∈ [0, 1]

Φ(passer, receiver) = γ   if κ = S (success)
Φ(passer, receiver) = 0   if κ = M (miss)
Φ(passer, receiver) = -γ  if κ = F (failure)
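A small Python sketch of Φ, assuming the trained tree exposes per-class confidences (here via scikit-learn's predict_proba; the classifier and feature vector are placeholders, not part of the original system):

# Sketch: Φ(passer, receiver) from the tree's predicted class and confidence.
# clf is any fitted classifier with classes "S", "F", "M"; x is the feature
# vector describing the candidate pass.
import numpy as np

def phi(clf, x):
    probs = clf.predict_proba([x])[0]
    k = clf.classes_[np.argmax(probs)]   # predicted class κ
    gamma = float(np.max(probs))         # confidence γ in [0, 1]
    if k == "S":
        return gamma                     # success
    if k == "F":
        return -gamma                    # failure
    return 0.0                           # miss

With the tree from the earlier sketch, phi(tree, X_filled[0]) scores a single candidate pass in [-1, 1].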
Testing
For testing, the DT chooses the receiver
other steps are the same as during training
if more than one is classified as successful, the passer passes to the teammate with maximum Φ(passer, teammate)
the passer passes in every case
Results
Success rate without opponents is 86%
Success rate when passing to the closest teammate is 64%
Using the Learned Behaviors
Scaling up to Full Games
Extend the basic learned behaviors into a full multiagent behavior (designed for testing)
The player needs some mechanism for deciding what to do when it does not have the ball
Is there enough time to execute the ideal pass?
RCF: Receiver Choice Function
What should I do if I have the ball?
Input: the agent's perception of the current state
Output: an action (dribble, kick, or pass) and a direction (e.g. towards the goal)
Function: the RCF identifies a set of candidate receivers, then selects a receiver or decides to dribble or kick
Three RCFs: PRW, RAND, DT
PRW (prefer right wing) RCF: uses a fixed ordering on the candidate receivers
RAND (random) RCF: chooses randomly from among all candidate receivers
DT (decision tree) RCF: uses the learned decision tree; if the DT does not predict that any pass will succeed, the agent with the ball dribbles or kicks
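A brief sketch of the PRW and RAND selection rules over a shared candidate list; the Teammate type and the use of the uniform number as the fixed PRW ordering key are illustrative assumptions, only the selection rules themselves come from the slide (the DT variant is specified in detail further below).

# Sketch: PRW and RAND choices over the candidate-receiver list.
import random
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Teammate:
    uniform_number: int   # assumed stand-in for the fixed PRW ordering

def prw_choice(candidates: List[Teammate]) -> Optional[Teammate]:
    # Prefer right wing: pick according to a fixed ordering on the receivers.
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.uniform_number)

def rand_choice(candidates: List[Teammate]) -> Optional[Teammate]:
    # Random: choose uniformly from among all candidate receivers.
    return random.choice(candidates) if candidates else None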
Player Positions
Specification of the RCF DT
1. Determine the set C of candidate receivers
2. Eliminate receivers that are closer than 10 or farther than 40
3. Eliminate receivers that are away from their home position
4. When there is an opponent within 15, eliminate receivers to which the passer cannot kick directly (+/- 130°)
5. IF C = Ø THEN
   IF opponent < 15 THEN return KICK ELSE return DRIBBLE
6. ELSE eliminate receivers with Φ(passer, receiver) <= 0
   IF C = Ø THEN return kick or dribble (as in step 5)
   ELSE return pass to the receiver with max Φ(passer, receiver)
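A hedged Python sketch of steps 1-6, assuming each candidate carries its precomputed distance, displacement from its home position, required kick angle, and Φ score; the numeric thresholds come from the steps above, while the types, field names, and the home-displacement cutoff are illustrative.

# Sketch of the DT receiver-choice function (steps 1-6 above).
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    distance: float            # distance from the passer
    home_displacement: float   # how far the player is from its home position
    kick_angle: float          # angle needed to kick directly, in degrees
    phi: float                 # Φ(passer, receiver) from the decision tree

def rcf_dt(candidates: List[Candidate],
           nearest_opponent_dist: float,
           max_home_displacement: float = 10.0) -> str:
    c = [r for r in candidates if 10 <= r.distance <= 40]               # step 2
    c = [r for r in c if r.home_displacement <= max_home_displacement]  # step 3
    if nearest_opponent_dist < 15:                                      # step 4
        c = [r for r in c if abs(r.kick_angle) <= 130]
    if not c:                                                           # step 5
        return "KICK" if nearest_opponent_dist < 15 else "DRIBBLE"
    c = [r for r in c if r.phi > 0]                                     # step 6
    if not c:
        return "KICK" if nearest_opponent_dist < 15 else "DRIBBLE"
    best = max(c, key=lambda r: r.phi)
    return f"PASS to receiver with Φ={best.phi:.2f}"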
Reasoning about Action Execution Time
no turnball behavior
5 - 15 simulator cycles to move out of the ball's path
opponent can steal the ball → reasoning about the available time
The RCF in a Behavior
RCF: only when the ball is within kickable-area
1. Find the ball's location (after 3 seconds without seeing the ball the player does not know the ball's location)
use NN to intercept the ball
when not chasing the ball → ball-dependent flexible positioning
Complete Agent Behavior
d_chase = 10
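A hedged sketch of how these pieces could fit together in the top-level behavior; the kickable-area radius and the reading of d_chase = 10 as a chase radius are assumptions, and the returned strings stand in for the actual skills (RCF, NN interception, flexible positioning).

# Sketch of the complete agent behavior around the RCF.
def choose_mode(seconds_since_ball_seen: float,
                dist_to_ball: float,
                kickable_area: float = 1.0,   # assumed simulator constant
                d_chase: float = 10.0) -> str:
    if seconds_since_ball_seen > 3:
        return "SEARCH"        # ball location unknown: find the ball first
    if dist_to_ball <= kickable_area:
        return "RCF"           # the RCF runs only with the ball kickable
    if dist_to_ball <= d_chase:
        return "INTERCEPT"     # chase: use the learned NN interception skill
    return "POSITION"          # otherwise: ball-dependent flexible positioning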
Testing
Behaviors differ only in their RCFs
4-3-3 formation (makes passing useful)
use only the ball-dependent player-positioning algorithm → every player is covered by one opponent
in reality some players are typically more open than others → test the RCFs against the OPR (only play right) formation
Results
34 five-minute games
Action-Execution Time
Assumption: there is never an opponent within d_min → no-rush DT