Learning a Multiagent Behavior
Decision Tree Learning for Pass Evaluation
Pass Evaluation
passing requires action by two agents: the receiver's task is identical to that of the defender in Chapter 5 → use the learned ball-interception skill
It's easier to train a pass-evaluation function than to code such a function by hand
collecting data and using it to train the agents
Decision Tree Learning
using the C4.5 training algorithm
determining the relevant features when many features are available
handling missing features (i.e. player not visible)
assessing the likelihood that a pass will succeed
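As a minimal, hedged sketch of this setup: scikit-learn's DecisionTreeClassifier with the entropy criterion is used below as a stand-in for C4.5 (which scikit-learn does not implement), and the random data and the imputation of invisible players are illustrative assumptions only.

# Sketch: a C4.5-style decision tree for pass evaluation.
# DecisionTreeClassifier(criterion="entropy") approximates C4.5's
# information-gain splitting; unlike C4.5 it does not handle missing
# values the same way, so "player not visible" is imputed here.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((5000, 174))                   # 174 features per pass attempt
X[rng.random(X.shape) < 0.05] = np.nan        # missing feature: player not visible
y = rng.choice(["S", "F", "M"], size=5000)    # success / failure / miss

X_filled = SimpleImputer(strategy="mean").fit_transform(X)
tree = DecisionTreeClassifier(criterion="entropy").fit(X_filled, y)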
Training
constrained training scenario
omnipotent agent monitors the trials
training examples do not include full teams
5000 training examples
174 features (passer and receiver)
the features from the receiver's perspective are communicated to the passer
1. The players are placed randomly within a region
2. The passer announces its intention to pass
3. The teammates reply with their views of the field when ready to receive
4. The passer chooses a receiver randomly during training, or with a DT during testing
5. The passer collects the features of the training instance
6. The passer announces to whom it is passing
7. The receiver and four opponents attempt to get the ball
8. The training example is classified as a success if the receiver manages to advance the ball towards the opponent's goal; a failure if one of the opponents clears the ball in the opposite direction; or a miss if the receiver and the opponents all fail to intercept the ball
The Training Procedure
The Features
The trained Decision Tree
Pruned tree with 87 nodes
51% successes, 42% failures, 7% misses
26% error rate on the training set
Function Φ(passer, receiver) → [-1, 1]
the DT predicts class κ with confidence γ ∈ [0, 1]

Φ(passer, receiver) = γ   if κ = S (success)
Φ(passer, receiver) = 0   if κ = M (miss)
Φ(passer, receiver) = -γ  if κ = F (failure)
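A small Python sketch of Φ, assuming the trained tree exposes per-class confidences (here via scikit-learn's predict_proba; the classifier and feature vector are placeholders, not part of the original system):

# Sketch: Φ(passer, receiver) from the tree's predicted class and confidence.
# clf is any fitted classifier with classes "S", "F", "M"; x is the feature
# vector describing the candidate pass.
import numpy as np

def phi(clf, x):
    probs = clf.predict_proba([x])[0]
    k = clf.classes_[np.argmax(probs)]   # predicted class κ
    gamma = float(np.max(probs))         # confidence γ in [0, 1]
    if k == "S":
        return gamma                     # success
    if k == "F":
        return -gamma                    # failure
    return 0.0                           # miss

With the tree from the earlier sketch, phi(tree, X_filled[0]) scores a single candidate pass in [-1, 1].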
Testing
For testing, the DT chooses the receiver
other steps are the same as during training
if more than one is classified as successful, the passer passes to the teammate with maximum Φ(passer, teammate)
the passer passes in every case
Results
Success rate without opponents is 86%
Success rate when passing to the closest teammate is 64%
Using the Learned Behaviors
Scaling up to Full Games
Extend the basic learned behaviors into a full multiagent behavior (designed for testing)
The player needs some mechanism for deciding what to do when it does not have the ball
Is there enough time to execute the ideal pass?
RCF: Receiver Choice Function
What should I do if I have the ball?
Input: the agent's perception of the current state
Output: an action (dribble, kick, or pass) and a direction (e.g. towards the goal)
Function: the RCF identifies a set of candidate receivers, then selects a receiver or decides to dribble or kick
Three RCFs: PRW, RAND, DT
PRW (prefer right wing) RCF: uses a fixed ordering on the candidate receivers
RAND (random) RCF: chooses randomly from among all candidate receivers
DT (decision tree) RCF: uses the learned decision tree; if the DT does not predict that any pass will succeed, the agent with the ball dribbles or kicks
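A brief sketch of the PRW and RAND selection rules over a shared candidate list; the Teammate type and the use of the uniform number as the fixed PRW ordering key are illustrative assumptions, only the selection rules themselves come from the slide (the DT variant is specified in detail further below).

# Sketch: PRW and RAND choices over the candidate-receiver list.
import random
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Teammate:
    uniform_number: int   # assumed stand-in for the fixed PRW ordering

def prw_choice(candidates: List[Teammate]) -> Optional[Teammate]:
    # Prefer right wing: pick according to a fixed ordering on the receivers.
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.uniform_number)

def rand_choice(candidates: List[Teammate]) -> Optional[Teammate]:
    # Random: choose uniformly from among all candidate receivers.
    return random.choice(candidates) if candidates else None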
Player Positions
Specification of the RCF DT
1. Determine the set C of candidate receivers
2. Eliminate receivers that are closer than 10 or farther than 40
3. Eliminate receivers that are away from their home position
4. When there is an opponent within 15, eliminate receivers to which the passer cannot kick directly (+/- 130°)
5. IF C = Ø THEN
   IF opponent < 15 THEN return KICK ELSE return DRIBBLE
6. ELSE eliminate receivers with Φ(passer, receiver) <= 0
   IF C = Ø THEN return kick or dribble (as in step 5)
   ELSE return pass to the receiver with max Φ(passer, receiver)
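A hedged Python sketch of steps 1-6, assuming each candidate carries its precomputed distance, displacement from its home position, required kick angle, and Φ score; the numeric thresholds come from the steps above, while the types, field names, and the home-displacement cutoff are illustrative.

# Sketch of the DT receiver-choice function (steps 1-6 above).
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    distance: float            # distance from the passer
    home_displacement: float   # how far the player is from its home position
    kick_angle: float          # angle needed to kick directly, in degrees
    phi: float                 # Φ(passer, receiver) from the decision tree

def rcf_dt(candidates: List[Candidate],
           nearest_opponent_dist: float,
           max_home_displacement: float = 10.0) -> str:
    c = [r for r in candidates if 10 <= r.distance <= 40]               # step 2
    c = [r for r in c if r.home_displacement <= max_home_displacement]  # step 3
    if nearest_opponent_dist < 15:                                      # step 4
        c = [r for r in c if abs(r.kick_angle) <= 130]
    if not c:                                                           # step 5
        return "KICK" if nearest_opponent_dist < 15 else "DRIBBLE"
    c = [r for r in c if r.phi > 0]                                     # step 6
    if not c:
        return "KICK" if nearest_opponent_dist < 15 else "DRIBBLE"
    best = max(c, key=lambda r: r.phi)
    return f"PASS to receiver with Φ={best.phi:.2f}"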
Reasoning about Action Execution Time
no turnball behavior
5 - 15 simulator cycles to move out of the ball's path
opponent can steal the ball → reasoning about the available time
The RCF in a Behavior
RCF: only when the ball is within kickable-area
1. Find the ball's location (after 3 seconds without seeing the ball the player does not know the ball's location)
use NN to intercept the ball
when not chasing the ball → ball-dependent flexible positioning
Complete Agent Behavior
d_chase = 10
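A hedged sketch of how these pieces could fit together in the top-level behavior; the kickable-area radius and the reading of d_chase = 10 as a chase radius are assumptions, and the returned strings stand in for the actual skills (RCF, NN interception, flexible positioning).

# Sketch of the complete agent behavior around the RCF.
def choose_mode(seconds_since_ball_seen: float,
                dist_to_ball: float,
                kickable_area: float = 1.0,   # assumed simulator constant
                d_chase: float = 10.0) -> str:
    if seconds_since_ball_seen > 3:
        return "SEARCH"        # ball location unknown: find the ball first
    if dist_to_ball <= kickable_area:
        return "RCF"           # the RCF runs only with the ball kickable
    if dist_to_ball <= d_chase:
        return "INTERCEPT"     # chase: use the learned NN interception skill
    return "POSITION"          # otherwise: ball-dependent flexible positioning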
Testing
Behaviors differ only in their RCFs
4-3-3 formation (makes passing useful)
use only the ball-dependent player-positioning algorithm → every player is covered by one opponent
in reality some players are typically more open than others → test the RCFs against the OPR (only play right) formation
Results
34 five-minute games
Action-Execution Time
Assumption: there is never an opponent within d_min → no-rush DT