Learning a Multiagent Behavior: Decision Tree Learning for Pass Evaluation



Page 1

Learning a Multiagent Behavior

Decision Tree Learning for Pass Evaluation

Page 2

Pass Evaluation

Passing requires action by two agents: the receiver's task is identical to that of the defender in Chapter 5 → use the learned ball-interception skill

It's easier to train a pass-evaluation function than to code such a function by hand

collecting data and using it to train the agents

Page 3

Decision Tree Learning

using the C4.5 training algorithm when many features are available

determining the relevant features

handling missing features (i.e. player not visible)

assessing the likelihood that a pass will succeed

Page 4

Training

constrained training scenario

omnipotent agent monitors the trials

training examples do not include full teams

5000 training examples

174 features (passer and receiver)

the features from the receiver's perspective are communicated to the passer

Page 5

The Training Procedure

1. The players are placed randomly within a region
2. The passer announces its intention to pass
3. The teammates reply with their views of the field when ready to receive
4. The passer chooses a receiver randomly during training, or with a DT during testing
5. The passer collects the features of the training instance
6. The passer announces to whom it is passing
7. The receiver and four opponents attempt to get the ball
8. The training example is classified as a success if the receiver manages to advance the ball towards the opponents' goal; a failure if one of the opponents clears the ball in the opposite direction; or a miss if the receiver and the opponents all fail to intercept the ball
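Step 8 amounts to a three-way labeling of each trial; a minimal Python sketch, assuming the trial outcome is encoded as two booleans (an illustrative representation, not the thesis' actual code):

```python
def classify_trial(receiver_advanced, opponent_cleared):
    """Label one training trial as in step 8: success (S) if the
    receiver advanced the ball toward the opponents' goal, failure (F)
    if an opponent cleared it in the opposite direction, miss (M) if
    nobody intercepted the ball. The boolean inputs are an assumed
    encoding of the observed outcome."""
    if receiver_advanced:
        return "S"
    if opponent_cleared:
        return "F"
    return "M"
```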

Page 6

The Features

Page 7

The trained Decision Tree

Pruned tree with 87 nodes

51% successes, 42% failures, 7% misses

26% error rate on the training set

Function Φ(passer, receiver) → [−1, 1]; the DT predicts class κ with confidence γ ∈ [0, 1]

                         γ  if κ = S (success)
Φ(passer, receiver) =    0  if κ = M (miss)
                        −γ  if κ = F (failure)
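A minimal sketch of Φ in Python, assuming the DT prediction is given as a single-character class label ("S", "M", "F") plus a confidence value:

```python
def phi(predicted_class, confidence):
    """Map a DT prediction to a pass-evaluation score in [-1, 1]:
    +confidence for a predicted success, 0 for a miss,
    -confidence for a predicted failure."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    scores = {"S": confidence, "M": 0.0, "F": -confidence}
    if predicted_class not in scores:
        raise ValueError(f"unknown class: {predicted_class}")
    return scores[predicted_class]
```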

Page 8
Page 9

Testing

For testing, the DT chooses the receiver

the other steps are the same as during training

if more than one teammate is classified as a success, the passer passes to the teammate with maximum Φ(passer, teammate)

the passer passes in every case
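This selection rule is simple to sketch in Python; the mapping from teammates to Φ values is an assumed representation:

```python
def choose_receiver(phi_scores):
    """Pick the receiver during testing: the teammate with maximum
    phi(passer, teammate). The passer passes in every case, so a
    receiver is returned even when no pass is predicted to succeed.
    phi_scores: assumed dict mapping teammate -> phi value."""
    if not phi_scores:
        raise ValueError("no teammates available")
    return max(phi_scores, key=phi_scores.get)
```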

Page 10

Results

Success rate without opponents is 86%

Success rate when passing to the closest teammate is 64%

Page 11

Using the Learned Behaviors

Page 12

Scaling up to Full Games

Extend the basic learned behaviors into a full multiagent behavior (designed for testing)

The player needs some mechanism for deciding what to do when it does not have the ball

Is there enough time to execute the ideal pass?

Page 13

RCF: Receiver Choice Function

What should I do if I have the ball?

Input: the agent's perception of the current state

Output: an action (dribble, kick, or pass) and a direction (e.g. towards the goal)

Function: the RCF identifies a set of candidate receivers, then selects a receiver or decides to dribble or kick

Page 14

Three RCFs: PRW, RAND, DT

PRW (prefer right wing) RCF: uses a fixed ordering on the candidate receivers

RAND (random) RCF: chooses randomly from among all candidate receivers

DT (decision tree) RCF: uses the learned decision tree; if the DT does not predict that any pass will succeed, the agent with the ball dribbles or kicks
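The two baseline RCFs can be sketched in a few lines; the representation of the candidate set and the preference ordering are assumptions for illustration:

```python
import random

def prw_rcf(candidates, preference_order):
    """PRW: return the first candidate receiver in a fixed preference
    ordering (e.g. right-wing positions first); None if there is no
    candidate. The ordering itself is an assumed input."""
    for player in preference_order:
        if player in candidates:
            return player
    return None

def rand_rcf(candidates, rng=None):
    """RAND: choose uniformly at random among all candidate receivers."""
    if not candidates:
        return None
    return (rng or random).choice(sorted(candidates))
```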

Page 15

Player Positions

Page 16

Specification of the RCF DT

1. Determine the set C of candidate receivers
2. Eliminate receivers that are closer than 10 or farther than 40
3. Eliminate receivers that are away from their home positions
4. When there is an opponent within 15, eliminate receivers to which the passer cannot kick directly (±130°)
5. IF C = Ø THEN
     IF opponent < 15 THEN return KICK
     ELSE return DRIBBLE
6. ELSE eliminate receivers with Φ(passer, receiver) ≤ 0
     IF C = Ø THEN return KICK or DRIBBLE (as in step 5)
     ELSE return pass to the receiver with max Φ(passer, receiver)
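The six steps above can be sketched in Python. The thresholds 10, 40, and 15 are the ones stated on the slide; the per-receiver data layout (distance, at-home flag, directly-kickable flag) and Φ-score dictionary are assumptions for illustration:

```python
def rcf_dt(candidates, phi_scores, opponent_dist,
           d_min=10, d_max=40, d_opp=15):
    """Sketch of the DT-based receiver-choice function.
    candidates: dict receiver -> (distance, at_home, directly_kickable)
    phi_scores: dict receiver -> phi value from the decision tree
    opponent_dist: distance to the nearest opponent."""
    # Steps 1-4: filter the candidate set C
    C = [r for r, (dist, at_home, kickable) in candidates.items()
         if d_min <= dist <= d_max and at_home
         and (opponent_dist >= d_opp or kickable)]
    # Step 5: no candidates left -> kick or dribble
    if not C:
        return ("KICK", None) if opponent_dist < d_opp else ("DRIBBLE", None)
    # Step 6: keep only receivers the DT rates positively
    C = [r for r in C if phi_scores.get(r, 0.0) > 0]
    if not C:
        return ("KICK", None) if opponent_dist < d_opp else ("DRIBBLE", None)
    return ("PASS", max(C, key=lambda r: phi_scores[r]))
```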

Page 17

Reasoning about Action Execution Time

no turnball behavior

5–15 simulator cycles to move out of the ball's path

opponent can steal the ball → reasoning about the available time

Page 18

The RCF in a Behavior

RCF: used only when the ball is within the kickable area

1. Find the ball's location (after 3 seconds without seeing the ball, the player no longer knows the ball's location)
2. Use the NN to intercept the ball
3. When not chasing the ball → ball-dependent flexible positioning

Page 19

Complete Agent Behavior

d_chase = 10

Page 20

Testing

Behaviors differ only in their RCFs

4-3-3 formation (makes passing useful)

using only the ball-dependent player-positioning algorithm → every player is covered by one opponent

in reality some players are typically more open than others → test the RCFs against the OPR (only play right) formation

Page 21

Results

34 five-minute games

Page 22
Page 23
Page 24

Action-Execution Time

Assumption: there is never an opponent within d_min → "no rush" DT