Reinforcement Learning Presentation
Markov Games as a Framework for Multi-agent Reinforcement Learning
Mike L. Littman
Jinzhong Niu
March 30, 2004
Overview
MDPs can describe only single-agent environments.
A new mathematical framework is needed to support multi-agent reinforcement learning: Markov games.
A single step in this direction is explored here: 2-player zero-sum Markov games.
Definitions
Markov Decision Process (MDP)
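The slide's definition did not survive extraction; a standard formalization (a sketch in my notation, consistent with the paper's setting) is:

```latex
\text{MDP} = \langle S, A, T, R \rangle, \qquad
T : S \times A \to \Delta(S), \qquad
R : S \times A \to \mathbb{R}
```

The agent seeks a policy maximizing the expected discounted return $\mathbb{E}\!\left[\sum_{t} \gamma^{t} r_{t}\right]$ with discount factor $0 \le \gamma < 1$.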
Definitions (cont.)
Markov Game (MG)
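A sketch of the general k-player definition (my notation): each agent i has its own action set and reward function, and transitions depend on the joint action:

```latex
\text{MG} = \langle S, A_1, \dots, A_k, T, R_1, \dots, R_k \rangle, \qquad
T : S \times A_1 \times \dots \times A_k \to \Delta(S)
```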
Definitions (cont.)
Two-player zero-sum Markov Game (2P-MG)
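A sketch (my notation): the agent chooses actions from A, the opponent from O, and the rewards sum to zero, so a single reward function R suffices:

```latex
\langle S, A, O, T, R \rangle, \qquad
R_{\text{opponent}}(s, a, o) = -R_{\text{agent}}(s, a, o)
```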
Is 2P-MG Capable Enough?
Yes, although it precludes cooperation!
It generalizes:
MDPs (when |O| = 1)
The opponent has constant behavior, which may be viewed as part of the environment.
Matrix games (when |S| = 1)
The environment holds no information; rewards are decided entirely by the actions.
Matrix Games
Example – “rock, paper, scissors”
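The row player's payoff matrix for this game (rows: rock, paper, scissors; columns: the opponent's same three moves):

```latex
R = \begin{pmatrix} 0 & -1 & 1 \\ 1 & 0 & -1 \\ -1 & 1 & 0 \end{pmatrix}
```

Each entry is the row player's reward; being zero-sum, the opponent receives its negation.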
What exactly does ‘optimality’ mean?
MDP: A stationary, deterministic, and undominated optimal policy always exists.
MG: The performance of a policy depends on the opponent’s policy, so policies cannot be evaluated out of context.
The new definition of ‘optimality’, from game theory: perform best in the worst case, compared with other policies.
At least one optimal policy exists, but it may or may not be deterministic, because the agent is uncertain of its opponent’s move.
Finding Optimal Policy - Matrix Games
The agent’s minimum expected reward should be as large as possible.
Use V to denote this minimum value, then consider how to maximize it.
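This maximization can be posed as a linear program. A minimal sketch using SciPy (variable names are mine, not the paper's; rock-paper-scissors serves as the example game):

```python
# Solve the matrix game max_pi min_o sum_a pi[a] * R[a, o] as a linear program.
import numpy as np
from scipy.optimize import linprog

# Rock-paper-scissors payoff for the row player.
R = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]], dtype=float)

n_a, n_o = R.shape
# Decision variables: pi[0..n_a-1] and the game value v; maximize v == minimize -v.
c = np.zeros(n_a + 1)
c[-1] = -1.0
# For each opponent action o: v - sum_a pi[a] * R[a, o] <= 0.
A_ub = np.hstack([-R.T, np.ones((n_o, 1))])
b_ub = np.zeros(n_o)
# Probabilities sum to 1; v is unconstrained in sign.
A_eq = np.array([[1.0] * n_a + [0.0]])
b_eq = np.array([1.0])
bounds = [(0, 1)] * n_a + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
pi, v = res.x[:n_a], res.x[-1]
print(pi, v)  # uniform (1/3, 1/3, 1/3) with value 0 for this game
```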
Finding Optimal Policy - MDP
Value of a state
Quality of a state-action pair
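These two quantities are related by the standard Bellman optimality equations:

```latex
V(s) = \max_{a \in A} Q(s, a), \qquad
Q(s, a) = R(s, a) + \gamma \sum_{s' \in S} T(s, a, s')\, V(s')
```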
Finding Optimal Policy – 2P-MG
Value of a state
Quality of a state-action-opponent-action (s, a, o) triple
(Figure: V(s) is obtained from the values Q(s, a1, o1), Q(s, a2, o2), Q(s, a3, o3) by mixing over the agent's actions (s, a1), (s, a2), (s, a3) and taking the minimum over the opponent's actions o1, o2, o3.)
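In formulas, matching Littman's minimax-Q formulation:

```latex
V(s) = \max_{\pi \in \Delta(A)} \min_{o \in O} \sum_{a \in A} \pi_a\, Q(s, a, o), \qquad
Q(s, a, o) = R(s, a, o) + \gamma \sum_{s' \in S} T(s, a, o, s')\, V(s')
```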
Learning Optimal Policies
Q-learning
minimax-Q learning
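A sketch of one minimax-Q update: it proceeds like Q-learning, but the backed-up state value is the minimax value of the matrix game Q[s], computed by linear programming (all names here are mine, not from the paper's code):

```python
import numpy as np
from scipy.optimize import linprog

def minimax_value(Q_s):
    """Value of the matrix game Q_s[a, o] for the maximizing row player."""
    n_a, n_o = Q_s.shape
    c = np.zeros(n_a + 1); c[-1] = -1.0            # maximize v
    A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])  # v <= sum_a pi[a] * Q_s[a, o]
    b_ub = np.zeros(n_o)
    A_eq = np.array([[1.0] * n_a + [0.0]]); b_eq = np.array([1.0])
    bounds = [(0, 1)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

def minimax_q_update(Q, s, a, o, r, s2, alpha=0.1, gamma=0.9):
    """Q[s][a, o] <- (1 - alpha) * Q[s][a, o] + alpha * (r + gamma * V(s2))."""
    v_next = minimax_value(Q[s2])
    Q[s][a, o] = (1 - alpha) * Q[s][a, o] + alpha * (r + gamma * v_next)

# Tiny demo: one state whose Q happens to be the rock-paper-scissors matrix
# (minimax value 0), so this particular update leaves Q[0][0, 1] at
# (1 - 0.1) * (-1) + 0.1 * (-1 + 0.9 * 0) = -1.0.
Q = {0: np.array([[0., -1., 1.], [1., 0., -1.], [-1., 1., 0.]])}
minimax_q_update(Q, s=0, a=0, o=1, r=-1.0, s2=0)
```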
Experiment - Training
Four agents trained through 10^6 steps:
minimax-Q learning
vs. random opponent - MR
vs. itself - MM
Q-learning
vs. random opponent - QR
vs. itself - QQ
Experiment - Testing
Test 1: QR > MR?
Test 2: QR << QQ?
Test 3: QR, QQ - 100% loser?
Contributions
A solution to 2-player zero-sum Markov games via a modified Q-learning method in which the max operator is replaced by minimax
Minimax can also be used in single-agent environments to avoid risky behavior.
Future work
Possible performance improvements for the minimax-Q learning method
Linear programming incurs high computational cost.
Iterative methods may be used to obtain approximate minimax solutions much faster, and an approximation is often sufficient.