2
Vision Statement
Helping the world understand … and make informed decisions.
games and the people who play them **
** Potential beneficiaries:
• commercial games companies, and
• their customers.
3
Motivation
• Multi-billion dollar industry, with considerable Canadian activity
• U. of A. has one of the best AI & Games research groups in the world
• Games are good testbeds for A.I. research
• Machine learning has a key role to play:
  • Opponent/user modelling
  • Massive datasets (e.g. play logs)
• Challenging problems for machine learning
  • Opponent modelling: very short time frame, weak data
  • Massive datasets: large number of low-level features
  • Active learning opportunities
  • Human element in the overall system
4
Projects and Status
1. Gameplay Analysis (ongoing)
2. Poker (ongoing, poster)
3. Counter-Strike Log Analysis (new, poster)
4. Go (ongoing, poster)
5. General Game Playing (new)
6. Threat Modelling (complete, poster)
5
AICML personnel (cumulative)
• AICML PIs: M. Bowling, R. Holte, J. Schaeffer
• 8 Software developers
• 3 Postdoctoral Fellows
• 14 Grad students
7
Resources
Grants
• $490K over 3 years, NSERC strategic grant
• $10K/year BioWare gift
• Portion of Jonathan Schaeffer’s iCORE chair
In-kind
• Neverwinter Nights source code (BioWare)
• FIFA’2004 source code (EA) with our gameplay analysis hooks installed at their expense
• BioTools support of competitions we organize
8
Highlights
• IJCAI’03 best paper award
• Winner of AAAI’06 poker-bot competitions, competitive with top human players
• World’s first man-versus-machine poker match
• Currently world’s best 9x9 Go program, competitive with very good humans (Scientific American article)
• Electronic Arts interest in gameplay analysis
• GDC paper
• HQP to EA, BioWare, BioTools, Invidi, Google, Yahoo!
10
The Challenges
• Large game tree (10^18)
• Stochastic element
• Variable number of players (2–10)
• Imperfect information (during play, and after)
• Aim is to maximize winnings, not just win
The last two make it essential to discover and exploit the opponent’s weaknesses.
11
Many Approaches over 12 years
• Rule-based (“expert system”) – Loki
• Search-based – Poki
• Game-theoretic – PsOpti and others
• Opponent modelling
  • Vexbot
  • PDF cutting
  • Parameter Estimation (Bayesian)
  • Strategy Value estimation (“experts”)
12
PsOpti (Sparbot)
• Nash Equilibrium of an abstract poker game
• Bluffing, slow play, etc. fall out from the mathematics.
• Best paper award at IJCAI’03
• Won the AAAI’06 poker-bot competitions
• Has held its own against 2 world-class humans
13
PsOpti2 vs. “theCount”
DIVAT: an unbiased, low variance estimator of winnings
14
Weaknesses of the PsOptis
• The equilibrium strategy for the highly abstract game is far from perfect.
• No opponent modelling.
• Nash equilibrium not the best strategy:
  • Non-adaptive
  • Defensive
• Even the best humans have weaknesses that should be exploited
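The "non-adaptive, defensive" point can be illustrated with a toy example. The sketch below uses rock-paper-scissors as a stand-in for poker (an assumption made purely for brevity; it is not one of the programs described in these slides). The `biased_opponent` frequencies are hypothetical. Against that opponent the equilibrium strategy only breaks even, while a best response to the opponent's actual frequencies earns a positive expected payoff.

```python
# Toy illustration with rock-paper-scissors (a stand-in, not poker).
# PAYOFF[i][j] is our payoff when we play action i and the opponent plays j.
ACTIONS = ["rock", "paper", "scissors"]
PAYOFF = [
    [0, -1, 1],   # rock: ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, ties paper, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper, ties scissors
]


def expected_value(our_strategy, opp_strategy):
    """Expected payoff per round when both sides play mixed strategies."""
    return sum(
        our_strategy[i] * opp_strategy[j] * PAYOFF[i][j]
        for i in range(3)
        for j in range(3)
    )


def best_response(opp_strategy):
    """Pure strategy with the highest expected payoff against opp_strategy."""
    values = [
        sum(opp_strategy[j] * PAYOFF[i][j] for j in range(3)) for i in range(3)
    ]
    best = max(range(3), key=lambda i: values[i])
    return [1.0 if i == best else 0.0 for i in range(3)]


equilibrium = [1 / 3, 1 / 3, 1 / 3]      # Nash equilibrium of rock-paper-scissors
biased_opponent = [0.5, 0.25, 0.25]      # hypothetical opponent that overplays rock

ev_equilibrium = expected_value(equilibrium, biased_opponent)
ev_exploit = expected_value(best_response(biased_opponent), biased_opponent)

print("equilibrium EV:", round(ev_equilibrium, 6))
print("best-response EV:", round(ev_exploit, 6))
```

The best response here is a pure strategy, which is itself exploitable if the opponent adapts; that trade-off is one reason opponent modelling is hard.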
15
Why is Opponent Modelling Hard?
• Short time to learn and exploit model (< 200 hands). Want to simultaneously:
  • Collect information about the opponent
  • Use the information to get higher payoff
  • Not “pay” too much for the information
  • Not be exploitable ourselves
• Imperfect information, even after hand finishes
• High variance
  • chance in the game (the shuffled deck)
  • stochastic opponent strategies
• Properties of the opponent… (next slide)
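The high-variance point can be made concrete with a back-of-the-envelope calculation. The sketch below assumes per-hand winnings are independent with a known standard deviation, and the numbers passed in (a standard deviation of 6 bets per hand and an edge of 0.5 bets per hand) are illustrative assumptions, not figures from these slides. It shows how noisy an average over 200 hands is, and roughly how many hands a naive average would need before the edge stands out from the noise.

```python
import math


def standard_error(std_per_hand, hands):
    """Standard error of the mean winnings per hand after `hands` independent hands."""
    return std_per_hand / math.sqrt(hands)


def hands_needed(std_per_hand, edge_per_hand, z=1.96):
    """Hands required before an edge of `edge_per_hand` exceeds z standard errors."""
    return math.ceil((z * std_per_hand / edge_per_hand) ** 2)


# Illustrative, assumed values (not taken from the slides).
ASSUMED_STD = 6.0    # standard deviation of winnings per hand, in bets
ASSUMED_EDGE = 0.5   # true average advantage per hand, in bets

se_200 = standard_error(ASSUMED_STD, 200)
n_required = hands_needed(ASSUMED_STD, ASSUMED_EDGE)

print("standard error after 200 hands:", round(se_200, 3))
print("hands needed to detect the edge:", n_required)
```

Under these assumed values the naive average needs several hundred hands, more than the 200-hand window mentioned above, which is why lower-variance estimators and careful modelling matter.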
16
Difficult Opponents
• We assume a “smart” opponent: it has exploitable weaknesses but does not make outright errors
  • plays a non-equilibrium strategy
  • does not play a dominated strategy
• Opponent’s strategy is non-stationary
  • changes during the game
  • may be modelling me to exploit my weaknesses
17
Conclusions
In Kuhn poker against exploitable, stationary opponents …
• Convergence to best-response is slow.
• Opponent modelling is superior to a static Nash equilibrium strategy.
  • often produces positive expected value
  • robust to game length (50–400) and opponent type
• Bad initial estimates of P2’s parameters are overcome in 25–50 hands.
• “Aggressive” exploration strategies are slightly superior to “safe” exploration strategies.
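One way to see how a bad initial estimate can be overcome by data is a simple Bayesian update. The sketch below is a generic Beta-Bernoulli model, not the estimator used in the work summarized here. It assumes a single hypothetical parameter (how often P2 bluffs when given the chance), a deliberately poor prior, and made-up observation counts.

```python
def update_beta(alpha, beta, bluffs, opportunities):
    """Posterior Beta parameters after seeing `bluffs` bluffs in `opportunities` chances."""
    return alpha + bluffs, beta + (opportunities - bluffs)


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Deliberately bad prior: we believe P2 bluffs about 80% of the time (assumed).
prior = (8.0, 2.0)

# Hypothetical observations: P2 bluffed 10 times in 40 opportunities (25%).
posterior = update_beta(*prior, bluffs=10, opportunities=40)

prior_estimate = posterior_mean(*prior)
posterior_estimate = posterior_mean(*posterior)

print("prior estimate of bluff frequency:", prior_estimate)
print("posterior estimate after 40 observations:", posterior_estimate)
```

With these assumed counts the estimate moves from 0.8 to 0.36, most of the way towards the observed frequency of 0.25. A weaker prior (smaller `alpha + beta`) would be corrected faster.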
18
Poker – Future Work
• Improved Algorithms for Information-Gathering and Modelling
• Scaling up
• Non-stationary Opponents
• Other poker variants: no-limit, multi-player
20
Software Behaviour Analysis
How to test if game software behaves as intended by the designer?
22
Visualization of Behaviour
Corner kicks to the coloured areas score. This was discovered by our SAGA-ML system.