The Ever-Changing Scope of AI
"AI is what humans are currently better at" ... Jim Hendler
[Figure: Venn diagram relating Human Cognitive Challenges, AI, and Computer Science.]
Keith L. Downing Adversarial Search
AI and Chess
1959-1962 (MIT) - "Most of the machine's moves are neither brilliant nor stupid. It must be admitted that it occasionally blunders." ... John McCarthy
1967 (Stanford + Moscow) - Russia -vs- USA; McCarthy + Kotok's group -vs- Adelson-Velskiy's group. USSR won.
1967 (MIT) - Greenblatt's MAC HACK VI was the first computer to play in a human tournament. Won 2, tied 2, to achieve an amateur rating.
1967 (MIT) - MAC HACK VI beat Hubert Dreyfus, a renowned critic of AI.
1970 (Northwestern University) - CHESS 3.0 won the first ACM computer chess tournament. Evaluated 100 states per second.
1974 (Stockholm) - Kaissa (Russian) became the first world computer chess champion.
Chess has been called the Drosophila of AI.
The Quest for Artificial Intelligence, Nilsson (2010).
Chess: Humans -vs- Computers
"A human uses prodigious amounts of knowledge in the pattern-recognition process and a small amount of calculation to verify the fact that the proposed solution is good in the present instance.... However, the computer would make the same maneuver because it found at the end of a very large search that it was the most advantageous way to proceed out of the hundreds of thousands of possibilities it looked at. CHESS 4.6 has to date made several well known maneuvers without having the slightest knowledge of the maneuver, the conditions for its applications, and so on; but only knowing that the end result of the maneuver was good...." ... Hans Berliner (Nature, 1978). The Quest for Artificial Intelligence, Nilsson (2010).
Computers don't really understand chess the way humans do, but they search thoroughly and quickly. Does this matter?
McCarthy suggests tournaments where computers have limited computation time, like a millisecond.
Deep Blue
Lost to Kasparov (reigning world champion) in 1996. IBM then improved it to handle some of its weaknesses. Beat Kasparov in 1997.
"I am a human being. When I see something that is well beyond my understanding, I'm afraid." ... Kasparov
State evaluations per second: Deep Blue - 200 million; Kasparov - 3.
Deep chess knowledge: Deep Blue - very little; Kasparov - volumes.
But Deep Blue had memorized thousands of book moves and grandmaster games.
Deep Blue analyzes 8000 features. A fast evaluation checks only the simplest of them (like piece count), while a slow evaluation checks for complex patterns (like king safety, center control, etc.).
Deep Blue uses no machine learning.
Deep Blue was dismantled after 1997 match.
Claimed to use no AI, just brute-force search (with heuristics): minimax with alpha-beta pruning, which McCarthy deems essential.
Deep Fritz (2006) - German program, supposedly better than Deep Blue.
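The fast/slow evaluation split can be illustrated with a hypothetical two-tier evaluator. The piece values below are standard material counts, but the function names, positional features, and weights are invented for illustration and are not Deep Blue's actual values.

```python
# Two-tier evaluation sketch: a cheap "fast" evaluation (material count)
# screens positions; a costlier "slow" evaluation adds positional terms.
# Weights and feature names are illustrative, not Deep Blue's.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def fast_eval(pieces):
    """pieces: list of (piece_letter, is_mine) tuples. Returns material balance."""
    return sum(PIECE_VALUES[p] * (1 if mine else -1)
               for p, mine in pieces if p in PIECE_VALUES)

def slow_eval(pieces, king_safety, center_control):
    """Adds illustrative positional terms on top of the material score."""
    return fast_eval(pieces) + 0.5 * king_safety + 0.3 * center_control

print(fast_eval([("Q", True), ("R", False)]))  # -> 4
```

In a real engine the fast evaluation runs at every leaf, and the slow evaluation only when the fast score falls inside a window where precision matters.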
The Quest for Artificial Intelligence, Nilsson (2010).
AI and Checkers
1967 (IBM) - Samuel's checkers player learned its own evaluation function by playing against itself.
1990 (University of Alberta) - Chinook developed by Jonathan Schaeffer's group.
Blondie 24 (Fogel, 2001) - Evolved a checkers player with a very high ranking on zone.com. Beat Chinook once.
2004 - Chinook beats best human and becomes world champion.
2007 (University of Alberta) - Schaeffer's group proves that checkers played perfectly by both sides is a tie.
18 years of computer runtime; 5x10^20 positions were evaluated. Chess has ≈ 10^40. "It's been 18 years! ... obsessive-compulsive behavior ... not normal ... Get a life, Jonathan." ... his wife
The Quest for Artificial Intelligence, Nilsson (2010).
Early AI Poker (D. Waterman, 1968)
5-card draw using an expert system. Uses deep knowledge of the game and human strategy.
Dynamic State Vector (DSV): (VDHAND, POT, LASTBET, BLUFFO, POTBET, ORP, OSTYLE)
[Figure: the game state and DSV feed an interpretation rule, which feeds an action rule; the chosen action (fold, call, bet) produces an updated DSV.]
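A minimal sketch of the expert-system style, assuming hypothetical thresholds and rules; only the DSV field names come from the slide, and the interpretation/action logic below is invented for illustration.

```python
# Hypothetical Waterman-style rule chain: DSV -> interpretation -> action.
# Field names follow the slide; thresholds and rules are invented.
from dataclasses import dataclass

@dataclass
class DSV:
    vdhand: float   # estimated value of our hand
    pot: int        # current pot size
    lastbet: int    # opponent's last bet
    bluffo: float   # estimate that the opponent is bluffing
    potbet: float   # pot-to-bet ratio
    orp: float      # estimate of opponent's hand strength
    ostyle: str     # opponent's playing style

def interpret(dsv):
    # Interpretation rule: map the raw state to a symbolic situation.
    if dsv.vdhand > 0.8:
        return "strong"
    if dsv.vdhand > 0.4 or dsv.bluffo > 0.5:
        return "marginal"
    return "weak"

def act(situation):
    # Action rule: map the symbolic situation to a poker action.
    return {"strong": "bet", "marginal": "call", "weak": "fold"}[situation]

dsv = DSV(vdhand=0.9, pot=40, lastbet=5, bluffo=0.1, potbet=8.0, orp=0.3, ostyle="tight")
print(act(interpret(dsv)))  # -> bet
```

The point of the two-stage rule chain is that domain knowledge lives in the rules, not in any search.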
More Recent AI Poker (Billings et al., 2002)
Simulates millions of possible outcomes offline to build strategy databases.
Simulates thousands of outcomes online to guide play.
No deep knowledge of the game; just brute-force calculation.
[Figure: roll-out perspective for Player 2: the other players' hole cards and the upcoming flop, turn, and river cards are unknown ("?") and are sampled during simulation.]
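The online roll-out idea can be sketched as a Monte Carlo simulation: repeatedly deal random completions of the unknown cards and average the results. The names `simulate_winrate` and `toy_strength` are invented, and the hand evaluator is a deliberately crude stand-in for a real poker evaluator.

```python
# Monte Carlo roll-out sketch in the spirit of Billings et al.:
# sample the unknown opponent hands and board cards many times.
import random

def toy_strength(hole, board):
    # Toy evaluator: just sum card ranks (NOT a real poker evaluator).
    return sum(c % 13 for c in hole + board)

def simulate_winrate(my_cards, board, n_opponents=3, trials=2000, seed=0):
    rng = random.Random(seed)
    deck = [c for c in range(52) if c not in my_cards + board]
    wins = 0
    for _ in range(trials):
        rng.shuffle(deck)
        idx = 0
        opp_hands = []                       # unknown opponent hole cards
        for _ in range(n_opponents):
            opp_hands.append(deck[idx:idx + 2]); idx += 2
        # unknown remaining board cards (flop/turn/river)
        full_board = board + deck[idx:idx + (5 - len(board))]
        mine = toy_strength(my_cards, full_board)
        if all(mine >= toy_strength(h, full_board) for h in opp_hands):
            wins += 1
    return wins / trials
```

The estimated win rate then guides the fold/call/bet decision; no knowledge of poker strategy is encoded anywhere.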
Actor-Critic AI Systems
Critic = a system that evaluates search states. Very similar to both heuristics and objective functions.
Many AI applications involve standard AI search algorithms (A*, Minimax, etc.) plus problem-specific critics. Building the critic is the hard part.
Actor = a system for mapping search states to actions.
Actor-Critic systems use AI to both compute actions and evaluate states. In some versions, the best action is simply the one that leads to a state with the highest evaluation.
Reinforcement Learning (RL) formalizes all of this - chapter 21 of our AI textbook.
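The simplest actor-critic pairing described above (best action = the one leading to the highest-rated successor state) can be sketched as follows; the critic, successor function, and toy domain are illustrative.

```python
# Minimal "greedy actor" over a critic: pick the action whose successor
# state the critic rates highest. Critic and domain are toy examples.
def greedy_actor(state, actions, successor, critic):
    """Return the action leading to the highest-rated successor state."""
    return max(actions, key=lambda a: critic(successor(state, a)))

# Toy domain: states are integers, actions add an offset,
# and the critic prefers states close to a goal value of 10.
critic = lambda s: -abs(10 - s)
successor = lambda s, a: s + a
print(greedy_actor(3, [-1, 2, 5], successor, critic))  # -> 5
```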
AI Backgammon: G. Tesauro (1995)
ANN-based critic that improved via self-play. Discovered new strategies that humans now use.
[Figure: from board state S_t, each action a_k and its reward r_k lead to state S^k_{t+1}. Take the max over k of r_k + V(S^k_{t+1}), giving r* + V(S*_{t+1}).]
δ_t = r* + V(S*_{t+1}) - V(S_t)
Do move a*; learn from δ_t.
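The update above can be sketched with a table-based critic standing in for Tesauro's neural network; the states, rewards, and learning rate below are illustrative.

```python
# TD-style update matching the slide: pick a* maximizing r + V(s'),
# compute the TD error, and nudge V(s) toward the better estimate.
def td_step(V, s, options, alpha=0.1):
    """options: list of (action, reward, next_state) triples.
    Returns the chosen action a*, updating V[s] in place."""
    a_star, r_star, s_next = max(options, key=lambda o: o[1] + V.get(o[2], 0.0))
    delta = r_star + V.get(s_next, 0.0) - V.get(s, 0.0)   # delta_t
    V[s] = V.get(s, 0.0) + alpha * delta                  # learn
    return a_star

V = {}
a = td_step(V, "s0", [("a1", 0.0, "s1"), ("a2", 1.0, "s2")])
print(a, round(V["s0"], 3))  # -> a2 0.1
```

Over many self-play games the table (or network) converges toward accurate state values, which is what let TD-Gammon discover novel strategies.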
Classes of Games
Perfect Info - state of playing arena and other players' holdings known at all times.
Imperfect Info - some info about arena or players not available.
Deterministic - outcome of any action is certain.
Stochastic - some actions have probabilistic outcomes, e.g. dealt cards, rolled dice, etc.
                       Deterministic                   Stochastic
Perfect Information    Chess, Checkers, Othello, Go    Backgammon, Monopoly, Ludo, 2048
Imperfect Information  Battleship, Scotland Yard       Bridge, Hearts, Poker, Risk, Scrabble
Adversarial Search Trees
[Figure: a search tree rooted at s0, alternating layers of My Actions and Opponent Actions; bottom states are evaluated and values propagate upward.]
My action layers are similar to an OR node in AND/OR search: I can make ANY of these choices.
Opponent action layers are similar to AND nodes in AND/OR search: I must account for ALL of the opponent's possible moves.
Best-First Search - Review
Generate one large search tree.
Use heuristics to evaluate the promise of all states in the tree.
When a goal state is found, return the entire path from the start state as the solution.
Basic Process of Adversarial Search
1 From the start state, generate a search tree in a depth-limited, depth-first manner, alternating between own and opponent's possible moves.
2 Only use heuristics to evaluate the promise of the bottom-level nodes; then propagate values upwards, combining them using MAX and MIN operators.
3 Return to the root and choose the action, A_best, leading to the highest-rated child state, S_best.
4 Apply A_best to the current game state, producing S_best.
5 Wait for the opponent to choose an action, which then produces the new game state, S_new.
6 Generate a whole new search tree, with root state = S_new.
7 GO TO step 2.
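Steps 1-3 can be sketched as a depth-limited minimax with an action-selection wrapper; the callbacks (`children`, `heuristic`, `is_terminal`, `actions`, `apply_action`) are assumed game-specific functions, not part of any library.

```python
# Depth-limited minimax (steps 1-2) plus root action selection (step 3).
def minimax(state, depth, maximizing, children, heuristic, is_terminal):
    if depth == 0 or is_terminal(state):
        return heuristic(state)                  # evaluate bottom-level node
    values = [minimax(c, depth - 1, not maximizing,
                      children, heuristic, is_terminal)
              for c in children(state)]
    return max(values) if maximizing else min(values)  # MAX / MIN combine

def choose_action(state, depth, actions, apply_action, **cbs):
    # Step 3: pick A_best leading to the highest-rated child state S_best.
    return max(actions(state),
               key=lambda a: minimax(apply_action(state, a),
                                     depth - 1, False, **cbs))
```

Steps 4-7 are just the outer game loop: apply the chosen action, wait for the opponent, and search again from the new state.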
Pure Minimax: Lookahead
[Figure: a depth-limited lookahead tree alternating My Moves and Opponent Moves; the heuristic function scores the leaves: 3 2 5 6 8 5 7 8 6 9 8 4 9 7 8.]
All scores are from the perspective of the top (MAX) player.
Pure Minimax: Backup
[Figure: the leaf scores back up through alternating Maximize / Minimize layers to the root, which selects the best move. Some subtrees did not need to be generated.]
Saving Evaluations in Min-Max Trees
[Figure: a MAX node with two MIN children. The left MIN child returns 7 (min of 10 and 7). The right MIN child sees a child worth 5, so its final value is at most 5 and cannot beat 7; its remaining children don't matter ("Who cares??").]
Alpha-Beta Pruning
[Figure: a MAX-MIN-MAX tree annotated with alpha (a) and beta (b) values as leaves (7, 3, -4, 8, 12, 5, -1, 2) are evaluated. At a MIN node with inherited a = 7, a child returns 5. Since 5 < a, stop generating children: this node's value cannot win at the root. Prune it! There could be many large subtrees below, but there is no need to generate them, since they cannot affect the choice made at the root.]
Alpha and Beta
Alpha
A modifiable property of a MAX node.
Indicates the lower bound on the evaluation of that MAX node.
Updated whenever a LARGER evaluation is returned from a child MIN node.
Used for pruning at MIN nodes.
Beta
A modifiable property of a MIN node.
Indicates the upper bound on the evaluation of that MIN node.
Updated whenever a SMALLER evaluation is returned from a child MAX node.
Used for pruning at MAX nodes.
* Both alpha and beta are passed between MIN and MAXnodes.
A Nanosecond in the Life of a MAX node
[Figure: a MAX node between its MIN parent and MIN children; it receives alpha (a) and beta (b) from its parent and passes them down to each child, which returns a value V.]
Properties: alpha (a), beta (b), Evaluation (E).
After each V is received, do:
If V > E: E ← V
If V > a: a ← V
If E ≥ b: Prune this node!
Otherwise, generate the next child.
A Nanosecond in the Life of a MIN node
[Figure: a MIN node between its MAX parent and MAX children; it receives alpha (a) and beta (b) from its parent and passes them down to each child, which returns a value V.]
Properties: alpha (a), beta (b), Evaluation (E).
After each V is received, do:
If V < E: E ← V
If V < b: b ← V
If E ≤ a: Prune this node!
Otherwise, generate the next child.
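Combining the MAX-node and MIN-node rules gives the standard alpha-beta procedure; the nested-list tree format below is a toy encoding for illustration (a node is either a leaf value or a list of children).

```python
# Alpha-beta pruning: the MAX and MIN update/prune rules in one routine.
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    if not isinstance(node, list):              # leaf: heuristic evaluation
        return node
    if maximizing:
        e = -math.inf
        for child in node:
            e = max(e, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, e)               # if V > a: a <- V
            if e >= beta:                       # if E >= b: prune!
                break
        return e
    else:
        e = math.inf
        for child in node:
            e = min(e, alphabeta(child, True, alpha, beta))
            beta = min(beta, e)                 # if V < b: b <- V
            if e <= alpha:                      # if E <= a: prune!
                break
        return e

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]      # MAX root over three MIN nodes
print(alphabeta(tree, True))  # -> 3
```

In the middle subtree above, the first leaf (2) already drives the MIN node's value to 2 ≤ alpha = 3, so its remaining leaves are never examined.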
The Game of Nim
Actions: Choose 1 or 2 stones.
Goal: Choose the LAST stone(s). Points = number of stones chosen on the very last move.
[Figure: a MAX-MIN-MAX-MIN game tree for a small pile, with terminal scores -1, -2, +2, +1, +1.]
Minimax Applied to Nim
[Figure: minimax values backed up through the Nim tree: max(-1, 2) = 2; min(2, 1) = 1; min(1, -2) = -2; max(1, -2) = 1. The root's best action choice has value +1.]
The starting player can always win by ensuring that the opponent always has an odd number of stones to choose from.
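A minimal minimax for this Nim variant reproduces the backed-up values above. The scoring convention (plus or minus the stones taken on the final move, from MAX's perspective) follows the slide; the function name is illustrative.

```python
# Minimax for the Nim variant: take 1 or 2 stones per turn; whoever takes
# the last stone(s) scores +(stones taken) for MAX, -(stones taken) for MIN.
from functools import lru_cache

@lru_cache(maxsize=None)
def nim_value(stones, maximizing):
    best = max if maximizing else min
    values = []
    for take in (1, 2):
        if take > stones:
            continue
        if take == stones:                       # final move: score it
            values.append(take if maximizing else -take)
        else:                                    # otherwise, recurse
            values.append(nim_value(stones - take, not maximizing))
    return best(values)

print(nim_value(4, True))  # -> 1: take 1 stone, leaving the opponent 3
```

With 4 stones, taking 1 leaves the opponent an odd pile of 3, matching the slide's winning strategy.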
Minimax with Alpha-Beta Pruning
[Figure: the Nim tree revisited depth-first; every node starts with a = ?, b = ?, and the values update as the first leaf (-1) is evaluated.]
Minimax with alpha-beta pruning is depth-first.
Nodes containing final states have concrete scores.
Nodes at max depth that are not final states are scored with an evaluation function.
Minimax with Alpha-Beta Pruning
[Figure: alpha is updated at a MAX node after its 2nd child returns +2: max(-1, 2) = 2, so a ← 2.]
Minimax with Alpha-Beta Pruning
[Figure: beta is updated at a MIN node after its 2nd child returns +1: b ← 2, and the node's value becomes 1.]
Minimax with Alpha-Beta Pruning
[Figure: deeper in the tree, node X (a MIN node) inherits a = 1, and its first child returns +1.]
At node X, we know that its final evaluation E ≤ 1 = a. So there is no point generating more children for X. It will not be chosen anyway.
Alpha-Beta Pruning CAN Save Time
[Figure: a small MAX-MIN-MAX tree with eight empty leaf nodes ("?").]
Arrange the integer evaluations 1, 2, ..., 8 in the leaf nodes so that:
(a) The maximum number of nodes can be pruned.
(b) ALL nodes need to be generated; no pruning is possible.
Minimax with Stochasticity
Making intelligent decisions in UNCERTAIN situations.
Theoretically, standard minimax has no uncertainty, since we assume that we will ALWAYS choose our BEST option, while the opponent will ALWAYS choose our WORST option.
[Figure: a MAX root over chance nodes with branch probabilities (0.2, 0.3, 0.5) and (0.3, 0.6, 0.1), each branch leading to a MIN node over heuristic leaf evaluations. The chance nodes' expected values are 5.6 and 3.5; the root's best move is the one worth 5.6.]
Expectimax
Minimax with stochastic nodes.
MIN and MAX nodes operate in the same way.
CHANCE nodes compute the expected value of the consequences of their stochastic options.
[Figure: a CHANCE node for the stochastic action of rolling 2 dice. Each outcome (Roll = 2 with p = 1/36, ..., Roll = 7 with p = 1/6, ..., Roll = 12 with p = 1/36) leads to a MAX node.]
E = 8(1/36) + ... + 6(1/6) + ... + 7(1/36), where E is the expected value.
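The three node types can be sketched as a small recursive evaluator; the tuple-based tree encoding is illustrative, and the dice example below keeps only two of the eleven outcomes, renormalized so their probabilities sum to 1.

```python
# Expectimax: MAX nodes maximize, CHANCE nodes take the
# probability-weighted average of their children's values.
def expectimax(node):
    kind = node[0]
    if kind == "leaf":
        return node[1]                                   # heuristic value
    if kind == "max":
        return max(expectimax(c) for c in node[1])       # best option
    if kind == "chance":
        return sum(p * expectimax(c) for p, c in node[1])  # expected value

# Illustrative two-outcome dice subset: roll 2 (p = 1/36) and roll 7
# (p = 6/36), renormalized to 1/7 and 6/7.
tree = ("chance", [(1/7, ("max", [("leaf", 8), ("leaf", 5)])),
                   (6/7, ("max", [("leaf", 3), ("leaf", 6)]))])
print(round(expectimax(tree), 3))  # -> 6.286
```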
The Game of 2048
[Figure: two example 4x4 boards, before and after a move.]
When 2 equal cells collide, their summed value goes in the resulting cell.
After each move, the "opponent" adds a 2 or a 4 to a random open cell: Prob(2) = 0.9; Prob(4) = 0.1.
Game ends when all cells are filled. Goal: produce 2048 in a cell before the game ends.
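The slide-and-merge rule for a single row can be sketched as follows; the function name `collapse_left` is invented, and rows are lists with 0 marking an empty cell.

```python
# 2048 row mechanics: slide tiles left, then merge each colliding
# pair of equal tiles exactly once, left to right.
def collapse_left(row):
    tiles = [v for v in row if v]            # slide: drop empty cells
    out, i = [], 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            out.append(tiles[i] * 2)         # equal tiles collide and sum
            i += 2                           # each tile merges at most once
        else:
            out.append(tiles[i])
            i += 1
    return out + [0] * (len(row) - len(out))

print(collapse_left([2, 2, 2, 2]))  # -> [4, 4, 0, 0]
print(collapse_left([0, 4, 4, 8]))  # -> [8, 8, 0, 0]
```

Note that `[2, 2, 2, 2]` becomes `[4, 4, 0, 0]`, not `[8, 0, 0, 0]`: the two product tiles become neighbors but do not merge again on the same move.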
The Game of 2048
[Figure: example boards showing a newly-added 2-tile after a move.]
When multiple (2) pairs of tiles of equal value collide (yellow arrows), the two product tiles become neighbors.
Playing 2048 with Minimax Search
[Figure: a MAX node branches on the 4 possible slide moves (Left, Right, Up, Down); below each, a MIN node branches on all possible spawns (a 2 or a 4 in any open cell); then MAX again.]
Playing 2048 with Minimax Search
[Figure: a concrete MAX-MIN-MAX expansion: from the current board, the four slide moves (Left, Right, Up, Down) produce child boards; below one of them, the MIN layer considers all possible spawns.]
Expectimax for 2048
[Figure: a MAX node's child boards feed a CHANCE node. With p(2 spawn) = 0.9, p(4 spawn) = 0.1, and 8 open cells, p(choosing a cell) = 1/8 = 0.125, so each (cell, 2-spawn) branch has probability 0.125 × 0.9 = 0.1125 and each (cell, 4-spawn) branch has 0.125 × 0.1 = 0.0125. With child evaluations 7, 3, 5, 6, ...:
E = 7 × 0.1125 + 3 × 0.1125 + 5 × 0.1125 + 6 × 0.0125 + ...]
Minimax -vs- Expectimax for 2048
The opponent in 2048 really is the stochastic spawning process. It is not minimizing at all.
However, by assuming a perfectly evil opponent, Minimax can play intelligently, albeit very conservatively (i.e., maximally safe).
Still, the most appropriate algorithm is Expectimax, with alternating MAX and CHANCE nodes.
Alpha-Beta pruning works poorly for Expectimax (even when it has MAX, MIN and CHANCE nodes). A node's current value cannot be directly compared to an alpha or beta (for pruning), since that value will be scaled by probabilities (and combined with other scaled values) at ancestral chance nodes.
Considering all spawns creates a large branching factor (with Minimax and Expectimax), and the cost is higher with Expectimax, since it can't exploit alpha-beta pruning (very well).
Approximate Minimax or Expectimax: Consider filling every open cell, but only with a 2 or a 4 (not both), choosing 2's and 4's according to their spawning probabilities. This gives a maximum branching factor of C (number of open cells) instead of 2C. Then minimize (or compute the expected value) over all child nodes.
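A full (unapproximated) 2048 CHANCE node, as described above, can be sketched as follows; `evaluate` is an assumed board heuristic, and `chance_value` is an invented name. Boards are 4x4 lists with 0 marking an open cell.

```python
# Expected value of a 2048 CHANCE node: each open cell is equally likely
# to receive the spawn, which is a 2 with p = 0.9 or a 4 with p = 0.1.
def chance_value(board, evaluate):
    open_cells = [(r, c) for r in range(4) for c in range(4)
                  if board[r][c] == 0]
    p_cell = 1 / len(open_cells)             # uniform over open cells
    total = 0.0
    for (r, c) in open_cells:
        for tile, p_tile in ((2, 0.9), (4, 0.1)):
            child = [row[:] for row in board]
            child[r][c] = tile               # one of the 2C spawn outcomes
            total += p_cell * p_tile * evaluate(child)
    return total
```

This is the 2C-branch version; the approximation in the last bullet would instead pick a single tile per cell (a 2 for 90% of cells, a 4 for the rest), cutting the branching factor to C.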