Game playing

2.1 unit 2


Page 1: 2.1 unit 2

Game playing

Page 2: 2.1 unit 2

Outline
• Optimal decisions
• α-β pruning
• Imperfect, real-time decisions

Page 3: 2.1 unit 2

Games vs. search problems
• "Unpredictable" opponent ⇒ the solution is a strategy specifying a move for every possible opponent reply

• Time limits ⇒ unlikely to find the goal, so we must approximate

Page 4: 2.1 unit 2

Game tree (2-player, deterministic, turns)

Page 5: 2.1 unit 2

Optimal strategy
• Perfect play for deterministic games
• Minimax value for a node n

• This definition is used recursively

• Idea: the minimax value is the best achievable payoff against best play
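
The recursive definition mentioned above, written out in the standard textbook form (reconstructed here, since the slide's formula is not reproduced in this copy):

```latex
\mathrm{MINIMAX\text{-}VALUE}(n)=
\begin{cases}
\mathrm{UTILITY}(n) & \text{if } n \text{ is a terminal node}\\
\max_{s \in \mathrm{successors}(n)} \mathrm{MINIMAX\text{-}VALUE}(s) & \text{if } n \text{ is a MAX node}\\
\min_{s \in \mathrm{successors}(n)} \mathrm{MINIMAX\text{-}VALUE}(s) & \text{if } n \text{ is a MIN node}
\end{cases}
```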

Page 6: 2.1 unit 2

Minimax example
• Perfect play for deterministic games
• Minimax decision at the root: choose the action a that leads to the maximal minimax value
• MAX is guaranteed a utility of at least the minimax value – if he plays rationally.

Page 7: 2.1 unit 2

Minimax algorithm
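
The pseudocode figure for this slide is not reproduced in this copy; below is a minimal Python sketch of the same recursion. The nested-list tree encoding (inner lists are a node's children, bare numbers are terminal utilities) is an assumption for illustration, not from the slides.

```python
# Minimal minimax sketch: a game tree is encoded as nested lists, where
# an inner list holds a node's children and a bare number is a terminal
# utility value.

def minimax(node, maximizing=True):
    """Return the minimax value of `node` for the player to move."""
    if not isinstance(node, list):        # terminal node: return its utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# MAX moves at the root; MIN replies at the second level.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))  # MIN values are 3, 2, 2, so MAX picks 3
```

Here MAX's best move is the first one: the opponent can hold MAX to 3 there, but only to 2 in the other two subtrees.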

Page 8: 2.1 unit 2

Properties of minimax
• Complete? Yes (if the tree is finite)
• Optimal? Yes (against an optimal opponent)
• Time complexity? O(b^m)
• Space complexity? O(bm) (depth-first exploration)

• For chess, b ≈ 35 and m ≈ 100 for "reasonable" games, so an exact solution is completely infeasible

Page 9: 2.1 unit 2

Multiplayer games
• Each node must hold a vector of values
• For example, for three players A, B, C: (vA, vB, vC)
• The backed-up vector at node n is always the one that maximizes the payoff of the player choosing at n

Page 10: 2.1 unit 2

α-β pruning example (the step-by-step pruning figures are not reproduced in this copy)


Page 15: 2.1 unit 2

Properties of α-β
• Pruning does not affect the final result

• Good move ordering improves the effectiveness of pruning

• With "perfect ordering," time complexity = O(b^(m/2)), which doubles the feasible search depth

• A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)

Page 16: 2.1 unit 2

The α-β algorithm

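
The algorithm's figure is not reproduced in this copy; below is a minimal Python sketch of alpha-beta, using the same nested-list tree encoding assumed earlier (inner lists are children, bare numbers are terminal utilities).

```python
# Minimax with alpha-beta pruning: alpha is the best value found so far
# for MAX along the path to the root, beta the best for MIN.

def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if not isinstance(node, list):            # terminal: return its utility
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:                 # MIN will never allow this: prune
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:                     # MAX will never allow this: prune
            break
    return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree))  # same answer as plain minimax: 3
```

On this tree the second MIN node is cut off after its first leaf: once MIN can force 2 there, MAX (holding α = 3) abandons the branch without looking at the remaining leaves.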

Page 18: 2.1 unit 2

Why is it called α-β?
• α is the value of the best (i.e., highest-value) choice found so far for MAX at any choice point along the path to the root.

• If v is worse than α, MAX will avoid it ⇒ prune that branch.

• β is the value of the best (i.e., lowest-value) choice found so far for MIN at any choice point along the path to the root.

Page 19: 2.1 unit 2

Another example

Leaf values, left to right: 5 7 10 3 1 2 9 9 8 2 9 3

Page 20: 2.1 unit 2

How much do we gain?

Assume a game tree of uniform branching factor b and height h. Minimax examines O(b^h) nodes, and so does alpha-beta in the worst case.

The gain for alpha-beta is maximal when:

• The MIN children of a MAX node are ordered in decreasing backed-up values

• The MAX children of a MIN node are ordered in increasing backed-up values

Then alpha-beta examines O(b^(h/2)) nodes [Knuth and Moore, 1975]

But this requires an oracle (if we knew how to order nodes perfectly, we would not need to search the game tree)

If nodes are ordered at random, then the average number of nodes examined by alpha-beta is ~O(b^(3h/4))
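
The gap between examining all leaves and examining only some of them can be seen empirically. The sketch below (all helpers hypothetical, not from the slides) counts how many leaves alpha-beta actually evaluates on a random uniform tree, versus the b^h leaves plain minimax would visit.

```python
# Count leaf evaluations performed by alpha-beta on a random uniform
# tree, to illustrate the pruning gain over plain minimax.

import random

def alphabeta_count(node, alpha, beta, maximizing, counter):
    """Alpha-beta over a nested-list tree; counter[0] tallies leaf evals."""
    if not isinstance(node, list):
        counter[0] += 1                      # one leaf evaluated
        return node
    value = float("-inf") if maximizing else float("inf")
    for child in node:
        v = alphabeta_count(child, alpha, beta, not maximizing, counter)
        if maximizing:
            value = max(value, v)
            alpha = max(alpha, value)
        else:
            value = min(value, v)
            beta = min(beta, value)
        if alpha >= beta:                    # cutoff: remaining children pruned
            break
    return value

def random_tree(b, h, rng):
    """Uniform tree of branching factor b and height h with random leaves."""
    if h == 0:
        return rng.randint(0, 99)
    return [random_tree(b, h - 1, rng) for _ in range(b)]

rng = random.Random(0)                       # fixed seed for reproducibility
tree = random_tree(4, 6, rng)                # b = 4, h = 6: 4096 leaves
counter = [0]
alphabeta_count(tree, float("-inf"), float("inf"), True, counter)
print(f"alpha-beta evaluated {counter[0]} of {4**6} leaves")
```

With random ordering the count falls well short of 4096 but well above the perfect-ordering bound, consistent with the ~O(b^(3h/4)) average mentioned above.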

Page 21: 2.1 unit 2

Heuristic Ordering of Nodes

Order the nodes below the root according to the values backed up at the previous iteration

Order MAX (resp. MIN) nodes in decreasing (resp. increasing) order of the evaluation function computed at these nodes

Page 22: 2.1 unit 2

Games of imperfect information

• Minimax and alpha-beta pruning require too many leaf-node evaluations.

• This may be impractical within a reasonable amount of time.

• SHANNON (1950):
– Cut off the search earlier (replace TERMINAL-TEST by CUTOFF-TEST)
– Apply a heuristic evaluation function EVAL (replacing the utility function of alpha-beta)

Page 23: 2.1 unit 2

Cutting off search
• Change:

– if TERMINAL-TEST(state) then return UTILITY(state)

into
– if CUTOFF-TEST(state, depth) then return EVAL(state)

• Introduces a fixed depth limit, selected so that the amount of time used will not exceed what the rules of the game allow.
• When the cutoff occurs, the evaluation function is applied.
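
The substitution above can be sketched in Python. The game interface (is_terminal, successors, eval) and the toy countdown game below are hypothetical stand-ins, not from the slides.

```python
# TERMINAL-TEST/UTILITY replaced by CUTOFF-TEST/EVAL, so the search
# stops at a fixed depth limit and returns a heuristic estimate.

DEPTH_LIMIT = 4          # chosen so search time fits what the rules allow

def cutoff_test(game, state, depth):
    return depth >= DEPTH_LIMIT or game.is_terminal(state)

def depth_limited_minimax(game, state, depth=0, maximizing=True):
    if cutoff_test(game, state, depth):
        return game.eval(state)              # heuristic estimate, not UTILITY
    values = [depth_limited_minimax(game, s, depth + 1, not maximizing)
              for s in game.successors(state)]
    return max(values) if maximizing else min(values)

class CountdownGame:
    """Toy game for the sketch: the state is a number; a move subtracts 1 or 2."""
    def is_terminal(self, s):
        return s <= 0
    def successors(self, s):
        return [s - 1, s - 2]
    def eval(self, s):
        return s                             # trivial heuristic for the sketch

print(depth_limited_minimax(CountdownGame(), 10))
```

The only change from plain minimax is the first line of the recursion; everything below the depth limit is summarized by EVAL instead of being searched.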

Page 24: 2.1 unit 2

Heuristic EVAL
• Idea: produce an estimate of the expected utility of the game from a given position.
• Performance depends on the quality of EVAL.
• Requirements:

– EVAL should order terminal nodes in the same way as UTILITY.

– The computation must not take too long.
– For non-terminal states, EVAL should be strongly correlated with the actual chance of winning.

• Only reliable for quiescent states (no wild swings in value in the near future)

Page 25: 2.1 unit 2

Heuristic EVAL example

Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)

Page 26: 2.1 unit 2

Heuristic EVAL example

Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)

Adding the weighted features assumes they are independent
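
As an illustration of the weighted sum, here is a material-count EVAL for chess using the traditional textbook piece weights; the dictionary-of-counts board representation is a stand-in, not from the slides.

```python
# Weighted linear evaluation Eval(s) = w1*f1(s) + ... + wn*fn(s), with
# chess material as the features: f_i(s) = (own pieces of type i) minus
# (opponent's pieces of type i).

WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_eval(own_counts, opp_counts):
    """Eval = sum of w_i * f_i over the piece types."""
    return sum(w * (own_counts.get(p, 0) - opp_counts.get(p, 0))
               for p, w in WEIGHTS.items())

# White is up a rook for a knight (the "exchange"):
white = {"pawn": 8, "knight": 1, "bishop": 2, "rook": 2, "queen": 1}
black = {"pawn": 8, "knight": 2, "bishop": 2, "rook": 1, "queen": 1}
print(material_eval(white, black))  # +5 (extra rook) - 3 (missing knight) = 2
```

The independence caveat above applies directly: this sum cannot express interactions such as a bishop pair being worth more than two lone bishops.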

Page 27: 2.1 unit 2

Heuristic difficulties
The heuristic counts pieces won

Page 28: 2.1 unit 2

Horizon effect
A fixed-depth search makes Black think it can avoid the queening move of the White pawn

Page 29: 2.1 unit 2

Games that include chance

• Possible moves: (5-10,5-11), (5-11,19-24), (5-10,10-16) and (5-11,11-16)

Page 30: 2.1 unit 2

Games that include chance

• Possible moves: (5-10,5-11), (5-11,19-24), (5-10,10-16) and (5-11,11-16)

• [1,1] and [6,6] have chance 1/36; every other roll has chance 1/18

chance nodes

Page 31: 2.1 unit 2

Games that include chance

• [1,1] and [6,6] have chance 1/36; every other roll has chance 1/18
• We cannot calculate a definite minimax value, only an expected value

Page 32: 2.1 unit 2

Expected minimax value

EXPECTED-MINIMAX-VALUE(n) =
  UTILITY(n)                                            if n is a terminal node
  max_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s)     if n is a MAX node
  min_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s)     if n is a MIN node
  Σ_{s ∈ successors(n)} P(s) · EXPECTED-MINIMAX-VALUE(s) if n is a chance node

These equations can be backed up recursively all the way to the root of the game tree.
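
The four cases can be sketched as a small recursion. The tagged-tuple tree encoding below ("max"/"min" nodes with a list of children, "chance" nodes with (probability, child) pairs, bare numbers as terminal utilities) is an assumption for illustration.

```python
# Expectiminimax: like minimax, but chance nodes return the
# probability-weighted average of their successors' values.

def expectiminimax(node):
    if not isinstance(node, tuple):           # terminal: UTILITY(n)
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted sum over successors
    return sum(p * expectiminimax(c) for p, c in children)

# MAX chooses between two moves, each followed by a 50/50 chance event:
tree = ("max", [
    ("chance", [(0.5, ("min", [3, 5])), (0.5, ("min", [1, 9]))]),
    ("chance", [(0.5, 2), (0.5, 4)]),
])
print(expectiminimax(tree))  # 0.5*3 + 0.5*1 = 2.0 vs 0.5*2 + 0.5*4 = 3.0, so 3.0
```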

Page 33: 2.1 unit 2

Position evaluation with chance nodes

• Left: A1 wins
• Right: A2 wins
• The outcome of the evaluation function may change when values are scaled differently.
• Behavior is preserved only by a positive linear transformation of EVAL.

Page 34: 2.1 unit 2

State-of-the-Art

Page 35: 2.1 unit 2

Checkers: Tinsley vs. Chinook

Name: Marion Tinsley
Profession: mathematics teacher
Hobby: checkers
Record: lost only 3 games of checkers over 42 years
World champion for over 40 years

Mr. Tinsley suffered his 4th and 5th losses against Chinook

Page 36: 2.1 unit 2

Chinook

First computer to become official world champion of checkers!
Has complete endgame tables for 10 pieces or fewer: over 39 trillion entries.

Page 37: 2.1 unit 2

Chess: Kasparov vs. Deep Blue

                 Kasparov               Deep Blue
Height           5’10”                  6’5”
Weight           176 lbs                2,400 lbs
Age              34 years               4 years
Computers        50 billion neurons     32 RISC processors + 256 VLSI chess engines
Speed            2 pos/sec              200,000,000 pos/sec
Knowledge        Extensive              Primitive
Power Source     Electrical/chemical    Electrical
Ego              Enormous               None

1997: Deep Blue wins by 3 wins, 1 loss, and 2 draws

Page 38: 2.1 unit 2

Chess: Kasparov vs. Deep Junior

August 2, 2003: Match ends in a 3/3 tie!

Deep Junior

8 CPU, 8 GB RAM, Win 2000

2,000,000 pos/secAvailable at $100

Page 39: 2.1 unit 2

Othello: Murakami vs. Logistello

Takeshi MurakamiWorld Othello Champion

1997: The Logistello software crushed Murakami by 6 games to 0

Page 40: 2.1 unit 2

Backgammon
• TD-Gammon, by Gerald Tesauro, won the world championship in 1995
• BGBlitz won the 2008 computer backgammon olympiad

Page 41: 2.1 unit 2

Go: Goemate vs. ??
Name: Chen Zhixing
Profession: retired
Computer skills: self-taught programmer
Author of Goemate (arguably the best Go program available today)

Gave Goemate a 9-stone handicap and still easily beat the program, thereby winning $15,000

Page 42: 2.1 unit 2


Jonathan Schaeffer:
Go has too high a branching factor for existing search techniques. Current and future software must rely on huge databases and pattern-recognition techniques.

Page 43: 2.1 unit 2

AlphaGo – March 2016

• Developed by Google DeepMind in London to play the board game Go.

• Plays full 19×19 games.
• October 2015: the distributed version of AlphaGo defeated the European Go champion Fan Hui five games to zero.

• March 2016: AlphaGo beat the South Korean professional Go player Lee Sedol, ranked 9-dan and one of the best Go players, four games to one.

• A significant breakthrough in AI research!

Page 44: 2.1 unit 2

Secrets
Many game programs are based on alpha-beta + iterative deepening + extended/singular search + transposition tables + huge databases + ...

For instance, Chinook searched all checkers configurations with 8 pieces or fewer and created an endgame database of 444 billion board configurations

The methods are general, but their implementation is dramatically improved by many specifically tuned enhancements (e.g., the evaluation functions)

Page 45: 2.1 unit 2

Perspective on Games: Con and Pro

Chess is the Drosophila of artificial intelligence. However, computer chess has developed much as genetics might have if the geneticists had concentrated their efforts starting in 1910 on breeding racing Drosophila. We would have some science, but mainly we would have very fast fruit flies.

John McCarthy

Saying Deep Blue doesn’t really think about chess is like saying an airplane doesn’t really fly because it doesn’t flap its wings.

Drew McDermott

Page 46: 2.1 unit 2

Other Types of Games
• Multi-player games, with or without alliances

• Games with randomness in the successor function (e.g., rolling a die) ⇒ expectiminimax algorithm

• Games with partially observable states (e.g., card games) ⇒ search of belief-state spaces

See R&N pp. 175-180