Computing Science (CMPUT) 496
Search, Knowledge, and Simulations

Martin Müller

Department of Computing Science
University of Alberta

[email protected]

Winter 2017

CMPUT 496

Part IV

Knowledge

496 Today - Mar 9

Announcements:
  Quiz 7 still open
  Quiz 8 (simulations), due Mar 13
  Added small clarifications to the Assignment 3 specification: the list of atari defense moves in the policy_moves command vs. a single generated move in the policy of the simulation player

Today’s topics:
  Knowledge for Heuristic Search and Simulations
  State Evaluation
  Move Evaluation

Using Knowledge in Heuristic Search

Review - use of knowledge for search and simulations so far
How can knowledge be used?
Basic concepts - properties and interpretations of knowledge for heuristic search
Representing knowledge
Acquiring knowledge - manual vs. machine learning

Knowledge for Search and Simulations - Story so Far

We discussed techniques for heuristic search and simulation
Many were “knowledge-free”
  Blind search, uniform random simulations
Many others used a “black box” heuristic evaluation function
  Goal-distance heuristics in best-first search
  Admissible heuristics in A*
  Depth- or time-limited alphabeta search

We did not discuss much how to build such a function. We will do that now.

Story so Far (Continued)

We also used some knowledge to improve simulation policies
  3x3 patterns
  Move filters
  Assignment 3 - atari capture and atari defense
  Probabilistic simulation policies (Coulom paper)

Now, look deeper into knowledge for heuristic search
  What is knowledge used for? Where does it come from?
  How is it selected? Constructed? Learned?

Knowledge for State and Move Evaluation

Evaluation function: mapping from state to number - how good is that state? (state evaluation, position evaluation)
Move evaluation: mapping from move (action) to number - how good is that move? (e.g. probabilities in a simulation policy)
Filter: which moves are bad and should be filtered out (pruned)?
The “big two” for us now are: state evaluation and move evaluation

Other Kinds of Knowledge

Many other kinds of knowledge in heuristic search
Examples:
  Time control, search depth control
  Game-specific knowledge to reduce the size of the state space (we discussed DAG vs tree already)
  Efficient state representation (we discussed)
  Knowledge about algorithm optimization and tuning
  ...

Using Knowledge Part 1: State Evaluation

We know the exact evaluation in terminal states
  Games: win, loss, draw, win by 23.5 points, ...
  Best-first search: distance to goal h(s) = 0

What about heuristic evaluation in non-terminal states?
In games, two kinds of evaluation are popular:
  Heuristic evaluation: higher is better
  Winning probability: higher is better, plus it has an interpretation as a probability

What is State Evaluation used For?

Most important: as evaluation function in search
  Leaf nodes are evaluated by this function
  Interior nodes are evaluated by the minimax rule
  Heuristic evaluation is also used for interior nodes (move ordering) and for the leaves of depth-limited searches
What does an evaluation mean?
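A minimal sketch of this division of labor, on a hypothetical toy game that is not from the slides: a pile of n stones, players alternately take 1, 2 or 3, and whoever takes the last stone wins. Terminal states get an exact value, leaves at the depth limit get a heuristic value, and interior nodes get the minimax rule. The names `minimax` and `zero_heuristic` are illustrative only.

```python
from typing import Callable, List

WIN, LOSS = 1.0, -1.0  # exact values, from MAX's point of view

# Hypothetical heuristic signature: (stones left, is MAX to move?) -> value in (LOSS, WIN)
Heuristic = Callable[[int, bool], float]


def legal_moves(n: int) -> List[int]:
    """Toy game: take 1, 2 or 3 stones, but no more than are left."""
    return [m for m in (1, 2, 3) if m <= n]


def zero_heuristic(n: int, max_to_move: bool) -> float:
    """A knowledge-free heuristic: every non-terminal leaf looks even."""
    return 0.0


def minimax(n: int, depth: int, max_to_move: bool, heuristic: Heuristic) -> float:
    """Depth-limited minimax value of the state, from MAX's point of view."""
    if n == 0:
        # Terminal state: the previous player took the last stone (exact value)
        return LOSS if max_to_move else WIN
    if depth == 0:
        # Leaf of the depth-limited search: use the heuristic evaluation
        return heuristic(n, max_to_move)
    # Interior node: apply the minimax rule to the children's values
    values = [minimax(n - m, depth - 1, not max_to_move, heuristic) for m in legal_moves(n)]
    return max(values) if max_to_move else min(values)


print(minimax(5, 1, True, zero_heuristic))  # 0.0: only heuristic leaves were reached
print(minimax(5, 5, True, zero_heuristic))  # 1.0: every line ends in a terminal state
```

With depth 1 from 5 stones the search sees only the heuristic's 0.0; with depth 5 every line reaches a terminal state, so the result is the exact value (a win for MAX).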

What does Winning Probability Mean?

Different interpretations
Clearest case: a game with a chance element, e.g. dice rolls
  The winning probability is the minimax score! (At chance nodes, the value is the expected value over the random outcomes.)
  Example - backgammon

In some state s I need to roll two sixes to win, otherwise I lose
Probability of rolling two sixes = 1/6 × 1/6 = 1/36
Value v(s) = 1/36
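The same number falls out of the rule for chance nodes, whose value is the probability-weighted average of their children. A minimal sketch, assuming a win is scored 1 and a loss 0 (so the value is exactly the winning probability) and enumerating all 36 equally likely rolls of two dice:

```python
from fractions import Fraction
from itertools import product
from typing import Callable


def chance_node_value(outcome_value: Callable[[int, int], int]) -> Fraction:
    """Expected value over all 36 equally likely rolls of two six-sided dice."""
    rolls = list(product(range(1, 7), repeat=2))
    p = Fraction(1, len(rolls))  # each roll has probability 1/36
    return sum((p * outcome_value(d1, d2) for d1, d2 in rolls), Fraction(0))


def need_double_six(d1: int, d2: int) -> int:
    """State s from the slide: win (1) only with two sixes, otherwise lose (0)."""
    return 1 if d1 == 6 and d2 == 6 else 0


print(chance_node_value(need_double_six))  # 1/36
```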

Winning Probability in Games with No Chance

There are no probabilities in the game itself
A perfect player would always know - the winning probability is either 0% or 100%
Probability comes from either imperfect opponents, or our imperfect understanding of the game
Example - again, the simulation-based player
  Winning probability = winrate in simulations
  Probabilities come from both players using randomized policies in the simulation
Monte Carlo Tree Search also uses winning probabilities of simulations
  Main difference: it has a non-random “in-tree” phase followed by the randomized simulation
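A minimal sketch of “winning probability = winrate in simulations”, on a hypothetical toy game that is not from the slides: a pile of n stones, players alternately take 1, 2 or 3, and whoever takes the last stone wins. The game has no chance element, and with perfect play a pile of 4 is a certain loss for the player to move; the probabilities below come only from both players choosing uniformly random moves. `random_playout` and `simulation_winrate` are illustrative names.

```python
import random


def random_playout(n: int, rng: random.Random) -> bool:
    """Both players take 1-3 stones uniformly at random until the pile is empty.
    Returns True if the player to move at the start took the last stone."""
    first_player_to_move = True
    while n > 0:
        n -= rng.choice([m for m in (1, 2, 3) if m <= n])
        first_player_to_move = not first_player_to_move
    # The player who just moved took the last stone; if that was the first
    # player, it is now the second player's turn.
    return not first_player_to_move


def simulation_winrate(n: int, num_simulations: int, seed: int = 0) -> float:
    """State evaluation = fraction of random playouts won by the player to move."""
    rng = random.Random(seed)
    wins = sum(random_playout(n, rng) for _ in range(num_simulations))
    return wins / num_simulations
```

For a pile of 4 the estimate converges to 1/3 rather than 0, and for a pile of 2 (a certain win with perfect play) to 1/2: the numbers describe the random players, not the game-theoretic value.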

More on Winning Probabilities

Image source: Silver et al., Mastering the game of Go with deep neural networks and tree search, Nature

Simulations are not the only way to get probabilities
Can use machine learning to learn win probabilities
  Example: AlphaGo’s value network - a deep neural net that maps states to win probabilities
Can also define rules for estimating winning probabilities
  Translate a heuristic evaluation into a probability (more later)
  Difficult, not used frequently
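The slides defer the details of translating an evaluation into a probability. One common choice, shown here only as an assumption, is a logistic (sigmoid) curve: it is monotonically increasing, maps 0 to 50%, and approaches 0% and 100% for very bad and very good evaluations. The `scale` parameter is hypothetical; in practice it would have to be fitted to real game results, which is part of what makes this difficult.

```python
import math


def win_probability(evaluation: float, scale: float = 0.1) -> float:
    """Map a heuristic evaluation (higher is better) to a probability in [0, 1]
    with a logistic curve. The default scale is an arbitrary illustrative value."""
    z = scale * evaluation
    # Two algebraically equal forms, chosen so that math.exp never overflows
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

With this scale, an evaluation of +12 maps to about 0.77 and -12 to about 0.23.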

Heuristic Evaluation Function

Heuristic evaluation: higher is better
Possible: an estimate of the score of the game
  +12 = “Black is about 12 points ahead”
Possibly no other meaning, “just a number”
Example: material evaluation function in chess
  Queen = 9, rook = 5, bishop = 3, ...
  Evaluation = sum of my material’s values - sum of the opponent’s material’s values
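A minimal sketch of the material evaluation. Assumptions beyond the slide: the position is given as the piece-placement field of a FEN string (uppercase = White, lowercase = Black, digits = runs of empty squares), and the values not listed on the slide use the conventional knight = 3 and pawn = 1, with the king counted as 0.

```python
# Conventional piece values; knight, pawn and king are filled in beyond the slide's list
PIECE_VALUES = {"q": 9, "r": 5, "b": 3, "n": 3, "p": 1, "k": 0}


def material_eval(placement: str, for_white: bool = True) -> int:
    """Sum of my material's values minus the sum of the opponent's.
    `placement` is a FEN piece-placement field; digits and '/' are skipped."""
    score = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # empty-square count or rank separator
        score += value if ch.isupper() else -value
    return score if for_white else -score


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
print(material_eval(START))  # 0: equal material
```

The result is “just a number” in the sense above: a position with an extra queen scores +9, but nothing says what winning probability that corresponds to.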

One Interpretation

General motto: “Similar evaluation values for similar states”
(I believe the more precise version below is due to Stuart Russell, author of the “AIMA” textbook)
All states with the same evaluation are “equally good”
  This means they have the same (but not known to us) probability of winning
Any state with a higher evaluation has a higher probability of winning
The evaluation function partitions the set of all states S into subsets S_v, where each state in S_v is equally good for us

Relative vs Absolute Evaluation

Unless the numbers in an evaluation have a meaning such as probability or score, the numbers themselves do not matter
Only the ordering given by the numbers matters - it decides the preference or ranking between moves
  Example 1: multiply all function values by 10
  Example 2: add 7 to all function values
  The search will be exactly the same
Any mapping by a monotonically increasing (order-preserving) function will give the same search behavior
In utility theory this is called ordinal utility
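A minimal sketch that checks this claim for plain minimax on a small hand-made two-ply tree (a hypothetical example; the leaf values are arbitrary). Each transformation g is applied to every leaf evaluation before the search; the move chosen at the root stays the same for the slide's two examples and for another increasing function, exp.

```python
import math
from typing import Callable, List, Union

Tree = Union[float, List["Tree"]]  # a leaf value, or a list of child subtrees


def minimax(node: Tree, maximizing: bool, g: Callable[[float], float]) -> float:
    """Minimax value where every leaf evaluation is first passed through g."""
    if not isinstance(node, list):
        return g(node)
    values = [minimax(child, not maximizing, g) for child in node]
    return max(values) if maximizing else min(values)


def best_root_move(tree: List[Tree], g: Callable[[float], float]) -> int:
    """Index of the root move chosen by MAX; the opponent (MIN) replies."""
    values = [minimax(child, False, g) for child in tree]
    return values.index(max(values))


# MAX chooses one of three moves, then MIN chooses one of three replies
TREE: List[Tree] = [[3.0, 12.0, 8.0], [2.0, 4.0, 6.0], [14.0, 5.0, 2.0]]

for name, g in [("identity", lambda x: x),
                ("times 10", lambda x: 10 * x),
                ("plus 7", lambda x: x + 7),
                ("exp", math.exp)]:
    print(name, best_root_move(TREE, g))  # 0 in every case
```

An order-reversing mapping such as x -> -x is not monotonically increasing, and on this tree it changes the choice to move 1.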

Skill-Testing Question

Everything I said on the last slide was true for minimax search
What about negamax?
5 minutes for discussion with a neighbor or on chat
Repeated claim from the last slide: Any mapping by a monotonically increasing (order-preserving) function will give the same search behavior
  Is it true? False?
  Is it true under some conditions? Which?

Mixing Exact and Heuristic Evaluation

We can mix both kinds of evaluations
If we are careful, we can get true proofs of wins and losses this way
  Example: win = 10000, highest heuristic score = 5000
  If alphabeta returns 10000, it is a proven win
Having a good heuristic can help speed up an exact proof
  Provides good move ordering for iterative deepening search
  A better move sorted first means more cuts in the tree search
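A minimal negamax alphabeta sketch of this idea, on a hypothetical toy game that is not from the slides: a pile of n stones, players alternately take 1, 2 or 3, and whoever takes the last stone wins. Terminal states score exactly ±WIN = ±10000; the heuristic at the depth limit is a made-up guess clamped to ±5000, so it can never be mistaken for a proven result. Move ordering and iterative deepening are left out to keep it short.

```python
import math

WIN = 10000           # exact value of a won terminal state
MAX_HEURISTIC = 5000  # every heuristic value stays strictly inside (-WIN, WIN)


def heuristic(n: int) -> int:
    """Hypothetical, weak guess: fewer stones left looks better for the player to move."""
    return max(-MAX_HEURISTIC, min(MAX_HEURISTIC, 1000 - 100 * n))


def alphabeta(n: int, depth: int, alpha: float = -math.inf, beta: float = math.inf) -> float:
    """Negamax alphabeta: value for the player to move with n stones left."""
    if n == 0:
        return -WIN  # the opponent took the last stone: proven loss
    if depth == 0:
        return heuristic(n)
    best = -math.inf
    for m in (1, 2, 3):
        if m > n:
            break
        value = -alphabeta(n - m, depth - 1, -beta, -alpha)
        best = max(best, value)
        alpha = max(alpha, value)
        if alpha >= beta:
            break  # cutoff: the opponent will avoid this line
    return best


print(alphabeta(5, 2))  # 700: only a heuristic estimate, not a proof
print(alphabeta(5, 5))  # 10000 == WIN: a proven win
```

Because no heuristic value can reach 10000, a root value of exactly 10000 can only come from terminal states, so it is a proof.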

Using Knowledge Part 2: Move Evaluation

Given a state and the possible moves from that state
Put a numeric value on each move
Main use: action selection in search and in simulations
Can also be used for move ordering in search
Again, we can have evaluations both with and without a probabilistic interpretation

Move Evaluation as Probability

Move i with probability p_i:
  Interpretation 1: p_i is the probability that move i is a win
  Interpretation 2: p_i is the probability that move i is the best move
Both make sense
Which one you use depends on how you compute or estimate those numbers
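In a probabilistic simulation policy, move values are typically normalized so they sum to 1 and then sampled. A minimal sketch with hypothetical non-negative move scores; the normalized numbers fit interpretation 2, since exactly one move can be the best. Values under interpretation 1 need not sum to 1 (several moves can all be wins), so they would have to be normalized before being used this way.

```python
import random
from typing import Dict


def move_probabilities(scores: Dict[str, float]) -> Dict[str, float]:
    """Normalize non-negative move scores into a probability distribution."""
    total = sum(scores.values())
    return {move: s / total for move, s in scores.items()}


def sample_move(scores: Dict[str, float], rng: random.Random) -> str:
    """Pick a move with probability proportional to its score."""
    moves = list(scores)
    return rng.choices(moves, weights=[scores[m] for m in moves], k=1)[0]


scores = {"C3": 3.0, "D4": 1.0}  # hypothetical move scores
print(move_probabilities(scores))  # {'C3': 0.75, 'D4': 0.25}
```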

Move Evaluation as “A Number”

No interpretation
  As with state evaluation, bigger is better
Example: the “classical” Go program Explorer (ca. 1989-1995)
Next big question: where do evaluations come from?
  In Explorer, they come from a large number of heuristics for different types of moves

Details on Move Generation in “Classical” Go Program Explorer

Each move has a list of “move motives”, each with a number
Sum of numbers = evaluation of the move
“Pure guessing”: no check of the state after playing the move
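A minimal sketch of this additive scheme. The motive names and numbers are hypothetical, not Explorer's actual motives or values. Nothing here plays the move or looks at the resulting state.

```python
from typing import Dict, List, Tuple

Motive = Tuple[str, float]  # (hypothetical motive name, value)


def move_evaluations(motives_by_move: Dict[str, List[Motive]]) -> Dict[str, float]:
    """Evaluation of each move = sum of the values of its move motives."""
    return {move: sum(value for _, value in motives)
            for move, motives in motives_by_move.items()}


motives = {
    "C3": [("attack weak group", 12.0), ("extend along side", 5.0)],
    "D4": [("save own group", 20.0)],
    "Q16": [],  # no motive found: evaluation 0
}
evals = move_evaluations(motives)
print(evals)                      # {'C3': 17.0, 'D4': 20.0, 'Q16': 0}
print(max(evals, key=evals.get))  # D4
```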

Relation between State and Move Evaluation (1)

Case 1: we have only state evaluation, but need move evaluation
Easy - do a 1-ply search
  Evaluation of move = evaluation of the state after making that move
Example: Go3 and Go4 - simulation-based players
  Play the move, run simulations from the state s′ afterwards
  Move evaluation = winrate of simulations starting from state s′
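A minimal sketch of the 1-ply conversion, on a hypothetical toy game that is not from the slides: a pile of n stones, players alternately take 1, 2 or 3, and whoever takes the last stone wins. The assumption here is that `state_value(s)` is the winning probability for the player to move in s; after my move the opponent is to move in s′, so my move is worth 1 - state_value(s′). (Counting simulation wins for the player who made the move, as described for Go3 and Go4, gives the same number.) Any state evaluator can be plugged in, for example a simulation winrate; the example uses the known exact result for this game.

```python
from typing import Callable, Dict, List


def legal_moves(n: int) -> List[int]:
    """Toy game: take 1, 2 or 3 stones; whoever takes the last stone wins."""
    return [m for m in (1, 2, 3) if m <= n]


def one_ply_move_values(n: int, state_value: Callable[[int], float]) -> Dict[int, float]:
    """Move evaluation from state evaluation by a 1-ply search.
    state_value(s) is the winning probability of the player to move in s."""
    return {m: 1.0 - state_value(n - m) for m in legal_moves(n)}


def exact_value(n: int) -> float:
    """Known result for this toy game: the player to move loses iff n is a multiple of 4
    (this includes n == 0, where the opponent has just taken the last stone)."""
    return 0.0 if n % 4 == 0 else 1.0


print(one_ply_move_values(5, exact_value))  # {1: 1.0, 2: 0.0, 3: 0.0}
```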

Relation between State and Move Evaluation (2)

Case 2: we have only move evaluation, but need state evaluation
No easy solution
We could try to do a “greedy rollout” by following the sequence of best moves
Still, in the end we have to evaluate the terminal state to get a value
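A minimal sketch of a greedy rollout, on a hypothetical toy game that is not from the slides: a pile of n stones, players alternately take 1, 2 or 3, and whoever takes the last stone wins. `move_eval(n, m)` is any move evaluation (higher is better); the rollout keeps playing the best-rated move for whichever player is to move, and only the terminal state supplies a value. The two example evaluators are made up to show that the resulting state value is only as good as the move evaluation.

```python
from typing import Callable, List

MoveEval = Callable[[int, int], float]  # (stones left, stones to take) -> score


def legal_moves(n: int) -> List[int]:
    return [m for m in (1, 2, 3) if m <= n]


def greedy_rollout_value(n: int, move_eval: MoveEval) -> float:
    """Value (1 = win, 0 = loss) for the player to move at the start,
    obtained by following the best-rated move until the game ends."""
    root_to_move = True
    while n > 0:
        best = max(legal_moves(n), key=lambda m: move_eval(n, m))
        n -= best
        root_to_move = not root_to_move
    # Terminal state: the player now to move lost, because the other player took the last stone
    return 0.0 if root_to_move else 1.0


def take_most(n: int, m: int) -> float:
    """Naive move evaluation: taking more stones is better."""
    return float(m)


def leave_multiple_of_4(n: int, m: int) -> float:
    """Good move evaluation: prefer leaving the opponent a multiple of 4 stones."""
    return 1.0 if (n - m) % 4 == 0 else 0.0


print(greedy_rollout_value(5, take_most))            # 0.0
print(greedy_rollout_value(5, leave_multiple_of_4))  # 1.0
```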

Acquiring Evaluation Knowledge

Where do evaluations come from?
  (now) Machine learning
  (old) Local goal-directed search
  (old) Handcoded rules
First, discuss how to represent knowledge in a program

Representing Knowledge for Evaluation

Many ways to represent knowledge
  Handcoded rules
  Simple features
  Pattern databases
  Neural nets

Handcoded Rules

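def selfatari(board, move, color):
    # Shortcut: maxliberty is a helper from the Go4 code (not shown); if it
    # reports more than 2 liberties, the move cannot be a self-atari
    maxoldliberty = maxliberty(board, move, color, 2)
    if maxoldliberty > 2:
        return False
    # Otherwise play the move on a copy of the board and count the
    # liberties of the block that contains the new stone
    cboard = board.copy()
    isLegal = cboard.move(move, color)
    if isLegal:
        newliberty = cboard.liberty(move, color)
        if newliberty == 1:
            return True  # only one liberty left: the move is a self-atari
    return False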

Most direct way
Example: move filters and some of the rules in Go4

Simple Features in Fuego

enum FeBasicFeature
{
    FE_PASS_NEW,
    FE_PASS_CONSECUTIVE,
    FE_CAPTURE_ADJ_ATARI,
    ...
    FE_CAPTURE_MULTIPLE,
    FE_EXTENSION_NOT_LADDER,
    FE_EXTENSION_LADDER,
    ...
    FE_TWO_LIB_SAVE_LADDER,
    FE_TWO_LIB_STILL_LADDER,
    ...
    FE_SELFATARI,
    FE_ATARI_LADDER,
    ...
    FE_DOUBLE_ATARI,
    FE_DOUBLE_ATARI_DEFEND,
    FE_LINE_1,
    FE_LINE_2,
    FE_LINE_3,
    ...
}

move feature vector: (0,0,1,...,1,1,0,...,1,0,...0,0,...)

Idea: each feature is a boolean statement about a state, or a move
Each feature is simple and easy to compute
With machine learning, we can construct an evaluation function from a combination of many simple features
Examples: see Remi Coulom’s paper for a list, and the Fuego screenshot for examples (on the next few slides)
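A minimal sketch of turning boolean features into an evaluation: compute the feature vector, then combine it with one weight per feature. Everything here is hypothetical: a real program would compute the move facts from the board, the feature list only loosely echoes names such as FE_SELFATARI and FE_LINE_1, and the weights stand in for values a machine learning method would fit. A weighted sum is just one simple way to combine features.

```python
from typing import Callable, Dict, List

MoveFacts = Dict[str, int]  # hypothetical precomputed facts about one candidate move

# Each feature is a simple boolean statement about the move
FEATURES: List[Callable[[MoveFacts], bool]] = [
    lambda mv: mv["captured_stones"] >= 1,  # capture
    lambda mv: mv["captured_stones"] >= 2,  # multiple capture
    lambda mv: mv["liberties_after"] == 1,  # self-atari
    lambda mv: mv["line"] == 1,             # first line
    lambda mv: mv["line"] == 2,             # second line
    lambda mv: mv["line"] == 3,             # third line
]
WEIGHTS = [2.0, 1.0, -3.0, -1.5, 0.0, 0.5]  # hypothetical learned weights


def feature_vector(mv: MoveFacts) -> List[int]:
    return [1 if feature(mv) else 0 for feature in FEATURES]


def evaluate(mv: MoveFacts) -> float:
    """Move evaluation = weighted sum of the move's binary features."""
    return sum(w * x for w, x in zip(WEIGHTS, feature_vector(mv)))


capture = {"captured_stones": 1, "liberties_after": 3, "line": 3}
print(feature_vector(capture), evaluate(capture))  # [1, 0, 0, 0, 0, 1] 2.5
```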

Remi Coulom’s Simple Features (1)

Source: Remi Coulom, Computing Elo Ratings of Move Patterns in the Game of Go

Remi Coulom’s Simple Features (2)

Source: Remi Coulom, Computing Elo Ratings of Move Patterns in the Game of Go

Fuego Simple Features

Simple features in the Fuego Go program
Similar to Coulom’s features
Each legal move will have a (small) set of features
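In Coulom's paper, each feature gets a learned strength γ (gamma), a move's strength is the product of the gammas of its features, and the probability of each legal move is its strength divided by the sum over all legal moves (a generalized Bradley-Terry model). A minimal sketch with hypothetical gamma values; it also simplifies the paper by treating every feature as a plain on/off flag. `math.prod` needs Python 3.8 or later.

```python
import math
from typing import Dict, List

# Hypothetical learned strengths; a feature with gamma > 1 makes a move more likely
GAMMA: Dict[str, float] = {"capture": 4.0, "atari": 2.0, "selfatari": 0.1,
                           "line_1": 0.5, "line_3": 1.5}


def move_strength(features: List[str]) -> float:
    """Strength of a move = product of the gammas of its features (1 if none)."""
    return math.prod(GAMMA[f] for f in features)


def move_probabilities(moves: Dict[str, List[str]]) -> Dict[str, float]:
    """Probability of each legal move = its strength / total strength."""
    strengths = {move: move_strength(fs) for move, fs in moves.items()}
    total = sum(strengths.values())
    return {move: s / total for move, s in strengths.items()}


legal = {"A1": ["line_1"], "C3": ["capture", "line_3"], "D4": []}
print(move_probabilities(legal))
```

Here C3 has strength 4.0 × 1.5 = 6.0 out of a total of 7.5, so it is chosen with probability 0.8.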

Pattern Databases

Image source: Stern et al, Bayesian Pattern Ranking for Move Prediction in the Game of Go

Large patterns can be learned from master games, if they are frequently used
In Go, typically we have many different sizes of pattern, from 3x3 to full board
A main question is how to evaluate such patterns
  Measure how often the move in the center is played immediately, or later
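A minimal sketch of the “played immediately” measure. The data format is an assumption: each training position is the list of patterns around all legal moves (as opaque string codes; the pattern encoding itself is left open) plus the index of the move the master actually played. A pattern's score is the fraction of its appearances in which its center move was played. Counting moves played “later” would need the rest of the game record and is not shown.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# (patterns around every legal move in a position, index of the move actually played)
Position = Tuple[List[str], int]


def pattern_play_rates(positions: Iterable[Position]) -> Dict[str, float]:
    """For each pattern: times its center move was played / times it appeared."""
    seen: Counter = Counter()
    played: Counter = Counter()
    for patterns, chosen in positions:
        for i, pattern in enumerate(patterns):
            seen[pattern] += 1
            if i == chosen:
                played[pattern] += 1
    return {p: played[p] / seen[p] for p in seen}


# Tiny hypothetical data set with three made-up pattern codes
data = [(["p1", "p2", "p3"], 0), (["p1", "p2"], 1), (["p1", "p3"], 0)]
print(pattern_play_rates(data))  # {'p1': 0.6666666666666666, 'p2': 0.5, 'p3': 0.0}
```

Rare patterns get unreliable rates, which is one reason to learn only patterns that are frequently used.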

Neural Nets

Image source: https://www.slideshare.net/ShaneSeungwhanMoon/how-alphago-works

Represent knowledge in the (large number of) weights of the neural net
Lower levels have local knowledge (e.g. 3x3, 5x5)
Higher levels can combine local information for global evaluation
Much more on nets later in the course

Example of Exact Knowledge: Benson’s Algorithm

Benson’s algorithm finds stones and territories that are unconditionally alive
  No matter what the opponent plays, they cannot capture these stones
  A generalization of the “two eyes” concept
Can be used as an exact filter in a program - do not generate moves in safe territory

Summary

Many kinds of knowledge
Used for evaluating states and moves
Heuristic rules, patterns, neural networks
Exact knowledge, e.g. safe stones
Next: details - how to represent knowledge in a program
