Computing Science (CMPUT) 496
Search, Knowledge, and Simulations
Martin Müller
Department of Computing Science
University of Alberta
Winter 2017
CMPUT 496
Part IV
Knowledge
496 Today - Mar 9
Announcements:
Quiz 7 still open
Quiz 8 (simulations), due Mar 13
Added small clarifications to the Assignment 3 specification:
  list of atari defense moves in the policy_moves command vs. generating a single move in the policy of the simulation player
Today’s topics:
Knowledge for Heuristic Search and Simulations
State Evaluation
Move Evaluation
Using Knowledge in Heuristic Search
Review - use of knowledge for search and simulations so far
How can knowledge be used?
Basic concepts - properties and interpretations of knowledge for heuristic search
Representing knowledge
Acquiring knowledge - manual vs. machine learning
Knowledge for Search and Simulations - Story so Far
Discussed techniques for heuristic search and simulation
Many were “knowledge-free”
  Blind search, uniform random simulations
Many others used a “black box” heuristic evaluation function
  Goal-distance heuristics in best-first search
  Admissible heuristics in A*
  Depth- or time-limited alphabeta search
So far we have said little about how to build such a function. We will do that now.
Story so Far (Continued)
We also used some knowledge to improve simulation policies
  3x3 patterns
  Move filters
  Assignment 3 - atari capture and atari defense
  Probabilistic simulation policies (Coulom paper)
Now, look deeper into knowledge for heuristic search
What is knowledge used for? Where does it come from?
How is it selected? Constructed? Learned?
Knowledge for State and Move Evaluation
Evaluation function: mapping from state to number - how good is that state? (state evaluation, position evaluation)
Move evaluation: mapping from move (action) to number - how good is that move? (e.g. probabilities in a simulation policy)
Filter: which moves are bad and should be filtered out (pruned)?
The “big two” for us now are: state evaluation and move evaluation
Other Kinds of Knowledge
Many other kinds of knowledge in heuristic search
Examples:
  Time control, search depth control
  Game-specific knowledge to reduce the size of the state space (we already discussed DAG vs. tree)
  Efficient state representation (we discussed)
  Knowledge about algorithm optimization and tuning
  ...
Using Knowledge Part 1: State Evaluation
We know the exact evaluation in terminal states
  Games: win, loss, draw, win by 23.5 points, ...
  Best-first search: distance to goal h(s) = 0
What about heuristic evaluation in non-terminal states?
In games, two kinds of evaluation are popular:
  Heuristic evaluation: higher is better
  Winning probability: higher is better, and it also has an interpretation as a probability
What is State Evaluation used For?
Most important: as the evaluation function in search
  Leaf nodes are evaluated by this function
  Interior nodes are evaluated by the minimax rule
Heuristic evaluation of interior nodes: for move ordering, and for the leaves of depth-limited searches
What does an evaluation mean?
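A minimal sketch of how a heuristic enters a depth-limited minimax search. It assumes a toy take-away game that is not from the course (players alternately remove 1 to 3 stones; whoever takes the last stone wins). Terminal states get exact values, leaves at the depth limit get a heuristic score, and interior nodes use the minimax rule. The names `heuristic`, `minimax` and `best_move` are illustrative only; for this particular game the heuristic happens to be exact, which keeps the example easy to check.

```python
from typing import List, Tuple

WIN, LOSS = 1000, -1000  # exact values of terminal states (MAX's point of view)

def moves(pile: int) -> List[int]:
    """Legal moves: take 1, 2 or 3 stones, but no more than the pile holds."""
    return [k for k in (1, 2, 3) if k <= pile]

def heuristic(pile: int, max_to_move: bool) -> int:
    """Hypothetical heuristic from MAX's point of view: a pile that is a
    multiple of 4 is assumed to be bad for the side to move."""
    mover_score = -10 if pile % 4 == 0 else 10
    return mover_score if max_to_move else -mover_score

def minimax(pile: int, depth: int, max_to_move: bool) -> int:
    if pile == 0:
        # Terminal state: the previous player took the last stone and won.
        return LOSS if max_to_move else WIN
    if depth == 0:
        # Leaf of the depth-limited search: use the heuristic.
        return heuristic(pile, max_to_move)
    # Interior node: minimax rule over the children.
    values = [minimax(pile - k, depth - 1, not max_to_move) for k in moves(pile)]
    return max(values) if max_to_move else min(values)

def best_move(pile: int, depth: int) -> Tuple[int, int]:
    """Return (move, value) for MAX, searching `depth` plies."""
    value, move = max((minimax(pile - k, depth - 1, False), k) for k in moves(pile))
    return move, value

print(best_move(5, 2))  # (1, 10)
```

With only 2 plies from a pile of 5, the search cannot reach the end of the game along the best line, so its value (10) comes from the heuristic; from a pile of 3 it sees the exact win directly.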
What does Winning Probability Mean?
Different interpretations
Clearest case: a game with a chance element, e.g. dice rolls
  The winning probability is the minimax score! (More precisely, the minimax rule extended to average over the chance outcomes, as in expectimax.)
Example - backgammon
  In some state s, I need to roll two sixes to win, otherwise I lose
  Probability of rolling two sixes = 1/6 × 1/6 = 1/36
  Value v(s) = 1/36
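The backgammon value can be reproduced by treating the roll as a chance node: its value is the probability-weighted average of the outcome values (win = 1, loss = 0). A small sketch, assuming the simplified state from the slide in which only the double-six roll wins:

```python
from fractions import Fraction
from itertools import product

def value_need_double_six() -> Fraction:
    """Chance-node value: sum over all rolls of P(roll) * value(after roll)."""
    rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely rolls of two dice
    p = Fraction(1, len(rolls))
    return sum((p * (1 if roll == (6, 6) else 0) for roll in rolls), Fraction(0))

print(value_need_double_six())  # 1/36
```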
Winning Probability in Games with No Chance
There are no probabilities in the game itself
A perfect player would always know: the winning probability is either 0% or 100%
Probability comes from either imperfect opponents, or our imperfect understanding of the game
Example - again, a simulation-based player
  Winning probability = winrate in simulations
  Probabilities come from both players using randomized policies in simulation
Monte Carlo Tree Search also uses winning probabilities of simulations
  Main difference: it has a non-random “in-tree” phase followed by the randomized simulation
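To see where the probability comes from in a game without chance, here is a sketch of a simulation-based winrate. It assumes the same toy take-away game as a stand-in for Go (take 1 to 3 stones, last stone wins) and uniformly random policies for both players; `random_playout` and `winrate` are hypothetical names.

```python
import random

def random_playout(pile: int, rng: random.Random) -> bool:
    """Both sides take 1-3 stones uniformly at random; taking the last stone
    wins. Returns True if the player to move at the start wins."""
    starting_player_moves = True
    while True:
        pile -= rng.randint(1, min(3, pile))
        if pile == 0:
            return starting_player_moves
        starting_player_moves = not starting_player_moves

def winrate(pile: int, n: int = 10_000, seed: int = 0) -> float:
    """Estimated winning probability = fraction of simulations won."""
    rng = random.Random(seed)
    wins = sum(random_playout(pile, rng) for _ in range(n))
    return wins / n

print(winrate(1), winrate(4))
```

Under perfect play a pile of 4 is a loss for the player to move, yet `winrate(4)` lies strictly between 0 and 1: the number reflects the randomized policies, not the game-theoretic value.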
More on Winning Probabilities
Image source: Silver et al., Mastering the game of Go with deep neural networks and tree search, Nature
Simulations are not the only way to get probabilities
Can use machine learning to learn win probabilities
  Example: AlphaGo’s value network - a deep neural net that maps states to win probabilities
Can also define rules for estimating winning probabilities
  Translate heuristic evaluation into a probability (more later)
  Difficult, not used frequently
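One common way to translate a heuristic score into a probability is a logistic (sigmoid) curve. This is a sketch under that assumption, not the method the course will present; the `scale` parameter is hypothetical and in practice would have to be fitted to game results, which is part of why this is difficult.

```python
import math

def eval_to_win_probability(score: float, scale: float = 10.0) -> float:
    """Map a heuristic score (higher is better) into (0, 1) with a logistic curve.
    `scale` is a hypothetical tuning constant that controls how fast the
    probability approaches 0 or 1."""
    z = score / scale
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)          # numerically stable branch for large negative scores
    return e / (1.0 + e)

print(eval_to_win_probability(0), eval_to_win_probability(12))
```

The mapping is monotonically increasing, so it keeps the ordering of states; the hard part is choosing `scale` so that the outputs really behave like probabilities.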
Heuristic Evaluation Function
Heuristic evaluation: higher is better
Possible: estimate of the score of the game
  +12 = “Black is about 12 points ahead”
Possibly no other meaning, “just a number”
Example: material evaluation function in chess
  Queen = 9, rook = 5, bishop = 3, ...
  Evaluation = sum of my material’s values - sum of opponent’s material’s values
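A sketch of the material evaluation function. Queen, rook and bishop values are from the slide; knight = 3 and pawn = 1 are the usual textbook values and are added here as an assumption. Piece counts are passed as simple dictionaries rather than a real board representation.

```python
from typing import Dict

# Q, R, B from the slide; N and P are assumed conventional values.
PIECE_VALUES: Dict[str, int] = {"Q": 9, "R": 5, "B": 3, "N": 3, "P": 1}

def material_eval(mine: Dict[str, int], theirs: Dict[str, int]) -> int:
    """Sum of my material's values minus sum of the opponent's material's values."""
    def total(pieces: Dict[str, int]) -> int:
        return sum(PIECE_VALUES[piece] * count for piece, count in pieces.items())
    return total(mine) - total(theirs)

# I have Q + R + 3 pawns (17); the opponent has R + 2 bishops + 4 pawns (15).
print(material_eval({"Q": 1, "R": 1, "P": 3}, {"R": 1, "B": 2, "P": 4}))  # 2
```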
One Interpretation
General motto: “Similar evaluation values for similar states”
(I believe the more precise version below is due to Stuart Russell, author of the “AIMA” textbook)
All states with the same evaluation are “equally good”: they have the same (but not known to us) probability of winning
Any state with a higher evaluation has a higher probability of winning
The evaluation function partitions the set of all states S into subsets Sv, where each state in Sv is equally good for us
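Writing h for the evaluation function and P_win(s) for the (unknown) winning probability of state s (both symbols introduced here for illustration), the interpretation above can be stated as:

```latex
\begin{aligned}
S &= \bigcup_{v} S_v, \qquad S_v = \{\, s \in S : h(s) = v \,\} \\
s, s' \in S_v &\;\Rightarrow\; P_{\text{win}}(s) = P_{\text{win}}(s') \\
h(s) < h(s') &\;\Rightarrow\; P_{\text{win}}(s) < P_{\text{win}}(s')
\end{aligned}
```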
Relative vs Absolute Evaluation
Unless the numbers in an evaluation have a meaning such as probability or score, the numbers themselves do not matter
Only the ordering given by the numbers matters - it decides the preference or ranking between moves
Example 1: multiply all function values by 10
Example 2: add 7 to all function values
The search will be exactly the same
Any mapping by a monotonically increasing (order-preserving) function will give the same search behavior
In utility theory this is called ordinal utility
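A quick check of the claim for plain minimax: apply several monotonically increasing functions to every leaf of a small hand-made 2-ply tree and compare the root decision. The tree and the function names are illustrative assumptions.

```python
from typing import Callable, List, Union

Tree = Union[float, List["Tree"]]  # a leaf evaluation or a list of children

def minimax(node: Tree, maximizing: bool, f: Callable[[float], float]) -> float:
    """Plain minimax; f is applied to every leaf evaluation."""
    if not isinstance(node, list):
        return f(node)
    values = [minimax(child, not maximizing, f) for child in node]
    return max(values) if maximizing else min(values)

def best_root_move(tree: List[Tree], f: Callable[[float], float]) -> int:
    """Index of MAX's preferred root move (first one on ties)."""
    values = [minimax(child, False, f) for child in tree]
    return values.index(max(values))

tree: List[Tree] = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]  # small 2-ply example
transforms = {
    "x": lambda x: x,
    "10x": lambda x: 10 * x,
    "x+7": lambda x: x + 7,
    "x^3": lambda x: x ** 3,
}
for name, f in transforms.items():
    print(name, best_root_move(tree, f))  # move 0 in every case
```

The root values differ (3, 30, 10, 27), but the chosen move does not, which is what "only the ordering matters" means for minimax.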
Skill-Testing Question
Everything I said on the last slide was true for minimax search
What about negamax?
5 minutes for discussion with your neighbor or on chat
Repeated claim from last slide: Any mapping by a monotonically increasing (order-preserving) function will give the same search behavior
Is it true? False?
Is it true under some conditions? Which?
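One way to experiment with the question: a tiny negamax search, assuming that leaf evaluations are given from the point of view of the player to move at that leaf, and that leaves occur at different depths. The tree below is hand-made for illustration.

```python
from typing import Callable, List, Union

# A node is a leaf evaluation, from the point of view of the player to move
# at that leaf, or a list of child nodes.
Tree = Union[float, List["Tree"]]

def negamax(node: Tree, f: Callable[[float], float]) -> float:
    """Negamax; f is applied to every leaf evaluation."""
    if not isinstance(node, list):
        return f(node)
    return max(-negamax(child, f) for child in node)

def best_root_move(tree: List[Tree], f: Callable[[float], float]) -> int:
    values = [-negamax(child, f) for child in tree]
    return values.index(max(values))

# Move 0 ends in a leaf at depth 1 (opponent to move, value -5 for them);
# move 1 ends in a leaf at depth 2 (root player to move, value 2).
tree: List[Tree] = [-5, [2]]
for name, f in [("x", lambda x: x), ("10x", lambda x: 10 * x), ("x+7", lambda x: x + 7)]:
    print(name, best_root_move(tree, f))  # prints: x 0 / 10x 0 / x+7 1
```

In this example scaling by 10 keeps the decision but adding 7 changes it: the shift enters the root's comparison with opposite signs for leaves at odd and even depth.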
Mixing Exact and Heuristic Evaluation
We can mix both kinds of evaluations
If we are careful, we can get true proofs of wins and losses this way
  Example: win = 10000, highest heuristic score = 5000
  If alphabeta returns 10000, it is a proven win
Having a good heuristic can help speed up an exact proof
  Provides good move ordering for iterative deepening search
  A better move sorted first means more cuts in the tree search
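A sketch of both points on a hand-made tree: exact terminal values (WIN = 10000) are mixed with heuristic leaf scores bounded by 5000, so a root value of 10000 can only come from exact leaves and is a proof. Searching the same tree with the winning move first or last shows the effect of move ordering on the number of leaves alpha-beta evaluates. The tree and helper names are assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple, Union

WIN = 10_000            # exact value of a proven win (a loss would be -WIN)
MAX_HEURISTIC = 5_000   # heuristic leaf scores are kept within +/- MAX_HEURISTIC

# A node is a leaf value (exact or heuristic) or a list of children.
Tree = Union[int, List["Tree"]]

def alphabeta(node: Tree, alpha: float, beta: float,
              maximizing: bool, stats: Dict[str, int]) -> float:
    if not isinstance(node, list):
        stats["leaves"] += 1          # count evaluated leaves
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        value = alphabeta(child, alpha, beta, not maximizing, stats)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break                     # cutoff: skip the remaining children
    return best

def search(root: List[Tree]) -> Tuple[float, bool, int]:
    stats = {"leaves": 0}
    value = alphabeta(root, -math.inf, math.inf, True, stats)
    # Only exact leaves can produce WIN, since heuristics stay <= MAX_HEURISTIC.
    return value, value >= WIN, stats["leaves"]

move_a = [WIN, WIN]          # every opponent reply leads to a proven win
move_b = [300, 4000, -200]   # heuristic scores only
print(search([move_a, move_b]))  # (10000, True, 3): good move first
print(search([move_b, move_a]))  # (10000, True, 5): good move last
```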
Using Knowledge Part 2: Move Evaluation
Given a state and the possible moves from that state
Put a numeric value on each move
Main use: action selection in search and in simulation
Can also be used for move ordering in search
Again, we can have evaluation both with and without a probabilistic interpretation
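A sketch of the three uses of move values listed above: greedy selection (as in search), sampling proportional to the values (as in a probabilistic simulation policy), and move ordering. The move names and values are hypothetical, and the values are assumed to be nonnegative so they can serve as sampling weights.

```python
import random
from typing import Dict, List

def select_greedy(move_values: Dict[str, float]) -> str:
    """Search: take the move with the highest value."""
    return max(move_values, key=lambda m: move_values[m])

def select_sampled(move_values: Dict[str, float], rng: random.Random) -> str:
    """Simulation: sample a move with probability proportional to its value
    (values must be nonnegative and not all zero)."""
    moves = list(move_values)
    return rng.choices(moves, weights=[move_values[m] for m in moves], k=1)[0]

def order_moves(move_values: Dict[str, float]) -> List[str]:
    """Move ordering: best-valued moves first."""
    return sorted(move_values, key=lambda m: move_values[m], reverse=True)

values = {"a1": 0.1, "c3": 0.6, "d4": 0.3}   # hypothetical move values
print(select_greedy(values), order_moves(values))  # c3 ['c3', 'd4', 'a1']
print(select_sampled(values, random.Random(0)))
```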
Move Evaluation as Probability
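Move i with probability pi:
Interpretation 1:
  pi is the probability that move i is a win
Interpretation 2:
  pi is the probability that move i is the best move
Both make sense
Which one you use depends on how you compute or estimate those numbers
Under interpretation 2, the pi form a distribution over the legal moves and sum to 1; under interpretation 1 they need not, since several moves may all be likely wins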
Move Evaluation as “A Number”
No interpretation
As with state evaluation, bigger is better
Example: “classical” Go program Explorer (ca. 1989 - 1995)
Next big question: where do evaluations come from?
In Explorer, they come from a large number of heuristics for different types of moves
Details on Move Generation in “Classical” Go Program Explorer
Each move has a list of “move motives”, each with a number
Sum of numbers = evaluation of the move
“Pure guessing”: no check of the state after playing the move
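A sketch of the Explorer-style scheme: each candidate move carries a set of motives with numbers, and the move's evaluation is simply their sum, with no look at the resulting position. The motive names, numbers and board points are hypothetical; Explorer's actual motives are not listed here.

```python
from typing import Dict

def move_value(motives: Dict[str, float]) -> float:
    """Explorer-style guess: sum the values of all motives attached to a move."""
    return sum(motives.values())

# Hypothetical candidates and motive values, for illustration only.
candidates: Dict[str, Dict[str, float]] = {
    "C3": {"extend_group": 4.0, "reduce_territory": 2.5},
    "Q16": {"take_corner": 6.0},
    "K10": {"attack_weak_group": 3.0, "make_moyo": 1.5},
}
ranked = sorted(candidates, key=lambda m: move_value(candidates[m]), reverse=True)
print(ranked)  # ['C3', 'Q16', 'K10']
```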
Relation between State and Move Evaluation (1)
Case 1: we have only state evaluation, but need move evaluation
Easy - do a 1-ply search
Evaluation of move = evaluation of state after making that move
Example: Go3 and Go4 - simulation-based players
  Play move, run simulations from the state s′ afterwards
  Move evaluation = winrate of simulations starting from state s′
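A sketch of the 1-ply construction in the style of Go3/Go4, but on the toy take-away game (take 1 to 3 stones, last stone wins) rather than Go: the state evaluation is a simulation winrate, and each move is valued by evaluating the state after it. All names here are hypothetical.

```python
import random
from typing import Dict

def playout_winner(pile: int, to_move: int, rng: random.Random) -> int:
    """Uniform random playout of the take-1-to-3 game; returns the winner (0 or 1)."""
    while True:
        pile -= rng.randint(1, min(3, pile))
        if pile == 0:
            return to_move            # this player took the last stone
        to_move = 1 - to_move

def state_eval(pile: int, to_move: int, player: int, n: int, rng: random.Random) -> float:
    """Simulation-based state evaluation: winrate of `player` from this state."""
    if pile == 0:                     # terminal: the player who just moved won
        return 1.0 if player != to_move else 0.0
    wins = sum(playout_winner(pile, to_move, rng) == player for _ in range(n))
    return wins / n

def move_evals(pile: int, player: int, n: int = 2000, seed: int = 1) -> Dict[int, float]:
    """1-ply search: value of move k = evaluation of the state after playing k."""
    rng = random.Random(seed)
    return {k: state_eval(pile - k, 1 - player, player, n, rng)
            for k in (1, 2, 3) if k <= pile}

print(move_evals(3, player=0))
```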
Relation between State and Move Evaluation (2)
Case 2: we have only move evaluation, but need state evaluation
No easy solution
We could try a “greedy rollout” by following the sequence of best moves
Still, in the end we have to evaluate the terminal state to get a value
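A sketch of a greedy rollout on the toy take-away game: both sides repeatedly play the move with the best move evaluation, and only the terminal state receives a value. The move evaluation rule is a hypothetical one chosen for illustration.

```python
from typing import List

def legal_moves(pile: int) -> List[int]:
    return [k for k in (1, 2, 3) if k <= pile]

def move_eval(pile: int, k: int) -> float:
    """Hypothetical move evaluation: prefer leaving the opponent a multiple of 4."""
    return 1.0 if (pile - k) % 4 == 0 else 0.0

def greedy_rollout_value(pile: int) -> int:
    """State evaluation for the player to move (pile >= 1): both sides play
    the best-rated move (first one on ties) until the game ends, and the
    terminal state is scored +1 (win) or -1 (loss)."""
    root_player_moves = True
    while True:
        k = max(legal_moves(pile), key=lambda m: move_eval(pile, m))
        pile -= k
        if pile == 0:                  # terminal state: the mover took the last stone
            return 1 if root_player_moves else -1
        root_player_moves = not root_player_moves

print([greedy_rollout_value(p) for p in range(1, 9)])  # [1, 1, 1, -1, 1, 1, 1, -1]
```

With this particular rule the rollout happens to match perfect play; a weaker move evaluation would give a correspondingly less reliable state value.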
Acquiring Evaluation Knowledge
Where do evaluations come from?
  (now) Machine learning
  (old) Local goal-directed search
  (old) Handcoded rules
First, discuss how to represent knowledge in a program
Representing Knowledge for Evaluation
Many ways to represent knowledge
  Handcoded rules
  Simple features
  Pattern databases
  Neural nets
Handcoded Rules
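def selfatari(board, move, color):
    # Quick check using maxliberty, a Go4 helper not shown here. Presumably it
    # returns the largest liberty count among color's blocks next to move:
    # a neighboring block with more than 2 liberties keeps at least 2 after
    # connecting, so the move cannot be a self-atari.
    maxoldliberty = maxliberty(board, move, color, 2)
    if maxoldliberty > 2:
        return False
    # Otherwise play the move on a copy of the board and count liberties.
    cboard = board.copy()
    isLegal = cboard.move(move, color)
    if isLegal:
        newliberty = cboard.liberty(move, color)
        if newliberty == 1:  # the block containing move is in atari
            return True
    return False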
Most direct way
Example: move filters and some of the rules in Go4
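The rule above relies on board methods from the course code. For a self-contained picture of what a liberty count involves, here is a hypothetical standalone sketch (not the Go4 implementation) on a board stored as a list of strings. It counts liberties by flood fill and, as a simplification, ignores captures, so a move that captures stones may be misreported as self-atari.

```python
from typing import List, Set, Tuple

EMPTY, BLACK, WHITE = ".", "X", "O"

def block_liberties(board: List[str], row: int, col: int) -> int:
    """Count the liberties of the block containing (row, col) by flood fill."""
    color = board[row][col]
    size = len(board)
    stack = [(row, col)]
    seen: Set[Tuple[int, int]] = {(row, col)}
    liberties: Set[Tuple[int, int]] = set()
    while stack:
        r, c = stack.pop()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < size and 0 <= nc < size):
                continue                      # off the board
            point = board[nr][nc]
            if point == EMPTY:
                liberties.add((nr, nc))
            elif point == color and (nr, nc) not in seen:
                seen.add((nr, nc))            # same-colored stone: part of the block
                stack.append((nr, nc))
    return len(liberties)

def is_selfatari_simple(board: List[str], row: int, col: int, color: str) -> bool:
    """Simplified self-atari test: place the stone, then check for exactly one
    liberty. Ignores captures; a suicide (0 liberties) returns False."""
    if board[row][col] != EMPTY:
        return False
    new_row = board[row][:col] + color + board[row][col + 1:]
    new_board = board[:row] + [new_row] + board[row + 1:]
    return block_liberties(new_board, row, col) == 1

board = [".O...", ".....", ".....", ".....", "....."]
print(is_selfatari_simple(board, 0, 0, BLACK), is_selfatari_simple(board, 2, 2, BLACK))
```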
Simple Features in Fuego
enum FeBasicFeature
{
    FE_PASS_NEW,
    FE_PASS_CONSECUTIVE,
    FE_CAPTURE_ADJ_ATARI,
    ...
    FE_CAPTURE_MULTIPLE,
    FE_EXTENSION_NOT_LADDER,
    FE_EXTENSION_LADDER,
    ...
    FE_TWO_LIB_SAVE_LADDER,
    FE_TWO_LIB_STILL_LADDER,
    ...
    FE_SELFATARI,
    FE_ATARI_LADDER,
    ...
    FE_DOUBLE_ATARI,
    FE_DOUBLE_ATARI_DEFEND,
    FE_LINE_1,
    FE_LINE_2,
    FE_LINE_3,
    ...
};
move feature vector: (0,0,1,...,1,1,0,...,1,0,...,0,0,...)
Idea: each feature is a boolean statement about a state, or a move
Each feature is simple and easy to compute
With machine learning, we can construct an evaluation function from a combination of many simple features
Examples: see Remi Coulom’s paper for a list, and the Fuego screenshot for examples (on the next few slides)
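A sketch of turning a move's active boolean features into a 0/1 vector in a fixed order, and combining them with a weighted sum, which is one simple way to build an evaluation from many features. The feature names are taken from the Fuego enum above; the weights are made-up numbers for illustration, not learned values.

```python
from typing import Dict, List, Set

FEATURES: List[str] = [
    "FE_PASS_NEW", "FE_CAPTURE_ADJ_ATARI", "FE_SELFATARI",
    "FE_DOUBLE_ATARI", "FE_LINE_1", "FE_LINE_2", "FE_LINE_3",
]
# Hypothetical weights; in practice they would come from machine learning.
WEIGHTS: Dict[str, float] = {
    "FE_PASS_NEW": -2.0, "FE_CAPTURE_ADJ_ATARI": 1.5, "FE_SELFATARI": -3.0,
    "FE_DOUBLE_ATARI": 2.0, "FE_LINE_1": -1.0, "FE_LINE_2": 0.0, "FE_LINE_3": 0.5,
}

def feature_vector(active: Set[str]) -> List[int]:
    """Boolean features -> 0/1 vector in the fixed order of FEATURES."""
    return [1 if f in active else 0 for f in FEATURES]

def linear_score(active: Set[str]) -> float:
    """Weighted sum of the move's features."""
    return sum(WEIGHTS[f] * x for f, x in zip(FEATURES, feature_vector(active)))

print(feature_vector({"FE_DOUBLE_ATARI", "FE_LINE_3"}))  # [0, 0, 0, 1, 0, 0, 1]
print(linear_score({"FE_DOUBLE_ATARI", "FE_LINE_3"}))    # 2.5
```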
Remi Coulom’s Simple Features (1)
Source: Remi Coulom, Computing Elo Ratings of Move Patterns in the Game of Go
Remi Coulom’s Simple Features (2)
Source: Remi Coulom, Computing Elo Ratings of Move Patterns in the Game of Go
Fuego Simple Features
Simple features in the Fuego Go program
Similar to Coulom’s features
Each legal move will have a (small) set of features
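In Coulom's model, each feature has a learned strength γ, a move's strength is the product of the γ values of its features, and the probability of choosing a move is its strength divided by the sum over all legal moves. A sketch of that computation, with hypothetical feature names and γ values standing in for learned ones:

```python
from math import prod
from typing import Dict, Set

# Hypothetical gamma values; in Coulom's method they are learned from game records.
GAMMA: Dict[str, float] = {"capture": 30.0, "atari": 3.0, "line_1": 0.5, "pattern_x": 2.0}

def move_strength(features: Set[str]) -> float:
    """Product of the gammas of the move's features (1 for a move with none)."""
    return prod(GAMMA[f] for f in features)

def move_probabilities(moves: Dict[str, Set[str]]) -> Dict[str, float]:
    """P(move) = strength(move) / sum of the strengths of all legal moves."""
    strengths = {m: move_strength(fs) for m, fs in moves.items()}
    total = sum(strengths.values())
    return {m: s / total for m, s in strengths.items()}

probs = move_probabilities({"a": {"capture"}, "b": {"atari", "line_1"}, "c": set()})
print(probs)
```

Here the strengths are 30, 1.5 and 1, so the capture move gets about 92% of the probability mass.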
Pattern Databases
Image source: Stern et al., Bayesian Pattern Ranking for Move Prediction in the Game of Go
Large patterns can be learned from master games, if they are frequently used
In Go, typically we have many different sizes of pattern, from 3x3 to full board
A main question is how to evaluate such patterns
Measure how often the move in the center is played immediately, or later
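A sketch of the simplest version of that measurement: for every occurrence of a pattern around a legal move in master games, record whether the center move was played immediately, then report the play rate per pattern. The pattern keys and samples are made-up inputs for illustration; counting "played later" would need extra bookkeeping not shown here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def pattern_play_rates(samples: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """samples: (pattern_key, center_move_was_played) pairs, one per occurrence
    of a pattern around a legal move. Returns the fraction of occurrences in
    which the center move was actually played."""
    seen: Counter = Counter()
    played: Counter = Counter()
    for key, was_played in samples:
        seen[key] += 1
        if was_played:
            played[key] += 1
    return {key: played[key] / seen[key] for key in seen}

samples = [("p1", True), ("p1", False), ("p1", True), ("p1", True),
           ("p2", False), ("p2", False)]
print(pattern_play_rates(samples))  # {'p1': 0.75, 'p2': 0.0}
```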
Neural Nets
Image source: https://www.slideshare.net/ShaneSeungwhanMoon/how-alphago-works
Represent knowledge in the (large number of) weights of the neural net
Lower levels have local knowledge (e.g. 3x3, 5x5)
Higher levels can combine local information for global evaluation
Much more on nets later in the course
Example of Exact Knowledge: Benson’s Algorithm
Benson’s algorithm finds stones and territories that are unconditionally alive
No matter what the opponent plays, they cannot capture these stones
A generalization of the “two eyes” concept
Can be used as an exact filter in a program - do not generate moves in safe territory
Summary
Many kinds of knowledge
Used for evaluating states and moves
Heuristic rules, patterns, neural networks
Exact knowledge, e.g. safe stones
Next: details - how to represent knowledge in a program