B.tech CS S8 Artificial Intelligence Notes Module 3


Module 3

Game Playing

Games provide a structured task in which success or failure is easy to measure. They do not obviously require large amounts of knowledge and were therefore thought to be solvable by straightforward search. Games are good vehicles for research because they are well formalized, small, and self-contained, and are therefore easily programmed. Games can also be good models of competitive situations, so principles discovered in game-playing programs may be applicable to practical problems.

Game playing has been a major topic of AI since the very beginning. Besides the attraction of the topic to people, this is because of its close relation to "intelligence" and its well-defined states and rules. The most commonly used AI technique in games is search. In other problem-solving activities, a state change is caused solely by the action of the agent. In multi-agent games, however, it also depends on the actions of other agents, who usually have different goals.

The special case that has been studied most is the "two-person zero-sum game", where the two players have exactly opposite goals. (Not all competitions are zero-sum!) Given sufficient time and space, usually an optimum solution can be obtained for the former by exhaustive search, though not for the latter. However, for most interesting games, such a solution is too inefficient to be of practical use.

Game playing has always been popular with humans; chess and Go are two well-known examples. Claude Shannon and Alan Turing described the first chess programs around 1950. Chess was chosen for its simple rules, concrete representation, and popularity. Chess programs were viewed as an existence proof of a machine doing something thought to require intelligence.

Having an opponent in the game introduces uncertainty. Every game program must deal with the contingency problem: every branch of the search tree corresponds to a possible contingency that may arise. Games are often too hard to solve exactly, and uncertainty arises because there is not enough time to calculate the exact consequences of any move. Games also penalize inefficiency heavily. Pruning helps us remove parts of the search tree that make no difference to the outcome, while heuristic evaluation functions get us close to the true utility of a state without having to complete a full search.

Representation of games

The games studied by game theory are well-defined mathematical objects. A game consists of a set of players, a set of moves (or strategies) available to those players, and a specification of payoffs for each combination of strategies. There are two ways of representing games that are common in the literature.


1. Normal form

The normal (or strategic form) game is usually represented by a matrix which shows the players, strategies, and payoffs (see the example below). More generally it can be represented by any function that associates a payoff for each player with every possible combination of actions. In the accompanying example there are two players; one chooses the row and the other chooses the column.

Each player has two strategies, which are specified by the number of rows and the number of columns. The payoffs are provided in the interior. The first number is the payoff received by the row player (Player 1 in our example); the second is the payoff for the column player (Player 2 in our example). Suppose that Player 1 plays Up and that Player 2 plays Left. Then Player 1 gets a payoff of 4, and Player 2 gets 3.

When a game is presented in normal form, it is presumed that each player acts simultaneously or, at least, without knowing the actions of the other. If players have some information about the choices of other players, the game is usually presented in extensive form.

2. Extensive form

The extensive form can be used to formalize games in which the order of moves matters. Games here are often presented as trees. Each vertex (or node) represents a point of choice for a player, and the player is specified by a number listed by the vertex. The lines out of a vertex represent the possible actions for that player. The payoffs are specified at the bottom of the tree.

Fig : An extensive form game

                          Player 2 chooses Left    Player 2 chooses Right
Player 1 chooses Up              4, 3                    –1, –1
Player 1 chooses Down            0, 0                     3, 4

Normal form or payoff matrix of a 2-player, 2-strategy game
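As a small illustration, the payoff matrix above can be stored as a lookup table keyed by the pair of chosen strategies. This is only a sketch: the names `PAYOFFS` and `payoff` are made up for this example.

```python
# Payoff matrix of the 2-player, 2-strategy example.
# Key: (Player 1 move, Player 2 move); value: (Player 1 payoff, Player 2 payoff).
PAYOFFS = {
    ("Up", "Left"): (4, 3),
    ("Up", "Right"): (-1, -1),
    ("Down", "Left"): (0, 0),
    ("Down", "Right"): (3, 4),
}


def payoff(p1_move, p2_move):
    """Return the (Player 1, Player 2) payoffs for one combination of strategies."""
    return PAYOFFS[(p1_move, p2_move)]


# Player 1 plays Up and Player 2 plays Left, as in the text.
print(payoff("Up", "Left"))
```

Any function that maps every combination of actions to one payoff per player would serve equally well; the dictionary is just the most direct encoding of the matrix.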


In the game pictured here, there are two players. Player 1 moves first and chooses either F or U. Player 2 sees Player 1's move and then chooses A or R. If Player 1 chooses U and Player 2 then chooses A, Player 1 gets 8 and Player 2 gets 2.

The extensive form can also capture simultaneous-move games and games with incomplete information. Either a dotted line or circle is drawn around two different vertices to represent them as being part of the same information set (i.e., the players do not know at which point they are).

Types of games

1. Symmetric and asymmetric

A symmetric game is a game where the payoffs for playing a particular strategy depend only on the other strategies employed, not on who is playing them. If the identities of the players can be changed without changing the payoff to the strategies, then a game is symmetric. Many of the commonly studied 2×2 games are symmetric. The standard representations of the prisoner's dilemma and the stag hunt are symmetric games.

Most commonly studied asymmetric games are games where there are not identical strategy sets for both players. For instance, the ultimatum game and similarly the dictator game have different strategies for each player. It is possible, however, for a game to have identical strategies for both players, yet be asymmetric. For example, the game pictured to the right is asymmetric despite having identical strategy sets for both players.

2. Zero sum and non-zero sum

Zero sum games are a special case of constant sum games, in which choices by players can neither increase nor decrease the available resources. In zero-sum games the total benefit to all players in the game, for every combination of strategies, always adds to zero (more informally, a player benefits only at the expense of others). Poker exemplifies a zero-sum game (ignoring the possibility of the house's cut), because one wins exactly the amount one's opponents lose. Other zero sum games include classical board games such as Go and chess.

         E        F
E      1, 2     0, 0
F      0, 0     1, 2

An asymmetric game

Many games studied by game theorists are non-zero-sum games, because some outcomes have net results greater or less than zero. Informally, in non-zero-sum games, a gain by one player does not necessarily correspond with a loss by another.

Constant sum games correspond to activities like theft and gambling, but not to the fundamental economic situation in which there are potential gains from trade. It is possible to transform any game into a (possibly asymmetric) zero-sum game by adding an additional dummy player (often called "the board"), whose losses compensate the players' net winnings.
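The definitions of symmetric and zero-sum games can be checked mechanically on a payoff matrix. The sketch below encodes the two example matrices from these notes (captioned "An asymmetric game" and "A zero-sum game"); the function names and the dictionary layout are assumptions made for illustration.

```python
def is_zero_sum(game):
    """True if the two payoffs add to zero for every combination of strategies."""
    return all(p1 + p2 == 0 for p1, p2 in game.values())


def is_symmetric(game, strategies):
    """True if swapping the players' roles swaps their payoffs.

    That is, Player 1's payoff for (s, t) equals Player 2's payoff for (t, s).
    """
    return all(
        game[(s, t)][0] == game[(t, s)][1]
        for s in strategies
        for t in strategies
    )


# Key: (row strategy, column strategy); value: (row payoff, column payoff).
ASYMMETRIC_GAME = {
    ("E", "E"): (1, 2), ("E", "F"): (0, 0),
    ("F", "E"): (0, 0), ("F", "F"): (1, 2),
}
ZERO_SUM_GAME = {
    ("A", "A"): (-1, 1), ("A", "B"): (3, -3),
    ("B", "A"): (0, 0), ("B", "B"): (-2, 2),
}

print(is_symmetric(ASYMMETRIC_GAME, ["E", "F"]))  # False: identical strategy sets, yet asymmetric
print(is_zero_sum(ZERO_SUM_GAME))                 # True
```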

3. Simultaneous and sequential

Simultaneous games are games where both players move simultaneously, or if they do not move simultaneously, the later players are unaware of the earlier players' actions (making them effectively simultaneous). Sequential games (or dynamic games) are games where later players have some knowledge about earlier actions. This need not be perfect knowledge about every action of earlier players; it might be very little information.

For instance, a player may know that an earlier player did not perform one particular action, while he does not know which of the other available actions the first player actually performed. Normal form is used to represent simultaneous games, and extensive form is used to represent sequential ones.

4. Perfect information and imperfect information

An important subset of sequential games consists of games of perfect information. A game is one of perfect information if all players know the moves previously made by all other players. Thus, only sequential games can be games of perfect information, since in simultaneous games not every player knows the actions of the others. Perfect information games include chess and Go.

Fig : A game of imperfect information (the dotted line represents ignorance on the part of player 2)

         A         B
A     –1, 1     3, –3
B      0, 0    –2, 2

A zero-sum game

Perfect information is often confused with complete information, which is a similar concept. Complete information requires that every player knows the strategies and payoffs of the other players but not necessarily the actions.

5. Infinitely long games

Games, as studied by economists and real-world game players, are generally finished in a finite number of moves. Pure mathematicians are not so constrained, and set theorists in particular study games that last for infinitely many moves, with the winner (or other payoff) not known until after all those moves are completed.

The focus of attention is usually not so much on what is the best way to play such a game, but simply on whether one or the other player has a winning strategy. (It can be proven, using the axiom of choice, that there are games—even with perfect information, and where the only outcomes are "win" or "lose"—for which neither player has a winning strategy.) The existence of such strategies, for cleverly designed games, has important consequences in descriptive set theory.

Games as search problem

• Initial state: the current board/position

• Operators: legal moves and their resulting states

• A terminal test: decides if the game has ended

• A utility function (payoff function): produces a numerical value for (only) the terminal states. Example: in chess, outcome = win/loss/draw, with values +1, -1, 0 respectively
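The four components listed above can be written down directly for a small game. The sketch below uses Tic-Tac-Toe as a stand-in, with X as Max; the board encoding and all function names are assumptions chosen for this example.

```python
# A board is a tuple of 9 cells, each "X", "O" or None; X (Max) moves first.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

# Initial state: the empty board.
INITIAL_STATE = (None,) * 9


def to_move(state):
    """X moves whenever both players have placed the same number of marks."""
    return "X" if state.count("X") == state.count("O") else "O"


def successors(state):
    """Operators: yield (move, resulting state) for every legal move."""
    player = to_move(state)
    for i, cell in enumerate(state):
        if cell is None:
            yield i, state[:i] + (player,) + state[i + 1:]


def winner(state):
    """Return "X" or "O" if that player has completed a line, else None."""
    for a, b, c in LINES:
        if state[a] is not None and state[a] == state[b] == state[c]:
            return state[a]
    return None


def terminal_test(state):
    """The game has ended if someone has won or the board is full."""
    return winner(state) is not None or None not in state


def utility(state):
    """+1 if X (Max) wins, -1 if O wins, 0 for a draw; meaningful only for terminal states."""
    return {"X": 1, "O": -1, None: 0}[winner(state)]
```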

Game Trees

The sequence of states formed by possible moves is called a game tree; each level of the tree is called a ply. In 2-player games we call the two players Max (us) and Min (the opponent). WIN refers to winning for Max. At each ply, the "turn" switches to the other player.

Each level of search nodes in the tree corresponds to all the possible board configurations for a particular player – Max or Min. Utility values found at the end can be returned back to their parent nodes. So, winning for Min is losing for Max. Max wants to end in a board with +1 and Min in a board with a value of -1.

Max chooses the board with the max utility value, Min the minimum. Max is the first player and Min is the second. Every player needs a strategy. For example, the strategy for Max is to reach a winning terminal state regardless of what Min does. Even for simple games, the search tree is huge.

Minimax Algorithm

The Minimax Game Tree is used for programming computers to play games in which there are two players taking turns to play moves. Physically, it is just a tree of all possible moves.

With a full minimax tree, the computer could look ahead for each move to determine the best possible move. Of course, as you can see in the example diagram shown below, the tree can get very big after only a few moves. Thus, for large games like chess and Go, computer programs are forced to estimate who is winning or losing by examining just the top portion of the entire tree. In addition, programmers have come up with all sorts of algorithms and tricks, such as alpha-beta pruning, to reduce the work.

The minimax game tree, of course, cannot be used very well for games in which the computer cannot see the possible moves. So, minimax game trees are best used for games in which both players can see the entire game situation. These kinds of games, such as checkers, othello, chess, and go, are called games of perfect information.

For instance, take a look at the following (partial) search tree for Tic-Tac-Toe. Notice that unlike other trees such as binary trees, 2-3 trees, and heap trees, a node in the game tree can have any number of children, depending on the game situation. Let us assign points to the outcome of a game of Tic-Tac-Toe. If X wins, the game situation is given the point value of 1. If O wins, the game has a point value of -1. Now, X will be trying to maximize the point value, while O will be trying to minimize the point value. So, one of the first researchers on the minimax tree decided to name player X as Max and player O as Min. Thus, the entire data structure came to be called the minimax game tree.

The games we will consider here are:

• two-person: there are two players.

• perfect information: both players have complete information about the state of the game. (Chess has this property, but poker does not.)

• zero-sum: if we count a win as +1, a tie as 0, and a loss as -1, the sum of scores for both players is always zero.

Examples of such games are chess, checkers, and tic-tac-toe.

This minimax logic can also be extended to games like chess. In these more complicated games, however, the programs can only look at a part of the minimax tree; often, the programs cannot even see the end of the game because it is so far down the tree. So, the computer only looks at a certain number of nodes and then stops. Then the computer tries to estimate who is winning and losing in each node, and these estimates result in a numerical point value for that game position. If the computer is playing as Max, the computer will try to maximize the point value of the position, with a win (checkmate) being equal to the largest possible value (positive 1 million, let's say). If the computer is playing as Min, it will try to minimize the point value, with a win being equal to the smallest possible value (negative 1 million, for instance).

Fig : A (partial)search tree for the game of Tic-Tac-Toe is shown above. The top node is the initial state and max moves first placing an X in an empty square. We show part of the search tree, giving alternating moves by min(O) and max until we eventually reach terminal states, which can be assigned utilities according to the rules of the game.


Minimax Evaluation

The Min-Max algorithm is applied in two-player games such as tic-tac-toe, checkers, chess, and Go. These games have at least one thing in common: they are logic games, meaning that they can be described by a set of rules and premises. From any given point in the game, the rules determine the next available moves. They also share another characteristic: they are ‘full information games’, in which each player knows everything about the possible moves of the adversary.

• A search tree is generated, depth-first, starting with the current game position and continuing to the end-game positions.

• Compute the values (through the utility function) for all the terminal states.

• Afterwards, compute the utility of the nodes one level higher up in the search tree (up from the terminal states). The nodes that belong to the MAX player receive the maximum value of their children. The nodes for the MIN player receive the minimum value of their children.

• Continue backing up the values from the leaf nodes towards the root.

• When the root is reached, Max chooses the move that leads to the highest value (optimal move).

Given a game tree, the optimal strategy can be determined by examining the minimax value of each node, which we can write as MINIMAX-VALUE (n). The utility value is the value of a terminal node in the game tree, and the minimax value of a terminal state is just its utility. The minimax value of a node indicates the best value that the player moving there can obtain against an opponent who plays optimally: it is either the maximum or the minimum of the minimax values of the node's successors, depending on which player moves at that node. The minimax algorithm performs a depth-first search. Its space requirement is linear in b and m, where b is the number of legal moves at each point and m is the maximum depth of the tree. For real games, the time cost is impractical.


The minimax search will be as below:

Algorithm:

The MINIMAX value of a node can be calculated as


The algorithm for calculating minimax decisions returns the action corresponding to the best possible move, that is, the move that leads to the outcome with the best utility, under the assumption that the opponent plays to minimize utility. The functions MAXVALUE and MINVALUE go through the whole game tree, all the way to the leaves, to determine the backed-up value of a state.
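The backing-up of values can be sketched in a few lines of Python on an explicit tree. This is a minimal sketch, not the textbook pseudocode: a leaf is a number (its utility) and an internal node is a list of children. The leaf values follow the two-ply example used later in these notes; 4 and 6 are assumed stand-ins for the two leaves that are left unspecified there.

```python
def minimax_value(node, maximizing):
    """Backed-up value of a node: a leaf is a number, an internal node a list of children."""
    if not isinstance(node, list):
        return node  # terminal state: the minimax value is just the utility
    values = [minimax_value(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def minimax_decision(root):
    """Index of the move at a MAX root that leads to the highest backed-up value."""
    return max(range(len(root)), key=lambda i: minimax_value(root[i], False))


# Two-ply tree: MAX at the root, three MIN nodes below it, nine leaves.
# The leaves 4 and 6 are assumed values, used only to make the tree complete.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

print(minimax_value(TREE, True))   # 3
print(minimax_decision(TREE))      # 0, i.e. the first move
```

The three MIN nodes back up 3, 2 and 2, so the root value is 3 and Max's best move is the first one.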

Completeness: Yes (if the tree is finite)

Time complexity: O(b^d), where b is the number of legal moves at each point and d is the depth of the tree

Space complexity: O(bd) (depth-first exploration)

Optimality: Yes, provided perfect info (evaluation function) and an optimal opponent

Alpha-Beta Search

The problem with minimax search is that the number of game states it has to examine is exponential in the number of moves. In chess, plain minimax can look ahead only about four or five ply, whereas average human chess players can make plans six or eight ply ahead. It is possible, however, to compute the correct minimax decision without looking at every node in the game tree, using alpha-beta pruning. Applied to the same tree, alpha-beta pruning returns the same move as minimax would, but prunes away branches that cannot possibly influence the final decision.

• alpha is the value of the best choice (highest value) we have found so far along the path for MAX.

• beta is the value of the best choice (lowest value) we have found so far along the path for MIN.

Alpha-beta pruning updates the values of alpha and beta as it moves along and prunes a subtree as soon as it finds that the subtree is worse than the current value of alpha or beta. If two nodes in the hierarchy have incompatible inequalities (no possible overlap), then we know that the node below will not be chosen, and we can stop the search.

Implementing Alpha-Beta Search

Alpha-beta search is easily implemented by adding α and β parameters to a depth-first minimax search. The search simply quits early and returns the current value when the α or β threshold is exceeded. Alpha-beta search performs best when the best moves for the player to move are considered first (highest-valued moves at Max nodes, lowest-valued moves at Min nodes); in this case, the search complexity is reduced to O(b^(d/2)). For games in which positions repeat frequently (e.g. chess), a transposition table (closed list) containing values for previously evaluated positions can greatly improve efficiency.

Alpha-Beta Search Example

Bounded depth-first search is usually used, with the alpha-beta algorithm, for game trees. However:

1. The depth bound may stop search just as things get interesting (e.g. in the middle of a piece exchange in chess). For this reason, the depth bound is usually extended to the end of an exchange.

2. The search may tend to postpone bad news until after the depth bound: the horizon effect.

Frequently, large parts of the search space are irrelevant to the final decision and can be pruned: there is no need to explore options that are already definitely worse than the current best option. Consider again the 2-ply game tree from Fig 5.2. If we go through the calculation of the optimal decision once more, we can identify the minimax decision without ever evaluating 2 of the leaf nodes.

Let the 2 unevaluated nodes be x and y and let z be the minimum of x and y. The value of root node is given by

MINIMAX-VALUE (root) = max(min(3,12,8), min(2,x,y),min(14,5,2))

= max(3,min(2,x,y),2)

= max(3,z,2) where z≤2

= 3.

In other words the value of root and hence the minimax decision are independent of the values of the pruned leaves x and y. Alpha beta pruning can be applied to trees of any depth, and it is often possible to prune entire sub trees rather than just leaves. The general principle is this: consider a node n somewhere in the tree such that a player has a choice of moving to that node. If the player has a better choice m either at the parent node of n or at any choice point further up, then n will never be reached in actual play.

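Fig : The general case for alpha-beta pruning. Player and Opponent alternate down the path; m is a choice available to Player higher up and n is a node further down. If m is better than n for Player, we will never reach n in play.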

Algorithm:
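A minimal Python sketch of alpha-beta search on an explicit tree is given here as an illustration; it is not a reproduction of the original pseudocode. A leaf is a number and an internal node is a list of children. The `visited` list is a hypothetical addition that records which leaves are actually evaluated, and the leaves 4 and 6 are assumed values for the two unevaluated nodes x and y of the worked example.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value of node, skipping branches that cannot affect the decision."""
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record every leaf that is really evaluated
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            if value >= beta:      # MIN above would never allow this line
                break
            alpha = max(alpha, value)
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        if value <= alpha:         # MAX above already has something at least as good
            break
        beta = min(beta, value)
    return value


# Same two-ply tree as in the worked example; 4 and 6 stand in for x and y.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
root_value = alphabeta(TREE, True, visited=visited)
print(root_value)  # 3
print(visited)     # [3, 12, 8, 2, 14, 5, 2]
```

After the first MIN node returns 3, alpha is 3 at the root. In the second MIN node the first leaf is 2, which is already no better than alpha, so the remaining two leaves are never examined, exactly as in the hand calculation.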


The minimax search is depth first, so at any one time we just have to consider the nodes along a single path in the tree. The effectiveness of alpha-beta pruning is highly dependent on the order in which the successors are examined. In games, repeated states occur frequently because of transpositions: different permutations of the move sequence that end up in the same position. It is worthwhile to store the evaluation of such a position in a hash table the first time it is encountered, so that we do not have to recompute it on subsequent occurrences.

The hash table of previously seen positions is traditionally called a transposition table. It is essentially identical to the CLOSED list in graph search. If we are evaluating a million nodes per second, it is not practical to keep all of them in the transposition table. Various strategies can be used to choose the most valuable ones.
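The idea of a transposition table can be sketched with a Python dictionary. The game used here is a hypothetical toy chosen only because transpositions are easy to see: players alternately take 1 or 2 stones from a pile, and whoever takes the last stone wins. Taking 1 then 2 reaches the same position as taking 2 then 1. The function name `solve` and the `stats` counters are made up for the example, and the table is unbounded, unlike a practical one.

```python
def solve(pile, max_to_move, table, stats):
    """Minimax value (+1 if Max wins, -1 if Min wins) with a transposition table."""
    key = (pile, max_to_move)
    if key in table:
        stats["hits"] += 1          # position already evaluated via another move order
        return table[key]
    stats["evaluated"] += 1
    if pile == 0:
        # The previous player took the last stone, so the player to move has lost.
        value = -1 if max_to_move else 1
    else:
        children = [solve(pile - take, not max_to_move, table, stats)
                    for take in (1, 2) if take <= pile]
        value = max(children) if max_to_move else min(children)
    table[key] = value               # store the value the first time it is computed
    return value


table, stats = {}, {"hits": 0, "evaluated": 0}
result = solve(10, True, table, stats)
print(result)  # 1: Max wins from a pile of 10 by taking one stone
print(stats["evaluated"], len(table))
```

Each distinct position is evaluated only once; every later arrival at the same position by a different move order is answered from the table.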

Limitations of alpha-beta Pruning

• Still dependent on search order; best-first search techniques can be applied to order the moves

• Still has to search to terminal states for parts of the tree

• Searching to the required depth may not be practical due to time constraints

Completeness: Yes, provided the tree is finite

Time complexity: O(b^d) in the worst case. In the best case, alpha-beta (i.e., minimax with alpha-beta pruning) runs in time O(b^(d/2)), thus allowing us to evaluate a game tree twice as deep as we could with plain minimax in the same time.

Space complexity: O(bd) (depth-first exploration)

Optimality: Yes, provided perfect info (evaluation function) and an optimal opponent

Here d is the depth of the tree and b is the number of legal moves at each point.


Optimal decisions in multiplayer games

Many popular games allow more than 2 players. Let us examine how to extend the minimax idea to multiplayer games. First we need to replace the single value for each node with a vector of values. For example, in a 3-player game with players A, B and C, a vector <vA,vB,vC> is associated with each node. For a terminal state, this vector gives the utility of the state from each player's viewpoint. In 2-player zero-sum games, the 2-element vector can be reduced to a single value because the values are always opposite. The simplest way to implement this is to have the utility function return a vector of utilities.

Now we have to consider non-terminal states. Consider the node marked X in the game tree shown below. In that state, player C chooses what to do. The two choices lead to terminal states with utility vectors <vA=1,vB=2,vC=6> and <vA=4,vB=2,vC=3>. Since 6 is bigger than 3, C should choose the first move. This means that if state X is reached, subsequent play will lead to a terminal state with utilities <vA=1,vB=2,vC=6>. Hence the backed-up value of X is this vector. In general, the backed-up value of a node n is the utility vector of whichever successor has the highest value for the player choosing at n.
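The backing-up rule for utility vectors can be sketched as follows. The node encoding (a tuple for a terminal state, a dictionary with hypothetical keys "player" and "children" for a choice point) is an assumption made for this illustration; only the two utility vectors at node X come from the text.

```python
A, B, C = 0, 1, 2  # index of each player's entry in a utility vector


def backed_up_value(node):
    """Return the utility vector backed up to node.

    A terminal node is a tuple (vA, vB, vC); an internal node is a dict holding
    the index of the player who chooses there and the list of child nodes.
    """
    if isinstance(node, tuple):
        return node
    vectors = [backed_up_value(child) for child in node["children"]]
    # The player choosing at this node picks the successor that is best for them.
    return max(vectors, key=lambda v: v[node["player"]])


# Node X from the text: player C chooses between two terminal states.
X = {"player": C, "children": [(1, 2, 6), (4, 2, 3)]}
print(backed_up_value(X))  # (1, 2, 6)
```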

Anyone who plays multiplayer games quickly becomes aware that there is a lot more going on than in 2-player games. Multiplayer games usually involve alliances, whether formal or informal, among the players. Alliances are made and broken as the game proceeds.

Imperfect Decisions

The minimax algorithm generates the entire game search space, whereas the alpha-beta algorithm allows us to prune large parts of it. However, alpha-beta search still has to search all the way to terminal states for at least a portion of the search space. This is impractical, because it assumes that the program has time to search all the way to the terminal states. Shannon proposed the use of a heuristic evaluation function in place of the utility function, which enables us to stop the search earlier.

Although alpha-beta pruning helps to reduce the size of the search space, one still has to reach a terminal node before a utility value can be obtained. The suggestion, therefore, is to alter minimax or alpha-beta in 2 ways: the utility function is replaced by a heuristic evaluation function EVAL, which gives an estimate of the expected utility of the game from a given position, and the terminal test is replaced by a cutoff test. For example, a simple material evaluation in chess counts each pawn as 1, a knight or bishop as 3, and the queen as 9.

Evaluation function

An evaluation function, also known as a heuristic evaluation function or static evaluation function, is a function used by game-playing programs to estimate the value or goodness of a position in the minimax and related algorithms. The evaluation function is typically designed to be fast, trading exactness for speed (hence heuristic); it looks only at the current position and does not explore possible moves (hence static).

One popular strategy for constructing evaluation functions is as a weighted sum of various factors that are thought to influence the value of a position. For instance, an evaluation function for chess might take the form

c1 * material + c2 * mobility + c3 * king safety + c4 * center control + ...

Chess beginners, as well as the simplest of chess programs, evaluate the position taking only "material" into account, i.e they assign a numerical score for each piece (with pieces of opposite color having scores of opposite sign) and sum up the score over all the pieces on the board. On the whole, computer evaluation functions of even advanced programs tend to be more materialistic than human evaluations. This, together with their reliance on tactics at the expense of strategy, characterizes the style of play of computers.


Most evaluation functions work by calculating various features of a state for e.g. the number of pawns possessed by each side in the game of chess. The features taken together define various categories or equivalence classes of states: the states in each category have the same values for all the features. Any given category will contain some states that lead to wins, some that lead to draws and some that lead to losses. The evaluation function cannot know which states are which but it can return a single value that reflects the proportion of states with each outcome.

The evaluation function should not take long to compute and should agree with the utility function on terminal states. Probabilities can help. For example, a position A with a 40% chance of winning, 35% of losing, and 25% of being a draw has the evaluation:

$$(+1)(0.40) + (-1)(0.35) + (0)(0.25) = 0.05$$

In practice this kind of analysis requires too many categories and hence too much experience to estimate all the probabilities of winning. Instead, most evaluation functions compute separate numerical contributions from each feature and then combine them to find the total value. Mathematically, this kind of evaluation function is called a weighted linear function:

$$\mathrm{EVAL}(s) = w_1 f_1(s) + w_2 f_2(s) + \dots + w_n f_n(s) = \sum_{i=1}^{n} w_i f_i(s)$$

where the w_i are the weights (e.g., 1 for a pawn, 5 for a rook, 9 for a queen) and the f_i are the features of a particular position. For chess, the f_i could be the number of each kind of piece on the board and the w_i could be the values of the pieces.

Adding up the values of features seems like a reasonable thing to do but in fact it involves a very strong assumption that the contribution of each feature is independent of the other features. A static evaluation function evaluates a position without doing any search.
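A weighted linear evaluation is short to write down. The sketch below uses the piece values quoted in these notes as weights and takes each feature to be the difference in piece counts between Max and Min; that choice of features, the function names, and the sample position are assumptions made for illustration.

```python
# Weights w_i: the piece values mentioned in the notes.
PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}


def material_features(max_counts, min_counts):
    """Feature f_i(s): (number of Max's pieces of kind i) - (number of Min's)."""
    return {piece: max_counts.get(piece, 0) - min_counts.get(piece, 0)
            for piece in PIECE_VALUES}


def eval_linear(weights, features):
    """EVAL(s) = sum over i of w_i * f_i(s)."""
    return sum(weights[name] * features[name] for name in weights)


# Hypothetical position: Max is two pawns down but has an extra knight.
max_side = {"pawn": 6, "knight": 2, "bishop": 2, "rook": 2, "queen": 1}
min_side = {"pawn": 8, "knight": 1, "bishop": 2, "rook": 2, "queen": 1}
score = eval_linear(PIECE_VALUES, material_features(max_side, min_side))
print(score)  # 1: the knight (3) outweighs the two pawns (2)
```

Because the contributions are simply added, the function cannot express interactions between features, which is the independence assumption discussed above.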

Example 1: Tic-tac-toe

e(p) = nrows(Max) - nrows(Min) where nrows(i) is the number of complete rows, columns, or diagonals that are still open for player i.
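The Tic-Tac-Toe evaluation above can be computed directly. In this sketch a line counts as open for a player if the opponent has no mark on it; the board encoding (a tuple of 9 cells holding "X", "O" or None, with X as Max) and the function names are assumptions for illustration.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def open_lines(board, player):
    """Number of rows, columns and diagonals with no opponent mark on them."""
    opponent = "O" if player == "X" else "X"
    return sum(1 for line in LINES if all(board[i] != opponent for i in line))


def e(board):
    """e(p) = nrows(Max) - nrows(Min), with X playing as Max."""
    return open_lines(board, "X") - open_lines(board, "O")


empty = (None,) * 9
x_center = empty[:4] + ("X",) + empty[5:]
print(e(empty))     # 0: all 8 lines are open for both players
print(e(x_center))  # 4: 8 lines open for X, only 4 for O
```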

Example 2: Chess


e(p) = (sum of values of Max's pieces) - (sum of values of Min's pieces) + k * (degree of control of the center)

Cutting off search

The next step is to modify alpha-beta search so that it calls the heuristic EVAL function when it is appropriate to cut off the search. In terms of implementation, we replace the 2 lines that mention TERMINAL-TEST with the following line.

If CUTOFF-TEST(state,depth) then return EVAL(state)

E.g., depth-limited minimax search.

Fig : We would like to do minimax on the full game tree, but we don't have time, so we explore it only to some manageable depth (the cutoff).

We must also arrange some bookkeeping so that the current depth is incremented on each recursive call. The most straightforward approach to controlling the amount of search is to set a fixed depth limit, so that CUTOFF-TEST returns true for all depths greater than some fixed depth d. Another approach is to apply iterative deepening. Linear evaluation functions are used often, but nonlinear ones are not uncommon, and the weights typically have to be tuned by trial and error. A heuristic function evaluates a board without knowing exactly where it will lead; it is used to estimate the probability of winning from that node.
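The depth bookkeeping and the cutoff test can be sketched as below. Everything about the toy game is hypothetical: states are integers, each state n has the two successors 2n and 2n+1, and the stand-in evaluation function is n % 5. This sketch cuts off as soon as the depth limit is reached; in a real game the cutoff test must also return true for terminal states.

```python
def depth_limited_minimax(state, depth, maximizing, successors, cutoff_test, eval_fn):
    """Minimax that returns EVAL(state) as soon as CUTOFF-TEST(state, depth) holds."""
    if cutoff_test(state, depth):
        return eval_fn(state)
    values = [
        # The current depth is incremented on each recursive call.
        depth_limited_minimax(child, depth + 1, not maximizing,
                              successors, cutoff_test, eval_fn)
        for child in successors(state)
    ]
    return max(values) if maximizing else min(values)


def toy_successors(n):
    """Hypothetical game: every state has exactly two successors."""
    return [2 * n, 2 * n + 1]


def toy_eval(n):
    """Hypothetical stand-in for a heuristic evaluation function."""
    return n % 5


def fixed_depth_cutoff(limit):
    """Build a cutoff test that stops the search once the depth limit is reached."""
    return lambda state, depth: depth >= limit


value = depth_limited_minimax(1, 0, True, toy_successors, fixed_depth_cutoff(2), toy_eval)
print(value)  # 1
```

With a limit of 2, the leaves 4, 5, 6 and 7 evaluate to 4, 0, 1 and 2; the two MIN nodes back up 0 and 1, and the root takes the maximum, 1. Iterative deepening would simply call the same function with limits 1, 2, 3, ... until time runs out.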

When searching a large game tree (for instance using minimax or alpha-beta pruning), it is often not feasible to search the entire tree, so the tree is only searched down to a certain depth. This results in the horizon effect, where a significant change exists just over the "horizon" (slightly beyond the depth to which the tree has been searched). Evaluating the partial tree thus gives a misleading result. The horizon effect is difficult to eliminate. It arises when the program is facing a move by the opponent that causes serious damage and is ultimately unavoidable.

An example of the horizon effect occurs when some negative event is inevitable but postponable, but because only a partial game tree has been analyzed, it will appear to the system that the event can be avoided when in fact this is not the case. For example, in chess, if the white player is one move away from queening a pawn, the black AI can play moves to stall white and push the queening "over the horizon", and mistakenly believe that it avoided it entirely.

The horizon effect can be mitigated by using "singular extensions" in addition to the search algorithm. A singular extension is applied to a move that is clearly better than all other moves in a given position. This gives the search algorithm the ability to look beyond its horizon, but only at moves of major importance (such as the queening in the previous example). This does not make the search much more expensive, as only a few specific moves are considered. In chess, the computer player may be looking ahead 20 moves. If there are subtle flaws in its position that only matter after 40 moves, then the computer player can be beaten.

Fig : Horizon effect. With Black to move, a fixed-depth search thinks it can avoid the queening move.

The evaluation function should be applied only to positions which are quiescent (no dramatic changes in the near future). Quiescence search expands nonquiescent positions until quiescent positions are reached. Without it, a program may, for example, try to postpone the queening move of a pawn “over the horizon” (where it cannot be detected). Essentially, a quiescence search is an evaluation function that takes into account some dynamic possibilities. Sometimes it is restricted to consider only certain types of moves, such as capture moves, that will quickly resolve the uncertainties in the position.

So far we have talked about cutting off search at a certain level and about doing alpha beta pruning that provably has no effect on the result. It is also possible to do forward pruning meaning that some moves at a given node are pruned immediately without further consideration. Clearly most humans playing chess only consider a few moves from each position. Unfortunately this approach is rather dangerous because there is no guarantee that the best move will not be pruned away. This can be disastrous if applied near the root, because every so often the program will miss some obvious moves. Combining all the techniques described here results in a program that can play creditable chess.

State-of-the-Art Game Programs

Introduction

We give some details of recent programs for playing various types of two-person games. For some time there have been good programs for playing Backgammon, Othello and Draughts, and there now exist programs of world-champion status in all three. TD-Gammon has revolutionized the classic game of backgammon. It taught itself to play backgammon, starting from scratch, using research in artificial neural networks, and learned to play well enough to rank among the best players in the world. Connect-4 and Go-Moku can also be played perfectly by programs. We begin with Chess, where it has taken quite a while to produce a world-class program.

1) Chess

Deep Blue, the chess program from IBM, beat the world champion Kasparov 3.5-2.5 in May 1997. It has special-purpose hardware and can evaluate 200 million positions per second. The special-purpose hardware clearly helped, but carefully tuned software is also essential.

If we take a chess game of typical length, there are about 10^125 possible nodes in the full game tree. Searching it would take more than 10^108 years even if 10^9 moves could be investigated per second.
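The estimate can be checked with a few lines of Python. The sketch reads the figures as 10^125 nodes and 10^9 moves per second, and assumes a 365-day year; the variable names are chosen only for this example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60      # assumes a 365-day year

nodes = 10**125                            # size of the full game tree
moves_per_second = 10**9                   # search speed

seconds = nodes // moves_per_second        # 10**116 seconds
years = seconds // SECONDS_PER_YEAR        # roughly 3 x 10**108 years

# Order of magnitude = number of digits minus one.
magnitude = len(str(years)) - 1
print(f"about 10^{magnitude} years")
```

The result is a little over 3 x 10^108 years, consistent with the "more than 10^108 years" quoted above.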

Deep Blue's search relies mainly on what are called singular extensions. It performs, say, a 10-ply search and then identifies the interesting lines of play, which it extends to a deeper level (say 30-40 ply).

It also has other types of extensions. A threat extension is used if a threatening position is discovered. An influence extension is used if, for example, there is a free pawn that can move to become a queen.


The software performs the initial search, and the special-purpose hardware handles the last 5 ply. The heuristic evaluation function makes use of a great deal of information, such as:

• how good the position is, e.g. the difference in the number of pieces;

• the position of important pieces (e.g. a bishop on black squares is of little use if all the opponent's pieces are on white squares).

Deep Blue's evaluation function was tuned by comparing its play with grandmaster games. Some manual tuning was also done by playing against a grandmaster until a poor move was identified. Much of the tree is evaluated in parallel using the 512 special-purpose chips with minimal communication. IBM is interested in applying what has been learnt from Deep Blue to real-world problems. These include such areas as:

• financial risk (using Monte Carlo simulations);

• data mining;

• molecular dynamics simulation, of interest in pharmaceutical modelling (the hope is to deal with 100,000 atoms);

• applying the visualisation tools that were developed to other domains.

More recently, a program called X3D Fritz has come on the Chess scene and is a competitor to Deep Blue. X3D Fritz played four games against Kasparov in November 2003. The result was 1 win for Kasparov, 1 win for X3D Fritz and two draws. The games were played on a computer with a 3D chess board viewed through special glasses. Kasparov received $175K for the games.

2) Checkers

The game of Checkers, or Draughts, is played on a board of sixty-four squares of alternating colors, usually referred to as black and white although not necessarily of those colors. Twenty-four pieces (flat round counters), called men (twelve on each side), also of opposite colors, are placed on the board. The game is played by two persons; the one having the twelve black or red pieces is technically said to be playing the first side, and the other, having the twelve white men, the second side.

Samuel's learning program eventually beat its developer. Chinook is world champion; the previous champion, who held the title for 40 years, had to withdraw for health reasons. Chinook has all end games with 8 or fewer pieces precomputed. This is 440×10^9 positions, compressed to 6 gigabytes. It uses a 25-ply look-ahead tree, so the search soon reaches these end positions. Chinook drew with the world champion Tinsley in 1994. He was considered the best champion there has ever been in draughts.

The techniques developed are being used in BioTools, a program for DNA synthesis. David Fogel has used evolutionary programming to automatically generate a draughts-playing program called Blondie24. It was played against human players on the web and eventually obtained a rating of 2,000. The evolutionary program generated a neural network, which served as the heuristic evaluation function for minimax.


The object of the game is to capture all of the opponent's men, or to block them so they cannot be moved; the player whose side is brought to this state loses the game.

3) Othello

Reversi and Othello are names for a strategic board game which involves play by two parties on an eight-by-eight square grid with pieces that have two distinct sides. Pieces typically appear coin-like, but with a light and a dark face, each side representing one player. The object of the game is to make your pieces constitute a majority of the pieces on the board at the end of the game, by turning over as many of your opponent's pieces as you can.

Othello, also called Reversi, is probably more popular as a computer game than as a board game. It has a smaller search space than chess, usually 5 to 15 legal moves, but evaluation expertise had to be developed from scratch. In 1997 the Logistello program defeated the human world champion Takeshi Murakami by 6 games to none. It is generally acknowledged that humans are no match for computers at Othello.

4) Backgammon

Most work on backgammon has gone into improving the evaluation function. Gerry Tesauro combined Samuel's reinforcement learning method with neural network techniques to develop a remarkably accurate evaluator that is used with a search to depth 2 or 3. After playing more than a million training games against itself, Tesauro's program, TD-Gammon, is reliably ranked among the top 3 players in the world.

5) Go

Go is a strategic board game for two players. It is also known as Weiqi in Chinese and as Igo or Go in Japanese. Go originated in ancient China, centuries before it was first mentioned in writing in 548 BC. It is now popular throughout the world, especially in East Asia.

Go is played by two players alternately placing black and white stones on the vacant intersections of a 19×19 line grid. A stone or a group of stones is captured and removed if it is tightly surrounded by stones of the opposing color. The objective is to control a larger territory than the opponent's by placing one's stones so they cannot be captured. The game ends and the score is counted when both players consecutively pass on a turn, indicating that neither side can increase its territory or reduce its opponent's.

In game theory terms, Go is a zero-sum, perfect information, deterministic strategy game, putting it in the same class as chess, checkers (draughts), and Reversi (Othello), although it is not similar in its play to these. Although the game rules are very simple, the practical strategy is extremely complex.


The game emphasizes the importance of balance on multiple levels, and has internal tensions. To secure an area of the board, it is good to play moves close together; but to cover the largest area one needs to spread out, perhaps leaving weaknesses that can be exploited. Playing too low (close to the edge) secures insufficient territory and influence; yet playing too high (far from the edge) allows the opponent to invade. Many people find Go attractive for its reflection of the conflicting demands of real life.

6) Bridge

Contract bridge, usually known simply as bridge, is a trick-taking card game of skill and chance (the relative proportions depend on the variant played). It is played by four players who form two partnerships (sides); the partners sit opposite each other at a table. The game consists of the auction (often called bidding) and play, after which the hand is scored.

The bidding ends with a contract, which is a declaration by one partnership that their side shall take at least a stated number of tricks, with a specified suit as trump or without trumps. The rules of play are similar to those of other trick-taking games, with the addition that one player's hand is displayed face up on the table as the "dummy".

Much of bridge's popularity is owed to the fact that it can be played in tournaments with a theoretically unlimited number of players; this form is referred to as duplicate bridge.

Knowledge Representation

It is a medium for human expression. An intelligent system must have Knowledge Representations that can be interpreted by humans. We need to be able to encode information in the knowledge base without significant effort. We need to be able to understand what the system knows and how it draws its conclusions.

It is necessary to represent the computer's knowledge of the world by some kind of data structures in the machine's memory. Traditional computer programs deal with large amounts of data that are structured in simple and uniform ways. A.I. programs need to deal with complex relationships, reflecting the complexity of the real world.

Several kinds of knowledge need to be represented:

• Factual Data: Known facts about the world.

• General Principles: "Every dog is a mammal."

• Hypothetical Data: The computer must consider hypotheticals in order to reason about the effects of actions that are being contemplated.


Types

1) Logic (propositional, predicate)

2) Network representation

• Semantic nets

3) Structured representation

• Frames

Semantic Nets

Semantic Nets were invented by Quillian in 1969. Quillian was a Psychologist (at UMich) and was trying to define the structure of human knowledge. Semantic Nets are about relations between concepts. Semantic Nets use a graph structure so that concepts are nodes in the graph.

An important feature of human memory is the high number of connections or associations between the different pieces of information contained in it. A semantic network is one of the knowledge representation languages based on this intuition. The basic idea of a semantic network representation is very simple: There are two types of primitive, nodes and links or arcs. Links are unidirectional connections between nodes. Nodes correspond to objects, or classes of objects, in the world, whereas links correspond to relationships between these objects.

Semantic nets are usually used to represent static, taxonomic concept dictionaries. They were originally designed as a way to represent the meanings of English words. In a semantic net, information is represented as a set of nodes connected to each other by a set of labeled arcs, which represent relationships among the nodes. The major problem with semantic nets is that, although the name of this knowledge representation language is semantic nets, there is, ironically, no clear semantics for the various network representations. A fragment of a typical semantic net is shown in Figure 1 below.

Figure 1: A Semantic Network


It is useful to think about semantic nets using a graphical notation, as shown in the figure. Of course, they cannot be represented inside a program that way. Instead, they are usually represented using some kind of attribute-value memory structure.

Notice that each property stores a one-way link, such as the arc from MY-CHAIR to ME. To store bidirectional links, it is necessary to store each half separately. So if we wanted to be able to answer the question "What do I own?" without searching the entire net, we would need arcs from ME to all the nodes that connect to ME via OWNER arcs. Since it means something different to go from MY-CHAIR to ME than it does to go from ME to MY-CHAIR, we need a new kind of arc, which we can call OWNED. It is, of course, more efficient to store only one-way arcs. But then it is difficult to form inferences that go in the missing direction.

For each system, it is necessary to decide what kind of inferences will be needed and then to design the links appropriately; this is the same problem that designers of any database must solve when they decide what fields to index on.
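One way to picture such an attribute-value structure is the following Python sketch. It is only an illustration under stated assumptions: the class SemanticNet and its methods are hypothetical names, and the choice to maintain an inverse arc automatically when one has been declared is one possible design, trading extra storage for the ability to answer "What do I own?" without scanning the whole net.

```python
from collections import defaultdict

class SemanticNet:
    """Attribute-value store: node -> arc label -> set of target nodes."""

    def __init__(self):
        self.arcs = defaultdict(lambda: defaultdict(set))
        self.inverses = {}                 # arc label -> inverse arc label

    def declare_inverse(self, label, inverse_label):
        """Record that the two labels name opposite directions of one link."""
        self.inverses[label] = inverse_label
        self.inverses[inverse_label] = label

    def add(self, source, label, target):
        """Store a one-way arc, plus its inverse if one was declared."""
        self.arcs[source][label].add(target)
        inverse = self.inverses.get(label)
        if inverse is not None:
            self.arcs[target][inverse].add(source)

    def get(self, node, label):
        """Follow the arcs with the given label out of node."""
        return set(self.arcs.get(node, {}).get(label, ()))

net = SemanticNet()
net.declare_inverse("OWNER", "OWNED")
net.add("MY-CHAIR", "ISA", "CHAIR")        # stored one way only
net.add("MY-CHAIR", "OWNER", "ME")         # also stores ME -OWNED-> MY-CHAIR
```

Here net.get("ME", "OWNED") finds MY-CHAIR directly, whereas the ISA arc, for which no inverse was declared, can only be followed from MY-CHAIR to CHAIR.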

Important semantic relations:

• Meronymy (A is part of B, i.e. B has A as a part of itself)

• Holonymy (B is part of A, i.e. A has B as a part of itself)

• Hyponymy (or troponymy) (A is subordinate of B; A is kind of B)

• Hypernymy (A is superordinate of B)

• Synonymy (A denotes the same as B)

• Antonymy (A denotes the opposite of B)

Representing Non-binary Predicates

Semantic nets can be used to represent relationships that would appear as two-place predicates in predicate logic. For example, some of the arcs from Figure 1 could be represented in logic as

ISA(chair, furniture)

ISA(me, person)

COVERING(my-chair, leather)

COLOR(my-chair, tan)


We have already seen that many one-place predicates in logic can be thought of as two-place predicates using some very general-purpose predicates, such as ISA. But the knowledge expressed by other predicates can also be expressed in semantic nets. So, for example,

MAN(Marcus)

could be rewritten as

ISA(Marcus, man)

thereby making it easy to represent in a semantic net.

Three or more place predicates can also be converted to a binary form by creating one new object representing the entire predicate statement and then introducing binary predicates to describe the relationship to this new object of each of the original arguments. For example, suppose we know that

SCORE(red blue (17 3))

This can be represented in a semantic net by creating a node to represent the specific game and then relating each of the three pieces of information to it. Doing this produces the network shown in Figure 2.

Figure 2: A Semantic Net for an N-Place Predicate
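The conversion can be sketched in Python as below. The function reify and the arc names first-team, second-team and score are assumptions made for this illustration (the labels actually used in Figure 2 may differ); the node name G23 follows the naming used for game instances later in these notes.

```python
def reify(node, predicate, roles, args):
    """Turn an n-place predicate into binary (source, arc, target) triples
    that all hang off one new node standing for the whole statement."""
    if len(roles) != len(args):
        raise ValueError("need exactly one role name per argument")
    triples = [(node, "isa", predicate)]
    triples.extend((node, role, arg) for role, arg in zip(roles, args))
    return triples

# SCORE(red blue (17 3)) becomes four binary arcs leaving the node G23.
triples = reify("G23", "GAME",
                ["first-team", "second-team", "score"],
                ["red", "blue", (17, 3)])
for triple in triples:
    print(triple)
```

Each of the original arguments is now reachable by a two-place relation from G23, so the statement fits into an ordinary semantic net.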

This technique is particularly useful for representing the contents of a typical declarative sentence, which describes several aspects of a particular event. The sentence

John gave the book to Mary

could be represented by the network shown in Figure 3.


Figure 3: A Semantic Net Representing a Sentence

The node labeled BK23 represents the particular book that was referred to by the phrase the book. Note that we are again using a case grammar analysis of the sentence. Semantic nets have been used to represent a variety of kinds of knowledge in a variety of different programs.

Reasoning with the Knowledge

As with any other mechanism for representing knowledge, the power of semantic nets lies in the ability of programs to manipulate them to solve problems. One of the early ways that semantic nets were used was to find relationships among objects by spreading activation out from each of two nodes and seeing where the activation met. This process was called intersection search. Using this process, it is possible to use the network in the figure below to answer questions such as

Question: “What is the relation between the Chicago Cubs and the Brooklyn Dodgers?”

Answer: “They are both teams of baseball players.”
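Intersection search can be sketched in Python as a breadth-first spread from both nodes at once. This is a minimal illustration: the function names and the small network fragment (including the city arcs) are assumptions made for the example, and arcs are treated as traversable in both directions while activation spreads.

```python
def undirected(arcs):
    """Build an adjacency map that ignores arc direction and labels."""
    graph = {}
    for source, _label, target in arcs:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set()).add(source)
    return graph

def intersection_search(graph, a, b):
    """Spread activation one level at a time from a and from b; return the
    set of nodes at which the two waves first meet (empty if they never do)."""
    if a == b:
        return {a}
    seen = {a: {a}, b: {b}}
    frontier = {a: {a}, b: {b}}
    while frontier[a] or frontier[b]:
        for start, other in ((a, b), (b, a)):
            reached = set()
            for node in frontier[start]:
                reached |= graph.get(node, set()) - seen[start]
            seen[start] |= reached
            frontier[start] = reached
            meeting = reached & seen[other]
            if meeting:
                return meeting
    return set()

# Hypothetical fragment of a baseball network.
ARCS = [
    ("Chicago-Cubs", "isa", "Baseball-Team"),
    ("Brooklyn-Dodgers", "isa", "Baseball-Team"),
    ("Baseball-Team", "isa", "Team"),
    ("Chicago-Cubs", "city", "Chicago"),
    ("Brooklyn-Dodgers", "city", "Brooklyn"),
]
GRAPH = undirected(ARCS)
meeting = intersection_search(GRAPH, "Chicago-Cubs", "Brooklyn-Dodgers")
```

In this fragment the two waves meet at Baseball-Team, which corresponds to the answer that both are baseball teams.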

More recent applications of semantic nets have used more directed search procedures to answer specific questions. Although in principle there are no restrictions on the way information can be represented in the networks, these search procedures can only be effective if there is consistency in what each node and each link mean. Let's look at a couple of examples of this.

There is a difference between a concept (such as GAME or GIVE) and instances of that concept (such as G23 or EV7). Whereas my chair can get wet, the concept chair cannot. If information about specific instances is stored at the node representing the concept, it is not possible to differentiate among multiple instances of the same concept. For example, we could represent the fact that

My chair is tan.

using the semantic net as

But we would then have a hard time representing the additional fact that Mary's chair is green. To avoid this problem, links from the concept node should usually be used to describe properties of all (or most) instances of the concept, while links from an instance node describe properties of the individual instance. It is important to make explicit this difference between a node representing a class of objects and a node representing an instance of a class.

We have been using ISA links to represent hierarchical relationships between concept nodes, and sometimes also to relate nodes representing specific objects to their associated concepts. But in order to maintain the distinction between concept nodes and instance nodes, we need to introduce a new kind of link by which instance nodes can be connected to the concepts that describe them. We will call this new link INSTANCE-OF. An example of it occurs in the following fragment of a network:

Sometimes it is also important to distinguish between a node representing the canonical instance of a concept and a node representing the set of all instances of it. For example, the set of all Americans has a size in the hundreds of millions, while the typical American has a size of about five and a half feet.

There is a difference between a link that defines a new entity and one that relates two existing entities. For example, the fact that

John is 6 ft tall


can be represented using a semantic net as below

Both nodes represent objects that exist independently of their relationship to each other. But now suppose we want to represent the fact that John is taller than Bill, using the net

The nodes H1 and H2 represent the new concepts JOHN'S-HEIGHT and BILL'S-HEIGHT, respectively. They are defined by their relationships to the nodes JOHN and BILL. Using these defined concepts, it is possible to represent such facts as that John's height increased, which we could not do before. (The number 72 increased?) Sometimes it is useful to introduce the arc VALUE to make this distinction clear. Thus we might represent the fact that John is 6 feet tall and that he is taller than Bill by the net

The procedures that operate on nets such as this can exploit the fact that some arcs, such as HEIGHT, define new entities, while others, such as GREATER-THAN and VALUE, merely describe relationships among existing entities.
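The distinction drawn earlier between concept nodes and instance nodes can be made concrete with a small Python sketch. It is an illustration only: the dictionary layout, the function names, and the USE arc on CHAIR are assumptions made for the example. Properties of all chairs are stored on the concept node, while properties of an individual chair are stored on its own instance node.

```python
# Each node maps arc labels to values. ISA links concept to concept,
# INSTANCE-OF links an individual to the concept that describes it.
LINKS = {
    "FURNITURE": {},
    "CHAIR": {"ISA": "FURNITURE", "USE": "sitting"},     # USE is hypothetical
    "MY-CHAIR": {"INSTANCE-OF": "CHAIR", "COLOR": "tan"},
    "MARYS-CHAIR": {"INSTANCE-OF": "CHAIR", "COLOR": "green"},
}

def lookup(node, attribute):
    """Find an attribute on the node itself or, failing that, on the
    concepts reached by following INSTANCE-OF and then ISA links upward."""
    while node is not None:
        arcs = LINKS.get(node, {})
        if attribute in arcs:
            return arcs[attribute]
        node = arcs.get("INSTANCE-OF") or arcs.get("ISA")
    return None

def instances_of(concept):
    """List the individuals directly attached to a concept node."""
    return sorted(name for name, arcs in LINKS.items()
                  if arcs.get("INSTANCE-OF") == concept)
```

Because COLOR is stored on the instance nodes, both "My chair is tan" and "Mary's chair is green" can be represented at once, and the concept CHAIR itself has no color.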

Partitioned Semantic Nets

One of the major problems with the use of semantic nets to represent knowledge is how to handle quantification. One way of solving the problem is to partition the semantic net into a hierarchical set of spaces, each of which corresponds to the scope of one or more variables. To see how this works, consider first the simple net shown in Figure 4(a). This net corresponds to the statement

The dog bit the postman.


The nodes DOGS, BITE, and POSTMEN represent the classes of dogs, biting, and postmen, respectively, while the nodes D, B, and P represent a particular dog, a particular biting, and a particular postman. This fact can easily be represented by a single net with no partitioning.

But now suppose that we want to represent the fact that

Every dog has bitten a postman.

or, in logic: $\forall x:\ dog(x) \rightarrow (\exists y:\ postman(y) \land bite(x, y))$

To represent this fact, it is necessary to encode the scope of the universally quantified variable x. This can be done using partitioning as shown in Figure 4(b). The node g stands for the assertion given above. Node g is an instance of the special class GS of general statements about the world (i.e., those with universal quantifiers). Every element of GS has at least two attributes: a FORM, which states the relation that is being asserted, and one or more V connections, one for each of the universally quantified variables.

In this example, there is only one such variable d, which can stand for any element of the class DOGS. The other two variables in the form, b and p, are understood to be existentially quantified. In other words, for every dog d, there exists a biting event b, and a postman p, such that d is the assailant of b and p is the victim.

Figure 4: Using Partitioned Semantic Nets


To see how partitioning makes variable quantification explicit, consider next the similar sentence:

Every dog in town has bitten the constable.

The representation of this sentence is shown in Figure 4(c). In this net, the node c representing the victim lies outside the form of the general statement. Thus it is not viewed as an existentially quantified variable whose value may depend on the value of d. Instead it is interpreted as standing for a specific entity (in this case, a particular constable), just as other nodes in a standard, nonpartitioned net do.

Figure 4(d) shows how yet another similar sentence would be represented:

Every dog has bitten every postman.

In this case, g has two V links, one pointing to d, which represents any dog, and one pointing to p, representing any postman. The spaces of a partitioned semantic net are related to each other by an inclusion hierarchy. For example, in Figure 4(d), space S1 is included in space SA. Whenever a search process operates in a partitioned semantic net, it can explore nodes and arcs in the space from which it starts and in other spaces that contain the starting point, but it cannot go downward, except in special circumstances, such as when a FORM arc is being traversed.

So, returning to Figure 4(d), from node d it can be determined that d must be a dog. But if we were to start at the node DOGS and search for all known instances of dogs traversing ISA links, we would not find d, since it and the link to it are in the space S1, which is at a lower level than space SA, which contains DOGS. This is important, since d does not stand for a particular dog; it is merely a variable that can be instantiated with a value that represents a dog. Partitioned semantic nets have a variety of other uses besides their ability to encode quantification. Semantic nets have been widely used to represent a variety of kinds of knowledge.
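The visibility rule for partitioned nets can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the class Space and the function known_instances are hypothetical names, arcs are stored as plain triples in the space that owns them, and the special case of traversing a FORM arc downward is left out.

```python
class Space:
    """One partition of the net, possibly included in an enclosing space."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # the space that contains this one, if any
        self.arcs = []            # (source, label, target) triples stored here

    def visible_arcs(self):
        """Arcs in this space and in every space that contains it."""
        space = self
        while space is not None:
            yield from space.arcs
            space = space.parent

def known_instances(space, cls):
    """Search for ISA links to cls, starting from the given space."""
    return sorted(source for source, label, target in space.visible_arcs()
                  if label == "ISA" and target == cls)

SA = Space("SA")                        # outer space
S1 = Space("S1", parent=SA)             # form of the general statement g
SA.arcs.append(("g", "ISA", "GS"))
S1.arcs.append(("d", "ISA", "DOGS"))    # d is a variable, not a real dog
S1.arcs.append(("p", "ISA", "POSTMEN"))
```

A search started in SA finds no instances of DOGS, because the link from d lives in the inner space S1; a search started inside S1 sees both its own arcs and those of SA.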

Advantages of Semantic nets

• Easy to visualize

• Formal definitions of semantic networks have been developed.

• Related knowledge is easily clustered.

• Efficient in space requirements

– Objects represented only once

– Relationships handled by pointers

Semantic nets have two significant advantages over unstructured sets of logic clauses:

• Economy: each property only needs to be represented once: friendly to people for cat, and so on down to long tail for Maine Coon and orange for Charles Douglas. Imagine what your system would look like if, for every single individual Maine Coon cat, you had to write out all the information about Maine Coons, pedigree cats, cats, ... and so on up to animals (and beyond).

• Significant generalization: in a large system like that, it might not be easy to see that all the Maine Coons had long tails, and even if you did see it, you might think it was a coincidence. By making long tail a property of the class Maine Coon, you make clear both the fact, and that it is significant.

Disadvantages of Semantic nets

• Inheritance (particularly from multiple sources and when exceptions in inheritance are wanted) can cause problems.

• Facts placed inappropriately cause problems.

• No standards about node and arc values

Uses of Semantic Nets

• Coding static world knowledge

• Built-in fast inference method (inheritance)

Frames

Frames can be regarded as an extension of semantic nets. Indeed, it is not clear where the boundary between a semantic net and a frame lies. Semantic nets were initially used to represent labeled connections between objects. As tasks became more complex, the representation needed to be more structured, and the more structured the system becomes, the more beneficial it is to use frames. A frame is a collection of attributes, or slots, and associated values (and possibly constraints on values) that describe some real-world entity. Frames on their own are not particularly helpful, but frame systems are a powerful way of encoding information to support reasoning. Set theory provides a good basis for understanding frame systems. Each frame represents:

• a class (set), or

• an instance (an element of a class).

A single frame taken alone is rarely useful. Instead we build frame systems out of a collection of frames that are connected to each other by virtue of the fact that the value of an attribute of one frame may be another frame.

Frames as Sets and Instances

Set theory provides a good basis for understanding frame systems. In this view, each frame represents either a class (a set) or an instance (an element of a class). To see how this works, consider the frame system shown in Figure 9.5. In this example, the frames Person, Adult-Male, ML-Baseball-Player (corresponding to major league baseball players), Pitcher, and ML-Baseball-Team (for major league baseball team) are all classes. The frames Pee-Wee-Reese and Brooklyn-Dodgers are instances.

The isa relation that we have been using without a precise definition is in fact the subset relation. The set of adult males is a subset of the set of people. The set of major league baseball players is a subset of the set of adult males, and so forth. Our instance relation corresponds to the relation element-of. Pee Wee Reese is an element of the set of fielders. Thus he is also an element of all of the supersets of fielders, including major league baseball players and people.

Because a class represents a set, there are two kinds of attributes that can be associated with it. There are attributes about the set itself, and there are attributes that are to be inherited by each element of the set. We indicate the difference between these two by prefixing the latter with an asterisk (*). For example, consider the class ML-Baseball-Player. We have shown only two properties of it as a set: it is a subset of the set of adult males, and it has cardinality 624 (i.e., there are 624 major league baseball players). We have listed five properties that all major league baseball players have (height, bats, batting-average, team, and uniform-color), and we have specified default values for the first three of them. By providing both kinds of slots, we allow a class both to define a set of objects and to describe a prototypical object of the set.
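The two kinds of slots can be sketched in Python as below. This is a hedged illustration, not the frame system of Figure 9.5: the dictionary layout and the function get_value are assumptions, and the three default values are placeholders chosen for the example (the notes state only that defaults exist). Slots under "own" describe the set itself; slots under "inheritable" play the role of the starred slots.

```python
FRAMES = {
    "Adult-Male": {"isa": None, "own": {}, "inheritable": {}},
    "ML-Baseball-Player": {
        "isa": "Adult-Male",
        "own": {"cardinality": 624},             # a fact about the set
        "inheritable": {                         # the starred (*) slots
            "height": "6-1",                     # placeholder default
            "bats": "right",                     # placeholder default
            "batting-average": 0.252,            # placeholder default
            "team": None,                        # no default given
            "uniform-color": None,               # no default given
        },
    },
    "Fielder": {"isa": "ML-Baseball-Player", "own": {}, "inheritable": {}},
    "Pee-Wee-Reese": {"instance": "Fielder",
                      "slots": {"team": "Brooklyn-Dodgers"}},
}

def get_value(instance, slot):
    """Use the instance's own value if present, otherwise the nearest
    default found while climbing isa links from its class."""
    frame = FRAMES[instance]
    if slot in frame["slots"]:
        return frame["slots"][slot]
    cls = frame["instance"]
    while cls is not None:
        default = FRAMES[cls]["inheritable"].get(slot)
        if default is not None:
            return default
        cls = FRAMES[cls]["isa"]
    return None
```

With this layout an instance picks up a default batting average from ML-Baseball-Player, but it does not inherit cardinality, which is a property of the set and not of its elements.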

But what we have encountered here is an example of a more general problem. A class is a set, and we want to be able to talk about properties that its elements possess. We want to use inheritance to infer those properties from general knowledge about the set. But a class is also an entity in itself. It may possess properties that belong not to the individual instances but rather to the class as a whole. In the case of Brooklyn-Dodgers, such properties include team size and the existence of a manager. We may even want to inherit some of these properties from a more general kind of set. For example, the Dodgers can inherit a default team size from the set of all major league baseball teams. To support this, we need to view a class as two things simultaneously: a subset (isa) of a larger class that also contains its elements and an instance (instance) of a class of sets from which it inherits its set-level properties.

To make this distinction clear, it is useful to distinguish between regular classes, whose elements are individual entities, and metaclasses, which are special classes whose elements are themselves classes. A class is now an element of (instance) some class (or classes) as well as a subclass (isa) of one or more classes. A class inherits properties from the class of which it is an instance, just as any instance does. In addition, a class passes inheritable properties down from its superclasses to its instances.


Let's consider an example. Figure 9.6 below shows how we could represent teams as classes using this distinction. Figure 9.7 shows a graphic view of the same classes. The most basic metaclass is the class Class. It represents the set of all classes. All classes are instances of it, either directly or through one of its subclasses. In the example, Team is a subclass (subset) of Class and ML-Baseball-Team is a subclass of Team. The class Class introduces the attribute cardinality, which is to be inherited by all instances of Class (including itself). This makes sense since all the instances of Class are sets and all sets have cardinality.


Team represents a subset of the set of all sets, namely those whose elements are sets of players on a team. It inherits the property of having cardinality from Class. Team introduces the attribute team-size, which all its elements possess. Notice that team-size is like cardinality in that it measures the size of a set. But it applies to something different: cardinality applies to sets of sets and is inherited by all elements of Class. The slot team-size applies to the elements of those sets that happen to be teams. Those elements are sets of individuals.

ML-Baseball-Team is also an instance of Class, since it is a set. It inherits the property of having a cardinality from the set of which it is an instance, namely Class. But it is a subset of Team. All of its instances will have the property of having a team-size since they are also instances of the superclass Team. We have added at this level the additional fact that the default team size is 24, so all instances of ML-Baseball-Team will inherit that as well. In addition, we have added the inheritable slot manager.

In all the frame systems we illustrate, all classes are instances of the metaclass Class. As a result, they all have the attribute cardinality. We leave the class Class, the isa links to it, and the attribute cardinality out of our descriptions of our examples, though, unless there is some particular reason to include them.
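The double role of a class, as a subset of one class and an instance of another, can be sketched in Python as follows. This is a simplified illustration and not a reproduction of Figures 9.6 and 9.7: the dictionary layout and the function inherited_slots are assumptions, and None stands for a slot that is introduced without a default value.

```python
FRAMES = {
    "Class": {"instance": "Class", "isa": None,
              "inheritable": {"cardinality": None}},
    "Team": {"instance": "Class", "isa": "Class",
             "inheritable": {"team-size": None}},
    "ML-Baseball-Team": {"instance": "Class", "isa": "Team",
                         "inheritable": {"team-size": 24, "manager": None}},
    "Brooklyn-Dodgers": {"instance": "ML-Baseball-Team", "isa": None,
                         "inheritable": {}},
}

def inherited_slots(frame):
    """Slots a frame receives from the class it is an instance of and from
    that class's superclasses; the nearest non-empty default wins."""
    slots = {}
    cls = FRAMES[frame]["instance"]
    while cls is not None:
        for name, default in FRAMES[cls]["inheritable"].items():
            if slots.get(name) is None:
                slots[name] = default
        cls = FRAMES[cls]["isa"]
    return slots

dodgers = inherited_slots("Brooklyn-Dodgers")
ml_team = inherited_slots("ML-Baseball-Team")
```

Brooklyn-Dodgers, as an instance of ML-Baseball-Team, receives team-size (default 24), manager and cardinality. ML-Baseball-Team, as an instance of Class, receives only cardinality, because it is a set of teams and not itself a team.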

Every class is a set. But not every set should be described as a class. A class describes a set of entities that share significant properties. In particular, the default information associated with a class can be used as a basis for inferring values for the properties of its individual elements. So there is an advantage to representing as a class those sets for which membership serves as a basis for nonmonotonic inheritance. Typically, these are sets in which membership is not highly ephemeral. Instead, membership is based on some fundamental structural or functional properties. To see the difference, consider the following sets:


• People

• People who are major league baseball players

• People who are on my plane to New York

The first 2 sets can be advantageously represented as classes with which a substantial number of inheritable attributes can be associated. The last one is different: the only properties that all the elements of that set probably share are the definition of the set itself and some other properties that follow from the definition.

Other Ways of Relating Classes to Each Other

We have talked up to this point about two ways in which classes (sets) can be related to each other. Class1 can be a subset of Class2. Or, if Class2 is a metaclass, then Class1 can be an instance of Class2. But there are other ways that classes can be related to each other, corresponding to ways that sets of objects in the world can be related.

One such relationship is mutually-disjoint-with, which relates a class to one or more other classes that are guaranteed to have no elements in common with it. Another important relationship is is-covered-by, which relates a class to a set of subclasses the union of which is equal to it. If a class is-covered-by a set S of mutually disjoint classes, then S is called a partition of the class. For examples of these relationships, consider the classes shown in Figure 9.8.

Slots as Full-Fledged Objects

So far, we have provided a way to describe sets of objects and individual objects, both in terms of attributes and values. Thus we have made extensive use of attributes, which we have represented as slots attached to frames. But it turns out that there are several reasons why we would like to be able to represent attributes explicitly and describe their properties. Some of the properties we would like to be able to represent and use in reasoning include:

• The classes to which the attribute can be attached, i.e., for what classes does it make sense? For example, weight makes sense for physical objects but not for conceptual ones.

• Constraints on either the type or the value of the attribute. For example, the age of a person must be a numeric quantity measured in some time frame, and it must be less than the ages of the person's biological parents.

• A value that all instances of a class must have by the definition of the class.

• A default value for the attribute

• Rules for inheriting values for the attribute. The usual rule is to inherit down isa and instance links. But some attributes inherit in other ways. For example, last-name inherits down the child-of link.

• Rules for computing a value separately from inheritance. One extreme form of such a rule is a procedure written in some procedural programming language such as LISP.

• An inverse attribute

• Whether the slot is single-valued or multivalued

In order to be able to represent these attributes of attributes, we need to describe attributes (slots) as frames. These frames will be organized into an isa hierarchy, just as any other frames are, and that hierarchy can then be used to support inheritance of values for attributes of slots.

A slot is a relation. It maps from elements of its domain (the classes for which it makes sense) to elements of its range (its possible values). A relation is a set of ordered pairs. Thus it makes sense to say that one relation (R1) is a subset of another (R2). In that case, R1 is a specialization of R2, so in our terminology isa(R1, R2). Since a slot is a set, the set of all slots, which we will call Slot, is a metaclass. Its instances are slots, which may have subslots.

Figures 9.9 and 9.10 illustrate several examples of slots represented as frames.

Slot is a metaclass. Its instances are slots (each of which is a set of ordered pairs). Associated with the metaclass are attributes that each instance (i.e., each actual slot) will inherit. Each slot, since it is a relation, has a domain and a range. We represent the domain in the slot labeled domain. We break up the representation of the range into two parts: range gives the class of which elements of the range must be elements; range-constraint contains a logical expression that further constrains the range to be elements of range that also satisfy the constraint. If range-constraint is absent, it is taken to be TRUE. The advantage of breaking the description apart into these two pieces is that type checking is much cheaper than arbitrary constraint checking, so it is useful to be able to do it separately and early during some reasoning processes.

The other slots do what you would expect from their names. If there is a value for definition, it must be propagated to all instances of the slot. If there is a value for default, that value is inherited by all instances of the slot unless there is an overriding value. The attribute transfers-through lists other slots from which values for this slot can be derived through inheritance. The to-compute slot contains a procedure for deriving its value. The inverse attribute contains the inverse of the slot. Although in principle all slots have inverses, sometimes they are not useful enough in reasoning to be worth representing. And single-valued is used to mark the special cases in which the slot is a function and so can have only one value.
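As a rough illustration of the description above, a slot can be sketched as a small Python object whose fields mirror the attributes domain, range, range-constraint, default, and single-valued. The class name `Slot`, the field `range_class`, the methods `lookup` and `accepts`, the slot name fielder-batting-average, and the numeric defaults are all assumptions made for this sketch; they are not taken from the figures.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Slot:
    """A slot described as a frame (illustrative field names)."""
    name: str
    isa: Optional["Slot"] = None              # more general slot, if any
    domain: Optional[str] = None              # class the slot makes sense for
    range_class: Optional[type] = None        # cheap type check on values
    range_constraint: Optional[Callable[[Any], bool]] = None  # absent = TRUE
    default: Any = None
    single_valued: Optional[bool] = None

    def lookup(self, attr: str) -> Any:
        """Return this slot's own value for attr, else inherit it along isa."""
        slot: Optional["Slot"] = self
        while slot is not None:
            value = getattr(slot, attr)
            if value is not None:
                return value
            slot = slot.isa
        return None

    def accepts(self, value: Any) -> bool:
        """Do the cheap type check first, then the arbitrary constraint."""
        range_class = self.lookup("range_class")
        if range_class is not None and not isinstance(value, range_class):
            return False
        constraint = self.lookup("range_constraint")
        return constraint is None or constraint(value)


# Hypothetical slots; the numbers are placeholders, not values from a figure.
batting_average = Slot(
    "batting-average", domain="ML-Baseball-Player", range_class=float,
    range_constraint=lambda v: 0.0 <= v <= 1.0, default=0.25, single_valued=True,
)
fielder_batting_average = Slot(
    "fielder-batting-average", isa=batting_average, domain="Fielder", default=0.26,
)
```

The specialized slot stores only what differs (its domain and default) and picks up the range, the range constraint, and single-valued from the more general slot through `lookup`.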

Of course, there is no advantage to representing these properties of slots if there is no reasoning mechanism that exploits them. In the rest of our discussion, we assume that the frame-system interpreter knows how to reason with all of these slots of slots as part of its built-in reasoning capability. In particular, we assume that it is capable of performing the following reasoning actions:

• Consistency checking to verify that when a slot value is added to a frame:

– The slot makes sense for the frame. This relies on the domain attribute of the slot.

– The value is a legal value for the slot. This relies on the range and range-constraint attributes.

• Maintenance of consistency between the values for slots and their inverses whenever one is updated.

• Propagation of definition values along isa and instance links.

• Inheritance of default values along isa and instance links.

• Computation of a value of a slot, as needed. This relies on the to-compute and transfers-through attributes.

• Checking that only a single value is asserted for single-valued slots. This is usually done by replacing an old value by the new one when it is asserted. An alternative is to force explicit retraction of the old value, and to signal a contradiction if a new value is asserted when another is already there.
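A few of these reasoning actions can be sketched in a toy frame store. The sketch below covers the domain check, a simplified range check (class membership only), inverse maintenance, and single-valued replacement. The class `FrameSystem`, its methods, the inverse slot name manages, and the frames Team-A, Pat, and Lee are hypothetical names introduced only for illustration.

```python
class FrameSystem:
    """Toy frame store that enforces a few slot properties on every update."""

    def __init__(self):
        self.member_of = {}   # frame name -> set of class names
        self.slots = {}       # slot name -> dict of slot properties
        self.values = {}      # (frame, slot) -> list of values

    def define_slot(self, name, domain, range_class, inverse=None,
                    single_valued=False):
        self.slots[name] = {"domain": domain, "range": range_class,
                            "inverse": inverse, "single_valued": single_valued}

    def add_value(self, frame, slot, value, _via_inverse=False):
        props = self.slots[slot]
        # Consistency check 1: the slot must make sense for this frame (domain).
        if props["domain"] not in self.member_of.get(frame, set()):
            raise ValueError(f"{slot} does not apply to {frame}")
        # Consistency check 2: the value must be legal for the slot (range).
        if props["range"] not in self.member_of.get(value, set()):
            raise ValueError(f"{value} is not a legal value for {slot}")
        current = self.values.setdefault((frame, slot), [])
        if props["single_valued"]:
            # Replace the old value and withdraw its inverse link as well.
            for old in list(current):
                self._remove(old, props["inverse"], frame)
            current.clear()
        if value not in current:
            current.append(value)
        # Keep the inverse slot consistent with this update.
        if props["inverse"] and not _via_inverse:
            self.add_value(value, props["inverse"], frame, _via_inverse=True)

    def _remove(self, frame, slot, value):
        if slot and value in self.values.get((frame, slot), []):
            self.values[(frame, slot)].remove(value)


fs = FrameSystem()
fs.member_of = {"Team-A": {"Team"}, "Pat": {"Person"}, "Lee": {"Person"}}
fs.define_slot("manager", "Team", "Person", inverse="manages", single_valued=True)
fs.define_slot("manages", "Person", "Team", inverse="manager")
fs.add_value("Team-A", "manager", "Pat")
fs.add_value("Team-A", "manager", "Lee")   # replaces Pat, updates both inverses
```

This sketch takes the first option mentioned above for single-valued slots, replacing the old value. The alternative of signalling a contradiction would raise an error instead of clearing the list.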

We have defined the properties range-constraint and default as parts of a slot. But we often think of them as being properties of a slot associated with a particular class. For example, in Figure 9.5, we listed two defaults for the batting-average slot, one associated with major league baseball players and one associated with fielders. Figure 9.11 shows how this can be represented correctly, by creating a specialization of batting-average that can be associated with a specialization of ML-Baseball-Player to represent the more specific information that is known about the specialized class.

Unfortunately, although this model of slots is simple and internally consistent, it is not easy to use. So we introduce some notational shorthand that allows the four most important properties of a slot (domain, range, definition, and default) to be defined implicitly by how the slot is used in the definitions of the classes in its domain.

We describe the domain implicitly to be the class where the slot appears. We describe the range, and any range constraints, with the clause MUST BE, as the value of an inherited slot. Figure 9.12 shows an example of this notation. And we describe the definition and the default, if they are present, by inserting them as the value of the slot when it appears. The two will be distinguished by prefixing a definitional value with an asterisk (*). We then let the underlying bookkeeping of the frame system create the frames that represent slots as they are needed.

Now let's look at examples of how these slots can be used. The slots bats and my-manager illustrate the use of the to-compute attribute of a slot. The variable x will be bound to the frame to which the slot is attached. We use the dot notation to specify the value of a slot of a frame. Specifically, x.y describes the value(s) of the y slot of frame x.

So we know that to compute a frame's value for my-manager, it is necessary to find the frame's value for team, then find the resulting team's manager. We have simply composed two slots to form a new one. Computing the value of the bats slot is even simpler. Just go get the value of the handed slot.
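The two to-compute procedures just described can be sketched as follows. The dictionary layout, the helper `get`, and the frames Player-1, Team-A, and Pat are assumptions made for the example; only the slot names come from the text.

```python
# Hypothetical frames: a player, and the team the player belongs to.
frames = {
    "Player-1": {"team": "Team-A", "handed": "Right"},
    "Team-A": {"manager": "Pat"},
}


def get(frame, slot):
    """Return a stored value, or fall back on the slot's to-compute procedure."""
    stored = frames[frame]
    if slot in stored:
        return stored[slot]
    if slot in to_compute:
        return to_compute[slot](frame)
    return None


# Each procedure receives x, the frame to which the slot is attached.
to_compute = {
    "my-manager": lambda x: get(get(x, "team"), "manager"),  # x.team.manager
    "bats": lambda x: get(x, "handed"),                       # x.handed
}

print(get("Player-1", "my-manager"))  # Pat
print(get("Player-1", "bats"))        # Right
```

Neither my-manager nor bats is stored on the player frame; both are derived on demand from slots that are stored.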

The manager slot illustrates the use of a range constraint. It is stated in terms of a variable x, which is bound to the frame whose manager slot is being described. It requires that any manager be not only a person but someone with baseball experience. It relies on the domain-specific function baseball-experience, which must be defined somewhere in the system.

The slots color and uniform-color illustrate the arrangement of slots in an isa hierarchy. The relation color is a fairly general one that holds between physical objects and colors. The attribute uniform-color is a restricted form of color that applies only between team players and the colors that are allowed for team uniforms (anything but pink). Arranging slots in a hierarchy is useful for the same reason that arranging anything else in a hierarchy is: it supports inheritance. In this example, the general slot color is known to have high visual salience. The more specific slot uniform-color then inherits this property, so it too is known to have high visual salience.

John
    height: 72

Bill
    height:

Figure 9.13: Representing Slot-Values

color(x, y) ∧ top-level-part-of(z, x) → color(z, y)
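Read as a transfers-through rule, the formula above says that if x has color y and z is a top-level part of x, then z also has color y. A minimal sketch of that derivation follows, assuming a dictionary-based frame store; the helper `value_of` and the frames Car-1 and Hood-1 are hypothetical.

```python
# Hypothetical frames: a whole object and one of its top-level parts.
frames = {
    "Car-1": {"color": "Red"},
    "Hood-1": {"top-level-part-of": "Car-1"},
}

# For each slot, the links through which its value may be transferred.
transfers_through = {"color": ["top-level-part-of"]}


def value_of(frame, slot):
    """Return a stored value, else try to derive one via transfers-through."""
    stored = frames.get(frame, {})
    if slot in stored:
        return stored[slot]
    for link in transfers_through.get(slot, []):
        whole = stored.get(link)
        if whole is not None:
            inherited = value_of(whole, slot)
            if inherited is not None:
                return inherited
    return None


print(value_of("Hood-1", "color"))  # Red
```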

In addition to these domain-independent slot attributes, slots may have domain-specific properties that support problem solving in a particular domain. Since these slots are not treated explicitly by the frame-system interpreter, they will be useful precisely to the extent that the domain problem solver exploits them.

Slot-Values as Objects

In the last section, we reified the notion of a slot by making it an explicit object that we could make assertions about. In some sense this was not necessary. A finite relation can be completely described by listing its elements. But in practical knowledge-based systems one often does not have that list. So it can be very important to be able to make assertions about the list without knowing all of its elements. Reification gave us a way to do this.

The next step along this path is to do the same thing to a particular attribute-value (an instance of a relation) that we did to the relation itself. We can reify it and make it an object about which assertions can be made. To see why we might want to do this, let us return to the example of John and Bill's height that we discussed in semantic nets. Figure 9.13 shows a frame-based representation of some of the facts. We could easily record Bill's height if we knew it. Suppose, though, that we do not know it. All we know is that John is taller than Bill. We need a way to make an assertion about the value of a slot without knowing what that value is. To do that, we need to view the slot and its value as an object.

We could attempt to do this the same way we made slots themselves into objects, namely by representing them explicitly as frames. There seems little advantage to doing that in this case, though, because the main advantage of frames does not apply to slot values: frames are organized into an isa hierarchy and thus support inheritance. There is no basis for such an organization of slot values. So instead, we augment our value representation language to allow the value of a slot to be stated as either or both of:

• A value of the type required by the slot.

• A logical constraint on the value. This constraint may relate the slot's value to the values of other slots or to domain constants.

John
    height: 72; λx (x.height > Bill.height)

Bill
    height: λx (x.height < John.height)

Figure 9.14: Representing Slot-Values with Lambda Notation
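The idea behind Figure 9.14 can be sketched in a few lines: a slot may hold a known value, a constraint, or both, and a constraint can be used to test a proposed value even when the stored value is unknown. The tables `heights` and `constraints` and the helper `consistent` are assumptions made for this sketch.

```python
# Known values; None marks a height that has not been recorded.
heights = {"John": 72, "Bill": None}

# The two constraints of Figure 9.14, written over a table of heights.
constraints = {
    "John": lambda h: h["John"] > h["Bill"],
    "Bill": lambda h: h["Bill"] < h["John"],
}


def consistent(frame, candidate):
    """Check whether candidate could be frame's height given all constraints."""
    trial = {**heights, frame: candidate}
    if any(v is None for v in trial.values()):
        return True  # nothing can be refuted while a value is still unknown
    return all(check(trial) for check in constraints.values())


print(consistent("Bill", 70))  # True: 70 is less than John's 72
print(consistent("Bill", 75))  # False: violates both constraints
```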

If we do this to the frames of Figure 9.13, then we get the frames of Figure 9.14. We again use the lambda notation as a way to pick up the name of the frame that is being described.

Inheritance Revisited

To support flexible representations of knowledge about the world, it is necessary to allow the hierarchy to be an arbitrary directed acyclic graph (DAG). We know that acyclic graphs are adequate because isa corresponds to the subset relation. Hierarchies that are not trees are called tangled hierarchies. Tangled hierarchies require a new inheritance algorithm. In the rest of this section, we discuss an algorithm for inheriting values for single-valued slots in a tangled hierarchy.

Consider the two examples shown in Figure 9.15 (in which we return to a network notation to make it easy to visualize the isa structure). In Figure 9.15(a), we want to decide whether Fifi can fly. The correct answer is no. Although birds in general can fly, the subset of birds called ostriches cannot. Although the class Pet-Bird provides a path from Fifi to Bird and thus to the answer that Fifi can fly, it provides no information that conflicts with the special-case knowledge associated with the class Ostrich, so it should have no effect on the answer. To handle this case correctly, we need an algorithm for traversing the isa hierarchy that guarantees that specific knowledge will always dominate more general facts.

In Figure 9.15(b), we consider a different problem, namely determining whether Dick is a pacifist. Again, we must traverse multiple instance links, and more than one answer can be found along the paths. But in this case, there is no well-founded basis for choosing one answer over the other.

One possible basis for a new inheritance algorithm is path length. This can be implemented by executing a breadth-first search, starting with the frame for which a slot value is needed. Follow its instance links, then follow isa links upward. If a path produces a value, it can be terminated.

This algorithm works for both of the examples in Figure 9.15. In (a), it finds a value at Ostrich. It continues the other path to the same length (Pet-Bird), fails to find any other answers, and then halts. In the case of (b), it finds two competing answers at the same level, so it can report the contradiction.

But now consider the examples shown in Figure 9.16. In the case of (a), our new algorithm reaches Bird (via Pet-Bird) before it reaches Ostrich. So it reports that Fifi can fly. In the case of (b), the algorithm reaches Quaker and stops without noticing a contradiction. The problem is that path length does not always correspond to the level of generality of a class. Sometimes what it really corresponds to is the degree of elaboration of classes in the knowledge base. If some regions of the knowledge base have been elaborated more fully than others, then their paths will tend to be longer. But this should not influence the result of inheritance if no new information about the desired attribute has been added.
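The path-length idea and its failure can be sketched as a level-by-level breadth-first search. The function `inherit_by_path_length` is a hypothetical helper. The first hierarchy follows the description of Figure 9.15(a). The second is only an assumed reconstruction of Figure 9.16(a), with two invented intermediate classes (White-Plumed-Ostrich and Plumed-Ostrich) placed between Fifi and Ostrich to lengthen that path.

```python
def inherit_by_path_length(start, parents, stored):
    """Search upward level by level; return the values found at the first
    level where any value is stored (more than one means a contradiction)."""
    level, seen = [start], {start}
    while level:
        found = {stored[n] for n in level if n in stored}
        if found:
            return found
        next_level = []
        for n in level:
            for p in parents.get(n, []):
                if p not in seen:
                    seen.add(p)
                    next_level.append(p)
        level = next_level
    return set()


can_fly = {"Bird": "yes", "Ostrich": "no"}

# As described for Figure 9.15(a): both paths from Fifi have the same length.
simple = {"Fifi": ["Ostrich", "Pet-Bird"],
          "Ostrich": ["Bird"], "Pet-Bird": ["Bird"]}

# Assumed shape of Figure 9.16(a): the ostrich side is more elaborated.
elaborated = {"Fifi": ["White-Plumed-Ostrich", "Pet-Bird"],
              "White-Plumed-Ostrich": ["Plumed-Ostrich"],
              "Plumed-Ostrich": ["Ostrich"],
              "Ostrich": ["Bird"], "Pet-Bird": ["Bird"]}

print(inherit_by_path_length("Fifi", simple, can_fly))      # {'no'}
print(inherit_by_path_length("Fifi", elaborated, can_fly))  # {'yes'}
```

The second answer is the wrong one: nothing new was said about flying, yet adding detail on the ostrich side changed the result.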

The solution to this problem is to base our inheritance algorithm not on path length but on the notion of inferential distance, which can be defined as follows: Class 1 is closer to Class 2 than to Class 3 if and only if Class 1 has an inference path through Class 2 to Class 3 (in other words, Class 2 is between Class 1 and Class 3).

We can now define the result of inheritance as follows: The set of competing values for a slot S in a frame F contains all those values that

• Can be derived from some frame X that is above F in the isa hierarchy

• Are not contradicted by some frame Y that has a shorter inferential distance to F than X does

Using this definition, let us return to our examples. For Figure 9.15(a), we had two candidate classes from which to get an answer. But Ostrich has a shorter inferential distance to Fifi than Bird does, so we get the single answer no. For Figure 9.15(b), we get two answers, and neither is closer to Dick than the other, so we correctly identify a contradiction. For Figure 9.16(a), we get two answers, but again Ostrich has a shorter inferential distance to Fifi than Bird does. The significant thing about the way we have defined inferential distance is that as long as Ostrich is a subclass of Bird, it will be closer to all its instances than Bird is, no matter how many other classes are added to the system. For Figure 9.16(b), we again get two answers and again neither is closer to Dick than the other.

There are several ways that this definition can be implemented as an inheritance algorithm.

Algorithm: Property Inheritance

To retrieve a value V for slot S of an instance F do:

1. Set CANDIDATES to empty.

2. Do breadth-first or depth-first search up the isa hierarchy from F, following all instance and isa links. At each step, see if a value for S or one of its generalizations is stored.

a) If a value is found, add it to CANDIDATES and terminate that branch of the search.

b) If no value is found, but there are instance or isa links upward, follow them.

c) Otherwise, terminate the branch.

3. For each element C of CANDIDATES do:

a) See if there is any other element of CANDIDATES that was derived from a class closer to F than the class from which C came.

b) If there is, then remove C from CANDIDATES.

4. Check the cardinality of CANDIDATES:

a) If it is 0, then report that no value was found.

b) If it is 1, then return the single element of CANDIDATES as V.

c) If it is greater than 1, report a contradiction.

This algorithm is guaranteed to terminate because the isa hierarchy is represented as an acyclic graph.
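One possible rendering of the algorithm in Python is sketched below. It assumes that values are stored only for S itself (generalizations of S are ignored) and that a candidate from class C is "farther" than one from class X exactly when C is an ancestor of X. The names `ancestors`, `inherit`, and `Contradiction` are hypothetical, and the elaborated hierarchy is an assumed reconstruction of Figure 9.16(a) with invented intermediate classes.

```python
class Contradiction(Exception):
    """Raised when competing values remain after filtering by distance."""


def ancestors(node, parents):
    """All classes reachable from node by following links upward."""
    seen, stack = set(), list(parents.get(node, []))
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents.get(n, []))
    return seen


def inherit(frame, parents, stored):
    # Steps 1-2: depth-first search upward; a branch ends where a value is found.
    candidates, stack, visited = {}, [frame], set()
    while stack:
        n = stack.pop()
        if n in visited:
            continue
        visited.add(n)
        if n in stored:
            candidates[n] = stored[n]
        else:
            stack.extend(parents.get(n, []))
    # Step 3: drop C if another candidate's class lies between F and C,
    # i.e. C is an ancestor of that other class.
    survivors = {c: v for c, v in candidates.items()
                 if not any(c in ancestors(o, parents)
                            for o in candidates if o != c)}
    # Step 4: this sketch counts distinct surviving values.
    values = set(survivors.values())
    if not values:
        return None
    if len(values) == 1:
        return values.pop()
    raise Contradiction(sorted(values))


can_fly = {"Bird": "yes", "Ostrich": "no"}
elaborated = {"Fifi": ["White-Plumed-Ostrich", "Pet-Bird"],
              "White-Plumed-Ostrich": ["Plumed-Ostrich"],
              "Plumed-Ostrich": ["Ostrich"],
              "Ostrich": ["Bird"], "Pet-Bird": ["Bird"]}
print(inherit("Fifi", elaborated, can_fly))  # no

pacifist = {"Quaker": "yes", "Republican": "no"}
dick = {"Dick": ["Quaker", "Republican"]}
try:
    inherit("Dick", dick, pacifist)
except Contradiction as conflict:
    print("contradiction:", conflict)
```

Unlike a search based on path length, the answer for Fifi does not change when extra classes are added between Fifi and Ostrich, because the filter in step 3 depends only on which classes lie above which.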

Frame Languages

The idea of a frame system as a way to represent declarative knowledge has been encapsulated in a series of frame-oriented knowledge representation languages. Examples of such languages include KRL, FRL, RLL, KL-ONE, KRYPTON, NIKL, CYCL, THEO, and FRAMEKIT.

Benefits of Frames

• Makes programming easier by grouping related knowledge

• Easily understood by non-developers

• Expressive power

• Easy to set up slots for new properties and relations

• Easy to include default information and detect missing values

Drawbacks of Frames

• No standards (slot-filler values)

• More of a general methodology than a specific representation:

– Frame for a class-room will be different for a professor and for a maintenance worker

• No associated reasoning/inference mechanisms
