Knowledge Acquisition and Modelling: Decision Tables and Decision Trees

Page 1: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Knowledge Acquisition and Modelling

Decision Tables and Decision Trees

Page 2: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Decision Trees
- A map of a reasoning process, described in a tree-like structure
- A graphical representation of a decision situation
- Decision situation points are connected together by arcs and terminate in ovals
- Main components:
  - Decision points, represented by nodes
  - Actions: particular choices from a decision point, represented by arcs (or straight lines)
- Can be drawn left to right or top down

Page 3: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Decision Trees
- Essentially flowcharts
- A natural order of 'micro decisions' (Boolean yes/no decisions) leading to a conclusion
- In its simplest form, all you need is:
  - A start
  - A cascade of Boolean decisions (each with exactly two outbound branches)
  - A set of end nodes representing all the 'leaves' of the decision tree

Page 4: Knowledge Acquisition and Modelling Decision Tables and Decision Trees
Page 5: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

An Example: Bank Loan Application
- Classify applications into two classes: approved, denied
- Criteria: the target class is 'approved' if 3 binary attributes have certain values:
  (a) borrower has a good credit history (credit rating in excess of some threshold)
  (b) loan amount is less than some percentage of collateral value (e.g., 80% of home value)
  (c) borrower has income to make payments on the loan
- Possible scenarios = 2^3 = 8
- If the parameters for splitting the nodes can be adjusted, the number of scenarios grows exponentially.

Page 6: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

How They Work
- Decision rules partition the sample of data
- A terminal node (leaf) indicates the class assignment
- The tree partitions the samples into mutually exclusive groups, one group for each terminal node
- All paths start at the root node and end at a leaf (terminal node = decision node)

Page 7: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

How They Work
- Each path represents a decision rule: the joining (AND) of all the tests along that path
- Separate paths that result in the same class are disjunctions (ORs)
- All paths are mutually exclusive: for any one case, only one path will be followed
- False decisions follow the left branch, true decisions the right branch

Page 8: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Disjunctive Normal Form
- At a non-terminal node, the model identifies an attribute to be tested; the test splits the attribute's values into mutually exclusive, disjoint sets
- Splitting continues until a node holds only one class (terminal node or leaf)
- This structure yields rules in disjunctive normal form:
  - limits the form of a rule to conjunctions (AND-ing) of terms
  - allows disjunction (OR-ing) over a set of rules

Page 9: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Predicting Commute Time

Leave At?
├─ 8 AM  → Long
├─ 9 AM  → Accident?
│   ├─ No  → Medium
│   └─ Yes → Long
└─ 10 AM → Stall?
    ├─ No  → Short
    └─ Yes → Long

If we leave at 10 AM and there are no cars stalled on the road, what will our commute time be?

Page 10: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Inductive Learning
- Make a series of Boolean decisions and follow the relevant branch:
  - Did we leave at 10 AM?
  - Did a car stall on the road?
  - Is there an accident on the road?
- Answering each yes/no question allows traversal of the tree to reach a conclusion

Page 11: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Decision Tree as a Rule Set
IF hour == 8am THEN commute time = long
IF hour == 9am AND accident == yes THEN commute time = long
IF hour == 9am AND accident == no THEN commute time = medium
IF hour == 10am AND stall == yes THEN commute time = long
IF hour == 10am AND stall == no THEN commute time = short
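Each IF/THEN rule maps one-to-one onto executable code. A minimal Python sketch (the function name and string encodings are our own choices, not from the slides):

```python
def commute_time(hour: str, accident: bool, stall: bool) -> str:
    """Predicted commute time for the example tree.

    A direct translation of the five rules above: each rule becomes one
    branch, and exactly one branch fires for any given case.
    """
    if hour == "8am":
        return "long"
    if hour == "9am":
        return "long" if accident else "medium"
    if hour == "10am":
        return "long" if stall else "short"
    raise ValueError(f"unknown hour: {hour}")

# Leaving at 10 AM with no stalled cars (the question on the previous slide):
print(commute_time("10am", accident=False, stall=False))  # -> short
```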

Page 12: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

How to Create a Decision Tree
- Make a list of attributes that we can measure
  - These attributes (for now) must be discrete
- Choose a target attribute that we want to predict
- Create an experience table that lists what we have seen in the past

Page 13: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Sample Experience Table

Example   Hour    Weather   Accident   Stall   Commute (target)
D1        8 AM    Sunny     No         No      Long
D2        8 AM    Cloudy    No         Yes     Long
D3        10 AM   Sunny     No         No      Short
D4        9 AM    Rainy     Yes        No      Long
D5        9 AM    Sunny     Yes        Yes     Long
D6        10 AM   Sunny     No         No      Short
D7        10 AM   Cloudy    No         No      Short
D8        9 AM    Rainy     No         No      Medium
D9        9 AM    Sunny     Yes        No      Long
D10       10 AM   Cloudy    Yes        Yes     Long
D11       10 AM   Rainy     No         No      Short
D12       8 AM    Cloudy    Yes        No      Long
D13       9 AM    Sunny     No         No      Medium

Page 14: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Choosing Attributes
- The previous experience table showed 4 attributes: hour, weather, accident and stall
- But the decision tree only showed 3 attributes: hour, accident and stall
- Why is that?

Page 15: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Choosing Attributes
- Methods for selecting attributes show that weather is not a discriminating attribute
- We use the principle of Occam's Razor: given a number of competing hypotheses, the simplest one is preferable

Page 16: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Decision Tree Algorithms
The basic idea behind any decision tree algorithm is as follows:
- Choose the best attribute(s) to split the remaining instances, and make that attribute a decision node
- Repeat this process recursively for each child
- Stop when:
  - All the instances have the same target attribute value
  - There are no more attributes
  - There are no more instances
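To make the recursion concrete, here is a minimal, hedged Python sketch of that loop. The dict-based tree representation and helper names (best_attribute, majority_class) are our own illustrative choices, not a specific library's API:

```python
from collections import Counter

def majority_class(instances, target):
    """Most common target value among the instances (ties broken arbitrarily)."""
    return Counter(inst[target] for inst in instances).most_common(1)[0][0]

def build_tree(instances, attributes, target, best_attribute):
    """Generic top-down induction, following the three stopping rules above.

    `best_attribute(instances, attributes, target)` is the pluggable selection
    heuristic (e.g. ID3's lowest expected entropy). Returns either a class
    label (leaf) or a dict {attribute: {value: subtree, ...}}.
    """
    classes = {inst[target] for inst in instances}
    if len(classes) == 1:                  # all instances agree -> leaf
        return classes.pop()
    if not attributes:                     # no attributes left -> majority leaf
        return majority_class(instances, target)

    attr = best_attribute(instances, attributes, target)
    remaining = [a for a in attributes if a != attr]
    node = {attr: {}}
    for value in {inst[attr] for inst in instances}:
        subset = [inst for inst in instances if inst[attr] == value]
        node[attr][value] = build_tree(subset, remaining, target, best_attribute)
    return node

# Usage with an ID3-style heuristic (expected_entropy is sketched later), e.g.:
#   id3_best = lambda inst, attrs, t: min(attrs, key=lambda a: expected_entropy(inst, a, t))
#   tree = build_tree(data, ["hour", "weather", "accident", "stall"], "commute", id3_best)
```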

Page 17: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Identifying the Best Attributes

Refer back to our original decision tree

Leave At?
├─ 8 AM  → Long
├─ 9 AM  → Accident?
│   ├─ No  → Medium
│   └─ Yes → Long
└─ 10 AM → Stall?
    ├─ No  → Short
    └─ Yes → Long

How did we know to split on 'leave at' first, then on 'stall' and 'accident', and not on 'weather'?

Page 18: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

ID3 Heuristic
- To determine the best attribute, we use the ID3 heuristic
- ID3 splits attributes based on their entropy
- Entropy is a measure of uncertainty (disorder) in the data
- Entropy is minimized when all values of the target attribute are the same
  - If we know that commute time will always be short, then entropy = 0
- Entropy is maximized when there is an equal chance of all values for the target attribute (i.e., the result is random)
  - If commute time = short in 3 instances, medium in 3 instances and long in 3 instances, entropy is maximized

Page 19: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Entropy
Calculation of entropy:

  Entropy(S) = − Σ_{i=1..l} (|Si| / |S|) · log2(|Si| / |S|)

where:
- S = the set of examples
- Si = the subset of S with value vi under the target attribute
- l = the size of the range of the target attribute
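To make the formula concrete, a small Python sketch (the helper name is ours):

```python
from collections import Counter
from math import log2

def entropy(values):
    """Entropy(S) = -sum over i of (|Si|/|S|) * log2(|Si|/|S|),
    where the Si partition `values` by target value."""
    total = len(values)
    return -sum(n / total * log2(n / total) for n in Counter(values).values())

print(entropy(["short"] * 5))                    # -0.0 (i.e., zero): one class only
print(entropy(["short", "medium", "long"] * 3))  # log2(3) ~ 1.585: uniform, maximal
```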

Page 20: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

ID3
- ID3 splits on the attribute with the lowest expected entropy
- We calculate the expected entropy for an attribute as the weighted sum of subset entropies:

    Σ_{i=1..k} (|Si| / |S|) · Entropy(Si)

  where k is the size of the range of the attribute we are testing
- We can also measure information gain (which varies inversely with expected entropy):

    Entropy(S) − Σ_{i=1..k} (|Si| / |S|) · Entropy(Si)
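Building on the entropy() helper sketched above, the weighted-sum and gain formulas might be coded like this (helper names are ours):

```python
def expected_entropy(instances, attr, target):
    """Weighted sum of subset entropies after splitting on `attr`:
    sum over values v of (|Sv|/|S|) * Entropy(Sv)."""
    total = len(instances)
    result = 0.0
    for v in {inst[attr] for inst in instances}:
        subset = [inst[target] for inst in instances if inst[attr] == v]
        result += len(subset) / total * entropy(subset)
    return result

def information_gain(instances, attr, target):
    """Entropy(S) minus the expected entropy after the split."""
    return entropy([inst[target] for inst in instances]) - expected_entropy(instances, attr, target)
```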

Page 21: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

ID3

Given our commute time sample set, we can calculate the entropy of each attribute at the root node:

Attribute   Expected Entropy   Information Gain
Hour        0.6511             0.768449
Weather     1.28884            0.130719
Accident    0.92307            0.496479
Stall       1.17071            0.248842
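Applied to the 13-row experience table, the helpers above should reproduce these figures (up to rounding); a sketch reusing entropy(), expected_entropy() and information_gain() from the previous snippets:

```python
cols = ("hour", "weather", "accident", "stall", "commute")
rows = [
    ("8 AM", "Sunny", "No", "No", "Long"),    ("8 AM", "Cloudy", "No", "Yes", "Long"),
    ("10 AM", "Sunny", "No", "No", "Short"),  ("9 AM", "Rainy", "Yes", "No", "Long"),
    ("9 AM", "Sunny", "Yes", "Yes", "Long"),  ("10 AM", "Sunny", "No", "No", "Short"),
    ("10 AM", "Cloudy", "No", "No", "Short"), ("9 AM", "Rainy", "No", "No", "Medium"),
    ("9 AM", "Sunny", "Yes", "No", "Long"),   ("10 AM", "Cloudy", "Yes", "Yes", "Long"),
    ("10 AM", "Rainy", "No", "No", "Short"),  ("8 AM", "Cloudy", "Yes", "No", "Long"),
    ("9 AM", "Sunny", "No", "No", "Medium"),
]
data = [dict(zip(cols, r)) for r in rows]

for attr in cols[:-1]:
    print(attr,
          round(expected_entropy(data, attr, "commute"), 5),
          round(information_gain(data, attr, "commute"), 6))
# "hour" has the lowest expected entropy (~0.6511) and the highest gain
# (~0.768449), so ID3 splits on hour at the root.
```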

Page 22: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Problems with ID3
- ID3 is not optimal
  - Uses expected entropy reduction, not actual reduction
- Must use discrete (or discretized) attributes
  - What if we left for work at 9:30 AM?
  - We could break down the attributes into smaller values…

Page 23: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Problems with ID3
If we broke down leave time to the minute, we might get something like this:

[Tree splitting on the exact minute: branches for 8:02 AM, 8:03 AM, 9:05 AM, 9:07 AM, 9:09 AM and 10:02 AM, each ending in its own single-instance leaf (Long, Medium, Short, …)]

Since entropy is very low for each branch, we end up with n branches and n leaves. This would not be helpful for predictive modeling.

Page 24: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Problems with ID3
- Can use a technique known as discretization
  - Choose cut points, such as 9 AM, for splitting continuous attributes
  - Cut points generally lie in a subset of boundary points, where a boundary point falls between two adjacent instances (in the sorted list) that have different target values
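A hedged Python sketch of that boundary-point idea (the helper name and the (value, target) pair representation are ours):

```python
def boundary_cut_points(instances):
    """Candidate cut points for a continuous attribute.

    `instances` is a list of (value, target) pairs. After sorting by value,
    a boundary lies at the midpoint between two adjacent instances whose
    target values differ.
    """
    ordered = sorted(instances)
    return [
        (a_val + b_val) / 2
        for (a_val, a_tgt), (b_val, b_tgt) in zip(ordered, ordered[1:])
        if a_tgt != b_tgt
    ]

# Leave times in minutes since midnight: 8:00 -> 480, 9:00 -> 540, 10:00 -> 600
times = [(480, "long"), (482, "long"), (545, "medium"), (549, "long"), (602, "short")]
print(boundary_cut_points(times))  # [513.5, 547.0, 575.5]
```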

Page 25: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Pruning (another technique for attribute selection)
- Pre-pruning
  - Decide during the building process when to stop adding attributes (possibly based on their information gain)
  - May be problematic: individually, attributes may not contribute much to a decision, but in combination with other attributes they may have a significant impact
- Post-pruning
  - Waits until the full decision tree has been built, and then prunes attributes
  - Two techniques: subtree replacement and subtree raising

Page 26: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Subtree Replacement

An entire subtree is replaced by a single leaf node.

Before (the subtree rooted at C, with leaves 1, 2, 3, is selected for replacement):

A
├─ B
│  ├─ C
│  │  ├─ 1
│  │  ├─ 2
│  │  └─ 3
│  └─ 4
└─ 5

Page 27: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Subtree Replacement

Leaf node 6 has replaced the subtree. This generalizes the tree a little more, but may increase accuracy.

After:

A
├─ B
│  ├─ 6
│  └─ 4
└─ 5

Page 28: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Subtree Raising

An entire subtree is raised onto another node.

Before (the subtree rooted at C is to be raised):

A
├─ B
│  ├─ C
│  │  ├─ 1
│  │  ├─ 2
│  │  └─ 3
│  └─ 4
└─ 5

Page 29: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Subtree Raising

The entire subtree rooted at C has been raised onto node A.

This was not discussed in detail, as it is not clear whether it is really worthwhile (it is very time consuming).

After:

A
└─ C
   ├─ 1
   ├─ 2
   └─ 3

Page 30: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Problems with Decision Trees
- While decision trees classify quickly, the time for building a tree may be higher than for another type of classifier
- Decision trees suffer from a problem of errors propagating throughout the tree
  - A very serious problem as the number of classes increases

Page 31: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Error Propagation
- Since decision trees work by a series of local decisions, what happens when one of these local decisions is wrong?
  - Every decision from that point on may be wrong
  - We may never return to the correct path of the tree

Page 32: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

[Figure: decision tree representation of a salary decision]

Source: Modern Systems Analysis and Design, Fourth Edition, Jeffrey A. Hoffer, Joey F. George, Joseph S. Valacich

Page 33: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Decision Tables
- Used to lay out in tabular form all possible situations which a decision may encounter, and to specify which action to take in each of these situations
- A matrix representation of the logic of a decision
- Specifies the possible conditions and the resulting actions

Page 34: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Terminology
- Decision table: a tabular form that presents a set of conditions and their corresponding actions.
- Condition stubs: describe the conditions or factors that will affect the decision or policy. They are listed in the upper section of the decision table.
- Action stubs: describe, in the form of statements, the possible policy actions or decisions. They are listed in the lower section of the decision table.
- Rules: describe which actions are to be taken under a specific combination of conditions. They are specified by first inserting the different combinations of condition attribute values, and then putting X's in the appropriate columns of the action section of the table.

Page 35: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Example

[Figure: example decision table]

Source: Modern Systems Analysis and Design, Fourth Edition, Jeffrey A. Hoffer, Joey F. George, Joseph S. Valacich

Page 36: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Decision Table Methodology
1. Identify conditions & values: find the data attribute each condition tests, and all of the attribute's values.
2. Compute the maximum number of rules: multiply the numbers of values for the condition data attributes by each other.
3. Identify possible actions: determine each independent action to be taken for the decision or policy.
4. Enter all possible rules: fill in the values of the condition data attributes in each numbered rule column.
5. Define actions for each rule: for each rule, mark the appropriate actions with an X in the decision table.
6. Verify the policy: review the completed decision table with end-users.
7. Simplify the table: eliminate and/or consolidate rules to reduce the number of columns.

Page 37: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

A Simple Example
Scenario: a marketing company wishes to construct a decision table to decide how to treat clients according to three characteristics: gender, city dweller, and age group: A (under 30), B (between 30 and 60), C (over 60).

The company has four products (W, X, Y and Z) to test market:
- Product W will appeal to male city dwellers
- Product X will appeal to young males
- Product Y will appeal to female middle-aged shoppers who do not live in cities
- Product Z will appeal to all but older males

Page 38: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

1. Identify Conditions & Values
The three data attributes tested by the conditions in this problem are:
- gender, with values M and F;
- city dweller, with values Y and N;
- age group, with values A, B, and C, as stated in the problem.

Page 39: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

2. Compute Maximum Number of Rules
The maximum number of rules is 2 × 2 × 3 = 12.

Page 40: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

3. Identify Possible Actions
The four actions are:
- market product W
- market product X
- market product Y
- market product Z

Page 41: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

4. Enter All Possible Rules

Rule   1  2  3  4  5  6  7  8  9  10  11  12
Sex    M  F  M  F  M  F  M  F  M  F   M   F
City   Y  Y  N  N  Y  Y  N  N  Y  Y   N   N
Age    A  A  A  A  B  B  B  B  C  C   C   C

Page 42: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

5. Define Actions for each Rule

Market   1  2  3  4  5  6  7  8  9  10  11  12
W        X  .  .  .  X  .  .  .  X  .   .   .
X        X  .  X  .  .  .  .  .  .  .   .   .
Y        .  .  .  .  .  .  .  X  .  .   .   .
Z        X  X  X  X  X  X  X  X  .  X   .   X

(X = market the product under that rule; . = no action)

Page 43: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Full Table

Rule       1  2  3  4  5  6  7  8  9  10  11  12
Sex        M  F  M  F  M  F  M  F  M  F   M   F
City       Y  Y  N  N  Y  Y  N  N  Y  Y   N   N
Age        A  A  A  A  B  B  B  B  C  C   C   C

Market W   X  .  .  .  X  .  .  .  X  .   .   .
Market X   X  .  X  .  .  .  .  .  .  .   .   .
Market Y   .  .  .  .  .  .  .  X  .  .   .   .
Market Z   X  X  X  X  X  X  X  X  .  X   .   X
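As a cross-check, the full table can be generated mechanically from the four product descriptions. A minimal Python sketch (the predicate encodings are our own):

```python
from itertools import product

def actions(sex, city, age):
    """Action stub entries implied by the four product descriptions."""
    acts = []
    if sex == "M" and city == "Y":
        acts.append("W")   # male city dwellers
    if sex == "M" and age == "A":
        acts.append("X")   # young males
    if sex == "F" and city == "N" and age == "B":
        acts.append("Y")   # middle-aged female non-city dwellers
    if not (sex == "M" and age == "C"):
        acts.append("Z")   # all but older males
    return acts

# Enumerate the 12 rule columns in the same order as the table above
for rule, (age, city, sex) in enumerate(product("ABC", "YN", "MF"), start=1):
    print(f"rule {rule:2d}: Sex={sex} City={city} Age={age} -> {' '.join(actions(sex, city, age)) or '-'}")
# rule  1: Sex=M City=Y Age=A -> W X Z  ...  rule  9: Sex=M City=Y Age=C -> W
```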

Page 44: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

6. Verify the Policy
Let us assume that the client agreed with our decision table.

Page 45: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

7. Simplify the Table
- There appear to be no impossible rules.
- Note that rules 2, 4, 6, 7, 10, 12 have the same action pattern.
- Rules 2, 6 and 10 have two of the three condition values (gender and city dweller) identical, and all three values of the non-identical attribute (age) are covered, so they can be condensed into a single column 2.
- Rules 4 and 12 have an identical action pattern, but they cannot be combined, because the indifferent attribute "Age" does not have all its values covered in these two columns: age group B is missing.

Page 46: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

The revised table is as follows:

Rule       1  2  3  4  5  6  7  8  9  10
Sex        M  F  M  F  M  M  F  M  M  F
City       Y  Y  N  N  Y  N  N  Y  N  N
Age        A  -  A  A  B  B  B  C  C  C

Market W   X  .  .  .  X  .  .  X  .  .
Market X   X  .  X  .  .  .  .  .  .  .
Market Y   .  .  .  .  .  .  X  .  .  .
Market Z   X  X  X  X  X  X  X  .  .  X

Page 47: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Step 1. Identify Conditions & Values
- We first examine the problem and identify the data attributes upon which the decision or policy depends.
- We then list the possible values of each data attribute.
- Often, answering the question "What do I need to know in order to take action in this situation?" will help identify the appropriate condition attributes.

Page 48: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Step 2. Compute Maximum Number of Rules
- A rule is determined by a different combination of the condition attribute values.
- Since we have listed these values in the previous step, the multiplication rule of counting tells us that there will be no more columns than the product of the numbers of values for each of the condition attributes.
- This can be easily verified by constructing a tree diagram listing all possible values of each attribute for each branch of the preceding attribute. The number of leaves of the tree will be the product described above.
- Since some combinations of attribute values may be impossible, the actual number of rules may be less than the maximum.

Page 49: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Step 3. Identify Possible Actions
- The actions describe the decisions to be made or the policy rules to be followed.
- Asking the question "What are the different options for implementing the decision or policy?" should help identify the possible actions.

Page 50: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Step 4. Enter All Possible Rules
- We now begin to build the decision table by listing the condition descriptions in the left margin of the upper part of the table, and the action descriptions in the left margin of the lower part.
- Then we write consecutive numbers from 1 to the maximum number of rules across the top.
- In the rule columns and the condition rows, we list all possible combinations of condition attribute values.
- A rule of thumb for arranging the rule combinations: alternate the possible values of the first condition; repeat each value of the second condition as many times as there are values of the first condition; repeat each value of the third condition as many times as needed to cover one iteration of the second condition's values; and so on.
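This arrangement is what nested iteration produces naturally; a tiny Python sketch (the condition names C1..C3 are placeholders):

```python
from itertools import product

conditions = {"C1": ["a", "b"], "C2": ["y", "n"], "C3": ["1", "2", "3"]}

# product() varies the last sequence fastest, so iterating with the first
# condition last makes C1 alternate every column, C2 change every |C1|
# columns, and C3 every |C1|*|C2| columns -- the rule-of-thumb ordering.
names = list(conditions)                      # ["C1", "C2", "C3"]
for rule, values in enumerate(product(*(conditions[n] for n in reversed(names))), start=1):
    row = dict(zip(reversed(names), values))  # map values back to conditions
    print(rule, [row[n] for n in names])
```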

Page 51: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Step 5. Define Actions for each Rule
- In this step we decide which actions are appropriate for each combination of condition attribute values, and mark an X in that column of the action row.
- This should be fairly straightforward if the decision-making procedure is well defined.
- If it is not well defined, then the organization of the decision table makes it easier to get the end-user to specify the action(s) for each rule.

Page 52: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Step 6. Verify the Policy
- Review the completed decision table with the end-users.
- Resolve any rules for which the actions are not specified.
- Verify that rules you think are impossible really cannot occur.
- Resolve apparent contradictions, such as one rule with two contradictory actions.
- Finally, verify that each rule's actions are correct.

Page 53: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Step 7. Simplify the Decision Table
- In this step we look for and eliminate impossible rules, and also combine rules with indifferent conditions.
  - An indifferent condition is one whose values do not affect the decision and always result in the same action.
  - Impossible rules are those in which the given combination of condition attribute values cannot occur according to the specifications of the problem (e.g., if we assumed for marketing purposes that all middle-aged men lived in the city).
- To determine indifferent conditions, first look for rules with exactly the same actions.
- From these, find those whose condition values are the same except for one and only one condition (called the indifferent condition).
- This latter set of rules has the potential for being collapsed into a single rule, with the indifferent condition value replaced by a dash.
- Note that all possible values of the indifferent condition must be present among the rules to be combined before they can be collapsed.
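A hedged Python sketch of that collapsing test, using condition-value tuples like the earlier snippets (the function name is ours):

```python
def collapsible(rules, indifferent_idx, all_values):
    """Check whether rules sharing one action pattern can collapse.

    `rules` is a list of condition-value tuples with identical actions.
    They collapse iff they agree on every other condition and together
    cover every value of the indifferent condition.
    """
    others = {tuple(v for i, v in enumerate(r) if i != indifferent_idx) for r in rules}
    covered = {r[indifferent_idx] for r in rules}
    return len(others) == 1 and covered == set(all_values)

# Rules 2, 6, 10 of the marketing table: (Sex, City, Age), same action (Z only)
print(collapsible([("F", "Y", "A"), ("F", "Y", "B"), ("F", "Y", "C")], 2, "ABC"))  # True
# Rules 4 and 12: age group B is missing, so they cannot be combined
print(collapsible([("F", "N", "A"), ("F", "N", "C")], 2, "ABC"))                   # False
```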

Page 54: Knowledge Acquisition and Modelling Decision Tables and Decision Trees

Exercise
A student may receive a final course grade of A, B, C, D, or F. In deriving the student's final course grade, the instructor first determines an initial or tentative grade, in the following manner. A student who has:
- received a total of no lower than 90 percent on the first 3 assignments, and a score no lower than 70 percent on the 4th assignment, will receive an initial grade of A for the course;
- scored a total lower than 90 percent but no lower than 80 percent on the first 3 assignments, and a score no lower than 70 percent on the 4th assignment, will receive an initial grade of B for the course;
- received a total lower than 80 percent but no lower than 70 percent on the first 3 assignments, and a score no lower than 70 percent on the 4th assignment, will receive an initial grade of C for the course;
- scored a total lower than 70 percent but no lower than 60 percent on the first 3 assignments, and a score no lower than 70 percent on the 4th assignment, will receive an initial grade of D for the course;
- scored a total lower than 60 percent on the first 3 assignments, or a score lower than 70 percent on the 4th assignment, will receive an initial grade of F for the course.

Once the instructor has determined the initial course grade for the student, the final course grade is determined as follows:
- The student's final course grade will be the same as his or her initial course grade if no more than 3 class periods during the semester were missed.
- Otherwise, the student's final course grade will be one letter grade lower than his or her initial course grade (for example, an A will become a B).