Expert Systems

What are Expert Systems?

"An expert system is a computer system that operates by applying an inference mechanism to a body of specialist expertise represented in the form of knowledge, and that manipulates this knowledge to perform efficient and effective problem solving in a narrow problem domain."
Emphasis is on Knowledge, not Methods

1. Most difficult and interesting problems do not have tractable algorithmic solutions
2. Human experts achieve outstanding performance because they are knowledgeable
3. Knowledge is a scarce (and therefore valuable) resource

It is better to call these systems:
Knowledge-Based Systems

• Knowledge consists of descriptions, relationships, and procedures in some domain
• Knowledge takes many forms and is often hard to categorize

Fundamental Concepts
Changing Focus in AI

[Chart: program power, LOW to HIGH, vs. time frame 1960-1980]

Find general methods for problem solving and use them to create general-purpose programs
Changing Focus in AI

[Chart: program power, LOW to HIGH, vs. time frame 1960-1980]

Find general methods to improve representation and search and use them to create specialized programs
Changing Focus in AI

[Chart: program power, LOW to HIGH, vs. time frame 1960-1980]

Use extensive, high-quality, specific knowledge about some narrow problem area to create very specialized programs
Heuristics vs. Algorithms

It is great if we have algorithms, but often a heuristic will work almost as well at much less cost.

Example: preventing skyjacking (Algorithm vs. Heuristic)
Why should we not use human expertise?

Human Expertise          Artificial Expertise
Perishable               Permanent
Difficult to transfer    Easy to transfer
Difficult to document    Easy to document
Unpredictable            Consistent
Expensive                Affordable
Why should we keep using humans?

Human Expertise          Artificial Expertise
Creative                 Uninspired
Adaptive                 Needs to be told
Sensory experience       Symbolic input
Broad focus              Narrow focus
Commonsense knowledge    Technical knowledge

Commonsense knowledge:
• knows certain things are true while others are not
• knows limits of knowledge
Knowledge Engineering

The process of building an expert system

[Diagram: Domain Expert → Knowledge Engineer → Expert System]
Views of an Expert System: End-user

[Diagram: User ↔ User Interface ↔ Intelligent Program ↔ Data Base]
Views of an Expert System: Knowledge Engineer

[Diagram: Intelligent Program = Knowledge Base + Inference Engine]

Knowledge Base: rules, semantic networks, frames, and facts
Inference Engine: general problem-solving knowledge
Forms of Inference

• Process of drawing conclusions based on facts known or thought to be true
• We commonly use three different types:
  - Deduction
  - Abduction
  - Induction
Deduction

Reasoning from a known principle to an unknown, from the general to the specific, or from a premise to a logical conclusion

Modus Ponens
Modus Tollens

> rules, theorems, models
Deduction - Example

Suppose that we know:
  ∀X: swimming(X) -> wet(X)
If we are now told:
  swimming(andy)
Then we can derive (using Modus Ponens):
  wet(andy)
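The derivation above can be sketched as forward chaining in code. This is a minimal illustration, not from the slides: rules are simplified to (premise predicate, conclusion predicate) pairs, and Modus Ponens is applied until no new facts appear.

```python
# Rules map a premise predicate to a conclusion predicate,
# e.g. swimming(X) -> wet(X).
rules = [("swimming", "wet")]

def forward_chain(facts, rules):
    """Repeatedly apply Modus Ponens until no new facts are derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for pred, arg in list(derived):
                if pred == premise and (conclusion, arg) not in derived:
                    derived.add((conclusion, arg))
                    changed = True
    return derived

facts = {("swimming", "andy")}
derived = forward_chain(facts, rules)   # ("wet", "andy") is now derived
```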
Abduction

Used when generating explanations
Is an unsound form of reasoning

If we know:
  ∀X: swimming(X) -> wet(X)
  wet(alex)
Can we state:
  swimming(alex) ?
Induction

Reasoning from particular facts or individual cases to a general conclusion
This is the basis of scientific discovery
Key technique in machine learning and knowledge acquisition

IF ... THEN ...
> generalization, observation
Example - Family Relationships

Basic domain facts:

child_of(alex, nicole).
child_of(alina, nicole).
child_of(nicholas, leah).
child_of(phillip, leah).
child_of(melanie, cathy).
child_of(leslie, cathy).
child_of(sarah, cathy).
child_of(angela, cathy).

male(alex).
male(phillip).
male(nicholas).

female(alina).
female(leah).
female(nicole).
female(angela).
female(sarah).
female(leslie).
female(melanie).
female(cathy).
Example - Family Relationships (cont.)

Rules:

sisters(nicole, leah).
sisters(X, Z) :- child_of(X, Y),
                 child_of(Z, Y),
                 female(X),
                 female(Z).
brothers(X, Z) :- child_of(X, Y),
                  child_of(Z, Y),
                  male(X),
                  male(Z).
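The same facts and rules can be sketched in Python. Note one subtlety: the Prolog rules as written also match X = Z (everyone with a shared parent is her own "sister"); the sketch below adds an explicit x != z guard, which in Prolog would be the extra goal X \= Z.

```python
# Facts from the slide, encoded as plain data structures.
child_of = {"alex": "nicole", "alina": "nicole",
            "nicholas": "leah", "phillip": "leah",
            "melanie": "cathy", "leslie": "cathy",
            "sarah": "cathy", "angela": "cathy"}
male = {"alex", "phillip", "nicholas"}
female = {"alina", "leah", "nicole", "angela", "sarah",
          "leslie", "melanie", "cathy"}

def sisters(x, z):
    # Shared known parent, both female, and distinct individuals.
    return (x != z and x in female and z in female
            and child_of.get(x) is not None
            and child_of.get(x) == child_of.get(z))

def brothers(x, z):
    return (x != z and x in male and z in male
            and child_of.get(x) is not None
            and child_of.get(x) == child_of.get(z))
```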
Machine Learning (Induction from Examples)

What is learning?

• "changes in a system that enable a system to do the same task more efficiently the next time" -- Herbert Simon
• "constructing or modifying representations of what is being experienced" -- Ryszard Michalski
• "making useful changes in our minds" -- Marvin Minsky

What is learning?

• The Shorter Oxford Dictionary defines learning as: "... to get knowledge of (a subject) or skill (in art, etc.) by study, experience or teaching. Also to commit to memory ..."
• so learning involves:
  - acquiring NEW knowledge
  - improving the use of EXISTING knowledge (i.e., performance)
Why learn?

• understand and improve human learning
  - learn to teach
• discover new things
  - data mining
• fill in skeletal information about a domain
  - incorporate new information in real time
  - make systems less "finicky" or "brittle" by making them better able to generalize

Why learn?

• learning is considered to be a KEY element of AI
• any autonomous system MUST be able to learn and adapt
• sometimes it is easier to `teach' or `explain' than to `program'
  - e.g., consider the difference in explaining tic-tac-toe and writing a program to play the game
  - e.g., consider the difference in using a few example pictures to explain the difference between a lion and a tiger, and getting a computer to do likewise
• any system that makes the same mistake twice is pretty STUPID
  - all systems (e.g., o/s, database) should have some integral learning component
State of the Art

• modest achievements
• mostly isolated solutions to date
• but can:
  - assist automatic knowledge acquisition
  - extract relevant knowledge from very large knowledge bases
  - abstract higher-level concepts out of data sets
  - ... etc.
• recent trend toward integrated systems that combine various learning methods
  - induction, deduction, analogy, abduction
  - symbolic ML, neural networks, genetic algorithms, ...
Components of a learning system
Evaluating Performance

• several possible criteria:
  - predictive accuracy of classifier
  - speed of learner
  - speed of classifier
  - space requirements
• most common criterion is predictive accuracy
Symbolic vs. Numeric

• ML has traditionally concerned itself with symbolic representations
  - e.g., [color = orange] rather than [wavelength = 600nm]
• concepts are inherently symbolic
• required for human understanding and recognition
  - we think in linguistic terms (i.e., symbols) and not in numbers
  - e.g., bird := has-wings & flies & has-beak & lays-eggs & ...
• the relationship between symbolic and numerical representations is still an open debate
Learning as Search

• to learn a concept description, we need to search through a `hypothesis' space: the space of possible concept descriptions
• need a language to describe the concepts
  - the choice of language defines a large (possibly infinite) set of potential concept descriptions (i.e., rules)
• the task of the learning algorithm is to search this space in an efficient manner
• the difficulty is how to ignore the vast majority of invalid descriptions without missing the useful one(s)
• usually requires heuristic methods to prune the search
Summary

• Decision trees are widely used
  - easy-to-understand rationale
  - can out-perform humans
  - fast, simple to implement
  - handle noisy data well
• Weaknesses
  - univariate (uses only one variable at a time)
  - batch (non-incremental)
Induction systems

The power behind an intelligent system is knowledge. We can trace a system's success or failure to the quality of its knowledge.

Difficult tasks:
1. Extracting the knowledge.
2. Encoding the knowledge.
3. Expressing the knowledge formally.
Induction

Inducing general rules from the knowledge contained in a finite set of examples.

Induction is the process of reasoning from a given set of facts to conclude general principles or rules.

Induction looks for patterns in available information to infer reasonable conclusions.
Induction as search
Induction can be viewed as a search through a problem space for a solution to a problem. The problem space is composed of the problem’s major concepts linked together by an inductive process that uses examples of the problem.
The choice of representation for the desired function is probably the most important issue. As well as affecting the nature of the algorithm, it can affect whether the problem is feasible at all. Is the desired function representable in the representation language?
An example is described by the values of the attributes and the value of the goal predicate. We call the value of the goal predicate the classification of the example. The complete set of examples is called the training set.
Induction
Induction - first example
Determine an appropriate gift on the basis of available money and the person’s age. Money and age will represent our decision factors (problem attributes).
Money Age Gift
Much Adult Car
Much Child Computer
Little Adult Toaster
Little Child Calculator
[Diagram: candidate decision trees for the gift problem, splitting on the Money and Age attributes]
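The gift table above can be captured directly as a hand-built decision tree. This is an illustrative sketch (splitting first on Money, then on Age; the nesting order is an assumption, since either attribute could be the root):

```python
# Decision tree for the gift problem: outer key = Money, inner key = Age.
tree = {"Much":   {"Adult": "Car",     "Child": "Computer"},
        "Little": {"Adult": "Toaster", "Child": "Calculator"}}

def choose_gift(money, age):
    # Follow the branches selected by the attribute values.
    return tree[money][age]
```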
Induction - decision trees
A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no “decision.” Decision trees therefore represent Boolean functions.
Each internal node in the tree corresponds to a test of the value of one of the properties, and the branches from the node are labeled with the possible values of the test. Each leaf node in the tree specifies the Boolean value to be returned if that leaf is reached.
Decision trees are implicitly limited to talking about a single object. That is, the decision tree language is essentially propositional, with each attribute test being a proposition. We cannot use decision trees to represent tests that refer to two or more different objects.
Decision trees are fully expressive within the class of propositional languages, that is, any Boolean function can be written as a decision tree. Have each row in the truth table for the function correspond to a path in the tree. The truth table is exponentially large in the number of attributes.
Induction - decision trees
Supervised Concept Learning

• given a training set of positive and negative examples of a concept:
  - construct a description that will accurately classify future examples
  - learn some good estimate of a function f given a training set {(x1,y1), (x2,y2), ..., (xn,yn)} where each yi is either + (positive) or - (negative)
• inductive learning generalizes from specific facts
  - cannot be proven true, but can be proven false
• falsity preserving
  - is like searching a hypothesis space H of possible f functions
  - bias allows us to pick which h is preferable
  - need to define a metric for comparing f functions to find the best
Inductive learning framework

• raw input is a feature vector, x, that describes the relevant attributes of an example
• each x is a list of n (attribute, value) pairs
  - x = (person=Sue, major=CS, age=Young, Gender=F)
• attributes have discrete values
  - all examples have all attributes
• each example is a point in n-dimensional feature space
• maintain a library of previous cases
• when a new problem arises:
  - find the most similar case(s) in the library
  - adapt the similar cases to solving the current problem
Learning Decision Trees

• Goal: build a decision tree for classifying examples as positive or negative instances of a concept
• Supervised
  - batch processing of training examples
  - using a preference bias
Induction - decision trees - second example
•If there are some positive and some negative examples, then choose the best attribute to split them.
•If all the remaining examples are positive (or all negative), then we are done: we can answer Yes or No.
•If there are no examples left, it means that no such example has been observed, and we return a default value calculated from the majority classification at the node’s parent.
•If there are no attributes left, but both positive and negative examples, we have a problem. It means that these examples have exactly the same description, but different classifications. This happens when some of the data are incorrect; we say there is noise in the data. It also happens when the attributes do not give enough information to fully describe the situation, or when the domain is truly nondeterministic.
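The four cases above define a recursive tree-building procedure. The sketch below is an assumption-laden illustration, not the slides' algorithm verbatim: it handles general class labels (not just Yes/No), and for simplicity it splits on the first remaining attribute rather than the information-gain best.

```python
from collections import Counter

def majority(examples):
    # Majority classification, used for defaults and noisy leaves.
    return Counter(e["class"] for e in examples).most_common(1)[0][0]

def build_tree(examples, attributes, default):
    # Case: no examples left -> default from the parent's majority class.
    if not examples:
        return default
    classes = {e["class"] for e in examples}
    # Case: all remaining examples share one classification -> leaf.
    if len(classes) == 1:
        return classes.pop()
    # Case: no attributes left but mixed classes -> noise; return majority.
    if not attributes:
        return majority(examples)
    # Case: mixed classes -> split on an attribute (here simply the first;
    # a real learner would pick the attribute with the best split).
    attr, rest = attributes[0], attributes[1:]
    tree = {attr: {}}
    for value in {e[attr] for e in examples}:
        subset = [e for e in examples if e[attr] == value]
        tree[attr][value] = build_tree(subset, rest, majority(examples))
    return tree

def classify(tree, example):
    # Descend until a leaf (non-dict) is reached.
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][example[attr]]
    return tree
```

For instance, training on the gift table from the earlier example yields a tree that classifies all four rows correctly.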
Induction - decision trees - second example
Induction - decision trees - choice of attributes
Information theory
A mathematical model for choosing the best attribute and methods for dealing with noise in the data.
The scheme used in decision tree learning for selecting attributes is designed to minimize the depth of the final tree. The idea is to pick the attribute that goes as far as possible toward providing an exact classification of the examples. A perfect attribute divides the examples into sets that are all positive or all negative.
The measure should have its maximum value when the attribute is perfect and its minimum value when the attribute is of no use at all.
Induction - third example
Example Height Eyes Hair Class
E1 tall blue dark 1
E2 short blue dark 1
E3 tall blue blond 2
E4 tall blue red 2
E5 tall brown blond 1
E6 short blue blond 2
E7 short brown blond 1
E8 tall brown dark 1
Induction - example

Example  Height  Hair   Eyes   Class
E1       tall    dark   blue   1
E2       short   dark   blue   1
E3       tall    blond  blue   2
E4       tall    red    blue   2
E5       tall    blond  brown  1
E6       short   blond  blue   2
E7       short   blond  brown  1
E8       tall    dark   brown  1

Information in the 8 examples:

Entropy(I) = -Σ_{i=1..c} (N_i / N) log2(N_i / N)

N_i - number of examples in class i
N - total number of (training) examples

Entropy(I) = -5/8 log2(5/8) - 3/8 log2(3/8) = 0.954 bit
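The entropy formula is a few lines of code. A minimal sketch (the function name is mine), which reproduces the 0.954 bit figure for the 5/3 class split above:

```python
from math import log2

def entropy(counts):
    # Entropy(I) = -sum_i (N_i/N) log2(N_i/N); 0*log2(0) is treated as 0.
    total = sum(counts)
    return -sum(n / total * log2(n / total) for n in counts if n > 0)

# 5 examples in class 1, 3 in class 2:
round(entropy([5, 3]), 3)   # -> 0.954
```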
Attribute test:

Select an attribute and calculate the conditional entropy for it.

Entropy(I, A_k) = Σ_{j=1..J} (n_kj / N) entropy(I, A_kj)
               = -Σ_{j=1..J} (n_kj / N) Σ_{i=1..c} (n_kj(i) / n_kj) log2(n_kj(i) / n_kj)

attribute A_k, k = 1, 2, ..., K

The examples are divided into J subsets, where J is the number of values the feature may take.

n_kj(i) - examples from subset j belonging to class i
n_kj - total number of examples in subset j
Induction - example

Example  Height  Hair   Eyes   Class
E1       tall    dark   blue   1
E2       short   dark   blue   1
E3       tall    blond  blue   2
E4       tall    red    blue   2
E5       tall    blond  brown  1
E6       short   blond  blue   2
E7       short   blond  brown  1
E8       tall    dark   brown  1

entropy(I, hair) = 3/8 (-3/3 log2(3/3) - 0/3 log2(0/3))
                 + 1/8 (-0/1 log2(0/1) - 1/1 log2(1/1))
                 + 4/8 (-2/4 log2(2/4) - 2/4 log2(2/4)) = 0.5 bit
Induction - example

Information gain: IG(A_k) = entropy(I) - entropy(I, A_k); select the attribute with the maximum gain.

IG(hair)   = 0.954 - 0.5   = 0.454 bit
IG(height) = 0.954 - 0.951 = 0.003 bit
IG(eyes)   = 0.954 - 0.607 = 0.347 bit

hair
├─ dark  → E1 - class 1, E2 - class 1, E8 - class 1
├─ red   → E4 - class 2
└─ blond → E3 - class 2, E5 - class 1, E6 - class 2, E7 - class 1
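The three information-gain figures can be checked in code. A sketch (function names are mine) that encodes the eight training examples and computes entropy(I), the conditional entropies, and the gains:

```python
from math import log2

def entropy(labels):
    # Entropy of a list of class labels.
    total = len(labels)
    probs = [labels.count(c) / total for c in set(labels)]
    return -sum(p * log2(p) for p in probs if p > 0)

# The 8 training examples as (height, hair, eyes, class) tuples.
data = [("tall", "dark", "blue", 1), ("short", "dark", "blue", 1),
        ("tall", "blond", "blue", 2), ("tall", "red", "blue", 2),
        ("tall", "blond", "brown", 1), ("short", "blond", "blue", 2),
        ("short", "blond", "brown", 1), ("tall", "dark", "brown", 1)]

def conditional_entropy(data, attr_index):
    # Entropy(I, A_k): subset entropies weighted by subset size.
    total = len(data)
    h = 0.0
    for value in {row[attr_index] for row in data}:
        subset = [row[-1] for row in data if row[attr_index] == value]
        h += len(subset) / total * entropy(subset)
    return h

def information_gain(data, attr_index):
    return entropy([row[-1] for row in data]) - conditional_entropy(data, attr_index)
```

Running this gives IG(hair) ≈ 0.454, IG(height) ≈ 0.003, and IG(eyes) ≈ 0.347 bit, matching the slide, so hair is chosen for the root.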
Induction - example

Remaining (blond) examples:

Example  Height  Hair   Eyes   Class
E3       tall    blond  blue   2
E5       tall    blond  brown  1
E6       short   blond  blue   2
E7       short   blond  brown  1

entropy(I, height) = 2/4 (-1/2 log2(1/2) - 1/2 log2(1/2))
                   + 2/4 (-1/2 log2(1/2) - 1/2 log2(1/2)) = 1 bit

entropy(I, eyes)   = 2/4 (-2/2 log2(2/2) - 0/2 log2(0/2))
                   + 2/4 (-0/2 log2(0/2) - 2/2 log2(2/2)) = 0 bit

so eyes is selected for the next split.
Induction - example

Final decision tree:

hair
├─ dark  → E1, E2, E8 (class 1)
├─ red   → E4 (class 2)
└─ blond → eyes
            ├─ blue  → E3, E6 (class 2)
            └─ brown → E5, E7 (class 1)
Induction systems
Determine objective - a search through a decision tree will reach one of a finite set of decisions on the basis of the path taken through the tree.
Determine decision factors - represent the attribute nodes of the decision tree.
Determine decision factor values - represent the attribute values of the decision tree.
Determine solutions - list of final decisions that the system can make - the leaf nodes in the tree.
Form example set.
Create decision tree.
Test the system.
Revise the system.
Induction systems - example
Football game prediction system
Predict the outcome of a football game (will our team win or lose).
Decision factors - location, weather, team record, opponent record.
Decision factor values -
Location  Weather   Own Record  Opponent Record
Home      Rain      Poor        Poor
Away      Cold      Average     Average
          Moderate  Good        Good
          Hot
Solutions - win or lose
Induction systems - example (cont’d)
Examples -

Week  Location  Weather  Own rec.  Opp. rec.  Outcome
1 Home Hot Good Good Win
2 Home Rain Good Averg Win
3 Away Moder. Good Averg Loss
4 Away Hot Good Poor Win
5 Home Cold Good Good Loss
6 Away Hot Averg. Averg. Loss
7 Home Moder. Averg. Good Loss
8 Away Cold Poor Averg. Win
Induction systems - example (cont’d)
Decision tree -

Weather
├─ rain     → Win
├─ moderate → Loss
├─ cold     → Location
│             ├─ home → Loss
│             └─ away → Win
└─ hot      → Own rec
              ├─ poor    → No data
              ├─ average → Loss
              └─ good    → Win
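The induced tree above can be encoded as a nested dict and queried for upcoming games. A sketch (attribute and value spellings follow the tree; the function name is mine):

```python
# Football prediction tree: each dict maps an attribute to its branches;
# leaves are the predicted outcomes.
tree = {"Weather": {
    "rain": "Win",
    "moderate": "Loss",
    "cold": {"Location": {"home": "Loss", "away": "Win"}},
    "hot": {"Own rec": {"poor": "No data", "average": "Loss", "good": "Win"}},
}}

def predict(tree, game):
    # Descend the tree by looking up the game's attribute values.
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][game[attribute]]
    return tree
```

For example, an away game in cold weather is predicted as a Win (as in week 8 of the training data), while a hot-weather game against a team with an average own record is predicted as a Loss.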
Test the system - predict the future games. Get the values for the decision factors for the upcoming game and see on which team to bet.
Induction systems - example (test)
Sensitivity study - Location
Induction systems - pros and cons

Advantages:
• Discovers rules from examples - potentially unknown rules could be induced.
• Avoids knowledge elicitation problems - system knowledge can be acquired through past examples.
• Can produce new knowledge.
• Can uncover critical decision factors.
• Can eliminate irrelevant decision factors.
• Can uncover contradictions.

Disadvantages:
• Difficult to choose good decision factors.
• Difficult to understand rules.
• Applicable only to classification problems.
Induction systems - implemented
AQ11 - diagnosing soybean diseases. Identifies 15 different diseases. The knowledge was derived from 630 examples and used 35 decision rules.
Willard - forecasting thunderstorms. 140 examples, hierarchy of 30 modules, each with a decision tree.
Rulemaster - detecting signs of transformer faults.
Stock market predictions.