Expert Systems

What are Expert Systems?

"An expert system is a computer system that operates by applying an inference mechanism to a body of specialist expertise represented in the form of knowledge, and that manipulates this knowledge to perform efficient and effective problem solving in a narrow problem domain."
Emphasis is on Knowledge, not Methods

1. Most difficult and interesting problems do not have tractable algorithmic solutions
2. Human experts achieve outstanding performance because they are knowledgeable
3. Knowledge is a scarce (and therefore valuable) resource

It is better to call these systems:
Knowledge-Based Systems

• Knowledge consists of descriptions, relationships, and procedures in some domain
• Knowledge takes many forms and is often hard to categorize

Fundamental Concepts
Changing Focus in AI

[Chart: program power, LOW to HIGH, vs. time frame 1960-1980]

Find general methods for problem solving and use them to create general-purpose programs
Changing Focus in AI

[Chart: program power, LOW to HIGH, vs. time frame 1960-1980]

Find general methods to improve representation and search and use them to create specialized programs
Changing Focus in AI

[Chart: program power, LOW to HIGH, vs. time frame 1960-1980]

Use extensive, high-quality, specific knowledge about some narrow problem area to create very specialized programs
Heuristics vs. Algorithms

It is great if we have algorithms, but often a heuristic will work almost as well at much less cost.

Example: preventing skyjacking (Algorithm vs. Heuristic)
Why should we not use human expertise?

Human Expertise          Artificial Expertise
Perishable               Permanent
Difficult to transfer    Easy to transfer
Difficult to document    Easy to document
Unpredictable            Consistent
Expensive                Affordable
Why should we keep using humans?

Human Expertise          Artificial Expertise
Creative                 Uninspired
Adaptive                 Needs to be told
Sensory experience       Symbolic input
Broad focus              Narrow focus
Commonsense knowledge    Technical knowledge

Commonsense knowledge:
• knows certain things are true while others are not
• knows limits of knowledge
Knowledge Engineering

The process of building an expert system

[Diagram: Domain Expert → Knowledge Engineer → Expert System]
Views of an Expert System: End-user

[Diagram: User ↔ User Interface ↔ Intelligent Program ↔ Data Base]
Views of an Expert System: Knowledge Engineer

[Diagram: Intelligent Program = Knowledge Base + Inference Engine]

Knowledge Base: rules, semantic networks, frames, and facts
Inference Engine: general problem-solving knowledge
Forms of Inference

• Process of drawing conclusions based on facts known or thought to be true
• We commonly use three different types:
  - Deduction
  - Abduction
  - Induction
Deduction

Reasoning from a known principle to an unknown, from the general to the specific, or from a premise to a logical conclusion

Modus Ponens
Modus Tollens

> rules, theorems, models
Deduction - Example

Suppose that we know:
  ∀X: swimming(X) -> wet(X)
If we are now told:
  swimming(andy)
Then we can derive (using Modus Ponens):
  wet(andy)
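The derivation above can be sketched as forward chaining in code. This is a minimal illustration, not from the slides: rules are simplified to (premise predicate, conclusion predicate) pairs, and Modus Ponens is applied until no new facts appear.

```python
# Rules map a premise predicate to a conclusion predicate,
# e.g. swimming(X) -> wet(X).
rules = [("swimming", "wet")]

def forward_chain(facts, rules):
    """Repeatedly apply Modus Ponens until no new facts are derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for pred, arg in list(derived):
                if pred == premise and (conclusion, arg) not in derived:
                    derived.add((conclusion, arg))
                    changed = True
    return derived

facts = {("swimming", "andy")}
derived = forward_chain(facts, rules)   # ("wet", "andy") is now derived
```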
Abduction

Used when generating explanations
Is an unsound form of reasoning

If we know:
  ∀X: swimming(X) -> wet(X)
  wet(alex)
Can we state:
  swimming(alex) ?
Induction

Reasoning from particular facts or individual cases to a general conclusion
This is the basis of scientific discovery
Key technique in machine learning and knowledge acquisition

IF ... THEN ...
> generalization, observation
Example - Family Relationships

Basic domain facts:

child_of(alex, nicole).
child_of(alina, nicole).
child_of(nicholas, leah).
child_of(phillip, leah).
child_of(melanie, cathy).
child_of(leslie, cathy).
child_of(sarah, cathy).
child_of(angela, cathy).

male(alex).
male(phillip).
male(nicholas).

female(alina).
female(leah).
female(nicole).
female(angela).
female(sarah).
female(leslie).
female(melanie).
female(cathy).
Example - Family Relationships (cont.)

Rules:

sisters(nicole, leah).
sisters(X, Z) :- child_of(X, Y),
                 child_of(Z, Y),
                 female(X),
                 female(Z).
brothers(X, Z) :- child_of(X, Y),
                  child_of(Z, Y),
                  male(X),
                  male(Z).
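The same facts and rules can be sketched in Python. Note one subtlety: the Prolog rules as written also match X = Z (everyone with a shared parent is her own "sister"); the sketch below adds an explicit x != z guard, which in Prolog would be the extra goal X \= Z.

```python
# Facts from the slide, encoded as plain data structures.
child_of = {"alex": "nicole", "alina": "nicole",
            "nicholas": "leah", "phillip": "leah",
            "melanie": "cathy", "leslie": "cathy",
            "sarah": "cathy", "angela": "cathy"}
male = {"alex", "phillip", "nicholas"}
female = {"alina", "leah", "nicole", "angela", "sarah",
          "leslie", "melanie", "cathy"}

def sisters(x, z):
    # Shared known parent, both female, and distinct individuals.
    return (x != z and x in female and z in female
            and child_of.get(x) is not None
            and child_of.get(x) == child_of.get(z))

def brothers(x, z):
    return (x != z and x in male and z in male
            and child_of.get(x) is not None
            and child_of.get(x) == child_of.get(z))
```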
Machine Learning (Induction from Examples)

What is learning?

• "changes in a system that enable a system to do the same task more efficiently the next time" -- Herbert Simon
• "constructing or modifying representations of what is being experienced" -- Ryszard Michalski
• "making useful changes in our minds" -- Marvin Minsky

What is learning?

• The Shorter Oxford Dictionary defines learning as: "... to get knowledge of (a subject) or skill (in art, etc.) by study, experience or teaching. Also to commit to memory ..."
• so learning involves:
  - acquiring NEW knowledge
  - improving the use of EXISTING knowledge (i.e., performance)
Why learn?

• understand and improve human learning
  - learn to teach
• discover new things
  - data mining
• fill in skeletal information about a domain
  - incorporate new information in real time
  - make systems less "finicky" or "brittle" by making them better able to generalize

Why learn?

• learning is considered to be a KEY element of AI
• any autonomous system MUST be able to learn and adapt
• sometimes it is easier to `teach' or `explain' than to `program'
  - e.g., consider the difference in explaining tic-tac-toe and writing a program to play the game
  - e.g., consider the difference in using a few example pictures to explain the difference between a lion and a tiger, and getting a computer to do likewise
• any system that makes the same mistake twice is pretty STUPID
  - all systems (e.g., o/s, database) should have some integral learning component
State of the Art

• modest achievements
• mostly isolated solutions to date
• but can:
  - assist automatic knowledge acquisition
  - extract relevant knowledge from very large knowledge bases
  - abstract higher-level concepts out of data sets
  - ... etc.
• recent trend toward integrated systems that combine various learning methods
  - induction, deduction, analogy, abduction
  - symbolic ML, neural networks, genetic algorithms, ...
Components of a learning system
Evaluating Performance

• several possible criteria:
  - predictive accuracy of classifier
  - speed of learner
  - speed of classifier
  - space requirements
• most common criterion is predictive accuracy
Symbolic vs. Numeric

• ML has traditionally concerned itself with symbolic representations
  - e.g., [color = orange] rather than [wavelength = 600nm]
• concepts are inherently symbolic
• required for human understanding and recognition
  - we think in linguistic terms (i.e., symbols) and not in numbers
  - e.g., bird := has-wings & flies & has-beak & lays-eggs & ...
• the relationship between symbolic and numerical representations is still an open debate
Learning as Search

• to learn a concept description, we need to search through a `hypothesis' space: the space of possible concept descriptions
• need a language to describe the concepts
  - the choice of language defines a large (possibly infinite) set of potential concept descriptions (i.e., rules)
• the task of the learning algorithm is to search this space in an efficient manner
• the difficulty is how to ignore the vast majority of invalid descriptions without missing the useful one(s)
• usually requires heuristic methods to prune the search
Summary

• Decision trees are widely used
  - easy-to-understand rationale
  - can out-perform humans
  - fast, simple to implement
  - handle noisy data well
• Weaknesses
  - univariate (uses only one variable at a time)
  - batch (non-incremental)
Induction systems

The power behind an intelligent system is knowledge. We can trace a system's success or failure to the quality of its knowledge.

Difficult tasks:
1. Extracting the knowledge.
2. Encoding the knowledge.
3. Expressing the knowledge formally.
Induction

Inducing general rules from the knowledge contained in a finite set of examples.

Induction is the process of reasoning from a given set of facts to conclude general principles or rules.

Induction looks for patterns in available information to infer reasonable conclusions.
Induction as search
Induction can be viewed as a search through a problem space for a solution to a problem. The problem space is composed of the problem’s major concepts linked together by an inductive process that uses examples of the problem.
The choice of representation for the desired function is probably the most important issue. As well as affecting the nature of the algorithm, it can affect whether the problem is feasible at all. Is the desired function representable in the representation language?
An example is described by the values of the attributes and the value of the goal predicate. We call the value of the goal predicate the classification of the example. The complete set of examples is called the training set.
Induction
Induction - first example
Determine an appropriate gift on the basis of available money and the person’s age. Money and age will represent our decision factors (problem attributes).
Money Age Gift
Much Adult Car
Much Child Computer
Little Adult Toaster
Little Child Calculator
[Diagram: candidate decision trees for the gift problem, splitting on the Money and Age attributes]
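The gift table above can be captured directly as a hand-built decision tree. This is an illustrative sketch (splitting first on Money, then on Age; the nesting order is an assumption, since either attribute could be the root):

```python
# Decision tree for the gift problem: outer key = Money, inner key = Age.
tree = {"Much":   {"Adult": "Car",     "Child": "Computer"},
        "Little": {"Adult": "Toaster", "Child": "Calculator"}}

def choose_gift(money, age):
    # Follow the branches selected by the attribute values.
    return tree[money][age]
```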
Induction - decision trees
A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no “decision.” Decision trees therefore represent Boolean functions.
Each internal node in the tree corresponds to a test of the value of one of the properties, and the branches from the node are labeled with the possible values of the test. Each leaf node in the tree specifies the Boolean value to be returned if that leaf is reached.
Decision trees are implicitly limited to talking about a single object. That is, the decision tree language is essentially propositional, with each attribute test being a proposition. We cannot use decision trees to represent tests that refer to two or more different objects.
Decision trees are fully expressive within the class of propositional languages, that is, any Boolean function can be written as a decision tree. Have each row in the truth table for the function correspond to a path in the tree. The truth table is exponentially large in the number of attributes.
Induction - decision trees
Supervised Concept Learning

• given a training set of positive and negative examples of a concept:
  - construct a description that will accurately classify future examples
  - learn some good estimate of a function f given a training set {(x1,y1), (x2,y2), ..., (xn,yn)} where each yi is either + (positive) or - (negative)
• inductive learning generalizes from specific facts
  - cannot be proven true, but can be proven false
• falsity preserving
  - is like searching a hypothesis space H of possible f functions
  - bias allows us to pick which h is preferable
  - need to define a metric for comparing f functions to find the best
Inductive learning framework

• raw input is a feature vector, x, that describes the relevant attributes of an example
• each x is a list of n (attribute, value) pairs
  - x = (person=Sue, major=CS, age=Young, Gender=F)
• attributes have discrete values
  - all examples have all attributes
• each example is a point in n-dimensional feature space
• maintain a library of previous cases
• when a new problem arises:
  - find the most similar case(s) in the library
  - adapt the similar cases to solving the current problem
Learning Decision Trees

• Goal: build a decision tree for classifying examples as positive or negative instances of a concept
• Supervised
  - batch processing of training examples
  - using a preference bias
Induction - decision trees - second example
•If there are some positive and some negative examples, then choose the best attribute to split them.
•If all the remaining examples are positive (or all negative), then we are done: we can answer Yes or No.
•If there are no examples left, it means that no such example has been observed, and we return a default value calculated from the majority classification at the node’s parent.
•If there are no attributes left, but both positive and negative examples, we have a problem. It means that these examples have exactly the same description, but different classifications. This happens when some of the data are incorrect; we say there is noise in the data. It also happens when the attributes do not give enough information to fully describe the situation, or when the domain is truly nondeterministic.
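The four cases above define a recursive tree-building procedure. The sketch below is an assumption-laden illustration, not the slides' algorithm verbatim: it handles general class labels (not just Yes/No), and for simplicity it splits on the first remaining attribute rather than the information-gain best.

```python
from collections import Counter

def majority(examples):
    # Majority classification, used for defaults and noisy leaves.
    return Counter(e["class"] for e in examples).most_common(1)[0][0]

def build_tree(examples, attributes, default):
    # Case: no examples left -> default from the parent's majority class.
    if not examples:
        return default
    classes = {e["class"] for e in examples}
    # Case: all remaining examples share one classification -> leaf.
    if len(classes) == 1:
        return classes.pop()
    # Case: no attributes left but mixed classes -> noise; return majority.
    if not attributes:
        return majority(examples)
    # Case: mixed classes -> split on an attribute (here simply the first;
    # a real learner would pick the attribute with the best split).
    attr, rest = attributes[0], attributes[1:]
    tree = {attr: {}}
    for value in {e[attr] for e in examples}:
        subset = [e for e in examples if e[attr] == value]
        tree[attr][value] = build_tree(subset, rest, majority(examples))
    return tree

def classify(tree, example):
    # Descend until a leaf (non-dict) is reached.
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][example[attr]]
    return tree
```

For instance, training on the gift table from the earlier example yields a tree that classifies all four rows correctly.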
Induction - decision trees - second example
Induction - decision trees - choice of attributes
Information theory
A mathematical model for choosing the best attribute and methods for dealing with noise in the data.
The scheme used in decision tree learning for selecting attributes is designed to minimize the depth of the final tree. The idea is to pick the attribute that goes as far as possible toward providing an exact classification of the examples. A perfect attribute divides the examples into sets that are all positive or all negative.
The measure should have its maximum value when the attribute is perfect and its minimum value when the attribute is of no use at all.
Induction - third example
Example Height Eyes Hair Class
E1 tall blue dark 1
E2 short blue dark 1
E3 tall blue blond 2
E4 tall blue red 2
E5 tall brown blond 1
E6 short blue blond 2
E7 short brown blond 1
E8 tall brown dark 1
Induction - example

Example  Height  Hair   Eyes   Class
E1       tall    dark   blue   1
E2       short   dark   blue   1
E3       tall    blond  blue   2
E4       tall    red    blue   2
E5       tall    blond  brown  1
E6       short   blond  blue   2
E7       short   blond  brown  1
E8       tall    dark   brown  1

Information in the 8 examples:

Entropy(I) = -Σ_{i=1..c} (N_i / N) log2(N_i / N)

N_i - number of examples in class i
N - total number of (training) examples

Entropy(I) = -5/8 log2(5/8) - 3/8 log2(3/8) = 0.954 bit
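The entropy formula is a few lines of code. A minimal sketch (the function name is mine), which reproduces the 0.954 bit figure for the 5/3 class split above:

```python
from math import log2

def entropy(counts):
    # Entropy(I) = -sum_i (N_i/N) log2(N_i/N); 0*log2(0) is treated as 0.
    total = sum(counts)
    return -sum(n / total * log2(n / total) for n in counts if n > 0)

# 5 examples in class 1, 3 in class 2:
round(entropy([5, 3]), 3)   # -> 0.954
```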
Attribute test:

Select an attribute and calculate the conditional entropy for it.

Entropy(I, A_k) = Σ_{j=1..J} (n_kj / N) entropy(I, A_kj)
               = -Σ_{j=1..J} (n_kj / N) Σ_{i=1..c} (n_kj(i) / n_kj) log2(n_kj(i) / n_kj)

attribute A_k, k = 1, 2, ..., K

The examples are divided into J subsets, where J is the number of values the feature may take.

n_kj(i) - examples from subset j belonging to class i
n_kj - total number of examples in subset j
Induction - example

Example  Height  Hair   Eyes   Class
E1       tall    dark   blue   1
E2       short   dark   blue   1
E3       tall    blond  blue   2
E4       tall    red    blue   2
E5       tall    blond  brown  1
E6       short   blond  blue   2
E7       short   blond  brown  1
E8       tall    dark   brown  1

entropy(I, hair) = 3/8 (-3/3 log2(3/3) - 0/3 log2(0/3))
                 + 1/8 (-0/1 log2(0/1) - 1/1 log2(1/1))
                 + 4/8 (-2/4 log2(2/4) - 2/4 log2(2/4)) = 0.5 bit
Induction - example

Information gain: IG(A_k) = entropy(I) - entropy(I, A_k); select the attribute with the maximum gain.

IG(hair)   = 0.954 - 0.5   = 0.454 bit
IG(height) = 0.954 - 0.951 = 0.003 bit
IG(eyes)   = 0.954 - 0.607 = 0.347 bit

hair
├─ dark  → E1 - class 1, E2 - class 1, E8 - class 1
├─ red   → E4 - class 2
└─ blond → E3 - class 2, E5 - class 1, E6 - class 2, E7 - class 1
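The three information-gain figures can be checked in code. A sketch (function names are mine) that encodes the eight training examples and computes entropy(I), the conditional entropies, and the gains:

```python
from math import log2

def entropy(labels):
    # Entropy of a list of class labels.
    total = len(labels)
    probs = [labels.count(c) / total for c in set(labels)]
    return -sum(p * log2(p) for p in probs if p > 0)

# The 8 training examples as (height, hair, eyes, class) tuples.
data = [("tall", "dark", "blue", 1), ("short", "dark", "blue", 1),
        ("tall", "blond", "blue", 2), ("tall", "red", "blue", 2),
        ("tall", "blond", "brown", 1), ("short", "blond", "blue", 2),
        ("short", "blond", "brown", 1), ("tall", "dark", "brown", 1)]

def conditional_entropy(data, attr_index):
    # Entropy(I, A_k): subset entropies weighted by subset size.
    total = len(data)
    h = 0.0
    for value in {row[attr_index] for row in data}:
        subset = [row[-1] for row in data if row[attr_index] == value]
        h += len(subset) / total * entropy(subset)
    return h

def information_gain(data, attr_index):
    return entropy([row[-1] for row in data]) - conditional_entropy(data, attr_index)
```

Running this gives IG(hair) ≈ 0.454, IG(height) ≈ 0.003, and IG(eyes) ≈ 0.347 bit, matching the slide, so hair is chosen for the root.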
Induction - example

Remaining (blond) examples:

Example  Height  Hair   Eyes   Class
E3       tall    blond  blue   2
E5       tall    blond  brown  1
E6       short   blond  blue   2
E7       short   blond  brown  1

entropy(I, height) = 2/4 (-1/2 log2(1/2) - 1/2 log2(1/2))
                   + 2/4 (-1/2 log2(1/2) - 1/2 log2(1/2)) = 1 bit

entropy(I, eyes)   = 2/4 (-2/2 log2(2/2) - 0/2 log2(0/2))
                   + 2/4 (-0/2 log2(0/2) - 2/2 log2(2/2)) = 0 bit

so eyes is selected for the next split.
Induction - example

Final decision tree:

hair
├─ dark  → E1, E2, E8 (class 1)
├─ red   → E4 (class 2)
└─ blond → eyes
            ├─ blue  → E3, E6 (class 2)
            └─ brown → E5, E7 (class 1)
Induction systems
Determine objective - a search through a decision tree will reach one of a finite set of decisions on the basis of the path taken through the tree.
Determine decision factors - represent the attribute nodes of the decision tree.
Determine decision factor values - represent the attribute values of the decision tree.
Determine solutions - list of final decisions that the system can make - the leaf nodes in the tree.
Form example set.
Create decision tree.
Test the system.
Revise the system.
Induction systems - example
Football game prediction system
Predict the outcome of a football game (will our team win or lose).
Decision factors - location, weather, team record, opponent record.
Decision factor values -
Location  Weather   Own Record  Opponent Record
Home      Rain      Poor        Poor
Away      Cold      Average     Average
          Moderate  Good        Good
          Hot
Solutions - win or lose
Induction systems - example (cont’d)
Examples -

Week  Location  Weather  Own rec.  Opp. rec.  Outcome
1 Home Hot Good Good Win
2 Home Rain Good Averg Win
3 Away Moder. Good Averg Loss
4 Away Hot Good Poor Win
5 Home Cold Good Good Loss
6 Away Hot Averg. Averg. Loss
7 Home Moder. Averg. Good Loss
8 Away Cold Poor Averg. Win
Induction systems - example (cont’d)
Decision tree -

Weather
├─ rain     → Win
├─ moderate → Loss
├─ cold     → Location
│             ├─ home → Loss
│             └─ away → Win
└─ hot      → Own rec
              ├─ poor    → No data
              ├─ average → Loss
              └─ good    → Win
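The induced tree above can be encoded as a nested dict and queried for upcoming games. A sketch (attribute and value spellings follow the tree; the function name is mine):

```python
# Football prediction tree: each dict maps an attribute to its branches;
# leaves are the predicted outcomes.
tree = {"Weather": {
    "rain": "Win",
    "moderate": "Loss",
    "cold": {"Location": {"home": "Loss", "away": "Win"}},
    "hot": {"Own rec": {"poor": "No data", "average": "Loss", "good": "Win"}},
}}

def predict(tree, game):
    # Descend the tree by looking up the game's attribute values.
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][game[attribute]]
    return tree
```

For example, an away game in cold weather is predicted as a Win (as in week 8 of the training data), while a hot-weather game against a team with an average own record is predicted as a Loss.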
Test the system - predict the future games. Get the values for the decision factors for the upcoming game and see on which team to bet.
Induction systems - example (test)
Sensitivity study - Location
Induction systems - pros and cons

Advantages:
• Discovers rules from examples - potentially unknown rules could be induced.
• Avoids knowledge elicitation problems - system knowledge can be acquired through past examples.
• Can produce new knowledge.
• Can uncover critical decision factors.
• Can eliminate irrelevant decision factors.
• Can uncover contradictions.

Disadvantages:
• Difficult to choose good decision factors.
• Difficult to understand rules.
• Applicable only to classification problems.
Induction systems - implemented
AQ11 - diagnosing soybean diseases. Identifies 15 different diseases. The knowledge was derived from 630 examples and used 35 decision rules.
Willard - forecasting thunderstorms. 140 examples, hierarchy of 30 modules, each with a decision tree.
Rulemaster - detecting signs of transformer faults.
Stock market predictions.