Concept Learning and the General-to-Specific
Ordering
Lecture Notes on Advanced AI
Byoung-Tak Zhang
Biointelligence Laboratory
Computer Science and Engineering, Bioinformatics, Brain Science, and Cognitive Science
Seoul National University
http://bi.snu.ac.kr/
(c) 1997 SNU Dept. of Computer Engineering SCAI Lab
Concept of Concepts
Examples of concepts
- "birds", "car", "situations in which I should study more in order to pass the exam"

Concept
- Some subset of objects or events defined over a larger set, or
- A boolean-valued function defined over this larger set.
- The concept "birds" is the subset of animals that constitute birds.
Concept Learning
Learning
- Inducing general functions from specific training examples

Concept learning
- Acquiring the definition of a general category, given a sample of positive and negative training examples of the category
- Inferring a boolean-valued function from training examples of its input and output.
A Concept Learning Task
Target concept EnjoySport
- "days on which Aldo enjoys water sport"

Hypothesis
- A vector of 6 constraints, specifying the values of the six attributes Sky, AirTemp, Humidity, Wind, Water, and Forecast.
- For each attribute, the hypothesis specifies either "?" (any value is acceptable), a single required value (e.g. Warm), or "0" (no value is acceptable).
- <?, Cold, High, ?, ?, ?> expresses the hypothesis that Aldo enjoys his favorite sport only on cold days with high humidity.
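As a concrete sketch (an illustration added to these notes, not from the original slides), a hypothesis can be represented as a tuple of six constraints, with a small function implementing h(x):

```python
# A hypothesis is a tuple of six constraints, one per attribute:
#   '?'  = any value is acceptable
#   '0'  = no value is acceptable
#   else = exactly this value is required (e.g. 'Warm')
def satisfies(h, x):
    """Return True iff instance x satisfies hypothesis h, i.e. h(x) = 1."""
    return all(c == '?' or c == v for c, v in zip(h, x))

h = ('?', 'Cold', 'High', '?', '?', '?')
x = ('Sunny', 'Cold', 'High', 'Strong', 'Cool', 'Change')
print(satisfies(h, x))  # True: a cold day with high humidity
```

Note that a '0' constraint can never equal an attribute value, so any hypothesis containing '0' classifies every instance as negative.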
Training Examples for EnjoySport
Instance  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
A         Sunny  Warm     Normal    Strong  Warm   Same      Yes
B         Sunny  Warm     High      Strong  Warm   Same      Yes
C         Rainy  Cold     High      Strong  Warm   Change    No
D         Sunny  Warm     High      Strong  Cool   Change    Yes
Training examples for the target concept EnjoySport
The Learning Task
Given:
- Instances X: the set of items over which the concept is defined.
- Hypotheses H: conjunctions of constraints on attributes.
- Target concept c: c : X → {0, 1}
- Training examples (positive/negative): <x, c(x)>
- Training set D: the available training examples

Determine:
- A hypothesis h in H such that h(x) = c(x) for all x in X.
Inductive Learning Hypothesis
The learning task is to determine an h identical to c over the entire set of instances X. But the only information available about c is its value over D. Inductive learning algorithms can at best guarantee that the induced h fits c over D. The underlying assumption is that the h that best fits the observed data in D is also the best h for unseen instances.

Inductive learning hypothesis
- Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over unobserved examples.
Concept Learning as Search
Search
- Find a hypothesis that best fits the training examples
- Efficient search in the hypothesis space (finite/infinite)

Search space in EnjoySport
- 3*2*2*2*2*2 = 96 distinct instances (e.g. Sky = {Sunny, Cloudy, Rainy})
- 5*4*4*4*4*4 = 5120 syntactically distinct hypotheses within H (counting "0" and "?" in addition)
- 1 + 4*3*3*3*3*3 = 973 semantically distinct hypotheses (count just one "0" hypothesis, since every hypothesis containing one or more "0" symbols represents the empty set of instances)
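The three counts above can be checked with a few lines of arithmetic (Python used here for illustration):

```python
# 6 attributes: Sky has 3 values, the other five have 2 values each.
instances = 3 * 2**5        # distinct instances
syntactic = 5 * 4**5        # each attribute also allows '?' and '0'
semantic  = 1 + 4 * 3**5    # one empty hypothesis + all '0'-free hypotheses
print(instances, syntactic, semantic)  # 96 5120 973
```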
General-to-Specific Ordering
General-to-specific ordering of hypotheses:
- x satisfies h ⇔ h(x) = 1
- More_general_than_or_equal_to relation:

  h_j ≥_g h_k ⇔ (∀x ∈ X)[(h_k(x) = 1) → (h_j(x) = 1)]

- (Strictly) more_general_than relation:

  h_j >_g h_k ⇔ (h_j ≥_g h_k) ∧ ¬(h_k ≥_g h_j)

- Example: <Sunny,?,?,?,?,?> >_g <Sunny,?,?,Strong,?,?>
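For conjunctive hypotheses the ≥_g relation can be tested constraint by constraint instead of quantifying over all of X. A sketch (the attribute-wise test assumed here agrees with the extensional definition for the hypotheses used in these notes, where '0' only appears in the all-'0' most specific hypothesis):

```python
def more_general_or_equal(hj, hk):
    """hj >=_g hk: every constraint of hj is at least as weak as the
    corresponding constraint of hk ('?' is weakest, '0' is strongest)."""
    return all(cj == '?' or cj == ck or ck == '0'
               for cj, ck in zip(hj, hk))

def more_general(hj, hk):
    """Strictly more general: hj >=_g hk but not hk >=_g hj."""
    return more_general_or_equal(hj, hk) and not more_general_or_equal(hk, hj)

h1 = ('Sunny', '?', '?', '?', '?', '?')
h2 = ('Sunny', '?', '?', 'Strong', '?', '?')
print(more_general(h1, h2))  # True: h1 >_g h2
```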
More_General_Than Relation
Find-S: Finding a Maximally Specific Hypothesis
1. Initialize h to the most specific hypothesis in H
2. For each positive training example x
     For each attribute constraint a_i in h
       If the constraint a_i is satisfied by x
       Then do nothing
       Else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
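A minimal Python sketch of Find-S for this conjunctive hypothesis space, run on Mitchell's EnjoySport training data (the data layout is an assumption of this note, not the slides' code):

```python
def find_s(examples):
    """examples: list of (instance_tuple, label) pairs, label True/False."""
    h = None                          # stands for the all-'0' hypothesis
    for x, positive in examples:
        if not positive:
            continue                  # Find-S ignores negative examples
        if h is None:
            h = list(x)               # first positive example: copy it
        else:
            for i, vi in enumerate(x):
                if h[i] != vi:
                    h[i] = '?'        # next more general constraint
    return ('0',) * 6 if h is None else tuple(h)  # six attributes here

D = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),  True),
     (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),  True),
     (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
     (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True)]
print(find_s(D))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```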
Hypothesis Space Search by Find-S
Properties of Find-S
- Ignores every negative example (no revision to h is required in response to negative examples). Why? What are the assumptions behind this?
- Guaranteed to output the most specific hypothesis consistent with the positive training examples (for a conjunctive hypothesis space).
- The final h is also consistent with the negative examples, provided the target concept c is in H and D contains no errors.
Weaknesses of Find-S
- Has the learner converged to the correct target concept? There is no way to know whether the solution is unique.
- Why prefer the most specific hypothesis? How about the most general hypothesis?
- Are the training examples consistent? Training sets containing errors or noise can severely mislead Find-S.
- What if there are several maximally specific consistent hypotheses? Find-S cannot backtrack to explore a different branch of the partial ordering.
Version Spaces (VSs)
Output all hypotheses consistent with the training examples.

Version space
- Consistent(h, D) ⇔ (∀<x, c(x)> ∈ D) h(x) = c(x)
- VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}

List-Then-Eliminate algorithm
- Lists all hypotheses, then removes the inconsistent ones.
- Applicable only to a finite H.
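List-Then-Eliminate can actually be run for EnjoySport, since H is small. A sketch (the attribute domains are as listed in these notes; the code itself is an illustration added here):

```python
from itertools import product

domains = [['Sunny', 'Cloudy', 'Rainy'], ['Warm', 'Cold'],
           ['Normal', 'High'], ['Strong', 'Light'],
           ['Warm', 'Cool'], ['Same', 'Change']]

# All syntactically distinct hypotheses: each attribute may also be '?' or '0'.
H = list(product(*[d + ['?', '0'] for d in domains]))

def h_of(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def list_then_eliminate(H, examples):
    """Keep every hypothesis consistent with all training examples."""
    return [h for h in H if all(h_of(h, x) == y for x, y in examples)]

D = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
     (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), False)]
VS = list_then_eliminate(H, D)
print(len(H))               # 5120 syntactic hypotheses
print(('?',) * 6 in VS)     # False: all-'?' matches the negative example
```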
Compact Representation of VSs
A more compact representation for version spaces
- General boundary G:

  G ≡ {g ∈ H | Consistent(g, D) ∧ ¬(∃g' ∈ H)[(g' >_g g) ∧ Consistent(g', D)]}

- Specific boundary S:

  S ≡ {s ∈ H | Consistent(s, D) ∧ ¬(∃s' ∈ H)[(s >_g s') ∧ Consistent(s', D)]}

- The version space redefined in terms of S and G:

  VS_{H,D} = {h ∈ H | (∃s ∈ S)(∃g ∈ G)(g ≥_g h ≥_g s)}
A Version Space with S and G Boundaries
CE: Candidate-Elimination Algorithm
Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example d, do
  If d is a positive example
    Remove from G any hypothesis inconsistent with d
    For each hypothesis s in S that is not consistent with d
      Remove s from S
      Add to S all minimal generalizations h of s such that
        h is consistent with d, and some member of G is more general than h
      Remove from S any hypothesis that is more general than another hypothesis in S
Candidate-Elimination Algorithm (continued)

  If d is a negative example
    Remove from S any hypothesis inconsistent with d
    For each hypothesis g in G that is not consistent with d
      Remove g from G
      Add to G all minimal specializations h of g such that
        h is consistent with d, and some member of S is more specific than h
      Remove from G any hypothesis that is less general than another hypothesis in G
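A compact Python sketch of the two-boundary update for this conjunctive hypothesis space (an illustration added to these notes, not the slides' code; for conjunctive H the minimal generalization of s is unique, so the S-side pruning step is trivial and omitted):

```python
def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_general_or_equal(hj, hk):
    return all(cj == '?' or cj == ck or ck == '0' for cj, ck in zip(hj, hk))

def more_general(hj, hk):
    return more_general_or_equal(hj, hk) and not more_general_or_equal(hk, hj)

def min_generalization(s, x):
    # Unique minimal generalization of s covering the positive instance x.
    return tuple(v if c in ('0', v) else '?' for c, v in zip(s, x))

def min_specializations(g, x, domains):
    # Minimal specializations of g that exclude the negative instance x.
    return [g[:i] + (v,) + g[i+1:]
            for i, c in enumerate(g) if c == '?'
            for v in domains[i] if v != x[i]]

def candidate_elimination(examples, domains):
    n = len(domains)
    S, G = {('0',) * n}, {('?',) * n}
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}
            S = {min_generalization(s, x) if not matches(s, x) else s
                 for s in S}
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
        else:
            S = {s for s in S if not matches(s, x)}
            newG = set()
            for g in G:
                if not matches(g, x):
                    newG.add(g)
                    continue
                for h in min_specializations(g, x, domains):
                    if any(more_general_or_equal(h, s) for s in S):
                        newG.add(h)
            # Keep only the maximally general members of G.
            G = {g for g in newG
                 if not any(more_general(g2, g) for g2 in newG)}
    return S, G

domains = [['Sunny', 'Cloudy', 'Rainy'], ['Warm', 'Cold'], ['Normal', 'High'],
           ['Strong', 'Light'], ['Warm', 'Cool'], ['Same', 'Change']]
D = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),  True),
     (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),  True),
     (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
     (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True)]
S, G = candidate_elimination(D, domains)
print(S)          # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(sorted(G))  # [('?', 'Warm', '?', '?', '?', '?'), ('Sunny', '?', '?', '?', '?', '?')]
```

On the EnjoySport data this reproduces the S and G boundaries of the worked example: S converges to the same hypothesis Find-S produces, and G retains two maximally general hypotheses.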
Given First Two Examples
After the Third Example
After the Fourth Example
The Concept Learned
Remarks on Candidate Elimination
- Will the CE algorithm converge to the correct hypothesis?
- What training example should the learner request next?
- How can partially learned concepts be used?
When Does CE Converge?
Will the Candidate-Elimination algorithm converge to the correct hypothesis?

Prerequisites
1. The training examples contain no errors.
2. There exists a hypothesis in H that correctly describes the target concept c.

If the S and G boundary sets converge to an empty set, there is no hypothesis in H consistent with the observed examples.
Who Provides Examples?
What training example should the learner request next?

Two methods
- Fully supervised learning: an external teacher provides all training examples (input + correct output).
- Learning by query: the learner generates instances (queries) by conducting experiments, then obtains the correct classification for each instance from an external oracle (nature or a teacher).

Negative training examples specialize G; positive ones generalize S.
Optimal Query Strategies
What would be a good query? The learner should attempt to discriminate among the competing hypotheses in its current version space. A good query is one that is classified as positive by some of these hypotheses but negative by others. In general, the optimal query strategy for a concept learner is to generate instances that satisfy exactly half the hypotheses in the current version space. The number of experiments needed to find the correct target concept is then

  ⌈log₂ |VS|⌉
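For instance, with the six hypotheses of the EnjoySport version space, an optimal querier needs about three experiments:

```python
from math import ceil, log2

vs_size = 6                 # |VS| for the EnjoySport example
print(ceil(log2(vs_size)))  # 3 experiments suffice in the best case
```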
How to Use Partially Learned Concepts?
Suppose the learner is asked to classify the four new instances shown in the following table.
Instance Sky AirTemp Humidity Wind Water Forecast EnjoySport
A         Sunny  Warm     Normal    Strong  Cool   Change    ?
B         Rainy  Cold     Normal    Light   Warm   Same      ?
C         Sunny  Warm     Normal    Light   Warm   Same      ?
D         Sunny  Cold     Normal    Strong  Warm   Same      ?
A: classified as positive by all hypos in the current version space (Fig. 2.3)
B: classified as negative by all hypos
C: 3 positive, 3 negative
D: 2 positive, 4 negative (can be decided by majority vote, for example)
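These vote counts can be reproduced by brute force. The six hypotheses below are the version space of Fig. 2.3 as given in Mitchell's textbook (reconstructed here as an assumption, since the figure itself is not reproduced in these notes):

```python
def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

VS = [('Sunny', 'Warm', '?', 'Strong', '?', '?'),   # S boundary
      ('Sunny', '?', '?', 'Strong', '?', '?'),
      ('Sunny', 'Warm', '?', '?', '?', '?'),
      ('?', 'Warm', '?', 'Strong', '?', '?'),
      ('Sunny', '?', '?', '?', '?', '?'),            # G boundary
      ('?', 'Warm', '?', '?', '?', '?')]             # G boundary

new_instances = {
    'A': ('Sunny', 'Warm', 'Normal', 'Strong', 'Cool', 'Change'),
    'B': ('Rainy', 'Cold', 'Normal', 'Light', 'Warm', 'Same'),
    'C': ('Sunny', 'Warm', 'Normal', 'Light', 'Warm', 'Same'),
    'D': ('Sunny', 'Cold', 'Normal', 'Strong', 'Warm', 'Same')}

for name, x in new_instances.items():
    pos = sum(matches(h, x) for h in VS)
    print(f'{name}: {pos} positive, {len(VS) - pos} negative')
# A: 6 positive, 0 negative
# B: 0 positive, 6 negative
# C: 3 positive, 3 negative
# D: 2 positive, 4 negative
```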
Partially Learned VS Revisited
Fundamental Questions for Inductive Inference

CE will converge toward the target concept provided that the target concept is contained in the initial hypothesis space and the training examples contain no errors.
- What if the target concept is not contained in the hypothesis space?
- One solution: use a hypothesis space that includes every possible hypothesis (a more expressive hypothesis space).
- New problem: such a learner generalizes poorly, or does not generalize at all.
Inductive Bias
EnjoySport: H contains only conjunctions of attribute values. Such an H is unable to represent even simple disjunctive target concepts such as <Sunny,?,?,?,?,?> ∨ <Cloudy,?,?,?,?,?>.

Given the following three training examples of this disjunctive concept, CE finds that there are zero hypotheses in the version space.
A Biased Hypothesis Space

The problem is that we have biased the learner to consider only conjunctive hypotheses. We need a more expressive hypothesis space.

Example  Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny   Warm     Normal    Strong  Cool   Change    Yes
2        Cloudy  Warm     Normal    Strong  Cool   Change    Yes
3        Rainy   Warm     Normal    Strong  Cool   Change    No
An Unbiased Learner

One solution: let H contain every teachable concept (every possible subset of the instance space X).
- Power set of X: the set of all subsets of X
- EnjoySport: |X| = 96
- Size of the power set: 2^|X| = 2^96 ≈ 10^28 (the number of distinct target concepts)
- In contrast, our conjunctive H contains only 973 (semantically distinct) of these.

New problem: the learner is unable to generalize beyond the observed examples.
- Only the observed examples are classified unambiguously.
- Majority voting over the version space is uninformative: every unseen instance is classified as positive by exactly half of the hypotheses, so there is never a majority.
Futility of Bias-Free Learning
Fundamental property of inductive inference:
“A learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances.”
Inductive Inference

L: an arbitrary learning algorithm
c: some arbitrary target concept
D_c = {<x, c(x)>}: an arbitrary set of training data
L(x_i, D_c): the classification that L assigns to x_i after learning from D_c

The inductive inference step performed by L:

  (D_c ∧ x_i) ≻ L(x_i, D_c)

where y ≻ z denotes that z is inductively inferred from y.
Inductive Bias Formally Defined
Because L is an inductive learning algorithm, the result L(x_i, D_c) will not in general be provably correct; L(x_i, D_c) need not follow deductively from D_c and x_i. However, additional assumptions can be added to D_c ∧ x_i so that L(x_i, D_c) would follow deductively.

Definition: The inductive bias of L is any minimal set of assertions B (assumptions, background knowledge, etc.) such that for any target concept c and corresponding training examples D_c:

  (∀x_i ∈ X)[(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)]

where ⊢ denotes deductive entailment.
Inductive Bias of CE Algorithm
Given the assumption c ∈ H, the inductive inference performed by the CE algorithm can be justified deductively. Why?
- If we assume c ∈ H, it follows deductively that c ∈ VS_{H,Dc}.
- Since we defined L(x_i, D_c) to be the unanimous vote of all hypotheses in the version space, if L outputs the classification L(x_i, D_c), it must be the case that every hypothesis in VS_{H,Dc} produces this classification, including the hypothesis c itself.

Inductive bias of CE: the target concept c is contained in the given hypothesis space H.
Inductive & Deductive Systems
Strength of Inductive Biases
(1) Rote-Learner: weakest (no bias)
(2) Candidate-Elimination algorithm
(3) Find-S: strongest bias of the three
Inductive Bias of Rote-Learner
Simply stores each observed training example in memory. New instances are classified by looking them up in memory:
- If the instance is found in memory, the stored classification is returned.
- Otherwise, the system refuses to classify the new instance.
No inductive bias: The classifications for new instances follow deductively from D with no additional assumptions required.
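Such a rote learner is a few lines of code (a sketch added for illustration):

```python
class RoteLearner:
    """Stores training examples verbatim; no inductive bias."""

    def __init__(self):
        self.memory = {}

    def train(self, examples):
        for x, y in examples:
            self.memory[x] = y          # remember the example as-is

    def classify(self, x):
        # Return the stored label, or None ("refuses to classify").
        return self.memory.get(x)

learner = RoteLearner()
learner.train([(('Sunny', 'Warm'), True), (('Rainy', 'Cold'), False)])
print(learner.classify(('Sunny', 'Warm')))   # True
print(learner.classify(('Cloudy', 'Warm')))  # None
```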
Inductive Bias of Cand.-Ellim.
New instances are classified only if all hypos in VS agree. Otherwise, it refuses to classify.Inductive bias: The target concept can be represented in its hypothesis space.This inductive bias is stronger than that of rote-learner since CE will classify some instances that the rote-learner will not.
Inductive Bias of Find-S
Finds the most specific hypothesis consistent with D and uses this hypothesis to classify new instances. This is an even stronger inductive bias:
- The target concept can be described in its hypothesis space.
- All instances are negative unless the opposite is entailed by its other knowledge (default reasoning).
Summary (1/3)
- Concept learning can be cast as a problem of searching through a large predefined space of potential hypotheses.
- The general-to-specific partial ordering of hypotheses provides a useful structure for the search.
- The Find-S algorithm performs a specific-to-general search to find the most specific hypothesis.
- The Candidate-Elimination algorithm computes the version space by incrementally computing the sets of maximally specific (S) and maximally general (G) hypotheses.
- S and G delimit the entire set of hypotheses consistent with the data.
- Version spaces and the Candidate-Elimination algorithm provide a useful conceptual framework for studying concept learning.
Summary (2/3)
Summary (3/3)
- The Candidate-Elimination algorithm is not robust to noisy data, or to situations where the unknown target concept is not expressible in the provided hypothesis space.
- The inductive bias of the Candidate-Elimination algorithm is that the target concept exists in H.
- If the hypothesis space is enriched to the point where it contains every possible hypothesis (the power set of the instance space), this removes the inductive bias of CE, and with it the ability to classify any instance beyond the observed examples.