CS 478 – Tools for Machine Learning and Data Mining The Need for and Role of Bias

CS 478 – Tools for Machine Learning and Data Mining

The Need for and Role of Bias

Learning

• Rote:
  – Until you discover the rule/concept(s), the very BEST you can ever expect to do is:
    • Remember what you observed
    • Guess on everything else
• Inductive:
  – What you do when you GENERALIZE from your observations and make (accurate) predictions

Claim: “All [most of] the laws of nature were discovered by inductive reasoning”

The Big Question

• All you have is what you have OBSERVED
• Your generalization should at least be consistent with those observations
• But beyond that…
  – How do you know that your generalization is any good?
  – How do you choose among various candidate generalizations?

The Answer Is

BIAS

Any basis for choosing one decision over another, other than strict consistency with past observations

Why Bias?

• If you have no bias you cannot go beyond mere memorization
  – Mitchell’s proof using an Unbiased Generalization Language (UGL) and the Version Space (VS)

• The power of a generalization system follows directly from its biases

• Progress towards understanding learning mechanisms depends upon understanding the sources of, and justification for, various biases

Concept Learning

Given:
• A language of observations/instances
• A language of concepts/generalizations
• A matching predicate
• A set of observations

Find generalizations that:
1. Are consistent with the observations, and
2. Classify instances beyond those observed

Claim

The absence of bias makes it impossible to solve part 2 of the Concept Learning problem, i.e., learning is limited to rote learning

Unbiased Generalization Language

• Generalization ≡ set of instances it matches
• An Unbiased Generalization Language (UGL), relative to a given language of instances, allows describing every possible subset of instances

• UGL = power set of the given instance language
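
The power-set definition can be made concrete with a small sketch. The two-feature Boolean instance language and the names `power_set`, `matches`, `INSTANCES`, and `UGL` are illustrative assumptions, not part of the original slides.

```python
from itertools import chain, combinations

def power_set(instances):
    """Return every subset of `instances` as a frozenset."""
    items = list(instances)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

def matches(generalization, instance):
    """Matching predicate: a generalization is the set of instances it matches."""
    return instance in generalization

# Hypothetical instance language: all vectors over two Boolean features.
INSTANCES = [(0, 0), (0, 1), (1, 0), (1, 1)]
UGL = power_set(INSTANCES)

print(len(UGL))  # one generalization per subset, i.e. 2 ** len(INSTANCES)
```

With 4 instances the UGL already holds 16 generalizations; the count doubles with every instance added.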

Unbiased Generalization Procedure

• Uses Unbiased Generalization Language
• Computes Version Space (VS) relative to UGL
• VS ≡ set of all expressible generalizations consistent with the training instances
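
As a rough illustration, the version space relative to a UGL can be computed by brute-force filtering. This is a minimal sketch that assumes a tiny two-feature Boolean instance language and hypothetical training data; `version_space` and the other names are not from the slides.

```python
from itertools import chain, combinations

def power_set(instances):
    """Return every subset of `instances` as a frozenset."""
    items = list(instances)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

def version_space(language, positives, negatives):
    """Keep the generalizations that match every positive and no negative."""
    return [
        g for g in language
        if all(p in g for p in positives) and not any(n in g for n in negatives)
    ]

# Hypothetical setup: two Boolean features, one positive and one negative example.
INSTANCES = [(0, 0), (0, 1), (1, 0), (1, 1)]
UGL = power_set(INSTANCES)
VS = version_space(UGL, positives=[(1, 1)], negatives=[(0, 0)])

for g in VS:
    print(sorted(g))
```

Every member of `VS` contains (1, 1) and excludes (0, 0); the two unobserved instances are free to be in or out, which gives 4 consistent generalizations.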

Version Space (I)

• Let:
  – S be the set of maximally specific generalizations consistent with the training data
  – G be the set of maximally general generalizations consistent with the training data

Version Space (II)

• Intuitively
  – S keeps generalizing to accommodate new positive instances
  – G keeps specializing to avoid new negative instances

• The key issue is that they only do that to the smallest extent necessary to maintain consistency with the training data, that is, G remains as general as possible and S remains as specific as possible.

Version Space (III)

• The sets S and G precisely delimit the version space (i.e., the set of all plausible versions of the emerging concept).

• A generalization g is in the version space represented by S and G if and only if:
  – g is more specific than or equal to some member of G, and
  – g is more general than or equal to some member of S
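
The membership condition can be sketched directly when generalizations are represented as sets of instances, so that "more general than or equal to" is just the superset relation. The representation, the function name `in_version_space`, and the example boundaries are assumptions made for illustration.

```python
def in_version_space(g, S, G):
    """True iff g lies between the S and G boundaries under the subset order."""
    above_S = any(s <= g for s in S)      # g is more general than or equal to some s
    below_G = any(g <= top for top in G)  # g is more specific than or equal to some member of G
    return above_S and below_G

# Hypothetical boundaries over two Boolean features,
# after seeing (1, 1) as positive and (0, 0) as negative.
S = [frozenset({(1, 1)})]
G = [frozenset({(0, 1), (1, 0), (1, 1)})]

print(in_version_space(frozenset({(0, 1), (1, 1)}), S, G))  # between S and G
print(in_version_space(frozenset({(0, 0), (1, 1)}), S, G))  # covers the negative instance
```

Only the two boundary sets need to be stored; any candidate can then be tested without enumerating the whole version space.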

Version Space (IV)

• Initialize G to the most general concept in the space
• Initialize S to the first positive training instance
• For each new positive training instance p
  – Delete all members of G that do not cover p
  – For each s in S
    • If s does not cover p
      – Replace s with its most specific generalizations that cover p
  – Remove from S any element more general than some other element in S
  – Remove from S any element not more specific than some element in G
• For each new negative training instance n
  – Delete all members of S that cover n
  – For each g in G
    • If g covers n
      – Replace g with its most general specializations that do not cover n
  – Remove from G any element more specific than some other element in G
  – Remove from G any element not more general than some element in S
• If G=S and both are singletons
  – A single concept consistent with the training data has been found
• If G and S become empty
  – There is no concept consistent with the training data
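
The procedure above can be sketched for one particular (biased) language: conjunctions over Boolean features written as tuples over 0, 1, and '*'. The choice of language, all function names, and the training data are assumptions for illustration; this is not presented as the course's own implementation.

```python
def covers(h, x):
    """h covers instance x if every non-'*' position agrees with x."""
    return all(hc == '*' or hc == xc for hc, xc in zip(h, x))

def more_general_or_equal(h1, h2):
    """h1 is at least as general as h2 (position-wise)."""
    return all(a == '*' or a == b for a, b in zip(h1, h2))

def generalize(s, x):
    """Most specific generalization of s that covers x."""
    return tuple(sc if sc == xc else '*' for sc, xc in zip(s, x))

def specialize(g, x):
    """Most general specializations of g that exclude x."""
    return [g[:i] + (1 - x[i],) + g[i + 1:] for i, gc in enumerate(g) if gc == '*']

def candidate_elimination(examples):
    """examples: list of (instance, is_positive) pairs over Boolean tuples."""
    n = len(examples[0][0])
    G = [tuple('*' for _ in range(n))]
    S = None  # set by the first positive instance
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]
            if S is None:
                S = [tuple(x)]
            else:
                S = [s if covers(s, x) else generalize(s, x) for s in S]
            # In this language S stays a singleton, so only the G check is needed.
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
        else:
            if S is not None:
                S = [s for s in S if not covers(s, x)]
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                for h in specialize(g, x):
                    # Keep h only if it is still more general than some s.
                    if S is None or any(more_general_or_equal(h, s) for s in S):
                        new_G.append(h)
            uniq = list(dict.fromkeys(new_G))
            # Drop elements more specific than another element of G.
            G = [g for g in uniq
                 if not any(h != g and more_general_or_equal(h, g) for h in uniq)]
    return S, G

# Hypothetical training data for the concept "feature 0 = 1 and feature 1 = 1".
EXAMPLES = [((1, 1, 0), True), ((1, 1, 1), True),
            ((0, 1, 1), False), ((1, 0, 1), False)]
S, G = candidate_elimination(EXAMPLES)
print(S, G)
```

On this data both boundaries converge to the single hypothesis `(1, 1, '*')`, the "G=S and both are singletons" case.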

Lemma 1

Any new instance, NI, is classified as positive if and only if NI is identical to some observed positive instance

Proof of Lemma 1

• (⇐) If NI is identical to some observed positive instance, then NI is classified as positive
  – Follows directly from the definition of VS
• (⇒) If NI is classified as positive, then NI is identical to some observed positive instance
  – Let g = {p: p is an observed positive instance}
    • UGL ⇒ g ∈ VS
    • NI matches all of VS ⇒ NI matches g

Lemma 2

Any new instance, NI, is classified as negative if and only if NI is identical to some observed negative instance

Proof of Lemma 2

• (⇐) If NI is identical to some observed negative instance, then NI is classified as negative
  – Follows directly from the definition of VS
• (⇒) If NI is classified as negative, then NI is identical to some observed negative instance
  – Let G = {all subsets containing observed negative instances}
    • UGL ⇒ G ∪ VS = UGL
    • NI matches none in VS ⇒ NI was observed

Lemma 3

If NI is any instance which was not observed, then NI matches exactly one half of VS, and so cannot be classified

Proof of Lemma 3

• (⇒) If NI was not observed, then NI matches exactly one half of VS, and so cannot be classified
  – Let g = {p: p is an observed positive instance}
  – Let G' = {all subsets of unobserved instances}
    • UGL ⇒ VS = {g ∪ g' : g' ∈ G'}
    • NI was not observed ⇒ NI matches exactly ½ of G' ⇒ NI matches exactly ½ of VS
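
The three lemmas can be checked empirically on a small case. The sketch below assumes three Boolean features and one hypothetical positive and one negative observation, builds the version space over the full power set, and counts how many members match each instance; `classify` is an illustrative helper, not a name from the slides.

```python
from itertools import chain, combinations, product

def power_set(instances):
    """Return every subset of `instances` as a frozenset."""
    items = list(instances)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

# Hypothetical setup: 8 instances, one observed positive, one observed negative.
INSTANCES = list(product((0, 1), repeat=3))
POSITIVES = [(1, 1, 1)]
NEGATIVES = [(0, 0, 0)]

VS = [
    g for g in power_set(INSTANCES)
    if all(p in g for p in POSITIVES) and not any(n in g for n in NEGATIVES)
]

def votes(instance):
    """Number of version-space members that match the instance."""
    return sum(instance in g for g in VS)

def classify(instance):
    """Positive if all of VS matches, negative if none does, else unknown."""
    v = votes(instance)
    if v == len(VS):
        return "positive"
    if v == 0:
        return "negative"
    return "unknown"

for x in INSTANCES:
    print(x, votes(x), "of", len(VS), classify(x))
```

Only the two observed instances receive a label; each of the six unobserved instances is matched by exactly half of the version space, so the unbiased learner abstains on all of them.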

Theorem

An unbiased generalization procedure can never make the inductive leap necessary to classify instances beyond those it has observed

Proof of the Theorem

• The result follows immediately from Lemmas 1, 2 and 3

• Practical consequence: If a learning system is to be useful, it must have some form of bias

Sources of Bias in Learning

• The representation language cannot express all possible classes of observations

• The generalization procedure is biased
  – Domain knowledge (e.g., double bonds rarely break)
  – Intended use (e.g., ICU – relative cost)
  – Shared assumptions (e.g., crown, bridge – dentistry)
  – Simplicity and generality (e.g., white men can’t jump)
  – Analogy (e.g., heat vs. water flow, thin ice)
  – Commonsense (e.g., social interactions, pain, etc.)

Fall 2004 CS 478 - Machine Learning

Representation Language

Decrease number of expressible generalizations ⇒ Increase ability to make the inductive leap

Example: Restrict generalizations to conjunctive constraints on features in a Boolean domain

Proof of Concept (I)

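Let N = number of (Boolean) features ⇒ $2^N$ instances and $2^{2^N}$ subsets of instances

Let GL = {0, 1, *} (one symbol per feature; * matches either value) ⇒ can only denote subsets of size $2^p$ for $0 \le p \le N$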

Proof of Concept (II)

For each p, there are only $\binom{N}{N-p}\, 2^{N-p}$ expressible subsets
  – Fix N-p features (there are $\binom{N}{N-p}$ ways of choosing which)
  – Set values for the selected features (there are $2^{N-p}$ possible settings)
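
The counting argument can be verified by brute force for a small N. The sketch below assumes N = 3 and enumerates every conjunctive description over {0, 1, *}; the helper `extension` and the variable names are illustrative.

```python
from collections import Counter
from itertools import product
from math import comb

N = 3  # assumed small enough to enumerate everything
INSTANCES = list(product((0, 1), repeat=N))

def extension(h):
    """Set of instances matched by the conjunctive description h."""
    return frozenset(
        x for x in INSTANCES
        if all(hc == '*' or hc == xc for hc, xc in zip(h, x))
    )

# Every subset of instances that some description in {0, 1, *}^N can denote.
EXPRESSIBLE = {extension(h) for h in product((0, 1, '*'), repeat=N)}
BY_SIZE = Counter(len(s) for s in EXPRESSIBLE)

for p in range(N + 1):
    predicted = comb(N, N - p) * 2 ** (N - p)
    print(f"p={p}: size {2 ** p}, found {BY_SIZE[2 ** p]}, predicted {predicted}")
```

Each description with p wildcard positions denotes a subset of size 2^p, and the enumerated counts agree with the binomial formula for every p.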

Proof of Concept (III)

Excluding the empty set, the ratio of expressible to total generalizations is given by:

$$\frac{\sum_{p=0}^{N} \binom{N}{N-p}\, 2^{N-p}}{2^{2^N} - 1}$$

Proof of Concept (IV)

For example, if N=5 then only about 1 in $10^7$ subsets may be represented ⇒ Strong bias

Double-edged sword: representation could be too sparse
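
The N=5 figure can be reproduced with a few lines of arithmetic. This is a minimal sketch of the ratio of expressible to total non-empty subsets; the function names `expressible` and `total_nonempty` are illustrative.

```python
from math import comb

def expressible(N):
    """Number of subsets a conjunctive description over {0, 1, *} can denote."""
    return sum(comb(N, N - p) * 2 ** (N - p) for p in range(N + 1))

def total_nonempty(N):
    """Number of non-empty subsets of the 2**N Boolean instances."""
    return 2 ** (2 ** N) - 1

N = 5
one_in = total_nonempty(N) / expressible(N)
print(expressible(N), total_nonempty(N), one_in)
```

For N=5 the numerator is 243 (the sum equals 3^N by the binomial theorem) and the denominator is 4,294,967,295, so roughly 1 subset in 1.8 × 10^7 is expressible, consistent with the order of magnitude quoted above.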

Generalization Procedure

• Domain knowledge (e.g., double bonds rarely break)
• Intended use (e.g., ICU – relative cost)
• Shared assumptions (e.g., crown, bridge – dentistry)
• Simplicity and generality (e.g., white men can’t jump)
• Analogy (e.g., heat vs. water flow, thin ice)
• Commonsense (e.g., social interactions, pain, etc.)

Conclusion

• Absence of bias = rote learning
• Efforts should focus on combined use of prior knowledge and observations in guiding the learning process
• Make biases and their use as explicit as observations and their use