
Page 1:

Concept learning, Regression

Adapted from slides from Alpaydin’s book and slides by Professor Doina Precup, McGill University

Page 2:

S, G, and the Version Space

The most specific hypothesis, S, and the most general hypothesis, G, bound the consistent hypotheses: any h ∈ H between S and G is consistent with the training set, and together these hypotheses make up the version space (Mitchell, 1997).

Page 3:

VC Dimension

N points can be labeled in 2^N ways as +/-. H shatters N points if there exists an h ∈ H consistent with each of these labelings; the VC dimension, VC(H), is the largest such N.

An axis-aligned rectangle can shatter at most 4 points, so its VC dimension is 4.
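Shattering can be checked by brute force for small point sets. Below is a minimal sketch, assuming a hand-picked "diamond" configuration of points (an illustrative choice, not from the slides): for each labeling it suffices to test the bounding box of the positive points, since any rectangle containing all positives also contains that box.

```python
# Brute-force check that axis-aligned rectangles shatter a point set:
# for every +/- labeling, look for a rectangle containing exactly the +s.
from itertools import product

def rectangle_consistent(points, labels):
    """True if some axis-aligned rectangle contains exactly the + points."""
    pos = [p for p, l in zip(points, labels) if l == 1]
    if not pos:  # an empty rectangle handles the all-negative labeling
        return True
    # Tightest candidate: the bounding box of the positive points.
    x_lo, x_hi = min(p[0] for p in pos), max(p[0] for p in pos)
    y_lo, y_hi = min(p[1] for p in pos), max(p[1] for p in pos)
    # Consistent iff no negative point falls inside that box.
    return all(not (x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi)
               for p, l in zip(points, labels) if l == 0)

def shatters(points):
    return all(rectangle_consistent(points, labels)
               for labels in product([0, 1], repeat=len(points)))

# Four points in a diamond: all 2^4 labelings are realizable.
print(shatters([(0, 1), (1, 0), (0, -1), (-1, 0)]))           # True
# Adding a fifth (center) point breaks shattering, so VC = 4.
print(shatters([(0, 1), (1, 0), (0, -1), (-1, 0), (0, 0)]))   # False
```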

Page 4:

Probably Approximately Correct (PAC) Learning

How many training examples N should we have, such that with probability at least 1 − δ, h has error at most ε? (Blumer et al., 1989)

For the tightest rectangle S, the error region consists of four strips; it suffices that each strip has probability mass at most ε/4.
Pr that one random instance misses a given strip: ≤ 1 − ε/4.
Pr that all N instances miss that strip: ≤ (1 − ε/4)^N.
Pr that the N instances miss any of the 4 strips: ≤ 4(1 − ε/4)^N.
Requiring 4(1 − ε/4)^N ≤ δ and using (1 − x) ≤ exp(−x), it suffices that 4 exp(−εN/4) ≤ δ, i.e. N ≥ (4/ε) ln(4/δ).
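The final bound is straightforward to evaluate numerically. A minimal sketch, with illustrative ε and δ values (assumptions, not from the slides):

```python
# Evaluate the PAC sample bound N >= (4/epsilon) * ln(4/delta)
# for axis-aligned rectangles.
import math

def pac_sample_bound(epsilon, delta):
    """Smallest N guaranteeing error <= epsilon with prob. >= 1 - delta."""
    return math.ceil((4 / epsilon) * math.log(4 / delta))

print(pac_sample_bound(0.1, 0.05))   # 176 examples
print(pac_sample_bound(0.01, 0.05))  # 1753 examples
```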

Page 5:

Noise and Model Complexity

Use the simpler one because it:
- is simpler to use (lower computational complexity),
- is easier to train (lower space complexity),
- is easier to explain (more interpretable), and
- generalizes better (lower variance; Occam's razor).

Page 6:

Multiple Classes, C_i, i = 1, ..., K

Training set: $\mathcal{X} = \{\mathbf{x}^t, \mathbf{r}^t\}_{t=1}^{N}$, where

$$r_i^t = \begin{cases} 1 & \text{if } \mathbf{x}^t \in C_i \\ 0 & \text{if } \mathbf{x}^t \in C_j,\ j \neq i \end{cases}$$

Train hypotheses $h_i(\mathbf{x})$, i = 1, ..., K:

$$h_i(\mathbf{x}^t) = \begin{cases} 1 & \text{if } \mathbf{x}^t \in C_i \\ 0 & \text{if } \mathbf{x}^t \in C_j,\ j \neq i \end{cases}$$
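The 0/1 label vectors above are a one-hot encoding, and each h_i is then trained as a separate two-class problem. A minimal sketch of building them, assuming classes are given as integer indices (the helper name and data are illustrative):

```python
# One-vs-all label vectors r_i^t for K-class training.
import numpy as np

def one_vs_all_labels(classes, K):
    """classes[t] in {0..K-1}; returns r with r[t, i] = 1 iff x^t in C_i."""
    r = np.zeros((len(classes), K), dtype=int)
    r[np.arange(len(classes)), classes] = 1
    return r

r = one_vs_all_labels(np.array([2, 0, 1, 2]), K=3)
# Hypothesis h_i is trained on column r[:, i] as a two-class problem.
print(r)
```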

Page 7:

Regression

Training set: $\mathcal{X} = \{x^t, r^t\}_{t=1}^{N}$ with $r^t \in \mathbb{R}$, generated as $r^t = f(x^t) + \varepsilon$.

Empirical error of a hypothesis g:

$$E(g \mid \mathcal{X}) = \frac{1}{N}\sum_{t=1}^{N}\left[r^t - g(x^t)\right]^2$$

Linear model $g(x) = w_1 x + w_0$:

$$E(w_1, w_0 \mid \mathcal{X}) = \frac{1}{N}\sum_{t=1}^{N}\left[r^t - (w_1 x^t + w_0)\right]^2$$

Quadratic model: $g(x) = w_2 x^2 + w_1 x + w_0$.
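The linear-model error above is a one-liner to evaluate. A minimal sketch on a made-up dataset (the data and weights are illustrative assumptions):

```python
# Empirical squared error E(w1, w0 | X) for the linear model.
import numpy as np

def squared_error(w1, w0, x, r):
    """E(w1, w0 | X) = (1/N) * sum_t (r^t - (w1 x^t + w0))^2"""
    return np.mean((r - (w1 * x + w0)) ** 2)

x = np.array([0.0, 1.0, 2.0, 3.0])
r = np.array([0.1, 1.9, 4.2, 5.8])    # roughly r = 2x
print(squared_error(2.0, 0.0, x, r))  # small error
print(squared_error(0.5, 1.0, x, r))  # larger error
```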

Page 8:

Model Selection & Generalization

Learning is an ill-posed problem: the data are not sufficient to find a unique solution. Hence the need for an inductive bias, i.e., assumptions about H.
Generalization: how well a model performs on new data.
Overfitting: H more complex than C or f.
Underfitting: H less complex than C or f.

Page 9:

Triple Trade-Off

There is a trade-off between three factors (Dietterich, 2003):

1. Complexity of H, c(H)
2. Training set size, N
3. Generalization error, E, on new data

As N increases, E decreases. As c(H) increases, E first decreases and then increases.

Page 10:

Cross-Validation

To estimate generalization error, we need data unseen during training. We split the data into:
- Training set (50%)
- Validation set (25%)
- Test (publication) set (25%)

Use resampling when there is little data.
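A minimal sketch of the 50/25/25 split after a random shuffle (the dataset size and seed are illustrative assumptions):

```python
# Train/validation/test split of example indices.
import numpy as np

rng = np.random.default_rng(0)
idx = rng.permutation(1000)   # indices of 1000 examples, shuffled
n_train, n_val = 500, 250
train = idx[:n_train]
val = idx[n_train:n_train + n_val]
test = idx[n_train + n_val:]
print(len(train), len(val), len(test))  # 500 250 250
```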

Page 11:

Dimensions of a Supervised Learner

1. Model: $g(\mathbf{x} \mid \theta)$
2. Loss function: $E(\theta \mid \mathcal{X}) = \sum_t L\left(r^t, g(\mathbf{x}^t \mid \theta)\right)$
3. Optimization procedure: $\theta^* = \arg\min_{\theta} E(\theta \mid \mathcal{X})$
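To make the three dimensions concrete, here is a minimal sketch for 1-D linear regression, using squared loss and a crude grid search as the optimization procedure (all specific choices here are illustrative assumptions, not the slides' prescription):

```python
# 1. model, 2. loss, 3. optimization: a toy instantiation.
import numpy as np

def g(x, theta):              # 1. model g(x | theta)
    w1, w0 = theta
    return w1 * x + w0

def E(theta, x, r):           # 2. loss E(theta | X), squared error
    return np.sum((r - g(x, theta)) ** 2)

x = np.array([0.0, 1.0, 2.0])
r = np.array([1.0, 3.0, 5.0])          # exactly r = 2x + 1
# 3. optimization: theta* = argmin_theta E(theta | X), by grid search
grid = [(w1, w0) for w1 in np.linspace(0, 4, 41)
                 for w0 in np.linspace(0, 2, 21)]
theta_star = min(grid, key=lambda th: E(th, x, r))
print(theta_star)                      # close to (2.0, 1.0)
```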

Page 12:

Steps to solving a supervised learning problem

1. Select the input-output pairs.
2. Decide how to encode the inputs and outputs; this defines the instance space X and the output space Y.
3. Choose a class of hypotheses / representations, H.
4. Choose an error function to define the best hypothesis.
5. Choose an algorithm for searching efficiently through the space of hypotheses.

Page 13:

Example: What hypothesis class should we pick?

 x      y
 0.86   2.49
 0.09   0.83
-0.85  -0.25
 0.87   3.10
-0.44   0.87
-0.43   0.02
-1.10  -0.12
 0.40   1.81
-0.96  -0.83
 0.17   0.43

(Scatter plots of y versus x omitted.)
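One way to compare candidate hypothesis classes on this data is to fit each by least squares and look at the training error. A minimal sketch using polynomial fits of degree 1 and 3 (the degrees are an illustrative assumption; a real choice should use validation data, per the cross-validation slide):

```python
# Fit two hypothesis classes to the table above and compare training error.
import numpy as np

x = np.array([.86, .09, -.85, .87, -.44, -.43, -1.1, .4, -.96, .17])
y = np.array([2.49, .83, -.25, 3.1, .87, .02, -.12, 1.81, -.83, .43])

for degree in (1, 3):
    w = np.polyfit(x, y, degree)             # least-squares polynomial fit
    err = np.mean((y - np.polyval(w, x)) ** 2)
    print(degree, err)   # higher degree gives lower *training* error
```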

Page 14:

Linear Hypothesis

Suppose y is a linear function of x:
$$h_{\mathbf{w}}(\mathbf{x}) = w_0 + w_1 x_1 + \cdots + w_n x_n$$

The $w_i$ are called parameters or weights.

To simplify notation, we add an attribute $x_0 = 1$ to the other n attributes (absorbing the bias term), giving
$$h_{\mathbf{w}}(\mathbf{x}) = \sum_{i=0}^{n} w_i x_i = \mathbf{w} \cdot \mathbf{x}$$
where w and x are vectors of size n + 1.

How should we pick w?
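A minimal sketch of the x_0 = 1 trick (the helper names and data are illustrative):

```python
# Fold the bias into the weight vector by prepending x0 = 1.
import numpy as np

def add_bias(X):
    """Prepend a column of ones (the x0 = 1 attribute) to an m x n matrix."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def h(w, X):
    return add_bias(X) @ w            # h_w(x) = sum_i w_i x_i = w . x

X = np.array([[0.5], [1.0], [2.0]])   # m = 3 examples, n = 1 attribute
w = np.array([1.0, 2.0])              # [w0, w1]
print(h(w, X))                        # [2.0, 3.0, 5.0]
```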

Page 15:

Error Minimization

We should make the predictions of h_w close to the true value y on the data we have.

We define an error function or cost function, and we will pick w such that the error function is minimized.

How should we choose the error function?

Page 16:

Least Mean Squares (LMS)

Try to make h_w(x) close to y on the examples in the training set. We define a sum-of-squares error function:

$$J(\mathbf{w}) = \frac{1}{2}\sum_{i=1}^{m}\left(h_{\mathbf{w}}(\mathbf{x}_i) - y_i\right)^2$$

We will choose w so as to minimize J(w), i.e., compute w such that:

$$\frac{\partial J}{\partial w_j} = 0, \quad j = 0, \ldots, n$$
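Setting all the partial derivatives to zero yields the normal equations X^T X w = X^T y, which can be solved directly. A minimal sketch on made-up data (an illustrative assumption; gradient descent on J is the usual iterative alternative):

```python
# Minimize J(w) in closed form via the normal equations.
import numpy as np

X = np.array([[1, 0.0], [1, 1.0], [1, 2.0], [1, 3.0]])  # x0 = 1 column included
y = np.array([1.1, 2.9, 5.2, 6.8])                      # roughly y = 1 + 2x

w = np.linalg.solve(X.T @ X, X.T @ y)  # sets every dJ/dw_j to zero
J = 0.5 * np.sum((X @ w - y) ** 2)
print(w, J)                            # w close to [1, 2], J small
```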