
Perceptrons and Neural Nets

Machine Learning, Fall 2010


Administrivia

Topics this week:

Finish up discussion of how to evaluate hypotheses

Start talking about perceptrons and neural nets

This week’s reading:

Chapter 4

Big Picture:

This is the 4th week of talking about supervised learning

After this, 3 classes on learning theory, 1 class on midterm preparation

So, midterm is in 3 weeks!


Perceptrons and Neural Nets

Neural Networks

Biologically inspired to emulate the brain

Many simple components (analogous to brain cells) work together by passing stimuli to obtain complex behavior

Perceptrons

The simple components that make up a neural network (sort of)


Perceptrons

Sometimes called “linear threshold functions”

Interesting as building blocks of neural nets

But also interesting in their own right

Very simple and easy to use model

Surprisingly effective in many applications


What is a Perceptron

A thresholded, linear combination of the attributes
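As a minimal sketch (the function name is illustrative, not from the slides), a perceptron computes the weighted sum of the attributes, with a constant attribute $x_0 = 1$ carrying the bias weight by the usual convention, and outputs +1 if the sum is positive and -1 otherwise:

```python
import numpy as np

def perceptron_output(w, x):
    """Thresholded linear combination: +1 if w . x > 0, else -1.

    w: weight vector (w[0] is the bias weight); x: attribute vector with x[0] = 1.
    """
    return 1 if np.dot(w, x) > 0 else -1
```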


Training Perceptrons

Can use various techniques

The “Perceptron” Algorithm ✔

Gradient Descent ✔

Linear programming


The Perceptron Algo

Given:

D: Data set of linearly separable examples

η: Learning rate

Set each $w_i$ to a small arbitrary initial value

While learner makes mistakes:

For each $\bar{X} = \langle x_1, \ldots, x_n; y \rangle \in D$:

If $y \cdot \left( \sum_{i=0}^{n} x_i w_i \right) \le 0$:

Set $w_i \leftarrow w_i + \eta\, y\, x_i$
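A minimal sketch of this loop, assuming each example is given as an (x, y) pair with y in {-1, +1} and x already prefixed with the constant attribute 1 (function and variable names are illustrative):

```python
import numpy as np

def train_perceptron(D, eta=0.1, max_epochs=1000):
    """Perceptron algorithm: repeat until no example is misclassified.

    D: list of (x, y) pairs; x is a numpy array with x[0] = 1, y is in {-1, +1}.
    """
    n = len(D[0][0])
    w = 0.01 * np.random.randn(n)        # small arbitrary initial weights
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in D:
            if y * np.dot(w, x) <= 0:    # mistake: example on wrong side of the boundary
                w += eta * y * x         # w_i <- w_i + eta * y * x_i
                mistakes += 1
        if mistakes == 0:                # no mistakes in a full pass: converged
            break
    return w
```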


Perceptron Algo Properties

Provided that the data are linearly separable, the perceptron algorithm converges within a finite number of iterations


Data Not Linearly Separable

Option A: Add attributes formed as functions of existing attributes

Option B: Settle for good enough -- find a hypothesis that minimizes (squared) error on the training data


$E(\bar{W}) \equiv \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$

where $t_d$ is the true value of the target attribute for example $d$, and $o_d$ is the unthresholded output for example $d$
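In code, this error might be computed as in the sketch below, assuming X is a matrix with one example per row (first column all 1s) and t is the vector of true target values (names are illustrative):

```python
import numpy as np

def squared_error(w, X, t):
    """E(w) = 1/2 * sum_d (t_d - o_d)^2, with o_d = w . x_d the unthresholded output."""
    o = X @ w                        # unthresholded outputs, one per example
    return 0.5 * np.sum((t - o) ** 2)
```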


Minimizing Squared Error

Will use gradient descent

Initialize W to small arbitrary values

Iterate over examples, each time

computing the gradient

moving in direction opposite to gradient


$w_i \leftarrow w_i - \eta \frac{\partial E}{\partial w_i}$

$\frac{\partial E}{\partial w_i} = -\sum_{d \in D} (t_d - o_d)\, x_{id}$

where $x_{id}$ is the $i$-th attribute of the $d$-th example, and $o_d$ is the unthresholded output
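A sketch of this batch procedure, under the same assumptions as above (matrix X of examples with a leading column of 1s, target vector t; names are illustrative):

```python
import numpy as np

def gradient_descent(X, t, eta=0.01, epochs=100):
    """Batch gradient descent on E(w) = 1/2 * sum_d (t_d - o_d)^2."""
    w = 0.01 * np.random.randn(X.shape[1])   # small arbitrary initial weights
    for _ in range(epochs):
        o = X @ w                             # unthresholded outputs o_d
        grad = -X.T @ (t - o)                 # dE/dw_i = -sum_d (t_d - o_d) * x_id
        w -= eta * grad                       # move in the direction opposite to the gradient
    return w
```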


Stochastic Gradient Descent


Like the batch version above, but the weights are updated after each individual example rather than after a full pass over the data set:

Initialize $\bar{W}$ to small arbitrary values

For each example $d$:

compute the unthresholded output $o_d$

move in the direction opposite to the gradient for that example:

$w_i \leftarrow w_i + \eta\,(t_d - o_d)\,x_{id}$

where $x_{id}$ is the $i$-th attribute of the $d$-th example
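A sketch of the stochastic version, under the same assumptions as the batch sketch (names are illustrative):

```python
import numpy as np

def stochastic_gradient_descent(X, t, eta=0.01, epochs=100):
    """Per-example updates: w_i <- w_i + eta * (t_d - o_d) * x_id."""
    w = 0.01 * np.random.randn(X.shape[1])
    for _ in range(epochs):
        for x_d, t_d in zip(X, t):
            o_d = np.dot(w, x_d)              # unthresholded output for this example
            w += eta * (t_d - o_d) * x_d      # step opposite to this example's gradient
    return w
```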


Difference

What is the difference between the perceptron and gradient descent?

Perceptron Algo: converges to perfect hypothesis when data are linearly separable

Gradient descent: converges asymptotically toward the minimum-error hypothesis, regardless of whether the data are linearly separable

i.e., it might require infinite time to converge exactly


Neural Networks

The typical NN looks something like this:

[Figure: a feed-forward network with a layer of inputs, a layer of hidden units, and a layer of outputs]
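One common way to represent such a network in code is with one weight matrix per layer; the sketch below (shapes and names are illustrative assumptions, not from the slides) sets up the three layers shown in the figure:

```python
import numpy as np

def init_network(n_in, n_hidden, n_out, scale=0.05):
    """Small random weights for a single-hidden-layer feed-forward net.

    W_hidden maps inputs to hidden units; W_out maps hidden units to outputs.
    The extra column in each matrix holds the bias weights.
    """
    W_hidden = scale * np.random.randn(n_hidden, n_in + 1)
    W_out = scale * np.random.randn(n_out, n_hidden + 1)
    return W_hidden, W_out
```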


Sigmoid Function


$y = \sigma(x) = \frac{1}{1 + \exp(-x)}$

a.k.a. the Logistic Function
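A sketch of the sigmoid unit and its derivative; the identity $\sigma'(x) = \sigma(x)(1 - \sigma(x))$ is what makes the error terms in backpropagation below so compact:

```python
import numpy as np

def sigmoid(x):
    """Logistic function: squashes any real x into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)
```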


Backpropagation Algo

Set-up

Each instance is a pair $\langle \bar{X}, \bar{t} \rangle$

The NN has

$n_{in}$ inputs

$n_{hidden}$ hidden units

$n_{out}$ outputs

Initial weights are small arbitrary values


Backpropagation Algo

While not done:

Iterate over each $\langle \bar{X}, \bar{t} \rangle$:

Propagate $\bar{X}$ through the network to compute the output values

Compute the error $\delta_k$ for each output unit $k$

Compute the error $\delta_h$ for each hidden unit $h$

Update each weight: $w_{ji} \leftarrow w_{ji} + \eta\, \delta_j\, x_{ji}$


Computing the Errors

Of output units:

$\delta_k = o_k (1 - o_k)(t_k - o_k)$

where $o_k$ is the output of sigmoid unit $k$ and $t_k$ is the correct value

Of hidden units:

$\delta_h = o_h (1 - o_h) \sum_{k \in \text{outputs}} w_{kh}\, \delta_k$
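Putting the loop and the error formulas together, here is a minimal sketch of one backpropagation pass for a single-hidden-layer network of sigmoid units; the bias handling and all names are illustrative assumptions, not taken verbatim from the slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_epoch(D, W_hidden, W_out, eta=0.05):
    """One pass of backpropagation over D, a list of (x, t) pairs.

    x includes a leading 1 for the bias; W_hidden has shape (n_hidden, n_in + 1),
    W_out has shape (n_out, n_hidden + 1).
    """
    for x, t in D:
        # Forward pass: propagate x through the network.
        o_hidden = sigmoid(W_hidden @ x)
        h = np.concatenate(([1.0], o_hidden))        # prepend bias input for the output layer
        o_out = sigmoid(W_out @ h)

        # Error of each output unit k: delta_k = o_k (1 - o_k)(t_k - o_k).
        delta_out = o_out * (1 - o_out) * (t - o_out)
        # Error of each hidden unit h: delta_h = o_h (1 - o_h) * sum_k w_kh * delta_k.
        delta_hidden = o_hidden * (1 - o_hidden) * (W_out[:, 1:].T @ delta_out)

        # Update each weight: w_ji <- w_ji + eta * delta_j * x_ji.
        W_out += eta * np.outer(delta_out, h)
        W_hidden += eta * np.outer(delta_hidden, x)
    return W_hidden, W_out
```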