L. Mihalkova, CSMC498F, Fall2010
Administrativia
Topics this week:
Finish up discussion of how to evaluate hypotheses
Start talking about perceptrons and neural nets
This week’s reading:
Chapter 4
Big Picture:
This is the 4th week of talking about supervised learning
After this, 3 classes on learning theory, 1 class on midterm preparation
So, midterm is in 3 weeks!
Perceptrons and Neural Nets
Neural Networks
Biologically inspired to emulate the brain
Many simple components (analogous to brain cells) work together by passing stimuli to each other to produce complex behavior
Perceptrons
The simple components that make up a neural network (sort of)
Perceptrons
Sometimes called “linear threshold functions”
Interesting as building blocks of neural nets
But also interesting in their own right
Very simple and easy to use model
Surprisingly effective in many applications
What is a Perceptron
A thresholded linear combination of the attributes
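Written out (a standard formulation, consistent with the sums used on the later slides, assuming a dummy attribute $x_0 = 1$ so that $w_0$ plays the role of the threshold):

$o(x_1, \dots, x_n) = \begin{cases} +1 & \text{if } \sum_{i=0}^{n} w_i x_i > 0 \\ -1 & \text{otherwise} \end{cases}$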
Training Perceptrons
Can use various techniques
The “Perceptron” Algorithm ✔
Gradient Descent ✔
Linear programming
The Perceptron Algo
Given:
D: data set of linearly separable examples
η: learning rate
Set each $w_i$ to a small arbitrary initial value
While the learner makes mistakes:
  For each $\langle x_1, \dots, x_n; y \rangle \in D$:
    If $y \cdot \left( \sum_{i=0}^{n} x_i w_i \right) \leq 0$:
      Set $w_i \leftarrow w_i + \eta \, y \, x_i$
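A minimal Python sketch of the algorithm above (illustrative only: the function name and data format are not from the slides; each example is assumed to be a pair (x, y) with x already containing the dummy attribute x[0] = 1 and y in {+1, -1}):

def train_perceptron(data, eta=0.1, max_epochs=1000):
    # max_epochs is only a safety cap; it is not part of the algorithm above.
    n = len(data[0][0])
    w = [0.01] * n                      # small arbitrary initial weights
    for _ in range(max_epochs):         # "while learner makes mistakes"
        mistakes = 0
        for x, y in data:
            # mistake: the prediction disagrees with y (or lies on the boundary)
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:               # a full pass with no mistakes: done
            return w
    return w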
Perceptron Algo Properties
Provided that the data are linearly separable, the perceptron algorithm converges within a finite number of iterations
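Continuing the hypothetical train_perceptron sketch above, a tiny usage example on a linearly separable concept (boolean AND); the loop stops after a few passes, once a full pass produces no mistakes:

# Boolean AND, encoded with dummy attribute x0 = 1 and labels in {+1, -1}.
and_data = [([1, 0, 0], -1),
            ([1, 0, 1], -1),
            ([1, 1, 0], -1),
            ([1, 1, 1], +1)]
w = train_perceptron(and_data, eta=0.1)
print(w)   # a weight vector that classifies all four examples correctly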
Data Not Linearly Separable
Option A: Add attributes formed as functions of the existing attributes (see the XOR example below)
Option B: Settle for good enough -- find a hypothesis that minimizes the (squared) error on the training data
$E(W) \equiv \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$
where $t_d$ is the true value of the target attribute and $o_d$ is its unthresholded predicted value
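A standard illustration of Option A (not from the slides): XOR of two boolean attributes is not linearly separable, but it becomes separable once a product attribute is added. With the extra attribute $x_3 = x_1 x_2$, the threshold test $x_1 + x_2 - 2x_3 > 0.5$ outputs exactly XOR($x_1$, $x_2$), so a perceptron over the expanded attribute set can represent it.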
Minimizing Squared Error
Will use gradient descent
Initialize W to small arbitrary values
Iterate over examples, each time
computing the gradient
moving in the direction opposite to the gradient
$w_i \leftarrow w_i - \eta \frac{\partial E}{\partial w_i}$
$\frac{\partial E}{\partial w_i} = \sum_{d \in D} (t_d - o_d)(-x_{id})$
where $x_{id}$ is the i-th attribute of the d-th example and $o_d$ is the unthresholded output
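As a concrete sketch (illustrative names, not from the slides), one batch gradient-descent step under these definitions, with the output computed as the unthresholded linear combination:

def gradient_step(w, data, eta=0.05):
    # data: list of (x, t) pairs; x includes the dummy attribute x[0] = 1.
    grad = [0.0] * len(w)                             # accumulates dE/dw_i over all of D
    for x, t in data:
        o = sum(wi * xi for wi, xi in zip(w, x))      # unthresholded output o_d
        for i, xi in enumerate(x):
            grad[i] += (t - o) * (-xi)                # dE/dw_i = sum_d (t_d - o_d)(-x_id)
    return [wi - eta * gi for wi, gi in zip(w, grad)] # move opposite to the gradient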
Stochastic Gradient Descent
Initialize W to small arbitrary values
Iterate over the examples; for each example d:
  compute the gradient of the error on d alone
  move in the direction opposite to that gradient:
  $w_i \leftarrow w_i + \eta \, (t_d - o_d) \, x_{id}$
($x_{id}$: i-th attribute of the d-th example; $o_d$: unthresholded output)
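The per-example version as a sketch, under the same illustrative conventions as the batch step above:

def sgd_epoch(w, data, eta=0.05):
    # Update the weights after every example instead of after a full pass over D.
    for x, t in data:
        o = sum(wi * xi for wi, xi in zip(w, x))      # unthresholded output for this example
        w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
    return w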
Difference
What is the difference between the perceptron and gradient descent?
Perceptron Algo: converges to a perfect hypothesis when the data are linearly separable
Gradient descent: converges asymptotically toward the minimum-error hypothesis, regardless of whether the data are linearly separable
i.e., it may need an unbounded number of iterations to reach the minimum exactly
Neural Networks
The typical NN looks something like this:
[Figure: a feed-forward network in which the inputs feed into a layer of hidden units, whose outputs feed into the output units]
Backpropagation Algo
Set-up
Each instance is a pair $\langle \bar{X}, \bar{t} \rangle$
The NN has
  $n_{in}$ inputs
  $n_{hidden}$ hidden units
  $n_{out}$ outputs
Initial weights are small arbitrary values
Backpropagation Algo
While not done
  Iterate over each $\langle \bar{X}, \bar{t} \rangle$:
    Propagate $\bar{X}$ through the network to compute the output values
    Compute the error $\delta_k$ for each output unit k
    Compute the error $\delta_h$ for each hidden unit h
    Update each weight: $w_{ji} \leftarrow w_{ji} + \eta \, \delta_j \, x_{ji}$
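A compact Python sketch of the loop above, assuming a single hidden layer and sigmoid units; the δ formulas used are the standard ones for sigmoid units from Chapter 4 (they are not spelled out on this slide), and the function names and list-of-lists weight representation are illustrative:

import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def backprop_epoch(W_hid, W_out, data, eta=0.1):
    # W_hid: n_hidden rows of length n_in + 1 (index 0 holds the bias weight)
    # W_out: n_out rows of length n_hidden + 1 (index 0 holds the bias weight)
    # data:  list of (x, t) pairs; x has n_in attributes, t has n_out targets
    for x, t in data:
        # Propagate x through the network to compute the output values.
        xb = [1.0] + list(x)
        h = [sigmoid(sum(w * xi for w, xi in zip(row, xb))) for row in W_hid]
        hb = [1.0] + h
        o = [sigmoid(sum(w * hi for w, hi in zip(row, hb))) for row in W_out]
        # Error for each output unit k: delta_k = o_k (1 - o_k)(t_k - o_k).
        delta_o = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
        # Error for each hidden unit h: delta_h = o_h (1 - o_h) sum_k w_kh delta_k.
        delta_h = [h[j] * (1 - h[j]) *
                   sum(W_out[k][j + 1] * delta_o[k] for k in range(len(W_out)))
                   for j in range(len(h))]
        # Update each weight: w_ji <- w_ji + eta * delta_j * x_ji.
        for k, row in enumerate(W_out):
            for i in range(len(row)):
                row[i] += eta * delta_o[k] * hb[i]
        for j, row in enumerate(W_hid):
            for i in range(len(row)):
                row[i] += eta * delta_h[j] * xb[i]
    return W_hid, W_out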