
Lecture 13 – Perceptrons

Machine Learning


Last Time

• Hidden Markov Models – sequential modeling represented in a graphical model


Today

• Perceptrons
• Leading to:
  – Neural Networks
  – a.k.a. Multilayer Perceptron Networks
  – But more accurately: Multilayer Logistic Regression Networks


Review: Fitting Polynomial Functions

• Fitting nonlinear 1-D functions
• Polynomial: f(x_i) = θ_0 + θ_1 x_i + θ_2 x_i² + … + θ_D x_i^D
• Risk: R(θ) = (1/N) Σ_i (y_i − f(x_i))²


Review: Fitting Polynomial Functions

• Order-D polynomial regression for a 1-D variable is the same as linear regression with a (D+1)-dimensional feature vector (including the constant term)

• Extend the feature vector from a scalar to a vector of powers: x → (1, x, x², …, x^D)

• More generally, use arbitrary basis functions: f(x) = Σ_d θ_d ϕ_d(x)
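A minimal sketch of this equivalence in code (assuming the squared-error risk above; the data and names here are illustrative):

```python
import numpy as np

def poly_features(x, D):
    """Map a scalar (or array of scalars) to (1, x, x^2, ..., x^D)."""
    x = np.asarray(x, dtype=float)
    return np.stack([x**d for d in range(D + 1)], axis=-1)

# Illustrative 1-D data
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = np.sin(np.pi * x) + 0.1 * rng.normal(size=50)

# Order-D polynomial fitting == linear least squares on the expanded features
D = 5
Phi = poly_features(x, D)              # N x (D+1) design matrix
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)

y_hat = poly_features(0.3, D) @ theta  # prediction at a new input
```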


Neuron inspires Regression

• Graphical representation of linear regression
  – The McCulloch-Pitts neuron
  – Note: not a graphical model


Neuron inspires Regression

• Edges multiply the signal (x_i) by a weight (θ_i)
• Nodes sum their inputs
• Equivalent to linear regression: y = Σ_i θ_i x_i
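A minimal sketch of this neuron as code (names are illustrative):

```python
import numpy as np

def linear_neuron(x, theta):
    """Weighted edges feed a single summing node: y = theta . x."""
    return np.dot(theta, x)
```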


Introducing Basis functions

• Graphical representation of feature extraction

• Edges multiply the signal by a weight
• Nodes apply a basis function, ϕ_d


Extension to more features

• Graphical representation of feature extraction


Combining function

• How do we construct the neural output?

Linear neuron: y = θᵀx


Combining function

• Sigmoid function (or squashing function): σ(z) = 1 / (1 + e^(−z))

Logistic neuron: y = σ(θᵀx). Classification using the same metaphor.
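A sketch of the logistic neuron under these definitions (names are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Squashing function: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_neuron(x, theta):
    """The same weighted-sum node as the linear neuron, squashed by a sigmoid."""
    return sigmoid(np.dot(theta, x))
```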


Logistic Neuron optimization

• Minimizing R(θ) is more difficult

Bad News: There is no “closed-form” solution.

Good News: It’s convex.
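For concreteness, a standard form of this risk, assuming the negative log-likelihood (cross-entropy) loss with labels y_i ∈ {0, 1}, together with its gradient:

```latex
R(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i \log \sigma(\theta^\top x_i)
          + (1 - y_i)\log\big(1 - \sigma(\theta^\top x_i)\big) \Big]
\qquad
\nabla_\theta R(\theta) = \frac{1}{N}\sum_{i=1}^{N}\big(\sigma(\theta^\top x_i) - y_i\big)\, x_i
```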


Aside: Convex Regions

• Convex: for any pair of points x_a and x_b within a region, every point x_c on the line segment between x_a and x_b is also in the region


Aside: Convex Functions

• Convex: for any pair of points x_a and x_b, the function lies on or below the chord between them: f(λx_a + (1 − λ)x_b) ≤ λf(x_a) + (1 − λ)f(x_b) for all λ ∈ [0, 1]
  – For example, f(x) = x² is convex; f(x) = sin(x) is not


Aside: Convex Functions

• Convex functions have a single global minimum: there are no local minima to get stuck in!

• How does this help us?
• (nearly) Guaranteed optimality of Gradient Descent


Gradient Descent

• The gradient ∇_θ R(θ) is defined (though we can’t solve ∇_θ R(θ) = 0 directly)

• It points in the direction of fastest increase


Gradient Descent

• The gradient points in the direction of fastest increase

• To minimize R, move in the opposite direction: θ ← θ − η ∇_θ R(θ), where η is the step size


Gradient Descent

• Initialize randomly
• Update with small steps
• (nearly) guaranteed to converge to the minimum
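A minimal batch gradient-descent sketch for the logistic risk above (assuming the cross-entropy gradient given earlier; η and the step count are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, eta=0.1, n_steps=1000):
    """Batch gradient descent on the logistic-regression risk.

    X: N x D feature matrix; y: length-N labels in {0, 1}.
    """
    rng = np.random.default_rng(0)
    theta = rng.normal(scale=0.01, size=X.shape[1])     # initialize randomly
    for _ in range(n_steps):
        grad = X.T @ (sigmoid(X @ theta) - y) / len(y)  # gradient of R(theta)
        theta -= eta * grad                             # small step downhill
    return theta
```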


Gradient Descent

• Can oscillate if the step size η is too large


• Can stall if the gradient is ever 0 away from the minimum


Back to Neurons

Linear neuron: y = θᵀx

Logistic neuron: y = σ(θᵀx) = 1 / (1 + e^(−θᵀx))


Perceptron

• Classification squashing function: a hard threshold, y = sign(θᵀx)
• Strictly classification error


Classification Error

• Only count errors when a classification is incorrect.

The sigmoid gives greater-than-zero error even on correct classifications.


Classification Error

• Only count errors when a classification is incorrect.

Perceptrons use strictly classification error.


Perceptron Error

• We can’t do gradient descent on this: the classification error is a step function of θ, so its gradient is zero almost everywhere.

Perceptrons use strictly classification error.


Perceptron Loss

• With 0–1 classification loss, the risk is a step function of θ.
• Define the Perceptron Loss instead:
  – Loss is calculated only for each misclassified data point
  – The risk is now piecewise linear rather than a step function


Perceptron Loss

• Perceptron Loss:
  – Loss is calculated for each misclassified data point
  – It has a well-defined gradient, so gradient descent behaves nicely (see the loss and gradient below)
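One standard way to write this loss and its gradient, assuming labels y_i ∈ {−1, +1} and writing M for the set of misclassified points:

```latex
L(\theta) = -\sum_{i \in M} y_i\, \theta^\top x_i,
\qquad
\nabla_\theta L(\theta) = -\sum_{i \in M} y_i\, x_i
```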


Perceptron vs. Logistic Regression

• Logistic regression has a hard time eliminating all errors
  – Errors in the example: 2

• Perceptrons often do better here, since only the errors contribute to the loss
  – Errors in the example: 0


Stochastic Gradient Descent

• Instead of calculating the gradient over all points and then updating,
• update the weights for each misclassified point as it is encountered during training.

• Update rule: θ ← θ + η y_i x_i for each misclassified point x_i


Online Perceptron Training

• Online Training – Update weights for each data point.

• Iterate over the points x_i:
  – If x_i is correctly classified, leave θ unchanged
  – Else, update θ ← θ + η y_i x_i

• Theorem: if the x_i in X are linearly separable, then this process converges in a finite number of steps to a θ* that achieves zero error. (A sketch of the loop follows.)
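A sketch of this online training loop (assuming labels in {−1, +1}; η, the data, and the epoch cap are illustrative):

```python
import numpy as np

def perceptron_train(X, y, eta=1.0, max_epochs=100):
    """Online perceptron training: update on each misclassified point.

    X: N x D feature matrix; y: length-N labels in {-1, +1}.
    Reaches zero training error if the data are linearly separable.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            if y_i * (theta @ x_i) <= 0:   # misclassified (or on the boundary)
                theta += eta * y_i * x_i   # perceptron update rule
                errors += 1
        if errors == 0:                    # converged: every point is correct
            break
    return theta
```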


Linearly Separable

• Two classes of points are linearly separable iff there exists a line (a hyperplane, in higher dimensions) such that all the points of one class fall on one side of it and all the points of the other class fall on the other side
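Formally, a standard statement with labels y_i ∈ {−1, +1}:

```latex
\exists\, \theta, b \;:\; y_i \,(\theta^\top x_i + b) > 0 \quad \forall i
```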



Next

• Multilayer Neural Networks