1/31/08 CS 461, Winter 2009 1
CS 461: Machine Learning, Lecture 4
Dr. Kiri [email protected]
Plan for Today
- Solution to HW 2
- Support Vector Machines
- Neural Networks
  - Perceptrons
  - Multilayer Perceptrons
Review from Lecture 3
- Decision trees
  - Regression trees, pruning, extracting rules
- Evaluation
  - Comparing two classifiers: McNemar’s test
- Support Vector Machines
  - Classification: linear discriminants, maximum margin
  - Learning (optimization): gradient descent, QP
Neural Networks
Chapter 11
It Is Pitch Dark
Perceptron
[Alpaydin 2004 The MIT Press]
y = \sum_{j=1}^{d} w_j x_j + w_0 = \mathbf{w}^T \mathbf{x}

where \mathbf{w} = [w_0, w_1, \ldots, w_d]^T and \mathbf{x} = [1, x_1, \ldots, x_d]^T
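As a minimal sketch, the perceptron output above is a single dot product once the bias w_0 is folded in by prepending a constant 1 to the input (the weights and input here are made-up values for illustration):

```python
import numpy as np

# Hypothetical weights and input; the bias w0 is handled by
# prepending a constant 1 to x, matching the equation above.
w = np.array([0.5, -1.0, 2.0])   # [w0, w1, w2]
x = np.array([1.0, 3.0, 0.5])    # [1,  x1, x2]

y = w @ x   # y = sum_j w_j x_j + w_0 = w^T x  ->  -1.5
```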
“Smooth” Output: Sigmoid Function
y = \mathrm{sigmoid}(\mathbf{w}^T \mathbf{x}) = \frac{1}{1 + \exp(-\mathbf{w}^T \mathbf{x})}

Two equivalent decision rules:
1. Calculate g(\mathbf{x}) = \mathbf{w}^T \mathbf{x} and choose C_1 if g(\mathbf{x}) > 0, or
2. Calculate y = \mathrm{sigmoid}(\mathbf{w}^T \mathbf{x}) and choose C_1 if y > 0.5
Why?
- Converts output to a probability
- Less “brittle” boundary
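A short sketch of the two equivalent rules, with made-up weights (not from the slides); the threshold g > 0 and the threshold y > 0.5 always pick the same class:

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid: squashes w^T x into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical weights/input, bias folded in as x0 = 1
w = np.array([-1.5, 1.0, 1.0])
x = np.array([1.0, 1.0, 1.0])

g = w @ x        # rule 1: g(x) = w^T x, choose C1 if g > 0
y = sigmoid(g)   # rule 2: choose C1 if y > 0.5
assert (g > 0) == (y > 0.5)   # the two rules agree
```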
K Outputs

Regression:
y_i = \sum_{j=1}^{d} w_{ij} x_j + w_{i0} = \mathbf{w}_i^T \mathbf{x}, \quad \text{i.e.,} \quad \mathbf{y} = W \mathbf{x}

Classification (softmax):
o_i = \mathbf{w}_i^T \mathbf{x}, \quad y_i = \frac{\exp o_i}{\sum_k \exp o_k}, \quad \text{choose } C_i \text{ if } y_i = \max_k y_k

[Alpaydin 2004 The MIT Press]
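The softmax classification rule can be sketched as follows; the weight matrix W and input x are hypothetical values chosen only to illustrate the computation:

```python
import numpy as np

def softmax(o):
    """Turn K linear outputs o_i = w_i^T x into probabilities."""
    e = np.exp(o - o.max())    # subtract max for numerical stability
    return e / e.sum()

# Hypothetical weights: K=3 classes, bias column first
W = np.array([[ 0.2,  1.0, -0.5],
              [ 0.0, -0.3,  0.8],
              [-0.1,  0.4,  0.4]])
x = np.array([1.0, 0.5, 2.0])   # [1, x1, x2]

o = W @ x                   # one linear score per class
y = softmax(o)              # y_i = exp(o_i) / sum_k exp(o_k)
chosen = int(np.argmax(y))  # choose C_i with the largest y_i
```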
Training a Neural Network
1. Randomly initialize weights
2. Update = learning rate × (desired − actual) × input

\Delta w_j^t = \eta \left( y^t - \hat{y}^t \right) x_j^t
Learning Boolean AND
[Alpaydin 2004 The MIT Press]
\Delta w_j^t = \eta \left( y^t - \hat{y}^t \right) x_j^t
Perceptron demo
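A minimal sketch of this two-step recipe on the Boolean AND problem; the learning rate, epoch cap, and random seed are arbitrary choices, not from the slides. AND is linearly separable, so the perceptron update converges:

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, max_epochs=100):
    """Perceptron training: w_j += eta * (y - y_hat) * x_j.

    Rows of X already include the bias input x0 = 1.
    """
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])   # 1. random init
    for _ in range(max_epochs):
        mistakes = 0
        for xt, yt in zip(X, y):
            y_hat = 1.0 if w @ xt > 0 else 0.0    # thresholded output
            if y_hat != yt:
                w += eta * (yt - y_hat) * xt      # 2. update rule
                mistakes += 1
        if mistakes == 0:                         # converged
            break
    return w

# Boolean AND: only input (1, 1) maps to output 1
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], float)
y = np.array([0, 0, 0, 1], float)
w = train_perceptron(X, y)
preds = (X @ w > 0).astype(float)
```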
Multilayer Perceptrons = MLP = ANN
[Alpaydin 2004 The MIT Press]
y_i = \mathbf{v}_i^T \mathbf{z} = \sum_{h=1}^{H} v_{ih} z_h + v_{i0}

z_h = \mathrm{sigmoid}(\mathbf{w}_h^T \mathbf{x}) = \frac{1}{1 + \exp\left[ -\left( \sum_{j=1}^{d} w_{hj} x_j + w_{h0} \right) \right]}
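The two equations above amount to a forward pass through one hidden layer; this sketch uses made-up sizes and random weights just to show the data flow:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, W, V):
    """Two-layer MLP forward pass.

    W: (H, d+1) hidden-layer weights, bias in column 0
    V: (K, H+1) output-layer weights, bias in column 0
    """
    z = sigmoid(W @ np.concatenate(([1.0], x)))   # z_h = sigmoid(w_h^T x)
    return V @ np.concatenate(([1.0], z))         # y_i = v_i^T z

# Tiny hypothetical network: d=2 inputs, H=3 hidden units, K=1 output
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))
V = rng.normal(size=(1, 4))
y = mlp_forward(np.array([0.5, -1.0]), W, V)
```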
x1 XOR x2 = (x1 AND ~x2) OR (~x1 AND x2)
[Alpaydin 2004 The MIT Press]
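The decomposition above can be wired directly into a two-hidden-unit network. These weights are hand-picked to realize the two AND terms and the final OR (they are not learned, and a hard threshold stands in for the sigmoid for clarity):

```python
import numpy as np

def step(a):
    """Hard threshold used in place of sigmoid for clarity."""
    return (np.asarray(a) > 0).astype(float)

# Hand-picked weights; bias is in column 0.
W = np.array([[-0.5,  1.0, -1.0],    # hidden z1 = x1 AND ~x2
              [-0.5, -1.0,  1.0]])   # hidden z2 = ~x1 AND x2
v = np.array([-0.5, 1.0, 1.0])       # output y  = z1 OR z2

def xor(x1, x2):
    z = step(W @ np.array([1.0, x1, x2]))
    return float(step(v @ np.concatenate(([1.0], z))))
```

No single perceptron can produce this truth table, which is why the hidden layer is needed.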
Examples
- Digit Recognition
- Ball Balancing
ANN vs. SVM
- SVM with sigmoid kernel = 2-layer MLP
- Parameters
  - ANN: # hidden layers, # nodes
  - SVM: kernel, kernel params, C
- Optimization
  - ANN: local minimum (gradient descent)
  - SVM: global minimum (QP)
- Interpretability? About the same… so why SVMs?
  - Sparse solution, geometric interpretation, less likely to overfit the data
Summary: Key Points for Today
- Support Vector Machines
- Neural Networks
  - Perceptrons
  - Sigmoid output
  - Training by gradient descent
  - Multilayer Perceptrons
- ANN vs. SVM
Next Time
Midterm Exam!
- 9:10 – 10:40 a.m.
- Open book, open notes (no computer)
- Covers all material through today

Neural Networks (read Ch. 11.1–11.8)
- Questions to answer from the reading
- Posted on the website (calendar)
- Three volunteers?