Lecture: Computer Vision and Machine Learning
Shubhang Desai, Ranjay Krishna, and Juan Carlos Niebles
Stanford Vision and Learning Lab, Stanford University
04-Dec-2018
Today's agenda
• Review of convolutions and classification
• Creating a convolution-based classifier
• Overview of machine learning
  – Neural networks
  – Gradient descent
  – Backprop
• Our classifier's performance
Recall convolutions…

Convolving the 3x3 image

12  3 19
25 10  1
 9  7 17

with the 2x2 filter

1 2
3 4

slides the filter over the image and, at each position, sums the element-wise products. The result is the 2x2 output

133  75
100 101

$f[n,m] * h[n,m]$
Recall convolutions…

When the filter is the same size as the image, the output is a single value. Convolving the 2x2 image

12 21
18 31

with the 2x2 filter

1 2
3 4

gives 12·1 + 21·2 + 18·3 + 31·4 = 232.

$f[n,m] * h[n,m]$
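As a quick sanity check, here is a minimal NumPy sketch of this sliding-window computation (my own helper, not from the slides; it follows the slides' convention of not flipping the filter) and it reproduces the 133 / 75 / 100 / 101 output above:

import numpy as np

def cross_correlate2d(image, filt):
    # Slide the filter over every valid position and sum the element-wise products.
    H, W = image.shape
    h, w = filt.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * filt)
    return out

image = np.array([[12, 3, 19], [25, 10, 1], [9, 7, 17]])
filt = np.array([[1, 2], [3, 4]])
print(cross_correlate2d(image, filt))   # [[133.  75.]
                                        #  [100. 101.]]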
Why they are useful

Allow us to find interesting insights/features from images! For example, convolving an image with the filter

0 -½ 0
0  0 0
0  ½ 0

responds to changes in intensity from one row to the next, i.e. it acts as a simple gradient (edge) filter.

[Figure: input image and filtered output omitted]
Recall Image Classification…

Allow us to use features to put images in categories!

Image → Featurizer → Classifier → "pupper"
Wait a Minute…

Convolution = Image -> Features
Classification algorithm = Features -> Category

Let's put 'em together!
In Specific…

Let's build a convolution-based classification algorithm for the CIFAR-10 dataset (10 classes, 32x32 images):
Feature Extractor

Image (32x32) * 32x32 "Airplane Filter" = "probability" of the image being an airplane

This is not really a probability but a score, because it can be less than 0 and greater than 1.
Similarly, one filter per class:

Image (32x32) * 32x32 "Automobile Filter" = "probability" of the image being an automobile
Image (32x32) * 32x32 "Bird Filter" = "probability" of the image being a bird
Image (32x32) * 32x32 "Truck Filter" = "probability" of the image being a truck
Classifier

We predict the class that has the highest probability!

$c_{pred} = \arg\max(\hat{y})$
The Whole Shebang

Image → Feature Extractor → Prediction $\hat{y}$ → Classifier ($\arg\max$) → $c_{pred}$ → Classification Output
Reframing convolution

Convolving an image with a same-size filter is just a dot product between the flattened image and the flattened filter:

12 21     1 2
18 31  *  3 4   =   [12 21 18 31] · [1 2 3 4]   =   232
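A tiny NumPy check of this reframing (not from the slides; variable names are mine):

import numpy as np

image = np.array([[12, 21], [18, 31]])
filt = np.array([[1, 2], [3, 4]])

# Same-size "convolution" (no flip) is a dot product of the flattened arrays.
score = np.dot(image.flatten(), filt.flatten())
print(score)   # 232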
Reframed Feature Extractor

Image (32x32) * 32x32 "Airplane Filter" = Image Vector · Airplane Weight Vector = "probability" of the image being an airplane
Image Vector · Automobile Weight Vector = "probability" of the image being an automobile
Image Vector · Bird Weight Vector = "probability" of the image being a bird
Image Vector · Truck Weight Vector = "probability" of the image being a truck
New Feature Extractor

Weight Matrix × Image Vector = vector of class scores

$W x = \hat{y}$

$W$: the (10x1024) matrix of weight vectors
$x$: the (1024x1) image vector
$\hat{y}$: the (10x1) vector of class "probabilities"

This simple computation is called a fully-connected layer!
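For concreteness, a minimal sketch of this fully-connected layer in NumPy (random weights and a random image vector, used only for shape-checking; none of these values come from the slides):

import numpy as np

num_classes, num_pixels = 10, 1024                     # 32x32 image, flattened
W = np.random.randn(num_classes, num_pixels) * 0.01    # (10, 1024) weight matrix
x = np.random.rand(num_pixels)                         # (1024,) image vector

y_hat = W @ x          # (10,) vector of class scores ("probabilities")
print(y_hat.shape)     # (10,)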
Aside: Fully-Connected Neural Networks

[Figure, built up across several slides: a small fully-connected network. Inputs $x$ = (Height, Width, Num_Legs), outputs $\hat{y}$ = (Dog, Not Dog), with each input connected to each output by a weight $w_1, \dots, w_6$.]
Stacking the weights into a matrix gives the same computation as before:

$\begin{pmatrix} w_1 & w_2 & w_3 \\ w_4 & w_5 & w_6 \end{pmatrix} x = \hat{y}$, i.e. $W x = \hat{y}$

"Fully-connected": every node in one layer is connected to every node in the next.
"Neural network": kinda looks like a neuron!
New Feature Extractor

$W x = \hat{y}$, with $W$ the (10x1024) matrix of weight vectors, $x$ the (1024x1) image vector, and $\hat{y}$ the (10x1) vector of class "probabilities"… ?
Class Probability Vector

• Must have values between 0 and 1
• Must sum to 1
• There's no guarantee either requirement is satisfied by $\hat{y} = W x$!
Softmax Function

$\mathrm{SM}(a)_i = \dfrac{e^{a_i}}{\sum_j e^{a_j}}$

Example: $a = (1, -3)$ → $\mathrm{SM}(a) = (0.98, 0.02)$
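A quick NumPy sketch of softmax (my own helper, checked against the slide's example):

import numpy as np

def softmax(a):
    # Subtract the max for numerical stability; exponentiate and normalize to sum to 1.
    e = np.exp(a - np.max(a))
    return e / e.sum()

print(softmax(np.array([1.0, -3.0])))   # ≈ [0.982 0.018], i.e. the 0.98 / 0.02 above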
Class Probability Vector

• Must have values between 0 and 1
• Must sum to 1

Applying softmax satisfies both requirements: $\hat{y} = \mathrm{SM}(W x)$
System so far…

• Feature extractor: $\hat{y} = \mathrm{SM}(W x)$
• Classifier: $c_{pred} = \arg\max(\hat{y})$
Using the label

Let's compare our prediction with the real answer! For each image $x$, we have the label $y$, which tells us the true class as a one-hot vector:

$y = (0, 0, 0, 0, 0, 1, 0, 0, 0, 0)$   ← the 1 sits at the dog class index
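A small sketch of building such a one-hot label vector (my own snippet; index 5 is taken from the slide's picture, where the 1 appears at the dog class's position):

import numpy as np

def one_hot(class_index, num_classes=10):
    # Vector of zeros with a single 1 at the true class's index.
    y = np.zeros(num_classes)
    y[class_index] = 1.0
    return y

print(one_hot(5))   # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]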
Key Insight:

We want:
$\arg\max(\hat{y}) = \arg\max(y)$

Which we can accomplish by:
$W^* = \arg\min_W \sum_{(x, y)} -\log(p_c)$

where $p_c$ is the probability of the true class in $\hat{y}$.
Cross-Entropy Loss

Our loss function represents how badly we are currently doing:

$L = -\log(p_c)$

Examples:
$p_c = 0 \;\rightarrow\; L = -\log 0 = \infty$
$p_c = 0.1 \;\rightarrow\; L = -\log 0.1 = 2.3$
$p_c = 0.9 \;\rightarrow\; L = -\log 0.9 = 0.1$
$p_c = 1 \;\rightarrow\; L = -\log 1 = 0$

The larger the loss, the worse our prediction. We want to minimize L!
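A one-line NumPy check of this loss on the slide's example probabilities (my own snippet):

import numpy as np

p_true = np.array([0.1, 0.9, 1.0])   # probability assigned to the true class
print(-np.log(p_true))               # ≈ [2.303 0.105 0.], matching the examples above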
Minimizing Loss

[Figure, built up across several slides: the loss $L$ plotted as a function of a weight $w$, with successive steps moving down the curve toward its minimum.]
Gradient Descent Pseudocode

for i in {0, …, num_epochs}:
    for x, y in data:
        ŷ = SM(Wx)
        L = CE(ŷ, y)
        dL/dW = ???
        W := W − α · dL/dW
Getting the Gradient

$z = W x$
$L = \mathrm{SCE}(z, y)$

$\dfrac{dL}{dW} = \dfrac{dL}{dz}\,\dfrac{dz}{dW} = (\mathrm{SM}(z) - y)\,(x^T)$
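A small NumPy sketch that evaluates this gradient formula on random data (shapes and values are invented; SCE here means softmax cross-entropy with a one-hot label y):

import numpy as np

np.random.seed(0)
W = np.random.randn(10, 1024) * 0.01     # weight matrix
x = np.random.rand(1024)                 # image vector
y = np.zeros(10); y[3] = 1.0             # one-hot true label (index 3 is arbitrary)

z = W @ x                                # class scores
p = np.exp(z - z.max()); p /= p.sum()    # SM(z)

dL_dW = np.outer(p - y, x)               # (SM(z) - y) x^T, shape (10, 1024)
print(dL_dW.shape)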
What is Backprop?

[Computation graph: $W$ and $x$ feed a × node producing $z$; $z$ and the label $y$ feed an SCE node producing $L$.]

$z = W x$
$L = \mathrm{SCE}(z, y)$

$\dfrac{dL}{dW} = \dfrac{dL}{dz}\,\dfrac{dz}{dW}$, with $\dfrac{dL}{dz} = \mathrm{SM}(z) - y$ and $\dfrac{dz}{dW} = x$

When computations are treated as nodes, all derivatives depend only on inputs to that node. So we can cache the inputs during the forward computation and reuse them for the backward pass:

cacheX = {x}
cacheSCE = {z, y}
Backprop-Friendly Code

import numpy as np

def softmax(z):
    # NumPy stand-in for the slide's tf.sm placeholder.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sce(z, y):
    # Softmax cross-entropy; NumPy stand-in for the slide's tf.sce placeholder.
    return -np.sum(y * np.log(softmax(z)))

class FullyConnected:
    def __init__(self):
        self.cache = {}

    def forward(self, W, x):
        # Cache the input so backward() can reuse it.
        self.cache['x'] = x
        return np.dot(W, x)

    def backward(self, dout):
        # dL/dW = dout x^T
        x = self.cache['x']
        return np.outer(dout, x)

class SCELoss:
    def __init__(self):
        self.cache = {}

    def forward(self, z, y):
        self.cache['z'] = z
        self.cache['y'] = y
        return sce(z, y)

    def backward(self):
        # dL/dz = SM(z) - y
        z = self.cache['z']
        y = self.cache['y']
        return softmax(z) - y
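For example, one forward/backward pass with these classes might look like this (a sketch; shapes and data are invented, and it assumes the FullyConnected and SCELoss classes defined above):

import numpy as np

fc, loss_fn = FullyConnected(), SCELoss()

W = np.random.randn(10, 1024) * 0.01
x = np.random.rand(1024)
y = np.zeros(10); y[5] = 1.0    # one-hot label

z = fc.forward(W, x)            # forward pass: class scores
L = loss_fn.forward(z, y)       # forward pass: loss value
dL_dz = loss_fn.backward()      # backward pass: dL/dz = SM(z) - y
dL_dW = fc.backward(dL_dz)      # backward pass: dL/dW, shape (10, 1024)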
Gradient Descent Pseudocode (Updated)

for i in {0, …, num_epochs}:
    for x, y in data:
        ŷ = SM(Wx)
        L = CE(ŷ, y)
        dL/dW = backprop(L)
        W := W − α · dL/dW
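Putting it together, a minimal self-contained NumPy training loop in the spirit of this pseudocode (random data standing in for CIFAR-10, hyperparameters picked arbitrarily):

import numpy as np

np.random.seed(0)
num_classes, num_pixels = 10, 1024
num_epochs, alpha = 5, 0.1

# Fake dataset: 100 random "images" with random one-hot labels (stand-in for CIFAR-10).
data = []
for _ in range(100):
    x = np.random.rand(num_pixels)
    y = np.zeros(num_classes); y[np.random.randint(num_classes)] = 1.0
    data.append((x, y))

W = np.random.randn(num_classes, num_pixels) * 0.01

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

for epoch in range(num_epochs):
    total_loss = 0.0
    for x, y in data:
        y_hat = softmax(W @ x)                       # ŷ = SM(Wx)
        total_loss += -np.log(y_hat[y.argmax()])     # L = CE(ŷ, y) = -log(p_c)
        dL_dW = np.outer(y_hat - y, x)               # backprop: (SM(Wx) - y) x^T
        W -= alpha * dL_dW                           # W := W - α dL/dW
    print(epoch, total_loss / len(data))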
Our Classification System

Image → Feature Extractor → Prediction $\hat{y}$ → Classifier ($\arg\max$) → $c_{pred}$ → Classification Output
Our Classification System (modified)

Input Image → Feature Extractor → Prediction $\hat{y}$ → Classifier ($\arg\max$) → $c_{pred}$ → Classification Output

In addition, the prediction $\hat{y}$ and the input label $y$ go into the loss function ($CE$) to produce the loss value $L$, and we learn the feature extractor's weights by minimizing $L$ using gradient descent!
Our System's Performance

• ~40% accuracy on the CIFAR-10 test set
  – Best class: Truck (~60%)
  – Worst class: Horse (~16%)
• Check out the model at: https://tinyurl.com/cifar10
• What about the filters? What do they look like?
Visualizing the Filters

[Figure omitted: visualizations of the learned filters.]
Next Time…

• Building a stronger convolution-based feature extractor
• History of deep learning + computer vision (Convolutional Neural Nets!)
• Applications of CNNs