8/13/2019 Jul09 Hinton Deeplearn
UCL Tutorial on: Deep Belief Nets
(An updated and extended version of my 2007 NIPS tutorial)

Geoffrey Hinton
Canadian Institute for Advanced Research
&
Department of Computer Science
University of Toronto
Schedule for the Tutorial

* 2:00 - 3:30 Tutorial part 1
* 3:30 - 3:45 Questions
* 3:45 - 4:15 Tea Break
* 4:15 - 5:45 Tutorial part 2
* 5:45 - 6:00 Questions
Some things you will learn in this tutorial

* How to learn multi-layer generative models of unlabelled data by learning one layer of features at a time.
- How to add Markov Random Fields in each hidden layer.
* How to use generative models to make discriminative training methods work much better for classification and regression.
- How to extend this approach to Gaussian Processes, and how to learn complex, domain-specific kernels for a Gaussian Process.
* How to perform non-linear dimensionality reduction on very large datasets.
- How to learn binary, low-dimensional codes and how to use them for very fast document retrieval.
* How to learn multilayer generative models of high-dimensional sequential data.
A spectrum of machine learning tasks

Typical Statistics:
* Low-dimensional data (e.g. less than 100 dimensions).
* Lots of noise in the data.
* There is not much structure in the data, and what structure there is can be represented by a fairly simple model.
* The main problem is distinguishing true structure from noise.

Artificial Intelligence:
* High-dimensional data (e.g. more than 100 dimensions).
* The noise is not sufficient to obscure the structure in the data if we process it right.
* There is a huge amount of structure in the data, but the structure is too complicated to be represented by a simple model.
* The main problem is figuring out a way to represent the complicated structure so that it can be learned.
Historical background: first generation neural networks

* Perceptrons (~1960) tried to recognize objects by learning how to weight a layer of hand-coded features.
- There was a neat learning algorithm for adjusting the weights.
- But perceptrons are fundamentally limited in what they can learn to do.
[Figure: sketch of a typical perceptron from the 1960's - output units (e.g. class labels) fed by non-adaptive, hand-coded features computed from input units (e.g. pixels); example stimuli labelled "Bomb" and "Toy".]
Second generation neural networks (~1985)
A temporary digression

* Vapnik and his co-workers developed a very clever type of perceptron called a Support Vector Machine.
- Instead of hand-coding the layer of non-adaptive features, each training example is used to create a new feature using a fixed recipe.
* The feature computes how similar a test example is to that training example.
- Then a clever optimization technique is used to select the best subset of the features and to decide how to weight each feature when classifying a test case.
* But it's just a perceptron and has all the same limitations.
* In the 1990's, many researchers abandoned neural networks with multiple adaptive hidden layers because Support Vector Machines worked better.
Overcoming the limitations of back-propagation

* Keep the efficiency and simplicity of using a gradient method for adjusting the weights, but use it for modeling the structure of the sensory input.
- Adjust the weights to maximize the probability that a generative model would have produced the sensory input.
- Learn p(image), not p(label | image).
* If you want to do computer vision, first learn computer graphics.
* What kind of generative model should we learn?
Belief Nets

* A belief net is a directed acyclic graph composed of stochastic variables.
* We get to observe some of the variables, and we would like to solve two problems:
* The inference problem: infer the states of the unobserved variables.
* The learning problem: adjust the interactions between variables to make the network more likely to generate the observed data.
[Figure: stochastic hidden causes above visible effects.]
We will use nets composed of layers of stochastic binary variables with weighted connections. Later, we will generalize to other types of variable.
Stochastic binary units
(Bernoulli variables)

* These have a state of 1 or 0.
* The probability of turning on is determined by the weighted input from other units (plus a bias):

p(s_i = 1) = 1 / (1 + exp(-b_i - Σ_j s_j w_ji))

[Figure: the logistic curve, with p(s_i = 1) rising from 0 to 1 as the total input b_i + Σ_j s_j w_ji increases.]
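This unit is easy to simulate. The following NumPy sketch (not from the tutorial; the layer sizes, weights, and biases are made up for illustration) samples a layer of stochastic binary units from the logistic probability above:

```python
import numpy as np

def logistic(x):
    # p(s_i = 1) = 1 / (1 + exp(-b_i - sum_j s_j w_ji))
    return 1.0 / (1.0 + np.exp(-x))

def sample_binary_units(s_parents, w, b, rng):
    """Sample a layer of stochastic binary (Bernoulli) units given the
    states of the units feeding into them."""
    p_on = logistic(b + s_parents @ w)            # one probability per unit
    states = rng.random(p_on.shape) < p_on        # turn on with that probability
    return states.astype(np.int8), p_on

rng = np.random.default_rng(0)
s_parents = np.array([1.0, 0.0, 1.0])             # states of three input units
w = rng.normal(size=(3, 4))                       # weights to four output units
b = np.zeros(4)                                   # biases
states, p_on = sample_binary_units(s_parents, w, b, rng)
```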
Learning Deep Belief Nets

* It is easy to generate an unbiased example at the leaf nodes, so we can see what kinds of data the network believes in.
* It is hard to infer the posterior distribution over all possible configurations of hidden causes.
* It is hard to even get a sample from the posterior.
* So how can we learn deep belief nets that have millions of parameters?
[Figure: stochastic hidden causes above visible effects.]
The learning rule for sigmoid belief nets

* Learning is easy if we can get an unbiased sample from the posterior distribution over hidden states given the observed data.
* For each unit, maximize the log probability that its binary state in the sample from the posterior would be generated by the sampled binary states of its parents.

p_i ≡ p(s_i = 1) = 1 / (1 + exp(-Σ_j s_j w_ji))

Δw_ji = ε s_j (s_i - p_i)

where ε is the learning rate, s_j is the sampled state of a parent, and s_i is the sampled state of the unit.
[Figure: parent unit j connected to unit i by weight w_ji.]
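As a sanity check, the delta rule Δw_ji = ε s_j (s_i - p_i) can be simulated for a single unit: repeatedly applying it with a fixed parent configuration should push p_i toward the observed child state. A minimal sketch (the sizes, learning rate, and values are illustrative, not from the slides):

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def sbn_update(s_parents, s_child, w, lr=0.5):
    """One maximum-likelihood update for a sigmoid-belief-net unit, given
    sampled binary states: delta_w_ji = lr * s_j * (s_i - p_i)."""
    p_i = logistic(s_parents @ w)
    return w + lr * s_parents * (s_child - p_i)

s_parents = np.array([1.0, 1.0, 0.0])   # sampled parent states
w = np.zeros(3)
for _ in range(200):                    # the child is observed "on" every time
    w = sbn_update(s_parents, 1.0, w)
p_final = logistic(s_parents @ w)
```

After enough updates, p_final approaches 1: the unit learns to predict the state its parents keep generating. The weight from the "off" parent never changes, since its s_j is zero.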
Explaining away (Judea Pearl)

* Even if two hidden causes are independent, they can become dependent when we observe an effect that they can both influence.
- If we learn that there was an earthquake, it reduces the probability that the house jumped because of a truck.
[Figure: "truck hits house" and "earthquake", each with bias -10, both feed "house jumps" (bias -20) through weights of +20.]
Posterior, given that the house jumped:
p(1,1) = .0001, p(1,0) = .4999, p(0,1) = .4999, p(0,0) = .0001
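The numbers on this slide can be reproduced directly. The sketch below uses the biases and weights from the diagram (-10 on each cause, -20 on the effect, +20 connections), enumerates the four cause configurations, and normalizes; almost all the posterior mass lands on the two single-cause explanations:

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

b_truck, b_quake, b_house, w = -10.0, -10.0, -20.0, 20.0

def joint(truck, quake):
    """p(truck, quake, house jumps = 1) under the logistic model."""
    p_t = logistic(b_truck) if truck else 1.0 - logistic(b_truck)
    p_q = logistic(b_quake) if quake else 1.0 - logistic(b_quake)
    p_h = logistic(b_house + w * truck + w * quake)
    return p_t * p_q * p_h

# Posterior over the two causes, given that the house jumped
z = sum(joint(t, q) for t in (0, 1) for q in (0, 1))
posterior = {(t, q): joint(t, q) / z for t in (0, 1) for q in (0, 1)}
```

Observing the effect makes the two causes anti-correlated even though they are independent a priori: that is exactly the "explaining away" dependency.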
Why it is usually very hard to learn sigmoid belief nets one layer at a time

* To learn W, we need the posterior distribution in the first hidden layer.
* Problem 1: the posterior is typically complicated because of "explaining away".
* Problem 2: the posterior depends on the prior as well as the likelihood.
- So to learn W, we need to know the weights in higher layers, even if we are only approximating the posterior. All the weights interact.
* Problem 3: we need to integrate over all possible configurations of the higher variables to get the prior for the first hidden layer. Yuk!
[Figure: data below several layers of hidden variables; the bottom weights define the likelihood and the layers above define the prior.]
Some methods of learning deep belief nets

* Monte Carlo methods can be used to sample from the posterior.
- But it's painfully slow for large, deep models.
* In the 1990's, people developed variational methods for learning deep belief nets.
- These only get approximate samples from the posterior.
- Nevertheless, the learning is still guaranteed to improve a variational bound on the log probability of generating the observed data.
The breakthrough that makes deep learning efficient

* To learn deep nets efficiently, we need to learn one layer of features at a time. This does not work well if we assume that the latent variables are independent in the prior:
- The latent variables are not independent in the posterior, so inference is hard for non-linear models.
- The learning tries to find independent causes using one hidden layer, which is not usually possible.
* We need a way of learning one layer at a time that takes into account the fact that we will be learning more hidden layers later.
- We solve this problem by using an undirected model.
Two types of generative neural network

* If we connect binary stochastic neurons in a directed acyclic graph, we get a Sigmoid Belief Net (Radford Neal, 1992).
* If we connect binary stochastic neurons using symmetric connections, we get a Boltzmann Machine (Hinton & Sejnowski, 1983).
- If we restrict the connectivity in a special way, it is easy to learn a Boltzmann machine.
Restricted Boltzmann Machines
(Smolensky, 1986, called them "harmoniums")

* We restrict the connectivity to make learning easier.
- Only one layer of hidden units.
* We will deal with more layers later.
- No connections between hidden units.
* In an RBM, the hidden units are conditionally independent given the visible states.
- So we can quickly get an unbiased sample from the posterior distribution when given a data-vector.
- This is a big advantage over directed belief nets.
[Figure: a layer of hidden units j above a layer of visible units i.]
The Energy of a joint configuration
(ignoring terms to do with biases)

E(v,h) = - Σ_{i,j} v_i h_j w_ij

where v_i is the binary state of visible unit i, h_j is the binary state of hidden unit j, and w_ij is the weight between units i and j. E(v,h) is the energy of configuration v on the visible units and h on the hidden units.

-∂E(v,h)/∂w_ij = v_i h_j
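The energy and its weight derivative are two lines of NumPy. A finite-difference check confirms -∂E/∂w_ij = v_i h_j (the states and weights here are arbitrary test values, not from the slides):

```python
import numpy as np

def energy(v, h, w):
    # E(v,h) = - sum_{i,j} v_i h_j w_ij   (bias terms ignored, as on the slide)
    return -v @ w @ h

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3))
v = np.array([1.0, 0.0, 1.0, 1.0])   # binary states of the visible units
h = np.array([0.0, 1.0, 1.0])        # binary states of the hidden units

# Check -dE/dw_01 = v_0 * h_1 with a finite difference
eps = 1e-6
w_bumped = w.copy()
w_bumped[0, 1] += eps
grad = -(energy(v, h, w_bumped) - energy(v, h, w)) / eps
```

Because the energy is linear in each weight, the finite difference recovers v_0 * h_1 almost exactly.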
Weights → Energies → Probabilities

* Each possible joint configuration of the visible and hidden units has an energy.
- The energy is determined by the weights and biases (as in a Hopfield net).
* The energy of a joint configuration of the visible and hidden units determines its probability:

p(v,h) ∝ e^{-E(v,h)}

* The probability of a configuration over the visible units is found by summing the probabilities of all the joint configurations that contain it.
Using energies to define probabilities

* The probability of a joint configuration over both visible and hidden units depends on the energy of that joint configuration compared with the energy of all other joint configurations:

p(v,h) = e^{-E(v,h)} / Σ_{u,g} e^{-E(u,g)}

* The probability of a configuration of the visible units is the sum of the probabilities of all the joint configurations that contain it:

p(v) = Σ_h e^{-E(v,h)} / Σ_{u,g} e^{-E(u,g)}

The denominator, Σ_{u,g} e^{-E(u,g)}, is the partition function.
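For a model small enough to enumerate, these formulas can be evaluated exactly. A sketch (tiny, arbitrary weight matrix; biases ignored, as above) that computes the partition function and p(v) by brute force:

```python
import numpy as np
from itertools import product

def energy(v, h, w):
    return -v @ w @ h   # E(v,h) = - sum_{i,j} v_i h_j w_ij

def visible_marginals(w):
    """Exact p(v) for a tiny RBM: sum e^{-E(v,h)} over h, divide by Z."""
    n_vis, n_hid = w.shape
    vs = [np.array(c, dtype=float) for c in product((0, 1), repeat=n_vis)]
    hs = [np.array(c, dtype=float) for c in product((0, 1), repeat=n_hid)]
    unnorm = np.array([sum(np.exp(-energy(v, h, w)) for h in hs) for v in vs])
    z = unnorm.sum()                  # the partition function
    return unnorm / z

rng = np.random.default_rng(1)
w = rng.normal(scale=0.5, size=(3, 2))
p_v = visible_marginals(w)
```

The enumeration costs 2^(n_vis + n_hid) energy evaluations, which is exactly why the partition function makes maximum likelihood learning expensive for realistic sizes.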
A picture of the maximum likelihood learning algorithm for an RBM

[Figure: an alternating Gibbs chain between visible units i and hidden units j, with states at t = 0, t = 1, t = 2, ..., t = infinity.]

∂log p(v)/∂w_ij = <v_i h_j>^0 - <v_i h_j>^∞

Start with a training vector on the visible units, then alternate between updating all the hidden units in parallel and updating all the visible units in parallel.
A quick way to learn an RBM

[Figure: just the first two steps of the chain - t = 0 (data) and t = 1 (reconstruction).]

Δw_ij = ε ( <v_i h_j>^0 - <v_i h_j>^1 )

Start with a training vector on the visible units, update all the hidden units in parallel, update all the visible units in parallel to get a "reconstruction", and update the hidden units again. This is not following the gradient of the log likelihood, but it works well.
How to learn a set of features that are good for reconstructing images of the digit 2

[Figure: 50 binary feature neurons connected to a 16 x 16 pixel image, shown twice.
For the data (reality): increment the weights between an active pixel and an active feature.
For the reconstruction (better than reality): decrement the weights between an active pixel and an active feature.]
The final 50 x 256 weights

Each neuron grabs a different feature.
How well can we reconstruct the digit images from the binary feature activations?

[Figure: pairs of data and reconstructions from activated binary features.
Left: new test images from the digit class that the model was trained on.
Right: images from an unfamiliar digit class. (The network tries to see every image as a 2.)]
Three ways to combine probability density models
(an underlying theme of the tutorial)

* Mixture: take a weighted average of the distributions.
- It can never be sharper than the individual distributions. It's a very weak way to combine models.
* Product: multiply the distributions at each point and then renormalize (this is how an RBM combines the distributions defined by each hidden unit).
- Exponentially more powerful than a mixture. The normalization makes maximum likelihood learning difficult, but approximations allow us to learn anyway.
* Composition: use the values of the latent variables of one model as the data for the next model.
- Works well for learning multiple layers of representation, but only if the individual models are undirected.
Training a deep network
(the main reason RBM's are interesting)

* First train a layer of features that receive input directly from the pixels.
* Then treat the activations of the trained features as if they were pixels, and learn features of features in a second hidden layer.
* It can be proved that each time we add another layer of features, we improve a variational lower bound on the log probability of the training data.
- The proof is slightly complicated.
- But it is based on a neat equivalence between an RBM and a deep directed model (described later).
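The greedy procedure is a short loop once a single-RBM trainer exists. A sketch (CD-1 with biases omitted; the layer sizes, epochs, and random data are illustrative): each RBM's hidden probabilities become the "pixels" for the next RBM.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hid, rng, epochs=5, lr=0.1):
    """Tiny CD-1 trainer (weights only, biases omitted for brevity)."""
    w = rng.normal(scale=0.01, size=(data.shape[1], n_hid))
    for _ in range(epochs):
        h0 = logistic(data @ w)
        h_s = (rng.random(h0.shape) < h0).astype(float)
        v1 = logistic(h_s @ w.T)
        h1 = logistic(v1 @ w)
        w += lr * (data.T @ h0 - v1.T @ h1) / data.shape[0]
    return w

def greedy_stack(data, layer_sizes, rng):
    """Train one RBM per layer; each layer's hidden activations are treated
    as data ('features of features') for the next RBM."""
    weights, x = [], data
    for n_hid in layer_sizes:
        w = train_rbm(x, n_hid, rng)
        weights.append(w)
        x = logistic(x @ w)
    return weights

rng = np.random.default_rng(0)
data = (rng.random((100, 16)) < 0.3).astype(float)
stack = greedy_stack(data, [12, 8, 4], rng)
```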
The generative model after learning 3 layers

* To generate data:
1. Get an equilibrium sample from the top-level RBM by performing alternating Gibbs sampling for a long time.
2. Perform a top-down pass to get states for all the other layers.

So the lower-level bottom-up connections are not part of the generative model. They are just used for inference.
[Figure: h3 and h2 joined by undirected weights W3; directed weights W2 from h2 to h1 and W1 from h1 to the data.]
Why does greedy learning work?
An aside: averaging factorial distributions

* If you average some factorial distributions, you do NOT get a factorial distribution.
- In an RBM, the posterior over the hidden units is factorial for each visible vector.
- But the aggregated posterior over all training cases is not factorial (even if the data was generated by the RBM itself).
Why does greedy learning work?

* Each RBM converts its data distribution into an aggregated posterior distribution over its hidden units.
* This divides the task of modeling its data into two tasks:
- Task 1: learn generative weights that can convert the aggregated posterior distribution over the hidden units back into the data distribution.
- Task 2: learn to model the aggregated posterior distribution over the hidden units.
- The RBM does a good job of task 1 and a moderately good job of task 2.
* Task 2 is easier (for the next RBM) than modeling the original data, because the aggregated posterior distribution is closer to a distribution that an RBM can model perfectly.
[Figure: the data distribution on the visible units and the aggregated posterior distribution on the hidden units, linked by p(v|h,W) (task 1) and modeled by p(h|W) (task 2).]
Why does greedy learning work?

The weights, W, in the bottom-level RBM define p(v|h), and they also indirectly define p(h). So we can express the RBM model as:

p(v) = Σ_h p(h) p(v|h)

If we leave p(v|h) alone and improve p(h), we will improve p(v).
To improve p(h), we need it to be a better model of the aggregated posterior distribution over hidden vectors produced by applying W to the data.
Which distributions are factorial in a directed belief net?

* In a directed belief net with one hidden layer, the posterior over the hidden units, p(h|v), is non-factorial (due to explaining away).
- The aggregated posterior is factorial if the data was generated by the directed model.
* It's the opposite way round from an undirected model, which has factorial posteriors and a non-factorial prior p(h) over the hiddens.
* The intuitions that people have from using directed models are very misleading for undirected models.
Why does greedy learning fail in a directed module?

* A directed module also converts its data distribution into an aggregated posterior.
- Task 1: the learning is now harder because the posterior for each training case is non-factorial.
- Task 2 is performed using an independent prior. This is a very bad approximation unless the aggregated posterior is close to factorial.
* A directed module attempts to make the aggregated posterior factorial in one step.
- This is too difficult and leads to a bad compromise. There is also no guarantee that the aggregated posterior is easier to model than the data distribution.
[Figure: the data distribution on the visible units and the aggregated posterior distribution on the hidden units, linked by p(v|h,W1) (task 1) and modeled by p(h|W2) (task 2).]
A model of digit recognition

[Figure: 2000 top-level neurons above two layers of 500 neurons; a 28 x 28 pixel image and 10 label neurons feed into the stack.]

The top two layers form an associative memory whose energy landscape models the low-dimensional manifolds of the digits. The energy valleys have names.

The model learns to generate combinations of labels and images.

To perform recognition, we start with a neutral state of the label units and do an up-pass from the image, followed by a few iterations of the top-level associative memory.
Fine-tuning with a contrastive version of the "wake-sleep" algorithm

After learning many layers of features, we can fine-tune the features to improve generation.
1. Do a stochastic bottom-up pass.
- Adjust the top-down weights to be good at reconstructing the feature activities in the layer below.
2. Do a few iterations of sampling in the top-level RBM.
- Adjust the weights in the top-level RBM.
3. Do a stochastic top-down pass.
- Adjust the bottom-up weights to be good at reconstructing the feature activities in the layer above.
Show the movie of the network generating digits
(available at www.cs.toronto…)
Samples generated by letting the associative memory run with one label clamped. There are 1000 iterations of alternating Gibbs sampling between samples.
Examples of correctly recognized handwritten digits that the neural network had never seen before.

It's very good!
How well does it discriminate on the MNIST test set with no extra information about geometric distortions?

* Generative model based on RBM's: 1.25%
* Support Vector Machine (Decoste et al.): 1.4%
* Backprop with 1000 hiddens (Platt)
Unsupervised "pre-training" also helps for models that have more data and better priors

* Ranzato et al. (NIPS 2006) used an additional 600,000 distorted digits.
* They also used convolutional multilayer neural networks that have some built-in local translational invariance.

Back-propagation alone: 0.49%
Unsupervised layer-by-layer pre-training followed by backprop: 0.39% (record)
Another view of why layer-by-layer learning works
(Hinton, Osindero & Teh, 2006)

* There is an unexpected equivalence between RBM's and directed networks with many layers that all use the same weights.
- This equivalence also gives insight into why contrastive divergence learning works.
An infinite sigmoid belief net that is equivalent to an RBM

* The distribution generated by this infinite directed net with replicated weights is the equilibrium distribution for a compatible pair of conditional distributions, p(v|h) and p(h|v), that are both defined by W.
- A top-down pass of the directed net is exactly equivalent to letting a Restricted Boltzmann Machine settle to equilibrium.
- So this infinite directed net defines the same distribution as an RBM.
[Figure: an infinite stack ... v2 → h1 → v1 → h0 → v0, alternating the weight matrices W^T and W, etc.]
Inference in a directed net with replicated weights

* The variables in h0 are conditionally independent given v0.
- Inference is trivial. We just multiply v0 by W transpose.
- The model above h0 implements a complementary prior.
- Multiplying v0 by W transpose gives the product of the likelihood term and the prior term.
* Inference in the directed net is exactly equivalent to letting a Restricted Boltzmann Machine settle to equilibrium starting at the data.
[Figure: the same infinite stack, with the sampled binary states at each layer.]
* The learning rule for a sigmoid belief net is:

Δw_ij = ε s_j (s_i - p_i)

* With replicated weights, the derivatives contributed by the successive layers form a telescoping sum:

s_j^0 (s_i^0 - s_i^1) + s_i^1 (s_j^0 - s_j^1) + s_j^1 (s_i^1 - s_i^2) + ... = s_j^0 s_i^0 - s_j^∞ s_i^∞

[Figure: the infinite stack with the sampled states s_i^0, s_j^0, s_i^1, s_j^1, s_i^2, ... at successive layers.]
Learning a deep directed network

* First learn with all the weights tied.
- This is exactly equivalent to learning an RBM.
- Contrastive divergence learning is equivalent to ignoring the small derivatives contributed by the tied weights between deeper layers.
[Figure: the infinite net with all weights tied is equivalent to a single RBM between v0 and h0.]
* Then freeze the first layer of weights in both directions and learn the remaining weights (still tied together).
- This is equivalent to learning another RBM, using the aggregated posterior distribution of h0 as the data.
[Figure: the bottom weights are now W_frozen; the layers above form a new RBM whose visible layer is h0.]
How many layers should we use, and how wide should they be?

* There is no simple answer.
- Extensive experiments by Yoshua Bengio's group (described later) suggest that several hidden layers is better than one.
- Results are fairly robust against changes in the size of a layer, but the top layer should be big.
* Deep belief nets give their creator a lot of freedom.
- The best way to use that freedom depends on the task.
- With enough narrow layers we can model any distribution over binary vectors (Sutskever & Hinton, 2007).
What happens when the weights in higher layers become different from the weights in the first layer?

* The higher layers no longer implement a complementary prior.
- So performing inference using the frozen weights in the first layer is no longer correct. But it's still pretty good.
- Using this incorrect inference procedure gives a variational lower bound on the log probability of the data.
* The higher layers learn a prior that is closer to the aggregated posterior distribution of the first hidden layer.
- This improves the network's model of the data.
* Hinton, Osindero and Teh (2006) prove that this improvement is always bigger than the loss in the variational bound caused by using less accurate inference.
An improved version of Contrastive Divergence learning (if time permits)

* The main worry with CD is that there will be deep minima of the energy function far away from the data.
- To find these, we need to run the Markov chain for a long time (maybe thousands of steps).
- But we cannot afford to run the chain for too long for each update of the weights.
* Maybe we can run the same Markov chain over many weight updates (Neal, 1992).
- If the learning rate is very small, this should be equivalent to running the chain for many steps and then doing a bigger weight update.
Persistent CD
(Tijmen Tieleman, ICML 2008 & 2009)

* Use minibatches of 100 cases to estimate the first term in the gradient. Use a single batch of 100 "fantasies" to estimate the second term in the gradient.
* After each weight update, generate the new fantasies from the previous fantasies by using one alternating Gibbs update.
- So the fantasies can get far from the data.
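A sketch of the persistent-CD loop (illustrative sizes and learning rate, biases omitted): the 100 fantasy particles live across weight updates and receive exactly one alternating Gibbs update per step.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def pcd_step(batch, fantasies, w, lr, rng):
    """One persistent-CD update: positive statistics from the minibatch,
    negative statistics from the persistent fantasy particles."""
    h_data = logistic(batch @ w)
    # a single alternating Gibbs update of the fantasy particles
    h_f = (rng.random((len(fantasies), w.shape[1]))
           < logistic(fantasies @ w)).astype(float)
    v_f = (rng.random(fantasies.shape) < logistic(h_f @ w.T)).astype(float)
    h_f_p = logistic(v_f @ w)
    w = w + lr * (batch.T @ h_data / len(batch) - v_f.T @ h_f_p / len(v_f))
    return w, v_f                   # the fantasies persist to the next update

rng = np.random.default_rng(0)
data = (rng.random((100, 8)) < 0.5).astype(float)
fantasies = (rng.random((100, 8)) < 0.5).astype(float)
w = rng.normal(scale=0.01, size=(8, 5))
for _ in range(30):
    w, fantasies = pcd_step(data, fantasies, w, lr=0.05, rng=rng)
```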
Contrastive divergence as an adversarial game

* Why does persistent CD work so well with only 100 negative examples to characterize the whole partition function?
- For all interesting problems, the partition function is highly multi-modal.
- How does it manage to find all the modes without starting at the data?
The learning causes very fast mixing

* The learning interacts with the Markov chain.
* Persistent Contrastive Divergence cannot be analysed by viewing the learning as an outer loop.
- Wherever the fantasies outnumber the positive data, the free-energy surface is raised. This makes the fantasies rush around hyperactively.
How persistent CD moves between the modes of the model's distribution

* If a mode has more fantasy particles than data, the free-energy surface is raised until the fantasy particles escape.
- This can overcome free-energy barriers that would be too high for the Markov chain to jump.
* The free-energy surface is being changed to help mixing, in addition to defining the model.
Summary so far

* Restricted Boltzmann Machines provide a simple way to learn a layer of features without any supervision.
- Maximum likelihood learning is computationally expensive because of the normalization term, but contrastive divergence learning is fast and usually works well.
* Many layers of representation can be learned by treating the hidden states of one RBM as the visible data for training the next RBM (a composition of experts).
* This creates good generative models that can then be fine-tuned.
- Contrastive wake-sleep can fine-tune generation.
BREAK
Overview of the rest of the tutorial

* How to fine-tune a greedily trained generative model to be better at discrimination.
* How to learn a kernel for a Gaussian process.
* How to use deep belief nets for non-linear dimensionality reduction and document retrieval.
* How to learn a generative hierarchy of conditional random fields.
* A more advanced learning module for deep belief nets that contains multiplicative interactions.
* How to learn deep models of sequential data.
Fine-tuning for discrimination

* First learn one layer at a time, greedily.
* Then treat this as "pre-training" that finds a good initial set of weights which can be fine-tuned by a local search procedure.
- Contrastive wake-sleep is one way of fine-tuning the model to be better at generation.
* Backpropagation can be used to fine-tune the model for better discrimination.
- This overcomes many of the limitations of standard backpropagation.
Why backpropagation works better with greedy pre-training: the optimization view

* Greedily learning one layer at a time scales well to really big networks, especially if we have locality in each layer.
* We do not start backpropagation until we already have sensible feature detectors that should already be very helpful for the discrimination task.
- So the initial gradients are sensible, and backprop only needs to perform a local search from a sensible starting point.
Why backpropagation works better with greedy pre-training: the overfitting view

* Most of the information in the final weights comes from modeling the distribution of input vectors.
- The input vectors generally contain a lot more information than the labels.
- The precious information in the labels is only used for the final fine-tuning.
- The fine-tuning only modifies the features slightly to get the category boundaries right. It does not need to discover features.
* This type of backpropagation works well even if most of the training data is unlabeled.
- The unlabeled data is still very useful for discovering good features.
First, model the distribution of digit images

[Figure: a stack of 2000 units, 500 units, and 500 units above a 28 x 28 pixel image.]

The network learns a density model for unlabeled digit images. When we generate from the model, we get things that look like real digits of all classes.

But do the hidden features really help with digit discrimination?

Add 10 softmaxed units to the top and do backpropagation.

The top two layers form a restricted Boltzmann machine whose free energy landscape should model the low-dimensional manifolds of the digits.
Results on the permutation-invariant MNIST task

* Very carefully trained backprop net with one or two hidden layers (Platt; Hinton): 1.6%
* SVM (Decoste & Schoelkopf, 2002): 1.4%
* Generative model of joint density of images and labels (+ generative fine-tuning): 1.25%
* Generative model of unlabelled digits followed by gentle backpropagation (Hinton & Salakhutdinov, Science 2006): 1.15%
Learning Dynamics of Deep Nets
(the next 4 slides describe work by Yoshua Bengio's group)

[Figure: learned filters before fine-tuning and after fine-tuning.]
Effect of Unsupervised Pre-training

[Figure: test classification error with and without pre-training. Erhan et al., AISTATS 2009.]
Effect of Depth

[Figure: test classification error as a function of network depth, with and without pre-training.]
Learning Trajectories in Function Space
(a 2-D visualization produced with t-SNE)

* Each point is a model in function space.
* Color = epoch.
* Top: trajectories without pre-training. Each trajectory converges to a different local minimum.
* Bottom: trajectories with pre-training.
* No overlap!

Erhan et al., AISTATS 2009
Why unsupervised pre-training makes sense

[Figure: two causal diagrams. In the first, "stuff" generates the image and the label through comparably simple pathways, so the label can be read from the image directly. In the second, hidden "stuff" generates the image through a high-bandwidth pathway and the label through a low-bandwidth pathway.]

If image-label pairs were generated the first way, it would make sense to try to go straight from images to labels. For example, do the pixels have even parity?

If image-label pairs are generated the second way, it makes sense to first learn to recover the stuff that caused the image, by inverting the high-bandwidth pathway.
Modeling real-valued data

* For images of digits, it is possible to represent intermediate intensities as if they were probabilities by using "mean-field" logistic units.
- We can treat intermediate values as the probability that the pixel is inked.
* This will not work for real images.
- In a real image, the intensity of a pixel is almost always almost exactly the average of the neighboring pixels.
- Mean-field logistic units cannot represent precise intermediate values.
Replacing binary variables by integer-valued variables
(Teh and Hinton, 2001)

* One way to model an integer-valued variable is to make N identical copies of a binary unit.
* All copies have the same probability of being "on": p = logistic(x).
- The total number of "on" copies is like the firing rate of a neuron.
- It has a binomial distribution with mean N p and variance N p(1-p).
A better way to implement integer values

* Make many copies of a binary unit.
* All copies have the same weights and the same adaptive bias, b, but they have different fixed offsets to the bias:

b - 0.5, b - 1.5, b - 2.5, b - 3.5, ...

[Figure: the resulting response curve as a function of the total input x.]
A fast approximation

* Contrastive divergence learning works well for the sum of binary units with offset biases.
* It also works for rectified linear units. These are much faster to compute than the sum of many logistic units.

output = max(0, x + randn * sqrt(logistic(x)))

Σ_{n=1}^∞ logistic(x + 0.5 - n) ≈ log(1 + e^x)
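The approximation on this slide is easy to verify numerically: summing logistic units whose biases are offset by -0.5, -1.5, -2.5, ... tracks the softplus function log(1 + e^x) to within roughly 0.01 (the test points below are arbitrary):

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def stepped_sigmoid_sum(x, n_copies=500):
    """Sum of identical logistic units whose biases are offset by
    -0.5, -1.5, -2.5, ...: sum_n logistic(x + 0.5 - n)."""
    n = np.arange(1, n_copies + 1)
    return logistic(x + 0.5 - n).sum()

for x in (-3.0, 0.0, 2.5, 6.0):
    approx = stepped_sigmoid_sum(x)
    softplus = np.log1p(np.exp(x))    # log(1 + e^x)
```

For large positive x the softplus itself is approximately linear, which is why a rectified linear unit is a cheap stand-in for the whole population of offset logistic units.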
How to train a bipartite network of rectified linear units

* Just use contrastive divergence to lower the energy of data and raise the energy of nearby configurations that the model prefers to the data.

Δw_ij = ε ( <v_i h_j>_data - <v_i h_j>_recon )

[Figure: visible units i and hidden units j, for the data and for the reconstruction.]
3-D Object Recognition: the NORB dataset

Stereo-pairs of grayscale images of toy objects.
6 lighting conditions, 162 viewpoints.
Five object instances per class in the training set; a different set of five instances per class in the test set.
24,300 training cases, 24,300 test cases.
Classes: animals, humans, planes, trucks, cars.
[Figure: examples from the normalized-uniform version of NORB.]
Simplifying the data

* Each training case is a stereo-pair of 96 x 96 images.
- The object is centered.
- The edges of the image are mainly blank.
- The background is uniform and bright.
* To make learning faster, I simplified the data:
- Throw away one image.
- Only use the middle 64 x 64 pixels of the other image.
- Downsample to 32 x 32 by averaging 4 pixels.
Simplifying the data even more, so that it can be modeled by rectified linear units

* The intensity histogram for each 32 x 32 image has a sharp peak for the bright background.
* Find this peak and call it zero.
* Call all intensities brighter than the background zero.
* Measure intensities downwards from the background intensity.
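This preprocessing is a few lines of NumPy. A sketch (using a synthetic image; the bin count and pixel values are arbitrary, not from the tutorial): find the histogram's modal intensity, treat it as zero, and measure darker pixels downwards while clipping brighter-than-background pixels to zero.

```python
import numpy as np

def normalize_to_background(img, bins=64):
    """Shift intensities so the (bright) background peak becomes zero and
    darker pixels become positive; brighter-than-background pixels -> 0."""
    counts, edges = np.histogram(img, bins=bins)
    background = edges[np.argmax(counts)]    # modal intensity (the sharp peak)
    return np.clip(background - img, 0.0, None)

# Synthetic 32x32 image: a bright uniform background with a darker object
img = np.full((32, 32), 200.0)
img[10:20, 10:20] = 100.0
out = normalize_to_background(img)
```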
Test set error rates on NORB after greedy learning of one or two hidden layers using rectified linear units

Full NORB (2 images of 96 x 96):
* Logistic regression on the raw pixels: 20.5%
* Gaussian SVM (trained by Leon Bottou): 11.6%
* Convolutional neural net (LeCun's group): 6.0% (convolutional nets have knowledge of translations built in)

Reduced NORB (1 image, 32 x 32):
* Logistic regression on the raw pixels: 30.2%
* Logistic regression on the first hidden layer: 14.9%
* Logistic regression on the second hidden layer: 10.2%
The receptive fields of some rectified linear hidden units.
A standard type of real-valued visible unit

* We can model pixels as Gaussian variables. Alternating Gibbs sampling is still easy, though learning needs to be much slower.

E(v,h) = Σ_{i∈vis} (v_i - b_i)² / (2σ_i²) - Σ_{j∈hid} b_j h_j - Σ_{i,j} (v_i / σ_i) h_j w_ij

The first term is a parabolic containment function for each visible unit; the last term produces an energy gradient determined by the total input to a visible unit.

Welling et al. (2005) show how to extend RBM's to the exponential family. See also Bengio et al. (2007).
A random sample of 10,000 binary filters learned by Alex Krizhevsky on a million 32 x 32 color images.
Combining deep belief nets with Gaussian processes
* Deep belief nets can benefit a lot from unlabeled data when labeled data is scarce.
  - They just use the labeled data for fine-tuning.
* Kernel methods like Gaussian processes work well on small labeled training sets but are slow for large training sets.
* So when there is a lot of unlabeled data and only a little labeled data, combine the two approaches:
  - First learn a deep belief net without using the labels.
  - Then apply a Gaussian process model to the deepest layer of features. This works better than using the raw data.
  - Then use GP's to get the derivatives that are back-propagated through the deep belief net. This is a further win. It allows GP's to fine-tune complicated domain-specific kernels.
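The combination can be sketched as follows. This is a toy illustration, not the tutorial's implementation: `dbn_features` is a hypothetical stand-in for the deepest layer of a trained DBN, the GP is a hand-written RBF-kernel regressor, the target is constructed to be smooth in feature space, and no fine-tuning step is shown:

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    # Squared-exponential kernel between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_predict(X_train, y_train, X_test, noise=1e-6):
    # Standard Gaussian-process regression posterior mean.
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)
    return rbf_kernel(X_test, X_train) @ alpha

def dbn_features(X):
    # Hypothetical stand-in for the deepest layer of a trained DBN.
    return np.tanh(X)

X = np.linspace(-2.0, 2.0, 20)[:, None]
Phi = dbn_features(X)
# A toy target that is smooth in the feature space, where a GP does well.
y = np.sin(3.0 * Phi[:, 0])
pred = gp_predict(Phi, y, Phi)
```

The point of the slide is that when the feature space makes the target smooth, a simple kernel on the features beats the same kernel on raw pixels.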
Learning to extract the orientation of a face patch (Salakhutdinov & Hinton, NIPS 2007)
The training and test sets for predicting face orientation
* 11,000 unlabeled cases; 100, 500, or 1000 labeled cases.
* The test face patches come from new people.
The root mean squared error in the orientation when combining GP's with deep belief nets

              GP on the   GP on top-level   GP on top-level features
              pixels      features          with fine-tuning
100 labels    22.2        17.9              15.2
500 labels    17.2        12.7              7.2
1000 labels   16.3        11.2              6.4

Conclusion: The deep features are much better than the pixels. Fine-tuning helps a lot.
Deep Autoencoders (Hinton & Salakhutdinov, 2006)
* They always looked like a really nice way to do non-linear dimensionality reduction:
  - But it is very difficult to optimize deep autoencoders using backpropagation.
* We now have a much better way to optimize them:
  - First train a stack of 4 RBM's.
  - Then "unroll" them.
  - Then fine-tune with backprop.

[Architecture: 28x28 pixels -> 1000 neurons -> 500 neurons -> 250 neurons -> 30 linear units -> 250 -> 500 -> 1000 -> 28x28, with encoder weights W1..W4 and decoder weights W4^T..W1^T.]
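The "unrolling" step can be sketched in NumPy: take the stack of learned RBM weight matrices W1..W4, use them as the encoder, and use their transposes in reverse order as the decoder. Random weights stand in for trained ones here; the logistic layers and the linear 30-unit code layer follow the slide:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
sizes = [784, 1000, 500, 250, 30]   # 28x28 pixels down to 30 code units
Ws = [rng.normal(0, 0.01, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]

def autoencode(v, Ws):
    # Encoder: logistic layers, except the 30-unit code layer is linear.
    h = v
    for W in Ws[:-1]:
        h = sigmoid(h @ W)
    code = h @ Ws[-1]                 # linear code units
    # Decoder: the same weights, transposed, in reverse order.
    h = code
    for W in reversed(Ws):
        h = sigmoid(h @ W.T)
    return code, h

code, recon = autoencode(rng.random(784), Ws)
```

After unrolling, the encoder and decoder weights are untied and fine-tuned jointly with backprop (not shown).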
A comparison of methods for compressing digit images to 30 real numbers.

[Figure: real data, and reconstructions from a 30-D deep autoencoder, 30-D logistic PCA, and 30-D PCA.]
Retrieving documents that are similar to a query document
* We can use an autoencoder to find low-dimensional codes for documents that allow fast and accurate retrieval of similar documents from a large set.
* We start by converting each document into a "bag of words". This is a 2000-dimensional vector that contains the counts for each of the 2000 commonest words.
How to compress the count vector
* We train the neural network to reproduce its input vector as its output.
* This forces it to compress as much information as possible into the 10 numbers in the central bottleneck.
* These 10 numbers are then a good way to compare documents.

[Architecture: 2000 word counts (input vector) -> 500 neurons -> 250 neurons -> 10 -> 250 neurons -> 500 neurons -> 2000 reconstructed counts (output vector).]
Performance of the autoencoder at document retrieval
* Train on bags of 2000 words for 400,000 training cases of business documents.
  - First train a stack of RBM's. Then fine-tune with backprop.
* Test on a separate 400,000 documents.
  - Pick one test document as a query. Rank-order all the other test documents by using the cosine of the angle between codes.
  - Repeat this using each of the 400,000 test documents as the query (requires 0.16 trillion comparisons).
* Plot the number of retrieved documents against the proportion that are in the same hand-labeled class as the query document.
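The ranking step can be sketched as follows: rank documents by the cosine of the angle between their codes. Random codes stand in for the autoencoder's output:

```python
import numpy as np

def rank_by_cosine(query_code, codes):
    """Return document indices sorted by cosine similarity to the query."""
    norms = np.linalg.norm(codes, axis=1) * np.linalg.norm(query_code)
    sims = codes @ query_code / norms
    return np.argsort(-sims)          # best match first

rng = np.random.default_rng(0)
codes = rng.normal(size=(1000, 10))   # 10-D codes for 1000 documents
order = rank_by_cosine(codes[0], codes)
```

With real codes, the ranking cost is one dot product per document, which is what makes the 0.16 trillion comparisons feasible.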
[Figure: proportion of retrieved documents in the same class as the query, plotted against the number of documents retrieved.]
First compress all documents to 2 numbers using a type of PCA. Then use different colors for different document categories.

[Figure: 2-D layout of the document set, colored by category.]
Finding binary codes for documents
* Train an autoencoder using 30 logistic units for the code layer.
* During the fine-tuning stage, add noise to the inputs to the code units.
  - The "noise" vector for each training case is fixed. So we still get a deterministic gradient.
  - The noise forces their activities to become bimodal in order to resist the effects of the noise.
  - Then we simply round the activities of the 30 code units to 1 or 0.

[Architecture: 2000 word counts -> 500 neurons -> 250 neurons -> 30 code units (with noise) -> 250 -> 500 -> 2000 reconstructed counts.]
"emanti& asin5: Usin5 a deep autoen&oder as aasfun&tion for findin5 approximate mat&es
8/13/2019 Jul09 Hinton Deeplearn
94/126
as fun&tion for findin5 approximatemat&es
("ala3utdinov ) %inton 2007#
as
fun&tion
Ksupermar3et sear&
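Semantic hashing can be sketched as follows: treat the binary code bits as a memory address, and retrieve candidates from the query's bucket plus the buckets at small Hamming distance. A toy illustration with random codes standing in for learned ones:

```python
import numpy as np
from collections import defaultdict

BITS = 20

def to_address(bits):
    # Pack a 0/1 vector into an integer memory address.
    return int(np.dot(bits, 1 << np.arange(len(bits))))

def build_table(codes):
    table = defaultdict(list)
    for doc_id, bits in enumerate(codes):
        table[to_address(bits)].append(doc_id)
    return table

def shortlist(table, query_bits, radius=1):
    # Look in the query's bucket plus all buckets within Hamming
    # distance `radius` (flip one bit at a time for radius 1).
    addr = to_address(query_bits)
    hits = list(table.get(addr, []))
    if radius >= 1:
        for b in range(len(query_bits)):
            hits += table.get(addr ^ (1 << b), [])
    return hits

rng = np.random.default_rng(0)
codes = rng.integers(0, 2, size=(1000, BITS))
table = build_table(codes)
hits = shortlist(table, codes[0])
```

The lookup cost is independent of the number of documents: it only depends on the code length and the Hamming radius searched.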
How good is a shortlist found this way?
* We have only implemented it for a million documents with 20-bit codes, but what could possibly go wrong?
  - A 20-D hypercube allows us to capture enough of the similarity structure of our document set.
* The shortlist found using binary codes actually improves the precision-recall curves of TF-IDF.
  - Locality sensitive hashing (the fastest other method) is 50 times slower and has worse precision-recall curves.
Generating the parts of an object
* One way to maintain the constraints between the parts is to generate each part very accurately
  - But this would require a lot of communication bandwidth.
* Sloppy top-down specification of the parts is less demanding
  - but it messes up relationships between features
  - so use redundant features and use lateral interactions to clean up the mess.
* Each transformed feature helps to locate the others
  - This allows a noisy channel.

[Diagram: sloppy top-down activation of parts; clean-up using known interactions; pose parameters; features with top-down support; "square". Its like soldiers on a parade ground.]
"emirestri&ted Bolt>mann 8a&ines
8/13/2019 Jul09 Hinton Deeplearn
97/126
* e restri&t te &onne&tivity to ma3e
learnin5 easier+* Contrastive diver5en&e learnin5 re;uires
te idden units to e in &onditional
e;uilirium 6it te visiles+
, But it does not re;uire te visile unitsto e in &onditional e;uilirium 6it te
iddens+
,All 6e re;uire is tat te visile units
are &loser to e;uilirium in tere&onstru&tions tan in te data+
* "o 6e &an allo6 &onne&tions et6een
te visiles+
idden
i
?
visile
Learning a semi-restricted Boltzmann Machine

[Diagram: data at t = 0, reconstruction at t = 1.]

\Delta w_{ij} = \epsilon \left( \langle v_i h_j \rangle^{0} - \langle v_i h_j \rangle^{1} \right)

The lateral connections between visibles are learned in the same way, from the corresponding <v_i v_k> statistics.
Learning in Semi-restricted Boltzmann Machines
* Method 1: To form a reconstruction, cycle through the visible units, updating each in turn using the top-down input from the hiddens plus the lateral input from the other visibles.
* Method 2: Use "mean field" visible units that have real values. Update them all in parallel.
  - Use damping to prevent oscillations:

p_i^{t+1} = \lambda p_i^{t} + (1 - \lambda)\, \sigma(x_i^{t+1})

where x_i is the total input to unit i and \lambda is the damping.
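Method 2 can be sketched in NumPy: a damped parallel mean-field update for the visible units, with the total input split into bias, top-down, and lateral parts as on the slide. The toy weights and sizes are mine:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field_visibles(p, h, b_vis, W, L, lam=0.5, steps=20):
    """Damped parallel mean-field updates for 'real-valued' visibles.

    p: current visible probabilities, h: binary hidden states,
    W: visible-hidden weights, L: symmetric lateral weights with zero
    diagonal, lam: damping coefficient.
    """
    for _ in range(steps):
        x = b_vis + W @ h + L @ p       # total input: top-down plus lateral
        p = lam * p + (1.0 - lam) * sigmoid(x)
    return p

rng = np.random.default_rng(0)
n_vis, n_hid = 6, 4
W = rng.normal(0, 0.1, (n_vis, n_hid))
L = rng.normal(0, 0.1, (n_vis, n_vis))
L = (L + L.T) / 2
np.fill_diagonal(L, 0)
h = rng.integers(0, 2, n_hid)
p = mean_field_visibles(np.full(n_vis, 0.5), h, np.zeros(n_vis), W, L)
```

With lam = 0 this is an undamped parallel update, which can oscillate; the damping averages each new value with the old one.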
Results on modeling natural image patches using a stack of RBM's (Osindero and Hinton)
* Stack of RBM's learned one at a time.
* 400 Gaussian visible units that see whitened image patches
  - Derived from 100,000 Van Hateren image patches, each 20x20.
* The hidden units are all binary.
  - The lateral connections are learned when they are the visible units of their RBM.
* Reconstruction involves letting the visible units of each RBM settle using mean-field dynamics.
  - The already decided states in the level above determine the effective biases during mean-field settling.

[Architecture: 400 Gaussian units <- directed connections <- hidden MRF with 2000 units <- directed connections <- hidden MRF with 500 units <-> undirected connections <-> 1000 top-level units, no MRF.]
Without lateral connections
[Figure: real data vs. samples from the model.]

With lateral connections
[Figure: real data vs. samples from the model.]
A funny way to use an MRF
* The lateral connections form an MRF.
* The MRF is used during learning and generation.
* The MRF is not used for inference.
  - This is a novel idea, so vision researchers don't like it.
* The MRF enforces constraints. During inference, constraints do not need to be enforced because the data obeys them.
  - The constraints only need to be enforced during generation.
* Unobserved hidden units cannot enforce constraints.
  - To enforce constraints requires lateral connections or observed descendants.
Why do we whiten data?
* Images typically have strong pairwise correlations.
* Learning higher order statistics is difficult when there are strong pairwise correlations.
  - Small changes in parameter values that improve the modeling of higher-order statistics may be rejected because they form a slightly worse model of the much stronger pairwise statistics.
* So we often remove the second-order statistics before trying to learn the higher-order statistics.
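Removing the second-order statistics can be sketched as ZCA whitening, one common choice; the slides do not specify the exact whitening transform used:

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """Decorrelate the columns of X (cases in rows) and equalize variances."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Rotate into the eigenbasis, rescale, rotate back ('zero-phase').
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W

rng = np.random.default_rng(0)
# Strongly pairwise-correlated toy 'pixels': one shared cause plus noise.
z = rng.normal(size=(2000, 1))
X = z + 0.1 * rng.normal(size=(2000, 4))
Xw = zca_whiten(X)
```

After whitening, the covariance of the data is (approximately) the identity, so the strong pairwise statistics can no longer dominate the learning signal.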
Whitening the learning signal instead of the data
* Contrastive divergence learning can remove the effects of the second-order statistics on the learning without actually changing the data.
  - The lateral connections model the second order statistics.
  - If a pixel can be reconstructed correctly using second order statistics, it will be the same in the reconstruction as in the data.
  - The hidden units can then focus on modeling high-order structure that cannot be predicted by the lateral connections.
* For example, a pixel close to an edge, where interpolation from nearby pixels causes incorrect smoothing.
Towards a more powerful, multilinear stackable learning module
* So far, the states of the units in one layer have only been used to determine the effective biases of the units in the layer below.
* It would be much more powerful to modulate the pair-wise interactions in the layer below.
  - A good way to design a hierarchical system is to allow each level to determine the objective function of the level below.
* To modulate pair-wise interactions we need higher-order Boltzmann machines.
Higher order Boltzmann machines (Sejnowski)

Using higher-order Boltzmann machines to model image transformations (the unfactored version)
* A global transformation specifies which pixel goes to which other pixel.
* Conversely, each pair of similar intensity pixels, one in each image, votes for a particular global transformation.

[Diagram: image(t) and image(t+1), linked by hidden "image transformation" units.]
Factoring three-way multiplicative interactions

E = -\sum_{i,j,h} s_i\, s_j\, s_h\, w_{ijh}
    (unfactored: with cubically many parameters)

E = -\sum_f \sum_{i,j,h} s_i\, s_j\, s_h\, w_{if}\, w_{jf}\, w_{hf}
    (factored: with linearly many parameters per factor)
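The factorization can be checked numerically: the full three-way tensor w_ijh built from the factors is a sum of outer products w_if w_jf w_hf, and the factored energy matches the unfactored one computed from that tensor. Random weights; a sketch, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
ni, nj, nh, nf = 5, 5, 4, 3
wi = rng.normal(size=(ni, nf))
wj = rng.normal(size=(nj, nf))
wh = rng.normal(size=(nh, nf))

# Unfactored tensor built from the factors: w_ijh = sum_f w_if w_jf w_hf
w = np.einsum('if,jf,hf->ijh', wi, wj, wh)

si = rng.integers(0, 2, ni)
sj = rng.integers(0, 2, nj)
sh = rng.integers(0, 2, nh)

# E = -sum_{ijh} s_i s_j s_h w_ijh   (cubically many parameters)
E_unfactored = -np.einsum('i,j,h,ijh->', si, sj, sh, w)

# E = -sum_f (sum_i s_i w_if)(sum_j s_j w_jf)(sum_h s_h w_hf)
E_factored = -np.sum((si @ wi) * (sj @ wj) * (sh @ wh))
```

The factored form never materializes the tensor: it only ever needs the three weighted sums per factor, which is also what makes inference cheap.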
A picture of the low-rank tensor contributed by factor f
* Each layer is a scaled version of the same matrix.
* The basis matrix is specified as an outer product with typical term w_{if} w_{jf}.
* So each active hidden unit h contributes a scalar, w_{hf}, times the matrix specified by factor f.
Inference with factored three-way multiplicative interactions

E = -\sum_f \sum_{i,j,h} s_i\, s_j\, s_h\, w_{if}\, w_{jf}\, w_{hf}
    (the energy contributed by the factors)

E(s_h = 0) - E(s_h = 1) = \sum_f w_{hf} \Big[ \sum_i s_i w_{if} \Big] \Big[ \sum_j s_j w_{jf} \Big]

This is how changing the binary state of unit h changes the energy contributed by each factor f — exactly what unit h needs to know in order to do Gibbs sampling.
Belief propagation

[Diagram: a factor f connected to units i, j, h by weights w_{if}, w_{jf}, w_{hf}.]

The outgoing message at each vertex of the factor is the product of the weighted sums at the other two vertices.
Learning with factored three-way multiplicative interactions

m_f^h = \Big( \sum_i s_i w_{if} \Big) \Big( \sum_j s_j w_{jf} \Big)
    (message from factor f to unit h)

\Delta w_{hf} \propto -\frac{\partial E}{\partial w_{hf}} = \langle s_h m_f^h \rangle_{data} - \langle s_h m_f^h \rangle_{model}
Roland data

[Figure.]
Modeling the correlational structure of a static image by using two copies of the image
* Each factor sends the squared output of a linear filter to the hidden units.
* It is exactly the standard model of simple and complex cells. It allows complex cells to extract oriented energy.
* The standard model drops out of doing belief propagation for a factored third-order energy function.

[Diagram: Copy 1 and Copy 2 of the image connect to factor f through weights w_{if} and w_{jf}; the factor sends w_{hf} times its output to hidden unit h.]
An advantage of modeling correlations between pixels rather than pixels
* During generation, a "vertical edge" unit can turn off the horizontal interpolation in a region without worrying about exactly where the intensity discontinuity will be.
  - This gives some translational invariance
  - It also gives a lot of invariance to brightness and contrast.
  - So the "vertical edge" unit is like a complex cell.
* By modulating the correlations between pixels rather than the pixel intensities, the generative model can still allow interpolation parallel to the edge.
A principle of hierarchical systems
* Each level in the hierarchy should not try to micromanage the level below.
* Instead, it should create an objective function for the level below and leave the level below to optimize it.
  - This allows the fine details of the solution to be decided locally, where the detailed information is available.
* Objective functions are a good way to do abstraction.
Time series models
* Inference is difficult in directed models of time series if we use non-linear distributed representations in the hidden units.
  - It is hard to fit Dynamic Bayes Nets to high-dimensional sequences (e.g. motion capture data).
* So people tend to avoid distributed representations and use much weaker methods (e.g. HMM's).
Time series models
* If we really need distributed representations (which we nearly always do), we can make inference much simpler by using three tricks:
  - Use an RBM for the interactions between hidden and visible variables. This ensures that the main source of information wants the posterior to be factorial.
  - Model short-range temporal information by allowing several previous frames to provide input to the hidden units and to the visible units.
  - This leads to a temporal module that can be stacked, so we can use greedy learning to learn deep models of temporal structure.
An application to modeling motion capture data (Taylor, Roweis & Hinton, 2007)
* Human motion can be captured by placing reflective markers on the joints and then using lots of infrared cameras to track the 3-D positions of the markers.
* Given a skeletal model, the 3-D positions of the markers can be converted into the joint angles plus 6 parameters that describe the 3-D position and the roll, pitch and yaw of the pelvis.
  - We only represent changes in yaw because physics doesn't care about its value and we want to avoid circular variables.
The conditional RBM model (a partially observed CRF)
* Start with a generic RBM.
* Add two types of conditioning connections.
* Given the data, the hidden units at time t are conditionally independent.
* The autoregressive weights can model most short-term temporal structure very well, leaving the hidden units to model nonlinear irregularities (such as when the foot hits the ground).

[Diagram: visible frames at t-2, t-1 and t; the earlier frames condition both the hidden units j and the current visible units i.]
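The two types of conditioning connections can be sketched as follows: the previous visible frames contribute additively to the effective biases of the current visible and hidden units (autoregressive weights A for the visibles, weights B for the hiddens; random weights stand in for trained ones, and the names are mine):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def effective_biases(history, A, B, b_vis, b_hid):
    """history: the concatenated previous visible frames (t-2 and t-1)."""
    return b_vis + A @ history, b_hid + B @ history

def hidden_probs(v_t, history, W, A, B, b_vis, b_hid):
    # Given the data, the hidden units at time t are conditionally
    # independent, so one sigmoid per unit suffices.
    _, dyn_b_hid = effective_biases(history, A, B, b_vis, b_hid)
    return sigmoid(W.T @ v_t + dyn_b_hid)

rng = np.random.default_rng(0)
nv, nh, order = 8, 12, 2
W = rng.normal(0, 0.1, (nv, nh))
A = rng.normal(0, 0.1, (nv, nv * order))   # past frames -> visible biases
B = rng.normal(0, 0.1, (nh, nv * order))   # past frames -> hidden biases
history = rng.random(nv * order)
p_h = hidden_probs(rng.random(nv), history, W, A, B,
                   np.zeros(nv), np.zeros(nh))
```

Because the history only shifts biases, the model at each time step is still an ordinary RBM and can be trained with contrastive divergence.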
Causal generation from a learned model
* Keep the previous visible states fixed.
  - They provide a time-dependent bias for the hidden units.
* Perform alternating Gibbs sampling for a few iterations between the hidden units and the most recent visible units.
  - This picks new hidden and visible states that are compatible with each other and with the recent history.
Higher level models
* Once we have trained the model, we can add layers like in a Deep Belief Network.
* The previous layer CRBM is kept, and its output, while driven by the data, is treated as a new kind of "fully observed" data.
* The next level CRBM has the same architecture as the first (though we can alter the number of units it uses) and is trained the same way.
* Upper levels of the network model more "abstract" concepts.
* This greedy learning procedure can be justified using a variational bound.
Learning with "style" labels
* As in the generative model of handwritten digits (Hinton et al. 2006), style labels can be provided as part of the input to the top layer.
* The labels are represented by turning on one unit in a group of units, but they can also be blended.

[Diagram: label units l join the top-level hidden units k of the stack of CRBMs over frames t-2, t-1, t.]
Show demo's of multiple styles of walking.

These can be found at www.cs.toronto.edu/~gwtaylor/
Readings on deep belief nets

A reading list (that is still being updated) can be found at www.cs.toronto.edu/~