Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Exponen'al*Capacity*in*an*
Autoencoder*Neural*Network*with*a*
Hidden*Layer**
Alireza*Alemi*,*Alia*Abbara*
*
*
Santa*Fe,*NM,*Jan*2018*
Limits*of*computa'ons*in*deep*nets?*
random*binary*
*input*paJerns*
random*binary*
*output*labels**
when***N*⟶*∞*
Alireza*Alemi*******PIML*2018*
Limits*of*computa'ons*in*deep*nets?*
Capacity*=**#)input,output)associa/ons)
network)size)
when***N*⟶*∞*
random*binary*
*input*paJerns*
random*binary*
*output*labels**
Alireza*Alemi*******PIML*2018*
Capacity*of*mul'Olayer*perceptron*
(MLP)*with*one*hidden*layer?*
paJerns**
{!μ}*
input layer (Nv neurons)
hidden layer (Nh neurons) output layer
*
random*binary*
*output*labels**
Alireza*Alemi*******PIML*2018*
Inves'ga'ng*internal*representa'on*
paJerns**
{!μ}*
input layer (Nv neurons)
hidden layer (Nh neurons) output layer
*
random*binary*
*output*labels**
Alireza*Alemi*******PIML*2018*
Expansive*(or*overcomplete)*representa'on*
paJerns**
{!μ}*
input layer (Nv neurons)
hidden layer (Nh neurons)
Λ = *Nv*
Nh* > 1*
output layer*
random*binary*
*output*labels**
Alireza*Alemi*******PIML*2018*
Expansive*(or*overcomplete)*representa'on*
paJerns**
{!μ}*
Nervous)system) Expansion)ra/o)(Λ))
Entorhinal*cortex*�**Dentate*gyrus* �*10*
LGN*�*V1* �*25*
in*fly*olfactory*system:*
*antennal*lobe*�*mushroom*body*�*40*
in*the*cerebellum,**
mossy*fibers*�*granule*cells*100*�*200*
rodent’s*olfactory*bulb*�*piriform*
cortex*�*103*
Alireza*Alemi*******PIML*2018*
Expansive*representa'on*with*random*
weights*
paJerns**
{!μ}*
input layer (Nv neurons)
hidden layer (Nh neurons) output layer
*
random*binary*
*output*labels**
random fixed weights vij
Alireza*Alemi*******PIML*2018*
Expansive*representa'on*with*random*
weights*
input layer (Nv neurons)
hidden layer (Nh neurons) output layer
*
random*binary*
*output*labels**
random fixed weights vij
How*good*is*randomOweight*
expansive*representa'ons?*
paJerns**
{!μ}where* μ=1… p*
*
Alireza*Alemi*******PIML*2018*
Expansive*representa'on*with*random*
weights*
input layer (Nv neurons)
hidden layer (Nh neurons) output layer
*
random*binary*
*output*labels**
random fixed weights vij
How*good*is*randomOweight*
expansive*representa'ons?*
paJerns**
{!μ}where* μ=1… p*
*
How*many*of*input*paJerns*can*be*
reconstructed*from*this*representa'on?*
Alireza*Alemi*******PIML*2018*
Expansive*autoencoder*with*random*
weights*
input layer (Nv neurons)
hidden layer (Nh neurons)
learnable weights wji
paJerns**
{!μ}where* μ=1… p*
*
output layer(Nv neurons)
random fixed weights vij
Focus*of*this*talk:**
Op/mal)capacity)of)expansive)autoencoder?)
"µ**#µ* *yµ=#µ*
Alireza*Alemi*******PIML*2018*
Cri'cal*capacity*of*the*perceptron:*
Gardner*approach*
y
• Noiseless*output: yμ = sgn(*w#μ + $ )*
• Perceptron*learning*rule*(%=0):***Δwi = η!i (*dµ - yµ*) i (*dµ - yµ*)
• Find*w*to*store*a*set*of*random*ensemble*of*inputOoutput**
******pairs*{ (#μ, dµ) },* μ=1… p***with a robustness %*****
,***cri'cal*capacity:*
\forall*\mu,*i:*~~*d^\mu\big(*\frac{1}{\sqrt{N}}*\sum_{i=1}^{N}*w_i*\xi_i^\mu*O\theta\big)*>*\kappa*
�μ, i :*
Gardner E. (1988) J. Phys. A: Math. Gen.**
SAT*
UNSAT*
y^\mu*=*\text{sgn}*(*\frac{*\mathbf{w\cdot{\bm{\xi}}^\mu}*}{*\sqrt{N}*}O\theta)*
*\Omega*=*{\int_{\|*\mathbf{w}*\|^2=N}*d^{N}\mathbf{w}\,*
* *\prod_{\mu=1}^{p}**
* *\Theta\Big(*d^\mu*\big(*\frac{1}{\sqrt{N}}\sum_{i=1}^{N}w_{i}\xi_i^\mu*O*\theta*\big)*O*\kappa*\Big)}*
Gardner*volume*(N*⟶*∞):*
• Storage*criteria:*
�log(Ω)� averaging over the (quenched) distribution of the patterns*
\llangle\log{\Omega}\rrangle=\lim_{n\rightarrow*0}\frac{\llangle\Omega^n\rrangleO1}{n}*
*.**.*.*
μ*
\alpha_c=*\lim_{N\rightarrow*\inly}\frac{p_c}{N}*
&c = pc / N , when***N*⟶*∞*
*
Alireza*Alemi*******PIML*2018*
A*simplified*expansive*(overOcomplete)*
autoencoder**
*• Binary*neurons*(±1)*
• Expansion*ra'o:*' = Nh ∕ Nv
• Fixed*encoding*weights*vij*are*randomly*sampled*
from*((0, 1)*
• Goal:*reconstruct*random*paJerns*{#µ}*µ= 1,...,p ***********************by*learning*the*decoding*weights wji*
• Perceptron*learning*rule:*
******
• Dense*coding*level*for*paJerns:**P( !i=+1 ) = 0.5 i=+1 ) = 0.5
• Coding level of hidden layer: P( )i=+1 ) = f i=+1 ) = f *
• Maximal/cri'cal*storage*Capacity:***
&max = pmax /Nh as Nh ⟶ ∞**
• Storage*criteria:*mee'ng*fixedOpoint*equa'ons*******
*
*****
Dynamics:*
"µ**#µ* *yµ=#µ*
\sigma_{i}^\mu*=*\sgn*\big(\sum_{j=1}^{N_v}*v_{ij}\xi_{j}^\mu\big)*
y_{j}^\mu*=*\sgn*\big(\sum_{i=1}^{N_h}*w_{ji}\sigma_{i}^\mu\big)*
*\Delta*w_{ji}*=*\eta*(\xi_j^\mu*O*y_j^\mu)*\sigma_i^\mu*
�j, *:
Alemi A., Abbara A, 2017 ArXiv
input layer (Nv neurons)
hidden layer (Nh neurons)
random fixed weights vij
learnable weights wji
output layer(Nv neurons)
Alireza*Alemi*******PIML*2018*
Approximated*by*a*quenched*random*variable*with*a*Gaussian*distribu'on*:*
A*meanOfield*approxima'on*(MFA)*
*=*\text{sgn}*\bigg(*\sum_{i=1}^{N_h}*w_{ji}*
\text{sgn}*\Big(\sum_{l\neq*j;l=1}^{N_v}*v_{il}*
\xi_l^{\mu}+v_{ij}\xi_j^\mu\Big)\bigg)*
\xi_j^\mu*=*\text{sgn}*\bigg(*\sum_{i=1}^{N_h}*
w_{ji}*\text{sgn}*\Big(\sum_{l=1}^{N_v}*v_{il}*
\xi_l^{\mu}\Big)\bigg)*
=*\text{sgn}*\bigg(*\sum_{i=1}^{N_h}*w_{ji}*
\text{sgn}*\Big(z_j^\mu+v_{ij}\xi_j^\mu\Big)
\bigg)*
\mathcal{G}(0,\sqrt{N_v})*
When*|z|<|v!| & wji vij >0 the*fixedOpoint*equa'ons*hold.*
\P(\sigma_i^\mu*\,\vert*\,*\xi_j^\mu)*\perp*\P(\sigma_k^\mu*\,*\vert*\,*\xi_j^\mu)*
\P(\boldsymbol{\sigma}^\mu\vert\xi_j^\mu)=\prod_i*\P(\sigma_i^\mu\vert\xi_j^\mu)*
*\Omega*=*{\int_{\|*\mathbf{w}*\|^2=N_h}*d^{N_h}\mathbf{w}\,*
* *\prod_{\mu=1}^{p}**
* *\Theta\big(*\xi_j^\mu*\frac{1}{\sqrt{N_h}}\sum_{i=1}^{N_h}w_{ji}\sigma_i^\mu*O*\kappa*\big)},*
(\sigma_i^\mu*\,*\perp*\sigma_k^\mu)*\,*\vert*\,*\xi_j^\mu*
*
\P\big(\sigma_i^\mu*\big\vert*\xi_j^\mu\big)\simeq\frac{1}{2}+\sigma_i^\mu\xi_j^\mu\frac{v_{ij}}{\sqrt{2\pi*N_v}}*
Alireza*Alemi*******PIML*2018*
A*meanOfield*approxima'on*(MFA)*
*=*\text{sgn}*\bigg(*\sum_{i=1}^{N_h}*w_{ji}*
\text{sgn}*\Big(\sum_{l\neq*j;l=1}^{N_v}*v_{il}*
\xi_l^{\mu}+v_{ij}\xi_j^\mu\Big)\bigg)*
\xi_j^\mu*=*\text{sgn}*\bigg(*\sum_{i=1}^{N_h}*
w_{ji}*\text{sgn}*\Big(\sum_{l=1}^{N_v}*v_{il}*
\xi_l^{\mu}\Big)\bigg)*
=*\text{sgn}*\bigg(*\sum_{i=1}^{N_h}*w_{ji}*
\text{sgn}*\Big(z_j^\mu+v_{ij}\xi_j^\mu\Big)
\bigg)*
\mathcal{G}(0,\sqrt{N_v})*
When*|z|<|v!| & wji vij >0 the*fixedOpoint*equa'ons*hold.*
\P(\sigma_i^\mu*\,\vert*\,*\xi_j^\mu)*\perp*\P(\sigma_k^\mu*\,*\vert*\,*\xi_j^\mu)*
\P(\boldsymbol{\sigma}^\mu\vert\xi_j^\mu)=\prod_i*\P(\sigma_i^\mu\vert\xi_j^\mu)*
*\Omega*=*{\int_{\|*\mathbf{w}*\|^2=N_h}*d^{N_h}\mathbf{w}\,*
* *\prod_{\mu=1}^{p}**
* *\Theta\big(*\xi_j^\mu*\frac{1}{\sqrt{N_h}}\sum_{i=1}^{N_h}w_{ji}\sigma_i^\mu*O*\kappa*\big)},*
(\sigma_i^\mu*\,*\perp*\sigma_k^\mu)*\,*\vert*\,*\xi_j^\mu*
*
\P\big(\sigma_i^\mu*\big\vert*\xi_j^\mu\big)\simeq\frac{1}{2}+\sigma_i^\mu\xi_j^\mu\frac{v_{ij}}{\sqrt{2\pi*N_v}}*
Probability*distribu'on*
Alireza*Alemi*******PIML*2018*
MeanOField*Approxima'on*(MFA)*for*
the*replica*calcula'on*
*=*\text{sgn}*\bigg(*\sum_{i=1}^{N_h}*w_{ji}*
\text{sgn}*\Big(\sum_{l\neq*j;l=1}^{N_v}*v_{il}*
\xi_l^{\mu}+v_{ij}\xi_j^\mu\Big)\bigg)*
\xi_j^\mu*=*\text{sgn}*\bigg(*\sum_{i=1}^{N_h}*
w_{ji}*\text{sgn}*\Big(\sum_{l=1}^{N_v}*v_{il}*
\xi_l^{\mu}\Big)\bigg)*
=*\text{sgn}*\bigg(*\sum_{i=1}^{N_h}*w_{ji}*
\text{sgn}*\Big(z_j^\mu+v_{ij}\xi_j^\mu\Big)
\bigg)*
\mathcal{G}(0,\sqrt{N_v})*
When*|z|<|v!| & wji vij >0 the*fixedOpoint*equa'ons*hold.*
\P(\sigma_i^\mu*\,\vert*\,*\xi_j^\mu)*\perp*\P(\sigma_k^\mu*\,*\vert*\,*\xi_j^\mu)*
\P(\boldsymbol{\sigma}^\mu\vert\xi_j^\mu)=\prod_i*\P(\sigma_i^\mu\vert\xi_j^\mu)*
*\Omega*=*{\int_{\|*\mathbf{w}*\|^2=N_h}*d^{N_h}\mathbf{w}\,*
* *\prod_{\mu=1}^{p}**
* *\Theta\big(*\xi_j^\mu*\frac{1}{\sqrt{N_h}}\sum_{i=1}^{N_h}w_{ji}\sigma_i^\mu*O*\kappa*\big)},*
(\sigma_i^\mu*\,*\perp*\sigma_k^\mu)*\,*\vert*\,*\xi_j^\mu*
*
\P\big(\sigma_i^\mu*\big\vert*\xi_j^\mu\big)\simeq\frac{1}{2}+\sigma_i^\mu\xi_j^\mu\frac{v_{ij}}{\sqrt{2\pi*N_v}}*
Probability*distribu'on*
Gardner*volume:*
Replica*theory:*compute*�log(Ω)� over*distribu'on*of*paJerns*using*the*replica*method**
Alireza*Alemi*******PIML*2018*
Exponen'al*cri'cal*capacity*in*the*
MFA*
1 2 3 4 5 6 7 8 9 10Expansion ratio (Λ)
100
101
102
Cap
acity(α
)
Analytical mean-fieldExponential fit
where**
Alemi A., Abbara A, 2017 ArXiv
' = Nh ∕ Nv *
M=\sum_i*\frac{v_i*w_i}{N_h},*
Solu'ons*(replicaOsymmetric,*locally*stable):*
Alireza*Alemi*******PIML*2018*
Comparison*with*simula'on*
(perceptron*learning*rule)*
1 2 3 4 5
Expansion ratio (Λ)
100
101
102
Cap
acity(α
)
Simulation mean-fieldAnalytical mean-field
1 2 3 4 5
Expansion ratio (Λ)
0
0.2
0.4
0.6
0.8
1
symmetry
betweenw
ji&
vij
SimulationAnalytical
Alemi A., Abbara A, 2017 ArXiv Alireza*Alemi*******PIML*2018*
Effect*of*robustness*of*output,*
sparseness*in*the*hidden*layer*
1 2 3 4 5 6 7
Expansion ratio (Λ)
10-1
100
101
102
Cap
acity(α
c)
f = 0.5f = 0.1f = 0.05f = 0.01
1 2 3 4 5 6 7 8 9 10
Expansion ratio (Λ)
10-1
100
101
102
Cap
acity(α
c)
κ=0
κ=1
κ=2
κ=3
Alemi A., Abbara A, 2017 ArXiv Alireza*Alemi*******PIML*2018*
Comparison*with*the*fullOmodel*
1 2 3 4 5
Expansion ratio (Λ)
100
101
102Cap
acity(α
)Simulation full modelSimulation mean-field
Alemi A., Abbara A, 2017 ArXiv Alireza*Alemi*******PIML*2018*
Log*of*number*of*paJerns*vs.*total*
number*of*neurons*
Alireza*Alemi*******PIML*2018*
Discussion*
• Expansive*representa'on*has*important*computa'onal*implica'ons.*
• Exponen6al*scaling*of*the*capacity*with*the*expansion*ra'o*in*our*autoencoder*
– Expansion*makes*up*for*the*loss*of*informa'on*due*to*binariza'on*
• In*very*good*agreement*with*results*of*simula'ons*
using*an*online*learning*rule*
• Future*direc'ons*– Robustness*to*noise*by*training*encoding*weights*– Compu'ng*the*capacity*of*the*full*autoencoder*model*
– Extension*to*the*capacity*of*MLP*and*deep*nets*
Alireza*Alemi*******PIML*2018*
Acknowledgement*
Alia*Abbara*
Discussion*with*many*colleagues*…*
Alireza*Alemi*******PIML*2018*
Ques'on?*
Alireza*Alemi*******PIML*2018*