Perturbation Theory for Wide Networks
(joint w/ Dan Roberts, Sho Yaida)
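Fact: Any "nice" $f:\mathbb{R}^{n_0}\to\mathbb{R}$ can be approximated by a one-hidden-layer network:
$$f(x) = \int \sigma(w\cdot x + b)\, d\mu_f(w,b) \approx \sum_{j=1}^{n_1} W^{(2)}_j\, \sigma\left(W^{(1)}_j\cdot x + b^{(1)}_j\right), \qquad n_1 \gg 1.$$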
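$$x_\alpha = z^{(0)} \in \mathbb{R}^{n_0} \;\to\; z^{(1)} \in \mathbb{R}^{n_1} \;\to\; z^{(2)} \in \mathbb{R}^{n_2}, \qquad z^{(2)}_i = \sum_{j=1}^{n_1} W^{(2)}_{ij}\,\sigma\left(z^{(1)}_j\right) + b^{(2)}_i.$$
Let's scale the last layer as $W^{(2)}_{ij} = O(1/n_1)$, $b^{(2)}_i = 0$. Note that then
$$z^{(2)}_i(x) = \int_{\mathbb{R}^{n_0+1}} \sigma(w\cdot x + b)\, d\mu_i(w,b), \qquad \mu_i = \sum_{j=1}^{n_1} W^{(2)}_{ij}\, \delta_{\left(W^{(1)}_j,\, b^{(1)}_j\right)},$$
a measure whose number of atoms equals the width $n_1$.
Thm [Montanari et al., Chizat et al., Rotskoff et al., ...]: when you train $W^{(1)}, b^{(1)}$ by GD with $\ell_2$-loss, then as $n_1\to\infty$ (with a suitably scaled learning rate) GD becomes a 2-Wasserstein gradient flow, a tool from optimal transport, on the distribution of neurons.
Note: there are actually many different infinite-width limits, depending on the scaling of the init, the learning rate, ...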
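Deep networks: for any "reasonable" σ, a depth-$L$ network $\mathbb{R}^{n_0}\to\mathbb{R}^{n_L}$ is defined recursively, with $n_\ell$ the width of layer $\ell$:
$$z^{(1)}_i = \sum_{j=1}^{n_0} W^{(1)}_{ij}\, x_j + b^{(1)}_i, \qquad z^{(\ell+1)}_i = \sum_{j=1}^{n_\ell} W^{(\ell+1)}_{ij}\, \sigma\left(z^{(\ell)}_j\right) + b^{(\ell+1)}_i, \quad \ell = 1,\dots,L-1.$$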
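The "standard" init scheme (all entries independent):
$$W^{(\ell)}_{ij} \sim \mathcal{G}\left(0, \frac{C_W}{n_{\ell-1}}\right), \qquad b^{(\ell)}_i \sim \mathcal{G}(0, C_b).$$
Goal: as $n_1 = \dots = n_{L-1} = n \to \infty$, $z^{(L)}_i(x_\alpha) \to$ a GP.
Notation: fix a finite set of inputs $\{x_\alpha,\ \alpha \in \mathcal{A}\}$ in $\mathbb{R}^{n_0}$ and write $z^{(\ell)} = \left\{z^{(\ell)}_{i;\alpha},\ i = 1,\dots,n_\ell,\ \alpha \in \mathcal{A}\right\}$, with $z^{(\ell)}_{i;\alpha} = z^{(\ell)}_i(x_\alpha)$.
Finally, we can describe the GP recursively through its covariance:
$$K^{(\ell+1)}_{\alpha_1\alpha_2} = C_b + C_W\left\langle \sigma(z_{\alpha_1})\,\sigma(z_{\alpha_2})\right\rangle_{K^{(\ell)}},$$
where $\langle\cdot\rangle_{K}$ denotes the expectation over a centered Gaussian vector $(z_\alpha)_{\alpha\in\mathcal{A}}$ with covariance $K$.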
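A minimal NumPy sketch of the forward pass under the standard init, evaluating a whole input set at once (one column per $x_\alpha$). The function names `init_params` and `forward`, the widths, and the ReLU choice are illustrative assumptions, not part of the notes.

```python
import numpy as np

def init_params(widths, C_W, C_b, rng):
    """Draw (W^{(l)}, b^{(l)}), l = 1..L, from the standard init.

    widths = [n_0, n_1, ..., n_L]; W^{(l)} has shape (n_l, n_{l-1}).
    """
    params = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        W = rng.normal(0.0, np.sqrt(C_W / n_in), size=(n_out, n_in))  # Var = C_W / n_{l-1}
        b = rng.normal(0.0, np.sqrt(C_b), size=n_out)                 # Var = C_b
        params.append((W, b))
    return params

def forward(params, X, sigma):
    """Return [z^{(1)}, ..., z^{(L)}]; X has shape (n_0, |A|), one column per input x_alpha."""
    W, b = params[0]
    z = W @ X + b[:, None]               # z^{(1)} = W^{(1)} x + b^{(1)}
    zs = [z]
    for W, b in params[1:]:
        z = W @ sigma(z) + b[:, None]    # z^{(l+1)} = W^{(l+1)} sigma(z^{(l)}) + b^{(l+1)}
        zs.append(z)
    return zs

rng = np.random.default_rng(0)
params = init_params([3, 512, 512, 1], C_W=2.0, C_b=0.0, rng=rng)
X = rng.normal(size=(3, 4))              # four hypothetical inputs in R^3
zs = forward(params, X, sigma=lambda t: np.maximum(t, 0.0))
print([z.shape for z in zs])             # [(512, 4), (512, 4), (1, 4)]
```

Keeping every input as a column makes $z^{(\ell)}$ literally the array $\{z^{(\ell)}_{i;\alpha}\}$ from the notation above.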
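Thm: Given $z^{(\ell)}$, the preactivations $z^{(\ell+1)}$ are Gaussian with iid components $z^{(\ell+1)}_i = \left(z^{(\ell+1)}_{i;\alpha}\right)_{\alpha\in\mathcal{A}}$:
$$p\left(z^{(\ell+1)} \,\middle|\, z^{(\ell)}\right) = \prod_{i=1}^{n_{\ell+1}} \frac{1}{\sqrt{\det\left(2\pi\Sigma^{(\ell)}\right)}} \exp\left(-\frac12 \sum_{\alpha_1,\alpha_2\in\mathcal{A}} z^{(\ell+1)}_{i;\alpha_1}\left(\Sigma^{(\ell)}\right)^{-1}_{\alpha_1\alpha_2} z^{(\ell+1)}_{i;\alpha_2}\right),$$
$$\Sigma^{(\ell)}_{\alpha_1\alpha_2} := \mathrm{Cov}\left(z^{(\ell+1)}_{i;\alpha_1}, z^{(\ell+1)}_{i;\alpha_2} \,\middle|\, z^{(\ell)}\right) = C_b + \frac{C_W}{n_\ell}\sum_{j=1}^{n_\ell} \sigma\left(z^{(\ell)}_{j;\alpha_1}\right)\sigma\left(z^{(\ell)}_{j;\alpha_2}\right).$$
Unconditionally, as the widths $n\to\infty$,
$$K^{(\ell+1)}_{\alpha_1\alpha_2} = \lim_{n\to\infty}\mathrm{Cov}\left(z^{(\ell+1)}_{i;\alpha_1}, z^{(\ell+1)}_{i;\alpha_2}\right) = C_b + C_W\left\langle \sigma(z_{\alpha_1})\,\sigma(z_{\alpha_2})\right\rangle_{K^{(\ell)}}.$$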
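The Theorem can be checked numerically: freeze $z^{(\ell)}$, resample only $W^{(\ell+1)}, b^{(\ell+1)}$, and compare the empirical covariance of one neuron $z^{(\ell+1)}_i$ across inputs with $\Sigma^{(\ell)}_{\alpha_1\alpha_2} = C_b + \frac{C_W}{n_\ell}\sum_j \sigma(z^{(\ell)}_{j;\alpha_1})\sigma(z^{(\ell)}_{j;\alpha_2})$. A rough Monte Carlo sketch, assuming ReLU, $n_\ell = 64$, $C_W = 2$, $C_b = 0.1$ and a randomly drawn (hypothetical) $z^{(\ell)}$ for two inputs:

```python
import numpy as np

rng = np.random.default_rng(1)
n_l, C_W, C_b = 64, 2.0, 0.1
relu = lambda t: np.maximum(t, 0.0)

# A frozen, hypothetical layer-l configuration for two inputs (one column each).
z_l = rng.normal(size=(n_l, 2))
s = relu(z_l)

# Predicted conditional covariance Sigma^{(l)} over the two inputs.
Sigma = C_b + (C_W / n_l) * (s.T @ s)

# Resample only W^{(l+1)} and b^{(l+1)}; each row is one draw of neuron i.
num_draws = 50_000
W = rng.normal(0.0, np.sqrt(C_W / n_l), size=(num_draws, n_l))
b = rng.normal(0.0, np.sqrt(C_b), size=(num_draws, 1))
z_next = W @ s + b                   # shape (num_draws, 2): z^{(l+1)}_{i;alpha}
emp_cov = np.cov(z_next, rowvar=False)

print(np.round(Sigma, 3))
print(np.round(emp_cov, 3))          # agrees with Sigma up to Monte Carlo error
```

Since $z^{(\ell+1)}_i$ is a linear combination of independent Gaussians once $z^{(\ell)}$ is fixed, the conditional Gaussianity itself is exact; only the covariance estimate carries sampling noise.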
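The entries of $\Sigma^{(\ell)}$ are "collective observables", i.e. averages over the neurons of a layer:
$$\mathcal{O}^{(\ell)} = \frac{1}{n_\ell}\sum_{j=1}^{n_\ell} f\left(z^{(\ell)}_{j;\mathcal{A}}\right), \qquad \text{e.g.}\quad \Sigma^{(\ell)}_{\alpha_1\alpha_2} = C_b + \frac{C_W}{n_\ell}\sum_{j=1}^{n_\ell}\sigma\left(z^{(\ell)}_{j;\alpha_1}\right)\sigma\left(z^{(\ell)}_{j;\alpha_2}\right),$$
where $z^{(\ell)}_{j;\mathcal{A}} = \left(z^{(\ell)}_j(x_\alpha)\right)_{\alpha\in\mathcal{A}}$ and $x_\alpha \in \mathbb{R}^{n_0}$.
Lem: $\mathcal{O}^{(\ell)} = \mathbb{E}\left[\mathcal{O}^{(\ell)}\right] + O\left(n^{-1/2}\right)$.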
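A quick illustration of the Lemma in the simplest case $\ell = 1$, where for a fixed input the $z^{(1)}_j$ are iid $\mathcal{G}(0, K^{(1)})$ and can be sampled directly. The observable is the diagonal entry $\Sigma^{(1)}_{\alpha\alpha} - C_b$ for ReLU; the values $C_W = 2$, $K^{(1)} = 1$ and the helper `sample_observable` are assumptions made for this sketch. Its standard deviation times $\sqrt{n}$ should stay roughly constant in $n$:

```python
import numpy as np

rng = np.random.default_rng(2)
C_W, K = 2.0, 1.0                    # assumed C_W and K^{(1)}
relu = lambda t: np.maximum(t, 0.0)

def sample_observable(n, num_draws):
    """Draws of (C_W / n) * sum_j relu(z_j)^2 with z_j iid N(0, K), i.e. Sigma^{(1)}_{aa} - C_b."""
    z = rng.normal(0.0, np.sqrt(K), size=(num_draws, n))
    return (C_W / n) * np.sum(relu(z) ** 2, axis=1)

for n in (16, 64, 256):
    obs = sample_observable(n, 4000)
    # mean stays near C_W * K / 2; std * sqrt(n) stays roughly constant
    print(n, round(obs.mean(), 3), round(obs.std() * np.sqrt(n), 3))
```

Here the neurons are exactly iid, so the mean is $C_W K/2$ and $\sqrt{n}\cdot\mathrm{std} = C_W\sqrt{\mathrm{Var}[\mathrm{ReLU}(z)^2]} = \sqrt{5}\,K$ for every $n$; in deeper layers the Lemma is the statement that the same $n^{-1/2}$ scaling survives the weak correlations between neurons.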
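Thus, for a single neuron $i$ and $t = (t_\alpha)_{\alpha\in\mathcal{A}}$, writing $t\cdot z^{(\ell+1)}_i = \sum_\alpha t_\alpha z^{(\ell+1)}_{i;\alpha}$,
$$\mathbb{E}\left[e^{-\sqrt{-1}\, t\cdot z^{(\ell+1)}_i}\right] = \mathbb{E}\left[\mathbb{E}\left[e^{-\sqrt{-1}\, t\cdot z^{(\ell+1)}_i} \,\middle|\, z^{(\ell)}\right]\right] = \mathbb{E}\left[e^{-\frac12 \sum_{\alpha_1,\alpha_2} t_{\alpha_1}t_{\alpha_2}\Sigma^{(\ell)}_{\alpha_1\alpha_2}}\right] \xrightarrow{\ n\to\infty\ } e^{-\frac12 \sum_{\alpha_1,\alpha_2} t_{\alpha_1}t_{\alpha_2}K^{(\ell+1)}_{\alpha_1\alpha_2}},$$
since by the Lemma $\Sigma^{(\ell)}$ concentrates around its mean, and by induction on $\ell$ that mean tends to
$$K^{(\ell+1)}_{\alpha_1\alpha_2} = C_b + C_W\left\langle \sigma(z_{\alpha_1})\,\sigma(z_{\alpha_2})\right\rangle_{K^{(\ell)}}, \qquad K^{(1)}_{\alpha_1\alpha_2} = C_b + \frac{C_W}{n_0}\sum_{j=1}^{n_0} x_{j;\alpha_1}x_{j;\alpha_2}.$$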
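The kernel recursion can be iterated numerically. A sketch for two inputs, assuming Gauss-Hermite quadrature (`numpy.polynomial.hermite_e.hermegauss`, 100 nodes) for the Gaussian expectation $\langle\cdot\rangle_K$; the helper names and the starting $K^{(1)}$ are hypothetical, and the quadrature is only approximate for non-smooth σ such as ReLU off the diagonal:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Probabilists' Gauss-Hermite rule for the weight exp(-u^2/2); weights sum to sqrt(2*pi).
nodes, weights = hermegauss(100)
weights = weights / np.sqrt(2.0 * np.pi)     # now an expectation over u ~ N(0, 1)
U1, U2 = np.meshgrid(nodes, nodes, indexing="ij")
W2D = np.outer(weights, weights)             # tensor-product weights for (u_1, u_2)

def gauss_expect(f, K):
    """<f(z_1, z_2)>_K for (z_1, z_2) ~ N(0, K); assumes K[0, 0] > 0, allows singular K."""
    a = np.sqrt(K[0, 0])
    c = K[0, 1] / a
    d = np.sqrt(max(K[1, 1] - c ** 2, 0.0))  # Cholesky-style factor, clipped at 0
    z1, z2 = a * U1, c * U1 + d * U2
    return np.sum(W2D * f(z1, z2))

def kernel_step(K, sigma, C_W, C_b):
    """K^{(l)} -> K^{(l+1)}_{ab} = C_b + C_W <sigma(z_a) sigma(z_b)>_{K^{(l)}}."""
    K_next = np.empty((2, 2))
    K_next[0, 0] = C_b + C_W * gauss_expect(lambda x, y: sigma(x) ** 2, K)
    K_next[1, 1] = C_b + C_W * gauss_expect(lambda x, y: sigma(y) ** 2, K)
    K_next[0, 1] = C_b + C_W * gauss_expect(lambda x, y: sigma(x) * sigma(y), K)
    K_next[1, 0] = K_next[0, 1]
    return K_next

relu = lambda t: np.maximum(t, 0.0)
K = np.array([[1.0, 0.3], [0.3, 1.0]])       # hypothetical K^{(1)} for two inputs
for layer in range(2, 7):
    K = kernel_step(K, relu, C_W=2.0, C_b=0.0)
    print(layer, np.round(K, 4))
```

For ReLU with $C_b = 0$, $C_W = 2$ the diagonal entries stay fixed while the normalized correlation $K_{12}/\sqrt{K_{11}K_{22}}$ drifts upward toward 1, which is exactly the kind of 3D dynamics discussed next.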
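For two inputs $x_1, x_2$ this is a 3D dynamical system in $\left(K^{(\ell)}_{11}, K^{(\ell)}_{12}, K^{(\ell)}_{22}\right)$.
(*) Seek a fixed point $K^\star$ of the diagonal recursion:
$$K^\star = C_b + C_W\left\langle \sigma(z)^2\right\rangle_{K^\star},$$
so that if $K^{(1)}_{11} = K^{(1)}_{22} = K^\star$, then $K^{(\ell)}_{11} = K^{(\ell)}_{22} = K^\star$ for all $\ell$. Criticality asks that perturbations around this fixed point be neither amplified nor damped:
(a) $\dfrac{\partial K^{(\ell+1)}_{11}}{\partial K^{(\ell)}_{11}}\Big|_{K^\star} = 1 \iff \chi_\parallel := C_W \dfrac{d}{dK}\left\langle\sigma(z)^2\right\rangle_K\Big|_{K=K^\star} = 1.$
(b) $\dfrac{\partial K^{(\ell+1)}_{12}}{\partial K^{(\ell)}_{12}}\Big|_{K_{11}=K_{12}=K_{22}=K^\star} = 1 \iff \chi_\perp := C_W\left\langle\sigma'(z)^2\right\rangle_{K^\star} = 1.$
Condition (b) controls nearby inputs: $\mathrm{Var}\left[z^{(\ell)}_{i;1} - z^{(\ell)}_{i;2}\right] = K^{(\ell)}_{11} - 2K^{(\ell)}_{12} + K^{(\ell)}_{22}$ is multiplied by $\chi_\perp$ from one layer to the next (to leading order in the separation), so it is preserved when $\chi_\perp = 1$.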
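Tuning to criticality is necessary if we want "nice" large-depth behavior.
Ex: $\sigma(t) = \mathrm{ReLU}(t) = t\,\mathbf{1}_{t>0}$. Then $\langle\sigma(z)^2\rangle_K = K/2$ and $\langle\sigma'(z)^2\rangle_K = 1/2$, so $C_b = 0$, $C_W = 2$ (He init) makes every $K$ a fixed point and gives $C_W\frac{d}{dK}\langle\sigma(z)^2\rangle_K = C_W\langle\sigma'(z)^2\rangle_K = 1$. For tanh, the analogous choice is $C_b = 0$, $C_W = 1$ (Xavier init), with $K^\star = 0$.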
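The two criticality conditions can be evaluated numerically. Below, `chi_parallel` computes $C_W\frac{d}{dK}\langle\sigma(z)^2\rangle_K$ through the identity $\frac{d}{dK}\langle f(z)\rangle_K = \frac{1}{2K^2}\langle f(z)(z^2 - K)\rangle_K$, and `chi_perp` computes $C_W\langle\sigma'(z)^2\rangle_K$; the function names and the Gauss-Hermite quadrature are choices made for this sketch, and tanh is probed at a small $K$ because its fixed point is $K^\star = 0$:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

nodes, weights = hermegauss(100)
weights = weights / np.sqrt(2.0 * np.pi)   # expectation over u ~ N(0, 1)

def expect(f, K):
    """<f(z)>_K for scalar z ~ N(0, K)."""
    return np.sum(weights * f(np.sqrt(K) * nodes))

def chi_parallel(sigma, C_W, K):
    # C_W * d/dK <sigma(z)^2>_K, written as C_W / (2K^2) * <sigma(z)^2 (z^2 - K)>_K
    return C_W / (2.0 * K ** 2) * expect(lambda z: sigma(z) ** 2 * (z ** 2 - K), K)

def chi_perp(dsigma, C_W, K):
    # C_W * <sigma'(z)^2>_K
    return C_W * expect(lambda z: dsigma(z) ** 2, K)

relu = lambda z: np.maximum(z, 0.0)
drelu = lambda z: (z > 0).astype(float)
dtanh = lambda z: 1.0 / np.cosh(z) ** 2

# ReLU with C_b = 0, C_W = 2: both quantities equal 1 (here at K = 1).
print(chi_parallel(relu, 2.0, 1.0), chi_perp(drelu, 2.0, 1.0))
# tanh with C_b = 0, C_W = 1: both approach 1 as K -> K* = 0.
print(chi_parallel(np.tanh, 1.0, 1e-4), chi_perp(dtanh, 1.0, 1e-4))
```

For ReLU the quadrature is exact here because the integrands are polynomials on the half-line and the nodes are symmetric; for tanh the deviations from 1 are $O(K)$.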
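The same analysis applies to the neural tangent kernel (NTK). At criticality its relative fluctuations at initialization grow with depth:
$$\frac{\mathbb{E}\left[\mathrm{NTK}^2\right]}{\mathbb{E}\left[\mathrm{NTK}\right]^2} - 1 \propto \frac{L}{n} + O\left(\frac{L^2}{n^2}\right),$$
so at finite $L/n$ the network is not in the lazy (fixed-kernel) regime: "depth cures laziness".
Summary of what we did: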
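① We obtain recursions in $\ell$ for the full distribution of $\left\{z^{(\ell)}_{i;\alpha},\ i = 1,\dots,n_\ell,\ \alpha \in \mathcal{A}\right\}$, to all orders in $1/n$. Ex: the fourth cumulant (single input)
$$V^{(\ell)} = \mathbb{E}\left[\left(z^{(\ell)}_i\right)^4\right] - 3\,\mathbb{E}\left[\left(z^{(\ell)}_i\right)^2\right]^2$$
is $O(1/n)$ and satisfies its own recursion in $\ell$. Once we solve these recursions, we know the distribution of $z^{(L)}$: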
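$$p\left(z^{(L)}\right) \propto \exp\left\{-\left(\frac{\left(z^{(L)}\right)^2}{2K^{(L)}} + a^{(L)}\left(z^{(L)}\right)^2 + b^{(L)}\left(z^{(L)}\right)^4\right) + O\left(\frac{1}{n^2}\right)\right\},$$
where the corrections $a^{(L)}, b^{(L)} = O(1/n)$ are fixed by $K^{(L)}$ and $V^{(L)}$. This shows that at order $1/n$ only the 2nd and 4th cumulants appear: $K^{(L)}$ and $V^{(L)}$, up to $O(1/n^2)$. At criticality, $V^{(L)}/\left(K^{(L)}\right)^2 \propto L/n$.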
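② At criticality, we solve the recursions for $V^{(\ell)}, \dots$ (at leading order $V^{(\ell)}/(K^\star)^2 \propto \ell/n$), together with the recursions for nearby inputs (i.e. for the joint distribution of $z^{(L)}(x), z^{(L)}(x+\delta x), \dots$), to obtain, schematically,
$$p\left(z^{(L)}\right) \propto \exp\left\{-\left(\frac{\left(z^{(L)}\right)^2}{2K^\star} + c_4\,\frac{L}{n}\left(z^{(L)}\right)^4 + c_6\,\frac{L^2}{n^2}\left(z^{(L)}\right)^6 + \dots\right)\right\},$$
where, at leading order, the coefficients $c_4, c_6, \dots$ do not grow with $L$ or $n$: the effective expansion parameter is $L/n$.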
This shows that depth amplifies finite-width effects:
• Distance to Gaussian ∝ $L/n$
• Fluctuations of the NTK: $\mathrm{Cov}(\mathrm{NTK}, \mathrm{NTK})/\mathbb{E}[\mathrm{NTK}]^2 \propto L/n$
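A Monte Carlo sketch of "depth amplifies finite-width effects": sample many independently initialized ReLU networks at He init and measure the excess kurtosis $\mathbb{E}[z^4]/\mathbb{E}[z^2]^2 - 3$ of one output neuron for a single input; at fixed width it should grow with depth. The widths ($n_0 = n_\ell = n$), $C_b = 0$, the all-ones input, and the helper names are assumptions for illustration, and once $L/n$ is of order one the numbers lie outside the perturbative regime.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sample_outputs(L, n, num_nets, C_W=2.0, rng=None):
    """One output neuron z^{(L)}_1(x) from num_nets independently initialized ReLU MLPs.

    Assumptions for this sketch: n_0 = n_1 = ... = n_L = n, C_b = 0, and a fixed
    (hypothetical) input x = (1, ..., 1).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.ones(n)
    # Layer 1: one (n, n) weight matrix per network, entries ~ N(0, C_W / n_0).
    W = rng.normal(0.0, np.sqrt(C_W / n), size=(num_nets, n, n))
    z = W @ x                                    # shape (num_nets, n)
    for _ in range(2, L + 1):
        W = rng.normal(0.0, np.sqrt(C_W / n), size=(num_nets, n, n))
        z = np.einsum("bij,bj->bi", W, relu(z))  # z^{(l+1)} = W^{(l+1)} relu(z^{(l)})
    return z[:, 0]

def excess_kurtosis(samples):
    # The output is symmetric about 0, so raw moments are used.
    m2 = np.mean(samples ** 2)
    m4 = np.mean(samples ** 4)
    return m4 / m2 ** 2 - 3.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for L in (1, 2, 4, 6):
        k = excess_kurtosis(sample_outputs(L, n=16, num_nets=10_000, rng=rng))
        print(f"L = {L}, n = 16: excess kurtosis = {k:.2f}")
```

For $L = 1$ the output is exactly Gaussian, so the estimate should sit near 0; it then increases with $L$ at fixed $n$, in line with the $L/n$ scaling above. The sketch does not attempt to check any proportionality constant.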