Powers of Tensors and Fast Matrix Multiplication
François Le Gall
Department of Computer Science, Graduate School of Information Science and Technology
The University of Tokyo
Simons Institute, 12 November 2014
Overview of our Results
Algebraic Complexity of Matrix Multiplication

Compute the product of two n×n matrices A and B over a field F.

• Model: algebraic circuits
  ‣ gates: +, −, ×, ÷ (operations on two elements of the field)
  ‣ input: $a_{ij}, b_{ij}$ ($2n^2$ inputs)
  ‣ output: $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$ ($n^2$ outputs)

$C_M(n)$ = minimal number of algebraic operations needed to compute the product

Exponent of matrix multiplication:

$\omega = \inf\{\beta \mid C_M(n) \le n^{\beta} \text{ for all large enough } n\}$

Obviously, $2 \le \omega \le 3$.

(note: ω may depend on the field F)
History of the main improvements on the exponent of square matrix multiplication

Upper bound     Year   Authors
ω ≤ 3                  (trivial)
ω < 2.81        1969   Strassen
ω < 2.79        1979   Pan
ω < 2.78        1979   Bini, Capovani, Romani and Lotti
ω < 2.55        1981   Schönhage
ω < 2.53        1981   Pan
ω < 2.52        1982   Romani
ω < 2.50        1982   Coppersmith and Winograd
ω < 2.48        1986   Strassen
ω < 2.376       1987   Coppersmith and Winograd
ω < 2.373       2010   Stothers
ω < 2.3729      2012   Vassilevska Williams
ω < 2.3728639   2014   Le Gall (this work)

All the bounds from Strassen (1986) onward come from the analysis of a tensor by the laser method (LM); the five corresponding rows are successive versions of the LM-based analysis: v1, v2.0, v2.1, v2.2 and v2.3 (this work).

The tensors considered become more difficult to analyze (technical difficulties appear and the "size" of the tensor increases).

Previous versions (up to v2.2): analyzing the tensor required solving a complicated optimization problem (difficult when the size of the tensor increases).

Our new technique (v2.3): analyzing the tensor (i.e., obtaining an upper bound on ω from it) can be done in time polynomial in the size of the tensor.
‣ analysis based on convex optimization
Applications of our method

Laser-method-based analysis v2.3: given any tensor from which an upper bound on ω can be obtained from the laser method, it outputs, in polynomial time, the corresponding upper bound on ω.

Which tensor? Powers of the basic tensor from Coppersmith and Winograd's paper. Analysis of the m-th power of the tensor by CW:

m    Upper bound      Number of variables in the optimization problem   Authors
1    ω < 2.3871900      1    CW (1987)
2    ω < 2.3754770      3    CW (1987)
4    ω < 2.3729269      9    Stothers (2010)
8    ω < 2.3729        29    Vassilevska Williams (2012)
16   ω < 2.3728640    101    Le Gall (2014)
32   ω < 2.3728639    373    Le Gall (2014)
How to Obtain Upper Bounds on ω?
Strassen's algorithm (for the product of two 2×2 matrices)

Goal: compute the product of $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ by $B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}$.

1. Compute:
$m_1 = a_{11} \times (b_{12} - b_{22})$
$m_2 = (a_{11} + a_{12}) \times b_{22}$
$m_3 = (a_{21} + a_{22}) \times b_{11}$
$m_4 = a_{22} \times (b_{21} - b_{11})$
$m_5 = (a_{11} + a_{22}) \times (b_{11} + b_{22})$
$m_6 = (a_{12} - a_{22}) \times (b_{21} + b_{22})$
$m_7 = (a_{11} - a_{21}) \times (b_{11} + b_{12})$

2. Output:
$c_{11} = -m_2 + m_4 + m_5 + m_6$
$c_{12} = m_1 + m_2$
$c_{21} = m_3 + m_4$
$c_{22} = m_1 - m_3 + m_5 - m_7$

7 multiplications, 18 additions/subtractions.
Recursive application gives $C_M(2^k) = O(7^k) = O\big((2^k)^{\log_2 7}\big)$ for the product of two $2^k \times 2^k$ matrices, hence $\omega \le \log_2 7 = 2.807\ldots$ [Strassen 69]
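To make the recursion concrete, here is a minimal Python sketch (the helper names are ours, not from the talk); it assumes the dimension is a power of two and uses plain lists of lists:

```python
# Strassen's scheme applied recursively: 7 multiplications per level.
def add(X, Y, sign=1):
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def split(X):
    h = len(X) // 2
    return ([r[:h] for r in X[:h]], [r[h:] for r in X[:h]],
            [r[:h] for r in X[h:]], [r[h:] for r in X[h:]])

def strassen(A, B):
    if len(A) == 1:                          # 1x1 base case: one scalar product
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    m1 = strassen(a11, add(b12, b22, -1))
    m2 = strassen(add(a11, a12), b22)
    m3 = strassen(add(a21, a22), b11)
    m4 = strassen(a22, add(b21, b11, -1))
    m5 = strassen(add(a11, a22), add(b11, b22))
    m6 = strassen(add(a12, a22, -1), add(b21, b22))
    m7 = strassen(add(a11, a21, -1), add(b11, b12))
    c11 = add(add(m5, m4), add(m6, m2, -1))  # c11 = -m2 + m4 + m5 + m6
    c12 = add(m1, m2)
    c21 = add(m3, m4)
    c22 = add(add(m5, m1), add(m3, m7), -1)  # c22 = m1 - m3 + m5 - m7
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom
```

Each call on an n×n input makes exactly 7 recursive calls on n/2 × n/2 blocks, so a $2^k \times 2^k$ product uses $7^k$ scalar multiplications, matching the count above.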
More generally: suppose that the product of two m×m matrices can be computed with t multiplications. Then

$\omega \le \log_m(t)$, or equivalently, $m^{\omega} \le t$.

Strassen's algorithm is the case m = 2 and t = 7.
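A one-line computation of this bound (the Pan figures in the example are the commonly quoted ones for his 1978 construction, recalled from memory rather than taken from these slides):

```python
import math

def omega_bound(m, t):
    # Upper bound on omega from an m x m algorithm using t multiplications.
    return math.log(t, m)

print(omega_bound(2, 7))         # Strassen: 2.807...
print(omega_bound(70, 143640))   # Pan 1978 (figures quoted from memory): 2.795...
```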
The tensor of matrix multiplication

Definition. The tensor corresponding to the multiplication of an m×n matrix by an n×p matrix is

$\langle m,n,p \rangle = \sum_{i=1}^{m} \sum_{j=1}^{p} \sum_{k=1}^{n} a_{ik} \otimes b_{kj} \otimes c_{ij}.$

Intuitive interpretation:
‣ this is a formal sum
‣ when the $a_{ik}$ and the $b_{kj}$ are replaced by the corresponding entries of matrices, the coefficient of $c_{ij}$ becomes $\sum_{k=1}^{n} a_{ik} b_{kj}$
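Concretely, the formal sum can be stored as a map from basis triples to coefficients; a small Python sketch (the function name is ours):

```python
from itertools import product

def mat_mult_tensor(m, n, p):
    # <m,n,p> as a dict mapping the basis triple (a_ik, b_kj, c_ij) -> coefficient;
    # each variable is indexed by its pair of subscripts.
    return {((i, k), (k, j), (i, j)): 1
            for i, j, k in product(range(m), range(p), range(n))}

print(len(mat_mult_tensor(2, 2, 2)))   # mnp = 8 rank-one terms in the defining sum
```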
General 3-tensors

Consider three vector spaces U, V and W over F, and take bases
U = span{$x_1, \ldots, x_{\dim(U)}$}, V = span{$y_1, \ldots, y_{\dim(V)}$}, W = span{$z_1, \ldots, z_{\dim(W)}$}.

A tensor over (U, V, W) is an element of $U \otimes V \otimes W$, i.e., a formal sum

$T = \sum_{u=1}^{\dim(U)} \sum_{v=1}^{\dim(V)} \sum_{w=1}^{\dim(W)} d_{uvw} \, x_u \otimes y_v \otimes z_w$ with $d_{uvw} \in F$

("a three-dimensional array with dim(U) × dim(V) × dim(W) entries in F").
For the matrix multiplication tensor $\langle m,n,p \rangle$ we have dim(U) = mn, dim(V) = np and dim(W) = mp, with

U = span{$a_{ik}$ : 1 ≤ i ≤ m, 1 ≤ k ≤ n}
V = span{$b_{k'j}$ : 1 ≤ k' ≤ n, 1 ≤ j ≤ p}
W = span{$c_{i'j'}$ : 1 ≤ i' ≤ m, 1 ≤ j' ≤ p}

and coefficients

$d_{ik,k'j,i'j'} = \begin{cases} 1 & \text{if } i = i',\ j = j',\ k = k' \\ 0 & \text{otherwise.} \end{cases}$
Rank

The rank R(T) of a tensor is the number of multiplications of the best (bilinear) algorithm computing it. Strassen's algorithm gives $R(\langle 2,2,2 \rangle) \le 7$:

$\langle 2,2,2 \rangle = a_{11} \otimes (b_{12} - b_{22}) \otimes (c_{12} + c_{22})$
$\quad + (a_{11} + a_{12}) \otimes b_{22} \otimes (-c_{11} + c_{12})$
$\quad + (a_{21} + a_{22}) \otimes b_{11} \otimes (c_{21} - c_{22})$
$\quad + a_{22} \otimes (b_{21} - b_{11}) \otimes (c_{11} + c_{21})$
$\quad + (a_{11} + a_{22}) \otimes (b_{11} + b_{22}) \otimes (c_{11} + c_{22})$
$\quad + (a_{12} - a_{22}) \otimes (b_{21} + b_{22}) \otimes c_{11}$
$\quad + (a_{11} - a_{21}) \otimes (b_{11} + b_{12}) \otimes (-c_{22}).$

In general, $R(\langle m,n,p \rangle) \le mnp$.
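As a sanity check, one can expand Strassen's seven rank-one terms and compare the result with the defining sum of $\langle 2,2,2 \rangle$; a short Python script (helper names are ours; indices are 1-based as on the slide):

```python
from collections import defaultdict
from itertools import product

def form(*terms):
    # A linear form as a dict {(row, col): coefficient};
    # e.g. form((1, (1, 2)), (-1, (2, 2))) represents b12 - b22.
    return {idx: c for c, idx in terms}

strassen_terms = [
    (form((1, (1, 1))), form((1, (1, 2)), (-1, (2, 2))), form((1, (1, 2)), (1, (2, 2)))),
    (form((1, (1, 1)), (1, (1, 2))), form((1, (2, 2))), form((-1, (1, 1)), (1, (1, 2)))),
    (form((1, (2, 1)), (1, (2, 2))), form((1, (1, 1))), form((1, (2, 1)), (-1, (2, 2)))),
    (form((1, (2, 2))), form((1, (2, 1)), (-1, (1, 1))), form((1, (1, 1)), (1, (2, 1)))),
    (form((1, (1, 1)), (1, (2, 2))), form((1, (1, 1)), (1, (2, 2))), form((1, (1, 1)), (1, (2, 2)))),
    (form((1, (1, 2)), (-1, (2, 2))), form((1, (2, 1)), (1, (2, 2))), form((1, (1, 1)))),
    (form((1, (1, 1)), (-1, (2, 1))), form((1, (1, 1)), (1, (1, 2))), form((-1, (2, 2)))),
]

T = defaultdict(int)
for fa, fb, fc in strassen_terms:   # expand each rank-one tensor fa (x) fb (x) fc
    for (ia, ca), (ib, cb), (ic, cc) in product(fa.items(), fb.items(), fc.items()):
        T[(ia, ib, ic)] += ca * cb * cc

expected = {((i, k), (k, j), (i, j)): 1 for i in (1, 2) for j in (1, 2) for k in (1, 2)}
assert {key: v for key, v in T.items() if v != 0} == expected
print("Strassen's 7 rank-one terms sum exactly to <2,2,2>")
```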
How to obtain upper bounds on ω?

Remember: if the product of two m×m matrices can be computed with t multiplications, then $\omega \le \log_m(t)$. In our terminology: $R(\langle m,m,m \rangle) \le t \implies m^{\omega} \le t$.

First generalization (rectangular matrices):
Theorem. $R(\langle m,n,p \rangle) \le t \implies (mnp)^{\omega/3} \le t$

Second generalization (border rank) [Bini et al. 1979]:
Theorem. $\underline{R}(\langle m,n,p \rangle) \le t \implies (mnp)^{\omega/3} \le t$,
where $\underline{R}$ denotes the border rank; note that $\underline{R}(\langle m,n,p \rangle) \le R(\langle m,n,p \rangle)$.
Third generalization:

Theorem (the asymptotic sum inequality, special case) [Schönhage 1981].
$\underline{R}(\langle m_1,n_1,p_1 \rangle \oplus \langle m_2,n_2,p_2 \rangle) \le t \implies (m_1n_1p_1)^{\omega/3} + (m_2n_2p_2)^{\omega/3} \le t$

Theorem (the asymptotic sum inequality, general form) [Schönhage 1981].
$\underline{R}\Big(\bigoplus_{i=1}^{k} \langle m_i,n_i,p_i \rangle\Big) \le t \implies \sum_{i=1}^{k} (m_i n_i p_i)^{\omega/3} \le t$

(⊕ denotes the direct sum.)
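Given a border-rank bound on a direct sum, the resulting inequality can be solved for ω by bisection; a small sketch (the example $\langle 4,1,4 \rangle \oplus \langle 1,9,1 \rangle$ with border rank 17 is Schönhage's well-known one, quoted from memory rather than from these slides):

```python
def asi_omega_bound(blocks, t, tol=1e-12):
    # Solve sum_i (m_i n_i p_i)^(rho/3) = t for rho in [2,3] by bisection;
    # the asymptotic sum inequality then gives omega <= rho.
    total = lambda rho: sum((m * n * p) ** (rho / 3) for m, n, p in blocks)
    lo, hi = 2.0, 3.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if total(mid) < t else (lo, mid)  # total increases with rho
    return hi

print(asi_omega_bound([(4, 1, 4), (1, 9, 1)], 17))   # about 2.5479 < 2.55
```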
In the history table above, the bounds of Strassen (1969) and Pan (1979) are upper bounds on ω obtained from the analysis of the rank of a tensor; the bound of Bini et al. (1979) comes from the analysis of the border rank of a tensor; the bounds from Schönhage (1981) to Coppersmith and Winograd (1982) come from the analysis of a tensor by the asymptotic sum inequality; and all later bounds come from the analysis of a tensor by the laser method.
The Laser Method on a Simpler Example

Why is this called the "laser method"? The name (and the picture explaining it) comes from
V. Strassen. Algebra and Complexity. Proceedings of the First European Congress of Mathematics, pp. 429-446, 1994.

All the bounds from Coppersmith and Winograd (1987) onward are obtained from variants (improvements) of the laser method.
The first CW construction

Let q be a positive integer. Consider three vector spaces U, V and W of dimension q+1 over F:
U = span{$x_0, \ldots, x_q$}, V = span{$y_0, \ldots, y_q$}, W = span{$z_0, \ldots, z_q$}.

Coppersmith and Winograd (1987) introduced the following tensor over (U, V, W):

$T_{\mathrm{easy}} = T^{011}_{\mathrm{easy}} + T^{101}_{\mathrm{easy}} + T^{110}_{\mathrm{easy}}$, where

$T^{011}_{\mathrm{easy}} = \sum_{i=1}^{q} x_0 \otimes y_i \otimes z_i \cong \langle 1,1,q \rangle$  (1×1 matrix by 1×q matrix)
$T^{101}_{\mathrm{easy}} = \sum_{i=1}^{q} x_i \otimes y_0 \otimes z_i \cong \langle q,1,1 \rangle$  (q×1 matrix by 1×1 matrix)
$T^{110}_{\mathrm{easy}} = \sum_{i=1}^{q} x_i \otimes y_i \otimes z_0 \cong \langle 1,q,1 \rangle$  (1×q matrix by q×1 matrix)
Decompose
U = U_0 ⊕ U_1, where U_0 = span{$x_0$} and U_1 = span{$x_1, \ldots, x_q$}
V = V_0 ⊕ V_1, where V_0 = span{$y_0$} and V_1 = span{$y_1, \ldots, y_q$}
W = W_0 ⊕ W_1, where W_0 = span{$z_0$} and W_1 = span{$z_1, \ldots, z_q$}.

Then $T^{011}_{\mathrm{easy}}$ is a tensor over $(U_0, V_1, W_1)$, $T^{101}_{\mathrm{easy}}$ is a tensor over $(U_1, V_0, W_1)$, and $T^{110}_{\mathrm{easy}}$ is a tensor over $(U_1, V_1, W_0)$.

Note: $T_{\mathrm{easy}} = T^{011}_{\mathrm{easy}} + T^{101}_{\mathrm{easy}} + T^{110}_{\mathrm{easy}}$ is not a direct sum.
The first CW construction: analysis

$\underline{R}(T_{\mathrm{easy}}) \le q + 2$ (actually, $\underline{R}(T_{\mathrm{easy}}) = q + 2$).

Since the sum is not direct, we cannot use the asymptotic sum inequality directly. Consider instead

$T^{\otimes 2}_{\mathrm{easy}} = (T^{011}_{\mathrm{easy}} + T^{101}_{\mathrm{easy}} + T^{110}_{\mathrm{easy}}) \otimes (T^{011}_{\mathrm{easy}} + T^{101}_{\mathrm{easy}} + T^{110}_{\mathrm{easy}}) = T^{011}_{\mathrm{easy}} \otimes T^{011}_{\mathrm{easy}} + T^{011}_{\mathrm{easy}} \otimes T^{101}_{\mathrm{easy}} + \cdots + T^{110}_{\mathrm{easy}} \otimes T^{110}_{\mathrm{easy}}$  (9 terms)

and, more generally,

$T^{\otimes N}_{\mathrm{easy}} = T^{011}_{\mathrm{easy}} \otimes \cdots \otimes T^{011}_{\mathrm{easy}} + \cdots + T^{110}_{\mathrm{easy}} \otimes \cdots \otimes T^{110}_{\mathrm{easy}}$  ($3^N$ terms).

Coppersmith and Winograd showed how to select $\big(\tfrac{3}{2^{2/3}}\big)^N$ terms that do not share variables (i.e., form a direct sum) by zeroing variables (i.e., without increasing the rank).

Note: $\underline{R}(T^{\otimes N}_{\mathrm{easy}}) = (q+1)^{N+o(N)}$ would imply ω = 2.
Theorem [Coppersmith and Winograd 87]. By zeroing variables (i.e., without increasing the rank), the tensor $T^{\otimes N}_{\mathrm{easy}}$ can be converted into a direct sum of

$\exp\big(\big(H(\tfrac{1}{3}, \tfrac{2}{3}) - o(1)\big)N\big) = \Big(\frac{3}{2^{2/3}}\Big)^{(1-o(1))N}$

terms, each containing N/3 copies of $T^{011}_{\mathrm{easy}}$, N/3 copies of $T^{101}_{\mathrm{easy}}$ and N/3 copies of $T^{110}_{\mathrm{easy}}$. (Among the $3^N$ terms of the expansion, the extremes contain, e.g., N copies of $T^{011}_{\mathrm{easy}}$ and 0 copies of the other two.)

Here H is the entropy:

$H(\tfrac{1}{3}, \tfrac{2}{3}) = -\tfrac{1}{3}\log(\tfrac{1}{3}) - \tfrac{2}{3}\log(\tfrac{2}{3}) = \log\big(3^{1/3} \cdot (\tfrac{3}{2})^{2/3}\big) = \log\Big(\frac{3}{2^{2/3}}\Big).$

Each of the selected terms is isomorphic to $\big(T^{011}_{\mathrm{easy}}\big)^{\otimes N/3} \otimes \big(T^{101}_{\mathrm{easy}}\big)^{\otimes N/3} \otimes \big(T^{110}_{\mathrm{easy}}\big)^{\otimes N/3} \cong \langle q^{N/3}, q^{N/3}, q^{N/3} \rangle$.
Consequence (by the asymptotic sum inequality):

$\Big(\frac{3}{2^{2/3}}\Big)^{(1-o(1))N} \cdot q^{N\omega/3} \le \underline{R}(T^{\otimes N}_{\mathrm{easy}}) \le \underline{R}(T_{\mathrm{easy}})^N = (q+2)^N$

$\implies \frac{3}{2^{2/3}} \cdot q^{\omega/3} \le q + 2 \implies \omega \le 2.403\ldots$ for q = 8.
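The final inequality solves in closed form; a short check (function name ours) confirming that q = 8 is the best choice:

```python
import math

def cw_easy_bound(q):
    # Solve (3 / 2^(2/3)) * q^(rho/3) = q + 2 for rho.
    return 3 * math.log((q + 2) * 2 ** (2 / 3) / 3) / math.log(q)

best_q = min(range(2, 30), key=cw_easy_bound)
print(best_q, cw_easy_bound(best_q))   # q = 8 gives omega <= 2.4037...
```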
Idea behind the proof

Consider N = 2:

$T^{\otimes 2}_{\mathrm{easy}} = (T^{011}_{\mathrm{easy}} + T^{101}_{\mathrm{easy}} + T^{110}_{\mathrm{easy}})^{\otimes 2} = T^{011}_{\mathrm{easy}} \otimes T^{011}_{\mathrm{easy}} + T^{011}_{\mathrm{easy}} \otimes T^{101}_{\mathrm{easy}} + \cdots + T^{110}_{\mathrm{easy}} \otimes T^{110}_{\mathrm{easy}}$  (9 terms)

Label each term by the block indices of its three coordinates; the nine labels are
011011, 001111, 111100, 011110, 100111, 110011, 110110, 101101, 111001.

For instance,
$T^{011}_{\mathrm{easy}} \otimes T^{101}_{\mathrm{easy}} = \sum_{i,i'=1}^{q} (x_0 \otimes x_{i'}) \otimes (y_i \otimes y_0) \otimes (z_i \otimes z_{i'})$
is a tensor over $(U_0 \otimes U_1) \otimes (V_1 \otimes V_0) \otimes (W_1 \otimes W_1)$, with label 011011, while
$T^{011}_{\mathrm{easy}} \otimes T^{011}_{\mathrm{easy}} = \sum_{i,i'=1}^{q} (x_0 \otimes x_0) \otimes (y_i \otimes y_{i'}) \otimes (z_i \otimes z_{i'})$
is a tensor over $(U_0 \otimes U_0) \otimes (V_1 \otimes V_1) \otimes (W_1 \otimes W_1)$. These two terms SHARE VARIABLES (those of $W_1 \otimes W_1$).

We can remove the second term, e.g., by setting all variables in $V_1 \otimes V_1$ to zero (note: this removes more than one term). The first term is then isolated by setting all variables in $U_1 \otimes U_0$, $V_0 \otimes V_0$ and $W_0 \otimes W_1$ to zero.

Conclusion: we can convert $T^{\otimes 2}_{\mathrm{easy}}$ (a sum of 9 terms) into a direct sum of 2 terms.
Next step: for general N, consider

$T^{\otimes N}_{\mathrm{easy}} = T^{011}_{\mathrm{easy}} \otimes \cdots \otimes T^{011}_{\mathrm{easy}} + \cdots + T^{110}_{\mathrm{easy}} \otimes \cdots \otimes T^{110}_{\mathrm{easy}}$  ($3^N$ terms)

Each term now has a label of length 3N, made of three 0/1-strings of length N (one per coordinate); for instance, the two extreme terms above have labels 0···0 1···1 1···1 and 1···1 1···1 0···0. We can obtain labels of the form 0···1 1···0 0···1 in which each of the three strings has #0 = N/3 and #1 = 2N/3; the number of possibilities per string is

$\binom{N}{N/3,\, 2N/3} \approx \exp\big(H(\tfrac{1}{3}, \tfrac{2}{3}) N\big).$

Among these, one selects $\big(\tfrac{3}{2^{2/3}}\big)^{(1-o(1))N}$ terms that pairwise do not share any of the three coordinate strings. The proof of this theorem is based on a complicated construction using the existence of dense sets of integers with no three-term arithmetic progression.
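A quick numerical illustration of the counting step (entropy in nats; the code is ours):

```python
from math import comb, exp, log

H = -(1 / 3) * log(1 / 3) - (2 / 3) * log(2 / 3)   # = log(3 / 2^(2/3))

for N in (30, 90, 300):
    # Number of 0/1-strings of length N with N/3 zeros vs. the entropy estimate:
    print(N, comb(N, N // 3), exp(H * N))           # equal up to poly(N) factors
```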
General Formulation of the Laser Method and Reinterpretation
The laser method: general formulation

For any tensor T, any N ≥ 1 and any ρ ∈ [2,3], define $V_{\rho,N}(T)$ as the maximum of $\sum_{i=1}^{k} (m_i n_i p_i)^{\rho/3}$ over all restrictions of $T^{\otimes N}$ isomorphic to $\bigoplus_{i=1}^{k} \langle m_i, n_i, p_i \rangle$. The value of T is

$V_{\rho}(T) = \lim_{N \to \infty} V_{\rho,N}(T)^{1/N}.$

(This is the definition for symmetric tensors; otherwise we use $V_{\rho}(T) = V_{\rho}(T \otimes \sigma T \otimes \sigma^2 T)^{1/3}$, where σ cyclically rotates the three coordinates.)

Properties:
‣ $V_{\rho}(T)$ is an increasing function of ρ
‣ $V_{\rho}(T \oplus T') \ge V_{\rho}(T) + V_{\rho}(T')$ and $V_{\rho}(T \otimes T') \ge V_{\rho}(T) \times V_{\rho}(T')$
‣ $V_{\rho}(\langle m,n,p \rangle) = (mnp)^{\rho/3}$
Example: the first CW construction. The theorem above converts $T^{\otimes N}_{\mathrm{easy}}$ into a direct sum of $\big(\tfrac{3}{2^{2/3}}\big)^{(1-o(1))N}$ terms, each isomorphic to $\langle q^{N/3}, q^{N/3}, q^{N/3} \rangle$. Hence

$V_{\rho,N}(T_{\mathrm{easy}}) \ge \Big(\frac{3}{2^{2/3}}\Big)^{(1-o(1))N} \cdot q^{\rho N/3}$, and therefore $V_{\rho}(T_{\mathrm{easy}}) \ge \frac{3}{2^{2/3}} \cdot q^{\rho/3}.$
Theorem (simple generalization of the asymptotic sum inequality). $V_{\omega}(T) \le \underline{R}(T)$.
The laser method: general formulation (partitioned tensors)

Consider three vector spaces U, V and W over F, decomposed as

$U = \bigoplus_{i \in I} U_i$, $V = \bigoplus_{j \in J} V_j$, $W = \bigoplus_{k \in K} W_k$ for some $I, J, K \subseteq \mathbb{Z}$.

A tensor T over (U, V, W) is a partitioned tensor (with respect to this decomposition) if it can be written as

$T = \sum_{(i,j,k) \in I \times J \times K} T_{ijk}$, where $T_{ijk} \in U_i \otimes V_j \otimes W_k$ for each $(i,j,k) \in I \times J \times K$.

Support of the tensor: $\mathrm{supp}(T) = \{(i,j,k) \in I \times J \times K \mid T_{ijk} \ne 0\}$; each nonzero $T_{ijk}$ is called a component of T.

We say that the tensor is tight if there exists some integer d such that $i + j + k = d$ for all $(i,j,k) \in \mathrm{supp}(T)$.
Example: the first CW construction. With the decomposition U = U_0 ⊕ U_1, V = V_0 ⊕ V_1, W = W_0 ⊕ W_1 given earlier (so I = J = K = {0, 1}), the components of $T_{\mathrm{easy}}$ are $T^{011}_{\mathrm{easy}}$, $T^{101}_{\mathrm{easy}}$ and $T^{110}_{\mathrm{easy}}$, and

$\mathrm{supp}(T_{\mathrm{easy}}) = \{(0,1,1), (1,0,1), (1,1,0)\}.$

It is tight, since i + j + k = 2 for all $(i,j,k) \in \mathrm{supp}(T_{\mathrm{easy}})$.
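Tightness is mechanical to check; a tiny sketch (function name ours; the second support below is that of the tensor $T_{\mathrm{CW}}$ introduced later):

```python
def is_tight(support):
    # A partitioned tensor is tight if i + j + k is constant on its support.
    return len({i + j + k for i, j, k in support}) <= 1

print(is_tight({(0, 1, 1), (1, 0, 1), (1, 1, 0)}))                                   # True, d = 2
print(is_tight({(0, 1, 1), (1, 0, 1), (1, 1, 0), (0, 0, 2), (0, 2, 0), (2, 0, 0)}))  # True, d = 2
```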
Main Theorem [LG 14] (reinterpretation of prior works). For any tight partitioned tensor T, any probability distribution P over supp(T), and any ρ ∈ [2,3], we have

$\log(V_{\rho}(T)) \ge \sum_{\ell=1}^{3} \frac{H(P_\ell)}{3} + \sum_{(i,j,k) \in \mathrm{supp}(T)} P(i,j,k) \log(V_{\rho}(T_{ijk})) - \Gamma(P).$

H: entropy
$P_\ell$: projection of P along the ℓ-th coordinate (= marginal distribution)
$\Gamma(P)$: to be defined later (zero in the case of simple tensors)

Conclusion: we can compute a lower bound on the value of T if we know a lower bound on the value of each component. We can then obtain an upper bound on ω via $V_{\omega}(T) \le \underline{R}(T)$; concretely, we use $V_{\rho}(T) \ge \underline{R}(T) \implies \omega \le \rho$ and do a binary search on ρ.
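The binary search on ρ is straightforward; a sketch (names ours), shown with the lower bound on $V_{\rho}(T_{\mathrm{easy}})$ derived earlier and $\underline{R}(T_{\mathrm{easy}}) = q + 2$:

```python
def omega_upper_bound(value_lb, border_rank, tol=1e-9):
    # Smallest rho in [2,3] with value_lb(rho) >= border_rank; value_lb must be a
    # lower bound on V_rho(T) that is increasing in rho. Then omega <= rho.
    lo, hi = 2.0, 3.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (lo, mid) if value_lb(mid) >= border_rank else (mid, hi)
    return hi

q = 8
print(omega_upper_bound(lambda r: 3 / 2 ** (2 / 3) * q ** (r / 3), q + 2))  # 2.4037...
```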
Example: the first CW construction. We have $\mathrm{supp}(T_{\mathrm{easy}}) = \{(0,1,1), (1,0,1), (1,1,0)\}$ with

$V_{\rho}(T^{011}_{\mathrm{easy}}) = V_{\rho}(\langle 1,1,q \rangle) = q^{\rho/3}$, $V_{\rho}(T^{101}_{\mathrm{easy}}) = V_{\rho}(\langle q,1,1 \rangle) = q^{\rho/3}$, $V_{\rho}(T^{110}_{\mathrm{easy}}) = V_{\rho}(\langle 1,q,1 \rangle) = q^{\rho/3}$.

Take P(0,1,1) = P(1,0,1) = P(1,1,0) = 1/3. Then $P_1(0) = 1/3$, $P_1(1) = 2/3$ and $P_2 = P_3 = P_1$, and Γ(P) = 0. The Main Theorem gives

$\log(V_{\rho}(T_{\mathrm{easy}})) \ge H(\tfrac{1}{3}, \tfrac{2}{3}) + \tfrac{1}{3}\log(q^{\rho/3}) + \tfrac{1}{3}\log(q^{\rho/3}) + \tfrac{1}{3}\log(q^{\rho/3})$,

i.e., $V_{\rho}(T_{\mathrm{easy}}) \ge \frac{3}{2^{2/3}} \cdot q^{\rho/3}$, recovering the bound obtained above.
Interpretation: the laser method enables us to convert (by zeroing variables) $T^{\otimes N}$ into a direct sum of

$\exp\Big(\Big(\sum_{\ell=1}^{3} \frac{H(P_\ell)}{3} - \Gamma(P) - o(1)\Big)N\Big)$

terms, each isomorphic to $\bigotimes_{(i,j,k) \in \mathrm{supp}(T)} [T_{ijk}]^{\otimes P(i,j,k)N}$.
The second CW construction

Let q be a positive integer. Consider three vector spaces U, V and W of dimension q+2 over F:
U = span{$x_0, \ldots, x_q, x_{q+1}$}, V = span{$y_0, \ldots, y_q, y_{q+1}$}, W = span{$z_0, \ldots, z_q, z_{q+1}$}.

Coppersmith and Winograd (1987) considered the following tensor:

$T_{\mathrm{CW}} = T^{011}_{\mathrm{CW}} + T^{101}_{\mathrm{CW}} + T^{110}_{\mathrm{CW}} + T^{002}_{\mathrm{CW}} + T^{020}_{\mathrm{CW}} + T^{200}_{\mathrm{CW}} = T_{\mathrm{easy}} + T^{002}_{\mathrm{CW}} + T^{020}_{\mathrm{CW}} + T^{200}_{\mathrm{CW}}$,

where $T^{011}_{\mathrm{CW}} = T^{011}_{\mathrm{easy}}$, $T^{101}_{\mathrm{CW}} = T^{101}_{\mathrm{easy}}$, $T^{110}_{\mathrm{CW}} = T^{110}_{\mathrm{easy}}$, and

$T^{002}_{\mathrm{CW}} = x_0 \otimes y_0 \otimes z_{q+1} \cong \langle 1,1,1 \rangle$
$T^{020}_{\mathrm{CW}} = x_0 \otimes y_{q+1} \otimes z_0 \cong \langle 1,1,1 \rangle$
$T^{200}_{\mathrm{CW}} = x_{q+1} \otimes y_0 \otimes z_0 \cong \langle 1,1,1 \rangle.$

We have $\underline{R}(T_{\mathrm{CW}}) = q + 2$.
Decompose
U = U_0 ⊕ U_1 ⊕ U_2, where U_0 = span{$x_0$}, U_1 = span{$x_1, \ldots, x_q$} and U_2 = span{$x_{q+1}$}
V = V_0 ⊕ V_1 ⊕ V_2, where V_0 = span{$y_0$}, V_1 = span{$y_1, \ldots, y_q$} and V_2 = span{$y_{q+1}$}
W = W_0 ⊕ W_1 ⊕ W_2, where W_0 = span{$z_0$}, W_1 = span{$z_1, \ldots, z_q$} and W_2 = span{$z_{q+1}$}.

Then $T^{011}_{\mathrm{CW}}$ is a tensor over $(U_0, V_1, W_1)$, $T^{101}_{\mathrm{CW}}$ over $(U_1, V_0, W_1)$, $T^{110}_{\mathrm{CW}}$ over $(U_1, V_1, W_0)$, $T^{002}_{\mathrm{CW}}$ over $(U_0, V_0, W_2)$, $T^{020}_{\mathrm{CW}}$ over $(U_0, V_2, W_0)$, and $T^{200}_{\mathrm{CW}}$ over $(U_2, V_0, W_0)$.

Again, this is not a direct sum.
The second CW construction: laser method

$\mathrm{supp}(T_{\mathrm{CW}}) = \{(0,1,1), (1,0,1), (1,1,0), (0,0,2), (0,2,0), (2,0,0)\}$

$V_{\rho}(T^{011}_{\mathrm{CW}}) = V_{\rho}(T^{101}_{\mathrm{CW}}) = V_{\rho}(T^{110}_{\mathrm{CW}}) = q^{\rho/3}$ and $V_{\rho}(T^{002}_{\mathrm{CW}}) = V_{\rho}(T^{020}_{\mathrm{CW}}) = V_{\rho}(T^{200}_{\mathrm{CW}}) = 1.$

Take, with 0 ≤ α ≤ 1/3:
P(0,1,1) = P(1,0,1) = P(1,1,0) = α and P(0,0,2) = P(0,2,0) = P(2,0,0) = 1/3 − α.

Then $P_1(0) = \alpha + 2(1/3 - \alpha)$, $P_1(1) = 2\alpha$, $P_1(2) = 1/3 - \alpha$ (and $P_2 = P_3 = P_1$ by symmetry), and Γ(P) = 0. The Main Theorem gives

$\log(V_{\rho}(T_{\mathrm{CW}})) \ge H\big(\tfrac{2}{3} - \alpha,\ 2\alpha,\ \tfrac{1}{3} - \alpha\big) + \log(q^{\rho\alpha}).$

Combined with $V_{\omega}(T_{\mathrm{CW}}) \le \underline{R}(T_{\mathrm{CW}}) = q + 2$, this gives ω ≤ 2.38718... for q = 6 and α = 0.3173.
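A small numerical verification of these numbers (our own code, using scipy; logarithms are natural):

```python
from math import log
from scipy.optimize import minimize_scalar

def value_lb(rho, q):
    # max over alpha of H(2/3 - a, 2a, 1/3 - a) + rho * a * log(q):
    # a lower bound on log V_rho(T_CW) by the Main Theorem (Gamma(P) = 0 here).
    def neg_f(a):
        probs = (2 / 3 - a, 2 * a, 1 / 3 - a)
        return -(-sum(p * log(p) for p in probs if p > 0) + rho * a * log(q))
    return -minimize_scalar(neg_f, bounds=(1e-9, 1 / 3 - 1e-9), method="bounded").fun

q, lo, hi = 6, 2.0, 3.0
while hi - lo > 1e-9:           # binary search for the smallest feasible rho
    mid = (lo + hi) / 2
    lo, hi = (lo, mid) if value_lb(mid, q) >= log(q + 2) else (mid, hi)
print(hi)                       # about 2.38719, as on the slide (optimal alpha ~ 0.3173)
```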
Analysis of the second power

$T^{\otimes 2}_{\mathrm{CW}} = (T^{011}_{\mathrm{CW}} + T^{101}_{\mathrm{CW}} + T^{110}_{\mathrm{CW}} + T^{002}_{\mathrm{CW}} + T^{020}_{\mathrm{CW}} + T^{200}_{\mathrm{CW}})^{\otimes 2}$  ($6^2 = 36$ terms), with $\underline{R}(T^{\otimes 2}_{\mathrm{CW}}) \le (q+2)^2$.

Idea: rewrite it as a (non-direct) sum of 15 terms by regrouping (MERGING) terms:

$T^{\otimes 2}_{\mathrm{CW}} = T^{400} + T^{040} + T^{004} + T^{310} + T^{301} + T^{103} + T^{130} + T^{013} + T^{031} + T^{220} + T^{202} + T^{022} + T^{211} + T^{121} + T^{112}$,

where

$T^{400} = T^{200}_{\mathrm{CW}} \otimes T^{200}_{\mathrm{CW}}$
$T^{310} = T^{200}_{\mathrm{CW}} \otimes T^{110}_{\mathrm{CW}} + T^{110}_{\mathrm{CW}} \otimes T^{200}_{\mathrm{CW}}$
$T^{220} = T^{200}_{\mathrm{CW}} \otimes T^{020}_{\mathrm{CW}} + T^{020}_{\mathrm{CW}} \otimes T^{200}_{\mathrm{CW}} + T^{110}_{\mathrm{CW}} \otimes T^{110}_{\mathrm{CW}}$
$T^{211} = T^{200}_{\mathrm{CW}} \otimes T^{011}_{\mathrm{CW}} + T^{011}_{\mathrm{CW}} \otimes T^{200}_{\mathrm{CW}} + T^{110}_{\mathrm{CW}} \otimes T^{101}_{\mathrm{CW}} + T^{101}_{\mathrm{CW}} \otimes T^{110}_{\mathrm{CW}}$

and the other 11 terms are obtained by permuting the variables (e.g., $T^{040} = T^{020}_{\mathrm{CW}} \otimes T^{020}_{\mathrm{CW}}$).
$\mathrm{supp}(T^{\otimes 2}_{\mathrm{CW}}) = \{(4,0,0), \ldots, (0,0,4),\ (3,1,0), \ldots, (0,1,3),\ (2,2,0), \ldots, (0,2,2),\ (2,1,1), \ldots, (1,1,2)\}$
(3 permutations, 6 permutations, 3 permutations and 3 permutations, respectively).

Lower bounds on the values of each component can be computed (recursively). Choice of distribution (4 − 1 = 3 free parameters):

P(4,0,0) = ... = P(0,0,4) = α,  P(3,1,0) = ... = P(0,1,3) = β,
P(2,2,0) = ... = P(0,2,2) = γ,  P(2,1,1) = ... = P(1,1,2) = δ.

Here Γ(P) = 0. Applying the Main Theorem together with $V_{\omega}(T) \le \underline{R}(T)$ gives

ω ≤ 2.3755... for q = 6 and α = 0.00023, β = 0.0125, γ = 0.10254, δ = 0.2056.
What about the third power (using similar merging schemes)? It does not give any improvement.
Analysis of the fourth power

$T^{\otimes 4}_{\mathrm{CW}} = (T^{011}_{\mathrm{CW}} + T^{101}_{\mathrm{CW}} + T^{110}_{\mathrm{CW}} + T^{002}_{\mathrm{CW}} + T^{020}_{\mathrm{CW}} + T^{200}_{\mathrm{CW}})^{\otimes 4}$  ($6^4 = 1296$ terms), with $\underline{R}(T^{\otimes 4}_{\mathrm{CW}}) \le (q+2)^4$.

Idea: rewrite it as a (non-direct) sum of a smaller number of terms by regrouping terms:

$T^{\otimes 4}_{\mathrm{CW}} = T^{800} + T^{710} + T^{620} + T^{611} + T^{530} + T^{521} + T^{440} + T^{431} + T^{422} + T^{332} + \text{permutations of these terms}$

(the permutations include $T^{080}, T^{008}, T^{701}, T^{107}, T^{170}, T^{017}, T^{071}, \ldots$).

This gives 10 − 1 = 9 parameters for the probability distribution, and this time Γ(P) ≠ 0.
The laser method: the term Γ(P)

In the Main Theorem,

$\Gamma(P) = \max[H(Q)] - H(P)$,

where the max is over all distributions Q over supp(T) such that $Q_1 = P_1$, $Q_2 = P_2$ and $Q_3 = P_3$.

When the structure of the support is simple, we typically have

$Q_1 = P_1,\ Q_2 = P_2,\ Q_3 = P_3 \implies Q = P$,

and thus Γ(P) = 0.
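Γ(P) can be computed numerically: the maximum-entropy distribution with prescribed marginals is the limit of iterative proportional fitting. A sketch (the code is ours; entropy in nats):

```python
import numpy as np

def gamma(support, P, iters=3000):
    # Gamma(P) = max{H(Q) : Q on the support, same marginals as P} - H(P).
    support, P = list(support), np.asarray(P, float)
    M = [np.array([[1.0 if t[c] == v else 0.0 for t in support]
                   for v in sorted({s[c] for s in support})]) for c in range(3)]
    H = lambda x: -np.sum(np.where(x > 1e-15, x * np.log(np.maximum(x, 1e-15)), 0.0))
    Q = np.full(len(support), 1.0 / len(support))
    for _ in range(iters):      # iterative proportional fitting, one marginal at a time
        for m in M:
            Q = Q * (m.T @ ((m @ P) / np.maximum(m @ Q, 1e-300)))
    return H(Q) - H(P)

# For supp(T_CW) the marginals determine Q uniquely, so Gamma(P) = 0:
supp = [(0, 1, 1), (1, 0, 1), (1, 1, 0), (0, 0, 2), (0, 2, 0), (2, 0, 0)]
a = 0.3173
print(gamma(supp, [a] * 3 + [1 / 3 - a] * 3))   # ~0 up to numerical tolerance
```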
Interpretation of the penalty term: the terms kept in the conversion above are those of "type P", i.e., isomorphic to $\bigotimes_{(i,j,k) \in \mathrm{supp}(T)} [T_{ijk}]^{\otimes P(i,j,k)N}$. We can control only the choice of the marginal distributions $P_1$, $P_2$ and $P_3$: what we obtain is a (non-direct) sum of all "type Q" terms over the distributions Q with these marginals. The most frequent terms are those with Q maximizing H(Q); the fact that the "type P" terms are not necessarily the most frequent introduces the penalty term −Γ(P).
The laser method: computing the bound

How to find the best distribution P for a given ρ? Assume that (a lower bound on) each $V_{\rho}(T_{ijk})$ is known. In the bound of the Main Theorem, the entropy term $\sum_\ell H(P_\ell)/3$ is concave in P and the term $\sum P(i,j,k) \log(V_{\rho}(T_{ijk}))$ is linear in P. If Γ(P) = 0 for all distributions P, finding the best distribution is therefore the maximization of a concave function under linear constraints, which can be done efficiently (numerically) using convex optimization.
In general, however, $\Gamma(P) = \max[H(Q)] - H(P)$ (max over all distributions Q over supp(T) such that $Q_1 = P_1$, $Q_2 = P_2$ and $Q_3 = P_3$) has no such structure, and the resulting optimization problem is hard to solve; it could nevertheless be handled up to the 4th power of the CW tensor [Stothers 10].
Simplification: restrict the search to the set of distributions P such that Γ(P) = 0. The problem is still hard to solve, but could be handled up to the 8th power of the CW tensor [Vassilevska Williams 12].
The laser method: computing the bound efficiently

Call the right-hand side of the Main Theorem f(P):

$f(P) = \sum_{\ell=1}^{3} \frac{H(P_\ell)}{3} + \sum_{(i,j,k) \in \mathrm{supp}(T)} P(i,j,k) \log(V_{\rho}(T_{ijk})) - \Gamma(P).$

Efficient method to find a solution (close to the optimal solution) [LG 14]:

1. Find a distribution that maximizes f(P) (with the Γ term dropped, the objective is concave and the constraints are linear); call it P*.
2. Find the distribution that maximizes H(Q) under the constraints $Q_1 = P^*_1$, $Q_2 = P^*_2$ and $Q_3 = P^*_3$ (again a concave objective function under linear constraints); call it Q*.
3. Output f(Q*).

Since Γ(Q*) = 0, we have $\log(V_{\rho}(T)) \ge f(Q^*)$ from the theorem.
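A sketch of the whole procedure (names and solver choices ours: scipy's SLSQP for step 1, and the same iterative proportional fitting as above for step 2):

```python
import numpy as np
from scipy.optimize import minimize

def laser_bound(support, log_values, iters=3000):
    # support: list of triples (i,j,k); log_values[t]: lower bound on log V_rho(T_t).
    logV, n = np.asarray(log_values, float), len(support)
    M = [np.array([[1.0 if t[c] == v else 0.0 for t in support]
                   for v in sorted({s[c] for s in support})]) for c in range(3)]
    H = lambda x: -np.sum(np.where(x > 1e-15, x * np.log(np.maximum(x, 1e-15)), 0.0))
    g = lambda x: sum(H(m @ x) for m in M) / 3 + logV @ x   # f(P) with Gamma dropped
    # Step 1: maximize the concave function g over the probability simplex.
    P = minimize(lambda x: -g(x), np.full(n, 1.0 / n), bounds=[(0, 1)] * n,
                 constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1}],
                 method="SLSQP").x
    # Step 2: max-entropy Q with the marginals of P*, via iterative proportional fitting.
    Q = np.full(n, 1.0 / n)
    for _ in range(iters):
        for m in M:
            Q = Q * (m.T @ ((m @ P) / np.maximum(m @ Q, 1e-300)))
    return g(Q)   # Step 3: Gamma(Q*) = 0, so f(Q*) = g(Q*) lower-bounds log V_rho(T)

# Usage on T_CW with q = 6 at rho = 2.3872 (components' log-values as computed earlier):
rho, q = 2.3872, 6
supp = [(0, 1, 1), (1, 0, 1), (1, 1, 0), (0, 0, 2), (0, 2, 0), (2, 0, 0)]
logv = [rho / 3 * np.log(q)] * 3 + [0.0] * 3
print(laser_bound(supp, logv), np.log(q + 2))   # both ~2.079, so omega <= 2.3872 here
```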
Analysis of the powers 16 and 32: the solutions to the optimization problems were obtained numerically by convex optimization.
Conclusion

Laser-method-based analysis (v2.3): given any tight partitioned tensor for which (lower bounds on) the value of each component is known, it produces an upper bound on ω in polynomial time, via convex optimization.

We constructed a time-efficient implementation of the laser method and applied it to study higher powers of the basic tensor by CW (see the table of applications above).

Recent result [Ambainis, Filmus, LG 14]: studying higher powers (using the same approach) cannot give an upper bound better than 2.3725.