Powers of Tensors and Fast Matrix Multiplication
François Le Gall
Department of Computer Science, Graduate School of Information Science and Technology
The University of Tokyo
Simons Institute, 12 November 2014
Overview of our Results
Algebraic Complexity of Matrix Multiplication

Compute the product of two n×n matrices A and B over a field F.

• Model: algebraic circuits
  ‣ gates: +, −, ×, ÷ (operations on two elements of the field)
  ‣ input: $a_{ij}, b_{ij}$ ($2n^2$ inputs)
  ‣ output: $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$ ($n^2$ outputs)

$C_M(n)$ = minimal number of algebraic operations needed to compute the product

Exponent of matrix multiplication:

$\omega = \inf\{\beta \mid C_M(n) \le n^{\beta} \text{ for all large enough } n\}$

Obviously, $2 \le \omega \le 3$.

(note: ω may depend on the field F)
History of the main improvements on the exponent of square matrix multiplication

Upper bound     Year   Authors
ω ≤ 3                  (trivial)
ω < 2.81        1969   Strassen
ω < 2.79        1979   Pan
ω < 2.78        1979   Bini, Capovani, Romani and Lotti
ω < 2.55        1981   Schönhage
ω < 2.53        1981   Pan
ω < 2.52        1982   Romani
ω < 2.50        1982   Coppersmith and Winograd
ω < 2.48        1986   Strassen
ω < 2.376       1987   Coppersmith and Winograd
ω < 2.373       2010   Stothers
ω < 2.3729      2012   Vassilevska Williams
ω < 2.3728639   2014   Le Gall (this work)

All the bounds from Strassen (1986) onward come from the analysis of a tensor by the laser method (LM); the five corresponding rows are successive versions of the LM-based analysis: v1, v2.0, v2.1, v2.2 and v2.3 (this work).

The tensors considered become more difficult to analyze (technical difficulties appear and the "size" of the tensor increases).

Previous versions (up to v2.2): analyzing the tensor required solving a complicated optimization problem (difficult when the size of the tensor increases).

Our new technique (v2.3): analyzing the tensor (i.e., obtaining an upper bound on ω from it) can be done in time polynomial in the size of the tensor.
‣ analysis based on convex optimization
Applications of our method

Laser-method-based analysis v2.3: given any tensor from which an upper bound on ω can be obtained from the laser method, it outputs, in polynomial time, the corresponding upper bound on ω.

Which tensor? Powers of the basic tensor from Coppersmith and Winograd's paper. Analysis of the m-th power of the tensor by CW:

m    Upper bound      Number of variables in the optimization problem   Authors
1    ω < 2.3871900      1    CW (1987)
2    ω < 2.3754770      3    CW (1987)
4    ω < 2.3729269      9    Stothers (2010)
8    ω < 2.3729        29    Vassilevska Williams (2012)
16   ω < 2.3728640    101    Le Gall (2014)
32   ω < 2.3728639    373    Le Gall (2014)
How to Obtain Upper Bounds on ω?
Strassen's algorithm (for the product of two 2×2 matrices)

Goal: compute the product of $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ by $B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}$.

1. Compute:
$m_1 = a_{11} \times (b_{12} - b_{22})$
$m_2 = (a_{11} + a_{12}) \times b_{22}$
$m_3 = (a_{21} + a_{22}) \times b_{11}$
$m_4 = a_{22} \times (b_{21} - b_{11})$
$m_5 = (a_{11} + a_{22}) \times (b_{11} + b_{22})$
$m_6 = (a_{12} - a_{22}) \times (b_{21} + b_{22})$
$m_7 = (a_{11} - a_{21}) \times (b_{11} + b_{12})$

2. Output:
$c_{11} = -m_2 + m_4 + m_5 + m_6$
$c_{12} = m_1 + m_2$
$c_{21} = m_3 + m_4$
$c_{22} = m_1 - m_3 + m_5 - m_7$

7 multiplications, 18 additions/subtractions.
Recursive application gives $C_M(2^k) = O(7^k) = O\big((2^k)^{\log_2 7}\big)$ for the product of two $2^k \times 2^k$ matrices, hence $\omega \le \log_2 7 = 2.807\ldots$ [Strassen 69]
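To make the recursion concrete, here is a minimal Python sketch (the helper names are ours, not from the talk); it assumes the dimension is a power of two and uses plain lists of lists:

```python
# Strassen's scheme applied recursively: 7 multiplications per level.
def add(X, Y, sign=1):
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def split(X):
    h = len(X) // 2
    return ([r[:h] for r in X[:h]], [r[h:] for r in X[:h]],
            [r[:h] for r in X[h:]], [r[h:] for r in X[h:]])

def strassen(A, B):
    if len(A) == 1:                          # 1x1 base case: one scalar product
        return [[A[0][0] * B[0][0]]]
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    m1 = strassen(a11, add(b12, b22, -1))
    m2 = strassen(add(a11, a12), b22)
    m3 = strassen(add(a21, a22), b11)
    m4 = strassen(a22, add(b21, b11, -1))
    m5 = strassen(add(a11, a22), add(b11, b22))
    m6 = strassen(add(a12, a22, -1), add(b21, b22))
    m7 = strassen(add(a11, a21, -1), add(b11, b12))
    c11 = add(add(m5, m4), add(m6, m2, -1))  # c11 = -m2 + m4 + m5 + m6
    c12 = add(m1, m2)
    c21 = add(m3, m4)
    c22 = add(add(m5, m1), add(m3, m7), -1)  # c22 = m1 - m3 + m5 - m7
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom
```

Each call on an n×n input makes exactly 7 recursive calls on n/2 × n/2 blocks, so a $2^k \times 2^k$ product uses $7^k$ scalar multiplications, matching the count above.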
More generally: suppose that the product of two m×m matrices can be computed with t multiplications. Then

$\omega \le \log_m(t)$, or equivalently, $m^{\omega} \le t$.

Strassen's algorithm is the case m = 2 and t = 7.
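A one-line computation of this bound (the Pan figures in the example are the commonly quoted ones for his 1978 construction, recalled from memory rather than taken from these slides):

```python
import math

def omega_bound(m, t):
    # Upper bound on omega from an m x m algorithm using t multiplications.
    return math.log(t, m)

print(omega_bound(2, 7))         # Strassen: 2.807...
print(omega_bound(70, 143640))   # Pan 1978 (figures quoted from memory): 2.795...
```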
The tensor of matrix multiplication

Definition. The tensor corresponding to the multiplication of an m×n matrix by an n×p matrix is

$\langle m,n,p \rangle = \sum_{i=1}^{m} \sum_{j=1}^{p} \sum_{k=1}^{n} a_{ik} \otimes b_{kj} \otimes c_{ij}.$

Intuitive interpretation:
‣ this is a formal sum
‣ when the $a_{ik}$ and the $b_{kj}$ are replaced by the corresponding entries of matrices, the coefficient of $c_{ij}$ becomes $\sum_{k=1}^{n} a_{ik} b_{kj}$
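Concretely, the formal sum can be stored as a map from basis triples to coefficients; a small Python sketch (the function name is ours):

```python
from itertools import product

def mat_mult_tensor(m, n, p):
    # <m,n,p> as a dict mapping the basis triple (a_ik, b_kj, c_ij) -> coefficient;
    # each variable is indexed by its pair of subscripts.
    return {((i, k), (k, j), (i, j)): 1
            for i, j, k in product(range(m), range(p), range(n))}

print(len(mat_mult_tensor(2, 2, 2)))   # mnp = 8 rank-one terms in the defining sum
```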
General 3-tensors

Consider three vector spaces U, V and W over F, and take bases
U = span{$x_1, \ldots, x_{\dim(U)}$}, V = span{$y_1, \ldots, y_{\dim(V)}$}, W = span{$z_1, \ldots, z_{\dim(W)}$}.

A tensor over (U, V, W) is an element of $U \otimes V \otimes W$, i.e., a formal sum

$T = \sum_{u=1}^{\dim(U)} \sum_{v=1}^{\dim(V)} \sum_{w=1}^{\dim(W)} d_{uvw} \, x_u \otimes y_v \otimes z_w$ with $d_{uvw} \in F$

("a three-dimensional array with dim(U) × dim(V) × dim(W) entries in F").
For the matrix multiplication tensor $\langle m,n,p \rangle$ we have dim(U) = mn, dim(V) = np and dim(W) = mp, with

U = span{$a_{ik}$ : 1 ≤ i ≤ m, 1 ≤ k ≤ n}
V = span{$b_{k'j}$ : 1 ≤ k' ≤ n, 1 ≤ j ≤ p}
W = span{$c_{i'j'}$ : 1 ≤ i' ≤ m, 1 ≤ j' ≤ p}

and coefficients

$d_{ik,k'j,i'j'} = \begin{cases} 1 & \text{if } i = i',\ j = j',\ k = k' \\ 0 & \text{otherwise.} \end{cases}$
Rank

The rank R(T) of a tensor is the number of multiplications of the best (bilinear) algorithm computing it. Strassen's algorithm gives $R(\langle 2,2,2 \rangle) \le 7$:

$\langle 2,2,2 \rangle = a_{11} \otimes (b_{12} - b_{22}) \otimes (c_{12} + c_{22})$
$\quad + (a_{11} + a_{12}) \otimes b_{22} \otimes (-c_{11} + c_{12})$
$\quad + (a_{21} + a_{22}) \otimes b_{11} \otimes (c_{21} - c_{22})$
$\quad + a_{22} \otimes (b_{21} - b_{11}) \otimes (c_{11} + c_{21})$
$\quad + (a_{11} + a_{22}) \otimes (b_{11} + b_{22}) \otimes (c_{11} + c_{22})$
$\quad + (a_{12} - a_{22}) \otimes (b_{21} + b_{22}) \otimes c_{11}$
$\quad + (a_{11} - a_{21}) \otimes (b_{11} + b_{12}) \otimes (-c_{22}).$

In general, $R(\langle m,n,p \rangle) \le mnp$.
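As a sanity check, one can expand Strassen's seven rank-one terms and compare the result with the defining sum of $\langle 2,2,2 \rangle$; a short Python script (helper names are ours; indices are 1-based as on the slide):

```python
from collections import defaultdict
from itertools import product

def form(*terms):
    # A linear form as a dict {(row, col): coefficient};
    # e.g. form((1, (1, 2)), (-1, (2, 2))) represents b12 - b22.
    return {idx: c for c, idx in terms}

strassen_terms = [
    (form((1, (1, 1))), form((1, (1, 2)), (-1, (2, 2))), form((1, (1, 2)), (1, (2, 2)))),
    (form((1, (1, 1)), (1, (1, 2))), form((1, (2, 2))), form((-1, (1, 1)), (1, (1, 2)))),
    (form((1, (2, 1)), (1, (2, 2))), form((1, (1, 1))), form((1, (2, 1)), (-1, (2, 2)))),
    (form((1, (2, 2))), form((1, (2, 1)), (-1, (1, 1))), form((1, (1, 1)), (1, (2, 1)))),
    (form((1, (1, 1)), (1, (2, 2))), form((1, (1, 1)), (1, (2, 2))), form((1, (1, 1)), (1, (2, 2)))),
    (form((1, (1, 2)), (-1, (2, 2))), form((1, (2, 1)), (1, (2, 2))), form((1, (1, 1)))),
    (form((1, (1, 1)), (-1, (2, 1))), form((1, (1, 1)), (1, (1, 2))), form((-1, (2, 2)))),
]

T = defaultdict(int)
for fa, fb, fc in strassen_terms:   # expand each rank-one tensor fa (x) fb (x) fc
    for (ia, ca), (ib, cb), (ic, cc) in product(fa.items(), fb.items(), fc.items()):
        T[(ia, ib, ic)] += ca * cb * cc

expected = {((i, k), (k, j), (i, j)): 1 for i in (1, 2) for j in (1, 2) for k in (1, 2)}
assert {key: v for key, v in T.items() if v != 0} == expected
print("Strassen's 7 rank-one terms sum exactly to <2,2,2>")
```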
How to obtain upper bounds on ω?

Remember: if the product of two m×m matrices can be computed with t multiplications, then $\omega \le \log_m(t)$. In our terminology: $R(\langle m,m,m \rangle) \le t \implies m^{\omega} \le t$.

First generalization (rectangular matrices):
Theorem. $R(\langle m,n,p \rangle) \le t \implies (mnp)^{\omega/3} \le t$

Second generalization (border rank) [Bini et al. 1979]:
Theorem. $\underline{R}(\langle m,n,p \rangle) \le t \implies (mnp)^{\omega/3} \le t$,
where $\underline{R}$ denotes the border rank; note that $\underline{R}(\langle m,n,p \rangle) \le R(\langle m,n,p \rangle)$.
Third generalization:

Theorem (the asymptotic sum inequality, special case) [Schönhage 1981].
$\underline{R}(\langle m_1,n_1,p_1 \rangle \oplus \langle m_2,n_2,p_2 \rangle) \le t \implies (m_1n_1p_1)^{\omega/3} + (m_2n_2p_2)^{\omega/3} \le t$

Theorem (the asymptotic sum inequality, general form) [Schönhage 1981].
$\underline{R}\Big(\bigoplus_{i=1}^{k} \langle m_i,n_i,p_i \rangle\Big) \le t \implies \sum_{i=1}^{k} (m_i n_i p_i)^{\omega/3} \le t$

(⊕ denotes the direct sum.)
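Given a border-rank bound on a direct sum, the resulting inequality can be solved for ω by bisection; a small sketch (the example $\langle 4,1,4 \rangle \oplus \langle 1,9,1 \rangle$ with border rank 17 is Schönhage's well-known one, quoted from memory rather than from these slides):

```python
def asi_omega_bound(blocks, t, tol=1e-12):
    # Solve sum_i (m_i n_i p_i)^(rho/3) = t for rho in [2,3] by bisection;
    # the asymptotic sum inequality then gives omega <= rho.
    total = lambda rho: sum((m * n * p) ** (rho / 3) for m, n, p in blocks)
    lo, hi = 2.0, 3.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if total(mid) < t else (lo, mid)  # total increases with rho
    return hi

print(asi_omega_bound([(4, 1, 4), (1, 9, 1)], 17))   # about 2.5479 < 2.55
```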
In the history table above, the bounds of Strassen (1969) and Pan (1979) are upper bounds on ω obtained from the analysis of the rank of a tensor; the bound of Bini et al. (1979) comes from the analysis of the border rank of a tensor; the bounds from Schönhage (1981) to Coppersmith and Winograd (1982) come from the analysis of a tensor by the asymptotic sum inequality; and all later bounds come from the analysis of a tensor by the laser method.
The Laser Method on a Simpler Example

Why is this called the "laser method"? The name (and the picture explaining it) comes from
V. Strassen. Algebra and Complexity. Proceedings of the First European Congress of Mathematics, pp. 429-446, 1994.

All the bounds from Coppersmith and Winograd (1987) onward are obtained from variants (improvements) of the laser method.
The first CW construction

Let q be a positive integer. Consider three vector spaces U, V and W of dimension q+1 over F:
U = span{$x_0, \ldots, x_q$}, V = span{$y_0, \ldots, y_q$}, W = span{$z_0, \ldots, z_q$}.

Coppersmith and Winograd (1987) introduced the following tensor over (U, V, W):

$T_{\mathrm{easy}} = T^{011}_{\mathrm{easy}} + T^{101}_{\mathrm{easy}} + T^{110}_{\mathrm{easy}}$, where

$T^{011}_{\mathrm{easy}} = \sum_{i=1}^{q} x_0 \otimes y_i \otimes z_i \cong \langle 1,1,q \rangle$  (1×1 matrix by 1×q matrix)
$T^{101}_{\mathrm{easy}} = \sum_{i=1}^{q} x_i \otimes y_0 \otimes z_i \cong \langle q,1,1 \rangle$  (q×1 matrix by 1×1 matrix)
$T^{110}_{\mathrm{easy}} = \sum_{i=1}^{q} x_i \otimes y_i \otimes z_0 \cong \langle 1,q,1 \rangle$  (1×q matrix by q×1 matrix)
Decompose
U = U_0 ⊕ U_1, where U_0 = span{$x_0$} and U_1 = span{$x_1, \ldots, x_q$}
V = V_0 ⊕ V_1, where V_0 = span{$y_0$} and V_1 = span{$y_1, \ldots, y_q$}
W = W_0 ⊕ W_1, where W_0 = span{$z_0$} and W_1 = span{$z_1, \ldots, z_q$}.

Then $T^{011}_{\mathrm{easy}}$ is a tensor over $(U_0, V_1, W_1)$, $T^{101}_{\mathrm{easy}}$ is a tensor over $(U_1, V_0, W_1)$, and $T^{110}_{\mathrm{easy}}$ is a tensor over $(U_1, V_1, W_0)$.

Note: $T_{\mathrm{easy}} = T^{011}_{\mathrm{easy}} + T^{101}_{\mathrm{easy}} + T^{110}_{\mathrm{easy}}$ is not a direct sum.
The first CW construction: analysis

$\underline{R}(T_{\mathrm{easy}}) \le q + 2$ (actually, $\underline{R}(T_{\mathrm{easy}}) = q + 2$).

Since the sum is not direct, we cannot use the asymptotic sum inequality directly. Consider instead

$T^{\otimes 2}_{\mathrm{easy}} = (T^{011}_{\mathrm{easy}} + T^{101}_{\mathrm{easy}} + T^{110}_{\mathrm{easy}}) \otimes (T^{011}_{\mathrm{easy}} + T^{101}_{\mathrm{easy}} + T^{110}_{\mathrm{easy}}) = T^{011}_{\mathrm{easy}} \otimes T^{011}_{\mathrm{easy}} + T^{011}_{\mathrm{easy}} \otimes T^{101}_{\mathrm{easy}} + \cdots + T^{110}_{\mathrm{easy}} \otimes T^{110}_{\mathrm{easy}}$  (9 terms)

and, more generally,

$T^{\otimes N}_{\mathrm{easy}} = T^{011}_{\mathrm{easy}} \otimes \cdots \otimes T^{011}_{\mathrm{easy}} + \cdots + T^{110}_{\mathrm{easy}} \otimes \cdots \otimes T^{110}_{\mathrm{easy}}$  ($3^N$ terms).

Coppersmith and Winograd showed how to select $\big(\tfrac{3}{2^{2/3}}\big)^N$ terms that do not share variables (i.e., form a direct sum) by zeroing variables (i.e., without increasing the rank).

Note: $\underline{R}(T^{\otimes N}_{\mathrm{easy}}) = (q+1)^{N+o(N)}$ would imply ω = 2.
Theorem [Coppersmith and Winograd 87]. By zeroing variables (i.e., without increasing the rank), the tensor $T^{\otimes N}_{\mathrm{easy}}$ can be converted into a direct sum of

$\exp\big(\big(H(\tfrac{1}{3}, \tfrac{2}{3}) - o(1)\big)N\big) = \Big(\frac{3}{2^{2/3}}\Big)^{(1-o(1))N}$

terms, each containing N/3 copies of $T^{011}_{\mathrm{easy}}$, N/3 copies of $T^{101}_{\mathrm{easy}}$ and N/3 copies of $T^{110}_{\mathrm{easy}}$. (Among the $3^N$ terms of the expansion, the extremes contain, e.g., N copies of $T^{011}_{\mathrm{easy}}$ and 0 copies of the other two.)

Here H is the entropy:

$H(\tfrac{1}{3}, \tfrac{2}{3}) = -\tfrac{1}{3}\log(\tfrac{1}{3}) - \tfrac{2}{3}\log(\tfrac{2}{3}) = \log\big(3^{1/3} \cdot (\tfrac{3}{2})^{2/3}\big) = \log\Big(\frac{3}{2^{2/3}}\Big).$

Each of the selected terms is isomorphic to $\big(T^{011}_{\mathrm{easy}}\big)^{\otimes N/3} \otimes \big(T^{101}_{\mathrm{easy}}\big)^{\otimes N/3} \otimes \big(T^{110}_{\mathrm{easy}}\big)^{\otimes N/3} \cong \langle q^{N/3}, q^{N/3}, q^{N/3} \rangle$.
Consequence (by the asymptotic sum inequality):

$\Big(\frac{3}{2^{2/3}}\Big)^{(1-o(1))N} \cdot q^{N\omega/3} \le \underline{R}(T^{\otimes N}_{\mathrm{easy}}) \le \underline{R}(T_{\mathrm{easy}})^N = (q+2)^N$

$\implies \frac{3}{2^{2/3}} \cdot q^{\omega/3} \le q + 2 \implies \omega \le 2.403\ldots$ for q = 8.
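The final inequality solves in closed form; a short check (function name ours) confirming that q = 8 is the best choice:

```python
import math

def cw_easy_bound(q):
    # Solve (3 / 2^(2/3)) * q^(rho/3) = q + 2 for rho.
    return 3 * math.log((q + 2) * 2 ** (2 / 3) / 3) / math.log(q)

best_q = min(range(2, 30), key=cw_easy_bound)
print(best_q, cw_easy_bound(best_q))   # q = 8 gives omega <= 2.4037...
```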
Idea behind the proof

Consider N = 2:

$T^{\otimes 2}_{\mathrm{easy}} = (T^{011}_{\mathrm{easy}} + T^{101}_{\mathrm{easy}} + T^{110}_{\mathrm{easy}})^{\otimes 2} = T^{011}_{\mathrm{easy}} \otimes T^{011}_{\mathrm{easy}} + T^{011}_{\mathrm{easy}} \otimes T^{101}_{\mathrm{easy}} + \cdots + T^{110}_{\mathrm{easy}} \otimes T^{110}_{\mathrm{easy}}$  (9 terms)

Label each term by the block indices of its three coordinates; the nine labels are
011011, 001111, 111100, 011110, 100111, 110011, 110110, 101101, 111001.

For instance,
$T^{011}_{\mathrm{easy}} \otimes T^{101}_{\mathrm{easy}} = \sum_{i,i'=1}^{q} (x_0 \otimes x_{i'}) \otimes (y_i \otimes y_0) \otimes (z_i \otimes z_{i'})$
is a tensor over $(U_0 \otimes U_1) \otimes (V_1 \otimes V_0) \otimes (W_1 \otimes W_1)$, with label 011011, while
$T^{011}_{\mathrm{easy}} \otimes T^{011}_{\mathrm{easy}} = \sum_{i,i'=1}^{q} (x_0 \otimes x_0) \otimes (y_i \otimes y_{i'}) \otimes (z_i \otimes z_{i'})$
is a tensor over $(U_0 \otimes U_0) \otimes (V_1 \otimes V_1) \otimes (W_1 \otimes W_1)$. These two terms SHARE VARIABLES (those of $W_1 \otimes W_1$).

We can remove the second term, e.g., by setting all variables in $V_1 \otimes V_1$ to zero (note: this removes more than one term). The first term is then isolated by setting all variables in $U_1 \otimes U_0$, $V_0 \otimes V_0$ and $W_0 \otimes W_1$ to zero.

Conclusion: we can convert $T^{\otimes 2}_{\mathrm{easy}}$ (a sum of 9 terms) into a direct sum of 2 terms.
Next step: for general N, consider

$T^{\otimes N}_{\mathrm{easy}} = T^{011}_{\mathrm{easy}} \otimes \cdots \otimes T^{011}_{\mathrm{easy}} + \cdots + T^{110}_{\mathrm{easy}} \otimes \cdots \otimes T^{110}_{\mathrm{easy}}$  ($3^N$ terms)

Each term now has a label of length 3N, made of three 0/1-strings of length N (one per coordinate); for instance, the two extreme terms above have labels 0···0 1···1 1···1 and 1···1 1···1 0···0. We can obtain labels of the form 0···1 1···0 0···1 in which each of the three strings has #0 = N/3 and #1 = 2N/3; the number of possibilities per string is

$\binom{N}{N/3,\, 2N/3} \approx \exp\big(H(\tfrac{1}{3}, \tfrac{2}{3}) N\big).$

Among these, one selects $\big(\tfrac{3}{2^{2/3}}\big)^{(1-o(1))N}$ terms that pairwise do not share any of the three coordinate strings. The proof of this theorem is based on a complicated construction using the existence of dense sets of integers with no three-term arithmetic progression.
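A quick numerical illustration of the counting step (entropy in nats; the code is ours):

```python
from math import comb, exp, log

H = -(1 / 3) * log(1 / 3) - (2 / 3) * log(2 / 3)   # = log(3 / 2^(2/3))

for N in (30, 90, 300):
    # Number of 0/1-strings of length N with N/3 zeros vs. the entropy estimate:
    print(N, comb(N, N // 3), exp(H * N))           # equal up to poly(N) factors
```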
General Formulation of the Laser Method and Reinterpretation
The laser method: general formulation

For any tensor T, any N ≥ 1 and any ρ ∈ [2,3], define $V_{\rho,N}(T)$ as the maximum of $\sum_{i=1}^{k} (m_i n_i p_i)^{\rho/3}$ over all restrictions of $T^{\otimes N}$ isomorphic to $\bigoplus_{i=1}^{k} \langle m_i, n_i, p_i \rangle$. The value of T is

$V_{\rho}(T) = \lim_{N \to \infty} V_{\rho,N}(T)^{1/N}.$

(This is the definition for symmetric tensors; otherwise we use $V_{\rho}(T) = V_{\rho}(T \otimes \sigma T \otimes \sigma^2 T)^{1/3}$, where σ cyclically rotates the three coordinates.)

Properties:
‣ $V_{\rho}(T)$ is an increasing function of ρ
‣ $V_{\rho}(T \oplus T') \ge V_{\rho}(T) + V_{\rho}(T')$ and $V_{\rho}(T \otimes T') \ge V_{\rho}(T) \times V_{\rho}(T')$
‣ $V_{\rho}(\langle m,n,p \rangle) = (mnp)^{\rho/3}$
Example: the first CW construction. The theorem above converts $T^{\otimes N}_{\mathrm{easy}}$ into a direct sum of $\big(\tfrac{3}{2^{2/3}}\big)^{(1-o(1))N}$ terms, each isomorphic to $\langle q^{N/3}, q^{N/3}, q^{N/3} \rangle$. Hence

$V_{\rho,N}(T_{\mathrm{easy}}) \ge \Big(\frac{3}{2^{2/3}}\Big)^{(1-o(1))N} \cdot q^{\rho N/3}$, and therefore $V_{\rho}(T_{\mathrm{easy}}) \ge \frac{3}{2^{2/3}} \cdot q^{\rho/3}.$
Theorem (simple generalization of the asymptotic sum inequality). $V_{\omega}(T) \le \underline{R}(T)$.
The laser method: general formulation (partitioned tensors)

Consider three vector spaces U, V and W over F, decomposed as

$U = \bigoplus_{i \in I} U_i$, $V = \bigoplus_{j \in J} V_j$, $W = \bigoplus_{k \in K} W_k$ for some $I, J, K \subseteq \mathbb{Z}$.

A tensor T over (U, V, W) is a partitioned tensor (with respect to this decomposition) if it can be written as

$T = \sum_{(i,j,k) \in I \times J \times K} T_{ijk}$, where $T_{ijk} \in U_i \otimes V_j \otimes W_k$ for each $(i,j,k) \in I \times J \times K$.

Support of the tensor: $\mathrm{supp}(T) = \{(i,j,k) \in I \times J \times K \mid T_{ijk} \ne 0\}$; each nonzero $T_{ijk}$ is called a component of T.

We say that the tensor is tight if there exists some integer d such that $i + j + k = d$ for all $(i,j,k) \in \mathrm{supp}(T)$.
Example: the first CW construction. With the decomposition U = U_0 ⊕ U_1, V = V_0 ⊕ V_1, W = W_0 ⊕ W_1 given earlier (so I = J = K = {0, 1}), the components of $T_{\mathrm{easy}}$ are $T^{011}_{\mathrm{easy}}$, $T^{101}_{\mathrm{easy}}$ and $T^{110}_{\mathrm{easy}}$, and

$\mathrm{supp}(T_{\mathrm{easy}}) = \{(0,1,1), (1,0,1), (1,1,0)\}.$

It is tight, since i + j + k = 2 for all $(i,j,k) \in \mathrm{supp}(T_{\mathrm{easy}})$.
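Tightness is mechanical to check; a tiny sketch (function name ours; the second support below is that of the tensor $T_{\mathrm{CW}}$ introduced later):

```python
def is_tight(support):
    # A partitioned tensor is tight if i + j + k is constant on its support.
    return len({i + j + k for i, j, k in support}) <= 1

print(is_tight({(0, 1, 1), (1, 0, 1), (1, 1, 0)}))                                   # True, d = 2
print(is_tight({(0, 1, 1), (1, 0, 1), (1, 1, 0), (0, 0, 2), (0, 2, 0), (2, 0, 0)}))  # True, d = 2
```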
Main Theorem [LG 14] (reinterpretation of prior works). For any tight partitioned tensor T, any probability distribution P over supp(T), and any ρ ∈ [2,3], we have

$\log(V_{\rho}(T)) \ge \sum_{\ell=1}^{3} \frac{H(P_\ell)}{3} + \sum_{(i,j,k) \in \mathrm{supp}(T)} P(i,j,k) \log(V_{\rho}(T_{ijk})) - \Gamma(P).$

H: entropy
$P_\ell$: projection of P along the ℓ-th coordinate (= marginal distribution)
$\Gamma(P)$: to be defined later (zero in the case of simple tensors)

Conclusion: we can compute a lower bound on the value of T if we know a lower bound on the value of each component. We can then obtain an upper bound on ω via $V_{\omega}(T) \le \underline{R}(T)$; concretely, we use $V_{\rho}(T) \ge \underline{R}(T) \implies \omega \le \rho$ and do a binary search on ρ.
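The binary search on ρ is straightforward; a sketch (names ours), shown with the lower bound on $V_{\rho}(T_{\mathrm{easy}})$ derived earlier and $\underline{R}(T_{\mathrm{easy}}) = q + 2$:

```python
def omega_upper_bound(value_lb, border_rank, tol=1e-9):
    # Smallest rho in [2,3] with value_lb(rho) >= border_rank; value_lb must be a
    # lower bound on V_rho(T) that is increasing in rho. Then omega <= rho.
    lo, hi = 2.0, 3.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (lo, mid) if value_lb(mid) >= border_rank else (mid, hi)
    return hi

q = 8
print(omega_upper_bound(lambda r: 3 / 2 ** (2 / 3) * q ** (r / 3), q + 2))  # 2.4037...
```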
Example: the first CW construction. We have $\mathrm{supp}(T_{\mathrm{easy}}) = \{(0,1,1), (1,0,1), (1,1,0)\}$ with

$V_{\rho}(T^{011}_{\mathrm{easy}}) = V_{\rho}(\langle 1,1,q \rangle) = q^{\rho/3}$, $V_{\rho}(T^{101}_{\mathrm{easy}}) = V_{\rho}(\langle q,1,1 \rangle) = q^{\rho/3}$, $V_{\rho}(T^{110}_{\mathrm{easy}}) = V_{\rho}(\langle 1,q,1 \rangle) = q^{\rho/3}$.

Take P(0,1,1) = P(1,0,1) = P(1,1,0) = 1/3. Then $P_1(0) = 1/3$, $P_1(1) = 2/3$ and $P_2 = P_3 = P_1$, and Γ(P) = 0. The Main Theorem gives

$\log(V_{\rho}(T_{\mathrm{easy}})) \ge H(\tfrac{1}{3}, \tfrac{2}{3}) + \tfrac{1}{3}\log(q^{\rho/3}) + \tfrac{1}{3}\log(q^{\rho/3}) + \tfrac{1}{3}\log(q^{\rho/3})$,

i.e., $V_{\rho}(T_{\mathrm{easy}}) \ge \frac{3}{2^{2/3}} \cdot q^{\rho/3}$, recovering the bound obtained above.
Interpretation: the laser method enables us to convert (by zeroing variables) $T^{\otimes N}$ into a direct sum of

$\exp\Big(\Big(\sum_{\ell=1}^{3} \frac{H(P_\ell)}{3} - \Gamma(P) - o(1)\Big)N\Big)$

terms, each isomorphic to $\bigotimes_{(i,j,k) \in \mathrm{supp}(T)} [T_{ijk}]^{\otimes P(i,j,k)N}$.
The second CW construction

Let q be a positive integer. Consider three vector spaces U, V and W of dimension q+2 over F:
U = span{$x_0, \ldots, x_q, x_{q+1}$}, V = span{$y_0, \ldots, y_q, y_{q+1}$}, W = span{$z_0, \ldots, z_q, z_{q+1}$}.

Coppersmith and Winograd (1987) considered the following tensor:

$T_{\mathrm{CW}} = T^{011}_{\mathrm{CW}} + T^{101}_{\mathrm{CW}} + T^{110}_{\mathrm{CW}} + T^{002}_{\mathrm{CW}} + T^{020}_{\mathrm{CW}} + T^{200}_{\mathrm{CW}} = T_{\mathrm{easy}} + T^{002}_{\mathrm{CW}} + T^{020}_{\mathrm{CW}} + T^{200}_{\mathrm{CW}}$,

where $T^{011}_{\mathrm{CW}} = T^{011}_{\mathrm{easy}}$, $T^{101}_{\mathrm{CW}} = T^{101}_{\mathrm{easy}}$, $T^{110}_{\mathrm{CW}} = T^{110}_{\mathrm{easy}}$, and

$T^{002}_{\mathrm{CW}} = x_0 \otimes y_0 \otimes z_{q+1} \cong \langle 1,1,1 \rangle$
$T^{020}_{\mathrm{CW}} = x_0 \otimes y_{q+1} \otimes z_0 \cong \langle 1,1,1 \rangle$
$T^{200}_{\mathrm{CW}} = x_{q+1} \otimes y_0 \otimes z_0 \cong \langle 1,1,1 \rangle.$

We have $\underline{R}(T_{\mathrm{CW}}) = q + 2$.
Decompose
U = U_0 ⊕ U_1 ⊕ U_2, where U_0 = span{$x_0$}, U_1 = span{$x_1, \ldots, x_q$} and U_2 = span{$x_{q+1}$}
V = V_0 ⊕ V_1 ⊕ V_2, where V_0 = span{$y_0$}, V_1 = span{$y_1, \ldots, y_q$} and V_2 = span{$y_{q+1}$}
W = W_0 ⊕ W_1 ⊕ W_2, where W_0 = span{$z_0$}, W_1 = span{$z_1, \ldots, z_q$} and W_2 = span{$z_{q+1}$}.

Then $T^{011}_{\mathrm{CW}}$ is a tensor over $(U_0, V_1, W_1)$, $T^{101}_{\mathrm{CW}}$ over $(U_1, V_0, W_1)$, $T^{110}_{\mathrm{CW}}$ over $(U_1, V_1, W_0)$, $T^{002}_{\mathrm{CW}}$ over $(U_0, V_0, W_2)$, $T^{020}_{\mathrm{CW}}$ over $(U_0, V_2, W_0)$, and $T^{200}_{\mathrm{CW}}$ over $(U_2, V_0, W_0)$.

Again, this is not a direct sum.
The second CW construction: laser method

$\mathrm{supp}(T_{\mathrm{CW}}) = \{(0,1,1), (1,0,1), (1,1,0), (0,0,2), (0,2,0), (2,0,0)\}$

$V_{\rho}(T^{011}_{\mathrm{CW}}) = V_{\rho}(T^{101}_{\mathrm{CW}}) = V_{\rho}(T^{110}_{\mathrm{CW}}) = q^{\rho/3}$ and $V_{\rho}(T^{002}_{\mathrm{CW}}) = V_{\rho}(T^{020}_{\mathrm{CW}}) = V_{\rho}(T^{200}_{\mathrm{CW}}) = 1.$

Take, with 0 ≤ α ≤ 1/3:
P(0,1,1) = P(1,0,1) = P(1,1,0) = α and P(0,0,2) = P(0,2,0) = P(2,0,0) = 1/3 − α.

Then $P_1(0) = \alpha + 2(1/3 - \alpha)$, $P_1(1) = 2\alpha$, $P_1(2) = 1/3 - \alpha$ (and $P_2 = P_3 = P_1$ by symmetry), and Γ(P) = 0. The Main Theorem gives

$\log(V_{\rho}(T_{\mathrm{CW}})) \ge H\big(\tfrac{2}{3} - \alpha,\ 2\alpha,\ \tfrac{1}{3} - \alpha\big) + \log(q^{\rho\alpha}).$

Combined with $V_{\omega}(T_{\mathrm{CW}}) \le \underline{R}(T_{\mathrm{CW}}) = q + 2$, this gives ω ≤ 2.38718... for q = 6 and α = 0.3173.
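A small numerical verification of these numbers (our own code, using scipy; logarithms are natural):

```python
from math import log
from scipy.optimize import minimize_scalar

def value_lb(rho, q):
    # max over alpha of H(2/3 - a, 2a, 1/3 - a) + rho * a * log(q):
    # a lower bound on log V_rho(T_CW) by the Main Theorem (Gamma(P) = 0 here).
    def neg_f(a):
        probs = (2 / 3 - a, 2 * a, 1 / 3 - a)
        return -(-sum(p * log(p) for p in probs if p > 0) + rho * a * log(q))
    return -minimize_scalar(neg_f, bounds=(1e-9, 1 / 3 - 1e-9), method="bounded").fun

q, lo, hi = 6, 2.0, 3.0
while hi - lo > 1e-9:           # binary search for the smallest feasible rho
    mid = (lo + hi) / 2
    lo, hi = (lo, mid) if value_lb(mid, q) >= log(q + 2) else (mid, hi)
print(hi)                       # about 2.38719, as on the slide (optimal alpha ~ 0.3173)
```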
Analysis of the second power

$T^{\otimes 2}_{\mathrm{CW}} = (T^{011}_{\mathrm{CW}} + T^{101}_{\mathrm{CW}} + T^{110}_{\mathrm{CW}} + T^{002}_{\mathrm{CW}} + T^{020}_{\mathrm{CW}} + T^{200}_{\mathrm{CW}})^{\otimes 2}$  ($6^2 = 36$ terms), with $\underline{R}(T^{\otimes 2}_{\mathrm{CW}}) \le (q+2)^2$.

Idea: rewrite it as a (non-direct) sum of 15 terms by regrouping (MERGING) terms:

$T^{\otimes 2}_{\mathrm{CW}} = T^{400} + T^{040} + T^{004} + T^{310} + T^{301} + T^{103} + T^{130} + T^{013} + T^{031} + T^{220} + T^{202} + T^{022} + T^{211} + T^{121} + T^{112}$,

where

$T^{400} = T^{200}_{\mathrm{CW}} \otimes T^{200}_{\mathrm{CW}}$
$T^{310} = T^{200}_{\mathrm{CW}} \otimes T^{110}_{\mathrm{CW}} + T^{110}_{\mathrm{CW}} \otimes T^{200}_{\mathrm{CW}}$
$T^{220} = T^{200}_{\mathrm{CW}} \otimes T^{020}_{\mathrm{CW}} + T^{020}_{\mathrm{CW}} \otimes T^{200}_{\mathrm{CW}} + T^{110}_{\mathrm{CW}} \otimes T^{110}_{\mathrm{CW}}$
$T^{211} = T^{200}_{\mathrm{CW}} \otimes T^{011}_{\mathrm{CW}} + T^{011}_{\mathrm{CW}} \otimes T^{200}_{\mathrm{CW}} + T^{110}_{\mathrm{CW}} \otimes T^{101}_{\mathrm{CW}} + T^{101}_{\mathrm{CW}} \otimes T^{110}_{\mathrm{CW}}$

and the other 11 terms are obtained by permuting the variables (e.g., $T^{040} = T^{020}_{\mathrm{CW}} \otimes T^{020}_{\mathrm{CW}}$).
$\mathrm{supp}(T^{\otimes 2}_{\mathrm{CW}}) = \{(4,0,0), \ldots, (0,0,4),\ (3,1,0), \ldots, (0,1,3),\ (2,2,0), \ldots, (0,2,2),\ (2,1,1), \ldots, (1,1,2)\}$
(3 permutations, 6 permutations, 3 permutations and 3 permutations, respectively).

Lower bounds on the values of each component can be computed (recursively). Choice of distribution (4 − 1 = 3 free parameters):

P(4,0,0) = ... = P(0,0,4) = α,  P(3,1,0) = ... = P(0,1,3) = β,
P(2,2,0) = ... = P(0,2,2) = γ,  P(2,1,1) = ... = P(1,1,2) = δ.

Here Γ(P) = 0. Applying the Main Theorem together with $V_{\omega}(T) \le \underline{R}(T)$ gives

ω ≤ 2.3755... for q = 6 and α = 0.00023, β = 0.0125, γ = 0.10254, δ = 0.2056.
What about the third power (using similar merging schemes)? It does not give any improvement.
Analysis of the fourth power

$T^{\otimes 4}_{\mathrm{CW}} = (T^{011}_{\mathrm{CW}} + T^{101}_{\mathrm{CW}} + T^{110}_{\mathrm{CW}} + T^{002}_{\mathrm{CW}} + T^{020}_{\mathrm{CW}} + T^{200}_{\mathrm{CW}})^{\otimes 4}$  ($6^4 = 1296$ terms), with $\underline{R}(T^{\otimes 4}_{\mathrm{CW}}) \le (q+2)^4$.

Idea: rewrite it as a (non-direct) sum of a smaller number of terms by regrouping terms:

$T^{\otimes 4}_{\mathrm{CW}} = T^{800} + T^{710} + T^{620} + T^{611} + T^{530} + T^{521} + T^{440} + T^{431} + T^{422} + T^{332} + \text{permutations of these terms}$

(the permutations include $T^{080}, T^{008}, T^{701}, T^{107}, T^{170}, T^{017}, T^{071}, \ldots$).

This gives 10 − 1 = 9 parameters for the probability distribution, and this time Γ(P) ≠ 0.
The laser method: the term Γ(P)

In the Main Theorem,

$\Gamma(P) = \max[H(Q)] - H(P)$,

where the max is over all distributions Q over supp(T) such that $Q_1 = P_1$, $Q_2 = P_2$ and $Q_3 = P_3$.

When the structure of the support is simple, we typically have

$Q_1 = P_1,\ Q_2 = P_2,\ Q_3 = P_3 \implies Q = P$,

and thus Γ(P) = 0.
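Γ(P) can be computed numerically: the maximum-entropy distribution with prescribed marginals is the limit of iterative proportional fitting. A sketch (the code is ours; entropy in nats):

```python
import numpy as np

def gamma(support, P, iters=3000):
    # Gamma(P) = max{H(Q) : Q on the support, same marginals as P} - H(P).
    support, P = list(support), np.asarray(P, float)
    M = [np.array([[1.0 if t[c] == v else 0.0 for t in support]
                   for v in sorted({s[c] for s in support})]) for c in range(3)]
    H = lambda x: -np.sum(np.where(x > 1e-15, x * np.log(np.maximum(x, 1e-15)), 0.0))
    Q = np.full(len(support), 1.0 / len(support))
    for _ in range(iters):      # iterative proportional fitting, one marginal at a time
        for m in M:
            Q = Q * (m.T @ ((m @ P) / np.maximum(m @ Q, 1e-300)))
    return H(Q) - H(P)

# For supp(T_CW) the marginals determine Q uniquely, so Gamma(P) = 0:
supp = [(0, 1, 1), (1, 0, 1), (1, 1, 0), (0, 0, 2), (0, 2, 0), (2, 0, 0)]
a = 0.3173
print(gamma(supp, [a] * 3 + [1 / 3 - a] * 3))   # ~0 up to numerical tolerance
```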
Interpretation of the penalty term: the terms kept in the conversion above are those of "type P", i.e., isomorphic to $\bigotimes_{(i,j,k) \in \mathrm{supp}(T)} [T_{ijk}]^{\otimes P(i,j,k)N}$. We can control only the choice of the marginal distributions $P_1$, $P_2$ and $P_3$: what we obtain is a (non-direct) sum of all "type Q" terms over the distributions Q with these marginals. The most frequent terms are those with Q maximizing H(Q); the fact that the "type P" terms are not necessarily the most frequent introduces the penalty term −Γ(P).
The laser method: computing the bound

How to find the best distribution P for a given ρ? Assume that (a lower bound on) each $V_{\rho}(T_{ijk})$ is known. In the bound of the Main Theorem, the entropy term $\sum_\ell H(P_\ell)/3$ is concave in P and the term $\sum P(i,j,k) \log(V_{\rho}(T_{ijk}))$ is linear in P. If Γ(P) = 0 for all distributions P, finding the best distribution is therefore the maximization of a concave function under linear constraints, which can be done efficiently (numerically) using convex optimization.
In general, however, $\Gamma(P) = \max[H(Q)] - H(P)$ (max over all distributions Q over supp(T) such that $Q_1 = P_1$, $Q_2 = P_2$ and $Q_3 = P_3$) has no such structure, and the resulting optimization problem is hard to solve; it could nevertheless be handled up to the 4th power of the CW tensor [Stothers 10].
Simplification: restrict the search to the set of distributions P such that Γ(P) = 0. The problem is still hard to solve, but could be handled up to the 8th power of the CW tensor [Vassilevska Williams 12].
The laser method: computing the bound efficiently

Call the right-hand side of the Main Theorem f(P):

$f(P) = \sum_{\ell=1}^{3} \frac{H(P_\ell)}{3} + \sum_{(i,j,k) \in \mathrm{supp}(T)} P(i,j,k) \log(V_{\rho}(T_{ijk})) - \Gamma(P).$

Efficient method to find a solution (close to the optimal solution) [LG 14]:

1. Find a distribution that maximizes f(P) (with the Γ term dropped, the objective is concave and the constraints are linear); call it P*.
2. Find the distribution that maximizes H(Q) under the constraints $Q_1 = P^*_1$, $Q_2 = P^*_2$ and $Q_3 = P^*_3$ (again a concave objective function under linear constraints); call it Q*.
3. Output f(Q*).

Since Γ(Q*) = 0, we have $\log(V_{\rho}(T)) \ge f(Q^*)$ from the theorem.
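A sketch of the whole procedure (names and solver choices ours: scipy's SLSQP for step 1, and the same iterative proportional fitting as above for step 2):

```python
import numpy as np
from scipy.optimize import minimize

def laser_bound(support, log_values, iters=3000):
    # support: list of triples (i,j,k); log_values[t]: lower bound on log V_rho(T_t).
    logV, n = np.asarray(log_values, float), len(support)
    M = [np.array([[1.0 if t[c] == v else 0.0 for t in support]
                   for v in sorted({s[c] for s in support})]) for c in range(3)]
    H = lambda x: -np.sum(np.where(x > 1e-15, x * np.log(np.maximum(x, 1e-15)), 0.0))
    g = lambda x: sum(H(m @ x) for m in M) / 3 + logV @ x   # f(P) with Gamma dropped
    # Step 1: maximize the concave function g over the probability simplex.
    P = minimize(lambda x: -g(x), np.full(n, 1.0 / n), bounds=[(0, 1)] * n,
                 constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1}],
                 method="SLSQP").x
    # Step 2: max-entropy Q with the marginals of P*, via iterative proportional fitting.
    Q = np.full(n, 1.0 / n)
    for _ in range(iters):
        for m in M:
            Q = Q * (m.T @ ((m @ P) / np.maximum(m @ Q, 1e-300)))
    return g(Q)   # Step 3: Gamma(Q*) = 0, so f(Q*) = g(Q*) lower-bounds log V_rho(T)

# Usage on T_CW with q = 6 at rho = 2.3872 (components' log-values as computed earlier):
rho, q = 2.3872, 6
supp = [(0, 1, 1), (1, 0, 1), (1, 1, 0), (0, 0, 2), (0, 2, 0), (2, 0, 0)]
logv = [rho / 3 * np.log(q)] * 3 + [0.0] * 3
print(laser_bound(supp, logv), np.log(q + 2))   # both ~2.079, so omega <= 2.3872 here
```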
Analysis of the powers 16 and 32: the solutions to the optimization problems were obtained numerically by convex optimization.
Conclusion

Laser-method-based analysis (v2.3): given any tight partitioned tensor for which (lower bounds on) the value of each component is known, it produces an upper bound on ω in polynomial time, via convex optimization.

We constructed a time-efficient implementation of the laser method and applied it to study higher powers of the basic tensor by CW (see the table of applications above).

Recent result [Ambainis, Filmus, LG 14]: studying higher powers (using the same approach) cannot give an upper bound better than 2.3725.