Approximate Neumann Series or Exact Matrix Inversion for Massive MIMO?

Oscar Gustafsson, Erik Bertilsson, Johannes Klasson, and Carl Ingemarsson

Slides: arith24.arithsymposium.org/slides/s10-gustafsson.pdf

Matrix Inversion for Massive MIMO Oscar Gustafsson July 25, 2017 1

Matrix Inversion in Massive MIMO

• $N$ terminals, $M$ antennas

• Channel matrix, $H \in \mathbb{C}^{M \times N}$

• Gram matrix, $X = H^H H \in \mathbb{C}^{N \times N}$, to be inverted for zero forcing (or MMSE)

• $X$: conjugate symmetric (Hermitian) and positive semi-definite

• $X$: with uncorrelated channels and $M \gg N$, diagonally dominant
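
The structure of the Gram matrix can be illustrated numerically. The following is a minimal sketch, assuming i.i.d. complex Gaussian channel entries with unit variance (a channel model the slides do not specify); the helper names `gram_matrix` and `offdiag_to_diag_ratio` are hypothetical. It builds $X = H^H H$ in plain Python and reports how small the off-diagonal entries are relative to the diagonal as $M$ grows for a fixed $N$.

```python
import random

def gram_matrix(M, N, rng):
    """Return X = H^H H for an M x N channel with i.i.d. CN(0, 1) entries (assumed model)."""
    s = 0.5 ** 0.5  # standard deviation of the real and imaginary parts
    H = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(N)]
         for _ in range(M)]
    # X[i][j] = sum over antennas m of conj(H[m][i]) * H[m][j]
    return [[sum(H[m][i].conjugate() * H[m][j] for m in range(M))
             for j in range(N)] for i in range(N)]

def offdiag_to_diag_ratio(X):
    """Mean off-diagonal magnitude divided by mean diagonal magnitude."""
    N = len(X)
    diag = sum(abs(X[i][i]) for i in range(N)) / N
    off = sum(abs(X[i][j]) for i in range(N) for j in range(N) if i != j)
    return off / (N * (N - 1)) / diag

rng = random.Random(0)
N = 8
for M in (16, 64, 256):
    X = gram_matrix(M, N, rng)
    print(M, round(offdiag_to_diag_ratio(X), 3))
```

Under this channel model the ratio is expected to shrink roughly as $1/\sqrt{M}$, which is the sense in which $X$ becomes more diagonally dominant when $M \gg N$.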

Matrix Inversion in Massive MIMO

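[Figure: frame structure of duration $T_{\text{frame}}$, with an uplink pilot ($P_{UL}$), uplink (UL) and downlink (DL) symbols, and guard intervals (G); segment lengths $N_{UL,1}$, $N_{UL,2}$, $N_{DL}$.]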

• One matrix inversion per frame

• Computed between reception of pilot and transmission of first downlink data

• Latency, not throughput

Algorithms for Matrix Inversion

• Exact algorithms
  • Numerical issues, especially in fixed-point, for close to singular (sub-)matrices
  • Division and/or square-roots
  • Cubic complexity

• LDLᵀ-decomposition
  • Lowest operation count
  • Reasonable fixed-point properties
  • No square-roots

Algorithms for Matrix Inversion

• Neumann series expansion

• Precondition matrix $A \approx X^{-1}$

\[
\hat{X}_K^{-1} = \left( \sum_{n=1}^{K} (I - AX)^{n-1} \right) A \tag{1}
\]

• “High parallelism”

• “Low complexity”

• “No division”

• “Numerically stable”
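
Equation (1) can be tried out directly. The following is a minimal sketch in plain Python, not the authors' implementation: the 3 × 3 matrix `X` is a hypothetical, strongly diagonally dominant example, the precondition matrix is chosen as the inverse of the diagonal of `X`, and the helpers `matmul`, `neumann_inverse` and `residual` are names introduced here for illustration.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def neumann_inverse(X, A, K):
    """Approximate X^{-1} by (sum_{n=1}^{K} (I - A X)^(n-1)) A, as in (1)."""
    n = len(X)
    AX = matmul(A, X)
    E = [[(1.0 if i == j else 0.0) - AX[i][j] for j in range(n)]
         for i in range(n)]
    term = identity(n)   # (I - AX)^0, the n = 1 term of the sum
    total = identity(n)  # running sum of the powers
    for _ in range(K - 1):
        term = matmul(term, E)
        total = [[total[i][j] + term[i][j] for j in range(n)]
                 for i in range(n)]
    return matmul(total, A)

def residual(X, Xinv):
    """Largest absolute entry of I - Xinv X (zero for an exact inverse)."""
    P = matmul(Xinv, X)
    n = len(X)
    return max(abs((1.0 if i == j else 0.0) - P[i][j])
               for i in range(n) for j in range(n))

# Hypothetical test matrix, chosen to be strongly diagonally dominant.
X = [[4.0, 0.4, 0.2],
     [0.4, 5.0, 0.3],
     [0.2, 0.3, 3.0]]
# Diagonal precondition matrix: a_ii = 1 / x_ii.
A = [[1.0 / X[i][i] if i == j else 0.0 for j in range(3)] for i in range(3)]

for K in (1, 2, 3, 4):
    print(K, residual(X, neumann_inverse(X, A, K)))
```

Since $I - \hat{X}_K^{-1} X = (I - AX)^K$, the residual printed for each $K$ is the largest entry of that power, so it shrinks quickly only when $I - AX$ is small, that is, when $A$ is already a good approximation of $X^{-1}$.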

Algorithms for Matrix Inversion

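Diagonal precondition matrix

\[
A = \begin{bmatrix}
a_{1,1} & 0 & \cdots & 0 \\
0 & a_{2,2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{N,N}
\end{bmatrix},
\qquad a_{i,i} = 1/x_{i,i}
\]

\[
I - AX = \begin{bmatrix}
0 & y_{1,2} & \cdots & y_{1,N} \\
y_{2,1} & 0 & \cdots & y_{2,N} \\
\vdots & \vdots & \ddots & \vdots \\
y_{N,1} & y_{N,2} & \cdots & 0
\end{bmatrix}
\]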
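
The zero diagonal of $I - AX$ follows directly from the choice $a_{i,i} = 1/x_{i,i}$. As a short worked step (the explicit expression for $y_{i,j}$ is derived here and is not stated on the slide):

```latex
\[
(AX)_{i,j} = a_{i,i}\, x_{i,j} = \frac{x_{i,j}}{x_{i,i}}
\quad\Longrightarrow\quad
y_{i,j} = \delta_{i,j} - \frac{x_{i,j}}{x_{i,i}} =
\begin{cases}
0, & i = j, \\[4pt]
-\dfrac{x_{i,j}}{x_{i,i}}, & i \neq j.
\end{cases}
\]
```

Forming $A$ therefore needs $N$ reciprocals, and each off-diagonal $y_{i,j}$ costs one multiplication.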

Algorithms for Matrix Inversion

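Tri-diagonal precondition matrix

\[
A = \begin{bmatrix}
a_{1,1} & a_{1,2} & 0 & \cdots & 0 \\
a_{2,1} & a_{2,2} & a_{2,3} & \cdots & 0 \\
0 & a_{3,2} & a_{3,3} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_{N,N}
\end{bmatrix}
\]

Sequential computation of $A$

Generic $I - AX$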

Algorithms for Matrix Inversion

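Diagonal + column precondition matrix

\[
A = \begin{bmatrix}
a_{1,1} & 0 & \cdots & 0 \\
a_{2,1} & a_{2,2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
a_{N,1} & 0 & \cdots & a_{N,N}
\end{bmatrix}
\]

\[
I - AX = \begin{bmatrix}
0 & y_{1,2} & \cdots & y_{1,N} \\
0 & y_{2,2} & \cdots & y_{2,N} \\
\vdots & \vdots & \ddots & \vdots \\
0 & y_{N,2} & \cdots & y_{N,N}
\end{bmatrix}
\]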
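
The slide does not give the entries of $A$ for this case. The following worked step only derives what the structure shown above implies: for the first column of $I - AX$ to vanish, the first column of $AX$ must equal the first unit vector. The final line additionally assumes $a_{i,i} = 1/x_{i,i}$, as in the diagonal case; that choice is an assumption made here, not something the slide confirms.

```latex
\[
(AX)_{1,1} = a_{1,1}\, x_{1,1} = 1
\quad\Longrightarrow\quad a_{1,1} = \frac{1}{x_{1,1}},
\]
\[
(AX)_{i,1} = a_{i,1}\, x_{1,1} + a_{i,i}\, x_{i,1} = 0
\quad\Longrightarrow\quad a_{i,1} = -\frac{a_{i,i}\, x_{i,1}}{x_{1,1}},
\qquad i = 2, \dots, N.
\]
% Assumption: a_{i,i} = 1/x_{i,i}; uses x_{1,i} = \overline{x_{i,1}} (X Hermitian).
\[
a_{i,1} = -\frac{x_{i,1}}{x_{1,1}\, x_{i,i}},
\qquad
y_{i,i} = 1 - \left( a_{i,1}\, x_{1,i} + a_{i,i}\, x_{i,i} \right)
        = \frac{|x_{i,1}|^2}{x_{1,1}\, x_{i,i}},
\qquad i = 2, \dots, N.
\]
```

Under that assumption the diagonal entries $y_{i,i}$ for $i \geq 2$ are generally non-zero, which agrees with the $y_{2,2}, \dots, y_{N,N}$ entries in the matrix above.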

Computational Complexity

• The latency (time to obtain the result) of an algorithm depends on two aspects:
  • Total number of operations → latency scales with number of processing elements (PEs)
  • Number of sequential operations → latency does not scale with number of PEs

• Pipelining of the PEs
  • Increases clock frequency
  • Increases latency

Computational Complexity Example

[Figure: 4 × 4 exact matrix inversion based on LDLᵀ; the diagram did not survive text extraction.]

How Many Cycles?

• Assume multiply-and-add (MAD) operations

• Reciprocals performed using Newton-Raphson → a number of sequential MAD operations

• Sum-of-products computed using sequential MADs

• $O$ operations, each with $P$ pipeline stages, implemented on $Q$ processing elements (PEs) require

\[
C_{\text{alg}} \geq \max\left\{ \left\lceil \frac{O}{Q} \right\rceil + P - 1,\; P\, C_{\text{latency}} \right\} \text{ cycles.} \tag{2}
\]
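
The bound in (2) is easy to evaluate. The following is a minimal sketch, assuming that $C_{\text{latency}}$ denotes the number of sequential operations on the critical path (the slide does not define it explicitly); the function name `min_cycles` and the example inputs are hypothetical.

```python
def min_cycles(total_ops, pipeline_stages, num_pes, critical_path_ops):
    """Lower bound (2): max(ceil(O / Q) + P - 1, P * C_latency).

    critical_path_ops is interpreted as C_latency, the number of
    sequential operations (an assumption about the slide's notation).
    """
    # Integer ceiling division avoids floating-point rounding.
    throughput_bound = -(-total_ops // num_pes) + pipeline_stages - 1
    latency_bound = pipeline_stages * critical_path_ops
    return max(throughput_bound, latency_bound)

# Hypothetical inputs: 48 operations in total, 24 of them sequential.
for pes in (1, 2, 4):
    for stages in (1, 2, 3):
        print(f"Q={pes} P={stages} -> at least {min_cycles(48, stages, pes, 24)} cycles")
```

The first term falls as PEs are added, while the second grows with the pipeline depth and is unaffected by the number of PEs, so adding PEs stops helping once the sequential part dominates.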

Algorithm Comparison – Complexity

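Method                      MADs                                                       Reciprocals

Exact method
  LDLᵀ+EQU                  $\tfrac{1}{2}N^3 + \tfrac{1}{2}N^2 - N$                    $N$

Neumann series
  Diagonal, K = 2           $N^2 - N$                                                  $N$
  Diagonal, K = 3           $\tfrac{1}{2}N^3 + N^2 - \tfrac{1}{2}N$                    $N$
  Tri-diagonals, K = 2      $3N^2 + 7N - 10$                                           $2N - 1$
  Tri-diagonals, K = 3      $\tfrac{1}{2}N^3 + 6N^2 + \tfrac{1}{2}N - 2$               $2N - 1$
  Diag. + column, K = 2     $\tfrac{3}{2}N^2 + \tfrac{5}{2}N - 4$                      $N$
  Diag. + column, K = 3     $\tfrac{1}{2}N^3 + \tfrac{5}{2}N^2 - 2N - 1$               $N$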
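
The operation counts in the table can be evaluated for concrete matrix sizes. The following is a minimal sketch: the dictionary `COUNTS` and the helper `single_pe_cycles` are names introduced here, and the cost of three sequential MADs per reciprocal is taken from the results slide of this deck. With one PE and no pipelining, the cycle count is assumed to be simply the total number of MADs.

```python
# Operation counts from the complexity table: name -> (MADs, reciprocals).
# Each polynomial is written over a common denominator so that it stays integer.
COUNTS = {
    "LDL^T + EQU (exact)": lambda N: ((N**3 + N**2 - 2 * N) // 2, N),
    "Diagonal, K=2":       lambda N: (N**2 - N, N),
    "Diagonal, K=3":       lambda N: ((N**3 + 2 * N**2 - N) // 2, N),
    "Tri-diagonal, K=2":   lambda N: (3 * N**2 + 7 * N - 10, 2 * N - 1),
    "Tri-diagonal, K=3":   lambda N: ((N**3 + 12 * N**2 + N - 4) // 2, 2 * N - 1),
    "Diag. + column, K=2": lambda N: ((3 * N**2 + 5 * N - 8) // 2, N),
    "Diag. + column, K=3": lambda N: ((N**3 + 5 * N**2 - 4 * N - 2) // 2, N),
}

def single_pe_cycles(method, N, reciprocal_mads=3):
    """Total MAD-equivalent operations, counting each reciprocal as reciprocal_mads MADs."""
    mads, reciprocals = COUNTS[method](N)
    return mads + reciprocal_mads * reciprocals

for name in COUNTS:
    print(f"{name:22s}", [single_pe_cycles(name, N) for N in (4, 8, 16)])
```

For the exact method this gives 48 operations at $N = 4$ and 304 at $N = 8$, matching the single-PE figures quoted elsewhere in the slides, and every $K = 3$ Neumann variant needs more MADs than the exact method.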

Algorithm Comparison – Latency

Method                      MADs          Reciprocals

Exact method
  LDLᵀ+EQU                  $4N - 4$      $N$

Neumann series
  Diagonal, K = 2           $2$           $1$
  Diagonal, K = 3           $N + 1$       $1$
  Tri-diagonals, K = 2      $2N + 5$      $N$
  Tri-diagonals, K = 3      $3N + 5$      $N$
  Diag. + column, K = 2     $N + 2$       $1$
  Diag. + column, K = 3     $2N + 1$      $1$
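
The sequential counts above set a floor on latency that extra PEs cannot remove. The following is a minimal sketch, assuming that sequential reciprocals and sequential MADs lie on the same critical path and that each reciprocal costs three sequential MADs (the figure used on the results slide); `SEQUENTIAL` and `critical_path` are hypothetical names.

```python
# Sequential operation counts from the latency table: name -> (MADs, reciprocals).
SEQUENTIAL = {
    "LDL^T + EQU (exact)": lambda N: (4 * N - 4, N),
    "Diagonal, K=2":       lambda N: (2, 1),
    "Diagonal, K=3":       lambda N: (N + 1, 1),
    "Tri-diagonal, K=2":   lambda N: (2 * N + 5, N),
    "Tri-diagonal, K=3":   lambda N: (3 * N + 5, N),
    "Diag. + column, K=2": lambda N: (N + 2, 1),
    "Diag. + column, K=3": lambda N: (2 * N + 1, 1),
}

def critical_path(method, N, reciprocal_mads=3):
    """Sequential MAD-equivalent operations (assumes reciprocals add to the MAD chain)."""
    mads, reciprocals = SEQUENTIAL[method](N)
    return mads + reciprocal_mads * reciprocals

for name in SEQUENTIAL:
    print(f"{name:22s}", [critical_path(name, N) for N in (4, 8, 16)])
```

For the exact method at $N = 4$ this gives 24 sequential operations, which is consistent with the 25-cycle latency reported for four PEs on the results slide.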

Results

Bit-error rate for the four approaches, $N = 20$, $M = 120$

[Figure: bit-error rate on a logarithmic axis ($10^{-8}$ to $10^{0}$), horizontal axis from 0 to 5; curves: Diagonal, Column Diagonal, Tridiagonal, LDL.]

Results

Reciprocal ⇒ three sequential MAD operations

4 × 4 matrix

#PE    Latency (cycles)
1      48
2      29
3      26
4      25

[Figure: four panels, one per #PE, each plotting #Operations against Cycle.]

Results – 16 × 16

Solid: actual result, dashed: from equation

[Figure: cycles (logarithmic axis, $10^2$ to $10^4$) versus processing elements (ticks at 5, 10, 15); curves: Tri-diagonal, Col. + Diag., Diagonal, Exact.]

Results – 8 × 8

Solid: actual result, dashed: from equation

[Figure: cycles (logarithmic axis, $10^1$ to $10^3$) versus processing elements (ticks at 5, 10, 15); curves: Col. + Diag., Diagonal, Exact.]

Results

With $P = 1, 2, 3, 4$ levels of pipelining, 4 × 4 matrix

P      Latency (cycles)
1      48
2      57
3      77
4      98

[Figure: four panels, one per value of P, each plotting #Operations against Cycle.]

Results – 16 × 16

Time in single cycle latency operations, assuming pipelining increases speed linearly

Solid: $P = 1$, dashed: $P = 2$, dash-dotted: $P = 3$

[Figure: time (logarithmic axis, $10^2$ to $10^3$) versus processing elements (1 to 4); curves: Col. + Diag., Diagonal, Exact.]

Results – 8 × 8

Time in single cycle latency operations, assuming pipelining increases speed linearly

Solid: $P = 1$, dashed: $P = 2$, dash-dotted: $P = 3$

[Figure: time (logarithmic axis, $10^1$ to $10^2$) versus processing elements (1 to 4); curves: Col. + Diag., Diagonal, Exact.]

Design Example

• Assume a latency requirement of 0.05 ms (10% of an LTE-like frame with 2 UL and 2 DL symbols)

• For $N = 8$ and one PE, 304 cycles are required for the exact algorithm

• One PE operating at $f_{\text{clk}} = 6.08$ MHz

• $N = 30 \Rightarrow f_{\text{clk}} \approx 280$ MHz

• 2 kInv/s (one inversion per 0.5 ms frame), idle 90% of the time
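
The clock frequencies in this example can be reproduced from the operation counts. The following is a minimal sketch, assuming the exact method's MAD count $\tfrac{1}{2}N^3 + \tfrac{1}{2}N^2 - N$ with $N$ reciprocals from the complexity table, three MADs per reciprocal, and one MAD per cycle on a single unpipelined PE; `exact_cycles` and `required_clock_hz` are hypothetical names.

```python
def exact_cycles(N, reciprocal_mads=3):
    """Single-PE cycle count for the exact LDL^T-based inversion (one MAD per cycle assumed)."""
    mads = (N**3 + N**2 - 2 * N) // 2   # N^3/2 + N^2/2 - N, always an integer
    return mads + reciprocal_mads * N   # N reciprocals, each costing reciprocal_mads MADs

def required_clock_hz(N, latency_s):
    """Clock frequency needed to finish one inversion within latency_s seconds."""
    return exact_cycles(N) / latency_s

LATENCY_S = 0.05e-3  # 0.05 ms latency requirement from the slide

for N in (8, 30):
    f = required_clock_hz(N, LATENCY_S)
    print(f"N={N}: {exact_cycles(N)} cycles, f_clk = {f / 1e6:.2f} MHz")
```

With a 0.05 ms budget that is 10% of the frame, the frame lasts 0.5 ms, which is where the figure of 2000 inversions per second with the PE idle for the remaining 90% comes from.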

Is Neumann useful at all?

• If fewer than three terms are used, the complexity may be lower

• Only compute parts of the third iteration

• Allow increasing the number of terminals further

• But numerically most efficient when the ratio between the number of antennas and the number of terminals is high

• May give a better result with singular or close to singular matrices (the result is not correct, but maybe not as bad as that of an exact algorithm)

• (Really) large matrices

Conclusions

• Latency, not throughput

• Complexity for Neumann series with $K = 3$ higher than best exact algorithm

• Few terms for Neumann when diagonally dominant
  • Diagonally dominant ⇒ well conditioned ⇒ exact algorithm behaves well
  • Few terminals ⇒ more diagonally dominant ⇒ fewer Neumann terms (but also less complexity for exact algorithm)

• With few PEs compared to matrix size, the limited parallelism of the exact algorithm is no problem

• Required latency/parallelism determined by frame structure

Thank you! Questions?

www.liu.se