Block LU Factorization

  • 8/12/2019 Block Lu Factorization

    Block LU Factorization

    Lecture 24

    MA471 Fall 2003

    Example Case

    1) Suppose we are faced with the solution of a linear system Ax=b.

    2) Further suppose:

        1) A is large (dim(A) > 10,000)

        2) A is dense

        3) A is full

        4) We have a sequence of different right-hand side vectors b.

    Problems

    Suppose we are able to compute the matrix.

    It costs N^2 doubles to store the matrix.

    E.g. for N=100,000 we require about 80 gigabytes (8 x 10^10 bytes, or 74.5 GiB) of storage for the matrix alone.

    32 bit processors are limited to 4 gigabytes of memory.

    Most desktops (even 64 bit) do not have 80 gigabytes of memory.

    What to do?
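
    The storage figure can be checked with a few lines of arithmetic. This is a minimal sketch assuming 8-byte double-precision entries; the function name dense_matrix_bytes is made up for the example.

```python
def dense_matrix_bytes(n, bytes_per_entry=8):
    """Bytes needed for a dense n-by-n matrix (8-byte doubles assumed)."""
    return n * n * bytes_per_entry

n = 100_000
total = dense_matrix_bytes(n)
print(f"{total / 1e9:.1f} GB (10^9 bytes per GB)")
print(f"{total / 2**30:.1f} GiB (2^30 bytes per GiB)")

# Share per processor if the matrix is split evenly over 16 processors
print(f"{total / 16 / 2**30:.2f} GiB per processor on 16 processors")
```

    The last line shows the point of distributing the matrix: the per-processor share shrinks in proportion to the number of processors.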

    Divide and Conquer

    P0  P1  P2  P3
    P4  P5  P6  P7
    P8  P9  P10 P11
    P12 P13 P14 P15

    One approach is to assume we have a square number of processors.

    We then divide the matrix into blocks, storing one block per processor.
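
    As an illustration of this layout, the sketch below maps a matrix entry to the processor that stores it. It assumes ranks are numbered row by row as in the grid above and that the matrix dimension is divisible by the grid width; the names owner and local_block are hypothetical helpers, not part of the course code.

```python
import math

def owner(i, j, n, p):
    """Rank of the processor that stores entry (i, j) of an n x n matrix.

    Assumes p is a perfect square, ranks are numbered row by row, and
    n is divisible by sqrt(p) (a simplifying assumption of this sketch).
    """
    q = math.isqrt(p)
    if q * q != p:
        raise ValueError("p must be a square number of processors")
    if n % q != 0:
        raise ValueError("n must be divisible by sqrt(p) in this sketch")
    nb = n // q                      # rows (and columns) per block
    return (i // nb) * q + (j // nb)

def local_block(rank, n, p):
    """Half-open (row range, column range) of the block stored on `rank`."""
    q = math.isqrt(p)
    nb = n // q
    r, c = divmod(rank, q)
    return (r * nb, (r + 1) * nb), (c * nb, (c + 1) * nb)

print(owner(0, 0, 8, 16))      # rank holding the top-left entry
print(owner(7, 7, 8, 16))      # rank holding the bottom-right entry
print(local_block(6, 8, 16))   # index ranges of the block on P6
```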

    Back to the Linear System

    We are now faced with LU factorization of a distributed matrix.

    This calls for a modified LU routine which acts on blocks of the matrix.

    We will demonstrate this algorithm for one level.

    i.e. we need to construct matrices L, U such that A=LU and we only store single blocks of A, L, U on any processor.

    Constructing the Block LU Factorization

    [ A00 A01 A02 ]   [ L00  0   0 ]   [ U00 U01 U02 ]
    [ A10 A11 A12 ] = [ L10  1   0 ] * [  0  ?11 ?12 ]
    [ A20 A21 A22 ]   [ L20  0   1 ]   [  0  ?21 ?22 ]

    (Here 1 denotes an identity block and 0 a zero block.)

    First we LU factorize A00 and look for the above block factorization.

    However, we need to figure out what each of the entries is:

    A00 = L00*U00 (compute L00, U00 by LU factorization)

    A01 = L00*U01 => U01 = L00\A01

    A02 = L00*U02 => U02 = L00\A02

    A10 = L10*U00 => L10 = A10/U00

    A20 = L20*U00 => L20 = A20/U00

    A11 = L10*U01 + ?11 => ?11 = A11 - L10*U01

    ...

    cont

    A00 = L00*U00 (compute L00, U00 by LU factorization)

    A01 = L00*U01 => U01 = L00\A01

    A02 = L00*U02 => U02 = L00\A02

    A10 = L10*U00 => L10 = A10/U00

    A20 = L20*U00 => L20 = A20/U00

    A11 = L10*U01 + ?11 => ?11 = A11 - L10*U01

    A12 = L10*U02 + ?12 => ?12 = A12 - L10*U02

    A21 = L20*U01 + ?21 => ?21 = A21 - L20*U01

    A22 = L20*U02 + ?22 => ?22 = A22 - L20*U02

    In the general case (n,m>0):

    Anm = Ln0*U0m + ?nm => ?nm = Anm - Ln0*U0m

    Summary: First Stage

    [ A00 A01 A02 ]   [ L00  0   0 ]   [ U00 U01 U02 ]
    [ A10 A11 A12 ] = [ L10  1   0 ] * [  0  ?11 ?12 ]
    [ A20 A21 A22 ]   [ L20  0   1 ]   [  0  ?21 ?22 ]

    First step: LU factorize the uppermost diagonal block, A00 = L00*U00

    Second step: a) compute U0n = L00\A0n, n>0

                 b) compute Ln0 = An0/U00, n>0

    Third step: compute ?nm = Anm - Ln0*U0m, (n,m>0)
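
    The three steps of the first stage can be sketched in plain Python on a small matrix. This is a minimal sketch, not the course's blocklu.m: the helper names (lu_nopivot, left_solve, right_solve, first_stage) are made up for the example, the LU of A00 is done without pivoting, and the 6 x 6 diagonally dominant test matrix is an assumption chosen so that no pivoting is needed.

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def max_diff(X, Y):
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def lu_nopivot(A):
    """Doolittle LU without pivoting: A = L*U, L has a unit diagonal."""
    n = len(A)
    U = [row[:] for row in A]
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            U[i][k] = 0.0
            for j in range(k + 1, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U

def left_solve(L, B):
    """Return L\\B (solve L*X = B) for lower-triangular L."""
    X = [row[:] for row in B]
    for i in range(len(L)):
        for p in range(i):
            X[i] = [x - L[i][p] * xp for x, xp in zip(X[i], X[p])]
        X[i] = [x / L[i][i] for x in X[i]]
    return X

def right_solve(B, U):
    """Return B/U (solve X*U = B) for upper-triangular U."""
    X = [row[:] for row in B]
    for row in X:
        for j in range(len(U)):
            for p in range(j):
                row[j] -= row[p] * U[p][j]
            row[j] /= U[j][j]
    return X

def first_stage(A, nblk):
    """A maps (n, m) to a block; returns block column L, block row U, and the ? blocks."""
    L, U, S = {}, {}, {}
    L[0, 0], U[0, 0] = lu_nopivot(A[0, 0])                 # first step
    for n in range(1, nblk):
        U[0, n] = left_solve(L[0, 0], A[0, n])             # second step (a)
        L[n, 0] = right_solve(A[n, 0], U[0, 0])            # second step (b)
    for n in range(1, nblk):                               # third step
        for m in range(1, nblk):
            prod = matmul(L[n, 0], U[0, m])
            S[n, m] = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A[n, m], prod)]
    return L, U, S

# Illustrative 6 x 6 diagonally dominant matrix cut into a 3 x 3 grid of 2 x 2 blocks
N, nb = 6, 2
full = [[(N if i == j else 0) + 1.0 / (1 + abs(i - j)) for j in range(N)] for i in range(N)]
A = {(n, m): [row[m * nb:(m + 1) * nb] for row in full[n * nb:(n + 1) * nb]]
     for n in range(N // nb) for m in range(N // nb)}
L, U, S = first_stage(A, N // nb)
print("max |A00 - L00*U00| =", max_diff(matmul(L[0, 0], U[0, 0]), A[0, 0]))
print("max |A01 - L00*U01| =", max_diff(matmul(L[0, 0], U[0, 1]), A[0, 1]))
print("max |A20 - L20*U00| =", max_diff(matmul(L[2, 0], U[0, 0]), A[2, 0]))
```

    The dictionary S holds the ?nm blocks, which form the smaller matrix still to be factorized.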

    Now Factorize Lower SE Block

    [ ?11 ?12 ]   [ L11  0 ]   [ U11  U12 ]
    [ ?21 ?22 ] = [ L21  1 ] * [  0  ??22 ]

    We repeat the previous algorithm, this time on the two-by-two SE block.

    End Result

    [ A00 A01 A02 ]   [ L00  0   0  ]   [ U00 U01 U02 ]
    [ A10 A11 A12 ] = [ L10 L11  0  ] * [  0  U11 U12 ]
    [ A20 A21 A22 ]   [ L20 L21 L22 ]   [  0   0  U22 ]
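
    Repeating the stage on each trailing block gives the full factorization. The sketch below runs every stage in place on one matrix stored as a list of lists, with L (unit diagonal, not stored) below the diagonal and U on and above it. It is an illustration under stated assumptions, not the course's code: no pivoting is performed, so all pivots must be nonzero (the test matrix is diagonally dominant), and the names blocked_lu_inplace and unpack are made up for the example.

```python
def blocked_lu_inplace(A, nb):
    """Overwrite A with its block LU factors, using blocks of size nb (no pivoting)."""
    n = len(A)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        # Step 1: LU factorize the current diagonal block.
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                A[i][k] /= A[k][k]
                for j in range(k + 1, k1):
                    A[i][j] -= A[i][k] * A[k][j]
        # Step 2a: block row of U, U_km = L_kk \ A_km (forward substitution).
        for j in range(k1, n):
            for i in range(k0, k1):
                for p in range(k0, i):
                    A[i][j] -= A[i][p] * A[p][j]
        # Step 2b: block column of L, L_nk = A_nk / U_kk.
        for i in range(k1, n):
            for j in range(k0, k1):
                for p in range(k0, j):
                    A[i][j] -= A[i][p] * A[p][j]
                A[i][j] /= A[j][j]
        # Step 3: update the trailing blocks, ?nm = Anm - L_nk * U_km.
        for i in range(k1, n):
            for j in range(k1, n):
                for p in range(k0, k1):
                    A[i][j] -= A[i][p] * A[p][j]
    return A

def unpack(LU):
    """Split the packed result into explicit L (unit diagonal) and U."""
    n = len(LU)
    L = [[LU[i][j] if j < i else float(i == j) for j in range(n)] for i in range(n)]
    U = [[LU[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U

n = 6
A = [[(n if i == j else 0) + 1.0 / (1 + abs(i - j)) for j in range(n)] for i in range(n)]
L, U = unpack(blocked_lu_inplace([row[:] for row in A], nb=2))
err = max(abs(sum(L[i][k] * U[k][j] for k in range(n)) - A[i][j])
          for i in range(n) for j in range(n))
print("max |L*U - A| =", err)
```

    With nb equal to the matrix dimension the loop reduces to an ordinary unblocked LU, which is a convenient way to cross-check the blocked result.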

    Matlab Version

    Parallel Algorithm

    P0 P1 P2
    P3 P4 P5
    P6 P7 P8

    P0: A00 = L00*U00 (compute L00, U00 by LU factorization)

    P1: U01 = L00\A01

    P2: U02 = L00\A02

    P3: L10 = A10/U00

    P6: L20 = A20/U00

    P4: A11

    Parallel Communication

    L00,U00  U01  U02
    L10      A11  A12
    L20      A21  A22

    P0: L00,U00 = lu(A00)

    P1: U01 = L00\A01

    P2: U02 = L00\A02

    P3: L10 = A10/U00

    P6: L20 = A20/U00

    P4: A11

    Communication Summary

    P0: L00,U00 = lu(A00)

    P1: U01 = L00\A01

    P2: U02 = L00\A02

    P3: L10 = A10/U00

    P6: L20 = A20/U00

    P4: A11

    Upshot

    Notes:

    1) I added an MPI_Barrier purely to separate the LU factorization and the backsolve.

    2) In terms of efficiency we can see that quite a bit of time is spent in MPI_Wait compared to compute time.

    3) The compute part of this code can be optimized much further, which would make the parallel efficiency even worse.

    [Figure: Upshot trace of the run, with communication phases labeled a-h]

    1st stage:

    (a) P0: sends L00 to P1,P2; sends U00 to P3,P6

    (b) P1: sends U01 to P4,P7

    (c) P2: sends U02 to P5,P8

    (d) P3: sends L10 to P4,P5

    (e) P6: sends L20 to P7,P8

    2nd stage:

    (f) P4: sends L11 to P5; sends U11 to P7

    (g) P5: sends U12 to P8

    (h) P7: sends L21 to P8
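
    The send pattern of each stage follows from which processor owns which block. The sketch below lists the sends for stage k on a q x q grid, assuming ranks are numbered row by row and that each block stays on the processor that holds it in the grid. The function stage_sends is a hypothetical helper written for this illustration; it is not taken from parlufact2.c and it only lists messages, it does not call MPI.

```python
def stage_sends(k, q):
    """Return (sender, receivers, payload) tuples for stage k on a q x q grid."""
    def rank(r, c):
        return r * q + c

    later = range(k + 1, q)          # block rows/columns still to be updated
    sends = []
    if k + 1 < q:
        # The diagonal owner sends L_kk along its row and U_kk down its column.
        sends.append((rank(k, k), [rank(k, c) for c in later], f"L{k}{k}"))
        sends.append((rank(k, k), [rank(r, k) for r in later], f"U{k}{k}"))
    for m in later:
        # Owners in block row k send their U_km down their column.
        sends.append((rank(k, m), [rank(r, m) for r in later], f"U{k}{m}"))
    for n in later:
        # Owners in block column k send their L_nk along their row.
        sends.append((rank(n, k), [rank(n, c) for c in later], f"L{n}{k}"))
    return sends

for k in range(2):
    print(f"stage {k + 1}:")
    for sender, receivers, payload in stage_sends(k, 3):
        targets = ",".join(f"P{r}" for r in receivers)
        print(f"  P{sender} sends {payload} to {targets}")
```

    Each stage involves fewer processors than the one before, which is one reason the idle time in MPI_Wait grows relative to compute time.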

    Block Back Solve

    After factorization we are left with the task of using the distributed L and U to compute the backsolve.

    Block distribution of L and U:

    P0: L00,U00   P1: U01       P2: U02
    P3: L10       P4: L11,U11   P5: U12
    P6: L20       P7: L21       P8: L22,U22

    Recall

    Given an LU factorization of A, namely L, U such that A=LU,

    we can solve Ax=b by

    y=L\b

    x=U\y
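
    The two triangular solves can be sketched in plain Python. This is a minimal illustration: the function names forward_sub and backward_sub and the 3 x 3 factors are made up for the example. The loop reuses the same factors for two right-hand sides, which is the situation with a sequence of different b vectors.

```python
def forward_sub(L, b):
    """Solve L*y = b for lower-triangular L (y = L\\b)."""
    y = []
    for i, row in enumerate(L):
        s = sum(row[j] * y[j] for j in range(i))
        y.append((b[i] - s) / row[i])
    return y

def backward_sub(U, y):
    """Solve U*x = y for upper-triangular U (x = U\\y)."""
    n = len(U)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

# Illustrative factors; A is formed as L*U so the residual can be checked
L = [[1.0, 0.0, 0.0],
     [2.0, 1.0, 0.0],
     [-1.0, 3.0, 1.0]]
U = [[2.0, 1.0, -1.0],
     [0.0, 3.0, 2.0],
     [0.0, 0.0, 4.0]]
A = [[sum(L[i][k] * U[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

residuals = []
for b in ([1.0, 2.0, 3.0], [0.0, -1.0, 5.0]):
    x = backward_sub(U, forward_sub(L, b))
    r = max(abs(sum(A[i][j] * x[j] for j in range(3)) - b[i]) for i in range(3))
    residuals.append(r)
    print("x =", x, " max residual =", r)
```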

    Distributed Back Solve

    [ L00  0   0  ]   [ y0 ]   [ b0 ]
    [ L10 L11  0  ] * [ y1 ] = [ b1 ]
    [ L20 L21 L22 ]   [ y2 ]   [ b2 ]

    P0: solve L00*y0 = b0; send y0 to P3,P6

    P3: send L10*y0 to P4

    P4: solve L11*y1 = b1-L10*y0; send y1 to P7

    P6: send L20*y0 to P8

    P7: send L21*y1 to P8

    P8: solve L22*y2 = b2-L20*y0-L21*y1

    Results: y0 on P0, y1 on P4, y2 on P8

    P0 P1 P2
    P3 P4 P5
    P6 P7 P8
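
    The sequence of solves and sends can be traced serially. The sketch below follows the steps above one processor at a time, with the messages replaced by ordinary variables. It is an illustration only: there is no MPI, the 6 x 6 lower-triangular matrix and the vector b are made-up test data, and the helper names are assumptions of the example.

```python
def forward_sub(L, b):
    """Solve L*y = b for lower-triangular L."""
    y = []
    for i, row in enumerate(L):
        s = sum(row[j] * y[j] for j in range(i))
        y.append((b[i] - s) / row[i])
    return y

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

nb = 2
# Illustrative lower-triangular matrix, viewed as a 3 x 3 grid of 2 x 2 blocks Lnm
full_L = [[float(i + 1) if i == j else (0.5 if j < i else 0.0) for j in range(6)]
          for i in range(6)]

def blk(n, m):
    """Block Lnm of full_L."""
    return [row[m * nb:(m + 1) * nb] for row in full_L[n * nb:(n + 1) * nb]]

b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b0, b1, b2 = b[0:2], b[2:4], b[4:6]

y0 = forward_sub(blk(0, 0), b0)                      # P0; y0 goes to P3 and P6
t10 = matvec(blk(1, 0), y0)                          # P3; L10*y0 goes to P4
y1 = forward_sub(blk(1, 1), sub(b1, t10))            # P4; y1 goes to P7
t20 = matvec(blk(2, 0), y0)                          # P6; L20*y0 goes to P8
t21 = matvec(blk(2, 1), y1)                          # P7; L21*y1 goes to P8
y2 = forward_sub(blk(2, 2), sub(sub(b2, t20), t21))  # P8

y = y0 + y1 + y2
residual = max(abs(r - bi) for r, bi in zip(matvec(full_L, y), b))
print("y =", y)
print("max |L*y - b| =", residual)
```

    Each solve needs the result of the one before it, so only one diagonal processor is solving at any time.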

    Matlab Code

    Back Solve

    After the factorization we computed a solution to Ax=b.

    This consists of two distributed block triangular systems to solve: L*y=b, then U*x=y.

    Barrier Between Back Solves

    This time I inserted an MPI_Barrier call between the backsolves.

    This highlights the serial nature of the backsolves.

    Example Code

    http://www.math.unm.edu/~timwar/MA471F03/blocklu.m

    http://www.math.unm.edu/~timwar/MA471F03/parlufact2.c
