HSDSL, Technion Spring 2014 Preliminary Design Review Matrix Multiplication on FPGA Project No. : 1998 Project B 044169 By : Zaid Abassi Supervisor : Rolf Hilgendorf April 2, 2014


HSDSL, Technion
Spring 2014

Preliminary Design Review

Matrix Multiplication on FPGA
Project No.: 1998

Project B 044169
By: Zaid Abassi
Supervisor: Rolf Hilgendorf

April 2, 2014


Background and Motivation:

1. Matrix multiplication carried out naively is expensive: multiplying two n×n matrices takes on the order of n³ scalar multiplications. Hence the need for research into an efficient matrix multiplication algorithm with a parallel approach.


2. In application-specific designs (in this case, matrix multiplication), as opposed to broader architectural designs, the order and magnitude of the operations are known at design time, which offers the potential to save overhead that would otherwise be incurred.


3. Matrix multiplication is an elementary building block of more advanced Linear Algebra Core operations on matrices, such as matrix inversion and linear transformations, so the need for efficient matrix multiplication is all the greater.


4. Over the years, the complexity of matrix multiplication in software has improved with specialized data structures, and we aim to research approaches inspired by them for an FPGA implementation.


Our Goal

To develop a matrix multiplication algorithm tailored to FPGA that maximizes efficiency through a parallel design while reducing power consumption as much as possible.


The System Top Level View


Processing Entity (PE)


PE unit


PE unit

• The controller for each PE is an FSM that regulates the PE's operations: storage, computation and communication (broadcasting).

• The controller needs to manage synchronized PE operations autonomously, using handshakes and global communication, and relying on implicit synchronization between all PEs.
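As a rough illustration of such a controller, the sketch below models a PE's FSM in Python. The state names, their order, and the `load_done` and `operands_valid` handshake signals are assumptions made for the example; the slides name only the three kinds of operation (storage, computation, broadcasting).

```python
from enum import Enum, auto


class PEState(Enum):
    STORE = auto()      # load matrix entries into the PE's local memory
    BROADCAST = auto()  # exchange operands with the PEs in the same row/column
    COMPUTE = auto()    # multiply-accumulate on the received operands
    DONE = auto()       # all accumulation steps finished


class PEController:
    """Hypothetical per-PE controller; not the project's actual design."""

    def __init__(self, n_steps):
        self.n_steps = n_steps      # number of multiply-accumulate steps to run
        self.steps_done = 0
        self.state = PEState.STORE

    def tick(self, load_done=False, operands_valid=False):
        """Advance the FSM by one clock tick and return the new state."""
        if self.state is PEState.STORE:
            if load_done:
                self.state = PEState.BROADCAST
        elif self.state is PEState.BROADCAST:
            # Handshake: stay here until both operands have arrived.
            if operands_valid:
                self.state = PEState.COMPUTE
        elif self.state is PEState.COMPUTE:
            self.steps_done += 1
            if self.steps_done == self.n_steps:
                self.state = PEState.DONE
            else:
                self.state = PEState.BROADCAST
        return self.state
```

Waiting in `BROADCAST` until `operands_valid` is asserted is one simple way to express the handshake: a PE never computes on operands that have not yet arrived.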


PE unit

• Each PE is equipped with its own local memory, which stores entries of the matrices being multiplied when the computation starts and holds the values that are broadcast along the PE's row and column.
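To make the row and column broadcasting concrete, here is a small software model of an n×n grid of PEs. It assumes a rank-1-update scheme in which PE(i, j) holds A[i][j], B[i][j] and an accumulator for C[i][j]; the slides do not spell out the exact dataflow, so this mapping and the function name `grid_multiply` are illustrative only.

```python
def grid_multiply(A, B):
    """Simulate an n x n PE grid computing C = A * B by broadcasts.

    Assumed scheme: at step k, PE(i, k) broadcasts A[i][k] along row i,
    PE(k, j) broadcasts B[k][j] along column j, and every PE(i, j)
    multiplies the two values it receives and accumulates the product.
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]  # one accumulator per PE
    for k in range(n):
        row_bus = [A[i][k] for i in range(n)]  # value visible on each row bus
        col_bus = [B[k][j] for j in range(n)]  # value visible on each column bus
        # In hardware, all n*n multiply-accumulates of a step run in parallel.
        for i in range(n):
            for j in range(n):
                C[i][j] += row_bus[i] * col_bus[j]
    return C
```

For example, `grid_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`. Under this model the grid needs n broadcast steps rather than n³ sequential multiplications.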


Handling Larger Matrices

• To handle larger matrices, we break the input matrices down into a sequence of smaller updates using hierarchical blocking of the input matrices. Each update in the hierarchy is called a “loop”.

• There is no loop-carried dependency, so we aim to pipeline the outer loop, overlapping the current cycle's computation with the previous cycle's write-back and the next cycle's prefetching of matrices.
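The two ideas above can be sketched in software as follows. `blocked_multiply` shows a single level of blocking (the block size `bs` stands in for the size of the PE grid), and `pipeline_schedule` lists which update is being prefetched, computed and written back in each time slot. Both function names and the three-stage split are assumptions for illustration, not the project's implementation.

```python
def blocked_multiply(A, B, bs):
    """Multiply square matrices as a sequence of bs x bs block updates."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i0 in range(0, n, bs):          # block row of C
        for j0 in range(0, n, bs):      # block column of C
            for k0 in range(0, n, bs):  # one smaller update of this C block
                for i in range(i0, min(i0 + bs, n)):
                    for j in range(j0, min(j0 + bs, n)):
                        acc = C[i][j]
                        for k in range(k0, min(k0 + bs, n)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C


def pipeline_schedule(n_updates):
    """Return one (prefetch, compute, write_back) tuple per time slot.

    Each entry is the index of the update occupying that stage, or None
    when the stage is idle (pipeline fill and drain).
    """
    def valid(u):
        return u if 0 <= u < n_updates else None

    return [(valid(t), valid(t - 1), valid(t - 2))
            for t in range(n_updates + 2)]
```

With three updates, `pipeline_schedule(3)` gives `(2, 1, 0)` for the third slot: update 1 is being computed while update 0 is written back and update 2 is prefetched, which is the overlap described above.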


A Problem With Larger Matrices

• Moving data in and out of the computational grid independently for each block in the hierarchy can be expensive, so we need to amortize this cost.
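A back-of-the-envelope model shows why amortization helps. The sketch below counts multiply-accumulates per word moved when a `block` × `block` piece of the result stays in the grid for `depth` accumulation steps. The cost model (one word per matrix entry, the result block read once and written once) and the name `macs_per_word` are assumptions, not figures from the project.

```python
def macs_per_word(block, depth):
    """Multiply-accumulates performed per word moved in or out of the grid.

    Assumed model: a block x block piece of C stays resident while
    `depth` columns of A and `depth` rows of B stream through the grid.
    """
    macs = block * block * depth
    words_in = 2 * block * depth      # block x depth of A plus depth x block of B
    words_c = 2 * block * block       # C block read once and written once
    return macs / (words_in + words_c)
```

In this model `macs_per_word(4, 4)` is 1.0, while keeping the same block resident for 64 steps raises the ratio to about 1.88; as `depth` grows the ratio approaches `block / 2`, so larger resident blocks and longer accumulation both spread the transfer cost over more computation.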