
MaPU: A Novel Mathematical Computing Architecture

Shashank Kedia & Robert Macy III


Why MaPU?

● High-performance CPUs and GPUs have good theoretical performance but low power efficiency relative to that performance
● Superscalar and GPGPU architectures have been shown to be power-inefficient
● Most systems operate at only 60% of their peak performance
● Supercomputers using thousands of processors have massive power and space requirements
● Goal: develop a chip that performs mathematical calculations with a better performance-to-power ratio than GPUs and CPUs


Architecture overview

Three main components:

● Scalar Pipeline: communicates with the system on chip and controls the microcode pipeline.
● Microcode Pipeline: consists of functional units (FUs) that define the data flow.
● Multi-Granularity Parallel Memory System (MGP): allows efficient custom data access patterns.


Architecture Details: MGP Memory System

MGP allows efficient data access patterns.

The memory system is described by three parameters: W, the number of bytes that can be accessed in parallel; N, the total capacity in bytes; and G, the access granularity (the number of contiguous bytes read or written in each logic bank). Choosing G partitions the memory to match the desired access pattern.

Physical banks combine to form logic banks: each logic bank consists of G physical banks.
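A minimal Python sketch of the partitioning just described. It assumes, beyond what this slide states, that there are W physical banks, each one byte wide and holding N/W bytes, and that W and G are powers of two with G ≤ W. The function name `mgp_partition` is hypothetical; it only lists which physical banks form each logic bank and does not model MaPU's actual address mapping.

```python
def is_power_of_two(x: int) -> bool:
    return x > 0 and x & (x - 1) == 0


def mgp_partition(W: int, N: int, G: int):
    """Group W physical banks into W // G logic banks of G banks each.

    Assumption (not from the slides): W one-byte-wide physical banks,
    each holding N // W bytes.
    Returns (logic_banks, bytes_per_physical_bank).
    """
    if not (is_power_of_two(W) and is_power_of_two(G) and G <= W):
        raise ValueError("W and G must be powers of two with G <= W")
    if N % W:
        raise ValueError("N must be a multiple of W")
    # Logic bank k is made of the adjacent physical banks k*G .. k*G + G - 1.
    logic_banks = [list(range(k * G, (k + 1) * G)) for k in range(W // G)]
    return logic_banks, N // W


for g in (1, 4, 8):
    banks, depth = mgp_partition(W=8, N=1024, G=g)
    print(f"G={g}: {len(banks)} logic bank(s), each {g} byte(s) wide, {depth} rows deep")
```

With G = W there is a single logic bank spanning all W bytes; with G = 1 every physical bank is its own logic bank.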


Architecture Details: MGP Memory System (matrix accesses)

Matrices can be accessed in row or column order.

Matrix access in MGP requires storing the i-th row in logic bank (i mod W).

Rows can be accessed by setting G = W and columns by setting G = 1.
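Why the (i mod W) placement helps column access can be checked with a small model. The Python sketch below is an illustration under assumptions, not MaPU's actual address mapping: it covers only the column case (G = 1, so W logic banks), assumes each logic bank delivers one element per cycle, and counts a read of one element from each of several rows as taking as many cycles as the busiest bank has requests. The names `access_cycles`, `interleaved`, and `single_bank` are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable

W = 8


def access_cycles(rows: Iterable[int], bank_of_row: Callable[[int], int]) -> int:
    """Cycles to fetch one element from each row, one element per bank per cycle."""
    requests = Counter(bank_of_row(i) for i in rows)
    return max(requests.values(), default=0)


def interleaved(i: int) -> int:
    """Row i stored in logic bank (i mod W), as on the slide."""
    return i % W


def single_bank(i: int) -> int:
    """Hypothetical baseline: every row stored in the same bank."""
    return 0


column = range(W)  # one column of a W x W matrix touches rows 0..W-1
print(access_cycles(column, interleaved))  # 1: each of the W banks is hit once
print(access_cycles(column, single_bank))  # 8: every request serializes on bank 0
```

For a taller matrix (16 rows with W = 8), the interleaved placement needs two cycles per column, since each bank then holds two of the requested elements.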


Architecture Details: Cascading pipeline with state machine-based program model

The dataflow can be changed to fit the desired algorithm.

This is done by customizing, via microcode, which FUs are used and how they interact.

Each FU can be described by a state machine.

This makes FU organization easier: the user specifies each FU's state machine plus a final state machine that specifies the delays needed to ensure the correct execution order.
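To make the programming model concrete, here is a toy cycle-level simulation in Python. It is a hypothetical sketch, not MaPU's microcode format: each FU is a minimal state machine (IDLE, then RUN issuing one operation per cycle for a fixed number of iterations, then DONE), and a top-level machine starts each FU after a user-specified delay so that producers run ahead of consumers. The class `FUStateMachine`, the function `run_cascade`, and the FU names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class FUStateMachine:
    """Toy FU: IDLE -> RUN (one operation per cycle) -> DONE."""
    name: str
    iterations: int
    state: str = "IDLE"
    remaining: int = 0

    def start(self) -> None:
        self.remaining = self.iterations
        self.state = "RUN" if self.iterations > 0 else "DONE"

    def step(self) -> bool:
        """Advance one cycle; return True if this FU issued an operation."""
        if self.state != "RUN":
            return False
        self.remaining -= 1
        if self.remaining == 0:
            self.state = "DONE"
        return True


def run_cascade(fus: list[FUStateMachine], start_delays: dict[str, int]) -> list[list[str]]:
    """Top-level machine: start each FU once its delay has elapsed; record active FUs per cycle."""
    trace: list[list[str]] = []
    cycle = 0
    while any(fu.state != "DONE" for fu in fus):
        for fu in fus:
            if fu.state == "IDLE" and cycle >= start_delays[fu.name]:
                fu.start()
        trace.append([fu.name for fu in fus if fu.step()])
        cycle += 1
    return trace


fus = [FUStateMachine("LOAD", 4), FUStateMachine("MUL", 4), FUStateMachine("STORE", 4)]
for cycle, active in enumerate(run_cascade(fus, {"LOAD": 0, "MUL": 1, "STORE": 2})):
    print(cycle, active)
```

The delay table plays the role of the final state machine: shifting MUL or STORE by a cycle changes how the FUs overlap without touching the per-FU machines.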


Architecture Details: SoC Architecture

Overview of the tape-out design implemented by the authors.

APE (Algebraic Processing Engine) refers to the MaPU cores.

CSU is a DMA controller.


Results: Comparison with C66x

All comparisons shown here are from simulations.

APE runs at 1 GHz and C66x at 1.25 GHz.
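If results are reported as cycle counts, they are not directly comparable across the two processors, because execution time is cycles divided by clock frequency. The small Python helper below (hypothetical, not from the paper) converts cycle counts into a wall-clock speedup of APE over C66x using the frequencies stated above.

```python
def wall_clock_speedup(c66x_cycles: float, ape_cycles: float,
                       c66x_hz: float = 1.25e9, ape_hz: float = 1.0e9) -> float:
    """Speedup of APE over C66x in execution time, where time = cycles / frequency.

    (c66x_cycles / c66x_hz) / (ape_cycles / ape_hz), rearranged into one division.
    """
    return (c66x_cycles * ape_hz) / (ape_cycles * c66x_hz)


# Equal cycle counts favour the C66x because of its higher clock rate.
print(wall_clock_speedup(1000, 1000))  # 0.8
```

An APE cycle-count advantage therefore has to exceed 1.25x before APE is actually faster in wall-clock time.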


Results: Power Usage


Results: Power Usage

Figure 15 in the paper appears to be incorrect: it is a copy of Figure 14.


Results: Comparison with other processors

Source: M. H. Ionica and D. Gregg, “The Movidius Myriad architecture’s potential for scientific computing,” IEEE Micro, vol. 35, no. 1, pp. 6–14, 2015.


Results: Microcode Statistics


Conclusion

Introduces a new architecture for fast and efficient matrix-related computations.

Defines a process for tailoring the architecture to specific applications by defining state machines in the microcode pipeline.

Demonstrates an improvement in power efficiency over CPUs/GPUs.

Offers few points of comparison against competing architectures.


Discussion

1. Given the overhead of defining state machines and the compiler optimizations required, is MaPU still better than an ASIC?

2. Is this architecture as generic as claimed?

3. Are simulation results as useful, given that a physical chip has been taped out?


Thank You
