Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Page 1: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Softcore Vector Processor

Team ASP

Brandon Harris

Arpith Jacob

Page 2: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Outline

• Motivation

• Smith-Waterman

• Solution

• System Architecture
  • Overview
  • Functional Unit
  • Instruction Controller
  • Processing Element
  • Memory Controller

• ISA

• Results

• Future Research

Page 3: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Motivation

• Smith-Waterman sequence alignment

Page 14: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Motivation

• Smith-Waterman sequence alignment

• Similar Problems
  • HMMer, BLAST, RNA Secondary Structure Prediction
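
For readers unfamiliar with the algorithm, here is a minimal Python sketch of Smith-Waterman local alignment scoring. The scoring values (match +2, mismatch -1, linear gap -1) and the function name are assumptions chosen for illustration; the slides do not state which scoring scheme was used. The function also returns the number of matrix cells it filled, which is the quantity behind the "cell updates per second" metric reported later.

```python
def smith_waterman_score(query, db, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, number of cell updates).

    Scoring parameters are illustrative assumptions, not taken from the slides.
    """
    rows, cols = len(query) + 1, len(db) + 1
    # h[i][j] = best score of a local alignment ending at query[i-1], db[j-1]
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if query[i - 1] == db[j - 1] else mismatch
            h[i][j] = max(
                0,                      # start a fresh local alignment
                h[i - 1][j - 1] + sub,  # align the two characters
                h[i - 1][j] + gap,      # gap in the database sequence
                h[i][j - 1] + gap,      # gap in the query sequence
            )
            best = max(best, h[i][j])
    return best, (rows - 1) * (cols - 1)


score, cell_updates = smith_waterman_score("ACGT", "TACGTT")
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal can be computed at the same time, which is what makes the problem a natural fit for an array of processing elements.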

Page 15: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Our Solution

• Softcore Vector Processor
  • Massively Parallel
  • Software programmable
  • Configurable Instantiation

• Why Softcore?
  • Optimize for specific applications
  • Adapt to changes in algorithms
  • FPGA technology improves with time

Page 16: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Architectural Overview

• Streaming Architecture
  • Memory Mapped FIFOs
    • Read Once Data
    • Write Once Data
    • Provides communication between components

[Figure: block diagram with Software, DMA, and SVP Functional Unit blocks]
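
The read-once and write-once behaviour of a memory-mapped FIFO can be sketched in Python as follows. This is a toy software model only: the class name, the bus helper functions, and the address `0x1000` are hypothetical and do not come from the slides.

```python
from collections import deque


class MemoryMappedFifo:
    """Toy FIFO reached through ordinary loads and stores (hypothetical model)."""

    def __init__(self):
        self._queue = deque()

    def store(self, value):
        # Write-once: every store appends exactly one new item.
        self._queue.append(value)

    def load(self):
        # Read-once: every load consumes the oldest item.
        if not self._queue:
            raise BufferError("FIFO empty")
        return self._queue.popleft()


STREAM_IN_ADDR = 0x1000  # hypothetical address, for illustration only
address_map = {STREAM_IN_ADDR: MemoryMappedFifo()}


def bus_store(addr, value):
    address_map[addr].store(value)


def bus_load(addr):
    return address_map[addr].load()


# The producer writes three words; the consumer reads them back in order.
for word in (3, 1, 4):
    bus_store(STREAM_IN_ADDR, word)
received = [bus_load(STREAM_IN_ADDR) for _ in range(3)]
```

Because a load removes the item it returns, repeated reads of the same address yield successive stream elements rather than the same value, which is the property that lets components communicate through plain memory accesses.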

Page 18: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Functional Unit

[Figure: Functional Unit block diagram. Blocks: Instruction Controller with Instr. Mem; three Processing Elements, each with a Reg File; Memory Controller; Shared Local Memory; Stream In; Stream Out.]

Page 19: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Instruction Controller

• SIMD Instruction Broadcast

[Figure: the Instruction Controller broadcasts "addi R5 R1 10" to three Processing Elements. R0 holds 0 in every PE; R1 holds 0, 1, and 2 respectively; after the instruction, R5 holds 10, 11, and 12.]
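
The broadcast in this example can be mimicked with a few lines of Python. This is a behavioural sketch only; the dictionary-based register files and the function name `broadcast_addi` are assumptions made for illustration.

```python
NUM_PES = 3

# One private register file per PE; R1 is preloaded with the PE's index,
# matching the values 0, 1, 2 in the slide's example.
register_files = [{"R0": 0, "R1": pe_id, "R5": None} for pe_id in range(NUM_PES)]


def broadcast_addi(reg_files, dest, src, imm):
    """Apply the same add-immediate to every PE's own registers."""
    for regs in reg_files:
        regs[dest] = regs[src] + imm


# Single instruction, multiple data: one addi, three different results.
broadcast_addi(register_files, "R5", "R1", 10)
r5_values = [regs["R5"] for regs in register_files]
```

The loop stands in for hardware that executes all PEs in the same cycle; the point is that one instruction stream produces per-PE results because each PE reads its own register contents.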

Page 20: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Instruction Controller

• SIMD Instruction Broadcast

[Figure: the Instruction Controller broadcasts "Ld R2 0 R3" to three Processing Elements. R0 and R1 hold 0 in every PE, and R3 holds ptr1 in every PE.]

Page 21: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Instruction Controller

• SIMD Instruction Broadcast

• Instruction Register Broadcast
  • 40% Register Savings

[Figure: the Instruction Controller broadcasts "Ldir R2 R0 IR3" to three Processing Elements. R3 is left empty in every PE; ptr1 is supplied to all PEs from the Instruction Controller's own register file.]
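
The difference between a plain load and an instruction-register load can be sketched as below. The addressing rule used for `ldir` (PE register plus Instruction Controller register) is an assumption inferred from the operand order on the slides, and the memory contents, pointer value, and function names are made up for illustration.

```python
shared_memory = list(range(100, 116))  # 16 words of made-up data
PTR1 = 4                               # made-up pointer value


def broadcast_ld(pe_regs, dest, offset, base):
    # Plain load: every PE must hold the pointer in its own register `base`.
    for regs in pe_regs:
        regs[dest] = shared_memory[regs[base] + offset]


def broadcast_ldir(pe_regs, ic_regs, dest, pe_reg, ic_reg):
    # Assumed semantics: address = PE register + Instruction Controller register.
    for regs in pe_regs:
        regs[dest] = shared_memory[regs[pe_reg] + ic_regs[ic_reg]]


# With ld, the same pointer occupies R3 in all three PEs.
with_ld = [{"R0": 0, "R2": None, "R3": PTR1} for _ in range(3)]
broadcast_ld(with_ld, "R2", 0, "R3")

# With ldir, the pointer lives once in the IC register file and R3 stays free.
with_ldir = [{"R0": 0, "R2": None, "R3": None} for _ in range(3)]
ic_registers = {"IR3": PTR1}
broadcast_ldir(with_ldir, ic_registers, "R2", "R0", "IR3")
```

Both variants load the same word into R2 of every PE, but only the first ties up a PE register to hold a value that is identical across the array.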

Page 23: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Processing Element

[Figure: Processing Element datapath. Blocks: two Register Files, Data Select, Pipeline Register, ALU, Pipeline Register, Compare, Memory Controller. Signals: Ra Addr, Rb Addr, Immediate, Ra Data Left, Ra Data Right, Rb Data Left, Rb Data Right, Write Enables, Data, Wr Enable Left, Wr En Right, Mem Wr Enable. Example instruction shown: bmseti R17 EQ 16.]
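
One plausible reading of `bmseti R17 EQ 16` is that it compares a register against an immediate in every PE and uses the result as a per-PE write enable until a matching `bmend`. The Python sketch below models that reading; the semantics, the register values, and the function names are assumptions, since the slides do not define the instruction.

```python
# Made-up register contents for four PEs; only the third has R17 == 16.
pes = [{"R17": value, "R3": 0} for value in (1, 2, 16, 0)]


def bmseti_eq(pe_regs, reg, imm):
    """Return one enable bit per PE: True where reg == imm (assumed semantics)."""
    return [regs[reg] == imm for regs in pe_regs]


def masked_addi(pe_regs, mask, dest, src, imm):
    """Broadcast addi, but only enabled PEs write their destination register."""
    for regs, enabled in zip(pe_regs, mask):
        if enabled:
            regs[dest] = regs[src] + imm


mask_inside = bmseti_eq(pes, "R17", 16)        # bmseti R17 EQ 16
masked_addi(pes, mask_inside, "R3", "R3", 7)   # executes under the mask
mask_after = [True] * len(pes)                 # bmend: all PEs enabled again
r3_values = [regs["R3"] for regs in pes]
```

Under this interpretation every PE still receives the instruction, and the Compare result simply gates the register-file write enables.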

Page 26: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Memory Controller

[Figure: Memory Controller block diagram. Blocks: Memory Controller; three Dual Ported Block RAMs; ports labeled IC and PE 0-3.]

Single Cycle Read

Page 27: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Memory Controller

[Figure: Memory Controller block diagram. Blocks: Memory Controller; three Dual Ported Block RAMs; ports labeled IC and PE 0-3.]

Multiple Cycle Write

Page 28: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Instruction Set Architecture

• Custom ISA

• Two Sets of Instruction Types
  • Instruction Controller
  • Processing Element

• Optimized for target applications

• Max, Min, Loop

• Expandable

• Core vs. Application Specific

Page 29: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Sample Code

_query_loop:

    subir %r8, %r3, %ir10
    nop
    nop
    max %r4, %r4, %r8
    add %r3, %r19, PE_ZERO_REG

    bmseti PE_ID_REG EQ PE_NUM_ELEMENTS - 1
    icaddi %ir15, %ir8, PE_NUM_ELEMENTS - 1
    nop
    nop
    ldir PE_MEM_REG, PE_ZERO_REG(%ir15)
    nop
    nop
    nop
    nop
    addi %r3, PE_MEM_REG, 0

    bmend

    ld PE_MEM_REG, PE_ZERO_REG(DB_ADDRESS)
    icaddi %ir7, %ir7, 1
    icaddi %ir9, %ir9, 1

    icloop %ir4, %ir5, _query_loop

_query_loop:

    icaddi %ir15, %ir8, PE_NUM_ELEMENTS - 1
    subir %r8, %r3, %ir10
    add %r3, %r19, PE_ZERO_REG
    ldir PE_MEM_REG, PE_ZERO_REG(%ir15)
    max %r4, %r4, %r8

    bmseti PE_ID_REG EQ PE_NUM_ELEMENTS - 1
    icaddi %ir7, %ir7, 1
    icaddi %ir9, %ir9, 1
    addi %r3, PE_MEM_REG, 0

    bmend

    ld PE_MEM_REG, PE_ZERO_REG(DB_ADDRESS)

    icloop %ir4, %ir5, _query_loop
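
The two listings above contain the same non-nop instructions in a different order. A short Python check makes the difference concrete by counting instructions per loop iteration. It says nothing about cycle timing or pipeline latencies, which the slides do not state; the variable and function names are illustrative.

```python
FIRST = """
subir %r8, %r3, %ir10
nop
nop
max %r4, %r4, %r8
add %r3, %r19, PE_ZERO_REG
bmseti PE_ID_REG EQ PE_NUM_ELEMENTS - 1
icaddi %ir15, %ir8, PE_NUM_ELEMENTS - 1
nop
nop
ldir PE_MEM_REG, PE_ZERO_REG(%ir15)
nop
nop
nop
nop
addi %r3, PE_MEM_REG, 0
bmend
ld PE_MEM_REG, PE_ZERO_REG(DB_ADDRESS)
icaddi %ir7, %ir7, 1
icaddi %ir9, %ir9, 1
icloop %ir4, %ir5, _query_loop
"""

SECOND = """
icaddi %ir15, %ir8, PE_NUM_ELEMENTS - 1
subir %r8, %r3, %ir10
add %r3, %r19, PE_ZERO_REG
ldir PE_MEM_REG, PE_ZERO_REG(%ir15)
max %r4, %r4, %r8
bmseti PE_ID_REG EQ PE_NUM_ELEMENTS - 1
icaddi %ir7, %ir7, 1
icaddi %ir9, %ir9, 1
addi %r3, PE_MEM_REG, 0
bmend
ld PE_MEM_REG, PE_ZERO_REG(DB_ADDRESS)
icloop %ir4, %ir5, _query_loop
"""


def instructions(listing):
    """Split a listing into one stripped instruction per non-empty line."""
    return [line.strip() for line in listing.strip().splitlines() if line.strip()]


first, second = instructions(FIRST), instructions(SECOND)
first_nops = first.count("nop")
second_nops = second.count("nop")
# Same useful work if the non-nop instructions match as a multiset.
same_work = sorted(i for i in first if i != "nop") == sorted(second)
```

The first listing has 20 instructions per iteration, 8 of them nops; the second has 12 and no nops, with independent instructions moved into the slots the nops occupied.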

Page 30: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Results

• VHDL Implementation
  • Simulated
  • Synthesized

• Smith-Waterman
  • 16 PE version tested
  • Millions of Cell Updates Per Second (MCUPS)

Page 31: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Smith-Waterman Speedup

System    Freq       MCUPS    Speedup
P4        1.8 GHz       15       1.00
SVP16     150 MHz       52       3.47
SVP32     150 MHz      103       6.87
SVP64     125 MHz      167      11.13
SVP128    120 MHz      302      20.13
SVP128    150 MHz      378      25.20
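
The Speedup column is each system's MCUPS divided by the P4 baseline of 15 MCUPS. The following Python snippet recomputes it from the figures in the table; the variable names are illustrative.

```python
BASELINE_MCUPS = 15  # P4 at 1.8 GHz, from the table

# (system, clock in MHz, MCUPS) as listed in the table
measured = [
    ("SVP16", 150, 52),
    ("SVP32", 150, 103),
    ("SVP64", 125, 167),
    ("SVP128", 120, 302),
    ("SVP128", 150, 378),
]

# Speedup relative to the P4, rounded to two decimals as in the table.
speedups = {
    (name, mhz): round(mcups / BASELINE_MCUPS, 2)
    for name, mhz, mcups in measured
}
```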

Page 32: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Comparative Performance

System*         Freq       PEs/Chip   MCUPS/PE   Chips   MCUPS/Chip   Cost ($1000)   MCUPS/$1000
SVP128          150 MHz    128        2.95       1       378          5              75
SVP128          120 MHz    128        2.36       1       302          5              60
SVP64           125 MHz    64         2.61       1       167          5              33
SVP32           150 MHz    32         3.22       1       103          5              20
Kestrel         20 MHz     64         0.78       8       50           25†            16
GeneMatcher2    192 MHz    192        5.21       16      1000         69             14
Fuzion 150      200 MHz    1536       1.63       1       2500         ?              ?

* Reference [1]
† Estimated
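
The MCUPS/PE column follows from dividing MCUPS/Chip by PEs/Chip. The Python snippet below recomputes it for every row of the table; only values printed in the table are used, and the variable names are illustrative.

```python
# (system, clock in MHz, PEs per chip, MCUPS per chip) as listed in the table
systems = [
    ("SVP128", 150, 128, 378),
    ("SVP128", 120, 128, 302),
    ("SVP64", 125, 64, 167),
    ("SVP32", 150, 32, 103),
    ("Kestrel", 20, 64, 50),
    ("GeneMatcher2", 192, 192, 1000),
    ("Fuzion 150", 200, 1536, 2500),
]

# Per-PE throughput, rounded to two decimals as in the table.
mcups_per_pe = {
    (name, mhz): round(mcups_per_chip / pes_per_chip, 2)
    for name, mhz, pes_per_chip, mcups_per_chip in systems
}
```

Normalising per PE separates how well each processing element performs from how many of them fit on a chip.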

Page 33: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Performance

PEs    Freq (MHz)    Area    BRAM
16     150           13%     22
32     150           22%     38
64     125           41%     70
128    120           80%     134

• Hardware
  • Xilinx Virtex-4 LX200

Page 34: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Future Work

• Software Development
  • How can HMMer and other systolic algorithms be implemented?

• ISA Expansion
  • What additional instructions are needed?
  • What instructions can be added to optimize?

• Hardware Development
  • How can we optimize the hardware to make it faster and smaller?
  • What hardware can we add to enhance performance?
  • How can we take advantage of advances in FPGAs, such as DSP48s?

Page 35: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Acknowledgments

• Special Thanks
  • Young Cho
  • Roger Chamberlain
  • Jeremy Buhler
  • Joseph Lancaster

• References
  • [1] Di Blas et al., “The Kestrel Parallel Processor,” IEEE Transactions on Parallel and Distributed Systems, January 2005
  • [2] A. Jacob et al., “Whole Genome Comparison Using Commodity Workstations,” Technical Report, 2003

Page 36: Softcore Vector Processor Team ASP Brandon Harris Arpith Jacob

Questions?

Team ASP

Brandon Harris

Arpith Jacob