End User Update: High-Performance Reconfigurable Computing

Page 1: End User Update: High-Performance Reconfigurable Computing

End User Update: High-Performance Reconfigurable Computing

Tarek El-Ghazawi

Director, GW Institute for Massively Parallel Applications and Computing Technologies (IMPACT)

Co-Director, NSF Center for High-Performance Reconfigurable Computing (CHREC)

The George Washington University

hpcl.gwu.edu

Page 2: End User Update: High-Performance Reconfigurable Computing

Tarek El-Ghazawi, GWU HPC User Forum, Roanoke, April 21, 2008

Paul Muzio’s Outline!

Performance

  • What hardware accelerators are you using/evaluating?
  • Describe the applications that you are porting to accelerators.
  • What kinds of speed-ups are you seeing (provide the basis for the comparison)?
  • How does it compare to scaling out (i.e., just using more x86 processors)?
  • What are the bottlenecks to further performance improvements?

Economics

  • Describe the programming effort required to make use of the accelerator.
  • Amortization
  • Compare accelerator cost to scaling out cost
  • Ease of use issues

Futures

  • What is the future direction of hardware based accelerators?
  • Software futures?
  • What are your thoughts on what the vendors need to do to ensure wider acceptance of accelerators?

Page 3: End User Update: High-Performance Reconfigurable Computing

Why Accelerators: A Historical Perspective

[Figure: performance versus time. Successive eras: Vector Machines; Massively Parallel Processors (1993, HPCC); MPPs with Multicores and Heterogeneous Accelerators (2006, End of Moore’s Law in Clocking!). Hopes are in Architecture!]

Page 4: End User Update: High-Performance Reconfigurable Computing

Which Accelerators?

  • We considered HPRCs more than anything else
      • To be addressed today
  • We are increasingly using GPUs
  • Some Cell

Page 5: End User Update: High-Performance Reconfigurable Computing


High-Performance Reconfigurable Computing (HPRC)

IEEE Computer, March 2007

High-Performance Reconfigurable Computers are parallel computing systems that contain multiple microprocessors and multiple FPGAs. In current settings, the design uses FPGAs as coprocessors that are deployed to execute the small portion of the application that takes most of the time—under the 10-90 rule, the 10 percent of code that takes 90 percent of the execution time.
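
The 10-90 rule also bounds how much whole-application speedup a coprocessor can deliver. A minimal sketch using Amdahl's law, assuming the accelerated kernel accounts for 90 percent of the original run time; the function name and the sample kernel speedups are illustrative and do not come from the slides:

```python
def overall_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law: whole-application speedup when only part of the run time is accelerated."""
    remaining = 1.0 - accelerated_fraction  # share of time still spent on the microprocessor
    return 1.0 / (remaining + accelerated_fraction / kernel_speedup)


# Hypothetical kernel speedups on the FPGA, with 90% of the run time accelerated.
for kernel_speedup in (10, 100, 1000):
    print(kernel_speedup, round(overall_speedup(0.9, kernel_speedup), 2))
```

With 90 percent of the time accelerated, the overall speedup stays below 10x however fast the kernel runs; the fraction left on the microprocessor sets the ceiling.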

Page 6: End User Update: High-Performance Reconfigurable Computing

Evaluated FPGA-Accelerated Systems

  • SRC-6
  • SRC-6E
  • XD1
  • HC-36
  • Altix-350
  • Altix-4700

Page 7: End User Update: High-Performance Reconfigurable Computing

An Architectural Classification for Hardware Accelerated High-Performance Computers

El-Ghazawi et al., The Performance Potential of HPRCs, IEEE Computer, February 2008

Page 8: End User Update: High-Performance Reconfigurable Computing

Uniform Nodes Non-Uniform System (UNNS)

[Diagram: μP nodes (μP 1 … μP N) and RP nodes (RP 1 … RP N) connected through an IN and/or GSM]

HPRC Examples: SRC 6/7, SGI Altix/RC100 Systems

Page 9: End User Update: High-Performance Reconfigurable Computing

Non-Uniform Nodes Uniform System (NNUS)

[Diagram: nodes that each pair a μP with an RP, connected through an IN and/or GSM]

HPRC Examples: Cray XD1, Cray XT5h

Page 10: End User Update: High-Performance Reconfigurable Computing

Applications and Performance

Cryptography, Remote Sensing and Bioinformatics

Page 11: End User Update: High-Performance Reconfigurable Computing

Hyperspectral Dimension Reduction

  • Multi-Spectral Imagery: 10’s of bands (MODIS ≡ 36 bands, SeaWiFS ≡ 8 bands, IKONOS ≡ 5 bands)
  • Hyperspectral Imagery: 100’s-1000’s of bands (AVIRIS ≡ 224 bands, AIRS ≡ 2378 bands)
  • Challenges (Curse of Dimensionality)
  • Solution: Dimension Reduction

[Figure: Multispectral / Hyperspectral Imagery Comparison]

Page 12: End User Update: High-Performance Reconfigurable Computing

Hyperspectral Dimension Reduction (Techniques)

  • Principal Component Analysis (PCA)
      • Most common dimension reduction method
      • Complex and global computations: difficult for parallel processing and hardware implementations
  • Wavelet-Based Dimension Reduction*
      • Simple and local operations
      • High-performance implementation
      • Multi-resolution wavelet decomposition of each pixel's 1-D spectral signature (preservation of spectral locality)

* S. Kaewpijit, J. Le Moigne, T. El-Ghazawi, “Automatic Reduction of Hyperspectral Imagery Using Wavelet Spectral Analysis”, IEEE Transactions on Geoscience and Remote Sensing, Vol. 41, No. 4, April 2003, pp. 863-871.
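
The wavelet approach works on each pixel's 1-D spectral signature independently, which is what makes it local. A minimal sketch of that idea, assuming a Haar low-pass filter (pairwise averaging of adjacent bands) and a fixed number of decomposition levels; the filter choice, the fixed level count, and all function names are assumptions for illustration, not the cited paper's exact method:

```python
def haar_lowpass(signature):
    """One decomposition level: average adjacent band pairs, halving the band count."""
    if len(signature) % 2:
        signature = signature + [signature[-1]]  # pad odd lengths by repeating the last band
    return [(signature[k] + signature[k + 1]) / 2.0 for k in range(0, len(signature), 2)]


def reduce_pixel(signature, levels):
    """Keep only the approximation coefficients after `levels` decompositions."""
    reduced = list(signature)
    for _ in range(levels):
        reduced = haar_lowpass(reduced)
    return reduced


def reduce_cube(cube, levels):
    """Apply the reduction to every pixel of a rows x cols x bands cube (nested lists)."""
    return [[reduce_pixel(pixel, levels) for pixel in row] for row in cube]


# Hypothetical 1 x 2 image with 8 bands per pixel, reduced to 2 bands per pixel.
cube = [[[1, 3, 5, 7, 9, 11, 13, 15], [2, 2, 4, 4, 6, 6, 8, 8]]]
print(reduce_cube(cube, levels=2))  # [[[4.0, 12.0], [3.0, 7.0]]]
```

Because no pixel depends on any other, the per-pixel loop can be split across processors or pipelined in hardware without communication, in contrast to the global statistics PCA requires.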

Page 13: End User Update: High-Performance Reconfigurable Computing

Wavelet-Based Dimension Reduction (Execution Profiles on SRC)

  • Pentium 4, 1.8 GHz: Total Execution Time = 20.21 sec
  • SRC-6E, P3: Total Execution Time = 1.67 sec
      • Speedup = 12.08x (without streaming)
      • Speedup = 13.21x (with streaming)
  • SRC-6: Total Execution Time = 0.84 sec
      • Speedup = 24.06x (without streaming)
      • Speedup = 32.04x (with streaming)
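
The basis for these speedups can be checked with simple arithmetic. A minimal sketch, assuming each speedup is the Pentium 4 time divided by the accelerated time and that the listed totals correspond to the without-streaming runs (both are readings of the slide, not statements from it):

```python
baseline_sec = 20.21  # Pentium 4, 1.8 GHz
accelerated_sec = {"SRC-6E": 1.67, "SRC-6": 0.84}


def speedup(baseline, accelerated):
    """Ratio of reference run time to accelerated run time."""
    return baseline / accelerated


for platform, seconds in accelerated_sec.items():
    print(platform, round(speedup(baseline_sec, seconds), 2))
```

The SRC-6 ratio reproduces the reported 24.06x. The SRC-6E ratio comes out near 12.1x rather than 12.08x, presumably because the listed times are rounded.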

Page 14: End User Update: High-Performance Reconfigurable Computing

Cloud Detection

[Figure: image panels for Band 2 (Green Band), Band 3 (Red Band), Band 4 (Near-IR Band), Band 5 (Mid-IR Band), and Band 6 (Thermal IR Band), alongside three cloud masks: Software/Reference Mask, Hardware Floating-Point Mask (Approximate Normalization), and Hardware Fixed-Point Mask (Approximate Normalization)]

Page 15: End User Update: High-Performance Reconfigurable Computing

Protein and DNA Matching - The Scoring Matrix

[Figure: scoring matrix F(i, j) for two DNA sequences; the axes are labeled GATACTTT- and GC / TATTGG-]

$$
F(i,j) = \max
\begin{cases}
0 \\
F(i-1,\,j-1) + s(x_i,\,y_j) \\
F(i-1,\,j) - \mathrm{gap\_penalty} \\
F(i,\,j-1) - \mathrm{gap\_penalty}
\end{cases}
$$
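
A scoring matrix for local alignment can be filled with a short dynamic-programming loop. A minimal sketch of the standard Smith-Waterman recurrence, assuming a linear gap penalty; the match, mismatch, and gap values and the function names are illustrative and are not taken from the slide:

```python
def smith_waterman_matrix(x, y, match=2, mismatch=-1, gap_penalty=2):
    """Fill F where F[i][j] is the best local alignment score ending at x[i-1], y[j-1]."""
    rows, cols = len(x) + 1, len(y) + 1
    F = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                0,                          # start a new local alignment
                F[i - 1][j - 1] + s,        # align x[i-1] with y[j-1]
                F[i - 1][j] - gap_penalty,  # gap in y
                F[i][j - 1] - gap_penalty,  # gap in x
            )
    return F


def best_local_score(F):
    """The best local alignment score is the largest entry in the matrix."""
    return max(max(row) for row in F)


F = smith_waterman_matrix("GATTACA", "ATTAC")
print(best_local_score(F))  # 10: ATTAC matches a substring of GATTACA exactly
```

Each cell depends only on its upper, left, and upper-left neighbors, so all cells on one anti-diagonal can be computed at the same time, which is the property hardware implementations typically exploit.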

Page 16: End User Update: High-Performance Reconfigurable Computing

Savings of HPRC (Based on one Altix 4700 10U rack)

Application                        Speedup   Cost Savings   Power Savings   Size Reduction
IDEA Breaker                       961       2x             86x             28x
Smith-Waterman (DNA Sequencing)    8723      22x            779x            253x
DES Breaker                        38514     96x            3439x           1116x
RC5(32/12/16) Breaker              6838      17x            610x            198x

Assumptions

  • 100% cluster efficiency
  • Cost Factor µP : RP = 1 : 400
  • Power Factor µP : RP = 1 : 11.2
      • One 10U rack: 1230 W
      • µP board (with two µPs): 220 W
  • Size Factor µP : RP = 1 : 34.5
      • Cluster of 100 µPs = four 19-inch racks, footprint = 6 square feet
      • Reconfigurable computer (10U), footprint = 2.07 square feet
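
The savings figures appear to follow from dividing each speedup by the corresponding µP : RP factor, on the reasoning that one reconfigurable system replaces as many microprocessors as its speedup. A minimal sketch of that arithmetic, under that assumption and using the Altix 4700 numbers from the slide; the function and variable names are illustrative:

```python
def savings(speedup, cost_factor, power_factor, size_factor):
    """Divide the speedup by each µP : RP factor (assumes 100% cluster efficiency)."""
    return {
        "cost": speedup / cost_factor,
        "power": speedup / power_factor,
        "size": speedup / size_factor,
    }


# The power and size factors can be derived from the listed wattages and footprints.
power_factor = 1230 / (220 / 2)  # rack watts over watts per µP (two µPs per 220 W board)
size_factor = 2.07 / (6 / 100)   # rack footprint over footprint per µP (100 µPs in 6 sq ft)
print(round(power_factor, 1), round(size_factor, 1))  # 11.2 34.5

# DES Breaker on the Altix 4700, using the factors as stated on the slide.
des = savings(38514, cost_factor=400, power_factor=11.2, size_factor=34.5)
print({name: round(value) for name, value in des.items()})
# {'cost': 96, 'power': 3439, 'size': 1116}
```

A savings value below 1x therefore means the reconfigurable system costs more, draws more power, or takes more space than the microprocessor cluster that would match its performance.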

Page 17: End User Update: High-Performance Reconfigurable Computing

Savings of HPRC (Based on one Cray-XD1 chassis)

Application                        Speedup   Cost Savings   Power Savings   Size Reduction
RC5(32/8/8) Breaker                2321      23x            116x            24x
IDEA Breaker                       2402      24x            120x            25x
Hyperspectral Dimension Reduction  125       1x             6x              1x
Smith-Waterman (DNA Sequencing)    2794      28x            140x            29x
DES Breaker                        12162     122x           608x            127x
Cloud Detection                    110       1x             5x              1x

Assumptions

  • 100% cluster efficiency
  • Cost Factor µP : RP = 1 : 100
  • Power Factor µP : RP = 1 : 20
      • Reconfigurable processor (based on one XD1 chassis): 2200 W
      • µP board (with two µPs): 220 W
  • Size Factor µP : RP = 1 : 95.8
      • Cluster of 100 µPs = four 19-inch racks, footprint = 6 square feet
      • Reconfigurable computer (one XD1 chassis), footprint = 5.75 square feet

Page 18: End User Update: High-Performance Reconfigurable Computing

Savings of HPRC (Based on SRC-6)

Application                        Speedup   Cost Savings   Power Savings   Size Reduction
RC5(32/12/16) Breaker              1140      6x             313x            34x
Hyperspectral Dimension Reduction  32        0.16x          9x              0.96x
Smith-Waterman (DNA Sequencing)    1138      6x             313x            34x
DES Breaker                        6757      34x            1856x           203x
IDEA Breaker                       641       3x             176x            19x
Cloud Detection                    28        0.14x          8x              0.84x

Assumptions

  • 100% cluster efficiency
  • Cost Factor µP : RP = 1 : 200
  • Power Factor µP : RP = 1 : 3.64
      • Reconfigurable processor (based on SRC-6): 200 W
      • µP board (with two µPs): 220 W
  • Size Factor µP : RP = 1 : 33.3
      • Cluster of 100 µPs = four 19-inch racks, footprint = 6 square feet
      • Reconfigurable computer (SRC MAPstation™), footprint = 1 square foot

Page 19: End User Update: High-Performance Reconfigurable Computing

Programming

Page 20: End User Update: High-Performance Reconfigurable Computing

Historical Perspective

[Figure: evolution of FPGA technology and applications alongside their users and tools]

  • Glue Logic: Logic Fabric (180 nm)
      • Users: Circuit Designers
      • Tools: Schematics, RTL
  • Custom Comp.: DSP Slice & Dual Port Block RAM (130 nm)
      • Users: DSP & Networking Designers
      • Tools: IP Core Generators, HDLs
  • PSoC: Embedded Processors & Transceivers (90 nm)
      • Users: Embedded Software Engineers, Embedded System Designers
      • Tools: HW/SW Codesign, Embedded & DSP IDEs, HLLs
  • HPRC: In-Socket Accelerators (65 nm)
      • Users: RC-Aware Domain Scientists, then Domain Scientists
      • Tools: Platform Specifications & Parallel SW Languages, Improved HLLs, then New Methodologies, Programming Models and IDEs

Page 21: End User Update: High-Performance Reconfigurable Computing


Programming HPRCs

Page 22: End User Update: High-Performance Reconfigurable Computing

Productivity Analysis of Existing Tools

  • Tools considered: Impulse-C, Handel-C, Carte-C, Mitrion-C, SysGen, RC-Toolbox, HDLs
  • Utility
      • Frequency
      • Area
  • Cost
      • Acquisition time
      • Learning time
      • Development time

Results excerpted from GWU papers in SPL’07 and FPT’07 conferences.

Page 23: End User Update: High-Performance Reconfigurable Computing

Future Hardware Development

  • More use of socket-based integration
  • Better integration with memory hierarchy
  • Better accelerators, on the FPGA side
      • More computationally oriented/floating-point cores?
      • Coarser-grain FPGAs?
  • On-chip FPGAs and accelerators?

Page 24: End User Update: High-Performance Reconfigurable Computing

Parallelism Concepts: From Systems to Commodity Chips

[Figure: timeline pairing system-level concepts with their later appearance in commodity chips]

  • Early 1970s: First Vector and SIMD Systems (CDC STAR-100, TI ASC, ILLIAC IV)
  • 1971-78: First MIMD System (CMU C.mmp, 16 PDP-11s)
  • 1985: FPGA (Xilinx)
  • 1996-1998: SIMD AltiVec by Apple, IBM, and Motorola
  • 1998: HPRC (SRC)
  • 2001: Vector Processor/SIMD (Cell BE)
  • Multicore CPUs (IBM Power 4)
  • GPGPUs (NVIDIA and AMD)
  • Coming soon? Hybrid/reconfigurable chip? Accelerators as cores?

Page 25: End User Update: High-Performance Reconfigurable Computing


Future Software

More user/application-centric programming

Unified parallel programming interface?

More efficient compiling

Tools for accelerator-GPP application co-design

Virtualization for ease-of-use and portability

HELP MAY BE COMING?

Page 26: End User Update: High-Performance Reconfigurable Computing


DARPA Studies

DARPA is looking at bridging the productivity gap for FPGAs

NSF CHREC Schools (UF, GWU) and (BYU, VT) conducted a DARPA study

DARPA has at least one more ongoing study

Are we going to see any BAAs?

Page 28: End User Update: High-Performance Reconfigurable Computing

Conclusions

  • Lots of common issues among accelerators
  • For the applications that they can do well, they do really well!
  • FPGAs were not built originally for computing
      • Limited applications
      • Less than user-friendly interfaces
      • Very long compile times
  • Programming languages expose a restrictive view of the system and are often hardware oriented; a single system-wide language paradigm is needed
  • A major bottleneck is the data transfer rate between the microprocessor and the FPGA
  • More work is needed on how to manage heterogeneity
      • Virtualization for portability and ease of use
      • Advanced programming models based on parallel computing
      • New tools for performance tuning and debugging in heterogeneous environments
      • Better integration into the memory hierarchy
  • The above requires fundamental work that is unlikely to be supported by vendors alone; it needs, for example, a DARPA-driven industry/university effort (like HPCS)
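
The effect of the microprocessor-to-FPGA transfer bottleneck can be illustrated with a simple timing model. A minimal sketch, assuming transfers are either fully serialized with FPGA computation or fully overlapped with it (streaming); every number below is hypothetical and the function name is illustrative:

```python
def effective_speedup(cpu_sec, fpga_compute_sec, bytes_moved, bandwidth_bytes_per_sec, overlap=False):
    """End-to-end speedup once data transfer time is charged to the accelerator."""
    transfer_sec = bytes_moved / bandwidth_bytes_per_sec
    if overlap:
        # Streaming: transfer and compute proceed together, the slower one dominates.
        fpga_total_sec = max(fpga_compute_sec, transfer_sec)
    else:
        # No streaming: move the data first, then compute.
        fpga_total_sec = fpga_compute_sec + transfer_sec
    return cpu_sec / fpga_total_sec


# Hypothetical kernel: 10 s on the microprocessor, 0.1 s on the FPGA (100x in isolation),
# with 1 GB moved over a 1 GB/s link.
print(round(effective_speedup(10.0, 0.1, 1e9, 1e9), 2))                # 9.09
print(round(effective_speedup(10.0, 0.1, 1e9, 1e9, overlap=True), 2))  # 10.0
```

In this example a kernel that is 100x faster in isolation delivers about 10x end to end, and overlapping transfer with computation recovers only a little; raising the link bandwidth is what moves the result.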