WorldScape Defense Company, L.L.C. (Company Proprietary)

An Ultra-High Performance Scalable Processing Architecture for HPC and Embedded Applications: Presentation for IPDPS Conference, 28 April 2004

Slide 1

An Ultra-High Performance Scalable Processing Architecture for HPC and Embedded Applications

Presentation for IPDPS Conference

28 April 2004

Slide 2

CS301 Up Close

Multi-Threaded Array Processor
• 25.6 GFLOPS
• 3 W worst-case, 2 W typical
• 200 MHz
• 64 PEs, 4 Kbytes each

[Block diagram: PE array, control, SRAM, bus]

ClearConnect bus
• 64-bit full duplex
• 1.6 Gbyte/s each direction
• 2x 0.8-Gbyte/s bridge ports

Scratchpad memory
• 128 Kbytes of SRAM

Availability
• Currently available
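The headline CS301 numbers can be cross-checked with simple arithmetic. This sketch assumes each PE completes one floating-point multiply-accumulate (2 FLOPs) per cycle and that the ClearConnect bus runs at the 200 MHz core clock; neither assumption is stated on the slide, and the constant names are illustrative.

```python
# Cross-check of the CS301 headline figures.
# Assumptions (not on the slide): one floating-point MAC (2 FLOPs) per PE
# per cycle, and the ClearConnect bus clocked at the 200 MHz core clock.

CLOCK_HZ = 200e6           # core clock, 200 MHz
NUM_PES = 64               # processing elements
PE_MEM_BYTES = 4 * 1024    # 4 Kbytes local memory per PE
BUS_WIDTH_BITS = 64        # ClearConnect bus width
FLOPS_PER_PE_CYCLE = 2     # assumed: multiply + add per cycle
TYPICAL_W, WORST_W = 2.0, 3.0

peak_gflops = NUM_PES * FLOPS_PER_PE_CYCLE * CLOCK_HZ / 1e9
gflops_per_w_typical = peak_gflops / TYPICAL_W
gflops_per_w_worst = peak_gflops / WORST_W
total_pe_mem_kbytes = NUM_PES * PE_MEM_BYTES // 1024
bus_gbytes_per_dir = BUS_WIDTH_BITS // 8 * CLOCK_HZ / 1e9

print(f"Peak compute:       {peak_gflops:.1f} GFLOPS")
print(f"Efficiency:         {gflops_per_w_worst:.1f} to {gflops_per_w_typical:.1f} GFLOPS/W")
print(f"Total PE memory:    {total_pe_mem_kbytes} Kbytes")
print(f"Bus, per direction: {bus_gbytes_per_dir:.1f} Gbyte/s")
```

Under these assumptions peak efficiency falls between about 8.5 and 12.8 GFLOPS/W, consistent with the ~10 GFLOPS/Watt quoted for the architecture, and the 1.6 Gbyte/s per direction matches a 64-bit transfer every 200 MHz cycle.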

Slide 3

Multi-Threaded Array Processing Architecture

Multi-threaded Array Processor
• Fully programmable in C
• Hardware multi-threading
• Extensible instruction set

Scalable internal parallelism
• Array of Processing Elements (PEs)
• Compute and bandwidth scale together
• From 10s to 1,000s of PEs
• Built-in PE redundancy
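A minimal model of "compute and bandwidth scale together": if every PE contributes a fixed number of FLOPs and bytes per cycle, both totals grow linearly with the PE count and the bytes-per-FLOP ratio stays constant. The per-PE rates (2 FLOPs and 4 bytes per cycle at 200 MHz) are inferred from the CS301 figures, holding the clock fixed for larger arrays is an assumption, and `array_peak` is a hypothetical helper.

```python
# Sketch: peak compute and PE-memory bandwidth as functions of PE count.
# Per-PE rates are inferred from the CS301 figures (assumption):
# 2 FLOPs/cycle and 4 bytes/cycle per PE, clock held at 200 MHz.

CLOCK_HZ = 200e6
FLOPS_PER_PE_CYCLE = 2
BYTES_PER_PE_CYCLE = 4

def array_peak(num_pes: int) -> tuple[float, float]:
    """Return (peak GFLOPS, PE-memory bandwidth in Gbyte/s) for num_pes PEs."""
    gflops = num_pes * FLOPS_PER_PE_CYCLE * CLOCK_HZ / 1e9
    gbytes = num_pes * BYTES_PER_PE_CYCLE * CLOCK_HZ / 1e9
    return gflops, gbytes

for n in (16, 64, 256, 1024):
    gflops, gbytes = array_peak(n)
    # Both terms are linear in n, so bytes per FLOP does not change.
    print(f"{n:5d} PEs: {gflops:7.1f} GFLOPS, {gbytes:7.1f} Gbyte/s, "
          f"{gbytes / gflops:.1f} byte/FLOP")
```

Because bandwidth lives in each PE's local memory rather than behind a shared interface, adding PEs adds both resources at once; that is the design property this model captures.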

High performance, low power
• ~10 GFLOPS/Watt

Multiple high-speed I/O channels

Slide 4

Processing Elements

PEs are highly optimised execution units:
• ALU, MAC, FPU
• High-bandwidth, multiport register file
• High-bandwidth per-PE DMA (PIO, SIO)
• Closely coupled SRAM for data
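To make the PE-array idea concrete, the toy model below splits two vectors across 64 simulated PEs, has each PE run a multiply-accumulate loop over the slice held in its own 4-Kbyte memory, and sums the partial results. It is a hypothetical illustration in plain Python, not the CS301 programming model or SDK; `distribute`, `pe_mac`, and `array_dot` are invented names.

```python
# Toy model of an array processor computing a dot product.
# Each simulated PE holds a slice of both vectors in its own local memory
# and runs a MAC loop over it; the partial sums are then combined.
# Illustrative only; not the CS301 programming model.

NUM_PES = 64
PE_MEM_BYTES = 4 * 1024
FLOAT_BYTES = 4  # 32-bit IEEE single precision

def distribute(values, num_pes):
    """Split a list into num_pes contiguous, nearly equal slices."""
    base, extra = divmod(len(values), num_pes)
    slices, start = [], 0
    for pe in range(num_pes):
        size = base + (1 if pe < extra else 0)
        slices.append(values[start:start + size])
        start += size
    return slices

def pe_mac(xs, ys):
    """Work done by one PE: accumulate x * y over its local slice."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y  # one multiply-accumulate per element
    return acc

def array_dot(a, b, num_pes=NUM_PES):
    a_parts = distribute(a, num_pes)
    b_parts = distribute(b, num_pes)
    # Both operand slices must fit in a PE's local memory.
    for xs in a_parts:
        if 2 * len(xs) * FLOAT_BYTES > PE_MEM_BYTES:
            raise ValueError("slice does not fit in PE memory")
    partials = [pe_mac(xs, ys) for xs, ys in zip(a_parts, b_parts)]
    return sum(partials)

a = [float(i) for i in range(1000)]
b = [1.0] * 1000
print(array_dot(a, b))  # prints 499500.0
```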

64 PEs at 200 MHz:
• 25.6 GFLOPS
• 51.2 Gbyte/s bandwidth to PE memory
• 12,800 MIPS
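Dividing the aggregate figures by the number of PE-cycles per second (64 PEs × 200 MHz) gives per-PE rates. Our reading, not stated on the slide, is that each PE sustains one MAC (2 FLOPs), one 32-bit memory access (4 bytes), and one instruction per cycle; `per_pe_per_cycle` is a hypothetical helper.

```python
# Per-PE, per-cycle rates implied by the slide's aggregate figures.
CLOCK_HZ = 200e6
NUM_PES = 64

AGGREGATE = {
    "FLOPs": 25.6e9,               # 25.6 GFLOPS
    "bytes to PE memory": 51.2e9,  # 51.2 Gbyte/s
    "instructions": 12.8e9,        # 12,800 MIPS
}

def per_pe_per_cycle(aggregate, num_pes=NUM_PES, clock_hz=CLOCK_HZ):
    pe_cycles_per_s = num_pes * clock_hz  # 12.8e9 PE-cycles per second
    return {name: total / pe_cycles_per_s for name, total in aggregate.items()}

for name, rate in per_pe_per_cycle(AGGREGATE).items():
    print(f"{name}: {rate:g} per PE per cycle")
```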

Supports multiple data types:
• 8, 16, 24, 32-bit, ... fixed-point arithmetic
• 32-bit IEEE floating-point arithmetic
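As an illustration of the fixed-point arithmetic listed above, here is a generic Q15 (16-bit) dot product with a saturating 32-bit accumulator. The Q15 format, the rounding rule, and the saturation behaviour are common DSP conventions chosen for the example; they are assumptions, not a description of the CS301 instruction set.

```python
# Generic Q15 fixed-point multiply-accumulate (illustrative; not the CS301 ISA).
Q = 15
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1

def to_q15(x: float) -> int:
    """Convert a float in [-1, 1) to Q15, rounding to nearest and saturating."""
    return max(INT16_MIN, min(INT16_MAX, round(x * (1 << Q))))

def saturate32(v: int) -> int:
    """Clamp a value to the signed 32-bit accumulator range."""
    return max(INT32_MIN, min(INT32_MAX, v))

def q15_dot(xs, ys):
    """Accumulate Q15 x Q15 products in Q30, then round back to Q15."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = saturate32(acc + x * y)  # product of two Q15 values is Q30
    # Add half an output LSB, shift Q30 -> Q15, then clamp to 16 bits.
    result = (acc + (1 << (Q - 1))) >> Q
    return max(INT16_MIN, min(INT16_MAX, result))

xs = [to_q15(v) for v in (0.5, 0.25, -0.5)]
ys = [to_q15(v) for v in (0.5, 0.5, 0.25)]
print(q15_dot(xs, ys) / (1 << Q))  # prints 0.25
```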

Slide 5

ClearConnect™ High-Speed Bus

Lanes from 25 to 100 Gbit/s full duplex

• Packet-switched architecture
• Scales to 4 lanes per bus
• Lane widths: 32 to 256 bits
• Distributed arbitration
• Low power
• Highly flexible
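One common way to arbitrate without a central arbiter is for every node to apply the same round-robin rule to the same request vector and priority pointer, so all nodes reach the same decision independently. The sketch below shows that general technique; the slide does not describe ClearConnect's actual arbitration scheme, and `wins` and `simulate` are hypothetical names.

```python
# Sketch of distributed round-robin arbitration (generic technique,
# not ClearConnect's protocol). Each node evaluates the same rule on the
# same shared state, so all nodes agree on one winner per round.

def wins(node: int, requests: list[bool], pointer: int) -> bool:
    """Evaluated locally by `node`: is it the first requester at or after pointer?"""
    n = len(requests)
    for offset in range(n):
        candidate = (pointer + offset) % n
        if requests[candidate]:
            return candidate == node
    return False

def simulate(request_rounds, num_nodes):
    pointer = 0
    grants = []
    for requests in request_rounds:
        winners = [i for i in range(num_nodes) if wins(i, requests, pointer)]
        assert len(winners) <= 1  # at most one node drives the bus
        if winners:
            grants.append(winners[0])
            pointer = (winners[0] + 1) % num_nodes  # rotate priority past the winner
        else:
            grants.append(None)
    return grants

rounds = [[True, True, False, True]] * 4
print(simulate(rounds, 4))  # prints [0, 1, 3, 0]
```

Rotating the pointer past the last winner is what gives round-robin its fairness: a node that keeps requesting cannot starve the others.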

Slide 7

Off-the-Shelf Products

CS301 64-PE chip
- 2 W, 25 GFLOPS
- Hardware development support

Fully functional SDK
- Application support
- Software libraries

Dual 64 PCI Development Board
- 50 GFLOPS performance
- Acceleration for clusters and HPC applications
- Development environment for embedded applications
- Growing catalog of software application libraries
- Scalable with robust evolution path
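Assuming the "Dual 64" board carries two CS301 chips of 64 PEs each (our reading of the name, not confirmed on the slide), its quoted 50 GFLOPS follows from doubling the single-chip peak. The power figures cover the two processors only and exclude PCI interface, memory, and other board logic.

```python
# Assumption: the "Dual 64" board carries two CS301 chips (64 PEs each).
CS301_PEAK_GFLOPS = 25.6
CS301_TYPICAL_W, CS301_WORST_W = 2.0, 3.0
NUM_CHIPS = 2

board_pes = NUM_CHIPS * 64
board_peak_gflops = NUM_CHIPS * CS301_PEAK_GFLOPS   # quoted as ~50 GFLOPS
chip_power_typical_w = NUM_CHIPS * CS301_TYPICAL_W  # processors only
chip_power_worst_w = NUM_CHIPS * CS301_WORST_W

print(f"{board_pes} PEs, {board_peak_gflops:.1f} GFLOPS peak, "
      f"{chip_power_typical_w:.0f} to {chip_power_worst_w:.0f} W for the two chips")
```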

Slide 8

Systems Integration Examples

PC plug-in accelerator

Coprocessors in a PC server*

Coprocessors in a blade server*

COTS hardware

Silver Fox**: algorithm development for embedded applications

*Images courtesy of Angstrom Microsystems
**Image courtesy of Office of Naval Research

Slide 9

WorldScape’s Offering

Chip Technology
- 64 PE/256 PE…
- customizable…

Support Tools
- SDK, VSIPL, PCA morphware…

Board-Level Integration
- custom, I/O, i/f, …

Application Integration
- FFT, PC, HSI, SceneServer …