WorldScape Defense Company, L.L.C. Company Proprietary
Slide 1
An Ultra-High Performance Scalable Processing Architecture
for HPC and Embedded Applications
Presentation for
IPDPS Conference
28 April 2004
Slide 2
CS301 Up Close
Multi-Threaded Array Processor
• 25.6 GFLOPS
• 3W worst-case, 2W typical
• 200MHz
• 64 PEs, 4 Kbytes each
[Block diagram labels: PE Array, Control, SRAM, Bus]
ClearConnect bus
• 64-bit full duplex
• 1.6 Gbyte/s each direction
• 2x 0.8-Gbyte/s bridge ports
Scratchpad memory
• 128 Kbytes of SRAM
Availability
• Currently available
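The headline numbers on this slide can be cross-checked with a few lines of arithmetic. The sketch below is plain Python, not CS301 code. Two values in it are assumptions that the slide does not state: 2 floating-point operations per PE per cycle, and a bus clock equal to the 200MHz core clock. They are chosen only because they reproduce the quoted figures.

```python
# Cross-check of the CS301 figures quoted on the slide.
NUM_PES = 64            # from the slide
CLOCK_HZ = 200e6        # from the slide (200MHz)
PE_MEM_KBYTES = 4       # from the slide (4 Kbytes per PE)
BUS_WIDTH_BITS = 64     # from the slide (64-bit full duplex)

# ASSUMPTION: each PE completes 2 floating-point operations per cycle
# (for example one multiply and one add). Not stated on the slide.
FLOPS_PER_PE_PER_CYCLE = 2
# ASSUMPTION: the ClearConnect bus runs at the core clock. Not stated.
BUS_CLOCK_HZ = CLOCK_HZ


def peak_gflops(num_pes, clock_hz, flops_per_cycle):
    """Peak throughput in GFLOPS for an array of identical PEs."""
    return num_pes * clock_hz * flops_per_cycle / 1e9


def bus_gbytes_per_s(width_bits, clock_hz):
    """One-direction bus bandwidth in Gbyte/s, one transfer per cycle."""
    return (width_bits / 8) * clock_hz / 1e9


cs301_gflops = peak_gflops(NUM_PES, CLOCK_HZ, FLOPS_PER_PE_PER_CYCLE)
cs301_bus = bus_gbytes_per_s(BUS_WIDTH_BITS, BUS_CLOCK_HZ)
cs301_pe_mem_kbytes = NUM_PES * PE_MEM_KBYTES

print(f"peak compute:     {cs301_gflops:.1f} GFLOPS")
print(f"bus, each way:    {cs301_bus:.1f} Gbyte/s")
print(f"total PE memory:  {cs301_pe_mem_kbytes} Kbytes")
```

Under these assumptions the 64 PEs hold 256 Kbytes of local memory in total, and the two 0.8-Gbyte/s bridge ports together add up to the 1.6 Gbyte/s quoted for the bus in each direction.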
Slide 3
Multi-Threaded Array Processing Architecture
Multi-threaded Array Processor
• Fully programmable in C
• Hardware multi-threading
• Extensible instruction set
Scalable internal parallelism
• Array of Processing Elements (PEs)
• Compute, bandwidth scale together
• From 10s to 1,000s of PEs
• Built-in PE redundancy
High performance, low power
• ~10 GFLOPS/Watt
Multiple high speed I/O channels
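The ~10 GFLOPS/Watt figure can be related to the CS301 numbers given earlier in the deck (25.6 GFLOPS peak, 2W typical, 3W worst-case). The following is a small illustrative calculation; the function name is made up for this sketch, and it uses peak rather than sustained throughput.

```python
# Power efficiency implied by the CS301 figures quoted in the deck.
PEAK_GFLOPS = 25.6       # CS301 peak, from the deck
POWER_TYPICAL_W = 2.0    # from the deck
POWER_WORST_W = 3.0      # from the deck


def gflops_per_watt(gflops, watts):
    """Peak GFLOPS delivered per watt of chip power."""
    return gflops / watts


eff_typical = gflops_per_watt(PEAK_GFLOPS, POWER_TYPICAL_W)
eff_worst = gflops_per_watt(PEAK_GFLOPS, POWER_WORST_W)

print(f"typical power:    {eff_typical:.1f} GFLOPS/Watt")
print(f"worst-case power: {eff_worst:.1f} GFLOPS/Watt")
```

The quoted ~10 GFLOPS/Watt falls between the worst-case and typical values, so it reads as a rounded midpoint rather than a separately measured number.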
Slide 4
Processing Elements
PEs are highly optimised execution units:
• ALU, MAC, FPU
• High-bandwidth, multiport register file
• High bandwidth per PE DMA (PIO, SIO)
• Closely coupled SRAM for data
64 PEs at 200MHz
• 25.6 GFLOPS
• 51.2 Gbyte/s bandwidth to PE memory
• 12,800 MIPS
Supports multiple data types:
• 8, 16, 24, 32-bit, ... fixed-point arithmetic
• 32-bit IEEE floating-point arithmetic
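The aggregate memory bandwidth and MIPS figures for 64 PEs at 200MHz follow from simple per-PE rates. The sketch below shows one way to arrive at them; the per-PE rates (4 bytes of memory traffic and one instruction per cycle) are assumptions picked to match the slide, not documented CS301 parameters.

```python
# Per-PE rates that reproduce the aggregate figures on the slide.
NUM_PES = 64      # from the slide
CLOCK_MHZ = 200   # from the slide

# ASSUMPTION: each PE moves one 32-bit word to or from its SRAM per cycle.
BYTES_PER_PE_PER_CYCLE = 4
# ASSUMPTION: each PE issues one instruction per cycle.
INSTR_PER_PE_PER_CYCLE = 1

# MHz * bytes gives Mbyte/s; divide by 1000 for Gbyte/s.
pe_mem_gbytes_per_s = NUM_PES * CLOCK_MHZ * BYTES_PER_PE_PER_CYCLE / 1000
mips = NUM_PES * CLOCK_MHZ * INSTR_PER_PE_PER_CYCLE
per_pe_gbytes_per_s = pe_mem_gbytes_per_s / NUM_PES

print(f"PE memory bandwidth: {pe_mem_gbytes_per_s:.1f} Gbyte/s")
print(f"per PE:              {per_pe_gbytes_per_s:.1f} Gbyte/s")
print(f"instruction rate:    {mips} MIPS")
```

Because both totals are products of the PE count, they grow linearly as PEs are added, which is the sense in which compute and bandwidth scale together.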
Slide 5
ClearConnect™ High-Speed Bus
Lanes from 25 to 100 Gbit/s full duplex
• Packet switched architecture
• Scales to 4 lanes per bus
• Lane widths: 32 to 256-bit
• Distributed arbitration
• Low power
• Highly flexible
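As a rough illustration of how the lane figures combine, the sketch below multiplies a lane rate by the number of lanes and converts the CS301 bus figure quoted earlier in the deck (1.6 Gbyte/s each direction) into Gbit/s. The helper names are hypothetical. The slide does not say whether the 25 to 100 Gbit/s range counts one direction or both, so the comparison at the end is one possible reading, not a confirmed one.

```python
# Aggregate ClearConnect bandwidth from the per-lane figures on the slide.
MAX_LANES_PER_BUS = 4   # from the slide
LANE_GBIT_MIN = 25      # from the slide
LANE_GBIT_MAX = 100     # from the slide


def bus_gbit_per_s(lanes, lane_gbit):
    """Aggregate bus rate, assuming lanes add up linearly (an assumption)."""
    if not 1 <= lanes <= MAX_LANES_PER_BUS:
        raise ValueError("a bus has between 1 and 4 lanes")
    return lanes * lane_gbit


def gbyte_to_gbit(gbytes_per_s):
    """Convert Gbyte/s to Gbit/s."""
    return gbytes_per_s * 8


widest_bus = bus_gbit_per_s(MAX_LANES_PER_BUS, LANE_GBIT_MAX)

# CS301 bus: 1.6 Gbyte/s in each direction, per the CS301 slide.
cs301_each_way_gbit = gbyte_to_gbit(1.6)
cs301_both_ways_gbit = 2 * cs301_each_way_gbit

print(f"4 lanes at 100 Gbit/s: {widest_bus} Gbit/s")
print(f"CS301 each direction:  {cs301_each_way_gbit:.1f} Gbit/s")
print(f"CS301 both directions: {cs301_both_ways_gbit:.1f} Gbit/s")
```

Summing both directions of the CS301 bus gives a value close to the 25 Gbit/s low end of the lane range, which suggests, without confirming, that the range counts both directions.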
Slide 7
Off-the-Shelf Products
CS301 64 PE chip
- 2W, 25 GFLOPS
- Hardware Development Support
Fully functional SDK
- Application Support
- Software Libraries
Dual 64 PCI Development Board
- 50 GFLOPS performance
- Acceleration for clusters and HPC applications
- Development environment for embedded applications
- Growing catalog of software application libraries
- Scalable with robust evolution path
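The board's 50 GFLOPS figure is consistent with two CS301 chips running at their peak rate. The sketch below makes that explicit; reading "Dual 64" as two 64-PE CS301 chips is an assumption, and the function name is hypothetical.

```python
# Peak compute of a board built from several 64-PE chips.
GFLOPS_PER_CHIP = 25.6   # CS301 peak, from the CS301 slide
PES_PER_CHIP = 64        # from the CS301 slide


def board_peak(num_chips):
    """Return (total PEs, peak GFLOPS), assuming chips add up linearly."""
    return num_chips * PES_PER_CHIP, num_chips * GFLOPS_PER_CHIP


# ASSUMPTION: "Dual 64" means two 64-PE CS301 chips on one PCI board.
dual_pes, dual_gflops = board_peak(2)

print(f"dual board: {dual_pes} PEs, {dual_gflops:.1f} GFLOPS peak")
```

The computed peak is slightly above the 50 GFLOPS on the slide, in the same way that the chip's 25.6 GFLOPS is rounded down to 25 in the product list.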
Slide 8
Systems Integration Examples
PC plug-in accelerator
Coprocessors in a PC server*
Coprocessors in a blade server*
COTS hardware
Silver Fox**
Algorithm development for embedded applications
*Images courtesy of Angstrom Microsystems
**Image courtesy of Office of Naval Research
Slide 9
WorldScape’s Offering
Chip Technology
- 64 PE/256 PE…
- customizable…
Support Tools
- SDK, VSIPL, PCA morphware…
Board Level Integration
- custom, I/O, i/f, …
Application Integration
- FFT, PC, HSI, SceneServer …