Cell Broadband Engine Architecture
Bardia Mahjour, ENCM 515
March 2007
Agenda
Introduction
History
Applications
Architecture
Features
Some Statistics
Programming Model
CBEA as DSP
Comparison with TigerSHARC
Conclusion
Introduction
Single-chip multiprocessor: 9 processors built into a single die
Driven by needs that arose in areas such as: cryptography, graphics transformations and lighting, physics, Fast Fourier Transforms (FFT), matrix operations, and compute-intensive scientific tasks
Goals: power-efficient, cost-effective, high-performance processing for a wide range of applications, including game consoles
Compilers: IBM XL family (XL C/C++)
Cell die photo courtesy of Thomas Way, IBM Burlington
History
A joint venture by Sony, Toshiba, and IBM (STI)
Official design phase started in March 2001
The three companies spent 4 years and US$400M designing and developing Cell
First commercial use: Sony’s PlayStation 3, November 2006
Applications
Console video games: PlayStation 3
Home cinema: Toshiba’s HDTV
Embedded applications: medical imaging, aerospace, telecommunications, defense, etc. (Mercury Computer Systems, Inc.)
Supercomputing: Roadrunner
Blade servers
Architecture
PowerPC Processor Element (PPE) - 64-bit PowerPC RISC core (can run an OS)
Synergistic Processor Elements (SPEs) - Each element is a DSP. CBEA has 8 of them!
Element Interconnect Bus (EIB)
Memory Interface Controller (MIC)
Cell Broadband Engine Interface (BEI)
Features
PPE has a pipeline 10 stages deep
Each SPE has: a 128 x 128-bit register file (128 registers, each 128 bits wide), a floating-point unit, two fixed-point units, a VMX vector arithmetic unit, a Local Store, and a DMA controller
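To make the 128-bit register width concrete, here is a minimal portable C sketch that models one register as four single-precision lanes and applies a multiply-add across all of them. The names `vec4f`, `vec4f_madd`, and `register_file_bytes` are hypothetical and exist only for illustration; real SPE code would use the compiler's vector types and intrinsics rather than a struct and a loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one 128-bit register: four 32-bit float lanes. */
typedef struct { float lane[4]; } vec4f;

/* Lane-wise multiply-add: r.lane[i] = a.lane[i] * b.lane[i] + c.lane[i]. */
static vec4f vec4f_madd(vec4f a, vec4f b, vec4f c) {
    vec4f r;
    for (size_t i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] * b.lane[i] + c.lane[i];
    return r;
}

/* Size of the whole register file in bytes: 128 registers x 128 bits each. */
static size_t register_file_bytes(void) {
    return 128u * (128u / 8u);
}
```

Under this model the register file holds 2048 bytes (2 KB), and a single `vec4f_madd` call stands for what one vector instruction would do to all four lanes at once.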
Some Statistics
Observed clock speed: > 4 GHz
Peak performance (single precision): > 256 GFLOPS
Peak performance (double precision): > 26 GFLOPS
Local store size per SPU: 256 KB
Area: 221 mm²
Technology: 90 nm SOI
Total number of transistors: 234M
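As a rough cross-check of the single-precision figure, the sketch below multiplies out a simple operation count. It assumes that each of the 8 SPEs completes one 4-lane multiply-add per cycle, counted as 2 floating-point operations per lane. That per-cycle model is an assumption, not something stated on the slide, and the function names are hypothetical.

```c
#include <assert.h>

/* Peak rate in GFLOPS = units x lanes x flops per lane per cycle x clock (GHz).
 * All four inputs are assumptions supplied by the caller. */
static double peak_gflops(int units, int lanes, int flops_per_lane, double clock_ghz) {
    return (double)(units * lanes * flops_per_lane) * clock_ghz;
}

/* Total Local Store across all SPEs, in KB. */
static int total_local_store_kb(int num_spes, int kb_per_spe) {
    return num_spes * kb_per_spe;
}
```

With these assumptions, `peak_gflops(8, 4, 2, 4.0)` evaluates to 256.0, which lines up with the quoted figure at a 4 GHz clock, and `total_local_store_kb(8, 256)` gives 2048 KB of Local Store across the chip.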
Programming Model
Function Offload Model
Device Extension Model
Computational Acceleration Model
Streaming Models
Shared-memory Multi-processor Model
Asymmetric Thread Runtime Model
User-Mode Thread Model
SPE Overlay
Function Offload Model
Remote Procedure Call (RPC)
/* file hello.idl */
interface greeting {
    [sync] idl_id_t hello([in] int nbytes,
                          [in, size_is(nbytes)] char message[]);
}
/* file hello.c (runs on the PPE) */
#include <string.h>   /* strlen */
#include <stub.h>     /* declares the hello() stub */
int main(void) {
    char *str = "Hi, from the Cell!";
    /* Send the terminating NUL too, so the SPE side can print the string with %s */
    hello(strlen(str) + 1, str);
    return 0;
}
/* file spu_hello.c (runs on an SPE) */
#include <stdio.h>
#include <stub.h>
idl_id_t hello(int nbytes, char msg[]) {
    printf("SPE: %s\n", msg);
    return 0;
}
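The hello() example hides the work a stub does on the caller's behalf. The portable sketch below imitates that work in plain C: the caller-side function packs the arguments into a message, and the callee-side function unpacks and uses them. Everything here (`hello_request`, `hello_stub`, `hello_impl`, `MSG_MAX`) is hypothetical and is not the code an IDL compiler would generate; `memcpy` merely stands in for the transfer into an SPE's Local Store.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MSG_MAX 64  /* hypothetical upper bound on the payload size */

/* Hypothetical wire format for one hello() call: byte count, then payload. */
typedef struct {
    int  nbytes;
    char payload[MSG_MAX];
} hello_request;

/* Callee side ("SPE"): formats the greeting into out instead of printing it. */
static int hello_impl(const hello_request *req, char *out, size_t out_size) {
    return snprintf(out, out_size, "SPE: %.*s", req->nbytes, req->payload);
}

/* Caller side ("PPE") stub: marshal the arguments, then invoke the callee.
 * Returns -1 if the message does not fit in the request buffer. */
static int hello_stub(const char *str, char *out, size_t out_size) {
    hello_request req;
    size_t n = strlen(str);
    if (n > MSG_MAX)
        return -1;
    req.nbytes = (int)n;
    memcpy(req.payload, str, n);   /* stands in for the data transfer */
    return hello_impl(&req, out, out_size);
}
```

Calling `hello_stub("Hi, from the Cell!", out, sizeof out)` leaves `"SPE: Hi, from the Cell!"` in `out`. The point of the sketch is the division of labor: the caller sees an ordinary function call, while the stub handles packing and delivery.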
Thread Runtime Model
speid_t spe_create_thread(spe_gid_t gid, spe_program_handle_t *spe_program_handle,
                          void *argp, void *envp, unsigned long mask, int flags);
Example PPE Code:
#include <libspe.h>
#define NUM_SPES 8
extern spe_program_handle_t spe_code;   /* handle to the SPE program */
int main(void) {
    speid_t spe_ids[NUM_SPES];
    spe_gid_t gid = 0;   /* thread group; undeclared on the slide, 0 is an assumed value */
    int i;
    /* Create one SPE thread per SPE; a mask of -1 places no affinity restriction */
    for (i = 0; i < NUM_SPES; i++)
        spe_ids[i] = spe_create_thread(gid, &spe_code, NULL, NULL, -1, 0);
    return 0;
}
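The PPE example creates the threads but never waits for them. The create-then-wait pattern can be tried on any POSIX system with the sketch below, which swaps libspe for pthreads: it is an analogy for the control flow, not Cell code. The names `worker_arg`, `worker_main`, and `run_workers` are hypothetical, and the per-thread work (squaring an index) is a placeholder for a real SPE kernel.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_WORKERS 8  /* mirrors NUM_SPES in the PPE example */

/* Per-worker argument block, playing the role of the argp pointer. */
typedef struct { int id; int result; } worker_arg;

/* Placeholder kernel: each worker squares its own index. */
static void *worker_main(void *p) {
    worker_arg *arg = p;
    arg->result = arg->id * arg->id;
    return NULL;
}

/* Start all workers, wait for every one that started, and sum the results.
 * Returns -1 if any thread could not be created. */
static int run_workers(void) {
    pthread_t tids[NUM_WORKERS];
    worker_arg args[NUM_WORKERS];
    int started = 0, sum = 0, failed = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        args[i].id = i;
        args[i].result = 0;
        if (pthread_create(&tids[i], NULL, worker_main, &args[i]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    /* Join before reading results so every worker has finished writing. */
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        sum += args[i].result;
    }
    return failed ? -1 : sum;
}
```

The join loop is the part the slide's example leaves out: without it, `main` can return while workers are still running. Depending on the toolchain, building this may require the `-pthread` compiler flag.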
CBEA as DSP
Strictly speaking, Cell is a microprocessor
Designed to bridge the gap between conventional and special-purpose processors
Handles heavy digital signal processing workloads (3D graphics, 48 MPEG-2 channels, etc.)
Meets most of the requirements of an ideal DSP processor
Comparison with TigerSHARC
Size requirement
Power consumption and heat generation
Supports floating-point ops in hardware
Bandwidth and data width
Avoids resource dependencies
Scalability
Ease of programming
Conclusion
Cell Broadband Engine Architecture is an extremely powerful, scalable, and fast processor. It is not purely a digital signal processor; however, the wide range of applications it is suited for includes DSP. Furthermore, many of the requirements of DSP applications were the rationale behind CBEA’s design and architectural decisions.