Programming the Cell processor. S. E. Kireev. Summer School on Parallel Programming, Novosibirsk, August 28, 2009. Comparison of modern processors. Implementations of Cell-based systems. Roadrunner: the most powerful supercomputer in the world, built on Opteron and Cell processors.
Programming the Cell Processor
S. E. Kireev
Summer School on Parallel Programming, Novosibirsk, August 28, 2009
Implementations of Cell-based systems
Roadrunner: supercomputer based on Opteron and Cell processors
IBM
IBM BladeCenter QS21: 2 Cell B.E., 3.2 GHz
IBM BladeCenter QS22: 2 PowerXCell 8i, 3.2 GHz
2 PowerXCell 8i, 4.0 GHz
Mercury Computer Systems
Mercury Dual Cell Based System 2: 2 Cell B.E., 3.2 GHz
Mercury Dual Cell Based Blade 2: 2 Cell B.E., 3.2 GHz
Mercury PCI Express Cell Accelerator Board 2: 1 Cell B.E., 2.8 GHz
Fixstars
Fixstars GigaAccel 180 Accelerator Board: 1 PowerXCell 8i, 2.8 GHz
T-Platforms
PeakCell S: 2 PowerXCell 8i, 3.2 GHz
PeakCell W: 2 PowerXCell 8i, 3.2 GHz
PeakCell YPS: 4 PowerXCell 8i, 3.2 GHz
Sony
Sony PlayStation 3: 1 Cell B.E., 3.2 GHz (1 PPE, 6 SPEs), 256 MB of memory
[Figure: comparison of processor architectures; surviving labels: L2 caches, FSB, cores 0 and 1]
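Cell architecture
The Cell processor consists of:
Power Processor Element (PPE)
Synergistic Processor Element (SPE)
Element Interconnect Bus (EIB)
A Cell processor contains:
1 PPE (PowerPC, 2 hardware threads)
8 SPEs, each with a 256 KB local store and 128 registers of 128 bits
the EIB, which connects them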
Parallelism in Cell
Thread level: 2 PPE hardware threads and 8 SPEs
Data level (vectorization): SIMD instructions on the SPE, VMX on the PPE
Memory access
PPE: main memory (read / write)
SPE: main memory and the local stores of other SPEs (put / get)
SPE: its own local store (read / write)
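Overlapping data transfer with computation on the SPE
Data moves between main memory and the SPE local store (LS) by DMA, which runs asynchronously to the computation on the SPE, so transfers can be hidden behind computation.
Sequential schedule: Load 1, Count 1, Store 1, Load 2, Count 2, Store 2
Overlapped schedule: while the SPE executes Count k, the DMA engine performs Load k+1 and Store k-1
Load: transfer data into the SPE. Count: compute on the SPE. Store: transfer results out of the SPE.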
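The Load / Count / Store rotation can be sketched in portable C. This is only an illustration under stated assumptions: `CHUNK`, `load_block`, `store_block`, `count_block` and `process_double_buffered` are hypothetical names, and `memcpy` stands in for the asynchronous DMA get/put that a real SPE program would issue. Because `memcpy` is synchronous, the sketch shows the buffer rotation, not the actual overlap in time.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 4  /* elements per block; hypothetical value for illustration */

/* Stand-in for a DMA get: copy block number `block` into a local buffer. */
static void load_block(float *ls_buf, const float *main_mem, size_t block) {
    memcpy(ls_buf, main_mem + block * CHUNK, CHUNK * sizeof(float));
}

/* Stand-in for a DMA put: copy a local buffer back to block number `block`. */
static void store_block(float *main_mem, const float *ls_buf, size_t block) {
    memcpy(main_mem + block * CHUNK, ls_buf, CHUNK * sizeof(float));
}

/* The "Count" step: here it simply doubles every element of the block. */
static void count_block(float *ls_buf) {
    for (size_t i = 0; i < CHUNK; i++) ls_buf[i] *= 2.0f;
}

/* Process nblocks blocks using two local buffers that alternate roles. */
void process_double_buffered(const float *in, float *out, size_t nblocks) {
    float buf[2][CHUNK];
    if (nblocks == 0) return;
    load_block(buf[0], in, 0);                 /* Load 1 */
    for (size_t k = 0; k < nblocks; k++) {
        size_t cur = k & 1, next = cur ^ 1;
        if (k + 1 < nblocks)
            load_block(buf[next], in, k + 1);  /* Load k+1: with real DMA this runs during Count k */
        count_block(buf[cur]);                 /* Count k */
        store_block(out, buf[cur], k);         /* Store k */
    }
}
```

With real asynchronous DMA, the program would additionally have to wait for the pending transfers of a buffer before reusing it.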
Programming Cell compared with familiar parallel systems
SMP system: shared memory + threads (OpenMP, PThreads, ...). Example: a 4-core SMP node with cores 0, 1, 2, 3.
Cluster: distributed memory + MPI.
Programming models on Cell
Function offload model: the PPE program calls a function whose body executes on an SPE.
PPE: { matrix a, b, c; multiply(a, b, c); }
SPE: multiply() { }, mul() { }
Task distribution: the PPE main() { AddTask(); } creates tasks, and each SPE runs work() { } on the tasks it receives.
Pipeline: the SPEs execute step1() { }, step2() { }, step3() { }, step4() { } one after another on the stream of data; the PPE main() { make_input(); get_output(); } supplies the input data and collects the output data.
[Figure: one PPE and eight SPEs]
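libspe2
LibSPE is the library for managing SPEs on Cell. It covers: launching SPE programs from a PPE application, communication between PPE threads and SPE threads, and calls from SPE code to callback functions on the PPE.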
libspe2: Hello, World!
PPE program (ppu_prog.c)
#include <libspe2.h>

extern spe_program_handle_t spu_hello;   /* SPE program embedded at link time */

int main ()
{
    unsigned int entry = SPE_DEFAULT_ENTRY;
    spe_context_ptr_t spe;

    spe = spe_context_create (0, NULL);      /* create an SPE context */
    spe_program_load (spe, &spu_hello);      /* load the SPE program into it */
    spe_context_run (spe, &entry, 0, (void *) 10, (void *) 20, NULL);  /* run; 10 and 20 arrive as argp and envp */
    spe_context_destroy (spe);

    return 0;
}

SPE program (spu_prog.c)
#include <stdio.h>

int main (unsigned long long spe, unsigned long long argp, unsigned long long envp)
{
    printf("Hello, World! (%llu,%llu)\n", argp, envp);
    return 0;
}
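libspe2: Hello, World! Building the program for Cell:
spu-gcc -o spu_prog spu_prog.c
    compiles spu_prog.c into the SPE executable spu_prog
ppu-embedspu spu_hello spu_prog spu_prog.o
    wraps the SPE executable into the PPE object file spu_prog.o under the symbol spu_hello
ppu-gcc -o prog ppu_prog.c spu_prog.o -lspe2
    compiles ppu_prog.c and links it with spu_prog.o into the combined PPE+SPE executable prog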
libspe2: Hello, World! Parallel program for Cell: the PPE creates several threads, and each thread runs one SPE program. The PPE main() starts the thread_func() threads with pthread_create(); each thread_func() runs the main() of the SPE program on its own SPE.
libspe2: Hello, World! Parallel PPE program
#include <libspe2.h>
#include <pthread.h>

#define NTHREADS 40

extern spe_program_handle_t spu_hello;

/* Each PPE thread creates one SPE context and runs the SPE program in it. */
void *thread_func (void *data)
{
    unsigned int entry = SPE_DEFAULT_ENTRY;
    spe_context_ptr_t spe;
    spe = spe_context_create (0, NULL);
    spe_program_load (spe, &spu_hello);
    spe_context_run (spe, &entry, 0, (void *)data, (void *)NTHREADS, NULL);
    spe_context_destroy (spe);
    return 0;
}

int main ()
{
    pthread_t tid [NTHREADS];
    unsigned long i;
    for (i=0;i
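Data exchange between the PPE and the SPEs
DMA transfers, up to 16 KB each:
Get: into the SPE local store
Put: out of the SPE local store
Mailboxes, queues of 32-bit messages:
SPE in (4 entries)
SPE out (1 entry)
SPE out interrupt (1 entry)
Signals: 32-bit registers on the SPE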
[Figure: DMA transfers between the PPE and the SPEs]
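DMA transfers: data must be aligned on a 16-byte boundary. The transfer size must be 1, 2, 4, 8 or 16 bytes, or a multiple of 16 bytes (16*N). Every DMA command carries a 5-bit tag that assigns it to a DMA tag group, and the program can wait for completion of all commands in a group.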
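The size and alignment rules above can be expressed as small checks. The helper names `dma_size_ok`, `dma_addr_aligned16` and `dma_transfer_count` are hypothetical and not part of any Cell SDK; this is a minimal sketch of the rules as stated in the text, including the 16 KB limit per transfer.

```c
#include <stddef.h>
#include <stdint.h>

#define DMA_MAX_SIZE 16384u  /* 16 KB limit for a single transfer */

/* Returns 1 if `size` is allowed for one transfer: 1, 2, 4, 8 or 16 bytes,
   or a multiple of 16 bytes, up to 16 KB. Zero-length requests are rejected
   here, which is a choice of this sketch. */
int dma_size_ok(size_t size) {
    if (size == 0 || size > DMA_MAX_SIZE) return 0;
    if (size == 1 || size == 2 || size == 4 || size == 8) return 1;
    return size % 16 == 0;
}

/* Returns 1 if the address lies on a 16-byte boundary. */
int dma_addr_aligned16(uintptr_t addr) {
    return (addr & 15u) == 0;
}

/* Number of transfers needed to move `total` bytes in pieces of at most 16 KB. */
size_t dma_transfer_count(size_t total) {
    return (total + DMA_MAX_SIZE - 1) / DMA_MAX_SIZE;
}
```

A larger buffer is then moved as a sequence of such transfers, typically all issued under the same tag so that one wait covers the whole group.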
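DMA commands may complete in an order different from the one in which they were issued. Ordered variants of get and put:
Barrier: getb, putb
Fence: getf, putf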
libspe 2.0: DMA transfers on the SPE side
// GET: copy from the PPE (main memory) to the SPE (local store)
void get (void *dest_lsa, unsigned long long sour_ea, unsigned long size)
{
    int tag=mfc_tag_reserve(), mask=1
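The listing above is cut off in the source. One possible shape of a blocking GET is sketched below; it is not the author's code. The function name `get_blocking` is hypothetical. The SPU branch assumes the `spu_mfcio.h` calls `mfc_tag_reserve`, `mfc_get`, `mfc_write_tag_mask`, `mfc_read_tag_status_all` and `mfc_tag_release`; the `#else` branch is a host-side stand-in using `memcpy`, added only so the sketch can be compiled and tried outside a Cell toolchain.

```c
#include <stdint.h>
#include <string.h>

#ifdef __SPU__
#include <spu_mfcio.h>
#endif

/* Copy `size` bytes from effective address sour_ea (main memory) to dest_lsa
   (local store) and wait until the transfer has finished.
   Returns 0 on success, -1 if no DMA tag could be reserved. */
int get_blocking(void *dest_lsa, unsigned long long sour_ea, unsigned long size)
{
#ifdef __SPU__
    unsigned int tag = mfc_tag_reserve();          /* obtain a free tag (0..31) */
    if (tag == MFC_TAG_INVALID) return -1;
    mfc_get(dest_lsa, sour_ea, size, tag, 0, 0);   /* enqueue the DMA get */
    mfc_write_tag_mask(1u << tag);                 /* select this tag group */
    mfc_read_tag_status_all();                     /* block until the group completes */
    mfc_tag_release(tag);
    return 0;
#else
    /* Host stand-in (assumption of this sketch): a plain synchronous copy. */
    memcpy(dest_lsa, (const void *)(uintptr_t)sour_ea, size);
    return 0;
#endif
}
```

A PUT would follow the same pattern with `mfc_put` and the roles of the two addresses exchanged.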
Mailbox: passing 4-byte messages
in mailbox (PPE to SPE)
    PPE side: spe_in_mbox_write(), spe_in_mbox_status()
    SPE side: spu_read_in_mbox(), spu_stat_in_mbox()
out mailbox (SPE to PPE)
    PPE side: spe_out_mbox_read(), spe_out_mbox_status()
    SPE side: spu_write_out_mbox(), spu_stat_out_mbox()
Signal: passing 4-byte values
sig1
    PPE side: spe_signal_write()
    SPE side: spu_read_signal1(), spu_stat_signal1()
sig2
    PPE side: spe_signal_write()
    SPE side: spu_read_signal2(), spu_stat_signal2()
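The behaviour of the 4-entry inbound mailbox can be modelled in portable C as a small FIFO. This is a toy model, not the hardware interface: `mbox_model` and the `mbox_*` functions are hypothetical names. `mbox_free` plays roughly the role of `spe_in_mbox_status()` on the PPE side and `mbox_pending` that of `spu_stat_in_mbox()` on the SPE side. Unlike the real `spu_read_in_mbox()`, which blocks on an empty mailbox, the model returns 0.

```c
#include <stdint.h>

#define IN_MBOX_DEPTH 4  /* the SPE inbound mailbox holds 4 messages */

typedef struct {
    uint32_t slot[IN_MBOX_DEPTH];
    unsigned head;   /* index of the oldest message */
    unsigned count;  /* number of messages currently queued */
} mbox_model;

void mbox_init(mbox_model *m) { m->head = 0; m->count = 0; }

/* How many more messages the writer can post without overflowing. */
unsigned mbox_free(const mbox_model *m) { return IN_MBOX_DEPTH - m->count; }

/* How many messages are waiting for the reader. */
unsigned mbox_pending(const mbox_model *m) { return m->count; }

/* Writer side: returns 1 if the message was queued, 0 if the mailbox is full. */
int mbox_write(mbox_model *m, uint32_t msg) {
    if (m->count == IN_MBOX_DEPTH) return 0;
    m->slot[(m->head + m->count) % IN_MBOX_DEPTH] = msg;
    m->count++;
    return 1;
}

/* Reader side: returns 1 and stores the oldest message, or 0 if empty. */
int mbox_read(mbox_model *m, uint32_t *msg) {
    if (m->count == 0) return 0;
    *msg = m->slot[m->head];
    m->head = (m->head + 1) % IN_MBOX_DEPTH;
    m->count--;
    return 1;
}
```

The model makes the practical consequence visible: a writer that does not check the status first can post at most four messages before the reader has to drain the queue.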
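libspe 2.0: Ping-pong
PPE side:
while ( spe_out_mbox_status(spe) == 0 );             // wait until the SPE has posted a message
spe_out_mbox_read(spe, &data, 1);                    // read one message from the out mailbox
data++;                                              // change the data
spe_signal_write(spe, SPE_SIG_NOTIFY_REG_1, data);   // send it back through signal register 1
SPE side:
spu_write_out_mbox(data);                            // send the data to the PPE through the out mailbox
data = spu_read_signal1();                           // wait for the reply in signal register 1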
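Direct access to the local store of an SPE: the local store of an SPE can be mapped into the effective address space, so the PPE or another SPE can reach it. spe_ls_area_get() returns its address.
Direct access to the mailbox and signal registers of an SPE: spe_ps_area_get() returns the address of an SPE's problem-state area, which another SPE can then read or write with put / get.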
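SIMD programming on the SPE
Features of the SPE:
local store: 256 KB
registers: 16 bytes wide
memory access: in units of 16 bytes, aligned on a 16-byte boundary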
Data alignment:
unsigned char buffer[1024] __attribute__ ((aligned(16)));
Vector data types:
vector float vf1 = { 1.0, 2.0, 3.0, 4.0 };
vector float vf2[2] = {{1.0, 2.0, 3.0, 4.0}, {5.0, 6.0, 7.0, 8.0}};
vector float vf3[2] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0};
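The effect of the `aligned(16)` attribute can be checked at run time. The helpers `is_aligned16` and `round_up16` below are hypothetical names; the sketch assumes a GCC-compatible compiler, since `__attribute__ ((aligned(16)))` is a compiler extension.

```c
#include <stdint.h>

/* Same declaration as in the text: a buffer placed on a 16-byte boundary. */
unsigned char buffer[1024] __attribute__ ((aligned(16)));

/* Returns 1 if the pointer lies on a 16-byte boundary. */
int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* Rounds a byte count up to the next multiple of 16, e.g. to pad a buffer
   so that its size is suitable for a 16*N byte transfer. */
unsigned long round_up16(unsigned long n) {
    return (n + 15ul) & ~15ul;
}
```

For example, `is_aligned16(buffer)` holds for the declaration above, while `buffer + 4` is not on a 16-byte boundary.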
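Intrinsics come in three kinds:
specific, which correspond to a single instruction, for example: d = si_to_int(a);
generic, which are overloaded over operand types, for example: c = spu_add (a, b);
composite (built-ins), which expand into several instructions, for example the DMA commands.
Arithmetic operations:
d = spu_add(a, b);
d = spu_sub(a, b);
d = spu_madd(a, b, c);
d = spu_mul(a, b);
Logical operations:
d = spu_and(a, b);
d = spu_or(a, b);
d = spu_eqv(a, b);
Mathematical functions: the simdmath library.
Scalar and vector operations:
d = spu_insert(s, v, n);
d = spu_splats(s);
d = spu_promote(s, n);
s = spu_extract(v, n);
Type conversions:
d = spu_convtf(a, scale);
d = spu_convts(a, scale);
d = spu_extend(a);
Shifts and rotations:
d = spu_rl(a, count);
d = spu_sl(a, count);
Selection and shuffling:
d = spu_sel(a, b, pattern);
d = spu_shuffle(a, b, pattern);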
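To make two of these operations concrete, the following portable sketch models what spu_sel and spu_rl do to a single 32-bit element. The names `sel_lane` and `rl_lane` are hypothetical; on the SPE the intrinsics apply the same operation to every element of a 128-bit vector at once.

```c
#include <stdint.h>

/* Model of spu_sel on one 32-bit element: each result bit is taken from b
   where the pattern bit is 1 and from a where it is 0. */
uint32_t sel_lane(uint32_t a, uint32_t b, uint32_t pattern) {
    return (a & ~pattern) | (b & pattern);
}

/* Model of spu_rl on one 32-bit element: rotate left by count bits. */
uint32_t rl_lane(uint32_t a, unsigned count) {
    count &= 31u;                 /* rotation amount is taken modulo 32 */
    if (count == 0) return a;     /* avoid an undefined shift by 32 */
    return (a << count) | (a >> (32u - count));
}
```

Selection by a bit mask is the usual replacement for branches in SIMD code: both alternatives are computed, and a comparison result serves as the pattern.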
Example:
void mulv (float *a, float *b, float *c, int n)
{
    int i, j, k;
    vector float *bv = (vector float *) b;
    vector float *cv = (vector float *) c;
    vector float s, t;

    s = spu_splats(0.0);
    for (i=0; i
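The listing is cut off in the source, so the intended loop body is unknown. As an illustration of the same vectorization pattern (broadcast a scalar with splats, then accumulate with a fused multiply-add), here is a portable sketch that multiplies two row-major n x n matrices, with n assumed to be a multiple of 4. Everything in it is an assumption: `vec4` emulates `vector float`, `v_splats` and `v_madd` emulate spu_splats and spu_madd, and `mulv_sketch` is a hypothetical name, not the author's function.

```c
#include <stddef.h>

/* Emulation of a 4-element "vector float". */
typedef struct { float e[4]; } vec4;

/* Emulates spu_splats: replicate a scalar into all four elements. */
static vec4 v_splats(float s) {
    vec4 r = {{ s, s, s, s }};
    return r;
}

/* Emulates spu_madd: element-wise a * b + c. */
static vec4 v_madd(vec4 a, vec4 b, vec4 c) {
    vec4 r;
    for (int l = 0; l < 4; l++) r.e[l] = a.e[l] * b.e[l] + c.e[l];
    return r;
}

/* c = a * b for row-major n x n matrices. a is read element by element,
   b and c are processed four columns at a time. n must be a multiple of 4. */
void mulv_sketch(const float *a, const vec4 *bv, vec4 *cv, int n) {
    int nv = n / 4;                              /* vectors per matrix row */
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < nv; j++) {
            vec4 s = v_splats(0.0f);             /* accumulator for c[i][4j..4j+3] */
            for (int k = 0; k < n; k++) {
                vec4 t = v_splats(a[i * n + k]); /* broadcast a[i][k] */
                s = v_madd(t, bv[k * nv + j], s);
            }
            cv[i * nv + j] = s;
        }
    }
}
```

On the SPE the inner statement would be a single spu_madd on real vector registers, so each iteration produces four products and four additions.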
Development tools for Cell
Tools from IBM (IBM Cell SDK):
libspe 2.0
Libraries: SIMD Math Library, MASS Library, FFT, Game math, Image Processing, Matrix, Vector, Multi-precision math, BLAS, LAPACK, Monte-Carlo
Sync Library
Software managed cache
Compilers (OpenMP): xlc, xlf
DaCS: Data Communication and Synchronization library
ALF: Accelerated Library Framework
Tools from other vendors:
BSC: Cell Superscalar
Mercury Computer Systems: MultiCore Framework
RapidMind
Gedae
Resources:
IBM Cell Broadband Engine resource center, http://www.ibm.com/developerworks/power/cell/documents.html
Cell Developer's Corner, http://www.power.org/resources/devcorner/cellcorner
STI Center of Competence for the Cell Broadband Engine Processor, http://sti.cc.gatech.edu
Barcelona Supercomputing Center: Cell Superscalar, www.bsc.es/cellsuperscalar
RapidMind Development Platform, www.rapidmind.net
Mercury Computer Systems: MultiCore Framework, http://www.mc.com/software/multicore_framework.aspx