Thur., Dec 19, 2013
Pin Yi Tsai
WEEKLY REPORT
OUTLINE
• Test about Bank Conflict of Shared Memory
  • Tesla M2050
  • Read
  • Write
• Reference
TESLA M2050
• 448 CUDA cores
• Each SM features 32 CUDA cores => 448 / 32 = 14 SMs
• Number of shared memory banks: 32
• Each bank has a bandwidth of 32 bits every two clock cycles, and successive 32-bit words are assigned to successive banks
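The word-to-bank assignment described above can be sketched as a small host-side helper. This is only an illustration under the slide's stated layout (32 banks, 32-bit words); the names `kNumBanks`, `kBankWidthBytes`, `bankOfByteOffset`, and `bankOfWordIndex` are hypothetical and do not come from the CUDA API.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout taken from the slide: 32 banks, each serving one 32-bit word.
constexpr unsigned kNumBanks = 32;
constexpr unsigned kBankWidthBytes = 4;

// Bank that holds the byte at `byteOffset` from the start of shared memory.
unsigned bankOfByteOffset(std::size_t byteOffset) {
    return static_cast<unsigned>((byteOffset / kBankWidthBytes) % kNumBanks);
}

// Bank that holds element `index` of a shared array of 32-bit values.
unsigned bankOfWordIndex(std::size_t index) {
    // This shortcut is only valid when one element is exactly one bank word.
    assert(sizeof(float) == kBankWidthBytes);
    return static_cast<unsigned>(index % kNumBanks);
}
```

Under this mapping, elements 0 and 32 of a float array fall in the same bank, so a stride of 32 words between threads of a warp is the worst case.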
READ 10000 TIMES FROM SHARED MEMORY
• With bank conflict, index [(threadIdx.x)%N+(threadIdx.x%5)*32]
  • 16-way: 610.16 ms
  • 8-way: 610.149 ms
  • 4-way: 609.832 ms
  • 2-way: 609.294 ms
• Without bank conflict 603.944 ms
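To see which banks the index expression [(threadIdx.x)%N+(threadIdx.x%5)*32] touches, a host-side model of one warp can help. The sketch below assumes a 32-thread warp, 32 banks of 32-bit words, and that threads accessing the same word do not conflict (the behavior the CUDA documentation describes for compute capability 2.x). The function name `conflictDegree` is hypothetical, and the slides do not state which N was used for each row, so N is left as a parameter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Largest number of distinct 32-bit words that one warp requests from a
// single bank when thread `tid` accesses index tid % n + (tid % 5) * 32.
// A result of 1 means no bank conflict under the assumptions above.
int conflictDegree(int n) {
    assert(n > 0);
    const int warpSize = 32;
    const int numBanks = 32;
    std::vector<std::set<int>> wordsInBank(numBanks);
    for (int tid = 0; tid < warpSize; ++tid) {
        int idx = tid % n + (tid % 5) * 32;       // index expression from the slide
        wordsInBank[idx % numBanks].insert(idx);  // same word counts only once
    }
    std::size_t worst = 0;
    for (const auto& words : wordsInBank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}
```

In this model the `(threadIdx.x % 5) * 32` term can place at most five distinct words in any one bank, so the modeled degree never exceeds 5 regardless of N. That may be worth keeping in mind when comparing the 8-way and 16-way rows.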
WRITE 1000 TIMES TO SHARED MEMORY
• With bank conflict, index [(threadIdx.x)%N+(threadIdx.x%5)*32]
  • 16-way: 301.033 ms
  • 8-way: 300.821 ms
  • 4-way: 285.678 ms
  • 2-way: 255.378 ms
• Without bank conflict 243.473 ms
REFERENCE
• About Tesla Specification:
http://en.wikipedia.org/wiki/Nvidia_Tesla
• About Fermi Architecture:
http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf
• About Bank:
http://docs.nvidia.com/cuda/pdf/CUDA_C_Best_Practices_Guide.pdf
The End