Thur., Dec 19, 2013 Pin Yi Tsai WEEKLY REPORT

Page 1: 20131219

Thur., Dec 19, 2013

Pin Yi Tsai

WEEKLY REPORT

Page 2: 20131219

OUTLINE

• Test of shared memory bank conflicts
  • Tesla M2050
  • Read
  • Write

• Reference

Page 3: 20131219

TESLA M2050

• 448 CUDA cores

• Each SM features 32 CUDA cores, so 448 / 32 = 14 SMs

• Number of shared memory banks: 32

• Each bank has a bandwidth of 32 bits every two clock cycles, and successive 32-bit words are assigned to successive banks
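
The bank layout above can be sketched as a small host-side model. This is only an illustration: the helper names `sm_count` and `bank_of_word` are hypothetical, and the model assumes the shared array starts at bank 0.

```cpp
#include <cassert>

// Figures quoted on the slide for the Tesla M2050.
constexpr int kCudaCores = 448;
constexpr int kCoresPerSm = 32;
constexpr int kNumBanks = 32;

// Number of SMs implied by the total and per-SM core counts.
inline int sm_count(int cuda_cores, int cores_per_sm) {
    assert(cores_per_sm > 0);
    return cuda_cores / cores_per_sm;
}

// Bank holding the 32-bit word at position word_index in a shared array.
// Hypothetical helper: successive words go to successive banks and wrap
// around after 32 words. Assumes the array begins at bank 0.
inline int bank_of_word(int word_index) {
    assert(word_index >= 0);
    return word_index % kNumBanks;
}
```

For example, `bank_of_word(33)` lands in bank 1, the same bank as word 1, so two threads of a warp touching words 1 and 33 would compete for that bank.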

Page 4: 20131219

READ 10000 TIMES FROM SHARED MEMORY

• With bank conflict, index [(threadIdx.x)%N+(threadIdx.x%5)*32]
    16-way: 610.16 ms
    8-way: 610.149 ms
    4-way: 609.832 ms
    2-way: 609.294 ms

• Without bank conflict: 603.944 ms
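
The index expression used in this test can be examined on the host with a rough model of one warp. The sketch rests on two assumptions that the slides do not state: that N is 32 divided by the labelled conflict degree (N = 2 for 16-way, N = 16 for 2-way), and that on compute capability 2.x only accesses to different 32-bit words in the same bank serialize, while accesses to the same word do not. The function names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <set>

constexpr int kWarpSize = 32;
constexpr int kNumBanks = 32;

// Shared-memory word index from the slide; tid stands in for threadIdx.x.
inline int slide_index(int tid, int n) {
    return tid % n + (tid % 5) * 32;
}

// Largest number of distinct words that one warp requests from a single bank.
// Under the assumptions in the lead-in this is the modelled conflict degree.
inline int max_distinct_words_per_bank(int n) {
    assert(n >= 1 && n <= kNumBanks);
    std::array<std::set<int>, kNumBanks> words_in_bank;
    for (int tid = 0; tid < kWarpSize; ++tid) {
        const int word = slide_index(tid, n);
        words_in_bank[word % kNumBanks].insert(word);
    }
    std::size_t worst = 0;
    for (const auto& words : words_in_bank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}
```

In this model, `max_distinct_words_per_bank` returns 5, 5, 4 and 2 for N = 2, 4, 8 and 16. Because `threadIdx.x % 5` takes only five values, no bank ever sees more than five distinct words from one warp.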

Page 5: 20131219

WRITE 1000 TIMES TO SHARED MEMORY

• With bank conflict, index [(threadIdx.x)%N+(threadIdx.x%5)*32]
    16-way: 301.033 ms
    8-way: 300.821 ms
    4-way: 285.678 ms
    2-way: 255.378 ms

• Without bank conflict: 243.473 ms
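
To compare the read and write results, each timing can be expressed as a percentage over its no-conflict baseline. The helper below is a hypothetical convenience function, not part of the original test code; it only does arithmetic on the times reported in the slides.

```cpp
#include <cassert>

// Percentage by which a measured time exceeds the no-conflict baseline.
inline double overhead_percent(double measured_ms, double baseline_ms) {
    assert(baseline_ms > 0.0);
    return (measured_ms - baseline_ms) / baseline_ms * 100.0;
}
```

With the reported numbers, `overhead_percent(610.16, 603.944)` is about 1.0% for the 16-way read. `overhead_percent(301.033, 243.473)` is about 23.6% for the 16-way write, and `overhead_percent(255.378, 243.473)` is about 4.9% for the 2-way write.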

Page 7: 20131219

The End