2013/10/24 meeting at Delta R621
Thur., Oct 24, 2013
Pin Yi Tsai
WEEKLY REPORT
OUTLINE
• Current Work
• Compute Integral Image – parallel version
• Why is the difference so small?
• An Accidental Error
• In Process
• Compute 11 types of Features
COMPUTE INTEGRAL IMAGE – PARALLEL VERSION
• Computation and communication time
input 16x16:
serial version: 0.006336 ms
parallel version, for loop outside the kernel function: 6.80778 ms
parallel version, for loop inside the kernel function: 5.88559e-39 ms
COMPUTE INTEGRAL IMAGE (CONT.)
input 640x480:
serial version: 5.1607 ms
parallel version: 4.94058 ms
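For reference, a minimal serial sketch of the two-pass computation being timed above: a running sum along each row, then a running sum down each column. The function name `integralImageSerial` and the in-place, row-major layout are assumptions for illustration. The slides do not show the bodies of `computeByRow` and `computeByColumn`, so this only mirrors their apparent split into a row pass and a column pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the slides): in-place integral image of a
// row-major width x height buffer, computed as a row pass then a column pass.
void integralImageSerial(std::vector<float>& img, int width, int height) {
    assert(width >= 0 && height >= 0);
    assert(img.size() == static_cast<std::size_t>(width) * height);

    // Row pass: running sum along each row.
    for (int y = 0; y < height; ++y)
        for (int x = 1; x < width; ++x)
            img[y * width + x] += img[y * width + x - 1];

    // Column pass: running sum down each column.
    for (int x = 0; x < width; ++x)
        for (int y = 1; y < height; ++y)
            img[y * width + x] += img[(y - 1) * width + x];
}
```

In a parallel version, each row of the first pass and each column of the second pass is independent, so one thread per row and then one thread per column is a natural mapping.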
WHY IS THE DIFFERENCE SO SMALL?
• Profile:
Time : 4.91024 ms
======== Profiling result:
Time(%)  Time      Calls  Avg       Min       Max       Name
71.71    2.75ms    1      2.75ms    2.75ms    2.75ms    computeByColumn(float*, int)
10.91    418.56us  2      209.28us  209.06us  209.50us  [CUDA memcpy HtoD]
10.08    386.46us  2      193.23us  191.10us  195.36us  [CUDA memcpy DtoH]
 7.31    280.22us  1      280.22us  280.22us  280.22us  computeByRow(float*, int, int)
• Memory is accessed non-contiguously
• Memory access is too time-consuming
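One possible reading of the memory-access remark above is that the image is stored row-major, so walking down a column jumps a full row width between consecutive elements. The sketch below only illustrates that index arithmetic. `rowMajorIndex` is a hypothetical helper, and whether this stride is what actually slows `computeByColumn` is an assumption, since the kernel body is not shown.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: linear offset of element (row, col) in a row-major
// buffer that is `width` elements wide.
std::size_t rowMajorIndex(std::size_t row, std::size_t col, std::size_t width) {
    assert(col < width);
    return row * width + col;
}
```

For a 640x480 float image, neighbours along a row are 1 element (4 bytes) apart, while neighbours down a column are 640 elements (2560 bytes) apart.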
AN ACCIDENTAL ERROR
• Occurred when copying data from a Mat to a one-dimensional float array
0 175 175 175 175 175
0 174 174 174 174 174
6.78807e-29 175 175 175 175 175
0 175 175 175 175 175
0 175 175 175 175 175
6.79909e-29 134 151 158 136 142
0 138 132 135 140 135
6.80354e-29 136 136 143 142 137
AN ACCIDENTAL ERROR (CONT.)
• Why?
• memset(ar,0,sizeof(float)*(image1.step+1)*(image1.rows+1));
• The size passed to memset is not correct.
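A hedged sketch of one way to avoid this kind of size mismatch: derive the allocation, the zero-fill, and the indexing from the same element count. It assumes an 8-bit single-channel source and a destination of (rows+1) x (cols+1) floats with a zero first row and first column. That layout and the name `copyToPaddedFloat` are assumptions, not the author's code. In OpenCV, `Mat::step` is the row stride in bytes, so here it is used only to locate each source row, not to size the destination.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy an 8-bit single-channel image into a zero-padded
// float array of (rows + 1) x (cols + 1) elements.
// `step` is the number of bytes between consecutive source rows.
std::vector<float> copyToPaddedFloat(const unsigned char* data,
                                     int rows, int cols, std::size_t step) {
    assert(rows >= 0 && cols >= 0);
    assert(step >= static_cast<std::size_t>(cols));

    const std::size_t outCols = static_cast<std::size_t>(cols) + 1;
    // The vector is sized and zero-filled with one and the same element count.
    std::vector<float> out((static_cast<std::size_t>(rows) + 1) * outCols, 0.0f);

    for (int y = 0; y < rows; ++y) {
        const unsigned char* srcRow = data + y * step;  // step only locates the row
        for (int x = 0; x < cols; ++x)
            out[(y + 1) * outCols + (x + 1)] = static_cast<float>(srcRow[x]);
    }
    return out;
}
```

Because the container owns both its size and its initial value, there is no separate byte count to keep in sync with the allocation.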
The End