Real-time Signal Processing on Embedded Systems
Advanced Cutting-edge Research Seminar I&III
Practical Applications
  Pedestrian detection: FPGA-based system
  Pedestrian tracking: GPU-based system
Hardware Architecture for High-Accuracy Real-Time Pedestrian Detection with CoHOG Features
Outline
  Introduction
  Pedestrian detection using CoHOG features
  Proposed hardware architecture
    Parallel execution
    Merging histogram calculation and SVM prediction
  FPGA implementation
  Conclusion
Pedestrian detection on automotive systems
Challenges:
  Various appearances of pedestrians: clothing shape and color, pose, etc.
    Template-based or simple gradient-based methods do not achieve high-accuracy recognition.
  Viewpoint movement: all objects in the image are moving.
    Background subtraction or frame subtraction cannot be used.
A robust recognition method suitable for pedestrians is required.
Pedestrian detection algorithms
Recent trend: combination of gradients and histograms
  Gradient: robust to illumination and color changes
  Histogram: robust to deformation
Examples
  Histograms of oriented gradients (HOG)
  Co-occurrence histograms of oriented gradients (CoHOG)*
    HOG-based method using pairs of oriented gradients
    One of today's best algorithms for pedestrian detection
    However, real-time execution is difficult to achieve in software (e.g., a few seconds are required to process a 320x240 image)
Specialized hardware is needed for real-time processing.
* T. Watanabe, S. Ito, and K. Yokoi, "Co-occurrence histograms of oriented gradients for pedestrian detection," PSIVT 2009
Pedestrian detection using CoHOG
Calculate gradient orientations
Divide the image into small regions (blocks)
Pick up pairs of pixels
Calculate co-occurrence histograms of the orientation pairs
Repeat for various relative positions of the pixel pair (called offsets; 31 offsets are used)
The histograms form the CoHOG feature vector, which is classified by an SVM
[Figure: gradient orientation image, pixel pairs for offset 1 and offset 2, and the resulting co-occurrence histograms of oriented gradients]
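The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation from the talk: the function names `cooccurrence_histogram` and `cohog_feature`, the block tuple layout, and the tiny 4x4 test image are hypothetical, and the gradient orientations are assumed to be already quantized into 8 directions (so one histogram has 8x8 = 64 bins).

```python
def cooccurrence_histogram(orient, block, offset, n_bins=8):
    """Count orientation pairs (i, j) for one block and one offset.

    orient: 2-D list of quantized gradient orientations in range(n_bins)
    block:  (top, left, height, width) of the block (hypothetical layout)
    offset: (dy, dx) displacement from a pixel to its partner pixel
    Returns an n_bins x n_bins matrix of counts.
    """
    top, left, height, width = block
    dy, dx = offset
    rows, cols = len(orient), len(orient[0])
    hist = [[0] * n_bins for _ in range(n_bins)]
    for y in range(top, top + height):
        for x in range(left, left + width):
            py, px = y + dy, x + dx
            if 0 <= py < rows and 0 <= px < cols:  # partner must lie inside the image
                hist[orient[y][x]][orient[py][px]] += 1
    return hist


def cohog_feature(orient, blocks, offsets, n_bins=8):
    """Concatenate the histograms of every (block, offset) pair into one vector."""
    feature = []
    for block in blocks:
        for offset in offsets:
            hist = cooccurrence_histogram(orient, block, offset, n_bins)
            feature.extend(v for row in hist for v in row)
    return feature


# Tiny example: 4x4 orientation image, one block, two offsets.
orient = [
    [0, 1, 2, 3],
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [4, 5, 6, 7],
]
feature = cohog_feature(orient, blocks=[(0, 0, 4, 4)], offsets=[(0, 1), (1, 0)])
print(len(feature))  # 2 histograms x 64 bins = 128
```

With 72 blocks and 31 offsets instead of 1 block and 2 offsets, the same loop structure yields a vector of 64 × 31 × 72 elements.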
Detection procedure
Sliding window approach
  Feature vectors are extracted in scan-line order.
  The image size or the window size is scaled to detect pedestrians at other scales.
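A minimal sketch of the sliding-window scan, assuming the window size is scaled rather than the image. The window size (60x120), the step (20 pixels), and the scale list are hypothetical values chosen for illustration; the slides do not state them.

```python
def sliding_windows(img_w, img_h, win_w, win_h, step):
    """Yield the top-left corner (x, y) of every sub-window in scan-line order."""
    for y in range(0, img_h - win_h + 1, step):
        for x in range(0, img_w - win_w + 1, step):
            yield x, y


def multiscale_windows(img_w, img_h, win_w, win_h, step, scales):
    """Scale the window size and scan the image once per scale."""
    for s in scales:
        w, h = int(win_w * s), int(win_h * s)
        if w > img_w or h > img_h:
            continue  # the scaled window no longer fits in the image
        for x, y in sliding_windows(img_w, img_h, w, h, step):
            yield x, y, w, h


# Hypothetical parameters on a 320x240 image.
windows = list(multiscale_windows(320, 240, 60, 120, 20, scales=[1.0, 2.0]))
print(len(windows))
```

Each yielded rectangle is one sub-window for which a feature vector is extracted and classified.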
Parallel execution of CoHOG feature calculation
A large number of co-occurrence histograms must be calculated → all histograms can be calculated in parallel
  Offsets: 31 parallel threads
  Blocks: horizontal 6 parallel threads, vertical 12 parallel threads
Large parallelism is available
  We execute 31 offsets × 6 horizontal blocks = 186 parallel threads
[Figure: sub-window with 31 offset variations and 6x12 = 72 blocks]
Processing performance is drastically improved!
Merging histogram calculation and SVM prediction
The dimensionality of the CoHOG feature vector is very high
  64 bins × 31 offsets × 72 blocks = 142,848 (about 140k) dimensions
  Large memory is required to store the feature vector
  Many multiplications must be executed during SVM prediction: f(x) = sign(w · x + b)
Our proposal: execute histogram calculation and SVM prediction simultaneously
[Figure: feature vector layout. Matrix size: 8x8 = 64; number of blocks: 6x12 = 72; offset variations: 31]
Merging histogram calculation and SVM prediction
Straightforward approach
  Scan the image and add +1 to the corresponding bin → the histogram is generated
  Multiply each bin by its weighting vector value w_{i,j} and sum the products → the inner product is calculated for SVM prediction
Histogram calculation:
  $x_{i,j} = \sum_{\text{image}} \begin{cases} 1, & \text{if orientations are } (i,j) \\ 0, & \text{otherwise} \end{cases}$
SVM prediction:
  $w \cdot x = \sum_{i} \sum_{j} w_{i,j}\, x_{i,j}$
Merging histogram calculation and SVM prediction
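Proposed method
  Scan the image and directly accumulate the weighting vector values (+w_{i,j} for each pixel pair)
  Circuit size can be drastically reduced!
  Large memory to store histograms and many multipliers for SVM prediction are unnecessary
Histogram calculation:
  $x_{i,j} = \sum_{\text{image}} \begin{cases} 1, & \text{if orientations are } (i,j) \\ 0, & \text{otherwise} \end{cases}$
SVM prediction:
  $w \cdot x = \sum_{i} \sum_{j} w_{i,j}\, x_{i,j} = \sum_{i} \sum_{j} w_{i,j} \sum_{\text{image}} \begin{cases} 1, & \text{if orientations are } (i,j) \\ 0, & \text{otherwise} \end{cases} = \sum_{\text{image}} \sum_{i} \sum_{j} \begin{cases} w_{i,j}, & \text{if orientations are } (i,j) \\ 0, & \text{otherwise} \end{cases}$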
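The equivalence of the two approaches can be checked numerically. The sketch below is a software illustration only, not the hardware design: it handles a single offset over a whole image, uses random integer weights, and all names (`pairs`, `straightforward`, `merged`) are hypothetical.

```python
import random

N_BINS = 8  # orientations; the co-occurrence matrix has 8x8 = 64 bins


def pairs(orient, offset):
    """Yield the orientation pair (i, j) of every pixel pair for one offset."""
    dy, dx = offset
    rows, cols = len(orient), len(orient[0])
    for y in range(rows):
        for x in range(cols):
            py, px = y + dy, x + dx
            if 0 <= py < rows and 0 <= px < cols:
                yield orient[y][x], orient[py][px]


def straightforward(orient, offset, w):
    """Build the histogram x first, then compute the inner product w . x."""
    x = [[0] * N_BINS for _ in range(N_BINS)]
    for i, j in pairs(orient, offset):
        x[i][j] += 1                      # +1 to the corresponding bin
    return sum(w[i][j] * x[i][j]          # one multiplication per bin
               for i in range(N_BINS) for j in range(N_BINS))


def merged(orient, offset, w):
    """Accumulate w[i][j] directly while scanning: no histogram, no multiplier."""
    acc = 0
    for i, j in pairs(orient, offset):
        acc += w[i][j]                    # weight is looked up, as from a ROM
    return acc


rng = random.Random(0)
orient = [[rng.randrange(N_BINS) for _ in range(16)] for _ in range(16)]
w = [[rng.randint(-100, 100) for _ in range(N_BINS)] for _ in range(N_BINS)]
print(straightforward(orient, (0, 1), w) == merged(orient, (0, 1), w))  # True
```

The merged version needs only a weight lookup and an addition per pixel pair, which corresponds to the weighting vector ROMs and the accumulator named in the proposed architecture.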
Proposed architecture
[Block diagram] Components:
  Input image
  Line buffers
  Gradient orientation image generator: Sobel filter (horizontal), Sobel filter (vertical), orientation classifier
  Frame buffer (WxH)
  Controller
  Sub-window data
  Combined module for histogram calculation and SVM prediction: shift registers, weighting vector ROMs (6 blocks × 31 offsets), accumulator
  Results
Proposed architecture
Parallel execution
  31 offsets × 6 blocks = 186 parallel threads
Merging histogram calculation and SVM prediction
  No histogram memory and no multipliers
  Only weighting vector ROMs and an accumulator
An efficient hardware architecture is successfully designed using the proposed methods.
FPGA implementation
Implementation result
Target FPGA: Xilinx Virtex-5 XC5VLX330T-2
Resource                  Used      Available   Utilization
Slice Registers           5,980     207,360     2%
Slice LUTs                28,495    207,360     13%
Occupied Slices           8,580     51,840      16%
BlockRAM                  61        324         18%
Total memory used (KB)    2,196     11,664      18%
DSP48Es                   2         192         1%
Max delay: 5.997 ns (max frequency: 167 MHz)
Our system can process 139,166 sub-windows/second
  Intel Core i7 3.2 GHz: about 1,100 sub-windows/second
  More than 100 times faster!
Capable of real-time processing of a 320x240 video sequence at 38 fps
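The reported figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers on this slide; the sub-windows-per-frame value is derived under the assumption that the 38 fps figure follows directly from the sub-window throughput, which the slides do not state explicitly.

```python
max_delay_ns = 5.997
max_freq_mhz = 1e3 / max_delay_ns        # about 166.75 MHz, reported as 167 MHz

fpga_rate = 139_166                      # sub-windows per second on the FPGA
cpu_rate = 1_100                         # sub-windows per second on the Core i7 (approximate)
speedup = fpga_rate / cpu_rate           # about 126x, i.e. "more than 100 times"

fps = 38
windows_per_frame = fpga_rate / fps      # about 3,662 sub-windows per 320x240 frame (derived)

print(f"{max_freq_mhz:.2f} MHz, {speedup:.1f}x, {windows_per_frame:.0f} sub-windows/frame")
```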
Pedestrian detection system
FPGA board
  Receives input images from the host PC and returns the pedestrian detection results
  Xilinx Virtex-5 LX330T FPGA
  PCI Express endpoint
  DDR2 memory
Host PC
  Transfers images captured by a camera and displays the detection results
  CPU: Intel Core i7 3.2 GHz
  Camera: USB webcam (640x480 resolution)
[Figure: host PC and FPGA board connected via PCI Express; detection result]
Conclusion
A high-performance and efficient hardware architecture for CoHOG-based pedestrian detection is proposed
  Effectively exploits the parallelism in the CoHOG algorithm → 186-way parallel processing is realized
  Drastically reduces circuit area (memory and multipliers) by executing histogram calculation and SVM prediction simultaneously
  Achieves more than 100 times faster processing on the FPGA than on a CPU → capable of real-time processing of a 320x240 video sequence at 38 fps
Parallel Implementation of Pedestrian Tracking Using Multiple Cues on GPGPU
Outline
  Introduction
  Pedestrian Tracking using Multiple Cues
  Parallel Implementation on NVIDIA GPU
  Conclusion
Introduction
Pedestrian recognition is a combination of 2 steps
  Detection: scan the entire image
  Tracking: track the pedestrians over the frames
[Figure: input image → detection → tracking]
Introduction
Pedestrian tracking with a particle filter
  HSV color histogram within the rectangle (K. Okuma et al., ECCV 2004)
  Simple background: tracking succeeds
  Complex background: tracking fails
Introduction
Color information
  [Figure: HSV histograms of a red shirt on gray ground and of a red car on gray ground]
Shape information
Combining both color and shape information
Introduction
The contributions of this paper
  New pedestrian tracking algorithm based on particle filters using both color and shape information
  Parallel implementation on GPGPU for real-time processing
Particle Filter (pedestrian tracking)
Current frame (time t-1) → Prediction → Measurement → Re-sampling (time t)
  Prediction: scatter particles
  Measurement: measure the pedestrian likelihood at each particle
  Re-sampling: eliminate low-likelihood particles and replicate high-likelihood particles
To define the pedestrian likelihood, we use
  Shape information: HOG feature
  Color information: HSV histogram
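One prediction / measurement / re-sampling cycle might be sketched as follows. This is a generic particle filter step under stated assumptions, not the tracker from the talk: the random-walk motion model with standard deviation `sigma` is assumed, and `toy_likelihood` is a hypothetical stand-in for the HOG/HSV likelihood.

```python
import random


def particle_filter_step(particles, likelihood, sigma, rng):
    """One prediction / measurement / re-sampling cycle.

    particles:  list of (x, y) positions from the previous frame
    likelihood: function (x, y) -> non-negative pedestrian likelihood
    sigma:      standard deviation of the assumed random-walk motion model
    """
    # Prediction: scatter the particles around their previous positions.
    predicted = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
                 for x, y in particles]
    # Measurement: evaluate the likelihood at each particle.
    weights = [likelihood(x, y) for x, y in predicted]
    if sum(weights) == 0:
        return predicted  # no information in this frame; keep the prediction
    # Re-sampling: draw particles in proportion to their likelihood, so
    # low-likelihood particles vanish and high-likelihood ones are replicated.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def toy_likelihood(x, y, target=(50.0, 80.0)):
    """Hypothetical stand-in for the real likelihood: peaks at `target`."""
    d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
    return 1.0 / (1.0 + d2)


rng = random.Random(0)
particles = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(200)]
for _ in range(20):
    particles = particle_filter_step(particles, toy_likelihood, sigma=2.0, rng=rng)
mean_x = sum(p[0] for p in particles) / len(particles)
mean_y = sum(p[1] for p in particles) / len(particles)
print(round(mean_x), round(mean_y))
```

The measurement loop evaluates every particle independently of the others.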
Histograms of Oriented Gradients
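Represent object shape information
  Calculate the gradient orientations
  Aggregate the gradient orientations of each block into a histogram
  Map the vector onto the HOG feature space
  The discriminant border between pedestrian and non-pedestrian is learned beforehand by an SVM
[Figure: HOG feature space with pedestrian and non-pedestrian samples separated by the discriminant border]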
HSV Histogram
Represent object color information
  Convert the input image into an HSV image (HSV color space: hue, saturation, value)
  Calculate an HSV histogram
  Calculate the Bhattacharyya distance to the reference HSV histogram
[Figure: input image → HSV histogram → HSV feature space; Bhattacharyya distance to the reference HSV histogram]
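A small sketch of the color cue using only the Python standard library. The bin counts (8x4x4) are hypothetical, and the distance uses the square-root form d = sqrt(1 - sum_k sqrt(p_k q_k)) that is common in color-based tracking; the slides do not specify which form is used.

```python
import colorsys
import math


def hsv_histogram(pixels, h_bins=8, s_bins=4, v_bins=4):
    """Normalized HSV histogram of an iterable of (r, g, b) pixels in 0..255."""
    hist = [0.0] * (h_bins * s_bins * v_bins)
    n = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hi = min(int(h * h_bins), h_bins - 1)  # clamp the value 1.0 into the last bin
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1.0
        n += 1
    return [c / n for c in hist] if n else hist


def bhattacharyya_distance(p, q):
    """d = sqrt(1 - sum_k sqrt(p_k * q_k)) for two normalized histograms."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))


reference = hsv_histogram([(255, 0, 0)] * 10)       # e.g. a red shirt
same_color = hsv_histogram([(255, 0, 0)] * 25)
gray = hsv_histogram([(128, 128, 128)] * 10)        # e.g. gray ground
print(bhattacharyya_distance(reference, same_color))  # 0.0
print(bhattacharyya_distance(reference, gray))        # 1.0
```

A distance of 0 means identical color distributions and 1 means no overlap, so a smaller distance to the reference histogram indicates a better color match.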
Pedestrian tracking using multiple cues
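[Figure: in the measurement step, each predicted particle is evaluated in the HOG feature space (pedestrian vs. non-pedestrian) and in the HSV feature space (distance to the reference HSV histogram)]
Pedestrian likelihood:
  $c \cdot f(\mathrm{HOG}) + (1 - c) \cdot g(\mathrm{HSV})$
  c: weighting coefficient in [0, 1]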
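The weighted combination of the two cues can be written as a short function. The slides do not define f and g, so the two mappings below (a logistic function of the SVM margin and an exponential of the squared Bhattacharyya distance with a hypothetical constant `lam`) are assumptions chosen only to make the sketch runnable.

```python
import math


def hog_score(svm_margin):
    """Hypothetical f: squash the signed distance to the SVM border into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-svm_margin))


def hsv_score(bhattacharyya_dist, lam=20.0):
    """Hypothetical g: exp(-lam * d^2), larger for a closer color match."""
    return math.exp(-lam * bhattacharyya_dist ** 2)


def pedestrian_likelihood(svm_margin, bhattacharyya_dist, c):
    """Weighted sum c * f(HOG) + (1 - c) * g(HSV) with c in [0, 1]."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    return c * hog_score(svm_margin) + (1.0 - c) * hsv_score(bhattacharyya_dist)


print(pedestrian_likelihood(0.0, 0.0, 0.5))  # 0.75
```

Setting c = 1 reduces the tracker to the HOG-only case and c = 0 to the HSV-only case.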
Tracking results
  HOG+HSV (our proposed algorithm)
  HSV only (existing algorithm; K. Okuma et al., ECCV 2004)
  HOG only
NVIDIA GPU architecture
Streaming multiprocessors (SM)
  32-bit scalar processors (SP)
  Shared memory
  Read-only cache
Device memory
In the case of Tesla C1060:
  4 GB device memory
  30 streaming multiprocessors (240 SPs in total)
  1.3 GHz processor clock
[Figure: several SMs, each with 8 SPs, shared memory, and a cache, connected to the device memory]
Implementation strategy
Run the measurement process on the GPU
  It accounts for almost 99% of the computation time
[Figure: current frame → prediction → measurement (on the GPU: SMs with SPs, shared memory, cache, and device memory) → re-sampling]
Implementation strategy
Allocate each particle to an SM
  The process for each particle is independent
Implementation strategy
Exploit pixel-level parallelism on the SPs
  Synchronization among SPs is fast
HSV likelihood calculation
Allocate each particle calculation to an SM
Calculate partial HSV histograms on the SPs, line by line
Sum all the histograms
Calculate the Bhattacharyya distance to the reference HSV histogram
Transfer the results to the CPU memory
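The work decomposition can be illustrated with a sequential Python emulation. This is not GPU code: the loops marked "parallel" only show which units of work are independent, the input is assumed to be an image of already quantized HSV bin indices, and the function names are hypothetical.

```python
def line_histogram(line, n_bins):
    """Work of one thread: histogram of a single image line of bin indices."""
    hist = [0] * n_bins
    for b in line:
        hist[b] += 1
    return hist


def particle_histogram(region, n_bins):
    """Work of one SM: per-line partial histograms followed by a reduction."""
    partial = [line_histogram(line, n_bins) for line in region]  # parallel over lines
    return [sum(col) for col in zip(*partial)]                   # sum all the histograms


def measure_all(frame_bins, rects, n_bins):
    """Each particle rectangle (x, y, w, h) is independent of the others."""
    results = []
    for x, y, w, h in rects:                                     # parallel over particles
        region = [row[x:x + w] for row in frame_bins[y:y + h]]
        results.append(particle_histogram(region, n_bins))
    return results


frame_bins = [
    [0, 1, 2, 3],
    [0, 1, 2, 3],
    [1, 1, 1, 1],
    [3, 3, 3, 3],
]
print(measure_all(frame_bins, [(0, 0, 4, 2), (0, 2, 4, 2)], n_bins=4))
# [[2, 2, 2, 2], [0, 4, 0, 4]]
```

On a GPU the outer loop would typically map to thread blocks and the per-line loop to threads within a block, with the reduction done in shared memory.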
HOG likelihood calculation
Allocate each particle calculation to an SM
Calculate the gradient and angle on the SPs
Calculate the HOG histogram on the SPs, several pixels at a time
Sum the histograms
Calculate the distance to the discriminant border
Transfer the results to the CPU memory
Processing time
GPU: NVIDIA Tesla C1060
  Number of multiprocessors: 30
  Total number of scalar processors: 240
Compared with an Intel Core i7 965 @ 3.2 GHz
[Figure: bar chart of processing time per frame [ms] for Core i7 and Tesla C1060]
Tesla C1060: 13.9 times faster, 113.6 fps
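The two reported numbers imply approximate per-frame times. The arithmetic below assumes that 113.6 fps is the frame rate on the Tesla C1060 and that the 13.9x factor is the ratio of per-frame processing times; the CPU values are derived, not taken from the slides.

```python
gpu_fps = 113.6
speedup = 13.9

gpu_ms = 1000.0 / gpu_fps    # about 8.8 ms per frame on the GPU
cpu_ms = gpu_ms * speedup    # about 122 ms per frame on the CPU (derived)
cpu_fps = 1000.0 / cpu_ms    # about 8.2 fps on the CPU (derived)

print(f"GPU {gpu_ms:.1f} ms, CPU {cpu_ms:.1f} ms ({cpu_fps:.1f} fps)")
```

The derived CPU time of roughly 122 ms fits within the 0 to 140 ms range of the chart axis.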
Conclusion
A pedestrian tracking algorithm using HSV and HOG features is proposed
Real-time processing can be achieved by the parallel implementation on an NVIDIA GPU
Report subject (not mandatory)
What do you think about the future of signal processing on embedded systems? Write freely (applications, things you would like to do, anything is fine).
Please submit the report by email to [email protected].
Please write your student ID and name in the body of the email.
Deadline: Feb 3rd, 17:00