Real-time Signal Processing on Embedded Systems

Advanced Cutting-edge Research Seminar I&III

Practical Applications

Pedestrian Detection FPGA-based system

Pedestrian Tracking GPU-based system

Hardware Architecture for High-Accuracy Real-Time Pedestrian Detection with CoHOG Features

Outline
  Introduction
  Pedestrian detection using CoHOG features
  Proposed hardware architecture
    Parallel execution
    Merging histogram calculation and SVM prediction
  FPGA implementation
  Conclusion




Pedestrian detection on automotive systems

Challenges:

Various appearances of pedestrians: clothing shape and color, pose, etc.
  Template-based or simple gradient-based methods do not achieve high-accuracy recognition.

Viewpoint movement: all objects in the image are moving.
  Background subtraction or frame subtraction cannot be used.

A robust recognition method suitable for pedestrians is required.

Pedestrian detection algorithms

Recent trend: combination of gradients and histograms
  Gradient: robust to illumination and color changes
  Histogram: robust to deformation

Examples
  Histograms of oriented gradients (HOG)
  Co-occurrence histograms of oriented gradients (CoHOG)*
    HOG-based method using pairs of oriented gradients
    One of today's best algorithms for pedestrian detection
    However, real-time execution is difficult to achieve in software (e.g. a few seconds are required to process a 320x240 image)

* T. Watanabe, S. Ito, and K. Yokoi, “Co-occurrence histograms of oriented gradients for pedestrian detection,” PSIVT2009

Specialized hardware for real-time processing







Pedestrian detection using CoHOG

[Figure: CoHOG feature extraction, showing the gradient orientations, the co-occurrence histograms for offset 1 and offset 2, and the resulting CoHOG feature vector]

1. Calculate gradient orientations
2. Pick up pairwise pixels
3. Divide the image into small regions (blocks)
4. Calculate co-occurrence histograms of oriented gradients
5. Repeat for various positions of the pixel pairs (called offsets); 31 offset variations are used
6. The CoHOG feature vector is classified by SVM
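The histogram step above can be sketched for a single block and a single offset. This is a minimal illustration rather than the presenters' implementation: the function name, the tiny 3x3 block, and the choice to ignore pairs whose partner pixel falls outside the block are all assumptions made for brevity.

```python
def cooccurrence_histogram(orient, offset, n_bins=8):
    """Count orientation pairs (i, j) for one offset over one block.

    orient: 2D list of quantized gradient orientations (0 .. n_bins-1).
    offset: (dx, dy) displacement from a pixel to its paired pixel.
    Pairs whose partner lies outside the block are skipped (an assumption).
    """
    dx, dy = offset
    height, width = len(orient), len(orient[0])
    hist = [[0] * n_bins for _ in range(n_bins)]
    for y in range(height):
        for x in range(width):
            px, py = x + dx, y + dy
            if 0 <= px < width and 0 <= py < height:
                hist[orient[y][x]][orient[py][px]] += 1
    return hist


# Hypothetical 3x3 block of quantized orientations.
block = [
    [0, 1, 1],
    [2, 1, 0],
    [2, 2, 3],
]
hist = cooccurrence_histogram(block, offset=(1, 0))
total_pairs = sum(sum(row) for row in hist)
```

With 8 orientation bins each histogram is an 8x8 matrix, which matches the 64 entries per offset and block quoted later in the slides.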

Detection procedure

Sliding window approach
  Feature vectors are extracted in scan-line order.
  The image size or window size is scaled to detect pedestrians at other scales.
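A scan-line-order sliding window can be sketched as a simple generator. The window size and stride are not given in the slides, so the values below are hypothetical and chosen only to keep the example small.

```python
def sliding_windows(img_w, img_h, win_w, win_h, stride):
    """Yield the top-left corner of each sub-window in scan-line order."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield x, y


# Hypothetical sizes: an 8x6 image, a 4x4 window, and a stride of 2 pixels.
positions = list(sliding_windows(8, 6, 4, 4, 2))
```

To detect pedestrians at another scale, the same scan would be repeated on a resized image or with a resized window.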




Parallel execution of CoHOG feature calculation

A large number of co-occurrence histograms must be calculated → all histograms can be calculated in parallel

Large parallelism
  Offsets: 31 parallel threads (31 offset variations)
  Blocks: 6 parallel threads horizontally, 12 parallel threads vertically (6x12 = 72 blocks)

We execute 31 offsets and 6 horizontal block threads in parallel = 186 parallel threads

Processing performance is drastically improved!

Merging histogram calculation and SVM prediction

The dimensionality of the CoHOG feature vector is very high
  64 (8x8 matrix) × 31 offsets × 72 blocks (6x12) = about 140k dimensions
  A large memory is required to store the feature vector
  Many multiplications must be executed during SVM prediction: f(x) = sign(w·x + b)

Our proposal: execute histogram calculation and SVM prediction simultaneously
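The "about 140k" figure follows directly from the three numbers on this slide. A short check, with variable names that are only illustrative:

```python
# Values taken from the slide.
matrix_entries = 8 * 8      # one 8x8 co-occurrence matrix per offset and block
offset_variations = 31
block_count = 6 * 12

feature_dimensions = matrix_entries * offset_variations * block_count

# A linear SVM needs one multiplication per dimension for each sub-window.
multiplications_per_window = feature_dimensions
```

The exact product is 142,848, which the slide rounds to about 140k.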


Straightforward approach

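[Figure: the image is scanned and +1 is added to the corresponding bin; each bin is then multiplied by its weighting vector value w_{i,j} and the products are summed]

Histogram calculation: scan the image and add +1 to the corresponding bin, so that the histogram is generated

\[
x_{i,j} = \sum_{p \in \mathrm{image}} \delta_{i,j}(p),
\qquad
\delta_{i,j}(p) =
\begin{cases}
1, & \text{if the orientations at } p \text{ are } (i, j) \\
0, & \text{otherwise}
\end{cases}
\]

SVM prediction: the inner product with the weighting vector values is calculated

\[
\mathbf{w} \cdot \mathbf{x} = \sum_{i} \sum_{j} w_{i,j} \, x_{i,j}
\]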


Proposed method

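[Figure: the image is scanned and the weighting vector value w_{i,j} of the corresponding bin is added directly to a single accumulator]

Scan the image and directly accumulate the weighting vector values. Histogram calculation and SVM prediction become one step:

\[
\mathbf{w} \cdot \mathbf{x}
= \sum_{i} \sum_{j} w_{i,j} \, x_{i,j}
= \sum_{i} \sum_{j} w_{i,j} \sum_{p \in \mathrm{image}} \delta_{i,j}(p)
= \sum_{p \in \mathrm{image}} \sum_{i} \sum_{j} w_{i,j} \, \delta_{i,j}(p)
\]

\[
w_{i,j} \, \delta_{i,j}(p) =
\begin{cases}
w_{i,j}, & \text{if the orientations at } p \text{ are } (i, j) \\
0, & \text{otherwise}
\end{cases}
\]

Circuit size can be drastically reduced!

A large memory to store histograms and many multipliers for SVM prediction are unnecessary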
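The equivalence between the straightforward approach and the merged one can be checked on a toy example. The sketch below assumes a single block and offset, a 3x3 weight matrix, and integer weights so that both results compare exactly; none of these values come from the slides.

```python
def predict_straightforward(pairs, weights, n_bins, bias=0):
    """Build the histogram first, then take the inner product with the weights."""
    hist = [[0] * n_bins for _ in range(n_bins)]
    for i, j in pairs:
        hist[i][j] += 1                      # +1 to the corresponding bin
    score = sum(weights[i][j] * hist[i][j]   # one multiplication per bin
                for i in range(n_bins) for j in range(n_bins))
    return score + bias


def predict_merged(pairs, weights, bias=0):
    """Accumulate the weight of each observed pair directly: no histogram, no multiplier."""
    acc = bias
    for i, j in pairs:
        acc += weights[i][j]
    return acc


# Hypothetical orientation pairs observed while scanning, and hypothetical weights.
pairs = [(0, 1), (1, 1), (0, 1), (2, 0)]
weights = [
    [1, -2, 0],
    [3, 4, -1],
    [5, 0, 2],
]
score_a = predict_straightforward(pairs, weights, n_bins=3)
score_b = predict_merged(pairs, weights)
```

Both functions return the same score, but the merged version only needs a weight lookup and an adder per pixel pair, which is the property the hardware design relies on.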

Proposed architecture

[Block diagram: proposed architecture]
  Input image
  Line buffers
  Gradient orientation image generator: Sobel filter (horizontal), Sobel filter (vertical), orientation classifier
  Frame buffer (WxH)
  Controller
  Sub-window data
  Combined module for histogram calculation and SVM prediction (31 offsets × 6 blocks): shift registers, weighting vector ROMs, accumulator
  Results

Proposed architecture

Parallel execution
  31 offsets × 6 blocks = 186 parallel threads

Merging histogram calculation and SVM prediction
  No histogram memory and no multipliers
  Only weighting vector ROMs and an accumulator


An efficient hardware architecture is successfully designed using the proposed methods




FPGA implementation

Implementation result

Target FPGA: Xilinx Virtex-5 XC5VLX330T-2

Resource                     Used      Available   Utilization
Number of Slice Registers    5,980     207,360     2%
Number of Slice LUTs         28,495    207,360     13%
Number of occupied Slices    8,580     51,840      16%
Number of BlockRAM           61        324         18%
Total Memory used (KB)       2,196     11,664      18%
Number of DSP48Es            2         192         1%

Max delay: 5.997 ns (max frequency: 167 MHz)

Our system can process 139,166 sub-windows / second
  Intel Core i7 3.2 GHz: about 1,100 sub-windows / second
  More than 100 times faster!

Capable of real-time processing of a 320x240 video sequence at 38 fps
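The headline numbers on this slide can be cross-checked with a few lines of arithmetic. The sub-windows-per-frame value below is derived, not stated in the slides: it assumes the 38 fps figure is simply the throughput divided by the number of sub-windows scanned in one 320x240 frame.

```python
# Figures quoted on the slide.
fpga_windows_per_s = 139_166
cpu_windows_per_s = 1_100
frame_rate_fps = 38
max_delay_s = 5.997e-9

speedup = fpga_windows_per_s / cpu_windows_per_s
max_frequency_mhz = 1.0 / max_delay_s / 1e6

# Derived under the assumption stated above, not a figure from the slides.
implied_windows_per_frame = fpga_windows_per_s / frame_rate_fps

print(f"speedup over CPU: {speedup:.1f}x")
print(f"max frequency: {max_frequency_mhz:.1f} MHz")
print(f"implied sub-windows per frame: {implied_windows_per_frame:.0f}")
```

The speedup of roughly 126x is consistent with the "more than 100 times faster" claim, and the clock period corresponds to the quoted 167 MHz after rounding.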


Pedestrian detection system

FPGA board
  Receives input images from the host PC and returns the pedestrian detection results
  Xilinx Virtex-5 FPGA LX330T
  PCI Express endpoint
  DDR2 memory

Host PC
  Transfers images captured by a camera and displays the detection results
  CPU: Intel Core i7 3.2 GHz
  Camera: USB webcam (640x480 resolution)

[Figure labels: PCI Express; detection result]




Conclusion

A high-performance and efficient hardware architecture for CoHOG-based pedestrian detection is proposed
  Effectively exploits the parallelism in the CoHOG algorithm → 186-way parallel processing is realized
  Drastically reduces circuit area (memory and multipliers) by executing histogram calculation and SVM prediction simultaneously
  The FPGA implementation is more than 100 times faster than a CPU → capable of real-time processing of a 320x240 video sequence at 38 fps

Parallel Implementation of Pedestrian Tracking Using Multiple Cues on GPGPU

Outline
  Introduction
  Pedestrian Tracking using Multiple Cues
  Parallel Implementation on NVIDIA GPU
  Conclusion


Introduction

Pedestrian recognition: a combination of 2 steps
  Detection: scan the entire input image
  Tracking: track the pedestrians over the frames

Introduction

Pedestrian tracking
  Particle filter with an HSV color histogram (K. Okuma et al., ECCV 2004)
  Uses the HSV histogram within the rectangle

  Simple background: tracking succeeds
  Complex background: tracking fails

Introduction

Color information
  [Figure: HSV histograms of a red shirt on gray ground and of a red car on gray ground]

Shape information

Combining both color and shape information

Introduction

The contributions of this paper
  A new pedestrian tracking algorithm based on particle filters, using both color and shape information
  A parallel implementation on GPGPU for real-time processing


Particle Filter (pedestrian tracking)

Current frame (time t-1) → Prediction → Measurement → Re-sampling (time t)

  Prediction: scatter particles
  Measurement: measure the pedestrian likelihood of each particle
  Re-sampling: eliminate low-likelihood particles and replicate high-likelihood particles

Particle Filter (pedestrian tracking)

Current frame → Prediction → Measurement → Re-sampling

To define the pedestrian likelihood, we use
  Shape information: HOG feature
  Color information: HSV histogram
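One prediction, measurement, and re-sampling cycle can be sketched in a few lines. This is a generic particle filter step under stated assumptions, not the presenters' tracker: particles are plain (x, y) positions, prediction is a Gaussian random walk, and `toy_likelihood` is a hypothetical stand-in for the HOG and HSV measurement.

```python
import math
import random


def particle_filter_step(particles, likelihood, noise_std, rng):
    """Run one prediction / measurement / re-sampling cycle."""
    # Prediction: scatter particles (random-walk motion model, an assumption).
    predicted = [(x + rng.gauss(0.0, noise_std), y + rng.gauss(0.0, noise_std))
                 for x, y in particles]
    # Measurement: evaluate the likelihood of each particle.
    weights = [likelihood(p) for p in predicted]
    # Re-sampling: draw with replacement in proportion to the likelihood, so
    # low-likelihood particles vanish and high-likelihood ones are replicated.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def toy_likelihood(p, target=(50.0, 50.0), sigma=10.0):
    """Hypothetical likelihood that peaks at a fixed target position."""
    d2 = (p[0] - target[0]) ** 2 + (p[1] - target[1]) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))


rng = random.Random(0)
particles = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(200)]
for _ in range(5):
    particles = particle_filter_step(particles, toy_likelihood, 2.0, rng)

mean_x = sum(p[0] for p in particles) / len(particles)
mean_y = sum(p[1] for p in particles) / len(particles)
```

After a few cycles the particle cloud concentrates around the position where the likelihood is highest, which is how the tracker follows a pedestrian from frame to frame.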

Histograms of Oriented Gradients

Represent object shape information
  Calculate the gradient orientation
  Aggregate the gradient orientations of each block
  Map the vector onto the HOG feature space
  The discriminant border between pedestrian and non-pedestrian is learned beforehand by SVM

HSV Histogram

Represent object color information
  Convert the input image into an HSV image (HSV color space: hue, saturation, value)
  Calculate an HSV histogram
  Calculate the Bhattacharyya distance to the reference HSV histogram in the HSV feature space
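The comparison against the reference histogram can be sketched as follows. The slides do not say which form of the Bhattacharyya distance is used; the version below, sqrt(1 - BC) with BC the Bhattacharyya coefficient, is one common choice in color-based particle filters and is an assumption here, as are the small example histograms.

```python
import math


def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    return [h / total for h in hist]


def bhattacharyya_distance(p, q):
    """Distance between two normalized histograms: sqrt(1 - sum(sqrt(p_k * q_k)))."""
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to guard against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, 1.0 - coefficient))


# Hypothetical 3-bin histograms.
reference = normalize([2, 2, 4])
same = normalize([1, 1, 2])
disjoint = [0.0, 0.0, 1.0]

d_same = bhattacharyya_distance(reference, same)
d_disjoint = bhattacharyya_distance([0.5, 0.5, 0.0], disjoint)
```

A distance near 0 means the particle's color distribution matches the reference; a distance of 1 means the two histograms share no bins.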

Pedestrian tracking using multiple cues

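[Figure: prediction → measurement; each particle is evaluated in the HOG feature space (pedestrian / non-pedestrian) and in the HSV feature space (reference HSV hist.)]

Pedestrian likelihood

\[
\text{Pedestrian likelihood} = c \, f(\mathrm{HOG}) + (1 - c) \, g(\mathrm{HSV})
\]

c: weighted coefficient in [0, 1]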
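The way the two cues are blended can be sketched as a convex combination. This assumes the slide's formula has the form c·f(HOG) + (1 − c)·g(HSV); which term carries c is not certain from the extracted text, and the function and argument names are hypothetical.

```python
def combined_likelihood(hog_score, hsv_score, c):
    """Blend a shape score and a color score with a coefficient c in [0, 1]."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    return c * hog_score + (1.0 - c) * hsv_score


# Hypothetical scores for one particle.
balanced = combined_likelihood(0.8, 0.4, c=0.5)
shape_only = combined_likelihood(0.8, 0.4, c=1.0)
color_only = combined_likelihood(0.8, 0.4, c=0.0)
```

Setting c to 1 or 0 recovers the HOG-only and HSV-only trackers, so the coefficient controls how much each cue contributes.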

Tracking results

  HOG+HSV (our proposed algorithm)
  HSV only (existing algorithm: K. Okuma et al., ECCV 2004)
  HOG only


NVIDIA GPU architecture

Streaming multiprocessors (SM), each containing
  32-bit scalar processors (SP)
  Shared memory
  Read-only cache

Device memory

In case of Tesla C1060:
  4 GB device memory
  30 streaming multiprocessors (total 240 SPs)
  1.3 GHz processor clock

[Diagram: several SMs, each with 8 SPs, shared memory, and a cache, connected to the device memory]

Implementation strategy

[Diagram: current frame → prediction → measurement → re-sampling, with the measurement step mapped onto the GPU's SMs, SPs, and device memory]

Run the measurement process on the GPU
  It accounts for almost 99% of the computation time

Allocate each particle to an SM
  Each particle is processed independently

Exploit pixel-level parallelism on the SPs
  Synchronization among SPs is fast

HSV likelihood calculation

[Figure: input image → HSV histogram → Bhattacharyya distance to the reference HSV hist. in the HSV feature space]

1. Allocate each particle calculation to an SM
2. Calculate HSV histograms on the SPs, per line
3. Sum all the histograms
4. Calculate the Bhattacharyya distance
5. Transfer the results to the CPU memory
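The per-line histogram and the summation can be illustrated with a sequential sketch. Plain Python stands in for the GPU here: each call to `histogram_per_line` plays the role of the work one SP would do, and the function names and the tiny quantized image are hypothetical.

```python
def histogram_per_line(line, n_bins):
    """Partial histogram of one image line (the unit of work assumed per SP)."""
    hist = [0] * n_bins
    for value in line:
        hist[value] += 1
    return hist


def sum_histograms(partials):
    """Reduce the per-line histograms into one histogram for the particle."""
    return [sum(column) for column in zip(*partials)]


# Hypothetical particle region, already quantized to 3 color bins.
region = [
    [0, 1, 1],
    [2, 2, 0],
]
partials = [histogram_per_line(line, 3) for line in region]
particle_hist = sum_histograms(partials)
```

Because every line is processed independently, the partial histograms can be computed in parallel and only the final summation needs synchronization.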

HOG likelihood calculation

[Figure: HOG feature space with pedestrian and non-pedestrian regions separated by the discriminant border]

1. Allocate each particle calculation to an SM
2. Calculate the gradient and angle on the SPs
3. Calculate HOG histograms on the SPs, per group of pixels
4. Sum the histograms
5. Calculate the distance to the discriminant border
6. Transfer the results to the CPU memory

Processing time

GPU: NVIDIA Tesla C1060
  Number of multiprocessors: 30
  Total number of scalar processors: 240

Compared with an Intel Core i7 965 @ 3.2 GHz

[Bar chart: processing time per frame [ms], Core i7 vs. Tesla C1060]

Tesla C1060: 13.9 times faster, 113.6 fps
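The per-frame times behind the chart can be estimated from the two quoted numbers. The values below are derived, not read from the slides, and assume that 113.6 fps is the GPU frame rate and that 13.9 is the ratio of CPU to GPU time per frame.

```python
# Figures quoted on the slide.
gpu_fps = 113.6
speedup = 13.9

# Derived under the assumptions stated above.
gpu_ms_per_frame = 1000.0 / gpu_fps
cpu_ms_per_frame = gpu_ms_per_frame * speedup
cpu_fps = gpu_fps / speedup

print(f"GPU: {gpu_ms_per_frame:.1f} ms per frame")
print(f"CPU: {cpu_ms_per_frame:.1f} ms per frame ({cpu_fps:.1f} fps)")
```

Under these assumptions the CPU needs a little over 120 ms per frame, which fits the chart axis that runs up to 140 ms.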


Conclusion

Pedestrian tracking algorithm using HSV and HOG features is proposed

Real-time processing can be achieved by the parallel implementation using NVIDIA GPU

Report subject (not mandatory)

What do you think about the future of signal processing on embedded systems? Feel free to write about anything (applications, things you would like to do, etc.).

Please submit the report by email to [email protected]. Write your student ID and name in the body of the email.

Deadline: Feb 3rd 17:00