Accelerated Multiple Region Evaluation for
Human Motion Tracking
David Concha, Raúl Cabido, Antonio S. Montemayor, Juan José Pantrigo
{david.concha, raul.cabido, antonio.sanz, juanjose.pantrigo}@urjc.es
http://www.gavab.es/capo
1. Introduction
In this work we present a study of different NVIDIA CUDA
approaches to the problem of evaluating regions of interest (ROIs)
in an image. This evaluation is usually one step of higher-level
methods such as image retargeting, completion, video summarization,
object detection and visual tracking. Because these methods
evaluate thousands of ROIs, performance is in many cases far from
interactive.
In a visual tracking context, the interesting pixels of an image
(target candidates) are usually recovered by a segmentation process.
To track these targets, a temporal estimation filter evaluates the
resulting binary image over these ROIs. This evaluation usually
accounts for a high percentage of the computational cost of the
overall tracking method. In the general case, ROIs are translated,
rotated and scaled bounding boxes of different sizes, which leads to
complex memory access patterns (many pixels, poor memory alignment,
etc.). Some approaches that reduce this computation have gained
popularity, such as the Integral Image for axis-aligned ROIs
[Viola 2004], but none of them handles general ROIs or takes into
account the different technologies of the GPU architecture.
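As a point of comparison, the Integral Image idea for axis-aligned ROIs can be sketched in a few lines. This is a plain Python CPU illustration, not code from the poster; the names `integral_image`, `box_sum` and the small `mask` are hypothetical. Once the table is built, any axis-aligned box is summed with four lookups, which is exactly what stops working when the box is rotated.

```python
def integral_image(img):
    """Return a summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_sum(sat, x0, y0, x1, y1):
    """Sum of img[y][x] for x0 <= x < x1 and y0 <= y < y1, using four lookups."""
    return sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]


# Hypothetical binary segmentation mask (1 = interesting pixel).
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
]
sat = integral_image(mask)
weight = box_sum(sat, 1, 0, 3, 2)  # axis-aligned ROI covering columns 1-2, rows 0-1
```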
2. Multiple ROI evaluation on GPU
2.1. OpenGL+Cg
ROI weights are computed by creating a grid of quad
primitives (one for each ROI). The rasterizer returns the
interpolated coordinates and fetches the ROIs
automatically. Then a multipass 2D reduction is applied
to retrieve the weight of each ROI.
Pros: exploits hardware interpolation capabilities.
Cons: does not exploit the features added in CUDA compute devices.
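The multipass 2D reduction can be illustrated on the CPU as follows. This is a minimal sketch in plain Python, not the Cg shaders used in the poster; it assumes that every ROI has been rasterized into a square tile whose side is a power of two and that tiles are aligned in the texture. The names `reduce_pass` and `reduce_tiles` are hypothetical.

```python
def reduce_pass(tex):
    """One reduction pass: each output texel sums a 2x2 block of the input."""
    h, w = len(tex) // 2, len(tex[0]) // 2
    return [
        [
            tex[2 * y][2 * x] + tex[2 * y][2 * x + 1]
            + tex[2 * y + 1][2 * x] + tex[2 * y + 1][2 * x + 1]
            for x in range(w)
        ]
        for y in range(h)
    ]


def reduce_tiles(tex, tile):
    """Apply passes until every tile x tile block has collapsed to one texel."""
    while tile > 1:
        tex = reduce_pass(tex)
        tile //= 2
    return tex


# Two hypothetical 4x4 ROI tiles rendered side by side in a 4x8 texture.
tile_a = [[1] * 4 for _ in range(4)]
tile_b = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
atlas = [row_a + row_b for row_a, row_b in zip(tile_a, tile_b)]
weights = reduce_tiles(atlas, tile=4)  # one texel (weight) per ROI
```

Each pass halves both texture dimensions, so a tile of side 4 needs two passes.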
2.3. CUDA+CUDPP/Thrust
Given a ROI configuration, each CUDA thread computes a
ROI texture coordinate, fetches a ROI texel and writes the
value into global memory. After the kernel execution, the
ROIs are stored in aligned memory, and the ROI evaluation
is done using CUDPP or Thrust primitives without any
prior data rearrangement.
Pros: exploits CUDA features and hardware interpolation capabilities. Data
rearrangement is not required.
Cons: requires many memory accesses.
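A CPU sketch of this gather-then-reduce scheme is given below, in plain Python rather than CUDA. It rests on several assumptions that are not in the poster: a ROI is described by a hypothetical tuple `(cx, cy, w, h, theta)`, every ROI is sampled on a fixed `n` x `n` grid so that ROI `k` occupies one contiguous block of the buffer, and a nearest-neighbour `fetch` stands in for the hardware texture fetch. The loop index `t` plays the role of the global thread index.

```python
import math


def fetch(img, x, y):
    """Nearest-neighbour fetch, zero outside the image (stand-in for a texture fetch)."""
    xi, yi = int(math.floor(x + 0.5)), int(math.floor(y + 0.5))
    if 0 <= yi < len(img) and 0 <= xi < len(img[0]):
        return img[yi][xi]
    return 0


def gather_rois(img, rois, n):
    """Emulate the gather kernel: one 'thread' per sample, n*n samples per ROI."""
    samples = n * n
    out = [0] * (len(rois) * samples)
    for t in range(len(out)):
        k, s = divmod(t, samples)  # ROI index, sample index inside that ROI
        cx, cy, w, h, theta = rois[k]
        # Local sample position inside the box, measured from its centre.
        lx = (((s % n) + 0.5) / n - 0.5) * w
        ly = (((s // n) + 0.5) / n - 0.5) * h
        # Rotate and translate into image coordinates.
        x = cx + lx * math.cos(theta) - ly * math.sin(theta)
        y = cy + lx * math.sin(theta) + ly * math.cos(theta)
        out[t] = fetch(img, x, y)
    return out


def segmented_sum(buf, samples):
    """Sum each contiguous block of `samples` values, one result per ROI."""
    return [sum(buf[i:i + samples]) for i in range(0, len(buf), samples)]


# Hypothetical 6x6 binary mask with a 2x2 blob of interesting pixels.
img = [[0] * 6 for _ in range(6)]
for yy in (2, 3):
    for xx in (2, 3):
        img[yy][xx] = 1

rois = [
    (2.5, 2.5, 2, 2, 0.0),          # covers the blob
    (0.5, 0.5, 2, 2, 0.0),          # covers background only
    (2.5, 2.5, 2, 2, math.pi / 2),  # same box rotated by 90 degrees
]
buf = gather_rois(img, rois, n=2)
weights = segmented_sum(buf, samples=4)
```

Because the buffer is already grouped by ROI, the second step needs no sort. On the GPU it would be a library segmented reduction; Thrust's `reduce_by_key` is one candidate, although the poster does not say which primitive was used.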
2.2. OpenGL+CUDPP/Thrust
ROIs are rendered using the OpenGL API, stored in texture
memory and mapped into the CUDA address space.
Weighting is done using the CUDPP/Thrust reduction
primitives.
Pros: exploits CUDA features and hardware interpolation capabilities.
Cons: ROIs are not stored in aligned memory, so a sort operation has to
be applied first, which adds a performance penalty.
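The extra sort required by the OpenGL+CUDPP/Thrust approach can be illustrated with a small CPU sketch in plain Python. It assumes, as a simplification, that two ROIs rendered side by side are read back in row-major order, so their texels arrive interleaved. The function name `sort_then_reduce` and the sample data are hypothetical; on the GPU the two steps would correspond to primitives such as Thrust's `sort_by_key` and `reduce_by_key`.

```python
from itertools import groupby
from operator import itemgetter


def sort_then_reduce(keys, values):
    """Group texel values by ROI id (sort), then sum each group (reduce)."""
    pairs = sorted(zip(keys, values), key=itemgetter(0))  # stable sort by ROI id
    return {k: sum(v for _, v in grp) for k, grp in groupby(pairs, key=itemgetter(0))}


# Two 2x2 ROIs rendered side by side in a 2x4 texture, flattened row by row:
# the ROI ids alternate in runs, so the values are not contiguous per ROI.
roi_ids = [0, 0, 1, 1, 0, 0, 1, 1]
texels = [1, 1, 0, 1, 1, 0, 0, 0]
weights = sort_then_reduce(roi_ids, texels)
```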
2.4. CUDA
This approach reduces the number of memory accesses by
fetching the ROIs and computing their weights in a single
kernel. Each thread is therefore responsible for computing a
single weight. After reading a ROI configuration from global
memory, the thread calculates all of its texture coordinates. A
global operation such as a sum-reduction can then be computed by
accumulating the values stored at these positions in a register.
Pros: exploits hardware interpolation and CUDA features, aligned memory,
fewer memory accesses.
Cons: when there are few ROIs to compute, occupancy is low.
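The work of one thread in the single-kernel approach can be sketched on the CPU as below. This is plain Python, not the CUDA kernel from the poster. It assumes a hypothetical ROI tuple `(cx, cy, w, h, theta)` sampled on an `n` x `n` grid, and a software `bilinear` function standing in for the filtered texture fetch. The local variable `acc` plays the role of the register, and each iteration of the loop in `evaluate` corresponds to one thread.

```python
import math


def bilinear(img, x, y):
    """Bilinear sample with zero padding, standing in for a filtered texture fetch."""
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def px(xi, yi):
        inside = 0 <= yi < len(img) and 0 <= xi < len(img[0])
        return img[yi][xi] if inside else 0.0

    top = (1 - fx) * px(x0, y0) + fx * px(x0 + 1, y0)
    bottom = (1 - fx) * px(x0, y0 + 1) + fx * px(x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom


def roi_weight(img, roi, n):
    """Work of one 'thread': sample an n x n grid inside the ROI, sum locally."""
    cx, cy, w, h, theta = roi
    c, s = math.cos(theta), math.sin(theta)
    acc = 0.0  # plays the role of the per-thread register
    for j in range(n):
        for i in range(n):
            lx = ((i + 0.5) / n - 0.5) * w
            ly = ((j + 0.5) / n - 0.5) * h
            acc += bilinear(img, cx + lx * c - ly * s, cy + lx * s + ly * c)
    return acc


def evaluate(img, rois, n=4):
    """One loop iteration per ROI, mirroring one thread per ROI."""
    return [roi_weight(img, roi, n) for roi in rois]


# Hypothetical 8x8 mask in which every pixel is interesting.
img = [[1.0] * 8 for _ in range(8)]
rois = [(4.0, 4.0, 3.0, 2.0, 0.3), (-10.0, -10.0, 2.0, 2.0, 0.0)]
weights = evaluate(img, rois, n=4)
```

No intermediate buffer is written: only one value per ROI leaves the loop, which is where the saving in memory accesses comes from.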
3. Study case
As an application of multiple ROI evaluation on GPU, we tackle an
eight-DOF visual tracking problem. In a particle filter framework the
number of required particles grows with the size of the state space.
High-dimensional problems such as human motion tracking therefore
require the evaluation of a large number of rotated and scaled
segments to keep the target tracked.
A commodity 2008 GPU (GeForce GTX 260) can process up to 70
frames per second, evaluating more than 24k differently sized and
rotated segments at 640x480 video resolution.
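To show where the ROI weights are consumed, here is a minimal sketch of a resampling step in a particle filter. The poster does not describe its filter in detail, so the systematic resampling scheme and the names `normalize` and `systematic_resample` are assumptions chosen for illustration. Each particle would be an eight-dimensional state whose weight comes from evaluating its segments.

```python
import random


def normalize(weights):
    """Scale weights to sum to one; fall back to uniform if all are zero."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def systematic_resample(particles, weights, rng=random):
    """Draw len(particles) survivors with probability proportional to weight."""
    n = len(particles)
    probs = normalize(weights)
    u = rng.random() / n  # single random offset in [0, 1/n)
    out, cum, i = [], probs[0], 0
    for k in range(n):
        target = u + k / n
        # Advance to the particle whose cumulative interval contains target.
        while target >= cum and i < n - 1:
            i += 1
            cum += probs[i]
        out.append(particles[i])
    return out


# Hypothetical example: only the third particle matched the segmented image.
particles = ["a", "b", "c", "d"]
survivors = systematic_resample(particles, [0, 0, 5, 0], random.Random(0))
```

Particles with a high ROI weight are duplicated and those with a low weight are dropped, which is why every particle has to be evaluated on every frame.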
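[Figure: OpenGL+CUDPP/Thrust data flow. OpenGL rendered ROIs are mapped into the address space of CUDA (interoperability), then sorted so that ROIs are stored in contiguous memory blocks.]
[Figure: CUDA+CUDPP/Thrust data flow. CUDA threads perform texture fetches on the input frame and memory writes to global memory, where ROI 1, ROI 2, … are stored.]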
[Viola 2004] P. Viola and M. J. Jones. Robust Real-Time Face Detection.
International Journal of Computer Vision, 57(2):137-154, 2004
This research was partially supported by the Spanish Ministry of Education and Science CICYT TIN2011-28151 and the NVIDIA Professor Partnership Program.