Comparison of Dense Stereo Matching Metrics for
Real Time Applications
Abstract— Stereo Matching is one of the classical problems
in computer vision. The stereo matching problem is to
compute the disparity map for the reference image using
two or more images of the same scene. This work focuses
on local stereo matching methods, which generally have
low computational complexity and low storage requirements
and are therefore suitable for real-time and embedded
implementations. Among the available classes of algorithms,
correlation-based stereo algorithms were selected because
they produce sufficiently dense range maps and their simple
underlying computation lends itself to fast implementations.
This work compares three block matching similarity
measures, Sum of Absolute Differences (SAD), Sum of
Squared Differences (SSD) and Normalized Cross-Correlation
(NCC), for calculating depth maps. The results show that
NCC provides a closer match to ground truth, with less
error and noise, than SAD and SSD.
Index Terms — Disparity Map, Epipolar Constraint, Stereo
Correspondence, Stereo Vision.
I. INTRODUCTION
The word "stereo" comes from the Greek word "stereos",
which means firm or solid. With stereo vision you see an
object as solid in three spatial dimensions: width, height and
depth, or x, y and z. It is the added perception of the depth
dimension that makes stereo vision so rich and special. Stereo
matching has been, and continues to be, one of the most active
research topics in computer vision. The task of a stereo
matching algorithm is to analyse the images taken from a
stereo camera pair and to estimate the displacement of
corresponding points in the two images in order to
extract the depth (which is inversely proportional to the pixel
displacement) of objects in the scene. The displacement is
measured in pixels and is called the disparity;
disparity values normally lie within a certain range, the
disparity range, and the disparities of all the image pixels form
the disparity map, which is the output of a stereo matching
process. An example using the Teddy benchmark image set is
shown in Figure 1. In the figure, the disparities are visualized
as grayscale intensities: the brighter the pixel, the
closer the object is to the stereo cameras. The
disparity map therefore encodes the depth of each pixel, and
once the depth has been inferred by stereo
matching, the 3D scene can be
reconstructed by triangulation. Since stereo
matching provides depth information, it has great potential
for use in 3D reconstruction, stereoscopic TV, navigation
systems, virtual reality and so on.
Fig. 1. An example of a disparity map. (a) Image taken by the left camera.
(b) Image taken by the right camera. (c) The ground truth disparity map
associated with the left image.
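The inverse relation between depth and disparity noted above can be made concrete. The sketch below assumes a rectified pinhole stereo pair, for which Z = f * B / d. The focal length f (in pixels), the baseline B (in metres) and all numeric values are hypothetical; they are not taken from the Teddy data set.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair (pinhole model).

    All three parameters are illustrative assumptions: disparity and
    focal length in pixels, baseline in metres.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera: f = 800 px, B = 0.25 m.
near = depth_from_disparity(50, 800.0, 0.25)  # large disparity -> 4.0 m
far = depth_from_disparity(10, 800.0, 0.25)   # small disparity -> 20.0 m
print(near, far)
```

A larger disparity gives a smaller depth, which is why brighter pixels in a disparity map correspond to closer objects.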
Many stereo algorithms make use of the epipolar constraint:
for a pixel in the left image of a rectified pair, the corresponding
point in the right image lies on the same horizontal line, the
epipolar line. This strong constraint reduces the
search space of the correspondence algorithms that calculate
depth maps from two dimensions to one.
In the past two decades, various stereo matching algorithms
have been proposed; they were summarized and evaluated
by Scharstein and Szeliski [1]. In their notable work, the
proposed stereo matching algorithms are categorized into two
major types: local area-based methods and global
optimization-based methods. In local methods, the disparity evaluation at a
given pixel is based on a similarity measurement performed in a
finite window. The similarity metric is defined by a matching
cost, and the costs within the local window are often aggregated to
provide a more reliable and robust result. On the other hand,
global methods define global cost functions and solve an
optimization problem. Global algorithms typically do not
perform an aggregation step, but rather seek a disparity
assignment that minimizes a global cost function.
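The distinction can be written compactly. The notation below is an assumption introduced for illustration, following the general formulation in [1]: C(x, y, d) is the per-pixel matching cost, W(x, y) is the aggregation window around (x, y), and \lambda weights a smoothness term.

$$d_{local}(x, y) = \arg\min_{d} \sum_{(i,j) \in W(x,y)} C(i, j, d)$$

$$E(d) = \sum_{(x,y)} C(x, y, d(x, y)) + \lambda \, E_{smooth}(d)$$

A local method evaluates the first expression independently at every pixel, whereas a global method searches for the whole disparity function d that minimizes E(d).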
In this work we are particularly interested in local stereo
matching methods, which generally have low computational
complexity and low storage requirements and are therefore
suitable for real-time and embedded implementations.
Merlin George, Student Member, IEEE, and Rejimol Robinson R.R
II. BLOCK MATCHING
The block matching method is one of the most popular local
methods because of its simplicity in implementation. The basic
idea of block matching for stereo correspondence is as follows:
to estimate the disparity of a point in the left image, we define
a reference block surrounding this point; and then, find the
closest matched block, within a search range in the right image,
using a pre-specified matching criterion; thus, the relative
displacement between the reference block and the closest
matched block constitutes the disparity of the point being
evaluated. In this work, matching criteria used for comparison
are the Sum of Absolute Differences (SAD), the Sum of
Squared Differences (SSD) and the Normalized Cross-
Correlation (NCC).
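The procedure described above can be sketched for a single pixel as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: images are plain lists of rows, SAD is used as the matching criterion, the function names are hypothetical, and the caller is expected to keep the reference block inside the image.

```python
def block_sad(left, right, x, y, d, half):
    """SAD between the block centred at (x, y) in `left` and the block
    centred at (x - d, y) in `right`. Images are lists of rows."""
    total = 0
    for j in range(y - half, y + half + 1):
        for i in range(x - half, x + half + 1):
            total += abs(left[j][i] - right[j][i - d])
    return total


def match_pixel(left, right, x, y, max_disp, half=1):
    """Return the disparity in [0, max_disp] with the lowest SAD cost.
    Candidates whose block would leave the right image are skipped."""
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        if x - d - half < 0:
            break
        cost = block_sad(left, right, x, y, d, half)
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Synthetic pair: the left image is the right image shifted by 2 pixels.
right = [
    [3, 8, 1, 9, 4, 7, 2, 6, 5, 0],
    [5, 2, 7, 1, 8, 3, 9, 4, 6, 1],
    [4, 9, 2, 6, 1, 8, 3, 7, 0, 5],
]
left = [[0, 0] + row[:-2] for row in right]

print(match_pixel(left, right, x=5, y=1, max_disp=3))  # recovers the shift: 2
```

Because the search runs only along the same row, the sketch also relies on the epipolar constraint for rectified images.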
Normalized Cross-Correlation (NCC) is the standard
statistical method for determining similarity. Its normalization,
both in the mean and the variance, makes it relatively
insensitive to radiometric gain and bias. The sum of squared
differences (SSD) metric is computationally simpler than
cross-correlation, and it can be normalized as well. In addition
to NCC and SSD, many variations of each with different
normalization schemes have been used. One popular example
is the sum of absolute differences (SAD), which is often used
for computational efficiency [3].
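The insensitivity to gain and bias can be checked directly, assuming the zero-mean form of NCC and writing \bar{I} for the mean over the window (notation introduced here for illustration). Suppose the target block is a rescaled copy I_2' = a I_2 + b with gain a > 0 and bias b. Then

$$\bar{I}_2' = a \bar{I}_2 + b \quad\Rightarrow\quad I_2' - \bar{I}_2' = a \, (I_2 - \bar{I}_2)$$

$$NCC(I_1, I_2') = \frac{a \sum (I_1 - \bar{I}_1)(I_2 - \bar{I}_2)}{\sqrt{\sum (I_1 - \bar{I}_1)^2 \cdot a^2 \sum (I_2 - \bar{I}_2)^2}} = NCC(I_1, I_2)$$

The factor a cancels and b disappears with the mean, whereas SAD and SSD both change under the same transformation.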
III. MATCHING METRICS
This work compares three block matching similarity
measures, Sum of Absolute Differences (SAD),
Sum of Squared Differences (SSD) and Normalized
Cross-Correlation (NCC), for calculating depth maps. Their
definitions are given in Table I.
A. Sum of Absolute Differences (SAD)
Sum of Absolute Differences (SAD) is one of the simplest
similarity measures. It is calculated by subtracting the pixels
within a square neighbourhood of the target image I2 from the
corresponding pixels of the reference image I1, aggregating
the absolute differences within the square window, and
selecting the disparity with the winner-take-all (WTA)
strategy [1]. If the left and right blocks match exactly, the
result is zero.
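As a small worked example, the SAD between two 3x3 blocks can be computed as below. The block values and the function name are made up for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks
    (lists of rows)."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


reference = [[10, 12, 11], [9, 50, 10], [11, 10, 12]]
target = [[11, 12, 10], [9, 48, 10], [12, 10, 12]]

print(sad(reference, reference))  # identical blocks -> 0
print(sad(reference, target))     # differences 1, 1, 2, 1 -> 5
```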
B. Sum of Squared Differences (SSD)
In Sum of Squared Differences (SSD), the differences are
squared and aggregated within a square window, and the
disparity is then selected by the WTA strategy. This measure has a higher
computational complexity than SAD because it
requires a multiplication for every pixel difference.
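A minimal sketch of the SSD cost on illustrative 3x3 blocks is given below; the values and the function name are hypothetical. The second call shows how squaring amplifies a single large difference: an outlier of 10 grey levels contributes 100 to SSD, where it would contribute only 10 to SAD.

```python
def ssd(block_a, block_b):
    """Sum of squared differences between two equally sized blocks
    (lists of rows). One multiplication per pixel pair."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


reference = [[10, 12, 11], [9, 50, 10], [11, 10, 12]]
target = [[11, 12, 10], [9, 48, 10], [12, 10, 12]]
outlier = [[10, 12, 11], [9, 40, 10], [11, 10, 12]]

print(ssd(reference, target))   # differences 1, 1, 2, 1 -> 1 + 1 + 4 + 1 = 7
print(ssd(reference, outlier))  # single difference of 10 -> 100
```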
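TABLE I
BLOCK MATCHING METRICS USED FOR COMPARISON
Match Metric: Definition
Sum of Absolute Differences (SAD): $SAD(x,y,d) = \sum_{(i,j) \in W(x,y)} |I_1(i,j) - I_2(i-d,j)|$
Sum of Squared Differences (SSD): $SSD(x,y,d) = \sum_{(i,j) \in W(x,y)} (I_1(i,j) - I_2(i-d,j))^2$
Normalized Cross-Correlation (NCC): $NCC(x,y,d) = \frac{\sum_{(i,j) \in W(x,y)} (I_1(i,j) - \bar{I}_1)(I_2(i-d,j) - \bar{I}_2)}{\sqrt{\sum_{(i,j) \in W(x,y)} (I_1(i,j) - \bar{I}_1)^2 \sum_{(i,j) \in W(x,y)} (I_2(i-d,j) - \bar{I}_2)^2}}$
Here W(x,y) is the square window centred at (x,y), and $\bar{I}_1$, $\bar{I}_2$ are the mean intensities of the two windows.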
C. Normalized Cross-Correlation (NCC)
Normalized Cross-Correlation is more complex than both
SAD and SSD, as it involves numerous
multiplication, division and square root operations. However, the
results show that it gives the best disparity map of the
three measures.
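The extra operations are visible in a direct implementation. The sketch below assumes the zero-mean form of NCC; the function name, the test blocks and the handling of flat blocks (returning 0.0) are illustrative choices, not taken from this work. Unlike SAD and SSD, NCC is a similarity score, so the WTA step selects the disparity with the highest value rather than the lowest.

```python
import math


def ncc(block_a, block_b):
    """Zero-mean normalized cross-correlation of two equally sized
    blocks (lists of rows). Returns a value in [-1, 1]."""
    a = [v for row in block_a for v in row]
    b = [v for row in block_b for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    den = math.sqrt(
        sum((p - mean_a) ** 2 for p in a) * sum((q - mean_b) ** 2 for q in b)
    )
    if den == 0:
        # Flat block: correlation is undefined; treated as no match (assumption).
        return 0.0
    return num / den


reference = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Same pattern with gain 2 and bias 10 applied.
brighter = [[2 * v + 10 for v in row] for row in reference]
inverted = [[-v for v in row] for row in reference]

print(ncc(reference, brighter))  # gain and bias cancel: correlation of 1
print(ncc(reference, inverted))  # opposite pattern: correlation of -1
```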
IV. RESULTS AND DISCUSSIONS
In this section, we present experimental results on
the Teddy stereo pair, with ground truth, from the Middlebury
Stereo Vision page. The Teddy stereo image pair was
chosen for the study because it is rich in depth discontinuities.
Sum of Absolute Differences (SAD) is easier and faster to
compute than Sum of Squared Differences (SSD) and
Normalized Cross-Correlation (NCC). However, Table II
shows that NCC gives a more
accurate disparity map than SAD and SSD.
NCC also reduces the error
and noise of the disparity map, since the calculation averages
the noise of each pixel. The error was calculated for
window sizes of 3x3, 5x5 and 7x7. Table III shows that
NCC provides a closer match to ground truth by
reducing the noise produced by SAD and SSD.
TABLE II
COMPARATIVE PERFORMANCE OF ALGORITHMS ON TEDDY STEREO IMAGE PAIR
Image   Method   Error by window size
                 3x3           5x5           7x7
Teddy   SAD      4.3420e+04    4.3151e+04    4.3029e+04
        SSD      4.3286e+04    4.3076e+04    4.2977e+04
        NCC      4.2908e+04    4.2502e+04    4.2398e+04
TABLE III
DISPARITY MAP COMPARISON OF TEDDY STEREO IMAGE PAIR
Method   Disparity Map
         3x3   5x5   7x7
SAD      (disparity map images)
SSD      (disparity map images)
NCC      (disparity map images)
V. CONCLUSIONS AND FUTURE WORK
In general, SAD is easier to compute and is less sensitive to
outliers than the other measures. Stereo by SAD correlation has
proven a robust and reliable tool in moderately complex
environments. This work shows that Normalized
Cross-Correlation (NCC) provides a closer match to ground truth,
with a lower computed error, than Sum
of Absolute Differences (SAD) and Sum of Squared
Differences (SSD). However, the computing time of NCC is
higher than that of SAD and SSD. Our future work in this
area is therefore to develop an efficient NCC-based stereo matching
algorithm that runs faster than conventional Normalized
Cross-Correlation.
REFERENCES
[1] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense
two-frame stereo correspondence algorithms," International Journal of
Computer Vision, vol. 47, no. 1, pp. 7-42, 2002.
[2] D. Scharstein and R. Szeliski, "A taxonomy and evaluation of dense
two-frame stereo correspondence algorithms," International Journal of
Computer Vision, vol. 47, no. 1, pp. 7-42, 2002.
[3] M. Z. Brown, D. Burschka, and G. D. Hager, "Advances in
computational stereo," IEEE Trans. Pattern Analysis and Machine
Intelligence, vol. 25, no. 8, pp. 993-1001, 2003.
[4] E. Salari and J. Strong, "On the reliability of correlation based stereo
matching," in IEEE Int. Conf. on Systems Engineering, pp. 559-561,
1990.
[5] T. Kanade and M. Okutomi, "A Stereo Matching Algorithm with an
Adaptive Window: Theory and Experiments," PAMI, vol. 16, no. 9,
pp. 920-932, 1994.
[6] S. T. Barnard and M. A. Fischler, "Computational Stereo," ACM
Computing Surveys, vol. 14, pp. 553-572, 1982.
[7] S. Birchfield and C. Tomasi, "Depth Discontinuities by Pixel-to-Pixel
Stereo," Technical Report STAN-CS-TR-96-1573, Stanford Univ.,
1996.
[8] O. Faugeras, B. Hotz, H. Matthieu, T. Vieville, Z. Zhang, P. Fua,
E. Theron, L. Moll, G. Berry, J. Vuillemin, P. Bertin, and C. Proy, "Real
Time Correlation-Based Stereo: Algorithm, Implementations and
Applications," INRIA Technical Report 2013, 1993.
[9] S. Birchfield and C. Tomasi, "Depth Discontinuities by Pixel-to-Pixel
Stereo," Proc. IEEE Int'l Conf. Computer Vision, pp. 1073-1080, 1998.
[10] http://vision.middlebury.edu/stereo/data/...
Merlin George received the B.Tech degree in Computer Science and
Engineering from M.G University, Kottayam, in
2006 and is now an M.Tech student in Computer
Science and Engineering at Kerala University,
Thiruvananthapuram. Her fields of interest include
stereo matching, 3D reconstruction, and
computational photography. She is a student member
of the IEEE.
Rejimol Robinson R.R received the B.Tech degree in Computer Science and
Engineering from the University of Kerala in 1999
and the M.Tech degree in Computer Science with specialization
in Digital Image Computing from the same
university in 2007. She is currently working
as a Senior Lecturer in Computer Science and
Engineering at the University of Kerala. Her research
interests include Digital Image Processing,
Pattern Recognition, Network Security and Intrusion
Detection Systems.