A COMPUTATIONAL FRAMEWORK FOR PERFORMANCE CHARACTERIZATION
OF 3-D RECONSTRUCTION TECHNIQUES FROM SEQUENCE OF IMAGES
By
Ahmed Eid
M.Sc., EE, Mansoura University, 1999
A Dissertation
Submitted to the Faculty of the
Graduate School of the University of Louisville
in Partial Fulfillment of the Requirements
for the Degree of
Doctor of Philosophy
Department of Electrical and Computer Engineering
University of Louisville
Louisville, Kentucky
December 2004
A COMPUTATIONAL FRAMEWORK FOR PERFORMANCE CHARACTERIZATION
OF 3-D RECONSTRUCTION TECHNIQUES FROM SEQUENCE OF IMAGES
By
Ahmed Eid
M.Sc., EE, Mansoura University, 1999
A Dissertation Approved on
by the Following Reading and Examination Committee:
Aly Farag, Ph.D., Dissertation Director
Georgy Gimel’farb, Ph.D.
John Naber, Ph.D.
Peter Quesada, Ph.D.
Udayan Darji, Ph.D.
Xiangqian Liu, Ph.D.
DEDICATION
This dissertation is dedicated to
my mother
and
my wife, Asmaa
for their patience, understanding, and support.
ACKNOWLEDGMENTS
I would first like to thank my advisor, Dr. Aly Farag, for his guidance and support
during this course of study. I am indebted to Dr. Farag, who has had a tremendous influence
on my career as a researcher. He has taught me a lot about effectively conducting research.
Many thanks to my other committee members Dr. Georgy Gimel’farb, from the
University of Auckland, Dr. John Naber, Dr. Peter Quesada, Dr. Udayan Darji, and Dr.
Xiangqian Liu, all of whom carefully read drafts of my dissertation and gave me valuable
comments, suggestions, and corrections. I greatly appreciate their time and flexibility.
Thanks especially to Dr. Gimel’farb for conducting valuable discussions about this research
during his sabbatical leave in the CVIP Laboratory.
I would like to thank all the members of the CVIP Laboratory at the University of
Louisville for their tremendous support, especially Chuck Sites, the system administrator.
Special thanks to Dr. Moumen Ahmed, who has been a great friend and teacher. I would
also like to thank all my friends for making my stay in Louisville enjoyable. Many thanks
to everyone who helped me during the first days of my stay in Louisville.
Finally, I would like to thank my family for their love, support, and confidence
without which this dissertation would not have been possible.
ABSTRACT
A COMPUTATIONAL FRAMEWORK FOR PERFORMANCE CHARACTERIZATION
OF 3-D RECONSTRUCTION TECHNIQUES FROM SEQUENCE OF IMAGES
Ahmed Eid
December 13, 2004
This dissertation addresses the problem of performance characterization of 3-D reconstruction techniques from a sequence of images. Although many 3-D reconstruction techniques have been proposed in the literature, the work done to quantify their performance is quite insufficient from a computational point of view. Qualitative evaluation methods remain dominant. Most current computational methods depend on unrealistic data sets and/or are applicable only to certain types of algorithms. This, in turn, has led to evaluation approaches of limited use and adoption. Certainly, this situation does not serve the goal of having standard, off-the-shelf methodologies able to quantify the performance of existing and future 3-D reconstruction techniques.
In this dissertation, we try to rectify this situation by proposing a unified computational framework for performance characterization of 3-D reconstruction techniques. The framework is three-fold. First, we introduce a new design for an experimental test-bed for the performance evaluation of 3-D reconstruction techniques. The setup integrates the functionality of 3-D laser scanners and CCD cameras, and it provides accurate, general-use, automatically generated and registered dense ground truth data together with the corresponding intensity data. The system bridges a gap in evaluation research, which suffers from a lack of such data sets.
Second, we introduce a new 3-D registration technique dedicated to the evaluation problem. 3-D registration is an important pre-evaluation step needed to obtain referenced evaluations. The proposed technique uses the image silhouettes instead of the actual 3-D reconstruction under test, which makes the registration results independent of the quality of that reconstruction. This feature is the major advantage of the proposed technique over conventional registration techniques.
Third, we propose different computational evaluation methodologies and corresponding measuring criteria. These testing methodologies are independent of the 3-D reconstruction under test. The methodologies are applied to the space carving technique, as a common 3-D reconstruction technique, to characterize its performance, and several concluding remarks on the space carving performance are provided.
Applications of the proposed framework, beyond the performance tracking and diagnosis provided in the space carving case study, include system design and data fusion. We propose a draft design for a 3-D modeling vision system based on the evaluation provided for the space carving technique. Moreover, a method for fusing laser-based and camera-based reconstructions is presented.
We believe that presenting this framework to the computer vision community will help measure progress in 3-D modeling research and provide diagnostic tools for current and future 3-D reconstruction techniques. To maximize the benefits of this work, the data sets used throughout this research will be made publicly available.
TABLE OF CONTENTS
DEDICATION
ACKNOWLEDGMENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
NOMENCLATURE
CHAPTER
I. INTRODUCTION
   A. The Problem
   B. 3-D Reconstruction Techniques: An Overview
      1. Stereo Approaches
      2. Volumetric Representation Approaches
         a. Shape from Silhouettes Approach
         b. Voxel Coloring (VC) Approach
         c. Space Carving (SC) Approach
         d. Generalized Voxel Coloring (GVC) Approach
   C. The Need for this Work
   D. The Contribution of this Work
   E. The Organization of the Dissertation
II. DATA ACQUISITION AND PREPARATION TECHNIQUES
   A. Previous Work
   B. System Overview
   C. Background Subtraction
   D. Camera Calibration
   E. Setup Accuracy
   F. Summary
III. A NOVEL TECHNIQUE FOR 3-D DATA REGISTRATION AS A PRE-EVALUATION STEP
   A. 3-D Data Registration
   B. 3-D Data Registration Through Silhouettes (RTS)
      1. An Overview of the Approach
      2. The Registration Procedure
         a. A Two-step Minimization
         b. Occluding Contours as Replacements to the Silhouettes
         c. An Evaluation Criterion for the RTS Approach
   C. Results and Discussion
   D. Summary
IV. PERFORMANCE EVALUATION: METHODOLOGIES AND MEASURES
   A. Classification of Evaluation Techniques
   B. Local Quality Assessment (LQA) Methodology
      1. Performance Evaluation Procedure
      2. Statistical Modeling of the Quality Index
   C. Image Reprojection (IR) Test
      1. Image Quality Measures
      2. The IR Test Procedure
   D. Silhouette-Contour Signature (SCS) Test Methodology
      1. Shape Histogram Signature
      2. The Error Ratio
      3. Boundary Signature
   E. Summary
V. Experimental Evaluation of the Space Carving Technique: A Case Study
   A. Shape Recovery by Space Carving
   B. Experimental Evaluation of Space Carving
      1. The Effect of the Number of Input Images
         a. LQA Test
         b. IR Test
         c. SCS Test
         d. Evaluation Remarks
      2. Effect of the Camera Pose
         a. LQA Test
         b. IR Test
         c. SCS Test
         d. Evaluation Remarks
      3. Effect of the Photo-consistency Threshold
         a. LQA Test
         b. IR Test
         c. The SCS Test
         d. Evaluation Remarks
      4. Effect of Noise
      5. Effect of the Initial Volume Resolution
   C. Summary
VI. APPLICATIONS (POST-EVALUATIONS)
   A. A 3-D Fusion Methodology
      1. The Closest Point Test
      2. The Closest Contour Test
      3. The Fusion Decision
      4. Experimental Results
   B. System Design
   C. Summary
VII. Conclusions and Future Directions
   A. Contribution to Data Acquisition and System Design
   B. Contribution to the 3-D Data Registration
   C. Contribution to the Performance Evaluation Methodologies and Measuring Criteria
   D. Contribution to the Experimental Evaluation of 3-D Reconstruction Techniques
   E. Applications
   F. Future Extension
      1. Data Acquisition
      2. 3-D Data Registration
      3. Testing Methodologies and Measures
      4. The Performance of the Space Carving Technique
REFERENCES
APPENDIX
I. PROJECTIVE GEOMETRY
   A. 2-D Projective Geometry
      a. Points and Lines in P2
      b. The Projective Plane P2
      c. 2-D Transformations
      d. Hierarchy of Transformations
   B. 3-D Projective Geometry
      e. Representation of Points in P3
      f. Representation of Planes in P3
      g. Representation of Lines in P3
      h. Plücker Matrices
II. CAMERA CALIBRATION
   A. Camera Modeling
   B. Anatomy of the Projection Matrix
CURRICULUM VITA
LIST OF TABLES
1. Convergence of the RTS approach to the desired values.
2. The effect of the number of input images on the performance of space carving.
3. The final number of voxels and the run time, on an ONYX2 SGI machine, for the 36-, 18-, 12- and 9-reconstructions.
4. The effect of the camera pose on the performance of space carving.
5. The effect of the photo-consistency threshold on the performance of space carving.
6. The effect of noise at different thresholds on the performance of space carving.
7. The run time of the space carving algorithm at different resolutions and different numbers of input images.
8. Initial specifications of a passive 3-D scanner based on the reconstruction by the space carving technique.
9. Summary of transformations.
LIST OF FIGURES
1. Simple stereo configuration.
2. Stereo images.
3. Camera configuration for the voxel coloring approach.
4. Basic idea of space carving.
5. Generalized voxel coloring.
6. The system setup.
7. Modes of operation of the setup.
8. Background subtraction versus intensity threshold.
9. Background subtraction.
10. Camera calibration.
11. System accuracy.
12. An example of the registration under distortion.
13. Registration errors with different error criteria and matching strategies.
14. A difficult 3-D data registration case.
15. A degenerate case for silhouettes alignment.
16. Registration Through Silhouettes (RTS) results.
17. Registration parameters.
18. 3-D registration visual results.
19. 3-D registration quantitative results.
20. Rendered views to show the alignment of the ground truth contours (blue) and the input image contours (red).
21. The convergence of the registration parameters to the desired values.
22. RTS evaluation.
23. Rendered views to show the alignment, with known parameters, of the ground truth contours (blue) and the image contours (red).
24. Local Quality Assessment (LQA) methodology.
25. The LQA test applied to two different reconstructions registered to the ground truth data.
26. The IR test applied to a reconstruction of 12 input images.
27. IR test visual results.
28. An example of the 17 possible configurations describing the shape boundaries.
29. A number of 17 possible configurations for three adjacent contour points.
30. Examples of basic geometric shapes.
31. Shape signatures for the rectangle and the rotated rectangle shapes.
32. Shape signatures for the circle and the ellipse shapes.
33. Examples of ground truth and measured shapes.
34. Shape signatures for shapes in Figure 33a.
35. Shape signatures for shapes in Figure 33b.
36. LQA test results when the number of input images to the space carving is changed.
37. The quality index for two different reconstructions.
38. IR test results when the number of input images to the space carving is changed.
39. Rendered cutting-views for different reconstructions when the number of input images to the space carving is changed.
40. SCS test results when the number of input images to the space carving is changed.
41. LQA test results when the camera pose is changed.
42. IR test results when the camera pose is changed.
43. Rendered cutting-views for different reconstructions when the camera pose is changed.
44. SCS test results when the camera pose is changed.
45. LQA test results when the photo-consistency threshold is changed.
46. IR test results when the photo-consistency threshold is changed.
47. Cutting-view images for different reconstructions at different thresholds.
48. SCS test results when the photo-consistency threshold is changed.
49. LQA test results when Gaussian noise is added and the photo-consistency threshold is changed.
50. IR test results when Gaussian noise is added and the photo-consistency threshold is changed.
51. Cutting-view images for different reconstructions at different thresholds with Gaussian noise added to the input images.
52. SCS test results when Gaussian noise is added and the photo-consistency threshold is changed.
53. The effect of the initial volume resolution on the space carving reconstruction quality.
54. Basic idea of the 3-D fusion methodology.
55. Screen captures of a 3-D reconstruction by a 3-D laser scanner.
56. Silhouette images for the 3-D fusion technique.
57. Contour images for the 3-D fusion technique.
58. Snapshots for 3-D reconstruction.
59. A draft design for a passive 3-D scanner.
60. Top views for the house object.
61. Another draft design for a passive 3-D scanner.
62. Representation of points and lines in P2.
63. The central projection as a planar projectivity.
64. Computing homographies. (a) The ceiling tiles image at the CVIP lab; (b) the rectified image.
65. Computing homographies. (a) Kent School at the University of Louisville; (b) the rectified image.
66. 2-D transformations. (a) Original image; (b) after isometry; (c) after similarity; (d) after affinity; (e) after projectivity.
67. Modeling of a central projection camera.
68. The geometrical interpretation of the projection matrix columns.
69. The geometrical interpretation of the projection matrix rows.
NOMENCLATURE
The following convention is used throughout the dissertation. Matrices, vectors, and
3-D space points are expressed in bold upper case letters, e.g., P. The 2-D image points are
expressed in bold lower case letters, e.g., x. The elements of the matrices, vectors, points,
and functions are expressed in italic letters. Below is a list of symbols commonly used in
this text.
X 3-D world point
x 2-D image point
Pk projection matrix at view k
Pe probability estimate for background subtraction
Pq quality estimate at the quality index q
D camera extrinsic parameters matrix
K camera intrinsic parameters matrix
R rotation matrix (3×3 orthonormal matrix)
t translation vector (3-vector)
β rotation angle around the Y-axis
M measured data set of the 3-D reconstruction under-test
G ground truth data set
T 3-D Euclidean transformation
E error criterion
χ2 Chi square test measure
SNR Signal to Noise Ratio
Q quality index
Er error ratio
CHAPTER I
INTRODUCTION
In computer vision, 3-D scene reconstruction from multiple images is a challenging and interesting problem. It is interesting because humans naturally solve the depth estimation problem easily and efficiently. It is challenging because, among the many solutions proposed, no single one matches the completeness of the human solution.
Of course, there are good methods today, and there will be others in the future. To guide research towards better solutions, it is important to characterize the performance of the existing ones. How good are these solutions? What is missing in order to obtain better ones? These are among the questions that should be answered under the topic of performance evaluation.
Unfortunately, although the performance evaluation of 3-D reconstruction techniques is an important topic, it is rarely treated as stand-alone research. As a result, many evaluation methodologies are "algorithmic" in nature, i.e., they are not independent of the algorithms under test. This, in turn, has led to evaluation approaches of limited use and adoption. Certainly, this situation does not serve the goal of having standard, off-the-shelf methodologies able to quantify the performance of existing and future 3-D reconstruction techniques.
In this study, we rectify this situation by introducing a unified framework for the performance evaluation of 3-D reconstruction techniques. The framework includes designs for an experimental test-bed, performance pre-evaluation/evaluation methodologies, and quality measures. In addition, we propose different applications of this framework in the data fusion of 3-D reconstructions and in system design. The ultimate goal of this study is to have an impact on the progress of the field and to bring some standardization to the evaluation process.
A. The Problem
Formally, we want to solve the following problem:
Given
1- a set M as a non-empty finite set of measured points m such that:

M = {m : m ∈ R³},

where M is generated by the 3-D reconstruction technique X, the technique under test;

2- a ground truth (gold standard) set G as a non-empty finite set of reference points g such that:

G = {g : g ∈ Rⁿ, n ∈ {1, 2, 3}},

where G can be a 3-D volume, 2-D images, or a 1-D vector of parameters that describe a volume or images.
Required
Quantify the performance of technique X.
If n ≠ 3, then we define the set D as the set of data points to be matched with G according to the transformation or criterion C such that:

D = {d : d = C(m), d ∈ Rⁿ, n ∈ {1, 2}}
If M and G are not aligned to each other, then a registration function or transformation T
is required such that the energy E is minimal, where E can be defined as:
E = ∑ᵢ d²(mᵢ, T(gᵢ)), mᵢ ∈ M, gᵢ ∈ G,

and d denotes the Euclidean distance.
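The energy E translates directly into code. The following is a minimal NumPy sketch under the assumption that the correspondences mᵢ ↔ gᵢ are already known and that T is a rigid transformation (R, t); the function name and the toy points are ours, not from the dissertation:

```python
import numpy as np

def registration_energy(M, G, R, t):
    """Sum of squared Euclidean distances between each measured point m_i
    and the transformed ground-truth point T(g_i) = R g_i + t.
    M, G: (n, 3) arrays with known row-wise correspondences."""
    TG = G @ R.T + t                      # apply the Euclidean transformation T to G
    return float(np.sum((M - TG) ** 2))   # E = sum_i d^2(m_i, T(g_i))

# Identity transform on identical point sets yields zero energy.
G = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
E = registration_energy(G, G, np.eye(3), np.zeros(3))   # E == 0.0
```

In practice the correspondences are unknown, and minimizing E over T is itself an iterative problem (e.g., ICP-style alternation between matching and alignment), which Chapter III addresses through silhouettes.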
B. 3-D Reconstruction Techniques: An Overview
In this section, we provide an overview of the common 3-D reconstruction tech-
niques. This includes two main sets of 3-D reconstruction techniques, the stereo and the
volumetric techniques.
1. Stereo Approaches
Stereo vision refers to the ability to infer information on the 3-D structure and distance of a scene from two or more images taken from different views [1]. From the computational viewpoint, a stereo system has to solve two problems: the correspondence problem and the reconstruction problem. The correspondence problem is the more challenging of the two: the projections of a scene element must be matched in all, or a subset of, the images that see this element. While the matching process is difficult, the reconstruction is easily solved using depth triangulation.
A simple stereo vision configuration is shown in Figure 1. The system consists of two coplanar cameras with a baseline distance b, the distance between the optical centers Ol and Or of the two cameras. To solve the correspondence problem, the projections xl and xr of the 3-D point X in the left image Il and the right image Ir, respectively, must be matched. In the simple stereo configuration the problem is easier: the y-coordinates of corresponding image points are equal, so the search for the matched positions xl and xr is restricted to the x-direction. Once xl and xr are determined, the disparity Disp = xl − xr can be related to the depth Z of point X by:

Disp = fb / Z    (1)
FIGURE 1 – Simple stereo configuration.
Typically, stereo algorithms represent shape using the depth map, or the disparity map, which is also known as the 2½-D sketch [2]. The depth map is an image that encodes the above depth-disparity relation. Figure 2 shows an example of a simple stereo pair and the corresponding depth map [3].
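Equation (1) inverts directly to Z = fb/Disp, so a depth map follows from a disparity map once the focal length f and baseline b are known. A minimal sketch (the numeric f and b values are illustrative only, not calibration results from this work):

```python
import numpy as np

def depth_from_disparity(disp, f, b):
    """Z = f * b / Disp for the simple (coplanar) stereo configuration.
    disp: disparity map in pixels; f: focal length in pixels; b: baseline."""
    disp = np.asarray(disp, dtype=float)
    Z = np.full_like(disp, np.inf)   # zero disparity -> point at infinity
    nz = disp > 0
    Z[nz] = f * b / disp[nz]
    return Z

Z = depth_from_disparity([[4.0, 8.0]], f=400.0, b=0.1)   # -> [[10., 5.]]
```

The reciprocal relation also explains the baseline trade-off discussed below: for a fixed depth Z, a larger b produces a larger (and hence more accurately measurable) disparity.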
In the general stereo configuration, the image planes are not coplanar, which makes the correspondence problem more difficult than in the simple configuration. If no constraints are applied, it is necessary to search the entire right image for the point xr matching xl. The search space can be limited to a general line in the right image if the epipolar constraint [4] is applied. Furthermore, image rectification algorithms such as [5] can be used to make the epipolar lines parallel and aligned with the x-axis. This rectification step reduces the problem to the simple stereo configuration.
In general, stereo approaches operate either by edge-feature matching or by area-based matching. Feature-based stereo approaches are useful because they capture the important geometry of the object; however, the major drawback of most of them is the low density of the output. A dynamic programming approach [4] matches features to obtain a dense reconstruction, but it can only be applied on a scanline-by-scanline basis. More recently, Boykov et al. [6] and Roy and Cox [7] have solved the inter-scanline problem using
graph cuts techniques.

FIGURE 2 – Stereo images. (a) A pair of stereo images and (b) the corresponding depth map.
In contrast to feature-based stereo, area-based stereo provides dense reconstructions. Okutomi and Kanade [8] used a variable-size correlation window to generate dense depth maps. However, area-based matching usually fails when applied to surfaces with large textureless areas.
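For a rectified pair, area-based matching can be sketched as a brute-force sum-of-squared-differences (SSD) search over fixed-size windows along each scanline. This toy implementation is our own illustration of the principle (not Okutomi and Kanade's variable-window method):

```python
import numpy as np

def disparity_ssd(left, right, max_disp, w=1):
    """Per-pixel disparity by minimizing the SSD over a (2w+1)x(2w+1)
    window along the same scanline. Assumes rectified grayscale images;
    brute-force, for illustration only."""
    left, right = np.asarray(left, float), np.asarray(right, float)
    h, wid = left.shape
    disp = np.zeros((h, wid), int)
    for y in range(w, h - w):
        for x in range(w, wid - w):
            patch = left[y - w:y + w + 1, x - w:x + w + 1]
            best, best_d = np.inf, 0
            for d in range(0, min(max_disp, x - w) + 1):
                cand = right[y - w:y + w + 1, x - d - w:x - d + w + 1]
                ssd = np.sum((patch - cand) ** 2)
                if ssd < best:
                    best, best_d = ssd, d
            disp[y, x] = best_d
    return disp

# Synthetic check: the left image is the right image shifted by 2 pixels,
# so interior pixels should recover a disparity of 2.
right = np.tile(np.arange(10.0) ** 2, (8, 1))   # distinct column values
left = np.zeros_like(right)
left[:, 2:] = right[:, :-2]
disp = disparity_ssd(left, right, max_disp=4, w=1)
```

On a textureless region every candidate window gives a near-identical SSD, which is exactly why area-based matching fails there, as noted above.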
Another challenging problem for stereo approaches is occlusion, where scene elements appear in one image but are occluded in the other. A further difficulty is the constraint on the baseline b. A small b (very close cameras) permits good correspondence but degrades the depth accuracy; a large b makes the correspondence problem more difficult, while the estimated depth becomes more accurate. As we will see in the next section, the volumetric approaches suffer from neither the occlusion problem nor the baseline problem. We refer the reader to the survey in [9] for more on stereo approaches.
2. Volumetric Representation Approaches
Volumetric modeling of a scene assumes there is a known, bounded volume in which the object of interest lies. The most common representation of this volume is a regular tessellation of cubes, called voxels, in Euclidean 3-D space.
a. Shape from Silhouettes Approach This method provides an approximate reconstruction of an object from its silhouette images. A silhouette image is a binary image whose value at a point indicates whether or not the visual ray from the optical center through that point intersects the object surface in the scene. The best approximation of the object is obtained from an infinite number of silhouettes captured from all views surrounding the object; this best approximation is called the visual hull [10]. Recent implementations of the shape from silhouettes approach use the concept of voxel projection employed in the following volumetric techniques. Although shape from silhouettes provides only an approximate surface, it has wide
applications in modeling human motion [11].
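In the voxel-projection formulation, a voxel is kept only if it projects inside every silhouette. A minimal sketch, assuming known 3×4 projection matrices; the orthographic-style matrix in the example is purely illustrative:

```python
import numpy as np

def silhouette_carve(voxels, silhouettes, P_list):
    """Keep the voxels whose projection lies inside every silhouette.
    voxels: (n, 3) voxel centers; silhouettes: list of binary images;
    P_list: list of 3x4 projection matrices, one per view. Assumes all
    voxels project with positive homogeneous w (in front of the camera)."""
    keep = np.ones(len(voxels), dtype=bool)
    Xh = np.hstack([voxels, np.ones((len(voxels), 1))])  # homogeneous coords
    for S, P in zip(silhouettes, P_list):
        x = Xh @ P.T                                     # project: x = P X
        u = np.round(x[:, 0] / x[:, 2]).astype(int)      # image column
        v = np.round(x[:, 1] / x[:, 2]).astype(int)      # image row
        inside = (u >= 0) & (u < S.shape[1]) & (v >= 0) & (v < S.shape[0])
        keep &= inside                                   # off-image -> carved
        keep[inside] &= S[v[inside], u[inside]] > 0      # outside silhouette -> carved
    return voxels[keep]

# Toy example: orthographic-style projection onto the xy-plane; only the
# voxel projecting onto the single foreground pixel survives.
P = np.array([[1.0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]])
S = np.zeros((10, 10)); S[1, 1] = 1
kept = silhouette_carve(np.array([[1.0, 1, 0], [5.0, 5, 0]]), [S], [P])
```

With finitely many views this intersection of silhouette cones only approximates the visual hull; concavities that never show up on any silhouette can never be carved.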
b. Voxel Coloring (VC) Approach Seitz and Dyer [12] presented a voxel coloring approach that traverses a discretized volume of voxels and, assuming Lambertian surfaces, decides the color consistency of each voxel in all views from which it is visible. The scene is assumed to be contained in that volume, in which all voxels are initially opaque. Voxels with inconsistent colors are made transparent; the remaining color-consistent voxels stay opaque and represent the scene under reconstruction. This approach has several advantages over existing stereo approaches: (i) it accounts for the occlusion problem; (ii) unlike stereo, it places no constraints on the baseline distances of the cameras; (iii) it provides dense reconstructions; and (iv) it provides synthetic views of photo-realistic quality for many virtual reality applications [12].
However, to determine the visibility of each voxel, Seitz and Dyer [12] imposed what they called the ordinal visibility constraint on the camera locations. This means
that the positions of the cameras are limited to one side of the scene. This makes the visibility check of each voxel easy, since the voxels are visited in a single scan of planes that are successively further from the cameras. However, this constraint limits the use of this approach in cases where a complete model of the scene, seen from all directions, is required. Figure 3 shows the camera configuration imposed by the ordinal constraint. Other configurations that satisfy the ordinal constraint, such as distributing the cameras above the 3-D object, are helpful for obtaining a complete model.
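For a Lambertian surface, the per-voxel color consistency decision is commonly a threshold on the spread of the colors the voxel projects to in the views that see it. The following is an illustrative sketch in that spirit; the standard-deviation statistic and the threshold value are assumptions, not the exact criterion of [12]:

```python
import numpy as np

def photo_consistent(colors, threshold=15.0):
    """Lambertian photo-consistency test in the voxel coloring style
    (the statistic and threshold here are illustrative assumptions).

    colors    : (k, 3) RGB samples of one voxel's projections in the
                k views where the voxel is visible
    threshold : maximum allowed standard deviation per color channel
    """
    colors = np.asarray(colors, dtype=float)
    if len(colors) < 2:        # seen in fewer than two views: keep it
        return True
    return bool(np.all(colors.std(axis=0) <= threshold))
```

A voxel failing this test is declared transparent; a passing voxel stays opaque and is assigned, for example, the mean of its sampled colors.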
FIGURE 3 – Camera configuration for the voxel coloring approach.

c. Space Carving (SC) Approach Kutulakos and Seitz [13] removed the constraint on camera positions in the voxel coloring approach by proposing the space carving framework. They introduced a multiple plane-sweep procedure to allow unconstrained camera positions. Typically, these sweeps are along the positive and negative directions of the three axes. Space carving forces the scans to proceed from near to far relative to the cameras that see the voxel under the consistency test. The procedure continues until all voxels satisfy the photo-consistency requirement, see Figure 4. Kutulakos and Seitz proved that the algorithm finds
the unique color consistent model that is a superset of any consistent model. They called
this unique model the photo hull. The space carving procedure is summarized as follows:
Space Carving Algorithm:
Space carving starts with an initial volume, V, that includes the object(s) to be re-
constructed. This 3-D space is then discretized into a finite set of voxels v1, v2, ..., vn. The
idea is to successively carve (remove) some voxels until the final 3-D shape, V*, agrees
with all the input images.
Step 1: Initialize V.
Step 2:
• Determine the set of visible voxels Vis(V) on the surface of V.
• Project each voxel v on Vis(V) to the different images where v is visible.
• Determine the photo-consistency of each voxel v on Vis(V).
Step 3: If every voxel in Vis(V) is photo-consistent, set V* = V and terminate. Otherwise, set V = V − {non-photo-consistent v's} and return to Step 2.
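The carving loop of Steps 2–3 can be sketched as follows. The visibility computation and the per-voxel photo-consistency test are abstracted behind callables, so this is an illustrative sketch of the control flow rather than the authors' implementation:

```python
def space_carve(voxels, surface_voxels, is_consistent):
    """Iterate Steps 2 and 3 of the space carving procedure.

    voxels         : iterable of voxel ids forming the initial volume V
    surface_voxels : function V -> set, returning Vis(V), the visible
                     surface voxels of V (abstracted here)
    is_consistent  : function voxel -> bool, photo-consistency of one
                     voxel against the images that currently see it
    """
    V = set(voxels)
    while True:
        carve = {v for v in surface_voxels(V) if not is_consistent(v)}
        if not carve:           # all surface voxels photo-consistent
            return V            # this is V*
        V -= carve              # remove inconsistent voxels, re-check
```

Because voxels are only ever removed, the loop terminates after at most n iterations, and the result contains every photo-consistent model, matching the photo hull property stated above.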
(a) (b)
FIGURE 4 – Basic idea of space carving. Voxels are projected to the input images usingtheir respective projection matrices. C1, C2 and C3 represent the optical centers of the threecameras. (a) Consistent voxels are assigned the color of their projections. (b) Inconsistentvoxels are removed from the volume.
d. Generalized Voxel Coloring (GVC) Approach While space carving never carves voxels it should not, it is likely to produce a model that includes some inconsistent voxels. This happens because space carving uses only a subset of the cameras that see a voxel, even though other cameras may come to see the voxel once some of its surrounding voxels are carved, as shown in Figure 5. In contrast, the generalized voxel coloring (GVC) approach [14] guarantees that every voxel retained in the final model has been checked for color consistency against all images that see it.
GVC works with arbitrary camera positions, which is why it is a generalization of the voxel coloring approach. In addition, it improves on SC by checking the visibility of each voxel through all the images that see it. GVC maintains a data structure that records, for each pixel, the address of the closest opaque voxel along the pixel's visual ray. To save computation, this data structure may be updated less frequently than after every carved voxel, at the possible cost of additional iterations.
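A minimal sketch of such a per-image data structure (often called an item buffer) is given below; the `project` and `depth` callables abstract the camera model and are assumptions for illustration, not part of the GVC paper's interface:

```python
def build_item_buffer(opaque_voxels, project, depth):
    """Per-image item buffer in the GVC style: for every pixel, record
    the closest opaque voxel along that pixel's visual ray.

    opaque_voxels : iterable of voxel ids currently opaque
    project       : voxel -> (x, y) pixel the voxel projects to
    depth         : voxel -> distance from the camera center
    """
    nearest = {}                        # pixel -> (depth, voxel)
    for v in opaque_voxels:
        px = project(v)
        d = depth(v)
        if px not in nearest or d < nearest[px][0]:
            nearest[px] = (d, v)        # keep only the nearest voxel
    return {px: v for px, (d, v) in nearest.items()}
```

A voxel is then visible in an image exactly when some pixel of that image maps back to it, which is how GVC gathers all cameras that see a voxel before testing its consistency.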
Faugeras and Keriven [15] have presented a volumetric technique based on the level set approach. An initial scene-bounding surface, represented in voxels, evolves towards the objects in the scene until a matching criterion based on normalized cross correlation is minimized. A good survey of the volumetric approaches to 3-D reconstruction from multiple views can be found in [16, 17].

(a) (b)

FIGURE 5 – Generalized voxel coloring. (a) A voxel can be unseen by the camera in the early carving stages; (b) the voxel can be seen in the final stages.
It can be inferred from the above discussion of 3-D reconstruction techniques that the volumetric approaches are promising and have many advantages over classical stereo approaches.
C. The Need for this Work
The 3-D reconstruction from a sequence of images finds many applications in mod-
ern computer vision systems such as virtual reality, vision-guided surgeries, autonomous
navigation, medical studies and simulations, reverse engineering, and architectural design.
The very basic requirement of these applications is to find accurate and realistic reconstruc-
tions.
Many 3-D reconstruction approaches have been proposed to fulfill the requirements of such important applications. However, the work done to validate the performance of these approaches is quite insufficient: their performance is either inspected visually or, most often, computed using synthetic data sets. Since realism is a basic requirement of most computer vision applications, real data sets should be used during the evaluation phase of 3-D reconstruction techniques. To quantitatively evaluate a given 3-D reconstruction, dense ground truth data should be available. However, generating dense ground truth data may be difficult or laborious [18]. To get around this difficulty, sparse ground truth data can be used [19]. However, testing 3-D reconstructions with such data does not meet the requirements of applications such as virtual reality.
Szeliski and Zabih [20] used manually generated dense ground truth data in their experimental comparison of stereo algorithms. However, most of these data sets are generated for fronto-parallel scenes, which are considered special cases. Recently, Scharstein and Szeliski [9] have extended such data sets to include surfaces that are more challenging for stereo techniques, such as slanted surfaces. These data sets [3] are represented in the depth map form, which is specific to the simple stereo configuration. This puts a major limitation on using these data sets for other 3-D reconstruction techniques or even for general stereo configurations.
Mulligan et al. [21] have presented an experimental setup that provides dense ground
truth data for stereo tele-presence applications. This system is considered the first to pro-
vide dense 3-D ground truth data [21]. Although these data sets are generated in the 3-D
form, they have limited applicability (they are only applicable to the stereo approaches).
Before evaluation, 3-D registration techniques should be available to align the ground truth data with the data under test. Using conventional registration methods, the outcome of the registration process is questionable when the measured data set is a corrupted version of the ground truth data set. Unfortunately, registration of such data types is an unavoidable step in most evaluation procedures.
One of the common techniques used to solve the registration problem is the Iterative Closest Point (ICP) technique [22]. The algorithm is simple and efficient; however, it needs a good initial estimate, otherwise it may get stuck in a local minimum. In addition, the algorithm is sensitive to statistical outliers [22]. Many techniques have been introduced in the literature to make the ICP method robust, e.g. [23–25]. However, other distortion models were not treated by these studies.
Other 3-D registration techniques that rely on the selection of distinct features in the data under registration could be used instead of the ICP approaches [26–29]. However, selecting and matching these features is a challenging task when the data sets are corrupted. Manual selection and matching of features could be a solution under these circumstances, as in [21].
In general, to solve the evaluation problem in a unified framework three main com-
ponents should be available: (i) an experimental testbed to provide general-use data sets,
(ii) pre-evaluation techniques for preparing data for the evaluation process with minimal
undesirable effects on the given data, and (iii) performance evaluation methodologies and
measures. Having these components within a unified framework eases the solution and avoids the unnecessary complexities that would arise if they were treated separately.
This dissertation provides a unified computational framework for performance char-
acterization of the 3-D reconstruction techniques. It provides new designs for the main
components of the general performance evaluation system. In addition, the applicability
of these designs is examined in three ways: (i) application to the performance evaluation
of a recent common 3-D reconstruction technique, the space carving, (ii) application to the
data fusion of different reconstructions, and (iii) application to the design of a passive 3-D
scanner.
The general objective of this study is to allow progress in 3-D reconstruction research to be measured. This helps in quantifying the performance of existing techniques, analyzing their errors, and proposing solutions to enhance their performance.
D. The Contribution of this Work
This dissertation introduces a new design for an experimental setup that integrates
the functionality of laser scanners and CCD cameras. The system is able to collect very
dense ground truth data. The system contains very efficient data acquisition modules that
guarantee the generation of high quality intensity data sets. These data are then calibrated, segmented, and automatically registered to the ground truth data. The resulting data sets can be used by different 3-D reconstruction techniques, including both stereo and volumetric approaches. A database of such data sets will be made publicly available to bridge the gap caused by the unavailability of global experimental data sets.
A novel technique for 3-D data registration is presented. This technique is dedicated
to the evaluation procedures that aim to localize errors in the data under-test. The approach,
unlike the conventional 3-D data registration techniques, does not rely on the presence of
the 3-D reconstruction under test during the registration phase. This gives a major advan-
tage to this approach since the 3-D reconstruction could be of low quality that might add
difficulties to any 3-D registration technique. In addition, if the actual 3-D reconstructions
under test were used in the registration phase, then some errors that the evaluation process
tries to investigate might disappear during the minimization step used by any 3-D registra-
tion technique. The proposed approach employs silhouette images to align the given data
sets. Undistorted silhouette images can be generated easily, hence permitting good data
sets for the registration process. The approach is simple and efficient and can be applied to
any 3-D registration problem assuming the availability of a calibrated sequence of images
describing one of the data sets under registration.
Three testing methodologies are presented. The first test is the Local Quality As-
sessment (LQA) test. This test quantifies the performance of a given 3-D reconstruction
with respect to a reference 3-D reconstruction provided by the 3-D laser scanner. It is
designed to investigate local errors in the given 3-D reconstruction by dividing it into different patches and measuring the quality of each patch. This makes the error analysis much easier and permits the integration of different 3-D reconstruction techniques based on the results of this test.
An Image Re-projection (IR) testing methodology is presented to cope with the un-
availability of 3-D ground truth data. The test uses the acquired images as the reference of
comparison with the corresponding images re-projected from the given 3-D reconstruction.
This test also measures the applicability of the 3-D reconstruction techniques for virtual
reality problems.
To avoid errors due to intensity variations and the re-projection process in the IR test, we propose a Silhouette-Contour Signature (SCS) methodology that extracts shape features from silhouette and contour images and permits the inclusion of distinct, cutting, views from the 3-D ground truth data.
A classification criterion for testing methodologies is also presented. Based on this
criterion we can classify the tests that measure the performance of the 3-D reconstruction
techniques into 24 types of tests. This classification will eventually help in establishing a standard ranking that reflects the validity of such tests.
An experimental evaluation of the space carving, as a recent common technique for
3-D reconstruction from a sequence of images, is presented. The evaluation procedures
used in this study are based on the presented performance evaluation framework. In this
study, we track the response of the space carving to the changes in the key controlling pa-
rameters of the algorithm.
Two applications for the performance evaluation framework are presented. The first
application is the 3-D data fusion of different 3-D reconstructions. A fusion technique
based on the image contour comparison is presented. The technique rectifies the 3-D re-
construction based on the closeness of its projected contours to the ground truth contours.
The method is used to combine reconstructions generated by a 3-D laser scanner and the
space carving technique.
The second application is the system design. A draft design for a passive 3-D scan-
ner is presented. The design is based on the experimental results of evaluating the perfor-
mance of the space carving. The proposed scanner should be able to reconstruct surfaces that commercial 3-D laser scanners may not be able to reconstruct.
E. The Organization of the Dissertation
The remaining chapters of this dissertation are organized as follows:
• Chapter II: introduces a new design of a testing setup with other related components
such as camera calibration, object segmentation, and system accuracy. The chapter
provides discussions and results of the implemented components.
• Chapter III: introduces a novel 3-D registration methodology dedicated to the evalu-
ation problem.
• Chapter IV: introduces three different methodologies for the performance character-
ization of 3-D reconstruction techniques. Results and discussions are also presented.
• Chapter V: provides a study of the performance evaluation of the space carving tech-
nique when the key controlling parameters of this algorithm are changed.
• Chapter VI: provides applications of the performance evaluation framework, this in-
cludes data fusion of different reconstructions and a draft design of a 3-D scanner.
• Chapter VII: provides the conclusions of this dissertation and future extensions.
• Appendix A: provides a brief introduction to projective geometry.
• Appendix B: provides a background for the camera calibration process as an impor-
tant component of the proposed system.
CHAPTER II
DATA ACQUISITION AND PREPARATION TECHNIQUES
In this chapter, we present a new design for an experimental test-bed for 3-D re-
construction techniques. The setup integrates the functionality of 3-D laser scanners and
CCD cameras. The setup provides accurate, general-use, automatically generated and reg-
istered dense ground truth data and their corresponding intensity data. Designs of object
segmentation and camera calibration submodules are also provided. Moreover, we present
an analysis of the system accuracy under calibration errors caused by deviations from the pre-assumed camera-motion mechanism.
A. Previous Work
To quantitatively evaluate a given 3-D reconstruction, dense ground truth data should
be available. However, generating dense ground truth data may be difficult or labori-
ous [18]. To get around the difficulty of generating dense ground truth data, sparse data can
be used [19]. However, testing the 3-D reconstructions using such data does not achieve
the requirements needed by many computer vision applications (e.g. virtual reality, reverse
engineering, or architectural design).
Szeliski and Zabih [20] used manually generated dense ground truth data in their
experimental comparison of stereo algorithms. However, most of these data sets are generated for fronto-parallel scenes, which are considered special cases. Recently, Scharstein and Szeliski have extended such data sets to include surfaces that are more challenging for stereo techniques, such as slanted surfaces [3]. These data sets are represented in the form of depth maps, which is specific to the simple stereo configuration.
This puts a major limitation on using these data sets for other 3-D reconstruction techniques
or even for general stereo configurations.
Mulligan et al. [21] have presented an experimental setup that provides dense ground
truth data for stereo tele-presence applications. This system is considered the first to pro-
vide dense 3-D ground truth data [21]. Although we propose a similar system of generating
ground truth data, there are several distinctions:
• Mulligan's setup uses only fixed cameras, while we use a rotating camera connected to the scanner head. This feature lets our setup acquire a large number of images that cover the full 0–360° range. These data can be used for both stereo and volumetric approaches, not only for stereo approaches as in Mulligan's setup.
• The registration procedure used in Mulligan’s setup is very complicated since the
scanner should reconstruct a calibration pattern, then a stereo approach should do the
same. Matched points between the two reconstructions are selected manually to find
the registration parameters. However, in our setup we automatically register the data
disregarding the 3-D reconstruction technique under test.
• To acquire views at different rotations of an object with Mulligan's setup, the object must be rotated manually. This introduces inaccuracy into the calibration process. We address this issue in this chapter for our system and provide an analytical upper bound on such errors: to keep errors below 0.5 pixels in the acquired images, the error in the assumed rotation angle must be less than 0.2°. In our setup, the precision is determined by the motion mechanism of the 3-D laser scanner, and an error of 0.1° is typical for commercial 3-D laser scanners.
• In Mulligan's setup, calibrating the input images is a process separate from the calibration used in the registration phase. In addition, a separate scan must be performed for each rotation of the object to generate the ground truth data. In our system, by contrast, we scan the object and calibrate the camera only once.
In general, Mulligan’s setup requires manual adjustments in one step or another during
the data acquisition process. This increases chances of generating inaccurate data and
extending the acquisition time. To overcome these limitations, we introduce a new design
for a 3-D test-bed. The setup provides accurate, general-use, automatically generated and
registered dense ground truth data and their corresponding intensity data. These data sets
are available for the computer vision community through the CVIP laboratory, University
of Louisville, ftp site at ftp://egypt.spd.uofl.edu/pub/Eva Data.
B. System Overview
The proposed system setup consists of a 3-D laser scanner and a CCD camera
mounted on a metal arm with multiple joints that is attached to the scanner head. A mono-
color, usually black, screen is attached to the scanner head facing the CCD camera such that
the screen constitutes a fixed background for the object under reconstruction. The structure of the mono-color screen and the motion mechanism ensure a fixed background that facilitates the object segmentation task [30].
The shaft on which the scanner head is mounted is controlled in speed and rotation angle so that images are captured at specific locations on a circular path. A sequence of NI images I0, I1, ..., INI−1 can be acquired by the calibrated camera. In addition, the scanner generates a 3-D scan of the object by rotating 360°. This reference model is used as the ground truth for the evaluation process. The field of view of the camera is set to cover the same object size as the laser scanner. The system setup and the motion mechanism are shown in Figure 6a and Figure 6b, respectively.
The system has two modes of operation:
1. continuous mode, where the 3-D scanner works in its normal operation to generate a 3-D model of the object, or the rotating camera acquires a video of the object of interest. This video is used to generate panoramic images of the object, which can be used as inputs for panoramic stereo techniques such as [31].
2. discrete mode, where images are acquired at pre-defined locations in a circular path.
These images are used as the inputs to different 3-D reconstruction techniques from
a sequence of images.
Figure 7 shows examples of acquired images in discrete mode in (a), and continuous mode
in (b).
Other design aspects of the system such as background subtraction, camera cali-
bration and system accuracy will be presented in the following sections.
C. Background Subtraction
Efficient object segmentation permits good 3-D modeling and reduces the outliers
in the final model. For fair evaluation, we should make sure that the segmentation is not
a degradation factor in the overall performance of the vision technique. For this reason,
we propose a hardware solution by attaching a mono-color screen to the rotating shaft of
the 3-D scanner. This fixes the background of the acquired images and facilitates object
extraction. However, light variations and reflections could violate the mono-color assump-
tion. Therefore, we propose a secondary solution: a background subtraction technique. The proposed technique is a modification of the algorithm by Elgammal [32], which subtracts the background from successive frames in a video sequence assuming a fixed background scene.
In the proposed algorithm, a sequence of background images are acquired before-
hand, then the data images are acquired. A probabilistic model for the difference between
(a)

(b)

FIGURE 6 – The system setup. A CCD camera is mounted on the 3-D scanner head. A screen is attached to the scanner head opposite the camera to guarantee a fixed background for the test object. (a) Snapshot of the system, and (b) the system diagram.
(a)

(b)

FIGURE 7 – Modes of operation of the setup. (a) Discrete: sequence of images, and (b) continuous: panoramic images.
background pixels is assumed: a Gaussian of zero mean and covariance Σ. The probability density estimate of the difference distribution Pe(xd − xb) is defined as

  Pe(xd − xb) = (1 / ((2π)^(3/2) |Σ|^(1/2))) exp(−(1/2)(xd − xb)^T Σ^(−1) (xd − xb))     (2)

where xd and xb denote the data and the background pixel intensities, respectively.

If we assume that the RGB components of color images are independent, with a different σj² for the j-th color component, then

  Σ = diag(2σ1², 2σ2², 2σ3²)     (3)

and the density estimate can be written as

  Pe(xd − xb) = ∏_{j=1}^{3} (1 / ((4π)^(1/2) σj)) exp(−(xdj − xbj)² / (4σj²))     (4)

Using this probability estimate, a pixel is considered a foreground pixel if Pe(xd − xb) < Tp, where Tp is a global threshold. The value of Tp is selected based on the histogram of Pe(xd − xb) values.
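The thresholding rule can be sketched as follows. This is an illustrative vectorized implementation, not the dissertation's code; it assumes the per-channel normalization (4π)^(−1/2) σj^(−1) implied by the diagonal covariance of Eq. (3), and that the σj are estimated beforehand from the background image sequence:

```python
import numpy as np

def foreground_mask(data, background, sigma, Tp):
    """Classify pixels with the difference-Gaussian model of Eq. (4):
    a pixel is foreground when Pe(x_d - x_b) < Tp.

    data, background : (H, W, 3) RGB images
    sigma            : per-channel standard deviations, shape (3,)
    Tp               : global probability threshold
    """
    diff = data.astype(float) - background.astype(float)
    sigma = np.asarray(sigma, dtype=float)
    # Product over the three independent color channels, Eq. (4)
    pe = np.prod(np.exp(-0.25 * diff**2 / sigma**2)
                 / (np.sqrt(4.0 * np.pi) * sigma), axis=-1)
    return pe < Tp                       # True = foreground pixel
```

Pixels matching the background get a high density value and are kept out of the mask; large differences drive the density toward zero and below Tp.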
Further processing may be needed if some foreground pixels are removed. A me-
dian filtering step can be used to restore such voids, but it can produce blurred colors and
edges. Fortunately, such blurred colors can be restored from the original image.
Figure 8a shows an image for an eagle-object. An intensity threshold is applied to
the image in Figure 8a to get the result shown in Figure 8b. As shown from the figures,
the background is not completely removed. Increasing the intensity threshold can enhance
the background removal, however with the possibility of removing foreground pixels as
shown in Figure 8c. A background image is captured as shown in Figure 8d and the back-
ground subtraction technique is applied to the image in Figure 8a. The segmentation result
is shown in Figure 8e. As shown, the background is removed with minimal errors in the
foreground. A value of the probability threshold Tp = 2 × 10−6 is used based on the his-
togram shown in Figure 8f.
Another example is shown in Figure 9. The figure shows results for a house-object, the subsequent steps of background removal with Tp = 5 × 10−6, and the restoration using the median filter.
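The make-up step can be sketched as follows: the median filter is applied to the binary mask to fill small voids, and the colors are then copied from the original image, avoiding the blurring described above. The 3×3 majority filter (the median of a boolean neighborhood) is an illustrative choice, not the dissertation's exact filter:

```python
import numpy as np

def restore_voids(mask, image):
    """Fill small holes in a foreground mask with a 3x3 median filter,
    then take colors from the original image so no blur enters the
    result (an illustrative sketch of the make-up step).

    mask  : (H, W) boolean foreground mask, possibly with voids
    image : (H, W, 3) original image
    """
    H, W = mask.shape
    padded = np.pad(mask, 1, mode='edge')
    # Median of a 3x3 boolean neighborhood = majority vote (>= 5 of 9)
    stack = np.stack([padded[i:i + H, j:j + W]
                      for i in range(3) for j in range(3)])
    filled = stack.sum(axis=0) >= 5
    out = np.zeros_like(image)
    out[filled] = image[filled]     # colors come from the original image
    return out
```

Copying the colors back from the original, rather than median-filtering the color image itself, is what restores the sharp edges shown in Figure 9e.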
D. Camera Calibration
As the scanner head rotates, the attached camera should be calibrated at each new
position [33]. Geometric camera calibration [1, 34, 35] is a fundamental step in any vision
system that relies on quantitative measurements of the observed scene. Camera calibration
is the process of determining the internal camera geometric and optical characteristics (in-
trinsic parameters), and the 3-D position and orientation of the camera frame relative to a
chosen world coordinate system (extrinsic parameters).
Camera calibration is usually performed using calibration patterns [36]. A com-
mon calibration pattern consists of two white perpendicular planes, printed with orthogonal
grids of equally spaced black squares as shown in Figure 10a. The 3-D coordinates of the
vertices of each square in our chosen world coordinate frame are known. The pixel coordi-
nates of the projections of the vertices on the image plane can be determined as shown in
Figure 10b. The world-image point matches of the pattern can now be used to determine
P0, the projection matrix at the home position.
Assume that the projection matrix P0, defined up to an arbitrary scale factor, is:

       | p11 p12 p13 p14 |
  P0 = | p21 p22 p23 p24 |     (5)
       | p31 p32 p33 p34 |

Given Nmatch 3-D points of an object, Nmatch ≥ 6, and the corresponding 2-D points in its projected image, the 11 unknowns of P0 can be computed.
(a) (b) (c)

(d) (e) (f)

FIGURE 8 – Background subtraction versus intensity threshold. (a) An original image, (b) segmentation using an intensity threshold, (c) segmentation with a greater value of the intensity threshold, (d) a background image, (e) segmentation using background subtraction, and (f) histogram of the probability estimate values (normalized histogram vs. Pe).
(a) (b)

(c) (d)

(e)

FIGURE 9 – Background subtraction. (a) An original image, (b) a background image, (c) results after background subtraction with Tp = 6 × 10−6, (d) results after application of the median filter, and (e) recovery of the original colors to fix blurring introduced by the median filter.
(a) (b)

FIGURE 10 – Camera calibration. (a) The calibration pattern, and (b) selected points on the pattern's image.
Since the relation between a 3-D point and its corresponding 2-D point is

  w (x, y, 1)^T = P0 (X, Y, Z, 1)^T     (6)

where w is a scalar, then for each pair i of corresponding points, the image points are:

  xi = (p11 Xi + p12 Yi + p13 Zi + p14) / (p31 Xi + p32 Yi + p33 Zi + p34)     (7)

  yi = (p21 Xi + p22 Yi + p23 Zi + p24) / (p31 Xi + p32 Yi + p33 Zi + p34)     (8)

These can be arranged into 2Nmatch linear equations in the matrix unknowns, of the form

  W Pc = 0     (9)
where

      | X1 Y1 Z1 1  0  0  0  0  −x1X1 −x1Y1 −x1Z1 −x1 |
      | 0  0  0  0  X1 Y1 Z1 1  −y1X1 −y1Y1 −y1Z1 −y1 |
      | X2 Y2 Z2 1  0  0  0  0  −x2X2 −x2Y2 −x2Z2 −x2 |
  W = | 0  0  0  0  X2 Y2 Z2 1  −y2X2 −y2Y2 −y2Z2 −y2 |     (10)
      |  .  .  .  .  .  .  .  .    .     .     .    .  |
      | XN YN ZN 1  0  0  0  0  −xNXN −xNYN −xNZN −xN |
      | 0  0  0  0  XN YN ZN 1  −yNXN −yNYN −yNZN −yN |

and Pc = (p11, p12, ..., p33, p34)^T. The unknowns can then be recovered using the singular value decomposition of W:

  W = U S V^T     (11)

The solution is the column of V corresponding to the smallest singular value on the main diagonal of S. We usually use Nmatch > 6, even though Nmatch = 6 is possible since there are only 11 unknowns; an over-determined solution is desirable in the case of noisy measurements.
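The linear (DLT) estimation of Equations (6)–(11) can be sketched as follows; this is a minimal illustration of solving W Pc = 0 with the SVD, without the nonlinear refinement discussed next:

```python
import numpy as np

def estimate_projection_matrix(world_pts, image_pts):
    """Linear (DLT) estimate of the 3x4 projection matrix from
    world-image point matches by solving W Pc = 0 via the SVD.

    world_pts : (N, 3) 3-D points, N >= 6, in general position
    image_pts : (N, 2) corresponding pixel coordinates
    """
    rows = []
    for (X, Y, Z), (x, y) in zip(world_pts, image_pts):
        # Two rows of W per correspondence, Eq. (10)
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -x*X, -x*Y, -x*Z, -x])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -y*X, -y*Y, -y*Z, -y])
    W = np.asarray(rows, dtype=float)
    # Solution = right singular vector of the smallest singular value
    _, _, Vt = np.linalg.svd(W)
    return Vt[-1].reshape(3, 4)
```

The recovered matrix is defined only up to scale (and sign), so it is verified by re-projecting the world points rather than by comparing matrix entries directly.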
To obtain high accuracy in estimating the projection matrix, a good initial guess
of the solution is needed in order to employ a nonlinear error minimization technique. The technique used here is based on Robert's method [37].
After calibrating the camera at the initial position, a new projection matrix is needed
at each new position in the circular path. To compute the new projection matrix, we simply
need to update the extrinsic parameters of the camera. The projection matrix relates the
world coordinates with the image coordinates. It is composed of two matrices representing
the intrinsic, K, and extrinsic, D, parameters of the camera as described by Equations (108)
and (110), respectively, in Appendix B. The intrinsic parameters of the camera do not change from one view to another, since we use the same camera with the same optical settings. We assume a pure rotation of the camera around the Y-axis, so the translational and rotational components remain unchanged except for the rotation angle β around the Y-axis. The extrinsic matrix is then updated at each new position on the circular path according to the following formula:

  Dk = Dk−1 Rβk     (12)

where k = 1, 2, ..., NI − 1 and Rβk is a 4×4 matrix representing a Euclidean transformation with a non-zero parameter βk. The detailed description of the rotation and translation components of the matrix D is presented in Appendix B. The transformation matrix Rβk can be expressed as:

        |  cos(βk)  0  sin(βk)  0 |
  Rβk = |  0        1  0        0 |     (13)
        | −sin(βk)  0  cos(βk)  0 |
        |  0        0  0        1 |
For equidistant motion, where β1 = β2 = ... = βNI−1 = β, we obtain

            |  cos(kβ)  0  sin(kβ)  0 |
  Pk = P0   |  0        1  0        0 |     (14)
            | −sin(kβ)  0  cos(kβ)  0 |
            |  0        0  0        1 |

Therefore, a sequence of calibrated images I0, I1, ..., INI−1 is generated. These images are used as the input data to the vision technique under test.
E. Setup Accuracy
Since the rotation angle of the scanner head is pre-assumed, errors could result if the actual rotation of the head does not follow the pre-assumed rotation. We therefore derive an upper bound on the rotation angle error. The proposed setup has to be accurate to within this upper bound, otherwise the resulting errors could affect the accuracy of the evaluation process [38].
Assuming that the scanner head rotates by an angle β (radians) from the initial
position, the relation between the image coordinates and the world coordinates through the
projection matrix at a new position can be written as follows:
  (u, v, w)^T = P0 Rβ (X, Y, Z, 1)^T = P0 (X cos β + Z sin β,  Y,  −X sin β + Z cos β,  1)^T     (15)

in which Rβ is the 4×4 rotation matrix of Equation (13) evaluated at β,
and u = wx and v = wy. If an inaccuracy of amount Δβ (radians) is assumed, then

  (u + Δu, v + Δv, w + Δw)^T = P0 Rβ+Δβ (X, Y, Z, 1)^T     (16)

whose first and third components are X cos(β + Δβ) + Z sin(β + Δβ) and −X sin(β + Δβ) + Z cos(β + Δβ), respectively. Expanding the compound angles gives X(cos β cos Δβ − sin β sin Δβ) + Z(cos β sin Δβ + sin β cos Δβ) for the first component and −X(cos β sin Δβ + sin β cos Δβ) + Z(cos β cos Δβ − sin β sin Δβ) for the third.
For small values of Δβ (radians) we can use the approximations sin(Δβ) ≈ Δβ and cos(Δβ) ≈ 1, giving:

  (u + Δu, v + Δv, w + Δw)^T = P0 (X cos β + Z sin β + Δβ(−X sin β + Z cos β),  Y,  −X sin β + Z cos β − Δβ(X cos β + Z sin β),  1)^T     (17)
Let fβ(X,Z) = X cos(β) + Z sin(β) and gβ(X,Z) = −X sin(β) + Z cos(β), then
$$
\begin{bmatrix} u + \Delta u \\ v + \Delta v \\ w + \Delta w \end{bmatrix}
= P_0
\begin{bmatrix} f_\beta(X,Z) \\ Y \\ g_\beta(X,Z) \\ 1 \end{bmatrix}
+ \Delta\beta\, P_0
\begin{bmatrix} g_\beta(X,Z) \\ 0 \\ -f_\beta(X,Z) \\ 0 \end{bmatrix}
\tag{18}
$$
Therefore,
$$
\begin{bmatrix} \Delta u \\ \Delta v \\ \Delta w \end{bmatrix}
= \Delta\beta\, P_0
\begin{bmatrix} g_\beta(X,Z) \\ 0 \\ -f_\beta(X,Z) \\ 0 \end{bmatrix}
\tag{19}
$$
Then
$$
\Delta u = \Delta\beta\,(p_{11}\, g_\beta(X,Z) - p_{13}\, f_\beta(X,Z))
$$
and
$$
\Delta v = \Delta\beta\,(p_{21}\, g_\beta(X,Z) - p_{23}\, f_\beta(X,Z)).
$$
Since $\Delta x \approx \Delta u / w$ and $\Delta y \approx \Delta v / w$, we have
$$
\Delta x = \frac{\Delta\beta\,(p_{11}\, g_\beta(X,Z) - p_{13}\, f_\beta(X,Z))}{p_{31}\, f_\beta(X,Z) + p_{32}\, Y + p_{33}\, g_\beta(X,Z) + p_{34}}
\tag{20}
$$
and
$$
\Delta y = \frac{\Delta\beta\,(p_{21}\, g_\beta(X,Z) - p_{23}\, f_\beta(X,Z))}{p_{31}\, f_\beta(X,Z) + p_{32}\, Y + p_{33}\, g_\beta(X,Z) + p_{34}}
\tag{21}
$$
To get $|\Delta x|, |\Delta y| \le 0.5$ pixels, we need
$$
|\Delta\beta| \le \frac{0.5\,|p_{31}\, f_\beta(X,Z) + p_{32}\, Y + p_{33}\, g_\beta(X,Z) + p_{34}|}{|p_{11}\, g_\beta(X,Z) - p_{13}\, f_\beta(X,Z)|}
\tag{22}
$$
and
$$
|\Delta\beta| \le \frac{0.5\,|p_{31}\, f_\beta(X,Z) + p_{32}\, Y + p_{33}\, g_\beta(X,Z) + p_{34}|}{|p_{21}\, g_\beta(X,Z) - p_{23}\, f_\beta(X,Z)|}
\tag{23}
$$
Combining the above two constraints, we get an upper bound on the rotation error $\Delta\beta$ as:
$$
|\Delta\beta| \le \min_{1 \le i \le \mathrm{card}(\mathcal{M})} \left( \frac{0.5\,|p_{31}\, f_\beta(X_i,Z_i) + p_{32}\, Y_i + p_{33}\, g_\beta(X_i,Z_i) + p_{34}|}{\max(|a_x|, |a_y|)} \right)
\tag{24}
$$
where $a_x = p_{11}\, g_\beta(X_i,Z_i) - p_{13}\, f_\beta(X_i,Z_i)$, $a_y = p_{21}\, g_\beta(X_i,Z_i) - p_{23}\, f_\beta(X_i,Z_i)$, $\mathcal{M}$ is the 3-D data set under test, and $\mathrm{card}(\mathcal{M})$ is the cardinality of $\mathcal{M}$.
Equation (24) shows that we can get an upper bound on the rotation error in terms of the camera parameters, the assumed rotation angle, and the coordinates of the 3-D data set to achieve the desired accuracy of the generated 2-D images. For different rotation angles, it is clear from Figure 11 that only a very small rotation-angle error is permitted before a ±0.5 pixel error in the image coordinates occurs.
Since the rotation is around the Y-axis, it is quite understandable that the rotation angle error has its severest effect on the x-coordinate of the image. Ideally, it should not affect the accuracy of the y-coordinates of the image; the small effect shown in Figure 11 is mostly due to a slight deviation from the assumption that the rotation is purely around the Y-axis of the world coordinates. Fortunately, the rotation error of commercial 3-D scanners is less than 0.1 degrees. This value is less than the 0.2 degrees specified by the upper bound on the rotation error, as shown in Figure 11.
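The bound of Equation (24) is straightforward to evaluate numerically. The following is a minimal sketch; the projection matrix `P0` and the 3-D points below are hypothetical placeholders, not the calibrated values of the actual setup:

```python
import numpy as np

def rotation_error_bound(P0, points, beta):
    """Upper bound on the rotation-angle error (Eq. 24) that keeps the
    induced image-coordinate error within +/- 0.5 pixels."""
    X, Y, Z = points[:, 0], points[:, 1], points[:, 2]
    f = X * np.cos(beta) + Z * np.sin(beta)    # f_beta(X, Z)
    g = -X * np.sin(beta) + Z * np.cos(beta)   # g_beta(X, Z)
    # w = p31 f + p32 Y + p33 g + p34 (third row of P0)
    w = P0[2, 0] * f + P0[2, 1] * Y + P0[2, 2] * g + P0[2, 3]
    ax = P0[0, 0] * g - P0[0, 2] * f
    ay = P0[1, 0] * g - P0[1, 2] * f
    return np.min(0.5 * np.abs(w) / np.maximum(np.abs(ax), np.abs(ay)))

# Hypothetical pinhole camera and a few 3-D points on the test object
P0 = np.array([[800.0, 0.0, 320.0, 0.0],
               [0.0, 800.0, 240.0, 0.0],
               [0.0, 0.0, 1.0, 0.0]])
points = np.array([[50.0, 20.0, 480.0],
                   [-50.0, -20.0, 520.0],
                   [0.0, 40.0, 500.0]])
bound = rotation_error_bound(P0, points, np.deg2rad(30.0))
```

For a camera of this kind the resulting bound is a small fraction of a degree, consistent with the tight tolerances read off Figure 11.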
F. Summary
In this chapter, we have presented a new design of an experimental setup dedicated to performance evaluation tasks. The setup provides accurate, general-use, automatically generated and registered dense ground truth data and their corresponding intensity
FIGURE 11 – System accuracy. Upper bounds on rotation errors if an absolute error of ≤ 0.5 pixels in the x and y image coordinates is assumed.
data. Two submodules of the setup, the camera calibration and the object segmentation, are carefully designed to ensure the quality of the generated data. The accuracy of the system is investigated, and a least upper bound on the error of the motion mechanism is provided.
Introducing this system to the vision community will help bridge a gap in creating standard dense ground truth data and input data for the performance evaluation of many 3-D reconstruction techniques. To make this system available for public use, we publish the collected data for different test objects on the ftp site:
ftp://egypt.spd.uofl.edu/pub/Eva Data.
CHAPTER III
A NOVEL TECHNIQUE FOR 3-D DATA REGISTRATION AS A PRE-EVALUATION STEP
Data registration is a crucial step in performance evaluation procedures that aim at localizing errors in the given measured data. For performance evaluation, the given measured data should be accurately registered/aligned to the ground truth data such that the registration process does not affect the accuracy of the subsequent evaluation steps. Using conventional registration methods, the outcome of the registration process would be questionable if the measured data set is a corrupted version of the ground truth data set. Unfortunately, the registration of such data types is an unavoidable step in most evaluation procedures.
To cope with this problem, another registration methodology that can go beyond the conventional ones should be used. Here we propose a novel approach for 3-D data registration. The performance of the approach is totally independent of the measured data set, which is possibly subjected to distortion, since the approach employs error-free snapshots of the 3-D object instead of its measured reconstruction. The key advantage of this approach is that it keeps the registration process from being affected by the possibly corrupted data sets; hence, it permits confident evaluation results.
A. 3-D Data Registration
Data registration is a common problem in computer vision. Applications include object recognition, surface matching, pose estimation, data fusion, and our concern, performance evaluation. The registration process aims at placing the data sets into a common reference frame by estimating the transformation parameters between the data sets. The key problem with any registration technique is that the correspondences between data points are not known a priori.
One of the common techniques used to solve the registration problem is the Iterative Closest Point (ICP) technique [22]. The algorithm is simple and efficient; however, it needs a good initial estimate, otherwise it gets stuck in a local minimum. In addition, the algorithm is sensitive to statistical outliers [22]. Many techniques have been introduced in the literature to provide robustness to the ICP method, e.g., [23–25]. However, other distortion models were not treated by these studies.
Other 3-D registration techniques that rely on the selection of distinct features in the data under registration could be used instead of the ICP approaches [26–29]. However, selection and matching of these features are challenging tasks when corrupted data sets are manipulated. Manual selection and matching of features could be a solution under these circumstances, as in [21].
We provide an example of 2-D registration of distorted data. In this example, the 2-D shape model is shown in Figure 12a. A copy of this shape is shifted by 20 pixels in the negative y-direction and clipped from the bottom by 20 pixels, as shown in Figure 12b. We assume that we can match the five numbered features shown in both figures. The initial alignment of the two shapes is shown in Figure 12c. The Mean Square Error (MSE) criterion is used to measure the level of alignment as the displacement $\Delta y$ changes from 0 to 30. The desired displacement is $\Delta y_{des} = 20$. Using the MSE criterion, the optimal displacement $\Delta y_{opt}$ is found to be 30 pixels, which gives the alignment result in Figure 12d.
When the absolute difference criterion is used instead of MSE, the correct alignment is reached at $\Delta y_{opt} = \Delta y_{des} = 20$ pixels. The result of using the absolute difference is shown in Figure 12e. We repeated this experiment using only four features, excluding feature number 5. Using the absolute error criterion, many optimal solutions are detected at $\Delta y_{opt} \in \{20, 21, \ldots, 39, 40\}$. This adds ambiguity to the alignment process. The registration result for $\Delta y_{opt} = 40$ is shown in Figure 12f.
If we choose to use distance-based matching, such as closest point matching using MSE [22], rather than feature-based matching, the result is similar to that in Figure 12d, which is not the desired solution. Plots of the different errors with different matching criteria are shown in Figure 13.
We conclude from this counterexample that convergence to the desired registration solution, by either feature-based or distance-based approaches, may not be guaranteed if one member of the data sets under registration is subjected to distortion.
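The bias behind this counterexample can be reproduced with a one-dimensional toy analog (hypothetical coordinates, not the exact 2-D shapes of Figure 12): five matched feature ordinates, one of which is corrupted by clipping, are aligned by sweeping a candidate displacement. The quadratic (MSE) criterion is pulled away from the true shift by the corrupted feature, while the absolute-error criterion still recovers it:

```python
# 1-D analog of the Figure 12 experiment (hypothetical coordinates):
# five model features, shifted by 20 px; the last feature is corrupted
# by clipping, so it does not move with the rest of the shape.
model = [0, 10, 20, 30, 40]
measured = [20, 30, 40, 50, 40]   # true shift 20, last feature clipped

def mse(dy):
    return sum((m - (g + dy)) ** 2 for m, g in zip(measured, model)) / len(model)

def abs_err(dy):
    return sum(abs(m - (g + dy)) for m, g in zip(measured, model)) / len(model)

candidates = range(0, 31)
dy_mse = min(candidates, key=mse)       # biased by the corrupted feature
dy_abs = min(candidates, key=abs_err)   # robust, recovers the true shift
print(dy_mse, dy_abs)                   # prints: 16 20
```

The absolute criterion behaves like a median and ignores the single clipped feature, mirroring the correct alignment of Figure 12e, while the squared criterion averages the corruption into the estimate, as in Figure 12d.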
In fact, the 3-D registration of low-quality data sets is not a simple task. For instance, registering the model data in Figure 14a to the measured data in Figure 14b could be difficult for any 3-D registration technique, whether feature-based or distance-based. However, the registration result shown in Figure 14c can be reached by the proposed Registration Through Silhouettes (RTS) technique [39].
The proposed approach does not rely on the presence of the 3-D reconstruction under test during the registration phase. This gives an advantage to our approach, since the 3-D reconstruction could be of a low quality that might add difficulties to any 3-D registration technique. In addition, if the actual 3-D reconstruction under test were used in the registration phase, then some errors that the evaluation process tries to investigate might disappear during the minimization step used by any 3-D registration technique. The proposed approach employs silhouette images to align the given data sets. Undistorted silhouette images can be generated easily, hence providing good data sets for the registration process.
Image silhouettes have been used for many computer vision applications such as
FIGURE 12 – An example of registration under distortion. (a) The original shape, (b) the distorted shape (clipped bottom), (c) initial registration of the shapes in (a) and (b). Optimal registration results by matching the five features 1-5 indicated in (a) and (b) by minimizing: (d) the Mean Square Error (MSE) (incorrect result) and (e) the absolute error (correct result). (f) An optimal registration result by matching the four features 1-4 and minimizing the absolute error (incorrect result).
(a) $\Delta y_{opt} = 30$, (b) $\Delta y_{opt} = \Delta y_{des} = 20$, (c) $\Delta y_{opt} \in \{20, 21, \ldots, 39, 40\}$, (d) $\Delta y_{opt} = 30$
FIGURE 13 – Registration errors with different error criteria and matching strategies. (a) Mean square error with five-point matching, (b) absolute error with five-point matching, (c) absolute error with four-point matching, (d) mean square error with closest-distance matching.
FIGURE 14 – A difficult 3-D data registration case. (a) Ground truth data, (b) corrupted measured data, (c) 3-D data registration using the proposed Registration Through Silhouettes (RTS) technique.
shape recovery [10], texture mapping [40], pose estimation [41], and camera calibration [42]. Since silhouettes are insensitive to colors and can encode useful information about the 3-D pose, they are used in the proposed approach.
We consider the 3-D registration step the core of any evaluation work that employs 3-D ground truth data, provided that it does not affect the accuracy of the evaluation process itself. Employing an efficient 3-D registration technique could make the design of evaluation methodologies a straightforward task. In addition, together with a suitable evaluation methodology, it helps in localizing errors in the 3-D reconstruction under test. This localization step is necessary for diagnosis and data fusion post-evaluation techniques.
B. 3-D Data Registration Through Silhouettes (RTS)
Since our goal is to evaluate the quality of a given data set M of measured points
generated by a given 3-D reconstruction technique X, the ground truth data set G should be
aligned with M.
1. An Overview of the Approach
Since we evaluate a 3-D reconstruction M obtained from a calibrated sequence of images, a set Sin of silhouettes can be generated. In addition, we use G to generate another set of silhouettes, SG, at the same views as the set Sin. In the ideal case, when M and G are initially registered, Sin and SG are aligned. However, in most cases a certain transformation T is needed to align G with M. Applying T iteratively to G to get SG such that the error between Sin and SG is minimal leads to the best T that brings G and M into alignment.
As a formal 3-D rigid registration problem, the goal is to find the transformation $T(R,t)$, where $R$ is a $3 \times 3$ rotation matrix with 3 degrees of freedom (DOF), $\theta_X$, $\theta_Y$, and $\theta_Z$, and $t$ is a 3-D translation vector with 3 DOF, $t_X$, $t_Y$, and $t_Z$, such that the energy $E$ is minimal, where
$$
E = \sum_i d^2\big( m_i,\; T(R,t)\, g_i \big), \qquad m_i \in \mathcal{M},\; g_i \in \mathcal{G}
\tag{25}
$$
where $d$ denotes the Euclidean distance. Since $\mathcal{M}$ is not an ideal reconstruction and the minimization could be difficult to perform directly in the 3-D coordinates, we reduce the problem to a 2-D minimization through silhouettes. We assume that $\mathcal{M}$ is not available at the registration phase but that its calibrated silhouette set $S_{in}$ is available. We generate $S_{\mathcal{G}}$ by projecting $\mathcal{G}$ to the same views as those of $S_{in}$.
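The silhouette generation step, projecting a 3-D point set through a view's projection matrix and rasterizing it into a binary image, can be sketched as follows; the pinhole matrix and test point here are hypothetical placeholders, not the dissertation's calibrated cameras:

```python
import numpy as np

def render_silhouette(P, points, height, width):
    """Project 3-D points with a 3x4 projection matrix P and rasterize
    them into a binary silhouette image (1 = silhouette, 0 = background)."""
    n = points.shape[0]
    hom = np.hstack([points, np.ones((n, 1))])   # homogeneous coordinates
    proj = hom @ P.T                             # columns: (u, v, w)
    x = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    y = np.round(proj[:, 1] / proj[:, 2]).astype(int)
    sil = np.zeros((height, width), dtype=np.uint8)
    keep = (x >= 0) & (x < width) & (y >= 0) & (y < height)
    sil[y[keep], x[keep]] = 1
    return sil

# Hypothetical pinhole camera and a single test point on the optical axis
P = np.array([[500.0, 0.0, 160.0, 0.0],
              [0.0, 500.0, 120.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
sil = render_silhouette(P, np.array([[0.0, 0.0, 10.0]]), 240, 320)
```

A dense ground truth point cloud projected this way yields the silhouette set $S_{\mathcal{G}}$ view by view; visibility handling is omitted in this sketch.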
2. The Registration Procedure
For each iteration $i = 1, \ldots, N_{max}$, where $N_{max}$ is the maximum number of iterations, the registration parameters $(\theta_X^i, \theta_Y^i, \theta_Z^i, t_X^i, t_Y^i, t_Z^i)$ are used to find the transformed set $\mathcal{G}_{i+1}$, where:
$$
\mathcal{G}_{i+1} = T(R_i, t_i)\, \mathcal{G}_i
\tag{26}
$$
In general,
$$
T(R,t) = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}
\tag{27}
$$
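Building the homogeneous transformation of Equation (27) from the six registration parameters can be sketched as below; the X-Y-Z composition order of the rotations is an assumption for illustration, since the text does not fix an order:

```python
import numpy as np

def make_transform(angles, t):
    """Build the 4x4 homogeneous transform T(R, t) of Eq. (27) from
    Euler angles (theta_X, theta_Y, theta_Z) and a translation vector."""
    ax, ay, az = angles
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax), np.cos(ax)]])
    Ry = np.array([[np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), -np.sin(az), 0],
                   [np.sin(az), np.cos(az), 0],
                   [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx   # R with the assumed X-Y-Z order
    T[:3, 3] = t               # t in the last column, bottom row (0 0 0 1)
    return T

T = make_transform((0.1, 0.2, 0.3), (1.0, 2.0, 3.0))
```

Applying `T` to the homogeneous ground truth points realizes the update of Equation (26).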
For each set $\mathcal{G}_i$, a corresponding set of silhouettes $S_{\mathcal{G}_i}$ of cardinality $N_s$ is generated, where:
$$
S_{\mathcal{G}_i} = \{ s^l_{\mathcal{G}_i} : s^l_{\mathcal{G}_i} \subset S_{\mathcal{G}_i},\; l = 1, \ldots, N_s \}
\tag{28}
$$
For each point $I^l_{\mathcal{G}_i}(x^l_{\mathcal{G}_i}, y^l_{\mathcal{G}_i}) \in s^l_{\mathcal{G}_i}$ and $g^l_i = (X^l_{\mathcal{G}_i}, Y^l_{\mathcal{G}_i}, Z^l_{\mathcal{G}_i}) \in \mathcal{G}_i$ which is visible at view $l$, the following relation holds for the proper projection matrix $P_l$:
$$
c^l_i \begin{bmatrix} x^l_{\mathcal{G}_i} \\ y^l_{\mathcal{G}_i} \\ 1 \end{bmatrix}
= P_l \begin{bmatrix} X^l_{\mathcal{G}_i} \\ Y^l_{\mathcal{G}_i} \\ Z^l_{\mathcal{G}_i} \\ 1 \end{bmatrix}
\tag{29}
$$
where $c$ is a scalar value and
$$
I^l_{\mathcal{G}_i}(k_1, k_2) = \begin{cases} L_1, & \text{if } k_1 = x^l_{\mathcal{G}_i} \text{ and } k_2 = y^l_{\mathcal{G}_i}; \\ L_2, & \text{otherwise}, \end{cases}
\tag{30}
$$
where $L_1$ and $L_2$ are two gray levels, $1 \le k_1 \le N_h$, $1 \le k_2 \le N_w$, and $N_h \times N_w$ is the cardinality of $s^l_{\mathcal{G}_i}$.
For a sequence of $N_s$ input images $I^l$, a corresponding set of silhouettes $S_{in}$ can be extracted as:
$$
S_{in} = \{ s^l_{in} : s^l_{in} \subset S_{in},\; l = 1, \ldots, N_s \}
\tag{31}
$$
such that for each point $I^l_{in}(k_1, k_2) \in s^l_{in}$
$$
I^l_{in}(k_1, k_2) = \begin{cases} L_1, & \text{if } I^l(k_1, k_2) \text{ is a silhouette point}; \\ L_2, & \text{otherwise}. \end{cases}
\tag{32}
$$
The error criterion $E_i$ is defined as:
$$
E_i = \frac{1}{N_s N_h N_w} \sum_{l=1}^{N_s} \sum_{k_1=1}^{N_h} \sum_{k_2=1}^{N_w} \big[ I^l_{in}(k_1, k_2) - I^l_{\mathcal{G}_i}(k_1, k_2) \big]^2
\tag{33}
$$
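Equation (33) reduces to a mean squared difference over stacks of silhouette images, which can be sketched as follows (synthetic toy silhouettes, not the dissertation's data):

```python
import numpy as np

def silhouette_error(S_in, S_G):
    """Mean squared difference between two stacks of silhouette images
    (Eq. 33). Both stacks have shape (N_s, N_h, N_w)."""
    S_in = np.asarray(S_in, dtype=float)
    S_G = np.asarray(S_G, dtype=float)
    return np.mean((S_in - S_G) ** 2)

# Toy example: one-view stacks of 4x4 silhouettes with gray levels 0/1
a = np.zeros((1, 4, 4)); a[0, 1:3, 1:3] = 1   # 2x2 square silhouette
b = np.zeros((1, 4, 4)); b[0, 1:3, 2:4] = 1   # same square shifted right
```

Identical stacks give an error of zero; here the one-pixel shift leaves 4 of the 16 pixels different, so the error is 0.25.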
Then an optimization algorithm is needed to find the solution of
$$
\min(E \,|\, R, t) \rightarrow \min_{R,t}(E)
\tag{34}
$$
A minimization procedure is described in the next section.
a. A Two-step Minimization. We use a Genetic Algorithm (GA) [43] to minimize Equation (33). To apply GA to our registration problem, we encoded the transformation parameters as genes. Each parameter is encoded by 16 bits. The genes are formed by concatenating the six binary coded parameters: the angles of rotation $\theta_X$, $\theta_Y$, $\theta_Z$ and the translation components $t_X$, $t_Y$, and $t_Z$. The crossover operation occurs at multiple points along the gene with probability $p_c = 0.95$. A mutation rate of 0.01 is usually used. Since GA maximizes an objective function, we used the following objective function $F$ to be maximized:
$$
F = \frac{1}{E}
\tag{35}
$$
Since GA is a global search method that converges only in the limit, we used it only to obtain an initial solution for a local search method. Here we used the Nelder-Mead (NM) simplex as the local search method. It is important to note that other suitable optimization techniques, such as simulated annealing, can replace the (GA + simplex) solution without affecting the validity of the RTS technique.
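The global-then-local strategy can be sketched with stand-ins for the dissertation's optimizers: a coarse random search plays the role of the GA's global exploration, and SciPy's Nelder-Mead simplex refines the result. The objective here is a toy 3-parameter 2-D alignment with known correspondences, not the silhouette error of Equation (33):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy 2-D rigid registration: recover (theta, tx, ty) aligning two point sets
model = rng.uniform(-10.0, 10.0, size=(40, 2))
true_p = np.array([0.3, 5.0, -2.0])  # theta (rad), tx, ty

def transform(pts, p):
    c, s = np.cos(p[0]), np.sin(p[0])
    return pts @ np.array([[c, -s], [s, c]]).T + p[1:]

target = transform(model, true_p)

def energy(p):
    # mean squared distance between transformed model and target points
    return np.mean(np.sum((transform(model, p) - target) ** 2, axis=1))

# Step 1: coarse global exploration (random search standing in for the GA)
samples = rng.uniform([-np.pi, -20.0, -20.0], [np.pi, 20.0, 20.0], size=(2000, 3))
p0 = min(samples, key=energy)

# Step 2: local refinement with the Nelder-Mead simplex
res = minimize(energy, p0, method="Nelder-Mead",
               options={"xatol": 1e-10, "fatol": 1e-14})
```

The global step only needs to land in the basin of the desired minimum; the simplex then supplies the precision, mirroring the GA + simplex division of labor described above.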
In the presented algorithm, the convergence of the optimization techniques, and hence the performance of the RTS technique, depends on the selection of Sin and on the extent to which the silhouettes are distinct. Symmetric objects that have similar silhouettes are of no interest under the evaluation topic; if needed, they can simply be generated synthetically. In practice, a subset of Sin that consists of 4 orthogonal silhouettes provides enough constraints on the shape of objects of moderate complexity. In general, at least two silhouettes generated by two non-collinear cameras should be used to avoid the degenerate case where corresponding points from the unregistered objects lie on the same optical ray. As shown in Figure 15, the geometric distance between the 3-D points X1 and X2 vanishes in the image of camera Oa.
Proposition 1. At least two silhouettes generated by two non-collinear cameras should be used by the RTS approach.
Proof: (using Figure 15)
Assume that X1 ≠ X2 and that X1 and X2 lie on the optical ray Oax, or equivalently line L1 (a degenerate case for camera Oa). Assume that another degenerate case can happen in camera Ob, i.e., X1, X2 ∈ L3. Then either X1 = X2, a contradiction, or L2 ≡ L3, i.e., cameras Oa and Ob are collinear.
This proves that the degenerate cases, where corresponding points from the unregistered data sets lie on the same optical ray, cannot happen simultaneously in two cameras unless the cameras are collinear. Therefore, to avoid such cases, at least two silhouette images generated by two non-collinear cameras should be used.
b. Occluding Contours as Replacements for the Silhouettes. The object occluding contours can be used instead of the silhouettes in our approach to reduce the redundancy in the silhouette images. A sequence of pre-processing operations, such as image filtration and edge detection, is applied to the sets $S_{in}$ and $S_{\mathcal{G}}$ to generate sets of contour images, $C_{in}$ and $C_{\mathcal{G}}$, respectively. These sets are defined as:
$$
C_{in} = \{ c^l_{in} : c^l_{in} \subset C_{in},\; l = 1, \ldots, N_s \}
\tag{36}
$$
such that for each point $J^l_{in}(x, y) \in c^l_{in}$
$$
J^l_{in}(x, y) = \begin{cases} J^l_{in}(x_c, y_c) = 1, & \text{if } J^l_{in}(x, y) \text{ is a contour point}; \\ 0, & \text{otherwise}, \end{cases}
\tag{37}
$$
and
$$
C_{\mathcal{G}} = \{ c^l_{\mathcal{G}} : c^l_{\mathcal{G}} \subset C_{\mathcal{G}},\; l = 1, \ldots, N_s \}
\tag{38}
$$
such that for each point $J^l_{\mathcal{G}}(x, y) \in c^l_{\mathcal{G}}$
$$
J^l_{\mathcal{G}}(x, y) = \begin{cases} J^l_{\mathcal{G}}(x_c, y_c) = 1, & \text{if } J^l_{\mathcal{G}}(x, y) \text{ is a contour point}; \\ 0, & \text{otherwise}. \end{cases}
\tag{39}
$$
For each point $J^l_{in}(x_c, y_c)|_j \in \{J^l_{in}(x_c, y_c)\}$ we find the closest point, as in [22], $J^l_{\mathcal{G}}(x_{cp}, y_{cp})|_j \in \{J^l_{\mathcal{G}}(x_c, y_c)\}$, for $j = 1, \ldots, N_c = \mathrm{card}(\{J^l_{in}(x_c, y_c)\})$. Then the error criterion $E$ can be
FIGURE 15 – A degenerate case for silhouette alignment. The points X1 and X2 are two corresponding 3-D points in space. To align these points using silhouettes, a non-zero geometric distance between their image points x1 and x2 should be detected (as in camera Ob). However, if X1 and X2 lie on the same optical ray (as in cameras Oa and Oc), then their projections degenerate to a single point, and hence a zero geometric distance is detected. To avoid such a case, at least two silhouette images from non-collinear cameras should be used.
written as:
$$
E = \frac{1}{N_s N_c} \sum_{l=1}^{N_s} \sum_{j=1}^{N_c} d^2\big( J^l_{in}(x_c, y_c)|_j,\; J^l_{\mathcal{G}}(x_{cp}, y_{cp})|_j \big)
\tag{40}
$$
The occluding contours are more geometrically descriptive than the silhouettes; however, they need additional preprocessing operations such as image filtration and edge detection. Errors in extracting such contours could affect the convergence of the registration process. Throughout our implementation of the RTS algorithm, we use the occluding contours only in the second optimization step using the simplex method, since computing the occluding contours and using the closest point criterion could increase the computational complexity of the first optimization step by the genetic algorithm.
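The closest-point contour error of Equation (40) can be sketched with a k-d tree for the nearest-neighbor lookup; the contour points below are synthetic placeholders, not extracted image contours:

```python
import numpy as np
from scipy.spatial import cKDTree

def contour_error(contours_in, contours_G):
    """Mean squared closest-point distance (Eq. 40) over N_s views.
    Each list element is an (N_c, 2) array of contour point coordinates."""
    total, count = 0.0, 0
    for pts_in, pts_G in zip(contours_in, contours_G):
        tree = cKDTree(pts_G)
        dists, _ = tree.query(pts_in)   # distance to closest G contour point
        total += np.sum(dists ** 2)
        count += len(pts_in)
    return total / count

# Toy contours: a 5-point segment and a copy shifted by 1 pixel in y
c_in = [np.array([[float(i), 0.0] for i in range(5)])]
c_G = [np.array([[float(i), 1.0] for i in range(5)])]
```

A k-d tree makes the per-iteration closest-point search logarithmic per query, which is one way to contain the computational cost noted above.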
c. An Evaluation Criterion for the RTS Approach. The RTS approach is a self-evaluating approach. The average distance $d^l_{av}$ for each image $l$, defined as:
$$
d^l_{av} = \frac{1}{N_c} \sum_{j=1}^{N_c} d\big( J^l_{in}(x_c, y_c)|_j,\; J^l_{\mathcal{G}}(x_{cp}, y_{cp})|_j \big)
\tag{41}
$$
can be used as a measure of the quality of the RTS approach. An error distance $d_{exp} = \sqrt{2}$ is expected due to truncation errors of image re-projection and the preprocessing operations of image filtration and edge detection. A range of $d_{av} \in [\sqrt{2},\; 2\sqrt{2}]$ is considered the expected range for good registration. This can be used as a stopping criterion for the approach as well.
The distance error can be expressed as an error ratio in dB using what we call the coincidence index (CI), as follows:
$$
CI = 20 \log_{10} \frac{\sqrt{2}}{d_{av}}
\tag{42}
$$
with an expected range of good quality of $[-6,\; 0]$ dB.
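A quick check of the mapping between $d_{av}$ and CI, under the usual convention that dB values use base-10 logarithms:

```python
import math

def coincidence_index(d_av):
    """Coincidence index (Eq. 42) in dB for an average contour distance."""
    return 20 * math.log10(math.sqrt(2) / d_av)

# The good-quality band d_av in [sqrt(2), 2*sqrt(2)] maps to about [-6, 0] dB
ci_best = coincidence_index(math.sqrt(2))        # 0 dB
ci_worst = coincidence_index(2 * math.sqrt(2))   # about -6.02 dB
```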
C. Results and Discussion
In this section, we present experimental results for the RTS approach. Thirty-six images are acquired of a house object. Out of these images, we select 4 images of consecutive orthogonal views at angles of 0, 90, 180, and 270 degrees, as shown in Figure 16a from left to right, respectively. Their silhouettes are shown in Figure 16b. A reference 3-D model of the house object is generated by the 3-D laser scanner. Only 10% of the 3-D points of the scanner model are used to generate the silhouettes by projecting the model using the projection matrices in Equation (14). These silhouettes are generated at the same views as in Figure 16b, as shown in Figure 16c.
The search parameters are initialized with random values. A Genetic Algorithm (GA) with a crossover rate of 0.95, a mutation rate of 0.01, and a population size of 20 is applied to the given sets of silhouettes. The silhouettes of the 3-D scanner model obtained after running 100 generations/iterations of GA are shown in Figure 16d. The search parameters gained from the GA step were passed to the next optimization step using the local search algorithm. 100% of the scanner data are used in this step to get accurate results. The results after running the simplex method for 150 iterations are shown in Figure 16e.
The convergence of the six search parameters is plotted in Figure 17. It is clear from the figure that GA provided a good approximation of the parameters after fewer than 50 iterations, and the simplex method refined the solution after fewer than 100 iterations.
To validate these results, we plotted the ground truth data, before applying the RTS algorithm, in the same plot as the 3-D reconstruction of the house object generated by a technique X. The reconstruction by technique X plays no role in the registration process, but we plotted it to show the relative positions of the reference frames of the two reconstructions before applying the RTS technique, as shown in Figure 18a. To show the alignment after applying RTS, the two reconstructions are plotted in Figure 18b. As a standard way of showing registration results, a mixed reconstruction is shown in Figure 18c, where the blue patches are generated by the scanner and the red patches by the technique X.
The average distance $d_{av}$ and the coincidence index CI are used to quantify the registration results. The $d_{av}$ and CI values are computed at each view of the input image set, as shown in Figures 19a and 19b, respectively. Some values of $d_{av}$ and their equivalent CI values are beyond the expected range of good quality, as indicated by the values at views No. 10, 11, 12, and 13. This situation is anticipated, since the $\sqrt{2}$ distance value, which is assumed as a reference error, is an expected value, not an exact value. The maximum error distance ($d_{av} = 2.13$ pixels), or the minimum coincidence index (CI = −3.54 dB), is detected at view No. 21.
The final contour at view No. 21, the maximum-error view, is shown in Figure 20. At this view, a slight error is noticed, which is due to inaccuracy in estimating the angle $\theta_Z$. Some errors are expected at certain views that were not used during the optimization phase, as at view No. 21. Increasing the number of silhouettes in the optimization step would reduce the overall error, however at the expense of increasing the overall execution time. The statistics of the average distance in this experiment show a mean value of 1.63 pixels and a standard deviation (std) of 0.19 pixels. These statistical values indicate good quality registration.
Another experiment on the house data sets is performed. A copy of the registered ground truth data is transformed by known translation and rotation parameters. The RTS approach is applied to the transformed data set. Only 5% of the scanner data are used during the optimization phase using the genetic algorithm. The genetic crossover rate is set to 0.95 and the mutation rate to 0.01, while the population size is set to 30. The second optimization step is performed using the simplex method applied to the occluding contour images. After 100 generations/iterations by the genetic algorithm followed by 150 iterations using the simplex method, the RTS approach was able to converge to the desired parameters. Figure 21 shows the convergence of the registration parameters to the desired values indicated by the dashed lines.
TABLE 1
CONVERGENCE OF THE RTS APPROACH TO THE DESIRED VALUES.
Parameter θX (rad) θY (rad) θZ (rad) tX (mm) tY (mm) tZ (mm)
Initial values 0.025 0 0.025 -5 10 -10
Desired values 0 -0.5 0.01 0 20 -20
Final values 0.0008 -0.4610 0.0106 -0.0308 21.5558 -20.3151
Table 1 shows the initial, desired, and final values of the registration parameters. Slight deviations of the final values of the $\theta_Y$ and $t_Y$ parameters from the desired values are noticed. These deviations are due to the simplex approach getting stuck in a local minimum, which is a known disadvantage of local search methods. The average distance and the coincidence index measures are plotted for this experiment before and after applying the RTS approach, as shown in Figure 22. This shows the error reduction achieved by applying the RTS approach. A slight degradation in the mean value of the average distance is noticed, due to the errors in $\theta_Y$ and $t_Y$, compared with the previous experiment. Visual results for the alignment of the scanner contours and the image contours before and after applying the RTS approach are shown in Figure 23. Note the deviation of $\theta_Y$ and $t_Y$ from the desired values (the Y-direction of 3-D space is the same as the y-direction of the image).
In general, the accuracy of the registration depends on the number of distinct silhouettes used in the optimization phase. The greater the number, the more accurate the results; however, in the GA optimization step, the greater the number, the greater the time required to evaluate the objective function, especially when a large population is assumed. That is why we used lower percentages of the scanner data to reduce the run time of the GA optimization step.
D. Summary
In this chapter, a novel technique for 3-D data registration is presented. This technique is dedicated to evaluation procedures that aim at localizing errors in the data under test. The proposed approach does not rely on the presence of the 3-D reconstruction under test during the registration phase. This gives a major advantage to this approach, since the 3-D reconstruction could be of low quality. Such a low-quality situation is expected to introduce difficulties to any 3-D registration technique. In addition, if the actual 3-D reconstruction under test were used in the registration phase, then some errors that the evaluation process tries to investigate might disappear during the minimization step used by any 3-D registration technique. The proposed approach employs silhouette images to align the given data sets. Undistorted silhouette images can be generated easily, hence permitting good data sets for the registration process. The approach is simple and efficient, as shown by the experimental results presented in this chapter.
FIGURE 16 – Registration Through Silhouettes (RTS) results. (a) Input images: from left to right, four input images at 0, 90, 180, and 270 degree angles, respectively. (b) Four silhouettes from the input images at 0, 90, 180, and 270 degree angles. (c) Initial silhouettes re-projected from the 3-D model generated by the 3-D laser scanner at the same angles as in (b) using 10% of the scanner data, (d) final silhouettes after applying the genetic algorithm, (e) final silhouettes after applying the simplex method starting with the parameters generated by the genetic algorithm.
FIGURE 17 – Registration parameters. (a) The rotation around the X-axis, (b) the rotation around the Y-axis, (c) the rotation around the Z-axis, (d) the translation in the X-direction, (e) the translation in the Y-direction, (f) the translation in the Z-direction.
FIGURE 18 – 3-D registration visual results. (a) Unaligned 3-D reconstructions, (b) after registration using the RTS technique, and (c) selected patches from each reconstruction.
FIGURE 19 – 3-D registration quantitative results. (a) The average distance and (b) the Coincidence Index (CI).
FIGURE 20 – Rendered views to show the alignment of the ground truth contours (blue) and the image contours (red). (a) CI = 0.65 dB, (b) CI = −1 dB, (c) CI = −1.9 dB, and (d) CI = −3.5 dB.
FIGURE 21 – The convergence of the registration parameters to the desired values. (a) The rotation around the X-axis, (b) the rotation around the Y-axis, (c) the rotation around the Z-axis, (d) the translation in the X-direction, (e) the translation in the Y-direction, (f) the translation in the Z-direction.
FIGURE 22 – RTS Evaluation. (a) average distance (b) Coincidence Index (CI).
FIGURE 23 – Rendered views to show the alignment, with known parameters, of the ground truth contours (blue) and the image contours (red). (a) Initial alignment where CI = −13.5 dB, (b) initial alignment at an orthogonal view where CI = −14 dB, (c) final alignment at the same view as in (a) with CI = −1.9 dB, and (d) final alignment at the same view as in (b) with CI = 0.6 dB.
CHAPTER IV
PERFORMANCE EVALUATION: METHODOLOGIES AND MEASURES
Motivated by the objective of standardizing the evaluation process, we propose a classification criterion. Based on this criterion, we can classify the tests that measure the performance of 3-D reconstruction techniques into four sets: the operating conditions, the complexity of data analysis, the generality of the measure, and the position of the testing point. This classification will eventually help in providing a standard ranking of different performance evaluation tests, which is an important factor in deciding to what extent we trust the results provided by a certain testing methodology.
In addition, we propose three performance evaluation methodologies (tests). The
first test is the Local Quality Assessment (LQA) test. This test quantifies the performance
of a given 3-D reconstruction with respect to a reference 3-D reconstruction provided by the
3-D laser scanner. It is designed to investigate local errors in the given 3-D reconstruction
by decimating it into different patches and measuring the quality of each patch. This makes
the error analysis much easier and permits the integration of different 3-D reconstruction
techniques based on the results of this test.
In contrast to the LQA test, we propose an Image Re-projection (IR) test based on
the assessment of the image quality. The test does not rely on the availability of explicit
ground truth data which, in general, are difficult to generate. The test uses the acquired
images as the reference of comparison with corresponding images, re-projected from the
given 3-D reconstruction. This test also measures the applicability of the 3-D reconstruc-
tion techniques for virtual reality problems.
To avoid errors due to color variations and the re-projection process in the IR test,
we propose a Silhouette-Contour Signature (SCS) methodology that extracts shape features from silhouette and contour images and permits the inclusion of distinct (cutting) views from the 3-D ground truth data.
A. Classification of Evaluation Techniques
Seeking a standard form for the components of the evaluation problem, we propose a classification criterion by which different tests that measure the performance of 3-D reconstruction techniques can be classified. Based on this classification, it will be easy to qualify newly proposed tests and to identify the goals and benefits of applying such tests to vision techniques. In addition, the classification can indicate the importance of applying these tests.
The proposed classification is based on four sets: the operating conditions set A,
the complexity of data analysis set B, the generality of measures set C, and the position of
the test point set D:
Operating Conditions Set: based on the operating conditions we can identify two types
of tests:
• Dynamic tests: in this type, the test is performed under different conditions of light-
ing, interference, calibration, and object complexity. These tests should measure the
immunity of the vision technique to variations.
• Static tests: in this type, the test is performed under constant conditions. Actually,
these tests investigate the basic functionality of the vision technique.
Complexity of Data Analysis Set: tests can be quantitative or qualitative:

• Quantitative tests: massive amounts of data are analyzed by these tests, and statistical analysis can be a part of them. A test is said to be quantitative if the data set under test Mj, where Mj ⊂ M, has cardinality greater than βc card(M), where βc > 0.5.

• Qualitative tests: the objective of these tests is to provide a quick figure of merit for the performance of the vision technique under test. In this case, Mj has cardinality less than γc card(M), where γc < 0.5.
Generality of the Measure Set: measures can be global or local; hence we have two types of tests:
• Global tests: these tests provide a single measure of the overall performance of the
vision technique under test. Such types of tests are of great importance because they
give a final decision on the technique’s performance.
• Local tests: these tests investigate the local errors provided by the vision technique.
Using local measures provided by the test, enhancement of the technique’s perfor-
mance could be possible.
Position of the Test Point Set: data can be tested in a form of 3-D data, a form that results
after applying a certain transformation to the 3-D data, or a form that requires a certain
transformation or criterion to get the 3-D data form. Based on this form we have three
types of tests:
• Type I tests: these tests are applied directly to the data set M. This means that the transformation C is the identity. These tests are highly trusted because they work directly on 3-D data sets, avoiding errors introduced by such transformations.
• Type I+ tests: unlike type I, these tests are applied to the data set D generated by applying the transformation C to the data set M. Errors are to be expected due to this additional transformation step. As a result, these tests may underestimate the performance of the technique under test. An example of this type is testing the data in the form of 2-D intensity images.
• Type I− tests: like type I+, these tests are applied to the measured data, but at a step before the data set M is obtained. Overestimation of the performance is to be expected when using these tests, because the data are tested in a form prior to the 3-D form. An example of this type is testing the data in the form of disparity maps, a form that needs a further transformation or criterion to reach the 3-D data form.
Based on the preceding classification, a number of

card(A) × card(B) × card(C) × card(D)

different tests can be accomplished under this classification. The next proposition generalizes the above formula.

Proposition 2. For disjoint test sets X1, X2, ..., Xk there exist card(X1) × card(X2) × ⋯ × card(Xk) tests.

Proof: This is a generalization of the above formula.

According to the above classification, there are 2 × 2 × 2 × 3 = 24 different types of tests.
B. Local Quality Assessment (LQA) Methodology
This section describes a proposed methodology for the performance characteriza-
tion of 3-D reconstruction techniques [44]. The given ground truth data and the measured
data are supposed to be registered to each other. A bounding box that contains the given
data is discretized into a number of surface patches, or voxels. A quality index is assigned to each voxel based on centroid and deviation-from-centroid measures applied to the data enclosed by that voxel. Statistical measures are applied to extract global measures from the
quality indices of the given data.
1. Performance Evaluation Procedure
Since the measured data M and G′ = T(G) have been aligned to each other, a performance evaluation methodology can be applied to both sets to measure the similarity between them.

Let U be a superset of M ∪ G′ that is upper bounded by point ub and lower bounded by point lb. Let Xm be a set of uniformly distributed 3-D points x_m^j, j = 1, 2, ..., Nm, in the space bounded by ub and lb, where Nm is a user-defined parameter. Assume that M can be expressed as

\[ \mathcal{M} = \bigcup_j M^j, \quad j = 1, 2, \ldots, N_m \tag{43} \]

where M^j is defined as

\[ M^j = \{ m : m \in \mathcal{M},\; x_m^j - \Delta X \le m \le x_m^j + \Delta X \} \tag{44} \]

where ΔX = (Δx, Δy, Δz), and Δx, Δy, and Δz are elementary distances in the space whose values are determined by Nm, ub, and lb as

\[ \Delta X = \frac{1}{2\sqrt[3]{N_m}}\, (ub - lb) \tag{45} \]

Similar definitions of G′ and G′^j are as follows:

\[ \mathcal{G}' = \bigcup_j G'^j, \quad j = 1, 2, \ldots, N_m \tag{46} \]

and

\[ G'^j = \{ g' : g' \in \mathcal{G}',\; x_m^j - \Delta X \le g' \le x_m^j + \Delta X \} \tag{47} \]
For each data subset pair (M^j, G'^j), we define a quality index Q^j. First, we compute the centroid of each subset, assuming each 3-D point has unit mass:

\[ C_{M^j} = \frac{1}{\mathrm{card}(M^j)} \sum_{i=1}^{\mathrm{card}(M^j)} m_i, \quad m_i \in M^j \tag{48} \]

and

\[ C_{G'^j} = \frac{1}{\mathrm{card}(G'^j)} \sum_{i=1}^{\mathrm{card}(G'^j)} g'_i, \quad g'_i \in G'^j \tag{49} \]

where card denotes the cardinality. Then we compute the deviation of each subset as

\[ D_{M^j} = \sqrt{ \frac{1}{\mathrm{card}(M^j) - 1} \sum_{i=1}^{\mathrm{card}(M^j)} d^2(m_i, C_{M^j}) } \tag{50} \]

and

\[ D_{G'^j} = \sqrt{ \frac{1}{\mathrm{card}(G'^j) - 1} \sum_{i=1}^{\mathrm{card}(G'^j)} d^2(g'_i, C_{G'^j}) } \tag{51} \]

where d denotes the distance. Define the centroid distance as

\[ C_d^j = d(C_{M^j}, C_{G'^j}) \tag{52} \]

and the deviation ratio as

\[ R_D^j = \frac{D_{M^j}}{D_{G'^j}} \tag{53} \]

Then the quality index Q^j is defined as

\[ Q^j = \frac{2 R_D^j}{(R_D^j)^2 + 1} \left[ 1 - \frac{C_d^j}{C_{max}} \right] \tag{54} \]

where \( C_{max} = 2\sqrt{(\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2} \).

The quality index Q has a dynamic range of [0, 1], with Q = 1 corresponding to the highest quality. The quality index consists of two parts: the deviation index 2R_D/(R_D² + 1), shown in Figure 24a, and the centroid index 1 − C_d/C_max, shown in Figure 24b. The highest value is reached when the maximum similarity of the measured data and the ground truth data is achieved. This happens when the deviation ratio is 1 and the centroid distance is zero. We also assume maximum similarity if both subsets of the pair (M, G′) are empty; however, if only one subset is empty, a zero value of Q is assumed.
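As an illustrative sketch (not the dissertation's implementation), the per-voxel quality index of Equations (48)–(54) might be computed as follows. The function and argument names are hypothetical, and a small ad hoc guard handles single-point subsets:

```python
import numpy as np

def quality_index(M_j, G_j, delta):
    """Quality index Q^j (Eqs. 48-54) for one voxel's point subsets.

    M_j, G_j : (N, 3) arrays of measured / ground-truth points in the voxel
    delta    : (3,) array (dx, dy, dz) of the elementary distances
    """
    # Empty/empty pairs count as maximally similar; one empty set scores 0.
    if len(M_j) == 0 and len(G_j) == 0:
        return 1.0
    if len(M_j) == 0 or len(G_j) == 0:
        return 0.0
    c_m, c_g = M_j.mean(axis=0), G_j.mean(axis=0)              # centroids (48)-(49)
    # Deviations (50)-(51); the max(..., 1) guard handles single-point subsets.
    d_m = np.sqrt(((M_j - c_m) ** 2).sum() / max(len(M_j) - 1, 1))
    d_g = np.sqrt(((G_j - c_g) ** 2).sum() / max(len(G_j) - 1, 1))
    c_d = np.linalg.norm(c_m - c_g)                            # centroid distance (52)
    r_d = d_m / d_g                                            # deviation ratio (53)
    c_max = 2.0 * np.linalg.norm(delta)
    return 2.0 * r_d / (r_d ** 2 + 1.0) * (1.0 - c_d / c_max)  # quality index (54)
```

Identical subsets give r_d = 1 and c_d = 0, so Q = 1, as the definition requires.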
The definition of the Q measure is based on a feature matching criterion. This gives
an advantage to this measure over the closest distance measures, since it is insensitive to
the resolution of the reconstructions under test [21].

FIGURE 24 – Local Quality Assessment (LQA) methodology. (a) The deviation index, (b) the centroid index, and (c) the beta distribution at different shaping parameters.
Example values of Nm are 1, 8, 27, .... In general, Nm = i³, where i is a positive integer. The smaller the value of Nm, the more Q tends to average errors; the greater the value of Nm, the more sensitive Q is to outliers.

Since Q describes the quality of different patches, or voxels, in the reconstruction under test, it can be used to fuse different reconstructions by selecting the best Q among corresponding patches from different reconstructions. This finds many applications in areas such as data fusion.
2. Statistical Modeling of the Quality Index
Sometimes we need a global description of the quality of a given reconstruction for comparison purposes. Since it is difficult to perform a point-to-point comparison, we use the values of Q to compare reconstructions on a patch-to-patch basis. Here we use three methods of comparison:
• Correlation Coefficient
• Chi-Square test
• Beta modeling
The first two methods are used to obtain global descriptions of the relative quality of different reconstructions, while the beta modeling method is used to provide a global description of the absolute quality.
Correlation Coefficient: The correlation coefficient ρ_{q1 q2} of the random variables Q1 and Q2 is defined as

\[ \rho_{q_1 q_2} = \frac{C_{q_1 q_2}}{\sigma_{q_1} \sigma_{q_2}}, \qquad -1 \le \rho_{q_1 q_2} \le 1, \quad |C_{q_1 q_2}| \le \sigma_{q_1} \sigma_{q_2} \tag{55} \]

where C_{q1 q2} is the covariance and σ_{q1} and σ_{q2} are the standard deviations of Q1 and Q2, respectively.
Chi-Square Test: The chi-square test is used to compare two binned data sets and determine whether they are drawn from the same distribution function:

\[ \chi^2(H_{Q_1}, H_{Q_2}) = \sum_{j=1}^{N} \frac{\left[ C_1 H_{Q_1}(j) - C_2 H_{Q_2}(j) \right]^2}{H_{Q_1}(j) + H_{Q_2}(j)} \tag{56} \]

where H_{Q_1} and H_{Q_2} denote two histograms with N bins, and

\[ C_1 = \sqrt{\frac{N_{H_{Q_2}}}{N_{H_{Q_1}}}}, \quad C_2 = \frac{1}{C_1}, \quad N_{H_{Q_1}} = \sum_{i=1}^{N} H_{Q_1}(i), \quad N_{H_{Q_2}} = \sum_{i=1}^{N} H_{Q_2}(i) \]

The values of χ²(H_{Q_1}, H_{Q_2}) have the range 0–1, with values near 0 indicating better matching, or higher similarity, between two reconstructions.
The above two methods are useful for tracking the performance of a certain algorithm in response to different controlling parameters.
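The chi-square comparison of Equation (56) is a direct transcription; note that this sketch returns the raw statistic, and any normalization to the 0–1 range reported in the text would be applied afterwards. Function and variable names are illustrative:

```python
import numpy as np

def chi_square(h1, h2):
    """Chi-square comparison of two N-bin histograms (Eq. 56)."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    c1 = np.sqrt(h2.sum() / h1.sum())   # scale factors compensating for
    c2 = 1.0 / c1                       # different total counts
    den = h1 + h2
    keep = den > 0                      # skip bins empty in both histograms
    return (((c1 * h1 - c2 * h2) ** 2)[keep] / den[keep]).sum()
```

Identical histograms score exactly zero; fully disjoint ones score the largest values.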
Beta Modeling: The general formula for the probability density function of the beta distribution is

\[ f_Q(q) = \frac{(q-a)^{\alpha-1} (b-q)^{\beta-1}}{B(\alpha, \beta)\,(b-a)^{\alpha+\beta-1}} \tag{57} \]

where α and β are the shape parameters, a and b are the lower and upper bounds, respectively, of the distribution, a ≤ q ≤ b, α, β > 0, and B(α, β) is the beta function.
Since Q has a dynamic range from 0 to 1, we set a = 0, b = 1 in the above formula.
Figure 24c shows different plots of fQ(q) with different values of the shaping parameters
α and β. This figure shows the flexibility of the beta distribution in providing different
probability density functions of different shapes. This makes the beta distribution a logical
choice for modeling the quality index Q. The Maximum Likelihood Estimation (MLE) parameters α̂ and β̂, extracted from a random sample of size n of the random variable Q, are defined as

\[ \hat{\alpha} = \bar{q} \left[ \frac{\bar{q}(1-\bar{q})}{s^2} - 1 \right] \tag{58} \]

\[ \hat{\beta} = (1-\bar{q}) \left[ \frac{\bar{q}(1-\bar{q})}{s^2} - 1 \right] \tag{59} \]

where q̄ stands for the sample mean and s² represents the biased sample variance. We use these estimators to find the quality estimate P_q(Q ≥ q) at different values of q as

\[ P_q(Q \ge q) = 1 - \frac{1}{B(\hat{\alpha}, \hat{\beta})} \int_0^q t^{\hat{\alpha}-1} (1-t)^{\hat{\beta}-1}\, dt \tag{60} \]

The value of P_q(Q ≥ q) provides an estimate of the quality of a given reconstruction. The higher the value of P_q(Q ≥ q) (maximum 1) at high values of q (maximum 1), the more probable it is that the reconstruction is of high quality. This measure, in contrast to the above two measures, gives both absolute and relative quality assessment.
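A minimal sketch of the estimators in Equations (58)–(60), keeping the dissertation's notation; the incomplete beta integral is evaluated by a simple trapezoidal rule rather than a library routine, and the names are hypothetical:

```python
import numpy as np
from math import lgamma

def beta_mle(q_sample):
    """Estimates of Eqs. (58)-(59) from a sample of Q values."""
    q = np.asarray(q_sample, dtype=float)
    q_bar = q.mean()
    s2 = q.var()                                   # biased sample variance
    common = q_bar * (1.0 - q_bar) / s2 - 1.0
    return q_bar * common, (1.0 - q_bar) * common  # (alpha_hat, beta_hat)

def quality_estimate(q_sample, q):
    """P_q(Q >= q) of Eq. (60); a numerical sketch that assumes
    alpha_hat, beta_hat >= 1 so the integrand stays bounded."""
    a, b = beta_mle(q_sample)
    log_B = lgamma(a) + lgamma(b) - lgamma(a + b)  # log of the beta function
    t = np.linspace(1e-9, q, 20001)
    pdf = np.exp((a - 1.0) * np.log(t) + (b - 1.0) * np.log1p(-t) - log_B)
    integral = 0.5 * ((pdf[1:] + pdf[:-1]) * np.diff(t)).sum()
    return 1.0 - integral
```

For a symmetric sample with mean 0.5 the estimates satisfy α̂ = β̂, and the quality estimate at q = 0.5 is 0.5, as expected for a symmetric density.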
Based on the classification of the testing methodologies, the LQA test is a dynamic, quantitative, local, type I test. It is dynamic because it does not put any constraints on the conditions of acquiring the data under test. It is quantitative because it permits massive analysis of the data under test. It is local because it provides different quality values for different subsets of the examined data. It is a type I test because it is applied directly to the 3-D data under test without using any transformation criteria.
An example of the Q values of two different reconstructions, M1 and M2, referenced to the same 3-D ground truth data, using Nm = 216, is shown in Figure 25a. The M1 reconstruction is similar to the one shown in Figure 14a, while M2 is similar to the one shown in Figure 14b. Comparing the values of the Q measure for each patch of these reconstructions, M1 registered higher quality values than M2. This can also be seen from the quality estimate Pq(Q ≥ q) in Figure 25b. A specific value of q, such as 0.9, can be chosen to get a specific estimate.
Since the LQA testing methodology assumes the availability of 3-D ground truth
data registered to the measured data, the popularity of this test could be limited. Other
evaluation methods are presented in the next sections to deal with the unavailability of such ground truth data.

FIGURE 25 – The LQA test applied to two different reconstructions registered to the ground truth data. (a) The quality index at each patch, and (b) the corresponding quality estimate.
C. Image Re-projection (IR) Test
In cases where 3-D ground truth is not available, the input images can be used as a reference for comparison. Unlike [18], we use calibrated images; hence we have the ability to re-project the given 3-D data into new views without using prediction techniques that may introduce errors in the generated views. In addition, this technique is general for any 3-D reconstruction technique, since we start from the measured 3-D points rather than any other transformed form, such as disparity maps [45, 46].

Since image rendering finds many applications in virtual reality, we measure the ability of the 3-D reconstruction technique to generate nearly real images by the re-projection process. Therefore, we test the re-projected images under an image quality framework, unlike the method in [14].
1. Image Quality Measures
Considering D and G as 2-D images, the signal to noise ratio (SNR) and peak signal
to noise ratio (PSNR) are used as quality measures.
SNR and PSNR are mean-squared (l2-norm) error measures [47]. SNR is defined as the ratio of average signal power to average noise power. For an M × N image,

\[ \mathrm{SNR(dB)} = 10 \log_{10} \left( \frac{\sum_{i,j} g(i,j)^2}{\sum_{i,j} \left( g(i,j) - d(i,j) \right)^2} \right) \tag{61} \]

for 1 ≤ i ≤ M and 1 ≤ j ≤ N, where g(i, j) denotes pixel (i, j) of the standard image and d(i, j) denotes pixel (i, j) of the data image. PSNR is defined as the ratio of peak signal power to average noise power:

\[ \mathrm{PSNR(dB)} = 10 \log_{10} \left( \frac{p_m^2\, MN}{\sum_{i,j} \left( g(i,j) - d(i,j) \right)^2} \right) \tag{62} \]

where p_m is the maximum peak-to-peak swing of the image gray levels (255 for 8-bit images).
Z. Wang et al. [48] proposed an Image Quality Measure (IQM) which models the image degradation as structural distortion instead of errors. This quality measure, IQM, is defined as

\[ \mathrm{IQM} = \frac{4 \sigma_{gd}\, \bar{g} \bar{d}}{(\sigma_g^2 + \sigma_d^2)(\bar{g}^2 + \bar{d}^2)} \tag{63} \]

where

\[ \bar{g} = \frac{1}{MN} \sum_{i=1}^{MN} g_i, \qquad \bar{d} = \frac{1}{MN} \sum_{i=1}^{MN} d_i \]
\[ \sigma_g^2 = \frac{1}{MN-1} \sum_{i=1}^{MN} (g_i - \bar{g})^2, \qquad \sigma_d^2 = \frac{1}{MN-1} \sum_{i=1}^{MN} (d_i - \bar{d})^2 \]
\[ \sigma_{gd} = \frac{1}{MN-1} \sum_{i=1}^{MN} (g_i - \bar{g})(d_i - \bar{d}) \]

The dynamic range of IQM is [−1, 1]. The best value, 1, is achieved if and only if g_i = d_i for i = 1, 2, ..., MN. This quality index models any distortion as a combination of three different factors: loss of correlation, mean distortion, and variance distortion. The definition of the quality index can be written as a product of three components:

\[ \mathrm{IQM} = \frac{\sigma_{gd}}{\sigma_g \sigma_d} \cdot \frac{2 \bar{g} \bar{d}}{\bar{g}^2 + \bar{d}^2} \cdot \frac{2 \sigma_g \sigma_d}{\sigma_g^2 + \sigma_d^2} \]
The first component is the linear correlation coefficient between G and D, whose dynamic
range is [-1,1]. The second component, with a value range of [0,1], measures how close the
mean values are between G and D. It equals one if and only if g = d. The third component
measures how similar the variances of the signals are. Its range of values is also [0,1],
where the best value is achieved if and only if σg = σd.
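The three measures of Equations (61)–(63) are straightforward to transcribe; this sketch assumes grayscale images of equal size, and the function names are illustrative:

```python
import numpy as np

def snr_db(g, d):
    """Eq. (61): signal-to-noise ratio in dB between the standard
    image g and the data image d."""
    g, d = np.asarray(g, float), np.asarray(d, float)
    return 10.0 * np.log10((g ** 2).sum() / ((g - d) ** 2).sum())

def psnr_db(g, d, peak=255.0):
    """Eq. (62): peak signal-to-noise ratio in dB (peak = p_m)."""
    g, d = np.asarray(g, float), np.asarray(d, float)
    return 10.0 * np.log10(peak ** 2 * g.size / ((g - d) ** 2).sum())

def iqm(g, d):
    """Eq. (63): the structural quality index of Wang et al. [48]."""
    g, d = np.asarray(g, float).ravel(), np.asarray(d, float).ravel()
    n = g.size
    g_bar, d_bar = g.mean(), d.mean()
    var_g = ((g - g_bar) ** 2).sum() / (n - 1)
    var_d = ((d - d_bar) ** 2).sum() / (n - 1)
    cov_gd = ((g - g_bar) * (d - d_bar)).sum() / (n - 1)
    return 4.0 * cov_gd * g_bar * d_bar / ((var_g + var_d) * (g_bar ** 2 + d_bar ** 2))
```

For identical images, iqm returns exactly 1, while snr_db and psnr_db diverge, which is why the measures are only compared on distorted pairs.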
Other measures can be used to assess the quality of given images, such as the fuzzy image metric (FIM) [52]. This measure is supposed to resemble some features of the subjective assessments of humans, rather than the objective assessments provided by the above measures. However, it is not clear how this measure provides such subjective assessments.
Although PSNR is a common measure of image quality, it may be biased. This is because most of the image is background, while most of the errors occur in the foreground (the object), and these errors are compared to only one signal value, the peak. This apparently reflects a high signal-to-noise ratio, which gives the SNR measure an advantage over the PSNR measure. The IQM has the advantage that it provides a specific range of quality, so it clearly reflects the level of similarity between the ground truth images and the re-projected images. On the other hand, the SNR measure does not provide crisp values. Values of the SNR measure greater than 10 dB are roughly considered good quality values, since they mean that the signal power is at least 10 times the noise (distortion) power.
Figures 26a and 26b show the SNR and IQM values of a certain reconstruction at
12 different views. These figures show the similarity between the two measures in quan-
tifying the quality at each view of the reconstruction under test. An example of a ground
truth image is shown in Figure 27a. The corresponding re-projected image to the image in
Figure 27a is shown in Figure 27b. At this view, values of SNR ≈ 11 and IQM ≈ 0.95
are registered. The value of the IQM measure is more indicative than the value of the SNR
measure as shown by this example. However, the SNR value could provide an indication
about the signal level and the error level. A difference-image that encodes the absolute
error between the images in Figures 27a and 27b is shown in Figure 27c. This image can
provide a sense of the error compared to the signal at this view as provided by the SNR
measure.
2. The IR Test Procedure
The procedure of the proposed IR test is summarized as follows:

1. Apply the vision algorithm under test to the acquired sequence of images G to generate the data set M.

2. Apply Equations (6) and (14) to generate the set D of re-projected images.

3. Apply Equation (61), (62), or (63) to each image pair.

4. Average the values obtained in the previous step to get a global measure.

Based on the classification presented in Section A, the IR test is a quantitative, dynamic, global, type I+ test. This means that the IR test is of lower rank than the LQA test, since it tests the data in the transformed domain (images), not in the original domain (3-D space). Errors due to such transformations are expected (which is why it is classified as a I+ test); hence the lower rank of the IR test.
D. Silhouette-Contour Signature (SCS) Test Methodology
In the IR test, errors in color re-projection can affect the accuracy of the measure. In addition, the colors in the 3-D model under test may not be exactly the same as the colors in the original images. This variation is due to the color processing during the reconstruction process or to the use of different sensors for the reconstruction. Therefore, in this section we propose a testing methodology that employs image silhouettes and their corresponding contours. This lets us add distinct views, captured from viewpoints other than the input views. These views can be generated synthetically using the 3-D ground truth data. Shape features can be extracted from the re-projected silhouettes and their contours, then compared to the corresponding features from the input silhouettes, in addition to the distinct (cutting) views.
FIGURE 26 – The IR test applied to a reconstruction of 12 input images. (a) The SNR values, and (b) the IQM values at each view.
FIGURE 27 – IR test visual results. (a) An input image, (b) a re-projected image at the same view as in (a), and (c) the difference image between (a) and (b), where darker pixels mean lower error and brighter pixels mean larger error.
1. Shape Histogram Signature
Using silhouette images, histograms for image rows and columns can be generated.
Since the silhouette image is a binary image, the shape pixels can be counted horizontally
and vertically, then row and column histograms can be generated respectively. For the
image silhouette Is defined as

\[ I_s(k_1, k_2) = \begin{cases} L_1, & \text{if } (k_1, k_2) \text{ is a silhouette point;} \\ L_2, & \text{otherwise,} \end{cases} \tag{64} \]

the row histogram Hr and the column histogram Hc can be computed as

\[ H_r(k_1) = \sum_{k_2=1}^{N_w} L_2 \oplus I_s(k_1, k_2) \tag{65} \]

and

\[ H_c(k_2) = \sum_{k_1=1}^{N_h} L_2 \oplus I_s(k_1, k_2) \tag{66} \]

respectively, where ⊕ denotes the XOR operation. To measure the similarity between the shape histograms of the projected silhouettes, H_r^p and H_c^p, and those of the input silhouettes, H_r^in and H_c^in, we use the χ² measure in Equation (56). We can then define the dimension similarity measure DIM as

\[ \mathrm{DIM} = \max\left( \chi^2(H_r^p, H_r^{in}),\; \chi^2(H_c^p, H_c^{in}) \right) \tag{67} \]
Our interpretation of the shape histograms is that:

• the row histogram computes the effective width of the shape at each row in the silhouette image

• the column histogram computes the effective height of the shape at each column in the silhouette image
hence, the name of the dimension-similarity measure.
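A sketch of the shape histograms of Equations (65)–(66) and the DIM measure of Equation (67), assuming boolean silhouette images (True marking a silhouette point) and using a local transcription of the χ² measure of Equation (56); all names are illustrative:

```python
import numpy as np

def shape_histograms(sil):
    """Row and column histograms of Eqs. (65)-(66) for a boolean
    silhouette image (True = silhouette point)."""
    sil = np.asarray(sil, dtype=bool)
    return sil.sum(axis=1), sil.sum(axis=0)   # H_r (per row), H_c (per column)

def _chi2(h1, h2):
    """Eq. (56) applied to a pair of shape histograms."""
    h1, h2 = np.asarray(h1, float), np.asarray(h2, float)
    c1 = np.sqrt(h2.sum() / h1.sum())
    den = h1 + h2
    keep = den > 0                            # skip bins empty in both
    return (((c1 * h1 - h2 / c1) ** 2)[keep] / den[keep]).sum()

def dim_measure(sil_proj, sil_in):
    """Dimension-similarity measure DIM of Eq. (67)."""
    hr_p, hc_p = shape_histograms(sil_proj)
    hr_i, hc_i = shape_histograms(sil_in)
    return max(_chi2(hr_p, hr_i), _chi2(hc_p, hc_i))
```

A silhouette compared against itself gives DIM = 0, the perfect-match value.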
2. The Error Ratio
The error ratio can be used to obtain a general description of the error when the projected silhouette Is is compared to the input silhouette Iin:

\[ E_r = \frac{\sum \sum I_s \oplus I_{in}}{\sum \sum L_2 \oplus I_{in}} \tag{68} \]

E_r computes the ratio of false matches between the input and projected silhouettes to the total number of silhouette points in the input silhouette.
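For boolean silhouettes, Equation (68) reduces to a single XOR count; a minimal sketch with illustrative names:

```python
import numpy as np

def error_ratio(sil_proj, sil_in):
    """Eq. (68): false matches between the projected and input silhouettes,
    normalized by the number of input silhouette points."""
    sil_proj = np.asarray(sil_proj, dtype=bool)
    sil_in = np.asarray(sil_in, dtype=bool)
    return np.logical_xor(sil_proj, sil_in).sum() / sil_in.sum()
```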
3. Boundary Signature
Contours can be used to generate boundary signatures. The contour describes the geometric features of the object. Many signatures for object contours have been proposed for the purposes of object recognition and registration [53, 54]. However, most of these signatures are designed to be invariant to certain transformation parameters to fit the requirements of those applications. For evaluation purposes, the invariance condition can be relaxed, since the ground truth images and the re-projected images are already registered. The shape signature that we propose depends on detecting certain shape configurations in the ground truth images. Similar configurations should be detected in the re-projected images; otherwise, a non-zero error is detected.
For each three adjacent points p1 = [x1, y1]^T, p2 = [x2, y2]^T, and p3 = [x3, y3]^T on an image contour, as shown in Figure 28, where p2 is the middle point, two angles can be computed to determine an almost unique configuration:

\[ \Psi = \cos^{-1} \left( \frac{\overrightarrow{p_2 p_1} \cdot \overrightarrow{p_2 p_3}}{\| \overrightarrow{p_2 p_1} \| \cdot \| \overrightarrow{p_2 p_3} \|} \right) \tag{69} \]

where \( \overrightarrow{p_2 p_1} \) and \( \overrightarrow{p_2 p_3} \) are two vectors, and

\[ \Phi = \tan^{-1} \left( \frac{y_3 - y_1}{x_3 - x_1} \right) \tag{70} \]
FIGURE 28 – An example of the 17 possible configurations describing the shape boundaries.
Combining these angles, 17 different configurations can be detected at image contours. These configurations are listed in Figure 29. By detecting and counting these shape configurations in a ground truth contour image and the corresponding re-projected contour image, two histograms H_a^in and H_a^p can be generated. Then the matching measure χ²(H_a^p, H_a^in) is computed.
Combining the shape histogram and the boundary signature, a χ²_s measure can be computed as

\[ \chi_s^2 = \omega\, \chi^2(H_a^p, H_a^{in}) + (1 - \omega)\, \mathrm{DIM} \tag{71} \]

where ω, 0 < ω < 1, is a controlling parameter. The dynamic range of χ²_s is [0, 1], where 0 represents the perfect match value.
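The angle pair of Equations (69)–(70) can be sketched as follows; note that atan2 replaces the plain arctangent to resolve the quadrant, which is an implementation choice rather than part of the original definition, and the function name is illustrative:

```python
import numpy as np

def configuration_angles(p1, p2, p3):
    """Eqs. (69)-(70): the angle pair (Psi, Phi) for three adjacent
    contour points, with p2 the middle point."""
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    v1, v2 = p1 - p2, p3 - p2                       # the vectors p2p1 and p2p3
    cos_psi = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    psi = np.arccos(np.clip(cos_psi, -1.0, 1.0))    # clip guards rounding error
    phi = np.arctan2(p3[1] - p1[1], p3[0] - p1[0])  # atan2 resolves the quadrant
    return psi, phi
```

Quantizing the resulting (Ψ, Φ) pairs to the configurations of Figure 29 and counting them yields the angle histogram H_a used above.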
Figure 30 shows different shapes in both silhouette and contour forms. The column, row, and angle histograms for the rectangular shape and its rotated version are shown in Figures 31a, 31b, and 31c, respectively. The column and row histograms can be interpreted as the effective height and width of the shape, respectively. This gives these shape descriptors the advantage of tracking changes in the size of the tested shapes, reflecting changes in the tested 3-D reconstruction. Changing the orientation of the rectangle is also detected by the angle histogram, as shown in Figure 31c.
The histograms for the circular and elliptic shapes are shown in Figure 32. Useful information about the circle diameter and the major and minor axes of the ellipse can be extracted from the column and row histograms in Figures 32a and 32b. Also, note the differences between the angle histograms for the circle and the ellipse in Figure 32c. The greater flatness of the elliptic shape is reflected by the increased count of configuration #15 (horizontal line) compared to the circular shape.

(1) Ψ = π/4, Φ = −π/2; (2) Ψ = π/4, Φ = 0; (3) Ψ = π/4, Φ = π/2;
(4) Ψ = π/2, Φ = −π/2; (5) Ψ = π/2, Φ = −π/4; (6) Ψ = π/2, Φ = 0;
(7) Ψ = π/2, Φ = π/4; (8) Ψ = π/2, Φ = π/2; (9) Ψ = 3π/4, Φ ≈ 0.353π;
(10) Ψ = 3π/4, Φ ≈ 0.15π; (11) Ψ = 3π/4, Φ ≈ −0.15π; (12) Ψ = 3π/4, Φ ≈ −0.353π;
(13) Ψ = π, Φ = π/2; (14) Ψ = π, Φ = π/4; (15) Ψ = π, Φ = 0;
(16) Ψ = π, Φ = −π/4; (17) Ψ = π, Φ = −π/2

FIGURE 29 – The 17 possible configurations for three adjacent contour points.
Examples of re-projected contours from a real reconstruction are shown in Fig-
ure 33. The re-projected contours (red) are compared to the ground truth contours (blue)
using the shape histograms. The similarity of the re-projected contour and the ground truth
contour in Figure 33a is reflected by the similarities of the shape histograms in Figure 34.
In addition, the dissimilarity of the re-projected contour and the ground truth contour in
Figure 33b is reflected by the dissimilarities of the shape histograms in Figure 35.
In general, the presented shape histograms can provide signatures for different shapes and detect changes in the boundaries of these shapes. Although the contours provide useful information about the shape geometry, small variations in these contours due to preprocessing techniques, such as edge detection, can affect the angle histogram. However, this may not affect measuring the similarity of the ground truth and re-projected shapes if the same edge detection operator is applied to both shapes.
E. Summary
In this chapter, three testing methodologies are presented. The first test is the Local
Quality Assessment (LQA) test. This test quantifies the performance of a given 3-D recon-
struction with respect to a reference 3-D reconstruction provided by the 3-D laser scanner.
It is designed to investigate local errors in the given 3-D reconstruction by decimating it
into different patches and measuring the quality of each patch. This makes the error analy-
sis much easier and permits the fusion of different 3-D reconstruction techniques based on
the results of this test.
An Image Re-projection (IR) testing methodology is presented to cope with the un-
availability of the 3-D ground truth data. The test uses the acquired images as the reference
of comparison with the corresponding images, re-projected from the given 3-D reconstruc-
tion. This test also measures the applicability of the 3-D reconstruction techniques for
virtual reality problems.
To avoid errors due to color variations and the re-projection process in the IR test, we propose a Silhouette-Contour Signature (SCS) methodology. The test extracts shape features from the silhouette and contour images and permits the inclusion of distinct (cutting) views from the 3-D ground truth data.
A classification criterion for testing methodologies is also presented. Based on this criterion, we can classify the tests that measure the performance of 3-D reconstruction techniques into 24 types. This classification will eventually help in obtaining a standard ranking that reflects the validity of such tests.
FIGURE 30 – Examples of basic geometric shapes: rectangle, rotated rectangle, circle, and ellipse.
FIGURE 31 – Shape signatures for the rectangle and the rotated rectangle shapes in Figure 30. (a) The column histogram, (b) the row histogram, and (c) the boundary histogram.
FIGURE 32 – Shape signatures for the circle and the ellipse shapes in Figure 30. (a) The column histogram, (b) the row histogram, and (c) the boundary histogram.
FIGURE 33 – Examples of ground truth (blue) and measured (red) shapes. (a) Almost similar shapes, and (b) partially similar shapes.
FIGURE 34 – Shape signatures for the shapes in Figure 33a. (a) The column histogram, (b) the row histogram, and (c) the boundary histogram.
FIGURE 35 – Shape signatures for the shapes in Figure 33b. (a) The column histogram, (b) the row histogram, and (c) the boundary histogram.
CHAPTER V
Experimental Evaluation of the Space Carving Technique: A Case Study
Space carving is a common technique for 3-D reconstruction from a sequence of images. The key advantage of space carving is that it relaxes many of the constraints of commonly used stereo techniques and effectively solves the occlusion problem, a major problem in stereo vision. In this chapter, we provide an experimental evaluation of the space carving technique. Based on the framework presented in this work, the effects of key parameters of space carving on its performance are examined. In addition, evaluation remarks are presented to draw conclusions about the performance of the space carving technique.
A. Shape Recovery by Space Carving
The space carving technique exploits the fact that points on Lambertian surfaces are
color-consistent, i.e. they have the same color in all images that can see them. The method
starts with an arbitrary number of calibrated images of a scene and an initial volume of ar-
bitrary resolution such that the volume encloses the captured scene. Each volume element
(voxel) in the initial volume is projected to the set of images from which it is visible. If
the voxel is projected onto inconsistent colors in the images then it is carved, otherwise it
is retained and assigned a color. The algorithm stops when all examined voxels pass the
photo-consistency check, i.e. when there are no more voxels to carve.
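The carving loop described above can be caricatured in a few lines. This sketch deliberately omits the visibility reasoning that is central to the full algorithm, performs a single sweep, and uses a simple standard-deviation threshold tau as the photo-consistency check; all names are illustrative:

```python
import numpy as np

def carve(occupied, voxel_centers, cameras, images, tau=10.0):
    """One photo-consistency sweep over the remaining voxels.

    occupied      : 1-D boolean occupancy array, one entry per voxel
    voxel_centers : (V, 3) array of voxel center coordinates
    cameras       : list of 3x4 projection matrices (calibrated cameras)
    images        : list of grayscale images, one per camera
    tau           : photo-consistency threshold on the standard deviation
                    of the sampled colors (an arbitrarily tuned parameter)
    """
    for v in np.flatnonzero(occupied):
        X = np.append(voxel_centers[v], 1.0)      # homogeneous coordinates
        samples = []
        for P, img in zip(cameras, images):       # visibility reasoning omitted
            x = P @ X
            col = int(round(x[0] / x[2]))
            row = int(round(x[1] / x[2]))
            if 0 <= row < img.shape[0] and 0 <= col < img.shape[1]:
                samples.append(img[row, col])
        # Carve the voxel if its projected colors disagree too much.
        if len(samples) >= 2 and np.std(samples) > tau:
            occupied[v] = False
    return occupied
```

In the full algorithm, such sweeps are repeated, with visibility updated after each carving, until no voxel fails the consistency check.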
The performance evaluation of the space carving algorithm (or its variants) is mostly treated in the literature in a qualitative sense [13, 49], or using synthetic data [50]. A brief quantitative evaluation using real data is presented in [51]. Based on our proposed framework for performance evaluation, we introduce an extensive evaluation of the space carving algorithm. We study the effects of different key parameters of the space carving technique. We emphasize in this study the arbitrarily tuned parameters and the ideal assumptions used in the space carving algorithm.
Specifically, we study the effects of the arbitrarily selected number of input images,
the camera pose and the initial volume resolution on the performance of space carving. The
effects of the ideal assumption of Lambertian surfaces and the noise level on the validity of
the photo-consistency check are examined as well. The testing methodologies presented in
the previous chapter will be applied to the space carving technique throughout this study.
B. Experimental Evaluation of Space Carving
In this section we apply the proposed testing methodologies to the space carving
approach as a case study. We study the effects of the following factors on the performance of the space carving approach:
• the number of input images
• the distribution of cameras (or camera pose)
• the effect of selecting the photo-consistency-check threshold
• the effect of noise
• the effect of the resolution of the initial volume
Meanwhile, we observe how the proposed methodologies converge to the same conclusion about the performance of the space carving approach.
1. The Effect of the Number of Input Images
We study the effect of the number of input images on the space carving performance using four sets of images: a superset of 36 images and three subsets with 18, 12, and 9 input images. An initial volume of 241×241×241 voxels, each of dimensions 1.25×1.25×1.25 mm³, is used throughout this experiment. The space carving is applied to
each set of images and the output is examined using three types of tests: Local Quality
Assessment (LQA), Image Re-projection (IR), and Silhouette-Contour Signature (SCS) as
follows.
a. LQA Test The RTS registration technique is used to align the output of space carving to the ground truth data using four input silhouettes; the LQA test is then applied to the registered data sets. The values of the quality index, Q, for Nm = 216, are calculated for the reconstructions generated from 36, 18, 12, and 9 inputs. Histograms of the Q values for each reconstruction are shown in Figure 36a. As shown in the figure, there is a noticeable difference between the 9-reconstruction and all the other reconstructions in this example. To get a quantitative measure, we find the probability estimate Pq(Q ≥ q) of the given reconstructions based on samples of their Q values. The quality estimates of the 36-,
18-, 12-, and 9-reconstructions are plotted in Figure 36b. To get a specific and standard measure, we use q = 0.9 and hence P0.9(Q ≥ 0.9) to indicate the level of quality of a given reconstruction. For example, the 36-reconstruction achieves P0.9(Q ≥ 0.9) = 0.59, which indicates that almost 60% of the surface patches have a quality index equal to or greater than 0.9 in a probabilistic sense. The P0.9 values for the other reconstructions are summarized in Table 2. These values indicate that the 9-reconstruction has lower quality than the other reconstructions, which scored close values.
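The probability estimate Pq(Q ≥ q) used above can be computed directly from the sampled Q values; a minimal sketch, where the sample values are illustrative rather than taken from the experiment:

```python
import numpy as np

def quality_estimate(q_values, q):
    """Empirical estimate of P_q(Q >= q): the fraction of surface
    patches whose quality index is at least q."""
    q_values = np.asarray(q_values, dtype=float)
    return float((q_values >= q).mean())

# Illustrative sample of per-patch quality indices.
Q = [0.95, 0.88, 0.91, 0.97, 0.30, 0.92]
print(quality_estimate(Q, 0.9))  # fraction of patches with Q >= 0.9
```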
The reason for the degradation of the 9-reconstruction can be inferred from the final number of voxels in each of the competing reconstructions. Table 3 shows that the 9-reconstruction has about 6,000 more voxels than the 36-reconstruction. This indicates that the
TABLE 2
THE EFFECT OF THE NUMBER OF INPUT IMAGES ON THE PERFORMANCE OF SPACE CARVING.
No. of input images 36 18 12 9
P0.9(Q ≥ 0.9) 0.5927 0.5583 0.5426 0.4189
μsnr 9.1243 9.0617 9.0634 7.3907
μχ2 0.0141 0.0171 0.0178 0.0289
μEr 0.0854 0.0997 0.1063 0.1815
9-reconstruction has a larger size, i.e., it experiences a fattening problem. Further interpretation of this problem can be drawn from Figure 37.
The quality index values for each patch in both the 36- and 9-reconstructions are shown in Figure 37. The 9-reconstruction scored many zero-valued quality indices where the 36-reconstruction scored values greater than zero at the same patches. This means that these 9-reconstruction patches are in complete mismatch with the corresponding ground truth patches, which leads to the conclusion that at these zero-valued patches either the ground truth reconstruction provides an empty set while the 9-reconstruction does not, or vice versa. Since the 36-reconstruction matches the ground truth data well at these patches and is smaller than the 9-reconstruction, the 9-reconstruction must extend beyond the ground truth reconstruction; hence it experiences a fattening problem. This fattening effect can be seen visually in the results of the IR and SCS tests in the following sections.
b. IR Test The IR test is applied to the same reconstructions. Thirty-six re-projected images are computed and compared to the original images to find the values of the SNR measure. Figure 38 shows the SNR values at each view for the different numbers of input images. As shown in this figure, the SNR values are almost the same for the 36-, 18-, and 12-image cases but lower in the 9-image case. Table 2 shows the mean, μsnr, of the SNR values in each case. The value of μsnr for the 9-reconstruction has the lowest
FIGURE 36 – LQA test results when the number of input images to the space carving is changed. (a) The histograms of the quality index for different numbers of input images and (b) the quality estimate.
FIGURE 37 – The quality index for two different reconstructions: one using 36 input images and one using 9 input images.
TABLE 3
THE FINAL NUMBER OF VOXELS AND THE RUN TIME, ON AN ONYX2 SGI MACHINE, FOR THE 36-, 18-, 12-, AND 9-RECONSTRUCTIONS.
No. of input images 36 18 12 9
Final No. of Voxels 81535 83233 83076 87591
Run time (minutes) 126.8 55.7 39.7 26.8
value among the other reconstructions.
Figures 38b and 38c show two re-projected images at the same view for the 9- and 12-reconstructions, respectively. These re-projected images are subtracted from the original images, yielding absolute-error images in which darker pixels indicate lower error values. The difference images are shown in Figures 38d and 38e for the 9- and 12-reconstructions, respectively. As shown in Figure 38d, the 9-reconstruction has a larger size than the 12-reconstruction. This demonstrates the consistency between the visual assessment and the quantitative results provided by the LQA and IR tests.
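The SNR measure of the IR test can be sketched as a decibel ratio of signal power to re-projection error power between an original image and its re-projection; the exact definition used in the framework may differ, so this form is an assumption:

```python
import numpy as np

def snr_db(original, reprojected):
    """SNR (in dB) of a re-projected image against the original:
    signal power over the power of the re-projection error."""
    original = np.asarray(original, dtype=float)
    error = original - np.asarray(reprojected, dtype=float)
    return 10.0 * np.log10(original.var() / error.var())

def difference_image(original, reprojected):
    """Absolute-error image: darker pixels mean lower error."""
    return np.abs(np.asarray(original, float) - np.asarray(reprojected, float))
```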
c. SCS Test Since the SCS test depends only on the silhouette images and their corresponding contours, not on intensity images, a cutting view can be generated from the registered ground truth data given by the 3-D laser scanner. The cutting view is supposed to provide distinct information about the object under concern. A top or bottom view of the object is considered distinct since it is projected onto a plane perpendicular to the input images. Figures 39a, b, c, and d show cutting contour images projected from the 36-, 18-, 12-, and 9-reconstructions, respectively. The blue contours in these images represent the ground truth contour, while the red contours represent the contours under test. Note how clear the fattening effect is in the 9-reconstruction case.
The χ2 and Er values at each view for each reconstruction under test are shown in Figures 40a and 40b, respectively. Both measures indicate higher values for the 9-reconstruction than for the other reconstructions. The mean values, μχ2 and μEr, of these measures are summarized in Table 2. Note the higher values of μχ2 and μEr in the 9-reconstruction case, which indicate a lower quality reconstruction.
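The χ2 comparison of contour signatures can be illustrated with the standard chi-square histogram distance. The construction of the actual angle-based signatures from the previous chapter is not reproduced here, so the input signatures below are stand-ins:

```python
import numpy as np

def chi_square(sig_ref, sig_test, eps=1e-12):
    """Chi-square distance between two contour signatures;
    0 means the signatures are identical."""
    a = np.asarray(sig_ref, dtype=float)
    b = np.asarray(sig_test, dtype=float)
    denom = a + b
    mask = denom > eps            # skip empty bins to avoid division by zero
    return 0.5 * np.sum((a[mask] - b[mask]) ** 2 / denom[mask])
```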
d. Evaluation Remarks The house-object used in the current experiment has large homogeneous (same-intensity) areas. Thus, the 9 images used in this experiment are not enough to guide the space carving to carve more inconsistent voxels; hence the larger-size reconstruction. In other words, voxels reconstructed at incorrect depths appear consistent since they project to pixels of the same intensity due to the large homogeneous areas of the object.
On the other hand, using 36 images did not enhance the reconstruction much beyond the 12- and 18-reconstructions. This means that adding more input images cannot enhance the reconstruction if they do not impose additional constraints on the shape under reconstruction. In this case, redundant input images should be detected and removed to permit faster reconstruction. As shown in Table 3, the 36-reconstruction needs almost three times the run time of the 12-reconstruction, even though the enhancement in the output reconstruction is marginal. The following can be concluded about the effect of the number of input images on the performance of space carving:
• A smaller number of input images can result in fatter reconstructions, since fewer images provide fewer constraints on the shape and the algorithm stops carving early. This effect is maximized when the object under reconstruction has large homogeneous areas.
• A larger number of input images can help the space carving provide a good approximation of the shape only if these images provide enough constraints on the shape. Otherwise, some of these images are redundant.
2. Effect of the Camera Pose
As concluded from the above discussion, for the house-object, 9 images may not be enough for a high-quality reconstruction, because the object has large homogeneous areas that require more input images to help the space carving constrain the shape of the object. However, we can enhance the 9-reconstruction slightly by redistributing the camera positions.
a. LQA Test Since we have a superset of 36 images we can divide them into 4
sets: S1, S2, S3, and S4 where each set has 9 images, then apply the space carving technique
to each set. The histograms of the Q values for the four sets and the quality estimates are shown in Figures 41a and 41b, respectively. The histograms show that set S3 gives the best result among the 9-reconstruction sets, and that S2 and S4 score better than S1. Quantitative results using the probability estimate P0.9 for each 9-reconstruction set are shown in Table 4.
Noting that set S1 was used in the comparison with the 12-, 18-, and 36-reconstructions in the previous experiment, better reconstructions can be reached if the views used in the reconstruction are changed, even when the number of views is kept unchanged. The visual results displayed with the IR and SCS tests help in understanding this effect.
b. IR Test The SNR values are computed for each view, whether or not it is used in the reconstruction under test. Figure 42a shows the SNR values for each 9-reconstruction set. These results show some variation in the SNR values of the reconstructions under test at each view: some reconstructions have better SNR values at certain views but worse at others. This means that the output reconstruction is view-dependent, i.e., based on the views used in the reconstruction, some parts of the object are reconstructed better than others. The parts that are well reconstructed from one set of views may not be well reconstructed if other views are used. However, on average we can select the best reconstruction among all of them.
Table 4 shows that the S3-reconstruction has the best μsnr among the reconstructions in this experiment. Figures 42b and 42c show re-projected images at one view for the S4- and S3-reconstructions, respectively. The corresponding difference images are shown in Figures 42d and 42e. Compared with Figure 38d for the S1-reconstruction, the fattening effect has been slightly reduced for the S4 set and nearly
FIGURE 38 – IR test results when the number of input images to the space carving is changed. (a) The SNR measure values at different views for different reconstructions, (b) a rendered view of the 9-image reconstruction, (c) a rendered view of the 12-image reconstruction, (d) a difference image between the rendered view in (b) and the original image at the same view, and (e) a difference image between the rendered view in (c) and the original image at the same view.
FIGURE 39 – Rendered cutting-views (red) for different reconstructions when the number of input images to the space carving is changed. The reference cutting view is shown in blue. (a) 36-image reconstruction, (b) 18-image reconstruction, (c) 12-image reconstruction, and (d) 9-image reconstruction.
FIGURE 40 – SCS test results when the number of input images to the space carving is changed. (a) The χ2 test measure and (b) the error ratio measure.
TABLE 4
THE EFFECT OF THE CAMERA POSE ON THE PERFORMANCE OF SPACE CARVING.
9-reconstruction Set S1 S2 S3 S4
P0.9(Q ≥ 0.9) 0.4189 0.4316 0.4908 0.4300
μsnr 7.3907 7.9933 8.4108 7.4498
μχ2 0.0289 0.0257 0.0206 0.0273
μEr 0.1815 0.1484 0.1281 0.1704
reduced for the S3 set. The cutting view, from the top, can provide a clue about the size changes of the 9-image reconstructions used in this experiment. This is shown among the results provided by the SCS test in the next section.
c. SCS Test Using the cutting view in addition to the 36 input views and computing the χ2 and Er measures at each view, we reach the same conclusion about the quality of the 9-reconstruction sets as given by the previous tests. The cutting view for each reconstruction is plotted with the reference cutting view, as shown in Figure 43. Note how the fattening effect appears strongly in some parts of the object and is reduced in the same parts when the input views are changed.
Figures 44a and 44b show that the χ2 and Er values are higher at the cutting view. That is because the cutting view reveals the fattening that appears in all the side views. The overall assessment by the χ2 and Er measures is shown in Table 4. From the results in the table, the S3-reconstruction scored the best values among the 9-reconstructions. This also shows the consistency of the applied tests in judging the quality of the given reconstructions.
d. Evaluation Remarks One of the advantages of the space carving algorithm is that it permits arbitrary camera positions. However, different arbitrary views can lead to reconstructions of different quality. Here we conclude that:
• Different camera distributions provide different reconstructions. So, "arbitrary" should not be taken as absolute when selecting the camera positions for the space carving technique; some camera positions provide good reconstructions while others may not.
• The geometry and shape features should be considered when selecting the camera poses for the space carving technique. Assigning more cameras to featureless areas, and cameras that capture the geometric features of the shape, can help obtain better reconstructions.
3. Effect of the Photo-consistency Threshold
The standard space carving technique uses a global threshold to determine the photo-consistency of pixels. The variance is computed over the pixels that are candidate projections of a given voxel, and the voxel is carved when the variance is greater than a global threshold (Th). Thus, the selection of this threshold affects the overall performance of the space carving technique. The smaller the value of Th, the more voxels the space carving discards from the output reconstruction; the larger the value of Th, the more voxels it retains, which may incorrectly increase the size of the output reconstruction. The three types of tests are applied to the space carving at different threshold values to show the threshold's effect on the performance of space carving.
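The trade-off controlled by Th can be illustrated on a toy set of per-voxel color variances (the values are made up): a small threshold carves aggressively, while a large one retains nearly everything.

```python
import numpy as np

# Hypothetical per-voxel variances of the projected pixel colors.
variances = np.array([5.0, 25.0, 35.0, 45.0, 60.0, 120.0, 400.0])

def retained_count(th):
    """Number of voxels kept by the photo-consistency check at threshold th."""
    return int((variances <= th).sum())

for th in (30, 40, 50, 100):
    # A larger Th keeps more voxels (risking fattening);
    # a smaller Th carves more (risking over-carving).
    print(th, retained_count(th))
```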
a. LQA Test Space carving is applied to a set of 36 images with threshold values varied from 30 to 100. The histograms of the quality index values for selected reconstructions are shown in Figure 45a, which shows that Th=40 is a critical value; this is also visible in Figure 45b. The 40-reconstruction shows the best quality among the others. The 30-reconstruction is the worst because the space carving wrongly classified some voxels as photo-inconsistent and carved them. For reconstructions with thresholds
FIGURE 41 – LQA test results when the camera pose is changed. (a) The histograms of the quality index for the sets of 9-image reconstructions and (b) the quality estimate.
FIGURE 42 – IR test results when the camera pose is changed. (a) The SNR measure values for the different 9-image reconstructions. Re-projected images of a 3-D reconstruction by space carving given 9 input images of: (b) set S4 and (c) set S3. Difference images between the re-projections of the 3-D reconstruction and the input images at the same view, given: (d) set S4 and (e) set S3. The fattening effect is reduced as shown in (e).
FIGURE 43 – Rendered cutting-views (red) for different reconstructions when the camera pose is changed, using sets: (a) S1, (b) S2, (c) S3, and (d) S4.
FIGURE 44 – SCS test results when the camera pose is changed. (a) The χ2 test measure and (b) the error ratio measure.
more than 40, the space carving wrongly added inconsistent voxels to the output reconstruction, which is why the degradation in quality increases as the threshold value increases. The values of the P0.9 measure for selected thresholds are shown in Table 5. They indicate that the optimal threshold value lies between Th=30 and Th=50 and is expected to be closer to 40. Visual results are also provided in the next sections to show the threshold effect.
b. IR Test The SNR values are computed for each reconstruction, as shown in Figure 46a. The best quality is registered for the 40-reconstruction, which is also shown by the values of μsnr in Table 5. Difference images are shown in Figures 46b, c, d, and e for the 30-, 40-, 50-, and 100-reconstructions, respectively. The over-carving is clearly shown in Figure 46b for the 30-reconstruction, while the under-carving is shown in Figures 46d and e for the 50- and 100-reconstructions, respectively. The 40-reconstruction is over-carved in some areas, so a small increase of the threshold value above 40 may fix the reconstruction in these areas, but with the possibility of under-carving other areas. This means that choosing the optimal value of the threshold is a tricky task.
c. SCS Test The cutting-views of the reconstructions under test are shown in Figure 47. The 30-reconstruction is strongly damaged, while the 50- and 100-reconstructions are over-sized.
Figures 48a and 48b show the values of χ2 and Er at each view. The value of the χ2 measure at the cutting view, view No. 37, is higher for the 30-reconstruction, while Er is lower at the same view. This is because the angle signature dominates the value of the χ2 measure, since there are many curvature dissimilarities with the ground truth at this view, whereas the Er measure counts the erroneous pixels regardless of curvature. For the same reason, the μχ2 value in Table 5 indicates better quality for the 50-reconstruction than for the 40-reconstruction: the over-carving in the 40-reconstruction is penalized more by the χ2 measure than by the Er measure because of its curvature dissimilarities.
d. Evaluation Remarks The space carving algorithm is based on the Lambertian assumption. However, this assumption is ideal, i.e., it is rarely valid in practice. To cope with this, a threshold must be set to manage the carving process. However, the choice of an optimal value for this threshold is tricky, and an inaccurate choice may lead to an over-carved or under-carved output reconstruction. Treating the photo-consistency threshold under a probabilistic framework can reduce its effect on the performance of the space carving, as shown in [51].
4. Effect of Noise
Zero-mean Gaussian noise with a standard deviation of 1% of the intensity value at each pixel is added to the input images. Again, the threshold value is varied from 30 to 100.
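The noise model used here, zero-mean Gaussian noise whose standard deviation at each pixel is a fixed fraction (1%) of that pixel's intensity, can be sketched as:

```python
import numpy as np

def add_intensity_noise(image, fraction=0.01, rng=None):
    """Add zero-mean Gaussian noise whose per-pixel standard deviation
    equals `fraction` of that pixel's intensity."""
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=float)
    sigma = fraction * image                       # per-pixel noise level
    return image + rng.normal(0.0, 1.0, image.shape) * sigma
```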
In this experiment we track the combined effect of noise and threshold on the performance of the space carving. The histograms in Figure 49a and the quality estimates plotted in Figure 49b show that the 30-reconstruction is strongly affected. The degradation in quality occurs because the noise turned some voxels from consistent to inconsistent. This effect is also shown in Figure 50a and Figures 52a and b, with visual results in Figure 50b and Figure 51a. On the other hand, the noise slightly enhanced the 40- and 50-reconstructions, as shown by the measures in Table 6 compared to Table 5. This is a logical result of adding noise, since noise changes the color distribution of the images and can therefore change the status of a voxel from photo-consistent to photo-inconsistent and vice versa.
Evaluation Remark Noise affects the performance of the space carving technique: it can change the status of a voxel from inconsistent to consistent and vice versa. The threshold value can be adjusted to cope with noise, but with the risk of accepting inconsistent voxels when higher thresholds are used. On the other
TABLE 5
THE EFFECT OF THE PHOTO-CONSISTENCY THRESHOLD ON THE PERFORMANCE OF SPACE CARVING.
Threshold (Th) 30 40 50 100
P0.9(Q ≥ 0.9) 0.4318 0.6177 0.5963 0.4838
μsnr 4.3757 10.6051 10.0889 7.8691
μχ2 0.0678 0.0138 0.0122 0.0185
μEr 0.2973 0.0663 0.0767 0.1250
TABLE 6
THE EFFECT OF NOISE AT DIFFERENT THRESHOLDS ON THE PERFORMANCE OF SPACE CARVING.
Threshold (Th) 30 40 50 100
P0.9(Q ≥ 0.9) 0.3770 0.6413 0.6097 0.4848
μsnr -0.8645 10.6290 10.2561 7.8251
μχ2 0.1754 0.0135 0.0120 0.0188
μEr 0.6198 0.0615 0.0706 0.1235
hand, noise could even be helpful if it were to "Lambertianize" the input images. Enhancing the histograms of the input images can lead to a better photo-consistency check regardless of the validity of the Lambertian assumption. In other words, an image processing step can be applied to the input images to enforce the Lambertian assumption.
5. Effect of the Initial Volume Resolution
Three different initial volume resolutions are examined in this experiment, with cubic voxel dimensions δ = 1.25 mm (the highest resolution), δ = 2.00 mm, and δ = 2.50 mm. The LQA test is applied to the output reconstruction at each resolution. The quality estimate values for these reconstructions are shown in Figure 53a. As shown in the figure, the output reconstructions at the given resolutions achieve almost the same quality. This means that at these resolutions the geometric features of the 3-D shape are almost the same; hence the close quality estimates. Visually there are differences, as shown in Figures 53c, d, and e compared to Figure 53b, but this also confirms that the geometric features of the object are preserved. This leads to the conclusion that if the geometric features of an object are preserved at somewhat lower resolutions, then the resulting reconstruction can be of acceptable quality. Accepting low-resolution reconstructions saves much of the run time needed for higher-resolution reconstructions. The run times for reconstructions at different resolutions are shown in Table 7.
Evaluation Remark If the geometric features of an object are preserved at somewhat lower resolutions, then the resulting reconstruction can be of acceptable quality, with the gain of lower run time.
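The initial voxel counts in Table 7 follow from keeping the physical volume fixed (241 voxels of 1.25 mm per side, about 301.25 mm) and re-gridding it at each δ. Assuming the coarser grids round the number of voxels per side up, the counts reproduce exactly:

```python
import math

side_mm = 241 * 1.25  # physical side length of the initial volume (301.25 mm)

for delta in (1.25, 2.00, 2.50):
    n = math.ceil(side_mm / delta)  # voxels per side at voxel size delta
    print(delta, n, n ** 3)         # n**3 gives the initial voxel counts of Table 7
```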
TABLE 7
THE RUN TIME OF THE SPACE CARVING ALGORITHM AT DIFFERENT RESOLUTIONS AND DIFFERENT NUMBERS OF INPUT IMAGES.
Parameter δ=1.25 mm δ=2.00 mm δ=2.50 mm
initial No. of voxels 13997521 3442951 1771561
No. of Input Images =36
final No. of voxels 81535 32361 20063
Run time (minutes) 126.8 18.5 7.8
No. of Input Images =18
final No. of voxels 83233 33310 20459
Run time (minutes) 55.7 8.6 3.9
No. of Input Images =12
final No. of voxels 83076 33305 20606
Run time (minutes) 39.7 6.4 2.6
No. of Input Images =9
final No. of voxels 87591 35069 21519
Run time (minutes) 26.8 4.3 1.8
C. Summary
In this chapter, an experimental evaluation of the space carving technique is pre-
sented. The evaluation procedures used in this study are based on the presented perfor-
mance evaluation framework. In this study, we track the response of the space carving to
the changes in the key controlling parameters of the algorithm.
The number of input images to the space carving algorithm is a key parameter. This study has shown that a minimum number of input images should be supplied to the algorithm to achieve acceptable results; this number depends on the geometric features and textures of the object under reconstruction. A larger number of images may lead to better reconstructions only if the added images introduce constraints on the shape of the object. In addition, the distribution of the cameras that capture the scene should not be totally arbitrary, since different distributions can provide reconstructions of different quality.
The photo-consistency check threshold is another key parameter. The selection of this threshold is tricky, and an incorrect selection can lead to over- or under-carved reconstructions. "Lambertianizing" the input images could provide a way to avoid tuning such a tricky parameter.
The resolution of the initial volume, and hence of the output reconstruction, can be coarse as long as the geometric features of the output reconstruction are preserved. This permits fast reconstructions and hence applicability to real-time systems.
Similar studies can be applied to other 3-D reconstruction techniques to characterize their performance, since the presented framework is independent of the 3-D reconstruction technique under test.
FIGURE 45 – LQA test results when the photo-consistency threshold is changed. (a) The histograms of the quality index at different thresholds and (b) the quality estimate.
FIGURE 46 – IR test results when the photo-consistency threshold is changed. (a) The SNR measure values at different views for reconstructions at different thresholds. A rendered view of a reconstruction using: (b) Th=30, (c) Th=40, (d) Th=50, and (e) Th=100.
FIGURE 47 – Cutting-view images for different reconstructions at different thresholds. (a) Th=30, (b) Th=40, (c) Th=50, and (d) Th=100.
FIGURE 48 – SCS test results when the photo-consistency threshold is changed. (a) The χ2 test measure and (b) the error ratio measure.
FIGURE 49 – LQA test results when Gaussian noise is added and the photo-consistency threshold is changed. (a) The histograms of the quality index at different thresholds and (b) the quality estimate.
FIGURE 50 – IR test results when Gaussian noise is added and the photo-consistency threshold is changed. (a) The SNR measure values at different views for reconstructions at different thresholds. A rendered view of a reconstruction using: (b) Th=30, (c) Th=40, (d) Th=50, and (e) Th=100.
FIGURE 51 – Cutting-view images for different reconstructions at different thresholds when Gaussian noise is added to the input images. (a) Th=30, (b) Th=40, (c) Th=50, and (d) Th=100.
FIGURE 52 – SCS test results when Gaussian noise is added and the photo-consistency threshold is changed. (a) The χ2 test measure and (b) the error ratio measure.
FIGURE 53 – Effect of the initial volume resolution on the reconstruction quality. (a) The quality estimate at different resolutions, (b) one of the 36 input images, and re-projected images of the 3-D reconstruction by space carving, with voxels represented by their centers, at: (c) δ=1.25 mm, (d) δ=2.00 mm, and (e) δ=2.50 mm.
CHAPTER VI
APPLICATIONS (POST-EVALUATIONS)
Current 3-D laser scanners can provide good reconstructions. However, laser projection is not guaranteed on all surfaces, especially those that exhibit occlusion problems. In addition, the standard methods for extracting range data from optical triangulation scanners are accurate only for planar objects of uniform reflectance illuminated by an incoherent source. Using these methods, curved surfaces, discontinuous surfaces, and surfaces of varying reflectance cause systematic distortions of the range data. Coherent light sources such as lasers introduce speckle artifacts that further degrade the data [55].
In this chapter, we integrate the output reconstructions of a 3-D laser scanner and the space carving technique such that the fused reconstruction contains the best features of both. To achieve this goal, we employ the performance evaluation methodologies to investigate the local quality of each reconstruction, assuming full alignment of the two reconstructions.
Here, we introduce a simple fusion algorithm that uses the contours of the 3-D object under reconstruction to guide the fusion decision task. The fusion decision in the proposed technique is a challenging problem, since there is no reference 3-D reconstruction to guide the fusion process. The object contours extracted from the given images of the object are used to make the fusion decision; we call them the Ground Truth Contours (GTC). Similar contours of the object are also extracted from the 3-D reconstructions under fusion; we call them the Measured Contours (MC). The 3-D surface patches whose MC are closest to the corresponding GTC are selected for the final 3-D reconstruction [56].
System design is another application of the performance evaluation framework. A draft design for a 3-D scanner is presented, and specifications for the draft scanner can be computed after an evaluation phase of the scanner components. The evaluation remarks from the first evaluation phase are then used to redesign the scanner, and the evaluation-redesign cycle may be repeated to reach the final design.
A. A 3-D Fusion Methodology
Assume that there are two 3-D reconstructions Ω1 and Ω2 that are fully aligned and a
quality index per patch/voxel is assigned to each pair of patches of the two reconstructions.
Assume that one of the given reconstructions is derived from a set I of calibrated images
of cardinality N .
A sequence of pre-processing techniques, such as image segmentation, filtering, and edge detection, is applied to the set I to generate a set C of contour images defined as:
C = {c^l : c^l ⊂ C, l = 1, …, N},  ω^l ⊂ c^l    (72)
where ω^l is the contour at view l. Using the projection matrix at each view of the sets I and C, the two reconstructions Ω1 and Ω2 are projected to the same views, and the generated silhouette images are then processed to generate the contour image sets C1 and C2, where

C1 = {c^l_1 : c^l_1 ⊂ C1, l = 1, …, N},  ω^l_1 ⊂ c^l_1    (73)

and

C2 = {c^l_2 : c^l_2 ⊂ C2, l = 1, …, N},  ω^l_2 ⊂ c^l_2    (74)

respectively.
A synthetic set of images K is generated such that

K = \{ k^l : k^l \subset K,\ (\omega^l \cup \omega_1^l \cup \omega_2^l) \subset k^l,\ l = 1, \ldots, N \} \qquad (75)

and

k^l(x, y) = \begin{cases}
L_1, & \text{if } c^l(x, y) \in \omega^l; \\
L_2, & \text{if } c_1^l(x, y) \in \omega_1^l; \\
L_3, & \text{if } c_2^l(x, y) \in \omega_2^l; \\
L_4, & \text{otherwise},
\end{cases} \qquad (76)

where L_1, L_2, L_3, and L_4 are different gray levels, 1 \le x \le N_h, 1 \le y \le N_w, and N_h \times N_w
is the cardinality of k^l.
Each image k^l is uniformly divided into image windows W_j^l, where j = 0, \ldots, N_m - 1
and N_m is the number of windows W_j^l in image k^l. A number N_c < N_m of windows W_j^l
that contain the contour subsets \xi_\omega^j, \xi_{\omega_1}^j, and \xi_{\omega_2}^j, as shown in Figure 54a, are selected for the
closest point and the closest contour tests as follows.
1. The Closest Point Test

For each point p_w^i \in \xi_\omega^j, i = 1, \ldots, \mathrm{card}(\xi_\omega^j), where \mathrm{card}(\xi_\omega^j) is the cardinality of
\xi_\omega^j, the closest points pc_{w_1}^i and pc_{w_2}^i are calculated as:

pc_{w_1}^i = p_{w_1}^{r_1} \qquad (77)

such that

d(p_w^i, p_{w_1}^{r_1}) = \min_{h \in \{1, \ldots, \mathrm{card}(\xi_{\omega_1}^j)\}} d(p_w^i, p_{w_1}^h) \qquad (78)

and

pc_{w_2}^i = p_{w_2}^{r_2} \qquad (79)

such that

d(p_w^i, p_{w_2}^{r_2}) = \min_{h \in \{1, \ldots, \mathrm{card}(\xi_{\omega_2}^j)\}} d(p_w^i, p_{w_2}^h) \qquad (80)

where d denotes the Euclidean distance.
2. The Closest Contour Test

To determine which of \xi_{\omega_1}^j and \xi_{\omega_2}^j is closer to \xi_\omega^j, the average distances d_{av}(\xi_\omega^j, \xi_{\omega_1}^j)
and d_{av}(\xi_\omega^j, \xi_{\omega_2}^j) are calculated as:

d_{av}(\xi_\omega^j, \xi_{\omega_1}^j) = \frac{1}{\mathrm{card}(\xi_\omega^j)} \sum_{i=1}^{\mathrm{card}(\xi_\omega^j)} d(p_w^i, pc_{w_1}^i) \qquad (81)

and

d_{av}(\xi_\omega^j, \xi_{\omega_2}^j) = \frac{1}{\mathrm{card}(\xi_\omega^j)} \sum_{i=1}^{\mathrm{card}(\xi_\omega^j)} d(p_w^i, pc_{w_2}^i) \qquad (82)

then the closest contour segment is the one that has the minimum of d_{av}(\xi_\omega^j, \xi_{\omega_1}^j) and
d_{av}(\xi_\omega^j, \xi_{\omega_2}^j).
It is important to note that other methods that can extract information about the
shape of the contours, e.g., gradient-based methods, can be used to determine the closest contour
in addition to the closest Euclidean distance described above.
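The closest point test of Eqs. (77)–(80) and the closest contour test of Eqs. (81)–(82) can be sketched as follows. This is a minimal illustration that assumes contour segments are given as lists of 2-D points; all function names are our own.

```python
import math

def closest_point_distance(p, contour):
    """Closest point test, Eqs. (77)-(80): distance from a GTC point p
    to its closest point on a measured contour segment."""
    return min(math.dist(p, q) for q in contour)

def average_distance(gtc_seg, mc_seg):
    """d_av of Eqs. (81)-(82): mean closest-point distance from the
    GTC segment to a measured contour segment."""
    return sum(closest_point_distance(p, mc_seg) for p in gtc_seg) / len(gtc_seg)

def closest_contour(gtc_seg, mc1_seg, mc2_seg):
    """Closest contour test: pick the MC segment with the smaller
    average distance to the GTC segment (1 or 2)."""
    d1 = average_distance(gtc_seg, mc1_seg)
    d2 = average_distance(gtc_seg, mc2_seg)
    return 1 if d1 <= d2 else 2

# Toy segments: mc1 lies closer to the GTC than mc2.
gtc = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
mc1 = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]
mc2 = [(0.0, 2.0), (1.0, 2.0)]
winner = closest_contour(gtc, mc1, mc2)  # → 1
```

A gradient-based shape comparison, as noted above, could replace `average_distance` without changing the surrounding decision logic.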
3. The Fusion Decision

In 3-D space, the surface segments corresponding to the 2-D contours are determined
during the object projection phase. Therefore, the surface segment corresponding
to the closest contour is already known in 3-D space. As shown in Figure 54b, the
3-D segment \Xi_{\Omega_1}^0 corresponds to the closest contour segment \xi_{\omega_1}^0 in Figure 54a. A
cubic voxel V_0 that has the same centroid as \Xi_{\Omega_1}^0 is constructed to include \Xi_{\Omega_1}^0. The surface
patches/voxels inside V_0 are elected from the 3-D reconstruction Ω1 to be in the final reconstruction.
The process is repeated for each contour and surface segment to reconstruct
the final output.
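The selection step of the fusion decision can be sketched as follows. This is a minimal illustration in which surface patches are reduced to 3-D points; the cube half-side and all names are illustrative assumptions, not part of the original algorithm's implementation.

```python
# Sketch of the fusion decision: for the 3-D surface segment whose
# projected contour won the closest contour test, keep the patches of
# the winning reconstruction inside a cube V0 centred on the segment.

def centroid(points):
    """Centroid of a list of 3-D points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def inside_cube(p, center, half_side):
    """True if point p lies inside the axis-aligned cube of the given
    half-side length around center."""
    return all(abs(pc - cc) <= half_side for pc, cc in zip(p, center))

def select_patches(segment, reconstruction, half_side):
    """Elect the patches/voxels of the winning reconstruction that lie
    inside the cube V0 built around the segment centroid."""
    c = centroid(segment)
    return [p for p in reconstruction if inside_cube(p, c, half_side)]

# Toy example: two of the three patches fall inside V0.
segment = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
recon = [(1.0, 0.0, 0.0), (1.5, 0.5, 0.0), (5.0, 5.0, 5.0)]
kept = select_patches(segment, recon, half_side=1.0)
```

Repeating this selection over all contour and surface segments assembles the final fused reconstruction.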
4. Experimental Results

An experiment is performed on the fusion of the 3-D reconstructions produced by the 3-D
scanner and by the space carving technique. Figure 55a shows screen captures of the 3-D
scanner reconstruction of a bear object. Projecting sharp and thin laser lines
on the surface of this bear is not guaranteed, hence the errors in the output reconstruction.
Filling gaps in the surface can fix some errors in the smooth parts (the back of the bear),
while for complex parts with discontinuities (the front of the bear) the filling does not
provide significant enhancement, as shown in Figure 55b.
The space carving technique is applied to 12 images of the bear. A sample
of the input images is shown in Figure 56a. A ground truth silhouette image extracted
from Figure 56a is shown in Figure 56b. The measured silhouette images at the same view
as Figure 56b, extracted by projecting the 3-D reconstructions produced by space carving and
by the 3-D laser scanner, are shown in Figure 56c and Figure 56d, respectively. The measured
silhouettes indicate differences between the space carving and 3-D laser scanner reconstructions.
Figure 57a shows a contour image with the ground truth contour (GTC), in white, extracted
from the silhouette image in Figure 56b, and a measured contour (MC), in black, extracted
from the silhouette in Figure 56c. A similar image is shown in Figure 57b, but with
the MC extracted from the silhouette in Figure 56d. These contour images give a clue
about which reconstruction, at this view and at a specified surface segment, is closest to the
desired reconstruction. Some contour segments at the back of the bear in Figures 57a and
57b show that the reconstruction by the 3-D scanner is closer to the desired reconstruction,
whereas at the top of the bear's head the space carving reconstruction is closer.
The proposed 3-D fusion technique is applied to the given reconstructions of the
bear by the 3-D scanner, shown in Figure 55a, and by space carving, shown in Figure 58a.
The fusion results are shown in Figure 58b.

FIGURE 54 – Basic idea of the 3-D fusion methodology. (a) The ground truth contour (GTC) and the measured contours (MC) from two different reconstructions, and (b) the 3-D reconstructions at a certain view from which the contours in Figure 54a are extracted.

As shown in the figure, the fusion process can enhance the 3-D reconstruction of a given object by selecting well-reconstructed
surface segments from each reconstruction and integrating them into one reconstruction. It is
important to note that for objects that have concavities, the fusion decision cannot be based
on the closest contour method. This represents a limitation of the technique.
B. System Design

One of the applications of performance evaluation is system design. A draft
design can be assumed; then a sequence of performance evaluation methodologies is applied to the system output.

FIGURE 55 – Screen captures of a 3-D reconstruction by the 3-D laser scanner. (a) Without filling the gaps and (b) after filling the gaps.

FIGURE 56 – Silhouette images for the 3-D fusion technique. (a) An example input image to the space carving technique, (b) a silhouette image extracted from the image in Figure 56a, (c) a silhouette image extracted from the reconstruction by space carving at the same view as in Figure 56b, and (d) a silhouette image extracted from the reconstruction by a 3-D laser scanner at the same view as in Figure 56b.

FIGURE 57 – Contour images for the 3-D fusion technique. (a) A contour image showing the ground truth contour (GTC), in white, extracted from the silhouette image in Figure 56b and a measured contour (MC), in black, extracted from the silhouette in Figure 56c, and (b) the same image as in Figure 57a, but with the MC extracted from the silhouette in Figure 56d.

FIGURE 58 – Snapshots of the 3-D reconstruction. (a) The reconstruction by space carving and (b) the fusion of the scanner and space carving reconstructions based on the closest contour method.

The effect of changing some system parameters can be tracked
using the evaluation methods. Adjustments and modifications can be applied to the draft
design to enhance the system performance. The design-evaluate cycle can be repeated until
the optimal design is reached.
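The design-evaluate cycle described above can be sketched as the following loop. `build_system`, `evaluate`, and `adjust` are placeholders for the scanner construction, the evaluation framework of the previous chapter, and the parameter-adjustment step; the stopping rule is an assumption of this sketch, not a prescription.

```python
# Illustrative sketch of the design-evaluate cycle. The three callables
# are hypothetical stand-ins for the actual design, evaluation, and
# adjustment procedures.

def design_evaluate(params, build_system, evaluate, adjust,
                    target_score, max_cycles=10):
    """Repeat the design-evaluate cycle until the evaluation score
    reaches the target or the cycle budget is exhausted."""
    for _ in range(max_cycles):
        system = build_system(params)   # realize the draft design
        score = evaluate(system)        # run the evaluation methodologies
        if score >= target_score:       # satisfactory results reached
            break
        params = adjust(params, score)  # modify the draft design
    return params, score

# Toy run: each cycle improves the "design" by one unit until the
# target score of 3 is reached.
final = design_evaluate(0, lambda p: p, lambda s: s,
                        lambda p, s: p + 1, 3)
```

The toy run converges after three adjustments; in practice each cycle corresponds to a full evaluation pass over the scanner output.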
The experimental test-bed presented in Chapter II can be used as a stand-alone passive
3-D scanner if we exclude the function of the laser, as shown in Figure 59. Space
carving is employed to find the 3-D reconstruction of the object of interest. From the
evaluation results presented in the previous chapter for the space carving technique, we can
set the initial specifications of the scanner as given in Table 8.
The values in Table 8 may not be the optimal parameters. As shown by the cutting
views captured for the space carving reconstructions in the previous chapter, more
images should be added from other views. A top view of the house object is shown in
Figure 60a, and a screen capture of a space carving reconstruction is shown in Figure 60b.
These two views show that the top of the house is not sharply reconstructed, because all the
input images were from side views. Adding a top camera, CCD 2, as shown in Figure 61,
can provide more constraints on the shape of the top part of the house object. The value of
the angle θ should be selected to permit overlap between the images acquired by CCD
1 and CCD 2. Another cycle of evaluation should be performed to test this design. The
design-evaluate cycle should be repeated until satisfactory results are reached.
C. Summary
In this chapter, two examples of post-evaluation processes are presented. The
first example is a technique for the 3-D fusion of different 3-D reconstructions based on a
closest contour criterion. The fusion decision is made based on the evaluation of the quality
of the reconstructions under fusion. The output reconstruction of this fusion procedure is
assumed to have better quality than any single reconstruction.
FIGURE 59 – A draft design of a passive 3-D scanner.
Another post-evaluation process is the system design. A draft design for a passive
3-D scanner is presented based on the evaluation results in the previous chapter. Modifica-
tions are applied to the draft design to enhance the performance of the system.
TABLE 8 – INITIAL SPECIFICATIONS OF A PASSIVE 3-D SCANNER BASED ON THE
RECONSTRUCTION BY THE SPACE CARVING TECHNIQUE.

Parameter                 min     typical   max
δ (mm)                    2.50    1.25      -
Number of input images    12      36        -
Th                        40      45        50
FIGURE 60 – Top views of the house object. (a) Original image and (b) screen capture of the 3-D reconstruction.
FIGURE 61 – Another draft design of a passive 3-D scanner.
CHAPTER VII
Conclusions and Future Directions
3-D reconstruction from a sequence of images finds many applications in modern
computer vision, such as virtual reality, vision-guided surgery, autonomous navigation,
medical studies and simulations, reverse engineering, and architectural design. The very
basic requirement of these applications is to find accurate and realistic reconstructions.
While many 3-D reconstruction approaches have been proposed to meet the above
requirement, there is still a lack of standard, widely accepted methodologies for quantifying
the performance of these approaches.
Motivated by the fact that the performance evaluation process plays an important
role in guiding and measuring progress in the field, which, in turn, will lead to improvements
in both the theory and applications of 3-D reconstruction research, we introduce a
computational framework for the performance characterization of 3-D reconstruction techniques
from a sequence of images.
In this work, we proposed a unified computational framework to address the lack
of global ground truth data sets and of testing methodologies applicable
to different 3-D reconstruction approaches. The contributions of this dissertation can be
stated as follows.
A. Contribution to Data Acquisition and System Design
This dissertation introduces a new design for an experimental setup that integrates
the functionality of laser scanners and CCD cameras. The system is able to collect very
dense ground truth data and the corresponding intensity images. The system contains
efficient data acquisition modules that guarantee the generation of high-quality intensity data. The
intensity data set is calibrated, segmented, and automatically registered to the ground truth
data. These data sets can be used by different 3-D reconstruction techniques, including
stereo and volumetric-based approaches. This unique feature of the setup motivated us
to build an evaluation database that includes ground truth data, input images, camera calibration
parameters, and 3-D registration parameters. This database will be available
for public use to bridge the gap caused by the unavailability of global experimental
data sets.
B. Contribution to the 3-D Data Registration
A novel technique for 3-D data registration is presented. This technique is dedicated
to evaluation procedures that aim at localizing errors in the data under test. Unlike
conventional 3-D data registration techniques, the approach does not rely on the
presence of the 3-D reconstruction under test during the registration phase. This gives the
approach a major advantage, since the 3-D reconstruction could be of low quality,
which might add difficulties to any 3-D registration technique. In addition, if the actual 3-D
reconstructions under test were used in the registration phase, then some errors that the
evaluation process tries to investigate might disappear during the minimization step used
by any 3-D registration technique. The approach employs silhouette images to align the
given data sets. Undistorted silhouette images can be generated easily, hence providing
good data sets for the registration process. The approach is simple and efficient and can be
applied to any 3-D registration problem, assuming the availability of a calibrated sequence
of images describing one of the data sets under registration.
C. Contribution to the Performance Evaluation Methodologies and Measuring Criteria
Three testing methodologies are presented. The first is the Local Quality Assessment
(LQA) test. This test quantifies the performance of a given 3-D reconstruction
with respect to a reference 3-D reconstruction provided by the 3-D laser scanner. It is
designed to investigate local errors in the given 3-D reconstruction by decimating it into
different patches and measuring the quality of each patch. This makes the error analysis
much easier and permits the integration of different 3-D reconstruction techniques based
on the results of this test.
An Image Re-projection (IR) testing methodology is presented to cope with the unavailability
of 3-D ground truth data. The test uses the acquired images as the reference
for comparison with corresponding images re-projected from the given 3-D reconstruction.
This test also measures the applicability of 3-D reconstruction techniques to virtual
reality problems.
To avoid errors due to color variations and the re-projection process in the IR test,
we propose a Silhouette-Contour Signature (SCS) methodology that extracts shape features
from silhouette and contour images and permits the inclusion of distinct cutting views from
the 3-D ground truth data.
A classification criterion for testing methodologies is also presented. Based on this
criterion, we can classify the tests that measure the performance of 3-D reconstruction
techniques into 24 types. This classification will eventually help in establishing a standard
ranking that reflects the validity of such tests.
D. Contribution to the Experimental Evaluation of 3-D Reconstruction Techniques
An experimental evaluation of space carving, a recent and widely used technique for
3-D reconstruction from a sequence of images, is presented. The evaluation procedures
used in this study are based on the presented performance evaluation framework. In this
study, we track the response of space carving to changes in the key controlling parameters
of the algorithm.
The number of input images to the space carving algorithm is a key parameter. This
study has shown that a minimum number of input images should be supplied to the algorithm
to achieve acceptable results. This number depends on the geometric features
and textures of the object under reconstruction. A higher number of images may lead to
better reconstructions only if the added images introduce constraints on the shape of the
object. In addition, the distribution of the cameras that capture the scene should not be
totally arbitrary, since different distributions can provide reconstructions of different quality.
The photo-consistency check threshold is another key parameter. The selection of
this threshold is tricky: an incorrect selection of this parameter could lead to over- or under-carved
reconstructions. "Lambertianizing" the input images could provide a way to avoid
tuning such a tricky parameter.
The resolution of the initial volume, hence the resolution of the output reconstruction,
can be coarse if the geometric features of the output reconstruction are preserved. This
permits faster reconstructions, hence applicability to real-time applications.
Similar studies can be applied to other 3-D reconstruction techniques to characterize
their performance, since the presented framework is independent of the 3-D reconstruction
technique under test.
E. Applications
Two applications of the performance evaluation framework are presented. The first
application is the 3-D data fusion of different 3-D reconstructions. A fusion technique
based on image contour comparison is presented. The technique rectifies a 3-D reconstruction
based on the closeness of its projected contours to the ground truth contours.
The method is used to combine reconstructions generated by a 3-D laser scanner and the
space carving technique.
The second application is system design. A draft design for a passive 3-D scanner
is presented. The design is based on the experimental results of evaluating the performance
of space carving. The proposed scanner should be able to reconstruct surfaces that the
commercial 3-D laser scanner may not be able to reconstruct.
F. Future Extensions
In future work, we will investigate the following extensions to the proposed frame-
work.
1. Data acquisition
• Add more cameras to provide cutting views. Instead of using the ground truth data
(which need to be registered to the measured data) to provide cutting views for the
SCS test, one or more cameras can be added to the evaluation setup to provide such
views. However, this will add extra work for the camera calibration process. Design
of a multi-planar calibration pattern to calibrate this cluster of cameras, in addition
to the main camera, can facilitate the calibration process.
• Extend the evaluation database to include different data sets for different complexity
test-objects.
2. 3-D Data Registration
• Find a relation between the error in the projected silhouettes and the actual error in
the 3-D space to expedite the registration process.
• Use the color cue in the registration process. Maximizing the mutual information
between the two different sources of color provided by the CCD camera and the 3-D
laser scanner can be a solution to aligning images from different modalities.
• Investigate using deterministic optimization approaches, such as graph cuts, instead of
genetic algorithms.
3. Testing Methodologies and Measures
• The SCS methodology can be extended to the 3-D space. Matching signatures from
the 3-D ground truth surfaces and the measured surfaces can provide a way of inves-
tigating the quality of a given reconstruction.
• Investigate using subjective measures for the IR test. As these measures aim to simu-
late the human sensing of visual cues, they are difficult to design and computationally
expensive.
4. The Performance of Space Carving Technique
• Investigate using methods such as the invariant features or the optical flow to deter-
mine the minimum number of images, given a superset of input images, required by
the space carving technique to enhance the output reconstruction.
• Investigate using image processing enhancement techniques to Lambertianize the
input images for the space carving to enhance the photo-consistency check.
In general, the proposed framework can be applied to different 3-D reconstruction techniques
from a sequence of images. Studies similar to that of space carving can be applied
to level set approaches for shape recovery. Evaluation of stereo techniques based on
3-D ground truth data can also be a future study.
REFERENCES
[1] E. Trucco and A. Verri, Introductory Techniques for 3-D Computer Vision, PrenticeHall Inc., 1998.
[2] D. Marr and T. Poggio “Cooperative computation of stereo disparity”, Science, 194,pp. 283-287, Sep. 1976.
[3] http://cat.middlebury.edu/stereo/data.html
[4] H. Baker, and T. Binford,“Depth from edge and intensity based stereo”, Proceedingsof Seventh International Joint Conference on Artificial Intelligence, Vancouver, 1981,pp. 631-636.
[5] C. Loop and Z. Zhang, “Computing rectifying homographies for stereo vision,” Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR’99), vol. I, Ft. Collins, CO, June 23-25, 1999, pp. 125-131.
[6] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy and minimizationvia graph cuts,” Proceedings of IEEE International Conference On Computer Vision(ICCV’99), Kerkyra, Greece, Sep. 20-27, 1999, pp. 377-384.
[7] S. Roy and I. J. Cox, “A maximum-flow formulation of the N-camera stereo corre-spondence problem,” Proceedings of International Conference on Computer Vision(ICCV’98), Bombay, India, Jan. 4-7, 1998, pp. 492-499.
[8] M. Okutomi and T. Kanade, “A multiple baseline stereo,” IEEE Transactions on Pat-tern Analysis and Machine Intelligence, vol. 15, no. 4, pp. 353-453, April 1993.
[9] D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereocorrespondence algorithms”, International Journal for Computer Vision, 47(1):7-42,May 2002.
[10] A. Laurentini, “The visual hull concept for silhouette-based image understanding,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 2, pp. 150-162, Feb. 1994.
[11] G. Cheung, T. Kanade, J-Y. Bouguet, and Holler, “A real time system for robust 3Dvoxel reconstruction of human motion,” Proceedings of IEEE Conference on Com-puter Vision and Pattern Recognition, vol. 2, South Carolina, June 13-15, 2000, pp.714-720.
[12] S. Seitz and C. Dyer, “Photorealistic scene reconstruction by voxel coloring,” Pro-ceedings of Computer Vision and Pattern Recognition Conference (CVPR’97), PuertoRico, June 17-19, 1997, pp. 1067-1073.
[13] K. Kutulakos and S. Seitz, “Theory of shape by space carving,” Proceedings of IEEE International Conference on Computer Vision (ICCV’99), Kerkyra, Greece, Sep. 20-27, 1999, pp. 307-314.
[14] W. Culbertson, T. Malzbender, and G. Slabaugh, “Generalized voxel coloring,” Inter-national Workshop on Vision Algorithms, Corfu, Greece, 1999, pp. 100-115.
[15] O. Faugeras and R. Keriven, “Variational principles, surface evolution, PDE’s, levelset methods and the stereo problem”, IEEE Transactions on Image Processing, vol. 7no. 3 pp. 336-344, 1998.
[16] C. Dyer, “Volumetric scene reconstruction from multiple views”, In L.S. Davis, editor,Foundations of Image Understanding, pp. 469-489. Kluwer, Boston, 2001.
[17] G. Slabaugh, B. Culbertson, T. Malzbender, and R. Schafer, “A survey of methodsfor volumetric scene reconstruction from photographs”, In K. Mueller and A. Kauf-mann, editors, Proceedings of the Joint IEEE TCVG and Eurographics Workshop(VolumeGraphics-01), Wien, Austria, June 21-22, 2001, pp. 81-100.
[18] R. Szeliski “ Prediction error as a quality metric for motion and stereo” Proceedingsof IEEE International Conference on Computer Vision (ICCV’99), Kerkyra, Greece,Sep. 20-27, 1999, pp. 781-788.
[19] R. Bolles, H. Baker, and M. Hannah, “The JISCT stereo evaluation”, Proceedings ofDARPA Image Understanding Workshop, 1993, pp. 263-274.
[20] R. Szeliski and R. Zabih, “An experimental comparison of stereo algorithms,” International Workshop on Vision Algorithms, Corfu, Greece, 1999, pp. 1-19.
[21] J. Mulligan, V. Isler, and K. Daniilidis “Performance evaluation of stereo for tele-presence,” Proceedings of IEEE International Conference on Computer Vision, vol.II, Vancouver, Canada, July 7-14, 2001, pp. 558-565.
[22] P. J. Besl and N. McKay,“A method for registration of 3-D shapes,” IEEE Trans.Pattern Analysis and Machine Intelligence, vol. 14, no. 2 pp. 239-256, March 1992.
[23] Z. Zhang, “Iterative point matching for registration of free form curves and surfaces,”International Journal of Computer Vision, 13:119-152, 1994.
[24] A. Fitzgibbon,“Robust registration of 2D and 3D points,” Proceedings of British Ma-chine Vision Conference (BMVC’01), vol. II, Manchester, UK, Sep. 10-13, 2001, pp.411-420.
[25] D. Chetverikov, D. Svirko, D. Stepanov, and P. Krsek, “The trimmed iterative closest point algorithm,” Proceedings of International Conference on Pattern Recognition (ICPR’02), vol. III, Quebec, Canada, Aug. 11-15, 2002, pp. 545-548.
[26] S. Yamany and A. Farag,“Surface Signatures: An orientation independent free-formsurface representation scheme for the purpose of objects registration and matching,”IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 8, pp. 1105-1120,2002.
[27] C. S. Chua and R. Jarvis,“Point signatures: A new representation for 3d object recog-nition,” International Journal of Computer Vision 25:63-85, 1997.
[28] C. Dorai and A. K. Jain,“Cosmos-a representation scheme for 3d free form objects,”IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 8, pp. 1115-1130,1997.
[29] A. Johnson and M. Hebert, “Surface matching for object recognition in complex three-dimensional scenes,” Image and Vision Computing, vol. 16, pp. 635-651, 1998.
[30] A. Eid, S. Rashad and A. Farag, “Validation of 3D reconstruction from sequence ofimages,” Proceedings of the International Conference on Signal Processing, PatternRecognition, and Applications (SSPRA’02), Crete, Greece, June 25-28, 2002, pp. 375-380.
[31] S. Seitz, J. Kim,“The space of all stereo images,” Proceedings of Eighth IEEE Inter-national Conference on Computer Vision (ICCV’01), vol. II, Vancouver, Canada, July7-14, 2001, pp. 558-565.
[32] A. Elgammal, D. Harwood, and L. Davis, “Non-parametric model for backgroundSubtraction,” 6th European Conference on Computer Vision (ECCV’00), vol. II,Dublin, Ireland, June 26-July 1, 2000, pp. 751-767.
[33] A. Eid, S. Rashad and A. Farag, “A general purpose platform for 3D reconstructionfrom sequence of images,” Proceedings of Fifth International Conference on Infor-mation Fusion (IF’02), vol. I, Annapolis, MD, July 7-11, 2002, pp. 425-413.
[34] R. Hartley and A. Zisserman, Multiple View Geometry in computer vision, CambridgeUniversity Press., 2000.
[35] O. Faugeras and Q. Luong, The Geometry of Multiple Images, The MIT Press., 2001.
[36] R. Tsai, “An efficient and accurate camera calibration technique for 3d machine vi-sion,” Proceedings of IEEE Conference on Computer Vision and Pattern Recognition(CVPR’86), Miami Beach, FL, 1986, pp. 364-374.
[37] L. Robert. “Camera Calibration without feature extraction,” Computer Vision and Im-age Understanding, 63(2):314-325, March 1996.
[38] A. Eid and A. Farag, “Design of an experimental setup for performance evaluation of3-D reconstruction techniques from sequence of images,” Eighth European Confer-ence on Computer Vision (ECCV’04), Workshop on Applications of Computer Vision,Prague, Czech Republic, May 11-14, 2004, pp. 69-77.
[39] A. Eid and A. Farag, “A unified framework for performance evaluation of 3-D recon-struction techniques,” IEEE Conference on Computer Vision and Pattern Recognition(CVPR’04), Workshop on Real-time 3-D Sensors and their Use, Washington DC, June27-July 2, 2004.
[40] H. Lensch, W. Heidrich and H. Seidel, “A silhouette-based algorithm for texture reg-istration and stitching,” Graphical Models vol. 63, no. 4, pp. 245-262, 2001.
[41] A. Agarwal and B. Triggs, “3D human pose from silhouettes by relevance vector re-gression,” Proceedings of IEEE Conference on Computer Vision and Pattern Recog-nition, vol. II, Washington DC., June 27-July 2, 2004, pp. 882-888.
[42] S. Sinha, M. Pollefeys, and L. McMillan, “Camera network calibration from dy-namic silhouettes,” Proceedings of IEEE Conference on Computer Vision and PatternRecognition (CVPR’04), vol. I, Washington DC., June 26-July 2, 2004, pp. 195-202.
[43] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning.Addison-Wesley Publishing Company Inc., 1989.
[44] A. Farag and A. Eid, “Local quality assessment of 3-D reconstructions from sequenceof images: a quantitative approach,” Advanced Concepts for Intelligent Vision Systems(ACIVS’04), Brussels, Belgium, Aug. 31-Sep. 3, 2004, pp. 161-168.
[45] A. Eid and A. Farag, “On the performance characterization of stereo and space carv-ing,” Proceedings of Advanced Concepts for Intelligent Vision Systems (ACIVS’03),Ghent, Belgium, Sep. 2-5, 2003, pp. 291-296.
[46] A. Eid and A. Farag, “On the performance evaluation of 3-D reconstruction tech-niques from a sequence of images,” EURASIP Journal on Applied Signal Processing,to appear 2005.
[47] N. Damera-Venkata, et al. “Image quality assessment based on a degradation model,”IEEE Transactions on Image Processing, vol. 9, no. 4, pp. 636-650, April 2000.
[48] Z. Wang, A. Bovik, and L. Lu, “Why is image quality assessment so difficult,” Pro-ceedings of IEEE International Conference on Acoustics, Speech, and Signal Pro-cessing (ICASSP’02), vol. IV, Orlando, FL, May 13-17, pp. 3313 -3316.
[49] A. Yezzi, G. Slabaugh, A. Broadhurst, R. Cipolla and R. Schafer, “A surface evolutionapproach to probabilistic space carving,” Proceedings of First International Sympo-sium on 3D Data Processing Visualization and Transmission (3DPVT’02), Padova,Italy, June 19-21, 2002, pp. 618-621.
[50] A. Broadhurst and R. Cipolla, “A statistical consistency check for the space carvingalgorithm,” Proceeding of Eleventh British Machine Vision Conference (BMVC’00),Bristol, UK, Sep. 11-14, 2000, pp. 282-291.
[51] A. Broadhurst, T. W. Drummond, and R. Cipolla, “A probabilistic framework for space carving,” Proceedings of Eighth IEEE International Conference on Computer Vision, vol. I, Vancouver, Canada, July 7-14, 2001, pp. 388-393.
[52] J. Li, G. Chin, and Z. Chi, “A fuzzy image metric with application to fractal coding,”IEEE Transactions on Image Processing, vol. 11, no. 6, pp. 636-643, June 2002.
[53] A. Sajjanhar and G. Lu, “A comparison for techniques for shape retrieval,” Proceed-ings of International Conference on Computational Intelligence and Multimedia Ap-plications, Monash University, Gippsland Campus, Feb. 9-11, 1998, pp. 854-859.
[54] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape contexts,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 509-522, 2002.
[55] B. Curless and M. Levoy,“Better optical triangulation through spacetime analysis,”Proceedings of Fifth International Conference on Computer Vision (ICCV’95), Cam-bridge, MA, June 20-23, 1995, pp. 987-994.
[56] A. Eid and A. A. Farag, “On the fusion of 3-D reconstruction techniques,” Proceed-ings of Seventh International Conference on Information Fusion (IF’04), Stockholm,Sweden, June 28-July 1, 2004, pp. 856-861.
APPENDIX I
PROJECTIVE GEOMETRY
Euclidean geometry describes our world well. However, for the purpose of describing
projections, projective geometry is a more adequate framework. Parallel railroad
tracks are parallel lines in 3-D space; in their images, however, they are not parallel and appear
to intersect at a vanishing point at the horizon. Projective geometry is an extension of
Euclidean geometry that describes a larger class of transformations than just rotations
and translations, including in particular the perspective projection performed by a camera.
Simply put, it makes it possible to describe such phenomena at infinity naturally. The most
important aspect of projective geometry is the introduction of homogeneous coordinates, which
represent a projective transformation as a matrix multiplication. This allows simple
matrix algebra to be used for most computations, which would be a difficult task if Euclidean
geometry were used. In the next sections, we describe the projective representations of the basic
geometrical entities in both 2-D and 3-D space. In addition, a brief description of the basic
transformations, ranging from Euclidean to projective geometry, is presented.
A. 2-D Projective Geometry
a. Points and Lines in P2 In homogeneous coordinates, the representation of
lines and points is augmented by a third coordinate in addition to the inhomogeneous coordinates
in R^2. A line l in the plane is represented by the equation:

ax + by + c = 0 \qquad (83)
that can be described by the vector (a, b, c)T .
The vectors (a, b, c)T and k(a, b, c)T represent the same line for any non-zero scal-
ing factor k. An equivalence class of vectors under this scaling relation is known as a
homogeneous vector. Any particular vector (a, b, c)T is a representative of the equivalence
class. The set of equivalence classes of vectors in R3−(0, 0, 0)T forms the projective space
P2 [34]. A point x = (x, y)T lies on the line l = (a, b, c)T if and only if ax + by + c = 0, or in vector notation:

xT l = lT x = 0 (84)
An arbitrary homogeneous vector representation of a point has the form x = (x1, x2, x3)T, representing the point x = (x1/x3, x2/x3)T in R2. Points, as homogeneous vectors, are thus also elements of P2. The point x can also be defined as the intersection of two lines l1,
and l2 as
x = l1 × l2 (85)
Points and lines are thus duals in P2; the line l joining two points x1 and x2 is defined
as:
l = x1 × x2 (86)
The intersection of two lines is fully defined in P2 even when they are parallel. This leads to the definition of points and lines at infinity. Consider the two parallel lines l1 = (a, b, c1)T and l2 = (a, b, c2)T, where c1 ≠ c2. The intersection of l1 and l2 is the homogeneous point x = (b, −a, 0)T, which is a point at infinity (b/0, −a/0)T in R2. The vector (b, −a)T represents the common direction of l1 and l2. If we regard all points of the form x = (x1, x2, 0)T as points at infinity, we find a line l = (0, 0, 1)T that joins these points at infinity; this is verified by computing xT l = 0 for all points x at infinity. The ability to describe points and lines at infinity is of great importance in computer vision, and comes courtesy of projective geometry.
FIGURE 62 – Representation of points and lines in P2.
b. The Projective Plane P2 We can think of P2 as a set of rays in R3. The set of all vectors k(x1, x2, x3)T, as k varies, forms a ray through the origin; such a ray may be thought of as representing a single point in P2. In this model, the lines in P2 are planes passing through the origin. Inhomogeneous points and lines are obtained by intersecting these rays and planes with the projective plane at x3 = 1. As shown in Figure 62, the rays representing points at infinity, and the plane representing the line at infinity, are parallel to the plane x3 = 1.
c. 2-D transformations 2-D projective geometry is defined as the study of the properties of the projective plane P2 that are invariant under a group of transformations known as projectivities, or homographies. A projectivity h is an invertible mapping from P2 to itself such that three points x1, x2, and x3 lie on the same line if and only if h(x1), h(x2), and h(x3) do. One of the most important projectivities in computer vision is the central projection, since it is used to model finite cameras. The central projection maps points from one plane to another and also maps lines to lines, as shown in Figure 63. This planar projective transformation is a linear transformation on homogeneous 3-vectors, represented by a 3×3 non-singular matrix H as:
x2 = Hx1 (87)
FIGURE 63 – The central projection as a planar projectivity.
H is defined up to a scale factor, so it has 8 degrees of freedom. To compute the H that maps one plane to another, at least 4 corresponding points in each plane must be known, no 3 of them collinear. Figure 64a shows an image of ceiling tiles in the CVIP lab. Being a perspective image, it undergoes a perspective distortion that maps parallel lines into intersecting lines. We can remove this distortion by selecting 4 planar points of a distorted shape whose true shape is known, together with the 4 corresponding points of that true shape. Solving for the projectivity that maps the distorted points onto the true ones, we can correct all points in the plane that suffer the same distortion. This is shown in Figure 64b, obtained by computing the projectivity matrix H and applying it to the points in Figure 64a. The same technique is used to rectify the image of the Kent School building at the University of Louisville, Figure 65a, so that the front of the building faces the viewer. As shown in Figure 65b, the front of the building is rectified; however, points in other planes are distorted, since the projectivity applies only to a single plane.
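Solving for H from 4 correspondences can be sketched with the standard direct linear transformation (DLT): each point pair contributes two linear equations in the 9 entries of H, and the solution is the null vector of the stacked system. The numpy sketch below uses made-up example points, not the actual CVIP lab data:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography H with dst ~ H src from >= 4 point pairs (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # two rows per correspondence, from the cross product of dst and H src
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the null vector of A (smallest singular value) holds the 9 entries of H
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# map the unit square onto a perspective-distorted quadrilateral
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(0, 0), (2, 0), (2.5, 1.5), (-0.5, 1.5)]
H = homography_dlt(src, dst)
```

With exact correspondences and no 3 collinear points, the 8×9 system has a one-dimensional null space and the recovered H reproduces the mapping exactly.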
d. Hierarchy of Transformations Projective transformations form a group called the projective linear group, and its subgroups are specializations of the projective group. Here we summarize the definitions of these subgroups and the geometric entities that are invariant under each of them.
FIGURE 64 – Computing homographies. (a) The ceiling tiles image at the CVIP lab, (b) the rectified image.
FIGURE 65 – Computing homographies. (a) Kent School at the University of Louisville, (b) the rectified image.
The hierarchy of these transformations is: Euclidean (isometry), Similarity, Affinity, and
Projectivity.
The Euclidean transformation (isometry) is described by a 3×3 matrix with 3 degrees of freedom: one for the rotation angle θ and two for the translations tx and ty. This matrix I is defined as:

I = [ i cos(θ)  −sin(θ)  tx ]
    [ i sin(θ)   cos(θ)  ty ]
    [    0          0     1 ]   (88)

where i = ±1; if i = −1, the orientation is reversed.
The similarity provides isotropic scaling by s in addition to the rotation and translation provided by the isometry; hence the similarity has 4 degrees of freedom. The matrix S is defined as:

S = [ s cos(θ)  −s sin(θ)  tx ]
    [ s sin(θ)   s cos(θ)  ty ]
    [    0           0      1 ]   (89)
The affinity AF is defined as:

AF = [ a1  a2  tx ]
     [ a3  a4  ty ]
     [  0   0   1 ]   (90)

where the 2×2 matrix

A = [ a1  a2 ]
    [ a3  a4 ]

can always be decomposed as:

A = [ cos(θ)  −sin(θ) ] [ cos(−φ)  −sin(−φ) ] [ λ1   0 ] [ cos(φ)  −sin(φ) ]
    [ sin(θ)   cos(θ) ] [ sin(−φ)   cos(−φ) ] [  0  λ2 ] [ sin(φ)   cos(φ) ]   (91)
which can be interpreted as a rotation by φ, followed by non-isotropic scaling by λ1 and λ2, followed by a rotation back by −φ, and finally a rotation by θ. The affinity has 6 degrees of freedom. The two additional degrees over the similarity come from the shearing direction, given by the rotation angle φ, and from the non-isotropic scaling ratio λ1 : λ2.
The projectivity, as mentioned before, has 8 degrees of freedom. The two additional degrees of freedom come from the parameters v1 and v2, which are responsible for the perspective projection. As a result, points at infinity are mapped to finite points, and parallel lines are mapped to intersecting lines. The projectivity H can be written as:

H = [ a1  a2  tx ]
    [ a3  a4  ty ]
    [ v1  v2   1 ]   (92)
where v1 and v2 are real values. Note that the zeros in the third row of I, S, and AF are no longer zeros in H; this is where the perspective projection effects enter. A summary of the discussed transformations and the geometric entities preserved under each of them is given in Table 9 [35]. Figure 66(b-e) shows the effects of the above transformations on the image in Figure 66a.
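The distinction in the last row of the hierarchy can be checked numerically. A small numpy sketch (the matrix entries are arbitrary illustrative values) shows that an affinity keeps a point at infinity at infinity, while a projectivity with non-zero v1, v2 maps it to a finite point:

```python
import numpy as np

theta = 0.3
# an affinity: rotation, non-isotropic scaling, translation; last row (0, 0, 1)
affinity = np.array([[2.0 * np.cos(theta), -np.sin(theta), 1.0],
                     [np.sin(theta), 1.5 * np.cos(theta), 2.0],
                     [0.0, 0.0, 1.0]])

# a projectivity: same upper part, but with perspective terms v1, v2
projectivity = affinity.copy()
projectivity[2, :2] = [0.1, 0.2]

p_inf = np.array([1.0, -1.0, 0.0])   # a point at infinity (a direction)
a = affinity @ p_inf                  # third coordinate stays 0
p = projectivity @ p_inf              # third coordinate becomes non-zero
```

Since parallel lines are exactly those meeting at a point at infinity, this is the algebraic reason parallelism survives an affinity but not a projectivity, as summarized in Table 9.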
B. 3-D Projective Geometry
e. Representation of Points in P3 The point X = (x1, x2, x3, x4)T with x4 ≠ 0 is the homogeneous representation of the point (X, Y, Z)T in R3, where X = x1/x4, Y = x2/x4, and Z = x3/x4.
f. Representation of Planes in P3 In P3, planes and points are duals, while lines are self-dual. A plane Π in 3-D may be written as:
π1X + π2Y + π3Z + π4 = 0 (93)
where the homogeneous representation of a plane is the 4-vector Π = (π1, π2, π3, π4)T .
The point X lies on a plane Π if and only if
ΠT X = XTΠ = 0 (94)
FIGURE 66 – 2-D transformations. (a) original image (b) after isometry (c) after similarity(d) after affinity (e) after projectivity.
TABLE 9
SUMMARY OF TRANSFORMATIONS.

                                 isometry  similarity  affinity  projectivity
  transformation
  rotation, translation             ×          ×          ×          ×
  isotropic scaling                            ×          ×          ×
  non-isotropic scaling                                   ×          ×
  perspective projection                                             ×
  invariants
  distance                          ×
  angles, ratios of distances       ×          ×
  parallelism, center of mass       ×          ×          ×
  incidence, cross ratio            ×          ×          ×          ×
In general, points and planes are related to each other in 3-D space by the following rela-
tions:
A plane Π is defined uniquely by 3 distinct, non-collinear points X1, X2, and X3:

[ X1T ]
[ X2T ] Π = 0   (95)
[ X3T ]
which represents a system of linear equations that can be solved for the unknown plane.
A point X is defined uniquely by the intersection of 3 distinct planes Π1, Π2, and Π3 (the dual of the above relation):

[ Π1T ]
[ Π2T ] X = 0   (96)
[ Π3T ]
which represents a system of linear equations that can be solved for the unknown point.
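Both homogeneous systems (95) and (96) can be solved as null-space problems; a common numerical route is the singular value decomposition. A numpy sketch with made-up example points and planes:

```python
import numpy as np

def null_vec(A):
    """Unit vector spanning the (one-dimensional) right null space of A, via SVD."""
    return np.linalg.svd(A)[2][-1]

def plane_from_points(X1, X2, X3):
    """Plane through three homogeneous points in P3, solving Eq. (95)."""
    return null_vec(np.vstack([X1, X2, X3]))

def point_from_planes(P1, P2, P3):
    """Point at the intersection of three planes, solving Eq. (96)."""
    return null_vec(np.vstack([P1, P2, P3]))

# three points on the plane z = 1
X1 = np.array([0.0, 0.0, 1.0, 1.0])
X2 = np.array([1.0, 0.0, 1.0, 1.0])
X3 = np.array([0.0, 1.0, 1.0, 1.0])
Pi = plane_from_points(X1, X2, X3)   # proportional to (0, 0, 1, -1)
```

The SVD route also degrades gracefully: with noisy, over-determined stacks it returns the least-squares plane or point rather than failing.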
g. Representation of Lines in P3 A line is defined by the join of two points or by the intersection of two planes; it has 4 degrees of freedom in 3-D space. Suppose X and Y are two non-coincident space points. The line joining them is represented by the span of the row space of the 2×4 matrix L composed of XT and YT as rows:

L = [ XT ]
    [ YT ]   (97)

The span of LT is the pencil of points μX + λY on the line, where μ and λ are real values. The dual representation of the line, as the intersection of two planes Π1 and Π2, is:

L1 = [ Π1T ]
     [ Π2T ]   (98)

The span of (L1)T is the pencil of planes μΠ1 + λΠ2.
The plane Π defined by the point X and the line L (in its point representation) is the solution of the following equation:

[ L  ]
[ XT ] Π = 0   (99)

In addition, the point X defined by the intersection of the line (in its dual plane representation L1) with the plane Π is the solution of the following equation:

[ L1 ]
[ ΠT ] X = 0   (100)
Note the duality principle of points and planes in the previous two equations.
h. Plucker matrices Here the line L is represented by a 4×4 skew-symmetric homogeneous matrix. The line joining two points X and Y is:

L = XYT − YXT   (101)

and the dual representation, in terms of two intersecting planes Π1 and Π2, is:

L* = Π1Π2T − Π2Π1T   (102)

Using the Plucker representation, it is easy to directly determine the points and planes joining or intersecting a given line. For example, the plane Π defined by the point X and the line L is:

Π = L*X   (103)

and the point X defined by the intersection of the line L with the plane Π is:

X = LΠ   (104)

Once again, note the duality of points and planes in 3-D space.
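The meet operation (104) becomes a single matrix-vector product in the Plucker representation. A short numpy sketch, intersecting the x-axis with an example plane:

```python
import numpy as np

def plucker_join(X, Y):
    """4x4 skew-symmetric Plucker matrix of the line through points X and Y, Eq. (101)."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    return np.outer(X, Y) - np.outer(Y, X)

# the x-axis: line through the origin (0,0,0) and the point (1,0,0)
L = plucker_join([0, 0, 0, 1], [1, 0, 0, 1])

# intersect it with the plane x = 2, i.e. Pi = (1, 0, 0, -2)T, via X = L Pi (Eq. 104)
Pi = np.array([1.0, 0.0, 0.0, -2.0])
X = L @ Pi   # homogeneous point proportional to (2, 0, 0, 1)
```

No linear system needs to be solved: the intersection point falls out of one multiplication, which is the practical appeal of the Plucker form.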
APPENDIX II
CAMERA CALIBRATION
The geometric analysis of a single view helps in understanding the relation between a scene in 3-D space and its image captured by a camera. Since images provide only an abstract description of 3-D scenes, understanding the geometric laws that govern image formation can help recover the 3-D information absent from the 2-D images. It is therefore important to understand the camera anatomy and the techniques of camera modeling.
A. Camera Modeling
A camera is a mapping between the 3-D world (object space) and a 2-D image. Our interest in this dissertation is the central projection camera, or finite camera. General projective camera models include, in addition to finite cameras, the infinite cameras; the terms finite and infinite refer to the position of the camera's optical center in 3-D space.
A very common model for finite cameras is the pinhole model. As shown in Figure 67, the model consists of a plane Π, the image plane, and a 3-D point C, the optical center or focus of projection. The distance between Π and C is the camera focal length f. The line through C perpendicular to Π is the optical axis, and the intersection of the optical axis with Π is the image center, or principal point, o. The point x on Π is the image of the 3-D point X: it is the intersection of the image plane with the straight line (the optical ray) joining C and X.
FIGURE 67 – Modeling of a central projection camera.
Consider C as the origin of the camera coordinate system (Xcam, Ycam, Zcam) and o as the origin of the image coordinate system (xim, yim). Then the relation between the image point x = (x, y)T and the 3-D point X = (X, Y, Z)T is:

x = f X / Z   (105)
y = f Y / Z   (106)
In practice, the principal point may not be at the origin of the camera coordinate system. In addition, in CCD cameras, where coordinates are measured in pixels, it is common to have non-square pixels, i.e. different scalings in the x-axis and y-axis directions. It is also possible, though rare, to have non-orthogonal image axes. Let the scalings in the x-axis and y-axis directions be mx and my (pixels/unit length), respectively. Taking these effects into consideration and expressing the 2-D and 3-D coordinates in homogeneous form, the relation between a 3-D point X = (X, Y, Z, 1)T in 3-D space and the image point x = (x, y, 1)T is:

sx = PX   (107)
where s is a scale factor and P = [K|0] is the camera projection matrix. The calibration matrix K is defined as:

K = [ αx  −αx cot(θ)  px ]
    [  0   αy/sin(θ)  py ]
    [  0       0       1 ]   (108)

where θ is the angle between the image axes, αx = fmx, αy = fmy, and (px, py)T is the image center expressed in pixel dimensions. The matrix K has 5 degrees of freedom: px, py, αx, αy, and θ. These are the intrinsic calibration parameters of the camera.
In fact, camera calibration is defined as the process of estimating two sets of parameters: the intrinsic parameters and the extrinsic parameters. The intrinsic parameters link the pixel coordinates of an image point with the corresponding coordinates in the camera reference frame; they are the entries of the intrinsic parameter matrix K. The extrinsic parameters define the location and orientation of the camera reference frame with respect to a known world reference frame. They are thus any set of geometric parameters that uniquely identify the transformation between the unknown camera reference frame and the known world reference frame.
A typical choice for describing the transformation between the camera and world frames is a 3-D translation vector t = [tx, ty, tz]T, describing the relative positions of the origins of the two reference frames, and a 3×3 rotation matrix R, an orthogonal matrix (RTR = RRT = I) that brings the corresponding axes of the two frames onto each other. The 3-D rotation can be expressed as the result of three consecutive rotations around the coordinate axes by angles α, β, and γ, which are then the three free parameters of R. The rotation matrix can be expressed in terms of α, β, and γ as:

R = [  cos β cos γ                        −cos β sin γ                        sin β       ]
    [  sin α sin β cos γ + cos α sin γ    −sin α sin β sin γ + cos α cos γ   −sin α cos β ]
    [ −cos α sin β cos γ + sin α sin γ     cos α sin β sin γ + sin α cos γ    cos α cos β ]   (109)
The extrinsic parameter matrix D can be expressed in terms of t and R as:

D = [ R    t ]
    [ O3T  1 ]   (110)

where D is a 4×4 matrix and O3 = [0, 0, 0]T. Taking the extrinsic parameters (6 additional degrees of freedom) into account, the projection matrix P has the general form:

P = K Pproj D   (111)
where Pproj depends on the type of projection. For finite cameras:

Pproj = Pfinite = [ 1 0 0 0 ]
                  [ 0 1 0 0 ]
                  [ 0 0 1 0 ]   (112)

and for infinite cameras:

Pproj = Pinfinite = [ 1 0 0 0 ]
                    [ 0 1 0 0 ]
                    [ 0 0 0 1 ]   (113)

taking into account that the image center is undefined for this type of projection and is hence replaced by zeros in the matrix K. Since finite cameras are of interest in this dissertation, we next describe how the projection matrix is computed and provide an anatomy of this matrix.
B. Anatomy of the Projection Matrix
A general projective camera may be written as P = [M|p4], where M is the 3×3 matrix composed of the first three columns of P and p4 is the fourth column. M is important because it determines whether P describes a finite or an infinite camera: the camera is finite if M is non-singular, and infinite otherwise [34].
A- Camera Center: The camera center C is the one-dimensional right null space of P:

PC = 0   (114)

For finite cameras, where M is a non-singular matrix:

C = [ −M−1p4 ]
    [    1    ]   (115)

For infinite cameras:

C = [ d ]
    [ 0 ] ,   Md = 0   (116)
B- Column Vectors of P: The column vectors p1, p2, p3, and p4 of the projection matrix P have the following geometric meaning:

• p1, p2, and p3 are the vanishing points of the world coordinate axes Xw, Yw, and Zw, respectively. For example, the X-axis has direction Dx = (1, 0, 0, 0)T, which is imaged at p1 = PDx. See Figure 68.

• p4 is the image of the world origin Ow = (0, 0, 0, 1)T.
C- Row Vectors of P: The row vectors of P can be interpreted as follows (see Figure 69):

• P1T is the y-axis plane, since the image of any point in this plane is (0, y, w)T.

• P2T is the x-axis plane, since the image of any point in this plane is (x, 0, w)T.

• P3T is the principal plane: the plane through the camera center parallel to the image plane. A point X lies on the principal plane if and only if P3T X = 0. In fact, the principal plane consists of the set of points X that are imaged on the line at infinity; explicitly, PX = (x, y, 0)T.

FIGURE 68 – The geometrical interpretation of the projection matrix columns.

FIGURE 69 – The geometrical interpretation of the projection matrix rows.
D- The Principal Point: The principal point is computed as:

x0 = M m3   (117)

where m3T is the third row of M.
E- The Optical Ray: The optical ray is the set of all points X that project to the image point x; it can be parameterized as:

X(λ) = [ M−1(λx − p4) ]
       [       1       ]   (118)
CURRICULUM VITA
NAME: Ahmed Hamad Mohamed Eid
ADDRESS: Electrical and Computer Engineering Department, University of Louisville, Louisville, KY 40292
EDUCATION:
• M.Sc. in Electrical Communications Engineering, 1999, Mansoura University.
• B.Sc. in Electronics Engineering, 1994, Mansoura University.
TEACHING:
• Teaching Assistant for the following courses at the University of Louisville: Digital signal processing, Image processing, Pattern recognition, Random variables and stochastic processes, Computer vision.
• Teaching Assistant for the following courses at Mansoura University, Egypt: Electronic circuits I, II, III and IV, Instrumentations and Measurements, Microprocessor Design, Applied Statistics.
PREVIOUS RESEARCH:• Vision-guided autonomous refueling systems.• Vision-based 3-D modeling of the human jaw.• Design and analysis of electronic circuits.
AWARDS AND SCHOLARSHIPS:
• Who's Who Among Students of American Universities and Colleges, 2004.
• Graduate Research Assistant, CVIP Lab, University of Louisville, 2003-Present.
• SGI Award for Excellence in Computational Sciences and Visualization (sponsored by Silicon Graphics Inc.), Speed School of Engineering, University of Louisville, 2003.
• Graduate Teaching Assistant, Department of Electrical and Computer Engineering, University of Louisville, 2000-2003.

PUBLICATIONS:

JOURNALS
[1] A. Eid and A. Farag, "On the performance evaluation of 3-D reconstruction techniques from a sequence of images," EURASIP Journal on Applied Signal Processing, to appear 2005.
[2] A. Farag and A. Eid, "Video reconstructions in dentistry," Orthod. Craniofacial Res. 6 (Suppl. 1), pp. 108-116, Aug. 2003.
CONFERENCES
[1] A. Farag and A. Eid, "Local quality assessment of 3-D reconstructions from sequence of images: a quantitative approach," Advanced Concepts for Intelligent Vision Systems (ACIVS'04), Brussels, Belgium, Aug. 31-Sep. 3, 2004, pp. 161-168.
[2] A. Eid and A. A. Farag, "On the fusion of 3-D reconstruction techniques," Proceedings of Seventh International Conference on Information Fusion (IF'04), Stockholm, Sweden, June 28-July 1, 2004, pp. 856-861.
[3] A. Eid and A. Farag, "A unified framework for performance evaluation of 3-D reconstruction techniques," IEEE Conference on Computer Vision and Pattern Recognition (CVPR'04), Workshop on Real-time 3-D Sensors and their Use, Washington DC, June 27-July 2, 2004.
[4] A. Eid and A. Farag, "Design of an experimental setup for performance evaluation of 3-D reconstruction techniques from sequence of images," Eighth European Conference on Computer Vision (ECCV'04), Workshop on Applications of Computer Vision, Prague, Czech Republic, May 11-14, 2004, pp. 69-77.
[5] A. Eid and A. Farag, "On the performance characterization of stereo and space carving," Proceedings of Advanced Concepts for Intelligent Vision Systems (ACIVS'03), Ghent, Belgium, Sep. 2-5, 2003, pp. 291-296.
[6] A. Farag, E. Dizdarevic, A. Eid, and A. Lorincz, "Monocular, vision based, autonomous refueling system," Proceedings of Sixth IEEE Workshop on Application of Computer Vision (WACV'02), Orlando, FL, Dec. 3-4, 2002, pp. 309-313.
[7] A. Eid, S. Rashad and A. Farag, "A general purpose platform for 3D reconstruction from sequence of images," Proceedings of Fifth International Conference on Information Fusion (IF'02), vol. I, Annapolis, MD, July 7-11, 2002, pp. 425-413.
[8] A. Eid, S. Rashad and A. Farag, "Validation of 3D reconstruction from sequence of images," Proceedings of the International Conference on Signal Processing, Pattern Recognition, and Applications (SSPRA'02), Crete, Greece, June 25-28, 2002, pp. 375-380.
[9] M. Ahmed, A. Eid, A. Farag, "3D reconstruction of the human jaw: a new approach and improvements," International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI'01), Netherlands, Oct. 2001, pp. 1007-1014.
[10] M. Ahmed, A. Eid, and A. Farag, "3-D reconstruction of the human jaw using space carving," IEEE International Conference on Image Processing (ICIP'2001), vol. II, Greece, Oct. 2001, pp. 323-326.
[11] H. Soliman, A. Hamad (Eid), and N. Hamdy, "A video-speed switched resistor A/D converter architecture," Proc. of the 43rd IEEE Midwest Symposium on Circuits and Systems, Michigan, Aug. 2000.
[12] N. Hamdy, H. Soliman and A. Eid, "A vertical successive approximation A/D converter architecture for high-speed applications," Proc. of the 41st Midwest Symposium on Circuits and Systems, Notre Dame, Aug. 1998.