A COMPUTATIONAL FRAMEWORK FOR PERFORMANCE CHARACTERIZATION
OF 3-D RECONSTRUCTION TECHNIQUES FROM SEQUENCE OF IMAGES
By
Ahmed Eid
M.Sc., EE, Mansoura University, 1999
A Dissertation
Submitted to the Faculty of the
Graduate School of the University of Louisville
in Partial Fulfillment of the Requirements
for the Degree of
Doctor of Philosophy
Department of Electrical and Computer Engineering
University of Louisville
Louisville, Kentucky
December 2004
A COMPUTATIONAL FRAMEWORK FOR PERFORMANCE CHARACTERIZATION
OF 3-D RECONSTRUCTION TECHNIQUES FROM SEQUENCE OF IMAGES
By
Ahmed Eid
M.Sc., EE, Mansoura University, 1999
A Dissertation Approved on
by the Following Reading and Examination Committee:
Aly Farag, Ph.D., Dissertation Director
Georgy Gimel’farb, Ph.D.
John Naber, Ph.D.
Peter Quesada, Ph.D.
Udayan Darji, Ph.D.
Xiangqian Liu, Ph.D.
DEDICATION
This dissertation is dedicated to
my mother
and
my wife, Asmaa
for their patience, understanding, and support.
ACKNOWLEDGMENTS
I would first like to thank my advisor, Dr. Aly Farag, for his guidance and support
during this course of study. I am indebted to Dr. Farag, who has had a tremendous influence
on my career as a researcher. He has taught me a lot about effectively conducting research.
Many thanks to my other committee members Dr. Georgy Gimel’farb, from the
University of Auckland, Dr. John Naber, Dr. Peter Quesada, Dr. Udayan Darji, and Dr.
Xiangqian Liu, all of whom carefully read drafts of my dissertation and gave me valuable
comments, suggestions, and corrections. I greatly appreciate their time and flexibility.
Thanks especially to Dr. Gimel’farb for conducting valuable discussions about this research
during his sabbatical leave in the CVIP Laboratory.
I would like to thank all the members of the CVIP Laboratory at the University of
Louisville for their tremendous support, especially Chuck Sites, the system administrator.
Special thanks to Dr. Moumen Ahmed, who has been a great friend and teacher. I would
also like to thank all my friends for making my stay in Louisville enjoyable. Many thanks
to everyone who helped me during the first days of my stay in Louisville.
Finally, I would like to thank my family for their love, support, and confidence
without which this dissertation would not have been possible.
ABSTRACT
A COMPUTATIONAL FRAMEWORK FOR PERFORMANCE CHARACTERIZATION
OF 3-D RECONSTRUCTION TECHNIQUES FROM SEQUENCE OF IMAGES
Ahmed Eid
December 13, 2004
This dissertation addresses the problem of performance characterization of 3-D reconstruction techniques from a sequence of images. Although many 3-D reconstruction techniques have been proposed in the literature, the work done to quantify their performance is quite insufficient from a computational point of view. Qualitative evaluation methods remain dominant. Most current computational methods depend on unrealistic data sets and/or are applicable only to certain types of algorithms. This, in turn, has led to evaluation approaches of limited use and adoption. Certainly, this situation does not serve the goal of having standard, off-the-shelf methodologies able to quantify the performance of existing and future 3-D reconstruction techniques.
In this dissertation, we try to rectify this situation by proposing a unified computational framework for performance characterization of 3-D reconstruction techniques. The framework is three-fold. First, we introduce a new design for an experimental test-bed for the performance evaluation of 3-D reconstruction techniques. The setup integrates the functionality of 3-D laser scanners and CCD cameras, and it provides accurate, general-use, automatically generated and registered dense ground truth data together with the corresponding intensity data. The system bridges a gap in evaluation research, which suffers from a lack of such data sets.
Second, we introduce a new 3-D registration technique dedicated to the evaluation problem. 3-D registration is an important pre-evaluation step needed to obtain referenced evaluations. The proposed technique uses the image silhouettes instead of the actual 3-D reconstruction under test, which makes the registration results independent of the quality of that reconstruction. This feature is the major advantage of the proposed technique over conventional registration techniques.
Third, we propose different computational evaluation methodologies and corresponding measuring criteria. These testing methodologies are independent of the 3-D reconstruction under test. The methodologies are applied to the space carving technique, as a common 3-D reconstruction technique, to characterize its performance, and several concluding remarks on the space carving performance are provided.
Applications of the proposed framework, beyond the performance tracking and diagnosis provided in the space carving case study, include system design and data fusion. We propose a draft design for a 3-D modeling vision system based on the evaluation provided for the space carving technique. Moreover, a method for fusing laser-based and camera-based reconstructions is presented.
We believe that presenting this framework to the computer vision community will help measure progress in 3-D modeling research and provide diagnostic tools for current and future 3-D reconstruction techniques. To maximize the benefits of this work, the data sets used throughout this research will be made publicly available.
TABLE OF CONTENTS
DEDICATION
ACKNOWLEDGMENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
NOMENCLATURE
CHAPTER
I. INTRODUCTION
   A. The Problem
   B. 3-D Reconstruction Techniques: An Overview
      1. Stereo Approaches
      2. Volumetric Representation Approaches
         a. Shape from Silhouettes Approach
         b. Voxel Coloring (VC) Approach
         c. Space Carving (SC) Approach
         d. Generalized Voxel Coloring (GVC) Approach
   C. The Need for this Work
   D. The Contribution of this Work
   E. The Organization of the Dissertation
II. DATA ACQUISITION AND PREPARATION TECHNIQUES
   A. Previous Work
   B. System Overview
   C. Background Subtraction
   D. Camera Calibration
   E. Setup Accuracy
   F. Summary
III. A NOVEL TECHNIQUE FOR 3-D DATA REGISTRATION AS A PRE-EVALUATION STEP
   A. 3-D Data Registration
   B. 3-D Data Registration Through Silhouettes (RTS)
      1. An Overview of the Approach
      2. The Registration Procedure
         a. A Two-step Minimization
         b. Occluding Contours as Replacements to the Silhouettes
         c. An Evaluation Criterion for the RTS Approach
   C. Results and Discussion
   D. Summary
IV. PERFORMANCE EVALUATION: METHODOLOGIES AND MEASURES
   A. Classification of Evaluation Techniques
   B. Local Quality Assessment (LQA) Methodology
      1. Performance Evaluation Procedure
      2. Statistical Modeling of the Quality Index
   C. Image Reprojection (IR) Test
      1. Image Quality Measures
      2. The IR Test Procedure
   D. Silhouette-Contour Signature (SCS) Test Methodology
      1. Shape Histogram Signature
      2. The Error Ratio
      3. Boundary Signature
   E. Summary
V. Experimental Evaluation of the Space Carving Technique: A Case Study
   A. Shape Recovery by Space Carving
   B. Experimental Evaluation of Space Carving
      1. The Effect of the Number of Input Images
         a. LQA Test
         b. IR Test
         c. SCS Test
         d. Evaluation Remarks
      2. Effect of the Camera Pose
         a. LQA Test
         b. IR Test
         c. SCS Test
         d. Evaluation Remarks
      3. Effect of the Photo-consistency Threshold
         a. LQA Test
         b. IR Test
         c. The SCS Test
         d. Evaluation Remarks
      4. Effect of Noise
      5. Effect of the Initial Volume Resolution
   C. Summary
VI. APPLICATIONS (POST-EVALUATIONS)
   A. A 3-D Fusion Methodology
      1. The Closest Point Test
      2. The Closest Contour Test
      3. The Fusion Decision
      4. Experimental Results
   B. System Design
   C. Summary
VII. Conclusions and Future Directions
   A. Contribution to Data Acquisition and System Design
   B. Contribution to the 3-D Data Registration
   C. Contribution to the Performance Evaluation Methodologies and Measuring Criteria
   D. Contribution to the Experimental Evaluation of 3-D Reconstruction Techniques
   E. Applications
   F. Future Extension
      1. Data Acquisition
      2. 3-D Data Registration
      3. Testing Methodologies and Measures
      4. The Performance of the Space Carving Technique
REFERENCES
APPENDIX
I. PROJECTIVE GEOMETRY
   A. 2-D Projective Geometry
      a. Points and Lines in P2
      b. The Projective Plane P2
      c. 2-D Transformations
      d. Hierarchy of Transformations
   B. 3-D Projective Geometry
      e. Representation of Points in P3
      f. Representation of Planes in P3
      g. Representation of Lines in P3
      h. Plücker Matrices
II. CAMERA CALIBRATION
   A. Camera Modeling
   B. Anatomy of the Projection Matrix
CURRICULUM VITA
LIST OF TABLES
1. Convergence of the RTS approach to the desired values.
2. The effect of the number of input images on the performance of space carving.
3. The final number of voxels and the run time, on an ONYX2 SGI machine, for the 36-, 18-, 12- and 9-reconstructions.
4. The effect of the camera pose on the performance of space carving.
5. The effect of the photo-consistency threshold on the performance of space carving.
6. The effect of noise at different thresholds on the performance of space carving.
7. The run time of the space carving algorithm at different resolutions and different numbers of input images.
8. Initial specifications of a passive 3-D scanner based on the reconstruction by the space carving technique.
9. Summary of transformations.
LIST OF FIGURES
1. Simple stereo configuration.
2. Stereo images.
3. Camera configuration for the voxel coloring approach.
4. Basic idea of space carving.
5. Generalized voxel coloring.
6. The system setup.
7. Modes of operation of the setup.
8. Background subtraction versus intensity threshold.
9. Background subtraction.
10. Camera calibration.
11. System accuracy.
12. An example of the registration under distortion.
13. Registration errors with different error criteria and matching strategies.
14. A difficult 3-D data registration case.
15. A degenerate case for silhouettes alignment.
16. Registration Through Silhouettes (RTS) results.
17. Registration parameters.
18. 3-D registration visual results.
19. 3-D registration quantitative results.
20. Rendered views to show the alignment of the ground truth contours (blue) and the input image contours (red).
21. The convergence of the registration parameters to the desired values.
22. RTS evaluation.
23. Rendered views to show the alignment, with known parameters, of the ground truth contours (blue) and the image contours (red).
24. Local Quality Assessment (LQA) methodology.
25. The LQA test applied to two different reconstructions registered to the ground truth data.
26. The IR test applied to a reconstruction of 12 input images.
27. IR test visual results.
28. An example of the 17 possible configurations describing the shape boundaries.
29. A number of 17 possible configurations for three adjacent contour points.
30. Examples of basic geometric shapes.
31. Shape signatures for the rectangle and the rotated rectangle shapes.
32. Shape signatures for the circle and the ellipse shapes.
33. Examples of ground truth and measured shapes.
34. Shape signatures for shapes in Figure 33a.
35. Shape signatures for shapes in Figure 33b.
36. LQA test results when the number of input images to the space carving is changed.
37. The quality index for two different reconstructions.
38. IR test results when the number of input images to the space carving is changed.
39. Rendered cutting-views for different reconstructions when the number of input images to the space carving is changed.
40. SCS test results when the number of input images to the space carving is changed.
41. LQA test results when the camera pose is changed.
42. IR test results when the camera pose is changed.
43. Rendered cutting-views for different reconstructions when the camera pose is changed.
44. SCS test results when the camera pose is changed.
45. LQA test results when the photo-consistency threshold is changed.
46. IR test results when the photo-consistency threshold is changed.
47. Cutting-view images for different reconstructions at different thresholds.
48. SCS test results when the photo-consistency threshold is changed.
49. LQA test results when Gaussian noise is added and the photo-consistency threshold is changed.
50. IR test results when Gaussian noise is added and the photo-consistency threshold is changed.
51. Cutting-view images for different reconstructions at different thresholds with Gaussian noise added to the input images.
52. SCS test results when Gaussian noise is added and the photo-consistency threshold is changed.
53. The effect of the initial volume resolution on the space carving reconstruction quality.
54. Basic idea of the 3-D fusion methodology.
55. Screen captures of a 3-D reconstruction by a 3-D laser scanner.
56. Silhouette images for the 3-D fusion technique.
57. Contour images for the 3-D fusion technique.
58. Snapshots for 3-D reconstruction.
59. A draft design for a passive 3-D scanner.
60. Top views for the house object.
61. Another draft design for a passive 3-D scanner.
62. Representation of points and lines in P2.
63. The central projection as a planar projectivity.
64. Computing homographies. (a) The ceiling tiles image at the CVIP lab; (b) the rectified image.
65. Computing homographies. (a) Kent School at the University of Louisville; (b) the rectified image.
66. 2-D transformations. (a) Original image; (b) after isometry; (c) after similarity; (d) after affinity; (e) after projectivity.
67. Modeling of a central projection camera.
68. The geometrical interpretation of the projection matrix columns.
69. The geometrical interpretation of the projection matrix rows.
NOMENCLATURE
The following convention is used throughout the dissertation. Matrices, vectors, and
3-D space points are expressed in bold upper case letters, e.g., P. The 2-D image points are
expressed in bold lower case letters, e.g., x. The elements of the matrices, vectors, points,
and functions are expressed in italic letters. Below is a list of symbols commonly used in
this text.
X 3-D world point
x 2-D image point
Pk projection matrix at view k
Pe probability estimate for background subtraction
Pq quality estimate at the quality index q
D camera extrinsic parameters matrix
K camera intrinsic parameters matrix
R rotation matrix (3×3 orthonormal matrix)
t translation vector (3-vector)
β rotation angle around the Y-axis
M measured data set of the 3-D reconstruction under-test
G ground truth data set
T 3-D Euclidean transformation
E error criterion
χ2 Chi square test measure
SNR Signal to Noise Ratio
Q quality index
Er error ratio
CHAPTER I
INTRODUCTION
In computer vision, 3-D scene reconstruction from multiple images is a challenging and interesting problem. It is interesting because humans naturally solve the depth estimation problem easily and efficiently. It is challenging because, among the many solutions proposed, no single one matches the completeness of the human solution.
Of course, there are good methods today, and there will be others in the future. To guide research towards better solutions, it is important to characterize the performance of the existing ones. How good are these solutions? What is missing in order to obtain better ones? These are among the questions that should be answered under the topic of performance evaluation.
Unfortunately, although the performance evaluation of 3-D reconstruction techniques is an important topic, it is rarely treated as stand-alone research. As a result, many evaluation methodologies are "algorithmic" in nature, i.e., they are not independent of the algorithms under test. This, in turn, has led to evaluation approaches of limited use and adoption. Certainly, this situation does not serve the goal of having standard, off-the-shelf methodologies able to quantify the performance of existing and future 3-D reconstruction techniques.
In this study, we rectify this situation by introducing a unified framework for the performance evaluation of 3-D reconstruction techniques. The framework includes designs for an experimental test-bed, performance pre-evaluation/evaluation methodologies, and quality measures. In addition, we propose different applications of this framework in the data fusion of 3-D reconstructions and in system design. The ultimate goal of this study is to have an impact on the progress of the field and to bring some standardization to the evaluation process.
A. The Problem
Formally, we want to solve the following problem:
Given
1- a set M as a non-empty finite set of measured points m such that:

M = {m : m ∈ R³},

where M is generated by the 3-D reconstruction technique X, the technique under test;

2- a ground truth (gold standard) set G as a non-empty finite set of reference points g such that:

G = {g : g ∈ Rⁿ, n ∈ {1, 2, 3}},

where G can be a 3-D volume, 2-D images, or a 1-D vector of parameters that describe a volume or images.
Required
Quantify the performance of technique X.
If n ≠ 3, then we define the set D as the set of data points to be matched with G according to the transformation or criterion C such that:

D = {d : d = C(m), d ∈ Rⁿ, n ∈ {1, 2}}
If M and G are not aligned to each other, then a registration function or transformation T
is required such that the energy E is minimal, where E can be defined as:
E = ∑ᵢ d²(mᵢ, T(gᵢ)), mᵢ ∈ M, gᵢ ∈ G,

and d denotes the Euclidean distance.
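The energy E translates directly into code. The following is a minimal NumPy sketch under the assumption that the correspondences mᵢ ↔ gᵢ are already known and that T is a rigid transformation (R, t); the function name and the toy points are ours, not from the dissertation:

```python
import numpy as np

def registration_energy(M, G, R, t):
    """Sum of squared Euclidean distances between each measured point m_i
    and the transformed ground-truth point T(g_i) = R g_i + t.
    M, G: (n, 3) arrays with known row-wise correspondences."""
    TG = G @ R.T + t                      # apply the Euclidean transformation T to G
    return float(np.sum((M - TG) ** 2))   # E = sum_i d^2(m_i, T(g_i))

# Identity transform on identical point sets yields zero energy.
G = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
E = registration_energy(G, G, np.eye(3), np.zeros(3))   # E == 0.0
```

In practice the correspondences are unknown, and minimizing E over T is itself an iterative problem (e.g., ICP-style alternation between matching and alignment), which Chapter III addresses through silhouettes.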
B. 3-D Reconstruction Techniques: An Overview
In this section, we provide an overview of the common 3-D reconstruction tech-
niques. This includes two main sets of 3-D reconstruction techniques, the stereo and the
volumetric techniques.
1. Stereo Approaches
Stereo vision refers to the ability to infer information on the 3-D structure and distance of a scene from two or more images taken from different views [1]. From the computational viewpoint, a stereo system has to solve two problems: the correspondence problem and the reconstruction problem. The correspondence problem is the more challenging of the two: the projections of a scene element must be matched in all, or a subset of, the images that see this element. While the matching process is difficult, the reconstruction is easily solved using depth triangulation.
A simple stereo vision configuration is shown in Figure 1. The system consists of two coplanar cameras with a baseline distance b, the distance between the optical centers Ol and Or of the two cameras. To solve the correspondence problem, the projections xl and xr of the 3-D point X in the left image Il and the right image Ir, respectively, must be matched. In the simple stereo configuration the problem is easier: the y-coordinates of corresponding image points are equal, so the search for the matched positions xl and xr is restricted to the x-direction. Once xl and xr are determined, the disparity Disp = xl − xr can be related to the depth Z of point X by:

Disp = fb / Z    (1)
FIGURE 1 – Simple stereo configuration.
Typically, stereo algorithms represent shape using the depth map, or the disparity map, which is also known as the 2½-D sketch [2]. The depth map is an image that encodes the above depth-disparity relation. Figure 2 shows an example of a simple stereo pair and the corresponding depth map [3].
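Equation (1) inverts directly to Z = fb/Disp, so a depth map follows from a disparity map once the focal length f and baseline b are known. A minimal sketch (the numeric f and b values are illustrative only, not calibration results from this work):

```python
import numpy as np

def depth_from_disparity(disp, f, b):
    """Z = f * b / Disp for the simple (coplanar) stereo configuration.
    disp: disparity map in pixels; f: focal length in pixels; b: baseline."""
    disp = np.asarray(disp, dtype=float)
    Z = np.full_like(disp, np.inf)   # zero disparity -> point at infinity
    nz = disp > 0
    Z[nz] = f * b / disp[nz]
    return Z

Z = depth_from_disparity([[4.0, 8.0]], f=400.0, b=0.1)   # -> [[10., 5.]]
```

The reciprocal relation also explains the baseline trade-off discussed below: for a fixed depth Z, a larger b produces a larger (and hence more accurately measurable) disparity.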
In the general stereo configuration, the image planes are not coplanar, which makes the correspondence problem more difficult than in the simple configuration. If no constraints are applied, it is necessary to search the entire right image for the point xr matching xl. The search space can be limited to a general line in the right image if the epipolar constraint [4] is applied. Furthermore, image rectification algorithms such as [5] can be used to make the epipolar lines parallel and aligned with the x-axis. This rectification step reduces the problem to the simple stereo configuration.
In general, stereo approaches operate either by edge-feature matching or by area-based matching. Feature-based stereo approaches are useful because they capture the important geometry of the object; however, the major drawback of most of them is the low density of the output. A dynamic programming approach [4] matches features to obtain a dense reconstruction, but it can only be applied on a scanline-by-scanline basis. More recently, Boykov et al. [6] and Roy and Cox [7] have solved the inter-scanline problem using
graph cuts techniques.

FIGURE 2 – Stereo images. (a) A pair of stereo images and (b) the corresponding depth map.
In contrast to feature-based stereo, area-based stereo provides dense reconstructions. Okutomi and Kanade [8] used a variable-size correlation window to generate dense depth maps. However, area-based matching usually fails when applied to surfaces with large textureless areas.
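For a rectified pair, area-based matching can be sketched as a brute-force sum-of-squared-differences (SSD) search over fixed-size windows along each scanline. This toy implementation is our own illustration of the principle (not Okutomi and Kanade's variable-window method):

```python
import numpy as np

def disparity_ssd(left, right, max_disp, w=1):
    """Per-pixel disparity by minimizing the SSD over a (2w+1)x(2w+1)
    window along the same scanline. Assumes rectified grayscale images;
    brute-force, for illustration only."""
    left, right = np.asarray(left, float), np.asarray(right, float)
    h, wid = left.shape
    disp = np.zeros((h, wid), int)
    for y in range(w, h - w):
        for x in range(w, wid - w):
            patch = left[y - w:y + w + 1, x - w:x + w + 1]
            best, best_d = np.inf, 0
            for d in range(0, min(max_disp, x - w) + 1):
                cand = right[y - w:y + w + 1, x - d - w:x - d + w + 1]
                ssd = np.sum((patch - cand) ** 2)
                if ssd < best:
                    best, best_d = ssd, d
            disp[y, x] = best_d
    return disp

# Synthetic check: the left image is the right image shifted by 2 pixels,
# so interior pixels should recover a disparity of 2.
right = np.tile(np.arange(10.0) ** 2, (8, 1))   # distinct column values
left = np.zeros_like(right)
left[:, 2:] = right[:, :-2]
disp = disparity_ssd(left, right, max_disp=4, w=1)
```

On a textureless region every candidate window gives a near-identical SSD, which is exactly why area-based matching fails there, as noted above.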
Another challenging problem for stereo approaches is occlusion, where scene elements appear in one image but are occluded in the other. A further difficulty is the constraint on the baseline b. A small b (very close cameras) permits good correspondence but degrades the depth accuracy; a large b makes the correspondence problem more difficult, while the estimated depth becomes more accurate. As we will see in the next section, the volumetric approaches suffer from neither the occlusion problem nor the baseline problem. We refer the reader to the survey in [9] for more on stereo approaches.
2. Volumetric Representation Approaches
Volumetric modeling of a scene assumes there is a known, bounded volume in which the object of interest lies. The most common representation of this volume is a regular tessellation of cubes, called voxels, in Euclidean 3-D space.
a. Shape from Silhouettes Approach This method provides an approximate reconstruction of an object from its silhouette images. A silhouette image is a binary image whose value at a point indicates whether or not the visual ray from the optical center through that point intersects the object surface in the scene. The best approximation of the object is obtained from an infinite number of silhouettes captured from all views surrounding the object; this best approximation is called the visual hull [10]. Recent implementations of the shape from silhouettes approach use the concept of voxel projection employed in the following volumetric techniques. Although shape from silhouettes provides only an approximate surface, it has wide
applications in modeling human motion [11].
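In the voxel-projection formulation, a voxel is kept only if it projects inside every silhouette. A minimal sketch, assuming known 3×4 projection matrices; the orthographic-style matrix in the example is purely illustrative:

```python
import numpy as np

def silhouette_carve(voxels, silhouettes, P_list):
    """Keep the voxels whose projection lies inside every silhouette.
    voxels: (n, 3) voxel centers; silhouettes: list of binary images;
    P_list: list of 3x4 projection matrices, one per view. Assumes all
    voxels project with positive homogeneous w (in front of the camera)."""
    keep = np.ones(len(voxels), dtype=bool)
    Xh = np.hstack([voxels, np.ones((len(voxels), 1))])  # homogeneous coords
    for S, P in zip(silhouettes, P_list):
        x = Xh @ P.T                                     # project: x = P X
        u = np.round(x[:, 0] / x[:, 2]).astype(int)      # image column
        v = np.round(x[:, 1] / x[:, 2]).astype(int)      # image row
        inside = (u >= 0) & (u < S.shape[1]) & (v >= 0) & (v < S.shape[0])
        keep &= inside                                   # off-image -> carved
        keep[inside] &= S[v[inside], u[inside]] > 0      # outside silhouette -> carved
    return voxels[keep]

# Toy example: orthographic-style projection onto the xy-plane; only the
# voxel projecting onto the single foreground pixel survives.
P = np.array([[1.0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]])
S = np.zeros((10, 10)); S[1, 1] = 1
kept = silhouette_carve(np.array([[1.0, 1, 0], [5.0, 5, 0]]), [S], [P])
```

With finitely many views this intersection of silhouette cones only approximates the visual hull; concavities that never show up on any silhouette can never be carved.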
b. Voxel Coloring (VC) Approach Seitz and Dyer [12] presented a voxel coloring approach that traverses a discretized volume of voxels and, assuming Lambertian surfaces, decides the color consistency of each voxel in all views from which it is visible. The scene is assumed to be contained in that volume, in which all voxels are initially opaque. Voxels with inconsistent colors are made transparent; the remaining color-consistent voxels stay opaque and represent the scene under reconstruction. This approach has several advantages over existing stereo approaches: (i) it accounts for the occlusion problem; (ii) unlike stereo, it places no constraints on the baseline distances of the cameras; (iii) it provides dense reconstructions; and (iv) it provides synthetic views of photo-realistic quality for many virtual reality applications [12].
However, to determine the visibility of each voxel, Seitz and Dyer [12] imposed what they called the ordinal visibility constraint on the camera locations. This means
that the positions of the cameras are limited to one side of the scene. This makes the visibility check of each voxel easy, since the voxels are visited in a single scan of planes that are successively further from the cameras. However, this constraint limits the use of this approach in cases where a complete model of the scene, seen from all directions, is required. Figure 3 shows the camera configuration imposed by the ordinal constraint. Other configurations that satisfy the ordinal constraint, such as distributing the cameras above the 3-D object, are helpful for obtaining a complete model.
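For a Lambertian surface, the per-voxel color consistency decision is commonly a threshold on the spread of the colors the voxel projects to in the views that see it. The following is an illustrative sketch in that spirit; the standard-deviation statistic and the threshold value are assumptions, not the exact criterion of [12]:

```python
import numpy as np

def photo_consistent(colors, threshold=15.0):
    """Lambertian photo-consistency test in the voxel coloring style
    (the statistic and threshold here are illustrative assumptions).

    colors    : (k, 3) RGB samples of one voxel's projections in the
                k views where the voxel is visible
    threshold : maximum allowed standard deviation per color channel
    """
    colors = np.asarray(colors, dtype=float)
    if len(colors) < 2:        # seen in fewer than two views: keep it
        return True
    return bool(np.all(colors.std(axis=0) <= threshold))
```

A voxel failing this test is declared transparent; a passing voxel stays opaque and is assigned, for example, the mean of its sampled colors.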
FIGURE 3 – Camera configuration for the voxel coloring approach.

c. Space Carving (SC) Approach Kutulakos and Seitz [13] removed the constraint on camera positions in the voxel coloring approach by proposing the space carving framework. They introduced a multiple plane-sweep procedure to allow unconstrained camera positions. Typically, these sweeps are along the positive and negative directions of the three axes. Space carving forces the scans to proceed from near to far relative to the cameras that see the voxel under the consistency test. The procedure continues until all voxels satisfy the photo-consistency requirement, see Figure 4. Kutulakos and Seitz proved that the algorithm finds
the unique color consistent model that is a superset of any consistent model. They called
this unique model the photo hull. The space carving procedure is summarized as follows:
Space Carving Algorithm:
Space carving starts with an initial volume, V, that includes the object(s) to be re-
constructed. This 3-D space is then discretized into a finite set of voxels v1, v2, ..., vn. The
idea is to successively carve (remove) some voxels until the final 3-D shape, V*, agrees
with all the input images.
Step 1: Initialize V.
Step 2:
• Determine the set of visible voxels Vis(V) on the surface of V.
• Project each voxel v on Vis(V) to the different images where v is visible.
• Determine the photo-consistency of each voxel v on Vis(V).
Step 3: If every voxel in Vis(V) is photo-consistent, set V* = V and terminate. Otherwise, set V = V − {non-photo-consistent v's} and return to Step 2.
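The carving loop of Steps 2–3 can be sketched as follows. The visibility computation and the per-voxel photo-consistency test are abstracted behind callables, so this is an illustrative sketch of the control flow rather than the authors' implementation:

```python
def space_carve(voxels, surface_voxels, is_consistent):
    """Iterate Steps 2 and 3 of the space carving procedure.

    voxels         : iterable of voxel ids forming the initial volume V
    surface_voxels : function V -> set, returning Vis(V), the visible
                     surface voxels of V (abstracted here)
    is_consistent  : function voxel -> bool, photo-consistency of one
                     voxel against the images that currently see it
    """
    V = set(voxels)
    while True:
        carve = {v for v in surface_voxels(V) if not is_consistent(v)}
        if not carve:           # all surface voxels photo-consistent
            return V            # this is V*
        V -= carve              # remove inconsistent voxels, re-check
```

Because voxels are only ever removed, the loop terminates after at most n iterations, and the result contains every photo-consistent model, matching the photo hull property stated above.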
(a) (b)
FIGURE 4 – Basic idea of space carving. Voxels are projected to the input images usingtheir respective projection matrices. C1, C2 and C3 represent the optical centers of the threecameras. (a) Consistent voxels are assigned the color of their projections. (b) Inconsistentvoxels are removed from the volume.
d. Generalized Voxel Coloring (GVC) Approach While space carving never carves voxels it should not, it is likely to produce a model that includes some inconsistent voxels. This happens because space carving uses only a subset of the cameras that see a voxel, even though other cameras may come to see the voxel once some of its surrounding voxels are carved, as shown in Figure 5. In contrast, the generalized voxel coloring (GVC) approach [14] guarantees that every voxel retained in the final model has been checked for color consistency against all images that see it.
GVC works with arbitrary camera positions, which is why it is a generalization of the voxel coloring approach. In addition, it improves on SC by checking the visibility of each voxel through all the images that see it. GVC maintains a data structure that records, for each pixel, the address of the closest opaque voxel along the pixel's visual ray. To save computation, this data structure may be updated less frequently than after every carved voxel, at the possible cost of additional iterations.
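A minimal sketch of such a per-image data structure (often called an item buffer) is given below; the `project` and `depth` callables abstract the camera model and are assumptions for illustration, not part of the GVC paper's interface:

```python
def build_item_buffer(opaque_voxels, project, depth):
    """Per-image item buffer in the GVC style: for every pixel, record
    the closest opaque voxel along that pixel's visual ray.

    opaque_voxels : iterable of voxel ids currently opaque
    project       : voxel -> (x, y) pixel the voxel projects to
    depth         : voxel -> distance from the camera center
    """
    nearest = {}                        # pixel -> (depth, voxel)
    for v in opaque_voxels:
        px = project(v)
        d = depth(v)
        if px not in nearest or d < nearest[px][0]:
            nearest[px] = (d, v)        # keep only the nearest voxel
    return {px: v for px, (d, v) in nearest.items()}
```

A voxel is then visible in an image exactly when some pixel of that image maps back to it, which is how GVC gathers all cameras that see a voxel before testing its consistency.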
Faugeras and Keriven [15] have presented a volumetric technique based on the level set approach. An initial scene-bounding surface, represented in voxels, evolves towards the objects in the scene until a matching criterion based on normalized cross correlation is minimized. A good survey of the volumetric approaches to 3-D reconstruction from multiple views can be found in [16, 17].

(a) (b)

FIGURE 5 – Generalized voxel coloring. (a) A voxel can be unseen by the camera in the early carving stages; (b) the voxel can be seen in the final stages.
It can be inferred from the above discussion of 3-D reconstruction techniques that the volumetric approaches are promising and have many advantages over classical stereo approaches.
C. The Need for this Work
The 3-D reconstruction from a sequence of images finds many applications in mod-
ern computer vision systems such as virtual reality, vision-guided surgeries, autonomous
navigation, medical studies and simulations, reverse engineering, and architectural design.
The very basic requirement of these applications is to find accurate and realistic reconstruc-
tions.
Many 3-D reconstruction approaches have been proposed to fulfill the requirements of such important applications. However, the work done to validate the performance of these approaches is quite insufficient: their performance is either inspected visually or, most often, computed using synthetic data sets. Since realism is a basic requirement of most computer vision applications, real data sets should be used during the evaluation phase of 3-D reconstruction techniques. To quantitatively evaluate a given 3-D reconstruction, dense ground truth data should be available. However, generating dense ground truth data may be difficult or laborious [18]. To get around this difficulty, sparse ground truth data can be used [19]. However, testing 3-D reconstructions with such data does not meet the requirements of applications such as virtual reality.
Szeliski and Zabih [20] used manually generated dense ground truth data in their experimental comparison of stereo algorithms. However, most of these data sets are generated for fronto-parallel scenes, which are considered special cases. Recently, Scharstein and Szeliski [9] have extended such data sets to include surfaces that are more challenging for stereo techniques, such as slanted surfaces. These data sets [3] are represented in the depth map form, which is specific to the simple stereo configuration. This puts a major limitation on using these data sets for other 3-D reconstruction techniques or even for general stereo configurations.
Mulligan et al. [21] have presented an experimental setup that provides dense ground
truth data for stereo tele-presence applications. This system is considered the first to pro-
vide dense 3-D ground truth data [21]. Although these data sets are generated in the 3-D
form, they have limited applicability (they are only applicable to the stereo approaches).
Before evaluation, 3-D registration techniques should be available to align the ground truth data with the data under test. Using conventional registration methods, the outcome of the registration process is questionable when the measured data set is a corrupted version of the ground truth data set. Unfortunately, registration of such data types is an unavoidable step in most evaluation procedures.
One of the common techniques used to solve the registration problem is the Iterative Closest Point (ICP) technique [22]. The algorithm is simple and efficient; however, it needs a good initial estimate, otherwise it may get stuck in a local minimum. In addition, the algorithm is sensitive to statistical outliers [22]. Many techniques have been introduced in the literature to make the ICP method robust, e.g. [23–25]. However, other distortion models were not treated by these studies.
Other 3-D registration techniques that rely on the selection of distinct features in the data under registration could be used instead of the ICP approaches [26–29]. However, selecting and matching these features is a challenging task when the data sets are corrupted. Manual selection and matching of features could be a solution under these circumstances, as in [21].
In general, to solve the evaluation problem in a unified framework three main com-
ponents should be available: (i) an experimental testbed to provide general-use data sets,
(ii) pre-evaluation techniques for preparing data for the evaluation process with minimal
undesirable effects on the given data, and (iii) performance evaluation methodologies and
measures. Having these components within a unified framework eases the solution and avoids the unnecessary complexities that would arise if they were treated separately.
This dissertation provides a unified computational framework for performance char-
acterization of the 3-D reconstruction techniques. It provides new designs for the main
components of the general performance evaluation system. In addition, the applicability
of these designs is examined in three ways: (i) application to the performance evaluation
of a recent common 3-D reconstruction technique, the space carving, (ii) application to the
data fusion of different reconstructions, and (iii) application to the design of a passive 3-D
scanner.
The general objective of this study is to allow progress in 3-D reconstruction research to be measured. This helps in quantifying the performance of existing techniques, analyzing their errors, and proposing solutions to enhance their performance.
D. The Contribution of this Work
This dissertation introduces a new design for an experimental setup that integrates
the functionality of laser scanners and CCD cameras. The system is able to collect very
dense ground truth data. The system contains very efficient data acquisition modules that
guarantee the generation of high quality intensity data sets. These data are then calibrated, segmented, and automatically registered to the ground truth data. The resulting data sets can be used by different 3-D reconstruction techniques, including both stereo and volumetric approaches. A database of such data sets will be made publicly available to bridge the gap caused by the unavailability of global experimental data sets.
A novel technique for 3-D data registration is presented. This technique is dedicated
to the evaluation procedures that aim to localize errors in the data under-test. The approach,
unlike the conventional 3-D data registration techniques, does not rely on the presence of
the 3-D reconstruction under test during the registration phase. This gives a major advan-
tage to this approach since the 3-D reconstruction could be of low quality that might add
difficulties to any 3-D registration technique. In addition, if the actual 3-D reconstructions
under test were used in the registration phase, then some errors that the evaluation process
tries to investigate might disappear during the minimization step used by any 3-D registra-
tion technique. The proposed approach employs silhouette images to align the given data
sets. Undistorted silhouette images can be generated easily, hence permitting good data
sets for the registration process. The approach is simple and efficient and can be applied to
any 3-D registration problem assuming the availability of a calibrated sequence of images
describing one of the data sets under registration.
Three testing methodologies are presented. The first test is the Local Quality As-
sessment (LQA) test. This test quantifies the performance of a given 3-D reconstruction
with respect to a reference 3-D reconstruction provided by the 3-D laser scanner. It is
designed to investigate local errors in the given 3-D reconstruction by dividing it into different patches and measuring the quality of each patch. This makes the error analysis much easier and permits the integration of different 3-D reconstruction techniques based on the results of this test.
An Image Re-projection (IR) testing methodology is presented to cope with the un-
availability of 3-D ground truth data. The test uses the acquired images as the reference of
comparison with the corresponding images re-projected from the given 3-D reconstruction.
This test also measures the applicability of the 3-D reconstruction techniques for virtual
reality problems.
To avoid errors due to intensity variations and the re-projection process in the IR test, we propose a Silhouette-Contour Signature (SCS) methodology that extracts shape features from silhouette and contour images and permits the inclusion of distinct, cutting, views from the 3-D ground truth data.
A classification criterion for testing methodologies is also presented. Based on this
criterion we can classify the tests that measure the performance of the 3-D reconstruction
techniques into 24 types of tests. This classification will eventually help in establishing a standard ranking that reflects the validity of such tests.
An experimental evaluation of the space carving, as a recent common technique for
3-D reconstruction from a sequence of images, is presented. The evaluation procedures
used in this study are based on the presented performance evaluation framework. In this
study, we track the response of the space carving to the changes in the key controlling pa-
rameters of the algorithm.
Two applications for the performance evaluation framework are presented. The first
application is the 3-D data fusion of different 3-D reconstructions. A fusion technique
based on the image contour comparison is presented. The technique rectifies the 3-D re-
construction based on the closeness of its projected contours to the ground truth contours.
The method is used to combine reconstructions generated by a 3-D laser scanner and the
space carving technique.
The second application is the system design. A draft design for a passive 3-D scan-
ner is presented. The design is based on the experimental results of evaluating the perfor-
mance of the space carving. The proposed scanner should be able to reconstruct surfaces that commercial 3-D laser scanners may not be able to reconstruct.
E. The Organization of the Dissertation
The remaining chapters of this dissertation are organized as follows:
• Chapter II: introduces a new design of a testing setup with other related components
such as camera calibration, object segmentation, and system accuracy. The chapter
provides discussions and results of the implemented components.
• Chapter III: introduces a novel 3-D registration methodology dedicated to the evalu-
ation problem.
• Chapter IV: introduces three different methodologies for the performance character-
ization of 3-D reconstruction techniques. Results and discussions are also presented.
• Chapter V: provides a study of the performance evaluation of the space carving tech-
nique when the key controlling parameters of this algorithm are changed.
• Chapter VI: provides applications of the performance evaluation framework, this in-
cludes data fusion of different reconstructions and a draft design of a 3-D scanner.
• Chapter VII: provides the conclusions of this dissertation and future extensions.
• Appendix A: provides a brief introduction to projective geometry.
• Appendix B: provides a background for the camera calibration process as an impor-
tant component of the proposed system.
CHAPTER II
DATA ACQUISITION AND PREPARATION TECHNIQUES
In this chapter, we present a new design for an experimental test-bed for 3-D re-
construction techniques. The setup integrates the functionality of 3-D laser scanners and
CCD cameras. The setup provides accurate, general-use, automatically generated and reg-
istered dense ground truth data and their corresponding intensity data. Designs of object
segmentation and camera calibration submodules are also provided. Moreover, we present
an analysis of the system accuracy under calibration errors caused by deviations from the pre-assumed camera-motion mechanism.
A. Previous Work
To quantitatively evaluate a given 3-D reconstruction, dense ground truth data should
be available. However, generating dense ground truth data may be difficult or labori-
ous [18]. To get around the difficulty of generating dense ground truth data, sparse data can
be used [19]. However, testing the 3-D reconstructions using such data does not achieve
the requirements needed by many computer vision applications (e.g. virtual reality, reverse
engineering, or architectural design).
Szeliski and Zabih [20] used manually generated dense ground truth data in their
experimental comparison of stereo algorithms. However, most of these data sets are generated for fronto-parallel scenes, which are considered special cases. Recently, Scharstein and Szeliski have extended such data sets to include surfaces that are more challenging for stereo techniques, such as slanted surfaces [3]. These data sets are represented in the form of depth maps, which is specific to the simple stereo configuration.
This puts a major limitation on using these data sets for other 3-D reconstruction techniques
or even for general stereo configurations.
Mulligan et al. [21] have presented an experimental setup that provides dense ground
truth data for stereo tele-presence applications. This system is considered the first to pro-
vide dense 3-D ground truth data [21]. Although we propose a similar system of generating
ground truth data, there are several distinctions:
• Mulligan's setup uses only fixed cameras, while we use a rotating camera connected to the scanner head. This feature lets our setup acquire a large number of images that cover the full 0–360° range. These data can be used for both stereo and volumetric approaches, not only for stereo approaches as in Mulligan's setup.
• The registration procedure used in Mulligan’s setup is very complicated since the
scanner should reconstruct a calibration pattern, then a stereo approach should do the
same. Matched points between the two reconstructions are selected manually to find
the registration parameters. However, in our setup we automatically register the data
disregarding the 3-D reconstruction technique under test.
• To acquire views at different rotations of an object with Mulligan's setup, the object must be rotated manually. This introduces inaccuracy into the calibration process. We address this issue in this chapter for our system and provide an analytical upper bound on such errors: to keep errors below 0.5 pixels in the acquired images, the error in the assumed rotation angle must be less than 0.2°. In our setup, the precision is determined by the motion mechanism of the 3-D laser scanner, and an error of 0.1° is typical for commercial 3-D laser scanners.
• In Mulligan's setup, calibrating the input images is a process separate from the calibration used in the registration phase. In addition, a separate scan must be performed for each rotation of the object to generate the ground truth data. In our system, by contrast, we scan the object and calibrate the camera only once.
In general, Mulligan’s setup requires manual adjustments in one step or another during
the data acquisition process. This increases chances of generating inaccurate data and
extending the acquisition time. To overcome these limitations, we introduce a new design
for a 3-D test-bed. The setup provides accurate, general-use, automatically generated and
registered dense ground truth data and their corresponding intensity data. These data sets
are available for the computer vision community through the CVIP laboratory, University
of Louisville, ftp site at ftp://egypt.spd.uofl.edu/pub/Eva Data.
B. System Overview
The proposed system setup consists of a 3-D laser scanner and a CCD camera
mounted on a metal arm with multiple joints that is attached to the scanner head. A mono-
color, usually black, screen is attached to the scanner head facing the CCD camera such that
the screen constitutes a fixed background for the object under reconstruction. The structure of the mono-color screen and the motion mechanism ensure a fixed background that facilitates the object segmentation task [30].
The shaft on which the scanner head is mounted is controlled in speed and rotation angle so that images are captured at specific locations on a circular path. A sequence of NI images I0, I1, ..., INI−1 can be acquired by the calibrated camera. In addition, the scanner generates a 3-D scan of the object by rotating 360°. This reference model is used as the ground truth for the evaluation process. The field of view of the camera is set to cover the same object size as the laser scanner. The system setup and the motion mechanism are shown in Figure 6a and Figure 6b, respectively.
The system has two modes of operation:
1. continuous mode, where the 3-D scanner works in its normal operation to generate a 3-D model of the object, or the rotating camera acquires a video of the object of interest. This video is used to generate panoramic images of the object, which can be used as inputs for panoramic stereo techniques such as [31].
2. discrete mode, where images are acquired at pre-defined locations in a circular path.
These images are used as the inputs to different 3-D reconstruction techniques from
a sequence of images.
Figure 7 shows examples of acquired images in discrete mode in (a), and continuous mode
in (b).
Other design aspects of the system such as background subtraction, camera cali-
bration and system accuracy will be presented in the following sections.
C. Background Subtraction
Efficient object segmentation permits good 3-D modeling and reduces the outliers
in the final model. For fair evaluation, we should make sure that the segmentation is not
a degradation factor in the overall performance of the vision technique. For this reason,
we propose a hardware solution by attaching a mono-color screen to the rotating shaft of
the 3-D scanner. This fixes the background of the acquired images and facilitates object
extraction. However, light variations and reflections could violate the mono-color assump-
tion. Therefore, we propose a secondary solution: a background subtraction technique. The proposed technique is a modification of the algorithm by Elgammal [32], which subtracts the background from successive frames in a video sequence assuming a fixed background scene.
In the proposed algorithm, a sequence of background images are acquired before-
hand, then the data images are acquired. A probabilistic model for the difference between
(a)

(b)

FIGURE 6 – The system setup. A CCD camera is mounted on the 3-D scanner head. A screen is attached to the scanner head opposite the camera to guarantee a fixed background for the test object. (a) Snapshot of the system, and (b) the system diagram.
(a)

(b)

FIGURE 7 – Modes of operation of the setup. (a) Discrete: sequence of images, and (b) continuous: panoramic images.
background pixels is assumed: a Gaussian of zero mean and covariance Σ. The probability density estimate of the difference distribution Pe(xd − xb) is defined as

  Pe(xd − xb) = (1 / ((2π)^(3/2) |Σ|^(1/2))) exp(−(1/2)(xd − xb)^T Σ^(−1) (xd − xb))     (2)

where xd and xb denote the data and the background pixel intensities, respectively.

If we assume that the RGB components of color images are independent, with a different σj² for the j-th color component, then

  Σ = diag(2σ1², 2σ2², 2σ3²)     (3)

and the density estimate can be written as

  Pe(xd − xb) = ∏_{j=1}^{3} (1 / ((4π)^(1/2) σj)) exp(−(xdj − xbj)² / (4σj²))     (4)

Using this probability estimate, a pixel is considered a foreground pixel if Pe(xd − xb) < Tp, where Tp is a global threshold. The value of Tp is selected based on the histogram of Pe(xd − xb) values.
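The thresholding rule can be sketched as follows. This is an illustrative vectorized implementation, not the dissertation's code; it assumes the per-channel normalization (4π)^(−1/2) σj^(−1) implied by the diagonal covariance of Eq. (3), and that the σj are estimated beforehand from the background image sequence:

```python
import numpy as np

def foreground_mask(data, background, sigma, Tp):
    """Classify pixels with the difference-Gaussian model of Eq. (4):
    a pixel is foreground when Pe(x_d - x_b) < Tp.

    data, background : (H, W, 3) RGB images
    sigma            : per-channel standard deviations, shape (3,)
    Tp               : global probability threshold
    """
    diff = data.astype(float) - background.astype(float)
    sigma = np.asarray(sigma, dtype=float)
    # Product over the three independent color channels, Eq. (4)
    pe = np.prod(np.exp(-0.25 * diff**2 / sigma**2)
                 / (np.sqrt(4.0 * np.pi) * sigma), axis=-1)
    return pe < Tp                       # True = foreground pixel
```

Pixels matching the background get a high density value and are kept out of the mask; large differences drive the density toward zero and below Tp.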
Further processing may be needed if some foreground pixels are removed. A me-
dian filtering step can be used to restore such voids, but it can produce blurred colors and
edges. Fortunately, such blurred colors can be restored from the original image.
Figure 8a shows an image for an eagle-object. An intensity threshold is applied to
the image in Figure 8a to get the result shown in Figure 8b. As shown from the figures,
the background is not completely removed. Increasing the intensity threshold can enhance
the background removal, however with the possibility of removing foreground pixels as
shown in Figure 8c. A background image is captured as shown in Figure 8d and the back-
ground subtraction technique is applied to the image in Figure 8a. The segmentation result
is shown in Figure 8e. As shown, the background is removed with minimal errors in the
foreground. A value of the probability threshold Tp = 2 × 10−6 is used based on the his-
togram shown in Figure 8f.
Another example is shown in Figure 9. The figure shows results for a house-object, the subsequent steps of background removal with Tp = 5 × 10−6, and the restoration using the median filter.
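The make-up step can be sketched as follows: the median filter is applied to the binary mask to fill small voids, and the colors are then copied from the original image, avoiding the blurring described above. The 3×3 majority filter (the median of a boolean neighborhood) is an illustrative choice, not the dissertation's exact filter:

```python
import numpy as np

def restore_voids(mask, image):
    """Fill small holes in a foreground mask with a 3x3 median filter,
    then take colors from the original image so no blur enters the
    result (an illustrative sketch of the make-up step).

    mask  : (H, W) boolean foreground mask, possibly with voids
    image : (H, W, 3) original image
    """
    H, W = mask.shape
    padded = np.pad(mask, 1, mode='edge')
    # Median of a 3x3 boolean neighborhood = majority vote (>= 5 of 9)
    stack = np.stack([padded[i:i + H, j:j + W]
                      for i in range(3) for j in range(3)])
    filled = stack.sum(axis=0) >= 5
    out = np.zeros_like(image)
    out[filled] = image[filled]     # colors come from the original image
    return out
```

Copying the colors back from the original, rather than median-filtering the color image itself, is what restores the sharp edges shown in Figure 9e.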
D. Camera Calibration
As the scanner head rotates, the attached camera should be calibrated at each new
position [33]. Geometric camera calibration [1, 34, 35] is a fundamental step in any vision
system that relies on quantitative measurements of the observed scene. Camera calibration
is the process of determining the internal camera geometric and optical characteristics (in-
trinsic parameters), and the 3-D position and orientation of the camera frame relative to a
chosen world coordinate system (extrinsic parameters).
Camera calibration is usually performed using calibration patterns [36]. A com-
mon calibration pattern consists of two white perpendicular planes, printed with orthogonal
grids of equally spaced black squares as shown in Figure 10a. The 3-D coordinates of the
vertices of each square in our chosen world coordinate frame are known. The pixel coordi-
nates of the projections of the vertices on the image plane can be determined as shown in
Figure 10b. The world-image point matches of the pattern can now be used to determine
P0, the projection matrix at the home position.
Assume that the projection matrix P0, defined up to an arbitrary scale factor, is:

       | p11 p12 p13 p14 |
  P0 = | p21 p22 p23 p24 |     (5)
       | p31 p32 p33 p34 |

Given Nmatch 3-D points of an object, Nmatch ≥ 6, and the corresponding 2-D points in its projected image, the 11 unknowns of P0 can be computed.
(a) (b) (c)

(d) (e) (f)

FIGURE 8 – Background subtraction versus intensity threshold. (a) An original image, (b) segmentation using an intensity threshold, (c) segmentation with a greater value of the intensity threshold, (d) a background image, (e) segmentation using background subtraction, and (f) histogram of the probability estimate values (normalized histogram vs. Pe).
(a) (b)

(c) (d)

(e)

FIGURE 9 – Background subtraction. (a) An original image, (b) a background image, (c) results after background subtraction with Tp = 6 × 10−6, (d) results after application of the median filter, and (e) recovery of the original colors to fix blurring introduced by the median filter.
(a) (b)

FIGURE 10 – Camera calibration. (a) The calibration pattern, and (b) selected points on the pattern's image.
Since the relation between a 3-D point and its corresponding 2-D point is

  w (x, y, 1)^T = P0 (X, Y, Z, 1)^T     (6)

where w is a scalar, then for each pair i of corresponding points, the image points are:

  xi = (p11 Xi + p12 Yi + p13 Zi + p14) / (p31 Xi + p32 Yi + p33 Zi + p34)     (7)

  yi = (p21 Xi + p22 Yi + p23 Zi + p24) / (p31 Xi + p32 Yi + p33 Zi + p34)     (8)

These can be arranged into 2Nmatch linear equations in the matrix unknowns, of the form

  W Pc = 0     (9)
where

      | X1 Y1 Z1 1  0  0  0  0  −x1X1 −x1Y1 −x1Z1 −x1 |
      | 0  0  0  0  X1 Y1 Z1 1  −y1X1 −y1Y1 −y1Z1 −y1 |
      | X2 Y2 Z2 1  0  0  0  0  −x2X2 −x2Y2 −x2Z2 −x2 |
  W = | 0  0  0  0  X2 Y2 Z2 1  −y2X2 −y2Y2 −y2Z2 −y2 |     (10)
      |  .  .  .  .  .  .  .  .    .     .     .    .  |
      | XN YN ZN 1  0  0  0  0  −xNXN −xNYN −xNZN −xN |
      | 0  0  0  0  XN YN ZN 1  −yNXN −yNYN −yNZN −yN |

and Pc = (p11, p12, ..., p33, p34)^T. The unknowns can then be recovered using the singular value decomposition of W:

  W = U S V^T     (11)

The solution is the column of V corresponding to the smallest singular value on the main diagonal of S. We usually use Nmatch > 6, even though Nmatch = 6 is possible since there are only 11 unknowns; an over-determined solution is desirable in the case of noisy measurements.
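The linear (DLT) estimation of Equations (6)–(11) can be sketched as follows; this is a minimal illustration of solving W Pc = 0 with the SVD, without the nonlinear refinement discussed next:

```python
import numpy as np

def estimate_projection_matrix(world_pts, image_pts):
    """Linear (DLT) estimate of the 3x4 projection matrix from
    world-image point matches by solving W Pc = 0 via the SVD.

    world_pts : (N, 3) 3-D points, N >= 6, in general position
    image_pts : (N, 2) corresponding pixel coordinates
    """
    rows = []
    for (X, Y, Z), (x, y) in zip(world_pts, image_pts):
        # Two rows of W per correspondence, Eq. (10)
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -x*X, -x*Y, -x*Z, -x])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -y*X, -y*Y, -y*Z, -y])
    W = np.asarray(rows, dtype=float)
    # Solution = right singular vector of the smallest singular value
    _, _, Vt = np.linalg.svd(W)
    return Vt[-1].reshape(3, 4)
```

The recovered matrix is defined only up to scale (and sign), so it is verified by re-projecting the world points rather than by comparing matrix entries directly.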
To obtain high accuracy in estimating the projection matrix, a good initial guess
of the solution is needed in order to employ a nonlinear error minimization technique. The technique used here is based on Robert's method [37].
After calibrating the camera at the initial position, a new projection matrix is needed
at each new position in the circular path. To compute the new projection matrix, we simply
need to update the extrinsic parameters of the camera. The projection matrix relates the
world coordinates with the image coordinates. It is composed of two matrices representing
the intrinsic, K, and extrinsic, D, parameters of the camera as described by Equations (108)
and (110), respectively, in Appendix B. The intrinsic parameters of the camera do not change from one view to another, since we use the same camera with the same optical settings. We assume a pure rotation of the camera around the Y-axis, so the translational and rotational components remain unchanged except for the rotation angle β around the Y-axis. The extrinsic matrix is then updated at each new position on the circular path according to the following formula:

  Dk = Dk−1 Rβk     (12)

where k = 1, 2, ..., NI − 1 and Rβk is a 4×4 matrix representing a Euclidean transformation with a non-zero parameter βk. The detailed description of the rotation and translation components of the matrix D is presented in Appendix B. The transformation matrix Rβk can be expressed as:

        |  cos(βk)  0  sin(βk)  0 |
  Rβk = |  0        1  0        0 |     (13)
        | −sin(βk)  0  cos(βk)  0 |
        |  0        0  0        1 |
For equidistant motion, where β1 = β2 = ... = βNI−1 = β, we obtain

            |  cos(kβ)  0  sin(kβ)  0 |
  Pk = P0   |  0        1  0        0 |     (14)
            | −sin(kβ)  0  cos(kβ)  0 |
            |  0        0  0        1 |

Therefore, a sequence of calibrated images I0, I1, ..., INI−1 is generated. These images are used as the input data to the vision technique under test.
E. Setup Accuracy
Since the rotation angle of the scanner head is pre-assumed, errors could result if the actual rotation of the head does not follow the pre-assumed rotation. We therefore derive an upper bound on the rotation angle error. The proposed setup has to be accurate to within this upper bound, otherwise the resulting errors could affect the accuracy of the evaluation process [38].
Assuming that the scanner head rotates by an angle β (radians) from the initial
position, the relation between the image coordinates and the world coordinates through the
projection matrix at a new position can be written as follows:
  (u, v, w)^T = P0 Rβ (X, Y, Z, 1)^T = P0 (X cos β + Z sin β,  Y,  −X sin β + Z cos β,  1)^T     (15)

in which Rβ is the 4×4 rotation matrix of Equation (13) evaluated at β,
and u = wx and v = wy. If an inaccuracy of amount Δβ (radians) is assumed, then

  (u + Δu, v + Δv, w + Δw)^T = P0 Rβ+Δβ (X, Y, Z, 1)^T     (16)

whose first and third components are X cos(β + Δβ) + Z sin(β + Δβ) and −X sin(β + Δβ) + Z cos(β + Δβ), respectively. Expanding the compound angles gives X(cos β cos Δβ − sin β sin Δβ) + Z(cos β sin Δβ + sin β cos Δβ) for the first component and −X(cos β sin Δβ + sin β cos Δβ) + Z(cos β cos Δβ − sin β sin Δβ) for the third.
For small values of Δβ (radians) we can use the approximations sin(Δβ) ≈ Δβ and cos(Δβ) ≈ 1, giving:

  (u + Δu, v + Δv, w + Δw)^T = P0 (X cos β + Z sin β + Δβ(−X sin β + Z cos β),  Y,  −X sin β + Z cos β − Δβ(X cos β + Z sin β),  1)^T     (17)
Let fβ(X,Z) = X cos(β) + Z sin(β) and gβ(X,Z) = −X sin(β) + Z cos(β), then
$$
\begin{bmatrix} u + \Delta u \\ v + \Delta v \\ w + \Delta w \end{bmatrix}
= P_0
\begin{bmatrix} f_\beta(X,Z) \\ Y \\ g_\beta(X,Z) \\ 1 \end{bmatrix}
+ \Delta\beta\, P_0
\begin{bmatrix} g_\beta(X,Z) \\ 0 \\ -f_\beta(X,Z) \\ 0 \end{bmatrix}
\tag{18}
$$
Therefore,
$$
\begin{bmatrix} \Delta u \\ \Delta v \\ \Delta w \end{bmatrix}
= \Delta\beta\, P_0
\begin{bmatrix} g_\beta(X,Z) \\ 0 \\ -f_\beta(X,Z) \\ 0 \end{bmatrix}
\tag{19}
$$
Then
$$
\Delta u = \Delta\beta\,(p_{11}\, g_\beta(X,Z) - p_{13}\, f_\beta(X,Z))
$$
and
$$
\Delta v = \Delta\beta\,(p_{21}\, g_\beta(X,Z) - p_{23}\, f_\beta(X,Z)).
$$
Since $\Delta x \approx \Delta u / w$ and $\Delta y \approx \Delta v / w$, we have
$$
\Delta x = \frac{\Delta\beta\,(p_{11}\, g_\beta(X,Z) - p_{13}\, f_\beta(X,Z))}{p_{31}\, f_\beta(X,Z) + p_{32}\, Y + p_{33}\, g_\beta(X,Z) + p_{34}}
\tag{20}
$$
and
$$
\Delta y = \frac{\Delta\beta\,(p_{21}\, g_\beta(X,Z) - p_{23}\, f_\beta(X,Z))}{p_{31}\, f_\beta(X,Z) + p_{32}\, Y + p_{33}\, g_\beta(X,Z) + p_{34}}
\tag{21}
$$
To get $|\Delta x|, |\Delta y| \le 0.5$ pixels, we need
$$
|\Delta\beta| \le \frac{0.5\,|p_{31}\, f_\beta(X,Z) + p_{32}\, Y + p_{33}\, g_\beta(X,Z) + p_{34}|}{|p_{11}\, g_\beta(X,Z) - p_{13}\, f_\beta(X,Z)|}
\tag{22}
$$
and
$$
|\Delta\beta| \le \frac{0.5\,|p_{31}\, f_\beta(X,Z) + p_{32}\, Y + p_{33}\, g_\beta(X,Z) + p_{34}|}{|p_{21}\, g_\beta(X,Z) - p_{23}\, f_\beta(X,Z)|}
\tag{23}
$$
Combining the above two constraints, we get an upper bound on the rotation error $\Delta\beta$ as:
$$
|\Delta\beta| \le \min_{1 \le i \le \mathrm{card}(\mathcal{M})} \left( \frac{0.5\,|p_{31}\, f_\beta(X_i,Z_i) + p_{32}\, Y_i + p_{33}\, g_\beta(X_i,Z_i) + p_{34}|}{\max(|a_x|, |a_y|)} \right)
\tag{24}
$$
where $a_x = p_{11}\, g_\beta(X_i,Z_i) - p_{13}\, f_\beta(X_i,Z_i)$, $a_y = p_{21}\, g_\beta(X_i,Z_i) - p_{23}\, f_\beta(X_i,Z_i)$, $\mathcal{M}$ is the 3-D data set under test, and $\mathrm{card}(\mathcal{M})$ is the cardinality of $\mathcal{M}$.
Equation (24) shows that we can get an upper bound on the rotation error in terms of the camera parameters, the assumed rotation angle, and the coordinates of the 3-D data set to achieve the desired accuracy of the generated 2-D images. For different rotation angles, it is clear from Figure 11 that only a very small rotation-angle error is permitted before a ±0.5 pixel error in the image coordinates occurs.
Since the rotation is around the Y-axis, it is quite understandable that the rotation angle error has its severest effect on the x-coordinate of the image. Ideally, it should not affect the accuracy of the y-coordinates of the image; the small effect shown in Figure 11 is mostly due to a slight deviation from the assumption that the rotation is purely around the Y-axis of the world coordinates. Fortunately, the rotation error of commercial 3-D scanners is less than 0.1 degrees. This value is less than the 0.2 degrees specified by the upper bound on the rotation error, as shown in Figure 11.
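The bound of Equation (24) is straightforward to evaluate numerically. The following is a minimal sketch; the projection matrix `P0` and the 3-D points below are hypothetical placeholders, not the calibrated values of the actual setup:

```python
import numpy as np

def rotation_error_bound(P0, points, beta):
    """Upper bound on the rotation-angle error (Eq. 24) that keeps the
    induced image-coordinate error within +/- 0.5 pixels."""
    X, Y, Z = points[:, 0], points[:, 1], points[:, 2]
    f = X * np.cos(beta) + Z * np.sin(beta)    # f_beta(X, Z)
    g = -X * np.sin(beta) + Z * np.cos(beta)   # g_beta(X, Z)
    # w = p31 f + p32 Y + p33 g + p34 (third row of P0)
    w = P0[2, 0] * f + P0[2, 1] * Y + P0[2, 2] * g + P0[2, 3]
    ax = P0[0, 0] * g - P0[0, 2] * f
    ay = P0[1, 0] * g - P0[1, 2] * f
    return np.min(0.5 * np.abs(w) / np.maximum(np.abs(ax), np.abs(ay)))

# Hypothetical pinhole camera and a few 3-D points on the test object
P0 = np.array([[800.0, 0.0, 320.0, 0.0],
               [0.0, 800.0, 240.0, 0.0],
               [0.0, 0.0, 1.0, 0.0]])
points = np.array([[50.0, 20.0, 480.0],
                   [-50.0, -20.0, 520.0],
                   [0.0, 40.0, 500.0]])
bound = rotation_error_bound(P0, points, np.deg2rad(30.0))
```

For a camera of this kind the resulting bound is a small fraction of a degree, consistent with the tight tolerances read off Figure 11.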
F. Summary
In this chapter, we have presented a new design of an experimental setup dedicated to performance evaluation tasks. The setup provides accurate, general-use, automatically generated and registered dense ground truth data and their corresponding intensity
FIGURE 11 – System accuracy. Upper bounds on rotation errors if an absolute error of ≤ 0.5 pixels in the x and y image coordinates is assumed.
data. Two submodules of the setup, the camera calibration and the object segmentation, are carefully designed to ensure the quality of the generated data. The accuracy of the system is investigated, and a least upper bound on the error of the motion mechanism is provided.
Introducing this system to the vision community will help bridge a gap in creating standard dense ground truth data and input data for the performance evaluation of many 3-D reconstruction techniques. To make this system available for public use, we publish the collected data for different test objects on the ftp site:
ftp://egypt.spd.uofl.edu/pub/Eva Data.
CHAPTER III
A NOVEL TECHNIQUE FOR 3-D DATA REGISTRATION AS A PRE-EVALUATION STEP
Data registration is a crucial step in performance evaluation procedures that aim at localizing errors in the given measured data. For performance evaluation, the given measured data should be accurately registered/aligned to the ground truth data such that the registration process does not affect the accuracy of the subsequent evaluation steps. Using conventional registration methods, the outcome of the registration process would be questionable if the measured data set is a corrupted version of the ground truth data set. Unfortunately, the registration of such data types is an unavoidable step in most evaluation procedures.
To cope with this problem, another registration methodology that can go beyond the conventional ones should be used. Here we propose a novel approach for 3-D data registration. The performance of the approach is totally independent of the measured data set, which is possibly subjected to distortion, since the approach employs error-free snapshots of the 3-D object instead of its measured reconstruction. The key advantage of this approach is that it keeps the registration process from being affected by the possibly corrupted data sets; hence, it permits confident evaluation results.
A. 3-D Data Registration
Data registration is a common problem in computer vision. Applications include object recognition, surface matching, pose estimation, data fusion, and our concern, performance evaluation. The registration process aims at placing the data sets into a common reference frame by estimating the transformation parameters between the data sets. The key problem with any registration technique is that the correspondences between data points are not known a priori.
One of the common techniques used to solve the registration problem is the Iterative Closest Point (ICP) technique [22]. The algorithm is simple and efficient; however, it needs a good initial estimate, otherwise it gets stuck in a local minimum. In addition, the algorithm is sensitive to statistical outliers [22]. Many techniques have been introduced in the literature to provide robustness to the ICP method, e.g., [23–25]. However, other distortion models were not treated by these studies.
Other 3-D registration techniques that rely on the selection of distinct features in the data under registration could be used instead of the ICP approaches [26–29]. However, selection and matching of these features are challenging tasks when corrupted data sets are manipulated. Manual selection and matching of features could be a solution under these circumstances, as in [21].
We provide an example of 2-D registration of distorted data. In this example, the 2-D shape model is shown in Figure 12a. A copy of this shape is shifted by 20 pixels in the negative y-direction and clipped from the bottom by 20 pixels, as shown in Figure 12b. We assume that we can match the five numbered features shown in both figures. The initial alignment of the two shapes is shown in Figure 12c. The Mean Square Error (MSE) criterion is used to measure the level of alignment as the displacement $\Delta y$ changes from 0 to 30. The desired displacement is $\Delta y_{des} = 20$. Using the MSE criterion, the optimal displacement $\Delta y_{opt}$ is found to be 30 pixels, which gives the alignment result in Figure 12d.
When the absolute difference criterion is used instead of MSE, the correct alignment is reached at $\Delta y_{opt} = \Delta y_{des} = 20$ pixels. The result of using the absolute difference is shown in Figure 12e. We repeated this experiment using only four features, excluding feature number 5. Using the absolute error criterion, many optimal solutions are detected at $\Delta y_{opt} \in \{20, 21, \ldots, 39, 40\}$. This adds ambiguity to the alignment process. The registration result for $\Delta y_{opt} = 40$ is shown in Figure 12f.
If we choose to use distance-based matching, such as closest point matching using MSE [22], rather than feature-based matching, the result is similar to that in Figure 12d, which is not the desired solution. Plots of the different errors with different matching criteria are shown in Figure 13.
We conclude from this counterexample that convergence to the desired registration solution, by either feature-based or distance-based approaches, may not be guaranteed if one member of the data sets under registration is subjected to distortion.
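The bias behind this counterexample can be reproduced with a one-dimensional toy analog (hypothetical coordinates, not the exact 2-D shapes of Figure 12): five matched feature ordinates, one of which is corrupted by clipping, are aligned by sweeping a candidate displacement. The quadratic (MSE) criterion is pulled away from the true shift by the corrupted feature, while the absolute-error criterion still recovers it:

```python
# 1-D analog of the Figure 12 experiment (hypothetical coordinates):
# five model features, shifted by 20 px; the last feature is corrupted
# by clipping, so it does not move with the rest of the shape.
model = [0, 10, 20, 30, 40]
measured = [20, 30, 40, 50, 40]   # true shift 20, last feature clipped

def mse(dy):
    return sum((m - (g + dy)) ** 2 for m, g in zip(measured, model)) / len(model)

def abs_err(dy):
    return sum(abs(m - (g + dy)) for m, g in zip(measured, model)) / len(model)

candidates = range(0, 31)
dy_mse = min(candidates, key=mse)       # biased by the corrupted feature
dy_abs = min(candidates, key=abs_err)   # robust, recovers the true shift
print(dy_mse, dy_abs)                   # prints: 16 20
```

The absolute criterion behaves like a median and ignores the single clipped feature, mirroring the correct alignment of Figure 12e, while the squared criterion averages the corruption into the estimate, as in Figure 12d.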
In fact, the 3-D registration of low-quality data sets is not a simple task. For instance, registering the model data in Figure 14a to the measured data in Figure 14b could be difficult for any 3-D registration technique, whether feature-based or distance-based. However, the registration result shown in Figure 14c can be reached by the proposed Registration Through Silhouettes (RTS) technique [39].
The proposed approach does not rely on the presence of the 3-D reconstruction under test during the registration phase. This gives an advantage to our approach, since the 3-D reconstruction could be of a low quality that might add difficulties to any 3-D registration technique. In addition, if the actual 3-D reconstruction under test were used in the registration phase, then some errors that the evaluation process tries to investigate might disappear during the minimization step used by any 3-D registration technique. The proposed approach employs silhouette images to align the given data sets. Undistorted silhouette images can be generated easily, hence providing good data sets for the registration process.
Image silhouettes have been used for many computer vision applications such as
FIGURE 12 – An example of registration under distortion. (a) The original shape, (b) the distorted shape (clipped bottom), (c) initial registration of the shapes in (a) and (b). Optimal registration results by matching the five features 1-5 indicated in (a) and (b) by minimizing: (d) the Mean Square Error (MSE) (incorrect result) and (e) the absolute error (correct result). (f) An optimal registration result by matching the four features 1-4 and minimizing the absolute error (incorrect result).
(a) $\Delta y_{opt} = 30$, (b) $\Delta y_{opt} = \Delta y_{des} = 20$, (c) $\Delta y_{opt} \in \{20, 21, \ldots, 39, 40\}$, (d) $\Delta y_{opt} = 30$
FIGURE 13 – Registration errors with different error criteria and matching strategies. (a) Mean square error with five-point matching, (b) absolute error with five-point matching, (c) absolute error with four-point matching, (d) mean square error with closest-distance matching.
FIGURE 14 – A difficult 3-D data registration case. (a) Ground truth data, (b) corrupted measured data, (c) 3-D data registration using the proposed Registration Through Silhouettes (RTS) technique.
shape recovery [10], texture mapping [40], pose estimation [41], and camera calibration [42]. Since silhouettes are insensitive to colors and can encode useful information about the 3-D pose, they are used in the proposed approach.
We consider the 3-D registration step the core of any evaluation work that employs 3-D ground truth data, provided that it does not affect the accuracy of the evaluation process itself. Employing an efficient 3-D registration technique could make the design of evaluation methodologies a straightforward task. In addition, together with a suitable evaluation methodology, it helps in localizing errors in the 3-D reconstruction under test. This localization step is necessary for diagnosis and data fusion post-evaluation techniques.
B. 3-D Data Registration Through Silhouettes (RTS)
Since our goal is to evaluate the quality of a given data set M of measured points
generated by a given 3-D reconstruction technique X, the ground truth data set G should be
aligned with M.
1. An Overview of the Approach
Since we evaluate a 3-D reconstruction M obtained from a calibrated sequence of images, a set Sin of silhouettes can be generated. In addition, we use G to generate another set of silhouettes, SG, at the same views as the set Sin. In the ideal case, when M and G are initially registered, Sin and SG are aligned. However, in most cases a certain transformation T is needed to align G with M. Applying T iteratively to G to get SG such that the error between Sin and SG is minimal leads to the best T that brings G and M into alignment.
As a formal 3-D rigid registration problem, the goal is to find the transformation $T(R,t)$, where $R$ is a $3 \times 3$ rotation matrix with 3 degrees of freedom (DOF), $\theta_X$, $\theta_Y$, and $\theta_Z$, and $t$ is a 3-D translation vector with 3 DOF, $t_X$, $t_Y$, and $t_Z$, such that the energy $E$ is minimal, where
$$
E = \sum_i d^2\big( m_i,\; T(R,t)\, g_i \big), \qquad m_i \in \mathcal{M},\; g_i \in \mathcal{G}
\tag{25}
$$
where $d$ denotes the Euclidean distance. Since $\mathcal{M}$ is not an ideal reconstruction and the minimization could be difficult to perform directly in the 3-D coordinates, we reduce the problem to a 2-D minimization through silhouettes. We assume that $\mathcal{M}$ is not available at the registration phase but that its calibrated silhouette set $S_{in}$ is available. We generate $S_{\mathcal{G}}$ by projecting $\mathcal{G}$ to the same views as those of $S_{in}$.
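The silhouette generation step, projecting a 3-D point set through a view's projection matrix and rasterizing it into a binary image, can be sketched as follows; the pinhole matrix and test point here are hypothetical placeholders, not the dissertation's calibrated cameras:

```python
import numpy as np

def render_silhouette(P, points, height, width):
    """Project 3-D points with a 3x4 projection matrix P and rasterize
    them into a binary silhouette image (1 = silhouette, 0 = background)."""
    n = points.shape[0]
    hom = np.hstack([points, np.ones((n, 1))])   # homogeneous coordinates
    proj = hom @ P.T                             # columns: (u, v, w)
    x = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    y = np.round(proj[:, 1] / proj[:, 2]).astype(int)
    sil = np.zeros((height, width), dtype=np.uint8)
    keep = (x >= 0) & (x < width) & (y >= 0) & (y < height)
    sil[y[keep], x[keep]] = 1
    return sil

# Hypothetical pinhole camera and a single test point on the optical axis
P = np.array([[500.0, 0.0, 160.0, 0.0],
              [0.0, 500.0, 120.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
sil = render_silhouette(P, np.array([[0.0, 0.0, 10.0]]), 240, 320)
```

A dense ground truth point cloud projected this way yields the silhouette set $S_{\mathcal{G}}$ view by view; visibility handling is omitted in this sketch.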
2. The Registration Procedure
For each iteration $i = 1, \ldots, N_{max}$, where $N_{max}$ is the maximum number of iterations, the registration parameters $(\theta_X^i, \theta_Y^i, \theta_Z^i, t_X^i, t_Y^i, t_Z^i)$ are used to find the transformed set $\mathcal{G}_{i+1}$, where:
$$
\mathcal{G}_{i+1} = T(R_i, t_i)\, \mathcal{G}_i
\tag{26}
$$
In general,
$$
T(R,t) = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}
\tag{27}
$$
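Building the homogeneous transformation of Equation (27) from the six registration parameters can be sketched as below; the X-Y-Z composition order of the rotations is an assumption for illustration, since the text does not fix an order:

```python
import numpy as np

def make_transform(angles, t):
    """Build the 4x4 homogeneous transform T(R, t) of Eq. (27) from
    Euler angles (theta_X, theta_Y, theta_Z) and a translation vector."""
    ax, ay, az = angles
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax), np.cos(ax)]])
    Ry = np.array([[np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    Rz = np.array([[np.cos(az), -np.sin(az), 0],
                   [np.sin(az), np.cos(az), 0],
                   [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx   # R with the assumed X-Y-Z order
    T[:3, 3] = t               # t in the last column, bottom row (0 0 0 1)
    return T

T = make_transform((0.1, 0.2, 0.3), (1.0, 2.0, 3.0))
```

Applying `T` to the homogeneous ground truth points realizes the update of Equation (26).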
For each set $\mathcal{G}_i$, a corresponding set of silhouettes $S_{\mathcal{G}_i}$ of cardinality $N_s$ is generated, where:
$$
S_{\mathcal{G}_i} = \{ s^l_{\mathcal{G}_i} : s^l_{\mathcal{G}_i} \subset S_{\mathcal{G}_i},\; l = 1, \ldots, N_s \}
\tag{28}
$$
For each point $I^l_{\mathcal{G}_i}(x^l_{\mathcal{G}_i}, y^l_{\mathcal{G}_i}) \in s^l_{\mathcal{G}_i}$ and $g^l_i = (X^l_{\mathcal{G}_i}, Y^l_{\mathcal{G}_i}, Z^l_{\mathcal{G}_i}) \in \mathcal{G}_i$ which is visible at view $l$, the following relation holds for the proper projection matrix $P_l$:
$$
c^l_i \begin{bmatrix} x^l_{\mathcal{G}_i} \\ y^l_{\mathcal{G}_i} \\ 1 \end{bmatrix}
= P_l \begin{bmatrix} X^l_{\mathcal{G}_i} \\ Y^l_{\mathcal{G}_i} \\ Z^l_{\mathcal{G}_i} \\ 1 \end{bmatrix}
\tag{29}
$$
where $c$ is a scalar value and
$$
I^l_{\mathcal{G}_i}(k_1, k_2) = \begin{cases} L_1, & \text{if } k_1 = x^l_{\mathcal{G}_i} \text{ and } k_2 = y^l_{\mathcal{G}_i}; \\ L_2, & \text{otherwise}, \end{cases}
\tag{30}
$$
where $L_1$ and $L_2$ are two gray levels, $1 \le k_1 \le N_h$, $1 \le k_2 \le N_w$, and $N_h \times N_w$ is the cardinality of $s^l_{\mathcal{G}_i}$.
For a sequence of $N_s$ input images $I^l$, a corresponding set of silhouettes $S_{in}$ can be extracted as:
$$
S_{in} = \{ s^l_{in} : s^l_{in} \subset S_{in},\; l = 1, \ldots, N_s \}
\tag{31}
$$
such that for each point $I^l_{in}(k_1, k_2) \in s^l_{in}$
$$
I^l_{in}(k_1, k_2) = \begin{cases} L_1, & \text{if } I^l(k_1, k_2) \text{ is a silhouette point}; \\ L_2, & \text{otherwise}. \end{cases}
\tag{32}
$$
The error criterion $E_i$ is defined as:
$$
E_i = \frac{1}{N_s N_h N_w} \sum_{l=1}^{N_s} \sum_{k_1=1}^{N_h} \sum_{k_2=1}^{N_w} \big[ I^l_{in}(k_1, k_2) - I^l_{\mathcal{G}_i}(k_1, k_2) \big]^2
\tag{33}
$$
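Equation (33) reduces to a mean squared difference over stacks of silhouette images, which can be sketched as follows (synthetic toy silhouettes, not the dissertation's data):

```python
import numpy as np

def silhouette_error(S_in, S_G):
    """Mean squared difference between two stacks of silhouette images
    (Eq. 33). Both stacks have shape (N_s, N_h, N_w)."""
    S_in = np.asarray(S_in, dtype=float)
    S_G = np.asarray(S_G, dtype=float)
    return np.mean((S_in - S_G) ** 2)

# Toy example: one-view stacks of 4x4 silhouettes with gray levels 0/1
a = np.zeros((1, 4, 4)); a[0, 1:3, 1:3] = 1   # 2x2 square silhouette
b = np.zeros((1, 4, 4)); b[0, 1:3, 2:4] = 1   # same square shifted right
```

Identical stacks give an error of zero; here the one-pixel shift leaves 4 of the 16 pixels different, so the error is 0.25.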
Then an optimization algorithm is needed to find the solution of
$$
\min(E \,|\, R, t) \rightarrow \min_{R,t}(E)
\tag{34}
$$
A minimization procedure is described in the next section.
a. A Two-step Minimization. We use a Genetic Algorithm (GA) [43] to minimize Equation (33). To apply GA to our registration problem, we encoded the transformation parameters as genes. Each parameter is encoded by 16 bits. The genes are formed by concatenating the six binary coded parameters: the angles of rotation $\theta_X$, $\theta_Y$, $\theta_Z$ and the translation components $t_X$, $t_Y$, and $t_Z$. The crossover operation occurs at multiple points along the gene with probability $p_c = 0.95$. A mutation rate of 0.01 is usually used. Since GA maximizes an objective function, we used the following objective function $F$ to be maximized:
$$
F = \frac{1}{E}
\tag{35}
$$
Since GA is a global search method that converges only in the limit, we used it only to obtain an initial solution for a local search method. Here we used the Nelder-Mead (NM) simplex as the local search method. It is important to note that other suitable optimization techniques, such as simulated annealing, can replace the (GA + simplex) solution without affecting the validity of the RTS technique.
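The global-then-local strategy can be sketched with stand-ins for the dissertation's optimizers: a coarse random search plays the role of the GA's global exploration, and SciPy's Nelder-Mead simplex refines the result. The objective here is a toy 3-parameter 2-D alignment with known correspondences, not the silhouette error of Equation (33):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy 2-D rigid registration: recover (theta, tx, ty) aligning two point sets
model = rng.uniform(-10.0, 10.0, size=(40, 2))
true_p = np.array([0.3, 5.0, -2.0])  # theta (rad), tx, ty

def transform(pts, p):
    c, s = np.cos(p[0]), np.sin(p[0])
    return pts @ np.array([[c, -s], [s, c]]).T + p[1:]

target = transform(model, true_p)

def energy(p):
    # mean squared distance between transformed model and target points
    return np.mean(np.sum((transform(model, p) - target) ** 2, axis=1))

# Step 1: coarse global exploration (random search standing in for the GA)
samples = rng.uniform([-np.pi, -20.0, -20.0], [np.pi, 20.0, 20.0], size=(2000, 3))
p0 = min(samples, key=energy)

# Step 2: local refinement with the Nelder-Mead simplex
res = minimize(energy, p0, method="Nelder-Mead",
               options={"xatol": 1e-10, "fatol": 1e-14})
```

The global step only needs to land in the basin of the desired minimum; the simplex then supplies the precision, mirroring the GA + simplex division of labor described above.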
In the presented algorithm, the convergence of the optimization techniques, and hence the performance of the RTS technique, depends on the selection of Sin and on the extent to which the silhouettes are distinct. Symmetric objects that have similar silhouettes are of no interest under the evaluation topic; if needed, they can simply be generated synthetically. In practice, a subset of Sin that consists of 4 orthogonal silhouettes provides enough constraints on the shape of objects of moderate complexity. In general, at least two silhouettes generated by two non-collinear cameras should be used to avoid the degenerate case where corresponding points from the unregistered objects lie on the same optical ray. As shown in Figure 15, the geometric distance between the 3-D points X1 and X2 vanishes in the image of camera Oa.
Proposition 1. At least two silhouettes generated by two non-collinear cameras should be used by the RTS approach.
Proof: (using Figure 15)
Assume that X1 ≠ X2 and that X1 and X2 lie on the optical ray Oax, or equivalently line L1 (a degenerate case for camera Oa). Assume that another degenerate case can happen in camera Ob, i.e., X1, X2 ∈ L3. Then either X1 = X2, a contradiction, or L2 ≡ L3, i.e., cameras Oa and Ob are collinear.
This proves that the degenerate cases, where corresponding points from the unregistered data sets lie on the same optical ray, cannot happen simultaneously in two cameras unless the cameras are collinear. Therefore, to avoid such cases, at least two silhouette images generated by two non-collinear cameras should be used.
b. Occluding Contours as Replacements for the Silhouettes. The object occluding contours can be used instead of the silhouettes in our approach to reduce the redundancy in the silhouette images. A sequence of pre-processing operations, such as image filtration and edge detection, is applied to the sets $S_{in}$ and $S_{\mathcal{G}}$ to generate sets of contour images, $C_{in}$ and $C_{\mathcal{G}}$, respectively. These sets are defined as:
$$
C_{in} = \{ c^l_{in} : c^l_{in} \subset C_{in},\; l = 1, \ldots, N_s \}
\tag{36}
$$
such that for each point $J^l_{in}(x, y) \in c^l_{in}$
$$
J^l_{in}(x, y) = \begin{cases} J^l_{in}(x_c, y_c) = 1, & \text{if } J^l_{in}(x, y) \text{ is a contour point}; \\ 0, & \text{otherwise}, \end{cases}
\tag{37}
$$
and
$$
C_{\mathcal{G}} = \{ c^l_{\mathcal{G}} : c^l_{\mathcal{G}} \subset C_{\mathcal{G}},\; l = 1, \ldots, N_s \}
\tag{38}
$$
such that for each point $J^l_{\mathcal{G}}(x, y) \in c^l_{\mathcal{G}}$
$$
J^l_{\mathcal{G}}(x, y) = \begin{cases} J^l_{\mathcal{G}}(x_c, y_c) = 1, & \text{if } J^l_{\mathcal{G}}(x, y) \text{ is a contour point}; \\ 0, & \text{otherwise}. \end{cases}
\tag{39}
$$
For each point $J^l_{in}(x_c, y_c)|_j \in \{J^l_{in}(x_c, y_c)\}$ we find the closest point, as in [22], $J^l_{\mathcal{G}}(x_{cp}, y_{cp})|_j \in \{J^l_{\mathcal{G}}(x_c, y_c)\}$, for $j = 1, \ldots, N_c = \mathrm{card}(\{J^l_{in}(x_c, y_c)\})$. Then the error criterion $E$ can be
FIGURE 15 – A degenerate case for silhouette alignment. The points X1 and X2 are two corresponding 3-D points in space. To align these points using silhouettes, a non-zero geometric distance between their image points x1 and x2 should be detected (as in camera Ob). However, if X1 and X2 lie on the same optical ray (as in cameras Oa and Oc), then their projections degenerate to a single point, and hence a zero geometric distance is detected. To avoid such a case, at least two silhouette images from non-collinear cameras should be used.
written as:
$$
E = \frac{1}{N_s N_c} \sum_{l=1}^{N_s} \sum_{j=1}^{N_c} d^2\big( J^l_{in}(x_c, y_c)|_j,\; J^l_{\mathcal{G}}(x_{cp}, y_{cp})|_j \big)
\tag{40}
$$
The occluding contours are more geometrically descriptive than the silhouettes; however, they need additional preprocessing operations such as image filtration and edge detection. Errors in extracting such contours could affect the convergence of the registration process. Throughout our implementation of the RTS algorithm, we use the occluding contours only in the second optimization step using the simplex method, since computing the occluding contours and using the closest point criterion could increase the computational complexity of the first optimization step by the genetic algorithm.
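The closest-point contour error of Equation (40) can be sketched with a k-d tree for the nearest-neighbor lookup; the contour points below are synthetic placeholders, not extracted image contours:

```python
import numpy as np
from scipy.spatial import cKDTree

def contour_error(contours_in, contours_G):
    """Mean squared closest-point distance (Eq. 40) over N_s views.
    Each list element is an (N_c, 2) array of contour point coordinates."""
    total, count = 0.0, 0
    for pts_in, pts_G in zip(contours_in, contours_G):
        tree = cKDTree(pts_G)
        dists, _ = tree.query(pts_in)   # distance to closest G contour point
        total += np.sum(dists ** 2)
        count += len(pts_in)
    return total / count

# Toy contours: a 5-point segment and a copy shifted by 1 pixel in y
c_in = [np.array([[float(i), 0.0] for i in range(5)])]
c_G = [np.array([[float(i), 1.0] for i in range(5)])]
```

A k-d tree makes the per-iteration closest-point search logarithmic per query, which is one way to contain the computational cost noted above.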
c. An Evaluation Criterion for the RTS Approach. The RTS approach is a self-evaluating approach. The average distance $d^l_{av}$ for each image $l$, defined as:
$$
d^l_{av} = \frac{1}{N_c} \sum_{j=1}^{N_c} d\big( J^l_{in}(x_c, y_c)|_j,\; J^l_{\mathcal{G}}(x_{cp}, y_{cp})|_j \big)
\tag{41}
$$
can be used as a measure of the quality of the RTS approach. An error distance $d_{exp} = \sqrt{2}$ is expected due to truncation errors of image re-projection and the preprocessing operations of image filtration and edge detection. A range of $d_{av} \in [\sqrt{2},\; 2\sqrt{2}]$ is considered the expected range for good registration. This can be used as a stopping criterion for the approach as well.
The distance error can be expressed as an error ratio in dB using what we call the coincidence index (CI), as follows:
$$
CI = 20 \log_{10} \frac{\sqrt{2}}{d_{av}}
\tag{42}
$$
with an expected range of good quality of $[-6,\; 0]$ dB.
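A quick check of the mapping between $d_{av}$ and CI, under the usual convention that dB values use base-10 logarithms:

```python
import math

def coincidence_index(d_av):
    """Coincidence index (Eq. 42) in dB for an average contour distance."""
    return 20 * math.log10(math.sqrt(2) / d_av)

# The good-quality band d_av in [sqrt(2), 2*sqrt(2)] maps to about [-6, 0] dB
ci_best = coincidence_index(math.sqrt(2))        # 0 dB
ci_worst = coincidence_index(2 * math.sqrt(2))   # about -6.02 dB
```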
C. Results and Discussion
In this section, we present experimental results for the RTS approach. Thirty-six images are acquired of a house object. Out of these images, we select 4 images of consecutive orthogonal views at angles of 0, 90, 180, and 270 degrees, as shown in Figure 16a from left to right, respectively. Their silhouettes are shown in Figure 16b. A reference 3-D model of the house object is generated by the 3-D laser scanner. Only 10% of the 3-D points of the scanner model are used to generate the silhouettes by projecting the model using the projection matrices in Equation (14). These silhouettes are generated at the same views as in Figure 16b, as shown in Figure 16c.
The search parameters are initialized with random values. A Genetic Algorithm (GA) with a crossover rate of 0.95, a mutation rate of 0.01, and a population size of 20 is applied to the given sets of silhouettes. The silhouettes of the 3-D scanner model obtained after running 100 generations/iterations of GA are shown in Figure 16d. The search parameters gained from the GA step were passed to the next optimization step using the local search algorithm. 100% of the scanner data are used in this step to get accurate results. The results after running the simplex method for 150 iterations are shown in Figure 16e.
The convergence of the six search parameters is plotted in Figure 17. It is clear from the figure that GA provided a good approximation of the parameters after fewer than 50 iterations, and the simplex method refined the solution after fewer than 100 iterations.
To validate these results, we plotted the ground truth data, before applying the RTS algorithm, in the same plot as the 3-D reconstruction of the house object generated by a technique X. The reconstruction by technique X plays no role in the registration process, but we plotted it to show the relative positions of the reference frames of the two reconstructions before applying the RTS technique, as shown in Figure 18a. To show the alignment after applying RTS, the two reconstructions are plotted in Figure 18b. As a standard way of showing registration results, a mixed reconstruction is shown in Figure 18c, where the blue patches are generated by the scanner and the red patches by the technique X.
The average distance $d_{av}$ and the coincidence index CI are used to quantify the registration results. The $d_{av}$ and CI values are computed at each view of the input image set, as shown in Figures 19a and 19b, respectively. Some values of $d_{av}$ and their equivalent CI values are beyond the expected range of good quality, as indicated by the values at views No. 10, 11, 12, and 13. This situation is anticipated, since the $\sqrt{2}$ distance value, which is assumed as a reference error, is an expected value, not an exact value. The maximum error distance ($d_{av} = 2.13$ pixels), or the minimum coincidence index (CI = −3.54 dB), is detected at view No. 21.
The final contour at view No. 21, the maximum-error view, is shown in Figure 20. At this view, a slight error is noticed, which is due to inaccuracy in estimating the angle $\theta_Z$. Some errors are expected at certain views that were not used during the optimization phase, as at view No. 21. Increasing the number of silhouettes in the optimization step would reduce the overall error, however at the expense of increasing the overall execution time. The statistics of the average distance in this experiment show a mean value of 1.63 pixels and a standard deviation (std) of 0.19 pixels. These statistical values indicate good quality registration.
Another experiment on the house data sets is performed. A copy of the registered ground truth data is transformed by known translation and rotation parameters. The RTS approach is applied to the transformed data set. Only 5% of the scanner data are used during the optimization phase using the genetic algorithm. The genetic crossover rate is set to 0.95 and the mutation rate to 0.01, while the population size is set to 30. The second optimization step is performed using the simplex method applied to the occluding contour images. After 100 generations/iterations by the genetic algorithm followed by 150 iterations using the simplex method, the RTS approach was able to converge to the desired parameters. Figure 21 shows the convergence of the registration parameters to the desired values indicated by the dashed lines.
TABLE 1
CONVERGENCE OF THE RTS APPROACH TO THE DESIRED VALUES.
Parameter θX (rad) θY (rad) θZ (rad) tX (mm) tY (mm) tZ (mm)
Initial values 0.025 0 0.025 -5 10 -10
Desired values 0 -0.5 0.01 0 20 -20
Final values 0.0008 -0.4610 0.0106 -0.0308 21.5558 -20.3151
Table 1 shows the initial, desired, and final values of the registration parameters. Slight deviations of the final values of the $\theta_Y$ and $t_Y$ parameters from the desired values are noticed. These deviations are due to the simplex approach getting stuck in a local minimum, which is a known disadvantage of local search methods. The average distance and the coincidence index measures are plotted for this experiment before and after applying the RTS approach, as shown in Figure 22. This shows the error reduction achieved by applying the RTS approach. A slight degradation in the mean value of the average distance is noticed, due to the errors in $\theta_Y$ and $t_Y$, compared with the previous experiment. Visual results for the alignment of the scanner contours and the image contours before and after applying the RTS approach are shown in Figure 23. Note the deviation of $\theta_Y$ and $t_Y$ from the desired values (the Y-direction of 3-D space is the same as the y-direction of the image).
In general, the accuracy of the registration depends on the number of distinct silhouettes used in the optimization phase. The greater the number, the more accurate the results; however, in the GA optimization step, the greater the number, the greater the time required to evaluate the objective function, especially when a large population is assumed. That is why we used lower percentages of the scanner data to reduce the run time of the GA optimization step.
D. Summary
In this chapter, a novel technique for 3-D data registration is presented. This technique is dedicated to evaluation procedures that aim at localizing errors in the data under test. The proposed approach does not rely on the presence of the 3-D reconstruction under test during the registration phase. This gives a major advantage to this approach, since the 3-D reconstruction could be of low quality. Such a low-quality situation is expected to introduce difficulties to any 3-D registration technique. In addition, if the actual 3-D reconstruction under test were used in the registration phase, then some errors that the evaluation process tries to investigate might disappear during the minimization step used by any 3-D registration technique. The proposed approach employs silhouette images to align the given data sets. Undistorted silhouette images can be generated easily, hence permitting good data sets for the registration process. The approach is simple and efficient, as shown by the experimental results presented in this chapter.
FIGURE 16 – Registration Through Silhouettes (RTS) results. (a) Input images: from left to right, four input images at 0, 90, 180, and 270 degree angles, respectively. (b) Four silhouettes from the input images at 0, 90, 180, and 270 degree angles. (c) Initial silhouettes re-projected from the 3-D model generated by the 3-D laser scanner at the same angles as in (b) using 10% of the scanner data, (d) final silhouettes after applying the genetic algorithm, (e) final silhouettes after applying the simplex method starting with the parameters generated by the genetic algorithm.
FIGURE 17 – Registration parameters. (a) The rotation around the X-axis, (b) the rotation around the Y-axis, (c) the rotation around the Z-axis, (d) the translation in the X-direction, (e) the translation in the Y-direction, (f) the translation in the Z-direction.
FIGURE 18 – 3-D registration visual results. (a) Unaligned 3-D reconstructions, (b) after registration using the RTS technique, and (c) selected patches from each reconstruction.
FIGURE 19 – 3-D registration quantitative results. (a) The average distance and (b) the Coincidence Index (CI).
FIGURE 20 – Rendered views to show the alignment of the ground truth contours (blue) and the image contours (red). (a) CI = 0.65 dB, (b) CI = −1 dB, (c) CI = −1.9 dB, and (d) CI = −3.5 dB.
FIGURE 21 – The convergence of the registration parameters to the desired values. (a) The rotation around the X-axis, (b) the rotation around the Y-axis, (c) the rotation around the Z-axis, (d) the translation in the X-direction, (e) the translation in the Y-direction, (f) the translation in the Z-direction.
FIGURE 22 – RTS Evaluation. (a) average distance (b) Coincidence Index (CI).
FIGURE 23 – Rendered views to show the alignment, with known parameters, of the ground truth contours (blue) and the image contours (red). (a) Initial alignment where CI = −13.5 dB, (b) initial alignment at an orthogonal view where CI = −14 dB, (c) final alignment at the same view as in (a) with CI = −1.9 dB, and (d) final alignment at the same view as in (b) with CI = 0.6 dB.
CHAPTER IV
PERFORMANCE EVALUATION: METHODOLOGIES AND MEASURES
Motivated by the objective of standardizing the evaluation process, we propose a classification criterion. Based on this criterion, we can classify the tests that measure the performance of 3-D reconstruction techniques into four sets: the operating conditions, the complexity of data analysis, the generality of the measure, and the position of the testing point. This classification will eventually help in providing a standard ranking of different performance evaluation tests, which is an important factor in deciding to what extent we trust the results provided by a certain testing methodology.
In addition, we propose three performance evaluation methodologies (tests). The
first test is the Local Quality Assessment (LQA) test. This test quantifies the performance
of a given 3-D reconstruction with respect to a reference 3-D reconstruction provided by the
3-D laser scanner. It is designed to investigate local errors in the given 3-D reconstruction
by decimating it into different patches and measuring the quality of each patch. This makes
the error analysis much easier and permits the integration of different 3-D reconstruction
techniques based on the results of this test.
In contrast to the LQA test, we propose an Image Re-projection (IR) test based on
the assessment of the image quality. The test does not rely on the availability of explicit
ground truth data which, in general, are difficult to generate. The test uses the acquired
images as the reference of comparison with corresponding images, re-projected from the
given 3-D reconstruction. This test also measures the applicability of the 3-D reconstruc-
tion techniques for virtual reality problems.
To avoid errors due to color variations and the re-projection process in the IR test,
we propose a Silhouette-Contour Signature (SCS) methodology that extracts shape features from silhouette and contour images and permits the inclusion of distinct (cutting) views from the 3-D ground truth data.
A. Classification of Evaluation Techniques
Seeking a standard form for the components of the evaluation problem, we propose a classification criterion by which different tests that measure the performance of 3-D reconstruction techniques can be classified. Based on this classification, it will be easy to qualify newly proposed tests and to identify the goals and benefits of applying such tests to vision techniques. In addition, the classification can indicate the importance of applying these tests.
The proposed classification is based on four sets: the operating conditions set A,
the complexity of data analysis set B, the generality of measures set C, and the position of
the test point set D:
Operating Conditions Set: based on the operating conditions we can identify two types
of tests:
• Dynamic tests: in this type, the test is performed under different conditions of light-
ing, interference, calibration, and object complexity. These tests should measure the
immunity of the vision technique to variations.
• Static tests: in this type, the test is performed under constant conditions. Actually,
these tests investigate the basic functionality of the vision technique.
Complexity of Data Analysis Set: tests can be quantitative or qualitative:

• Quantitative tests: massive amounts of data are analyzed by these tests, and statistical analysis can be a part of them. A test is said to be quantitative if the data set under test Mj, where Mj ⊂ M, has cardinality greater than βc card(M), where βc > 0.5.

• Qualitative tests: the objective of these tests is to provide a quick figure of merit for the performance of the vision technique under test. In this case, Mj has cardinality less than γc card(M), where γc < 0.5.
Generality of the Measure Set: measures can be global or local; hence we have two types of tests:
• Global tests: these tests provide a single measure of the overall performance of the
vision technique under test. Such types of tests are of great importance because they
give a final decision on the technique’s performance.
• Local tests: these tests investigate the local errors provided by the vision technique.
Using local measures provided by the test, enhancement of the technique’s perfor-
mance could be possible.
Position of the Test Point Set: data can be tested in a form of 3-D data, a form that results
after applying a certain transformation to the 3-D data, or a form that requires a certain
transformation or criterion to get the 3-D data form. Based on this form we have three
types of tests:
• Type I tests: these tests are applied directly to the data set M. This means that the transformation C is the identity. These tests are highly trusted because they work directly on 3-D data sets, avoiding errors introduced by such transformations.
• Type I+ tests: unlike type I, these tests are applied to the data set D generated by applying the transformation C to the data set M. Errors are to be expected due to this additional transformation step. As a result, these tests may underestimate the performance of the technique under test. An example of this type is testing the data in the form of 2-D intensity images.
• Type I− tests: like type I+, these tests are applied to the measured data, but at a step before the data set M is obtained. Overestimation of the performance is to be expected when using these tests, because the data are tested in a form prior to the 3-D form. An example of this type is testing the data in the form of disparity maps, a form that needs a further transformation or criterion to reach the 3-D data form.
Based on the preceding classification, a number of

card(A) × card(B) × card(C) × card(D)

different tests can be accomplished under this classification. The next proposition generalizes the above formula.

Proposition 2. For disjoint test sets X1, X2, ..., Xk there exist card(X1) × card(X2) × ⋯ × card(Xk) tests.

Proof: This is a generalization of the above formula.

According to the above classification, there are 2 × 2 × 2 × 3 = 24 different types of tests.
B. Local Quality Assessment (LQA) Methodology
This section describes a proposed methodology for the performance characteriza-
tion of 3-D reconstruction techniques [44]. The given ground truth data and the measured
data are supposed to be registered to each other. A bounding box that contains the given
data is discretized into a number of surface patches, or voxels. A quality index is assigned to each voxel based on centroid and deviation-from-centroid measures applied to the data enclosed by that voxel. Statistical measures are applied to extract global measures from the
quality indices of the given data.
1. Performance Evaluation Procedure
Since the measured data M and G′ = T(G) have been aligned to each other, a performance evaluation methodology can be applied to both sets to measure the similarity between them.

Let U be a superset of M ∪ G′ that is upper bounded by point ub and lower bounded by point lb. Let Xm be a set of uniformly distributed 3-D points x_m^j, j = 1, 2, ..., Nm, in the space bounded by ub and lb, where Nm is a user-defined parameter. Assume that M can be expressed as

\[ \mathcal{M} = \bigcup_j M^j, \quad j = 1, 2, \ldots, N_m \tag{43} \]

where M^j is defined as

\[ M^j = \{ m : m \in \mathcal{M},\; x_m^j - \Delta X \le m \le x_m^j + \Delta X \} \tag{44} \]

where ΔX = (Δx, Δy, Δz), and Δx, Δy, and Δz are elementary distances in the space whose values are determined by Nm, ub, and lb as

\[ \Delta X = \frac{1}{2\sqrt[3]{N_m}}\, (ub - lb) \tag{45} \]

Similar definitions of G′ and G′^j are as follows:

\[ \mathcal{G}' = \bigcup_j G'^j, \quad j = 1, 2, \ldots, N_m \tag{46} \]

and

\[ G'^j = \{ g' : g' \in \mathcal{G}',\; x_m^j - \Delta X \le g' \le x_m^j + \Delta X \} \tag{47} \]
For each data subset pair (M^j, G'^j), we define a quality index Q^j. First, we compute the centroid of each subset, assuming each 3-D point has unit mass:

\[ C_{M^j} = \frac{1}{\mathrm{card}(M^j)} \sum_{i=1}^{\mathrm{card}(M^j)} m_i, \quad m_i \in M^j \tag{48} \]

and

\[ C_{G'^j} = \frac{1}{\mathrm{card}(G'^j)} \sum_{i=1}^{\mathrm{card}(G'^j)} g'_i, \quad g'_i \in G'^j \tag{49} \]

where card denotes the cardinality. Then we compute the deviation of each subset as

\[ D_{M^j} = \sqrt{ \frac{1}{\mathrm{card}(M^j) - 1} \sum_{i=1}^{\mathrm{card}(M^j)} d^2(m_i, C_{M^j}) } \tag{50} \]

and

\[ D_{G'^j} = \sqrt{ \frac{1}{\mathrm{card}(G'^j) - 1} \sum_{i=1}^{\mathrm{card}(G'^j)} d^2(g'_i, C_{G'^j}) } \tag{51} \]

where d denotes the distance. Define the centroid distance as

\[ C_d^j = d(C_{M^j}, C_{G'^j}) \tag{52} \]

and the deviation ratio as

\[ R_D^j = \frac{D_{M^j}}{D_{G'^j}} \tag{53} \]

Then the quality index Q^j is defined as

\[ Q^j = \frac{2 R_D^j}{(R_D^j)^2 + 1} \left[ 1 - \frac{C_d^j}{C_{max}} \right] \tag{54} \]

where \( C_{max} = 2\sqrt{(\Delta x)^2 + (\Delta y)^2 + (\Delta z)^2} \).

The quality index Q has a dynamic range of [0, 1], with Q = 1 corresponding to the highest quality. The quality index consists of two parts: the deviation index 2R_D/(R_D² + 1), shown in Figure 24a, and the centroid index 1 − C_d/C_max, shown in Figure 24b. The highest value is reached when the maximum similarity of the measured data and the ground truth data is achieved. This happens when the deviation ratio is 1 and the centroid distance is zero. We also assume maximum similarity if both subsets of the pair (M, G′) are empty; however, if only one subset is empty, a zero value of Q is assumed.
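As an illustrative sketch (not the dissertation's implementation), the per-voxel quality index of Equations (48)–(54) might be computed as follows. The function and argument names are hypothetical, and a small ad hoc guard handles single-point subsets:

```python
import numpy as np

def quality_index(M_j, G_j, delta):
    """Quality index Q^j (Eqs. 48-54) for one voxel's point subsets.

    M_j, G_j : (N, 3) arrays of measured / ground-truth points in the voxel
    delta    : (3,) array (dx, dy, dz) of the elementary distances
    """
    # Empty/empty pairs count as maximally similar; one empty set scores 0.
    if len(M_j) == 0 and len(G_j) == 0:
        return 1.0
    if len(M_j) == 0 or len(G_j) == 0:
        return 0.0
    c_m, c_g = M_j.mean(axis=0), G_j.mean(axis=0)              # centroids (48)-(49)
    # Deviations (50)-(51); the max(..., 1) guard handles single-point subsets.
    d_m = np.sqrt(((M_j - c_m) ** 2).sum() / max(len(M_j) - 1, 1))
    d_g = np.sqrt(((G_j - c_g) ** 2).sum() / max(len(G_j) - 1, 1))
    c_d = np.linalg.norm(c_m - c_g)                            # centroid distance (52)
    r_d = d_m / d_g                                            # deviation ratio (53)
    c_max = 2.0 * np.linalg.norm(delta)
    return 2.0 * r_d / (r_d ** 2 + 1.0) * (1.0 - c_d / c_max)  # quality index (54)
```

Identical subsets give r_d = 1 and c_d = 0, so Q = 1, as the definition requires.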
The definition of the Q measure is based on a feature matching criterion. This gives
an advantage to this measure over the closest distance measures, since it is insensitive to
the resolution of the reconstructions under test [21].

FIGURE 24 – Local Quality Assessment (LQA) methodology. (a) The deviation index, (b) the centroid index, and (c) the beta distribution at different shaping parameters.
Example values of Nm are 1, 8, 27, .... In general, Nm = i³, where i is a positive integer. The smaller the value of Nm, the more Q tends to average errors; the greater the value of Nm, the more sensitive Q is to outliers.

Since Q describes the quality of different patches, or voxels, in the reconstruction under test, it can be used to fuse different reconstructions by selecting the best Q among corresponding patches from different reconstructions. This finds many applications in areas such as data fusion.
2. Statistical Modeling of the Quality Index
Sometimes we need a global description of the quality of a given reconstruction for comparison purposes. Since it is difficult to perform a point-to-point comparison, we use the values of Q to compare reconstructions on a patch-to-patch basis. Here we use three methods of comparison:
• Correlation Coefficient
• Chi-Square test
• Beta modeling
The first two methods are used to obtain global descriptions of the relative quality of different reconstructions, while the beta modeling method is used to provide a global description of the absolute quality.
Correlation Coefficient: The correlation coefficient ρ_{q1 q2} of the random variables Q1 and Q2 is defined as

\[ \rho_{q_1 q_2} = \frac{C_{q_1 q_2}}{\sigma_{q_1} \sigma_{q_2}}, \qquad -1 \le \rho_{q_1 q_2} \le 1, \quad |C_{q_1 q_2}| \le \sigma_{q_1} \sigma_{q_2} \tag{55} \]

where C_{q1 q2} is the covariance and σ_{q1} and σ_{q2} are the standard deviations of Q1 and Q2, respectively.
Chi-Square Test: The chi-square test is used to compare two binned data sets and determine whether they are drawn from the same distribution function:

\[ \chi^2(H_{Q_1}, H_{Q_2}) = \sum_{j=1}^{N} \frac{\left[ C_1 H_{Q_1}(j) - C_2 H_{Q_2}(j) \right]^2}{H_{Q_1}(j) + H_{Q_2}(j)} \tag{56} \]

where H_{Q_1} and H_{Q_2} denote two histograms with N bins, and

\[ C_1 = \sqrt{\frac{N_{H_{Q_2}}}{N_{H_{Q_1}}}}, \quad C_2 = \frac{1}{C_1}, \quad N_{H_{Q_1}} = \sum_{i=1}^{N} H_{Q_1}(i), \quad N_{H_{Q_2}} = \sum_{i=1}^{N} H_{Q_2}(i) \]

The values of χ²(H_{Q_1}, H_{Q_2}) have the range 0–1, with values near 0 indicating better matching, or higher similarity, between two reconstructions.
The above two methods are useful for tracking the performance of a certain algorithm in response to different controlling parameters.
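The chi-square comparison of Equation (56) is a direct transcription; note that this sketch returns the raw statistic, and any normalization to the 0–1 range reported in the text would be applied afterwards. Function and variable names are illustrative:

```python
import numpy as np

def chi_square(h1, h2):
    """Chi-square comparison of two N-bin histograms (Eq. 56)."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    c1 = np.sqrt(h2.sum() / h1.sum())   # scale factors compensating for
    c2 = 1.0 / c1                       # different total counts
    den = h1 + h2
    keep = den > 0                      # skip bins empty in both histograms
    return (((c1 * h1 - c2 * h2) ** 2)[keep] / den[keep]).sum()
```

Identical histograms score exactly zero; fully disjoint ones score the largest values.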
Beta Modeling: The general formula for the probability density function of the beta distribution is

\[ f_Q(q) = \frac{(q-a)^{\alpha-1} (b-q)^{\beta-1}}{B(\alpha, \beta)\,(b-a)^{\alpha+\beta-1}} \tag{57} \]

where α and β are the shape parameters, a and b are the lower and upper bounds, respectively, of the distribution, a ≤ q ≤ b, α, β > 0, and B(α, β) is the beta function.
Since Q has a dynamic range from 0 to 1, we set a = 0, b = 1 in the above formula.
Figure 24c shows different plots of fQ(q) with different values of the shaping parameters
α and β. This figure shows the flexibility of the beta distribution in providing different
probability density functions of different shapes. This makes the beta distribution a logical
choice for modeling the quality index Q. The Maximum Likelihood Estimation (MLE) parameters α̂ and β̂, extracted from a random sample of size n of the random variable Q, are defined as

\[ \hat{\alpha} = \bar{q} \left[ \frac{\bar{q}(1-\bar{q})}{s^2} - 1 \right] \tag{58} \]

\[ \hat{\beta} = (1-\bar{q}) \left[ \frac{\bar{q}(1-\bar{q})}{s^2} - 1 \right] \tag{59} \]

where q̄ stands for the sample mean and s² represents the biased sample variance. We use these estimators to find the quality estimate P_q(Q ≥ q) at different values of q as

\[ P_q(Q \ge q) = 1 - \frac{1}{B(\hat{\alpha}, \hat{\beta})} \int_0^q t^{\hat{\alpha}-1} (1-t)^{\hat{\beta}-1}\, dt \tag{60} \]

The value of P_q(Q ≥ q) provides an estimate of the quality of a given reconstruction. The higher the value of P_q(Q ≥ q) (maximum 1) at high values of q (maximum 1), the more probable it is that the reconstruction is of high quality. This measure, in contrast to the above two measures, gives both absolute and relative quality assessment.
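A minimal sketch of the estimators in Equations (58)–(60), keeping the dissertation's notation; the incomplete beta integral is evaluated by a simple trapezoidal rule rather than a library routine, and the names are hypothetical:

```python
import numpy as np
from math import lgamma

def beta_mle(q_sample):
    """Estimates of Eqs. (58)-(59) from a sample of Q values."""
    q = np.asarray(q_sample, dtype=float)
    q_bar = q.mean()
    s2 = q.var()                                   # biased sample variance
    common = q_bar * (1.0 - q_bar) / s2 - 1.0
    return q_bar * common, (1.0 - q_bar) * common  # (alpha_hat, beta_hat)

def quality_estimate(q_sample, q):
    """P_q(Q >= q) of Eq. (60); a numerical sketch that assumes
    alpha_hat, beta_hat >= 1 so the integrand stays bounded."""
    a, b = beta_mle(q_sample)
    log_B = lgamma(a) + lgamma(b) - lgamma(a + b)  # log of the beta function
    t = np.linspace(1e-9, q, 20001)
    pdf = np.exp((a - 1.0) * np.log(t) + (b - 1.0) * np.log1p(-t) - log_B)
    integral = 0.5 * ((pdf[1:] + pdf[:-1]) * np.diff(t)).sum()
    return 1.0 - integral
```

For a symmetric sample with mean 0.5 the estimates satisfy α̂ = β̂, and the quality estimate at q = 0.5 is 0.5, as expected for a symmetric density.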
Based on the classification of the testing methodologies, the LQA test is a dynamic, quantitative, local, type I test. It is dynamic because it does not put any constraints on the conditions of acquiring the data under test. It is quantitative because it permits massive analysis of the data under test. It is local because it provides different quality values for different subsets of the examined data. It is a type I test because it is applied directly to the 3-D data under test without using any transformation criteria.
An example of the Q values of two different reconstructions, M1 and M2, referenced to the same 3-D ground truth data, using Nm = 216, is shown in Figure 25a. The M1 reconstruction is similar to the one shown in Figure 14a, while M2 is similar to the one shown in Figure 14b. Comparing the values of the Q measure for each patch of these reconstructions, M1 registered higher quality values than M2. This can also be seen from the quality estimate Pq(Q ≥ q) in Figure 25b. A specific value of q, such as 0.9, can be chosen to get a specific estimate.
Since the LQA testing methodology assumes the availability of 3-D ground truth
data registered to the measured data, the popularity of this test could be limited. Other
evaluation methods are presented in the next sections to deal with the unavailability of such ground truth data.

FIGURE 25 – The LQA test applied to two different reconstructions registered to the ground truth data. (a) The quality index at each patch, and (b) the corresponding quality estimate.
C. Image Re-projection (IR) Test
In cases where 3-D ground truth is not available, the input images can be used as a reference for comparison. Unlike [18], we use calibrated images; hence we have the ability to re-project the given 3-D data into new views without using prediction techniques that may introduce errors in the generated views. In addition, this technique is general for any 3-D reconstruction technique, since we start from the measured 3-D points rather than any other transformed form, such as disparity maps [45, 46].

Since image rendering finds many applications in virtual reality, we measure the ability of the 3-D reconstruction technique to generate nearly real images by the re-projection process. Therefore, we test the re-projected images under an image quality framework, unlike the method in [14].
1. Image Quality Measures
Considering D and G as 2-D images, the signal to noise ratio (SNR) and peak signal
to noise ratio (PSNR) are used as quality measures.
SNR and PSNR are mean-squared (l2-norm) error measures [47]. SNR is defined as the ratio of average signal power to average noise power. For an M × N image,

\[ \mathrm{SNR(dB)} = 10 \log_{10} \left( \frac{\sum_{i,j} g(i,j)^2}{\sum_{i,j} \left( g(i,j) - d(i,j) \right)^2} \right) \tag{61} \]

for 1 ≤ i ≤ M and 1 ≤ j ≤ N, where g(i, j) denotes pixel (i, j) of the standard image and d(i, j) denotes pixel (i, j) of the data image. PSNR is defined as the ratio of peak signal power to average noise power:

\[ \mathrm{PSNR(dB)} = 10 \log_{10} \left( \frac{p_m^2\, MN}{\sum_{i,j} \left( g(i,j) - d(i,j) \right)^2} \right) \tag{62} \]

where p_m is the maximum peak-to-peak swing of the image gray levels (255 for 8-bit images).
Z. Wang et al. [48] proposed an Image Quality Measure (IQM) which models the image degradation as structural distortion instead of errors. This quality measure, IQM, is defined as

\[ \mathrm{IQM} = \frac{4 \sigma_{gd}\, \bar{g} \bar{d}}{(\sigma_g^2 + \sigma_d^2)(\bar{g}^2 + \bar{d}^2)} \tag{63} \]

where

\[ \bar{g} = \frac{1}{MN} \sum_{i=1}^{MN} g_i, \qquad \bar{d} = \frac{1}{MN} \sum_{i=1}^{MN} d_i \]
\[ \sigma_g^2 = \frac{1}{MN-1} \sum_{i=1}^{MN} (g_i - \bar{g})^2, \qquad \sigma_d^2 = \frac{1}{MN-1} \sum_{i=1}^{MN} (d_i - \bar{d})^2 \]
\[ \sigma_{gd} = \frac{1}{MN-1} \sum_{i=1}^{MN} (g_i - \bar{g})(d_i - \bar{d}) \]

The dynamic range of IQM is [−1, 1]. The best value, 1, is achieved if and only if g_i = d_i for i = 1, 2, ..., MN. This quality index models any distortion as a combination of three different factors: loss of correlation, mean distortion, and variance distortion. The definition of the quality index can be written as a product of three components:

\[ \mathrm{IQM} = \frac{\sigma_{gd}}{\sigma_g \sigma_d} \cdot \frac{2 \bar{g} \bar{d}}{\bar{g}^2 + \bar{d}^2} \cdot \frac{2 \sigma_g \sigma_d}{\sigma_g^2 + \sigma_d^2} \]
The first component is the linear correlation coefficient between G and D, whose dynamic
range is [-1,1]. The second component, with a value range of [0,1], measures how close the
mean values are between G and D. It equals one if and only if g = d. The third component
measures how similar the variances of the signals are. Its range of values is also [0,1],
where the best value is achieved if and only if σg = σd.
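The three measures of Equations (61)–(63) are straightforward to transcribe; this sketch assumes grayscale images of equal size, and the function names are illustrative:

```python
import numpy as np

def snr_db(g, d):
    """Eq. (61): signal-to-noise ratio in dB between the standard
    image g and the data image d."""
    g, d = np.asarray(g, float), np.asarray(d, float)
    return 10.0 * np.log10((g ** 2).sum() / ((g - d) ** 2).sum())

def psnr_db(g, d, peak=255.0):
    """Eq. (62): peak signal-to-noise ratio in dB (peak = p_m)."""
    g, d = np.asarray(g, float), np.asarray(d, float)
    return 10.0 * np.log10(peak ** 2 * g.size / ((g - d) ** 2).sum())

def iqm(g, d):
    """Eq. (63): the structural quality index of Wang et al. [48]."""
    g, d = np.asarray(g, float).ravel(), np.asarray(d, float).ravel()
    n = g.size
    g_bar, d_bar = g.mean(), d.mean()
    var_g = ((g - g_bar) ** 2).sum() / (n - 1)
    var_d = ((d - d_bar) ** 2).sum() / (n - 1)
    cov_gd = ((g - g_bar) * (d - d_bar)).sum() / (n - 1)
    return 4.0 * cov_gd * g_bar * d_bar / ((var_g + var_d) * (g_bar ** 2 + d_bar ** 2))
```

For identical images, iqm returns exactly 1, while snr_db and psnr_db diverge, which is why the measures are only compared on distorted pairs.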
Other measures can be used to assess the quality of given images, such as the fuzzy image metric (FIM) [52]. This measure is supposed to resemble some features of the subjective assessments of humans, rather than the objective assessments provided by the above measures. However, it is not clear how this measure provides such subjective assessments.
Although PSNR is a common measure of image quality, it may be biased. This is because most of the image is background, while most of the errors occur in the foreground (the object), and these errors are compared to only one signal value, the peak. This apparently reflects a high signal-to-noise ratio, which gives the SNR measure an advantage over the PSNR measure. The IQM has the advantage that it provides a specific range of quality, so it clearly reflects the level of similarity between the ground truth images and the re-projected images. On the other hand, the SNR measure does not provide crisp values. Values of the SNR measure greater than 10 dB are roughly considered good quality values, since they mean that the signal power is at least 10 times the noise (distortion) power.
Figures 26a and 26b show the SNR and IQM values of a certain reconstruction at
12 different views. These figures show the similarity between the two measures in quan-
tifying the quality at each view of the reconstruction under test. An example of a ground
truth image is shown in Figure 27a. The corresponding re-projected image to the image in
Figure 27a is shown in Figure 27b. At this view, values of SNR ≈ 11 and IQM ≈ 0.95
are registered. The value of the IQM measure is more indicative than the value of the SNR
measure as shown by this example. However, the SNR value could provide an indication
about the signal level and the error level. A difference-image that encodes the absolute
error between the images in Figures 27a and 27b is shown in Figure 27c. This image can
provide a sense of the error compared to the signal at this view as provided by the SNR
measure.
2. The IR Test Procedure
The procedure of the proposed IR test is summarized as follows:

1. Apply the vision algorithm under test to the acquired sequence of images G to generate the data set M.

2. Apply Equations (6) and (14) to generate the set D of re-projected images.

3. Apply Equation (61), (62), or (63) to each image pair.

4. Average the values obtained in the previous step to get a global measure.

Based on the classification presented in Section A, the IR test is a quantitative, dynamic, global, type I+ test. This means that the IR test is of lower rank than the LQA test, since it tests the data in the transformed domain (images), not in the original domain (3-D space). Errors due to such transformations are expected (which is why it is classified as a I+ test); hence the lower rank of the IR test.
D. Silhouette-Contour Signature (SCS) Test Methodology
In the IR test, errors in color re-projection can affect the accuracy of the measure. In addition, the colors in the 3-D model under test may not be exactly the same as the colors in the original images. This variation is due to the color processing during the reconstruction process or to the use of different sensors for the reconstruction. Therefore, in this section we propose a testing methodology that employs image silhouettes and their corresponding contours. This lets us add distinct views, captured from viewpoints other than the input views. These views can be generated synthetically using the 3-D ground truth data. Shape features can be extracted from the re-projected silhouettes and their contours, then compared to the corresponding features from the input silhouettes, in addition to the distinct (cutting) views.
FIGURE 26 – The IR test applied to a reconstruction of 12 input images. (a) The SNR values, and (b) the IQM values at each view.
FIGURE 27 – IR test visual results. (a) An input image, (b) a re-projected image at the same view as in (a), and (c) the difference image between (a) and (b), where darker pixels mean lower error and brighter pixels mean larger error.
1. Shape Histogram Signature
Using silhouette images, histograms for image rows and columns can be generated.
Since the silhouette image is a binary image, the shape pixels can be counted horizontally
and vertically, then row and column histograms can be generated respectively. For the
image silhouette Is defined as

\[ I_s(k_1, k_2) = \begin{cases} L_1, & \text{if } (k_1, k_2) \text{ is a silhouette point;} \\ L_2, & \text{otherwise,} \end{cases} \tag{64} \]

the row histogram Hr and the column histogram Hc can be computed as

\[ H_r(k_1) = \sum_{k_2=1}^{N_w} L_2 \oplus I_s(k_1, k_2) \tag{65} \]

and

\[ H_c(k_2) = \sum_{k_1=1}^{N_h} L_2 \oplus I_s(k_1, k_2) \tag{66} \]

respectively, where ⊕ denotes the XOR operation. To measure the similarity between the shape histograms of the projected silhouettes, H_r^p and H_c^p, and those of the input silhouettes, H_r^in and H_c^in, we use the χ² measure in Equation (56). We can then define the dimension similarity measure DIM as

\[ \mathrm{DIM} = \max\left( \chi^2(H_r^p, H_r^{in}),\; \chi^2(H_c^p, H_c^{in}) \right) \tag{67} \]
Our interpretation of the shape histograms is that:

• the row histogram computes the effective width of the shape at each row in the silhouette image

• the column histogram computes the effective height of the shape at each column in the silhouette image
hence, the name of the dimension-similarity measure.
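A sketch of the shape histograms of Equations (65)–(66) and the DIM measure of Equation (67), assuming boolean silhouette images (True marking a silhouette point) and using a local transcription of the χ² measure of Equation (56); all names are illustrative:

```python
import numpy as np

def shape_histograms(sil):
    """Row and column histograms of Eqs. (65)-(66) for a boolean
    silhouette image (True = silhouette point)."""
    sil = np.asarray(sil, dtype=bool)
    return sil.sum(axis=1), sil.sum(axis=0)   # H_r (per row), H_c (per column)

def _chi2(h1, h2):
    """Eq. (56) applied to a pair of shape histograms."""
    h1, h2 = np.asarray(h1, float), np.asarray(h2, float)
    c1 = np.sqrt(h2.sum() / h1.sum())
    den = h1 + h2
    keep = den > 0                            # skip bins empty in both
    return (((c1 * h1 - h2 / c1) ** 2)[keep] / den[keep]).sum()

def dim_measure(sil_proj, sil_in):
    """Dimension-similarity measure DIM of Eq. (67)."""
    hr_p, hc_p = shape_histograms(sil_proj)
    hr_i, hc_i = shape_histograms(sil_in)
    return max(_chi2(hr_p, hr_i), _chi2(hc_p, hc_i))
```

A silhouette compared against itself gives DIM = 0, the perfect-match value.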
2. The Error Ratio
The error ratio can be used to obtain a general description of the error when the projected silhouette Is is compared to the input silhouette Iin:

\[ E_r = \frac{\sum \sum I_s \oplus I_{in}}{\sum \sum L_2 \oplus I_{in}} \tag{68} \]

E_r computes the ratio of false matches between the input and projected silhouettes to the total number of silhouette points in the input silhouette.
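For boolean silhouettes, Equation (68) reduces to a single XOR count; a minimal sketch with illustrative names:

```python
import numpy as np

def error_ratio(sil_proj, sil_in):
    """Eq. (68): false matches between the projected and input silhouettes,
    normalized by the number of input silhouette points."""
    sil_proj = np.asarray(sil_proj, dtype=bool)
    sil_in = np.asarray(sil_in, dtype=bool)
    return np.logical_xor(sil_proj, sil_in).sum() / sil_in.sum()
```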
3. Boundary Signature
Contours can be used to generate boundary signatures. The contour describes the geometric features of the object. Many signatures for object contours have been proposed for the purposes of object recognition and registration [53, 54]. However, most of these signatures are designed to be invariant to certain transformation parameters to fit the requirements of those applications. For evaluation purposes, the invariance condition can be relaxed, since the ground truth images and the re-projected images are already registered. The shape signature that we propose depends on detecting certain shape configurations in the ground truth images. Similar configurations should be detected in the re-projected images; otherwise, a non-zero error is detected.
For each three adjacent points p1 = [x1, y1]^T, p2 = [x2, y2]^T, and p3 = [x3, y3]^T on an image contour, as shown in Figure 28, where p2 is the middle point, two angles can be computed to determine an almost unique configuration:

\[ \Psi = \cos^{-1} \left( \frac{\overrightarrow{p_2 p_1} \cdot \overrightarrow{p_2 p_3}}{\| \overrightarrow{p_2 p_1} \| \cdot \| \overrightarrow{p_2 p_3} \|} \right) \tag{69} \]

where \( \overrightarrow{p_2 p_1} \) and \( \overrightarrow{p_2 p_3} \) are two vectors, and

\[ \Phi = \tan^{-1} \left( \frac{y_3 - y_1}{x_3 - x_1} \right) \tag{70} \]
FIGURE 28 – An example of the 17 possible configurations describing the shape boundaries.
Combining these angles, 17 different configurations can be detected at image contours. These configurations are listed in Figure 29. By detecting and counting these shape configurations in a ground truth contour image and the corresponding re-projected contour image, two histograms H_a^in and H_a^p can be generated. Then the matching measure χ²(H_a^p, H_a^in) is computed.
Combining the shape histogram and the boundary signature, a χ²_s measure can be computed as

\[ \chi_s^2 = \omega\, \chi^2(H_a^p, H_a^{in}) + (1 - \omega)\, \mathrm{DIM} \tag{71} \]

where ω, 0 < ω < 1, is a controlling parameter. The dynamic range of χ²_s is [0, 1], where 0 represents the perfect match value.
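The angle pair of Equations (69)–(70) can be sketched as follows; note that atan2 replaces the plain arctangent to resolve the quadrant, which is an implementation choice rather than part of the original definition, and the function name is illustrative:

```python
import numpy as np

def configuration_angles(p1, p2, p3):
    """Eqs. (69)-(70): the angle pair (Psi, Phi) for three adjacent
    contour points, with p2 the middle point."""
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    v1, v2 = p1 - p2, p3 - p2                       # the vectors p2p1 and p2p3
    cos_psi = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    psi = np.arccos(np.clip(cos_psi, -1.0, 1.0))    # clip guards rounding error
    phi = np.arctan2(p3[1] - p1[1], p3[0] - p1[0])  # atan2 resolves the quadrant
    return psi, phi
```

Quantizing the resulting (Ψ, Φ) pairs to the configurations of Figure 29 and counting them yields the angle histogram H_a used above.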
Figure 30 shows different shapes in both silhouette and contour forms. The column, row, and angle histograms for the rectangular shape and its rotated version are shown in Figures 31a, 31b, and 31c, respectively. The column and row histograms can be interpreted as the effective height and width of the shape, respectively. This gives these shape descriptors the advantage of tracking changes in the size of the tested shapes, reflecting changes in the tested 3-D reconstruction. Changing the orientation of the rectangle is also detected by the angle histogram, as shown in Figure 31c.
The histograms for the circular and elliptic shapes are shown in Figure 32. Useful information about the circle diameter and the major and minor axes of the ellipse can be extracted from the column and row histograms in Figures 32a and 32b. Also, note the differences between the angle histograms for the circle and the ellipse in Figure 32c. The greater flatness of the elliptic shape is reflected by the increased count of configuration #15 (horizontal line) compared to the circular shape.

(1) Ψ = π/4, Φ = −π/2; (2) Ψ = π/4, Φ = 0; (3) Ψ = π/4, Φ = π/2;
(4) Ψ = π/2, Φ = −π/2; (5) Ψ = π/2, Φ = −π/4; (6) Ψ = π/2, Φ = 0;
(7) Ψ = π/2, Φ = π/4; (8) Ψ = π/2, Φ = π/2; (9) Ψ = 3π/4, Φ ≈ 0.353π;
(10) Ψ = 3π/4, Φ ≈ 0.15π; (11) Ψ = 3π/4, Φ ≈ −0.15π; (12) Ψ = 3π/4, Φ ≈ −0.353π;
(13) Ψ = π, Φ = π/2; (14) Ψ = π, Φ = π/4; (15) Ψ = π, Φ = 0;
(16) Ψ = π, Φ = −π/4; (17) Ψ = π, Φ = −π/2

FIGURE 29 – The 17 possible configurations for three adjacent contour points.
Examples of re-projected contours from a real reconstruction are shown in Fig-
ure 33. The re-projected contours (red) are compared to the ground truth contours (blue)
using the shape histograms. The similarity of the re-projected contour and the ground truth
contour in Figure 33a is reflected by the similarities of the shape histograms in Figure 34.
In addition, the dissimilarity of the re-projected contour and the ground truth contour in
Figure 33b is reflected by the dissimilarities of the shape histograms in Figure 35.
In general, the presented shape histograms can provide signatures for different shapes and detect changes in the boundaries of these shapes. Although the contours provide useful information about the shape geometry, small variations in these contours due to preprocessing techniques, such as edge detection, can affect the angle histogram. However, this may not affect measuring the similarity of the ground truth and re-projected shapes if the same edge detection operator is applied to both shapes.
E. Summary
In this chapter, three testing methodologies are presented. The first test is the Local
Quality Assessment (LQA) test. This test quantifies the performance of a given 3-D recon-
struction with respect to a reference 3-D reconstruction provided by the 3-D laser scanner.
It is designed to investigate local errors in the given 3-D reconstruction by decimating it
into different patches and measuring the quality of each patch. This makes the error analy-
sis much easier and permits the fusion of different 3-D reconstruction techniques based on
the results of this test.
An Image Re-projection (IR) testing methodology is presented to cope with the un-
availability of the 3-D ground truth data. The test uses the acquired images as the reference
of comparison with the corresponding images, re-projected from the given 3-D reconstruc-
tion. This test also measures the applicability of the 3-D reconstruction techniques for
virtual reality problems.
To avoid errors due to color variations and the re-projection process in the IR test, we propose a Silhouette-Contour Signature (SCS) methodology. The test extracts shape features from the silhouette and contour images and permits the inclusion of distinct (cutting) views from the 3-D ground truth data.
A classification criterion for testing methodologies is also presented. Based on this criterion, we can classify the tests that measure the performance of 3-D reconstruction techniques into 24 types. This classification will eventually help in obtaining a standard ranking that reflects the validity of such tests.
FIGURE 30 – Examples of basic geometric shapes: rectangle, rotated rectangle, circle, and ellipse.
FIGURE 31 – Shape signatures for the rectangle and the rotated rectangle shapes in Figure 30. (a) The column histogram, (b) the row histogram, and (c) the boundary histogram.
FIGURE 32 – Shape signatures for the circle and the ellipse shapes in Figure 30. (a) The column histogram, (b) the row histogram, and (c) the boundary histogram.
FIGURE 33 – Examples of ground truth (blue) and measured (red) shapes. (a) Almost similar shapes, and (b) partially similar shapes.
FIGURE 34 – Shape signatures for the shapes in Figure 33a. (a) The column histogram, (b) the row histogram, and (c) the boundary histogram.
FIGURE 35 – Shape signatures for the shapes in Figure 33b. (a) The column histogram, (b) the row histogram, and (c) the boundary histogram.
CHAPTER V
Experimental Evaluation of the Space Carving Technique: A Case Study
Space carving is a common technique for 3-D reconstruction from a sequence of images. The key advantage of space carving is that it relaxes many of the constraints of commonly used stereo techniques and effectively solves the occlusion problem, a major problem in stereo vision. In this chapter, we provide an experimental evaluation of the space carving technique. Based on the framework presented in this work, the effects of key parameters of space carving on its performance are examined. In addition, evaluation remarks are presented to draw conclusions about the performance of the space carving technique.
A. Shape Recovery by Space Carving
The space carving technique exploits the fact that points on Lambertian surfaces are
color-consistent, i.e. they have the same color in all images that can see them. The method
starts with an arbitrary number of calibrated images of a scene and an initial volume of ar-
bitrary resolution such that the volume encloses the captured scene. Each volume element
(voxel) in the initial volume is projected to the set of images from which it is visible. If
the voxel is projected onto inconsistent colors in the images then it is carved, otherwise it
is retained and assigned a color. The algorithm stops when all examined voxels pass the
photo-consistency check, i.e. when there are no more voxels to carve.
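The carving loop described above can be caricatured in a few lines. This sketch deliberately omits the visibility reasoning that is central to the full algorithm, performs a single sweep, and uses a simple standard-deviation threshold tau as the photo-consistency check; all names are illustrative:

```python
import numpy as np

def carve(occupied, voxel_centers, cameras, images, tau=10.0):
    """One photo-consistency sweep over the remaining voxels.

    occupied      : 1-D boolean occupancy array, one entry per voxel
    voxel_centers : (V, 3) array of voxel center coordinates
    cameras       : list of 3x4 projection matrices (calibrated cameras)
    images        : list of grayscale images, one per camera
    tau           : photo-consistency threshold on the standard deviation
                    of the sampled colors (an arbitrarily tuned parameter)
    """
    for v in np.flatnonzero(occupied):
        X = np.append(voxel_centers[v], 1.0)      # homogeneous coordinates
        samples = []
        for P, img in zip(cameras, images):       # visibility reasoning omitted
            x = P @ X
            col = int(round(x[0] / x[2]))
            row = int(round(x[1] / x[2]))
            if 0 <= row < img.shape[0] and 0 <= col < img.shape[1]:
                samples.append(img[row, col])
        # Carve the voxel if its projected colors disagree too much.
        if len(samples) >= 2 and np.std(samples) > tau:
            occupied[v] = False
    return occupied
```

In the full algorithm, such sweeps are repeated, with visibility updated after each carving, until no voxel fails the consistency check.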
The performance evaluation of the space carving algorithm (or its variants) is mostly treated in the literature in a qualitative sense [13, 49], or using synthetic data [50]. A brief quantitative evaluation using real data is presented in [51]. Based on our proposed framework for performance evaluation, we introduce an extensive evaluation of the space carving algorithm. We study the effects of different key parameters of the space carving technique. We emphasize in this study the arbitrarily tuned parameters and the ideal assumptions used in the space carving algorithm.
Specifically, we study the effects of the arbitrarily selected number of input images,
the camera pose and the initial volume resolution on the performance of space carving. The
effects of the ideal assumption of Lambertian surfaces and the noise level on the validity of
the photo-consistency check are examined as well. The testing methodologies presented in
the previous chapter will be applied to the space carving technique throughout this study.
B. Experimental Evaluation of Space Carving
In this section we apply the proposed testing methodologies to the space carving
approach as a case study. We study the effects of the following factors on the performance of the space carving approach:
• the number of input images
• the distribution of cameras (or camera pose)
• the effect of selecting the photo-consistency-check threshold
• the effect of noise
• the effect of the resolution of the initial volume
Meanwhile, we observe how the proposed methodologies converge to the same conclusion about the performance of the space carving approach.
1. The Effect of the Number of Input Images
We study the effect of the number of input images on the space carving performance using four sets of images: a superset of 36 images and three subsets with 18, 12, and 9 input images. An initial volume of 241×241×241 voxels, each of dimensions 1.25×1.25×1.25 mm³, is used throughout this experiment. The space carving is applied to
each set of images and the output is examined using three types of tests: Local Quality
Assessment (LQA), Image Re-projection (IR), and Silhouette-Contour Signature (SCS) as
follows.
a. LQA Test The RTS registration technique is used to align the output of space carving to the ground truth data using four input silhouettes; the LQA test is then applied to the registered data sets. The values of the quality index, Q, for Nm = 216, are calculated for the reconstructions generated from 36, 18, 12, and 9 inputs. Histograms of the Q values for each reconstruction are shown in Figure 36a. As shown in the figure, there is a noticeable difference between the 9-reconstruction and all the other reconstructions in this example. To get a quantitative measure, we find the probability estimate Pq(Q ≥ q) of the given reconstructions based on samples of their Q values. The quality estimates of the 36-,
18-, 12-, and 9-reconstructions are plotted in Figure 36b. To get a specific and standard measure, we use q = 0.9 and hence P0.9(Q ≥ 0.9) to indicate the level of quality of a given reconstruction. For example, the 36-reconstruction achieves P0.9(Q ≥ 0.9) = 0.59, which indicates that almost 60% of the surface patches have a quality index equal to or greater than 0.9 in a probabilistic sense. The P0.9 values for the other reconstructions are summarized in Table 2. These values indicate that the 9-reconstruction has lower quality than the other reconstructions, which scored close values.
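The probability estimate Pq(Q ≥ q) used above can be computed directly from the sampled Q values; a minimal sketch, where the sample values are illustrative rather than taken from the experiment:

```python
import numpy as np

def quality_estimate(q_values, q):
    """Empirical estimate of P_q(Q >= q): the fraction of surface
    patches whose quality index is at least q."""
    q_values = np.asarray(q_values, dtype=float)
    return float((q_values >= q).mean())

# Illustrative sample of per-patch quality indices.
Q = [0.95, 0.88, 0.91, 0.97, 0.30, 0.92]
print(quality_estimate(Q, 0.9))  # fraction of patches with Q >= 0.9
```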
The reason for the degradation of the 9-reconstruction can be inferred from the final number of voxels in each of the competing reconstructions. Table 3 shows that the 9-reconstruction has about 6,000 more voxels than the 36-reconstruction. This indicates that the
TABLE 2
THE EFFECT OF THE NUMBER OF INPUT IMAGES ON THE PERFORMANCE OF SPACE CARVING.
No. of input images 36 18 12 9
P0.9(Q ≥ 0.9) 0.5927 0.5583 0.5426 0.4189
μsnr 9.1243 9.0617 9.0634 7.3907
μχ2 0.0141 0.0171 0.0178 0.0289
μEr 0.0854 0.0997 0.1063 0.1815
9-reconstruction has a larger size, i.e., it experiences a fattening problem. Further interpretation of this problem can be drawn from Figure 37.
The quality index values for each patch in both the 36- and 9-reconstructions are shown in Figure 37. The 9-reconstruction scored many zero-valued quality indices where the 36-reconstruction scored values greater than zero at the same patches. This means that these 9-reconstruction patches are in complete mismatch with the corresponding ground truth patches, which leads to the conclusion that at these zero-valued patches either the ground truth reconstruction provides an empty set while the 9-reconstruction does not, or vice versa. Since the 36-reconstruction matches the ground truth data well at these patches and is smaller than the 9-reconstruction, the 9-reconstruction must extend beyond the ground truth reconstruction; hence it experiences a fattening problem. This fattening effect can be seen visually in the results of the IR and SCS tests in the following sections.
b. IR Test The IR test is applied to the same reconstructions. Thirty-six re-projected images are computed and compared to the original images to find the values of the SNR measure. Figure 38 shows the SNR values at each view for the different numbers of input images. As shown in this figure, the SNR values are almost the same for the 36-, 18-, and 12-image cases but lower in the 9-image case. Table 2 shows the mean, μsnr, of the SNR values in each case. The value of μsnr for the 9-reconstruction has the lowest
FIGURE 36 – LQA test results when the number of input images to the space carving is changed. (a) The histograms of the quality index for different numbers of input images and (b) the quality estimate.
FIGURE 37 – The quality index for two different reconstructions: one using 36 input images and one using 9 input images.
TABLE 3
THE FINAL NUMBER OF VOXELS AND THE RUN TIME, ON AN ONYX2 SGI MACHINE, FOR THE 36-, 18-, 12-, AND 9-RECONSTRUCTIONS.
No. of input images 36 18 12 9
Final No. of Voxels 81535 83233 83076 87591
Run time (minutes) 126.8 55.7 39.7 26.8
value among the other reconstructions.
Figures 38b and 38c show two re-projected images at the same view for the 9- and 12-reconstructions, respectively. These re-projected images are subtracted from the original images, yielding absolute-error images in which darker pixels indicate lower error values. The difference images are shown in Figures 38d and 38e for the 9- and 12-reconstructions, respectively. As shown in Figure 38d, the 9-reconstruction has a larger size than the 12-reconstruction. This demonstrates the consistency between the visual assessment and the quantitative results provided by the LQA and IR tests.
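The SNR measure of the IR test can be sketched as a decibel ratio of signal power to re-projection error power between an original image and its re-projection; the exact definition used in the framework may differ, so this form is an assumption:

```python
import numpy as np

def snr_db(original, reprojected):
    """SNR (in dB) of a re-projected image against the original:
    signal power over the power of the re-projection error."""
    original = np.asarray(original, dtype=float)
    error = original - np.asarray(reprojected, dtype=float)
    return 10.0 * np.log10(original.var() / error.var())

def difference_image(original, reprojected):
    """Absolute-error image: darker pixels mean lower error."""
    return np.abs(np.asarray(original, float) - np.asarray(reprojected, float))
```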
c. SCS Test Since the SCS test depends only on the silhouette images and their corresponding contours, not on intensity images, a cutting view can be generated from the registered ground truth data given by the 3-D laser scanner. The cutting view is supposed to provide distinct information about the object under concern. A top or bottom view of the object is considered distinct since it is projected onto a plane perpendicular to the input images. Figures 39a, b, c, and d show cutting contour images projected from the 36-, 18-, 12-, and 9-reconstructions, respectively. The blue contours in these images represent the ground truth contour, while the red contours represent the contours under test. Note how clear the fattening effect is in the 9-reconstruction case.
The χ2 and Er values at each view for each reconstruction under test are shown in Figures 40a and 40b, respectively. Both measures indicate higher values for the 9-reconstruction than for the other reconstructions. The mean values, μχ2 and μEr, of these measures are summarized in Table 2. Note the higher values of μχ2 and μEr in the 9-reconstruction case, which indicate a lower quality reconstruction.
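The χ2 comparison of contour signatures can be illustrated with the standard chi-square histogram distance. The construction of the actual angle-based signatures from the previous chapter is not reproduced here, so the input signatures below are stand-ins:

```python
import numpy as np

def chi_square(sig_ref, sig_test, eps=1e-12):
    """Chi-square distance between two contour signatures;
    0 means the signatures are identical."""
    a = np.asarray(sig_ref, dtype=float)
    b = np.asarray(sig_test, dtype=float)
    denom = a + b
    mask = denom > eps            # skip empty bins to avoid division by zero
    return 0.5 * np.sum((a[mask] - b[mask]) ** 2 / denom[mask])
```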
d. Evaluation Remarks The house-object used in the current experiment has large homogeneous (same-intensity) areas. Thus, the 9 images used in this experiment are not enough to guide the space carving to carve more inconsistent voxels; hence the larger-size reconstruction. In other words, voxels reconstructed at incorrect depths appear consistent since they project to pixels of the same intensity due to the large homogeneous areas of the object.
On the other hand, using 36 images did not enhance the reconstruction much beyond the 12- and 18-reconstructions. This means that adding more input images cannot enhance the reconstruction if they do not impose additional constraints on the shape under reconstruction. In this case, redundant input images should be detected and removed to permit faster reconstruction. As shown in Table 3, the 36-reconstruction needs almost three times the run time of the 12-reconstruction, even though the enhancement in the output reconstruction is marginal. The following can be concluded about the effect of the number of input images on the performance of space carving:
• A smaller number of input images can result in fatter reconstructions, since fewer images provide fewer constraints on the shape and the algorithm stops carving early. This effect is maximized when the object under reconstruction has large homogeneous areas.
• A larger number of input images can help the space carving provide a good approximation of the shape only if these images provide enough constraints on the shape. Otherwise, some of these images are redundant.
2. Effect of the Camera Pose
As concluded from the above discussion, for the house-object, 9 images may not be enough for a high-quality reconstruction, because the object has large homogeneous areas that require more input images to help the space carving constrain the shape of the object. However, we can enhance the 9-reconstruction slightly by redistributing the camera positions.
a. LQA Test Since we have a superset of 36 images we can divide them into 4
sets: S1, S2, S3, and S4 where each set has 9 images, then apply the space carving technique
to each set. The histograms of the Q values for the four sets and the quality estimates are shown in Figures 41a and 41b, respectively. The histograms show that set S3 gives the best result among the 9-reconstruction sets, and that S2 and S4 score better than S1. Quantitative results using the probability estimate P0.9 for each 9-reconstruction set are shown in Table 4.
Noting that set S1 was used in the comparison with the 12-, 18-, and 36-reconstructions in the previous experiment, better reconstructions can be reached if the views used in the reconstruction are changed, even when the number of views is kept unchanged. The visual results displayed with the IR and SCS tests help in understanding this effect.
b. IR Test The SNR values are computed for each view, whether or not it is used in the reconstruction under test. Figure 42a shows the SNR values for each 9-reconstruction set. These results show some variation in the SNR values of the reconstructions under test at each view: some reconstructions have better SNR values at certain views but worse at others. This means that the output reconstruction is view-dependent, i.e., based on the views used in the reconstruction, some parts of the object are reconstructed better than others. The parts that are well reconstructed from one set of views may not be well reconstructed if other views are used. However, on average we can select the best reconstruction among all of them.
Table 4 shows that the S3-reconstruction has the best μsnr among the reconstructions in this experiment. Figures 42b and 42c show re-projected images at one view for the S4- and S3-reconstructions, respectively. The corresponding difference images are shown in Figures 42d and 42e. Compared with Figure 38d for the S1-reconstruction, the fattening effect has been slightly reduced for the S4 set and nearly
FIGURE 38 – IR test results when the number of input images to the space carving is changed. (a) The SNR measure values at different views for different reconstructions, (b) a rendered view of the 9-image reconstruction, (c) a rendered view of the 12-image reconstruction, (d) a difference image between the rendered view in (b) and the original image at the same view, and (e) a difference image between the rendered view in (c) and the original image at the same view.
FIGURE 39 – Rendered cutting-views (red) for different reconstructions when the number of input images to the space carving is changed. The reference cutting view is shown in blue. (a) 36-image reconstruction, (b) 18-image reconstruction, (c) 12-image reconstruction, and (d) 9-image reconstruction.
FIGURE 40 – SCS test results when the number of input images to the space carving is changed. (a) The χ2 test measure and (b) the error ratio measure.
TABLE 4
THE EFFECT OF THE CAMERA POSE ON THE PERFORMANCE OF SPACE CARVING.
9-reconstruction Set S1 S2 S3 S4
P0.9(Q ≥ 0.9) 0.4189 0.4316 0.4908 0.4300
μsnr 7.3907 7.9933 8.4108 7.4498
μχ2 0.0289 0.0257 0.0206 0.0273
μEr 0.1815 0.1484 0.1281 0.1704
reduced for the S3 set. The cutting view, from the top, can provide a clue about the size changes of the 9-image reconstructions used in this experiment. This is shown among the results provided by the SCS test in the next section.
c. SCS Test Using the cutting view in addition to the 36 input views and computing the χ2 and Er measures at each view, we reach the same conclusion about the quality of the 9-reconstruction sets as given by the previous tests. The cutting view for each reconstruction is plotted with the reference cutting view, as shown in Figure 43. Note how the fattening effect appears strongly in some parts of the object and is reduced in the same parts when the input views are changed.
Figures 44a and 44b show that the χ2 and Er values are higher at the cutting view. That is because the cutting view reveals the fattening that appears in all the side views. The overall assessment by the χ2 and Er measures is shown in Table 4. From the results in the table, the S3-reconstruction scored the best values among the 9-reconstructions. This also shows the consistency of the applied tests in judging the quality of the given reconstructions.
d. Evaluation Remarks One of the advantages of the space carving algorithm is that it permits arbitrary camera positions. However, different arbitrary views can lead to reconstructions of different quality. Here we conclude that:
• Different camera distributions provide different reconstructions. So, "arbitrary" should not be taken as absolute when selecting the camera positions for the space carving technique; some camera positions provide good reconstructions while others may not.
• The geometry and shape features should be considered when selecting the camera poses for the space carving technique. Assigning more cameras to featureless areas, and cameras that capture the geometric features of the shape, can help obtain better reconstructions.
3. Effect of the Photo-consistency Threshold
The standard space carving technique uses a global threshold to determine the photo-consistency of pixels. The variance is computed over the pixels that are candidate projections of a given voxel, and the voxel is carved when the variance is greater than a global threshold (Th). Thus, the selection of this threshold affects the overall performance of the space carving technique. The smaller the value of Th, the more voxels the space carving discards from the output reconstruction; the larger the value of Th, the more voxels it retains, which may incorrectly increase the size of the output reconstruction. The three types of tests are applied to the space carving at different threshold values to show the threshold's effect on the performance of space carving.
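The trade-off controlled by Th can be illustrated on a toy set of per-voxel color variances (the values are made up): a small threshold carves aggressively, while a large one retains nearly everything.

```python
import numpy as np

# Hypothetical per-voxel variances of the projected pixel colors.
variances = np.array([5.0, 25.0, 35.0, 45.0, 60.0, 120.0, 400.0])

def retained_count(th):
    """Number of voxels kept by the photo-consistency check at threshold th."""
    return int((variances <= th).sum())

for th in (30, 40, 50, 100):
    # A larger Th keeps more voxels (risking fattening);
    # a smaller Th carves more (risking over-carving).
    print(th, retained_count(th))
```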
a. LQA Test Space carving is applied to a set of 36 images with threshold values varied from 30 to 100. The histograms of the quality index values for selected reconstructions are shown in Figure 45a, which shows that Th=40 is a critical value; this is also visible in Figure 45b. The 40-reconstruction shows the best quality among the others. The 30-reconstruction is the worst because the space carving wrongly classified some voxels as photo-inconsistent and carved them. For reconstructions with thresholds
FIGURE 41 – LQA test results when the camera pose is changed. (a) The histograms of the quality index for the sets of 9-image reconstructions and (b) the quality estimate.
FIGURE 42 – IR test results when the camera pose is changed. (a) The SNR measure values for the different 9-image reconstructions. Re-projected images of a 3-D reconstruction by space carving given 9 input images of: (b) set S4 and (c) set S3. Difference images between the re-projections of the 3-D reconstruction and the input images at the same view, given: (d) set S4 and (e) set S3. The fattening effect is reduced as shown in (e).
FIGURE 43 – Rendered cutting-views (red) for different reconstructions when the camera pose is changed, using sets: (a) S1, (b) S2, (c) S3, and (d) S4.
FIGURE 44 – SCS test results when the camera pose is changed. (a) The χ2 test measure and (b) the error ratio measure.
more than 40, the space carving wrongly added inconsistent voxels to the output reconstruction, which is why the degradation in quality increases as the threshold value increases. The values of the P0.9 measure for selected thresholds are shown in Table 5. They indicate that the optimal threshold value lies between Th=30 and Th=50 and is expected to be closer to 40. Visual results are also provided in the next sections to show the threshold effect.
b. IR Test The SNR values are computed for each reconstruction, as shown in Figure 46a. The best quality is registered for the 40-reconstruction, which is also shown by the values of μsnr in Table 5. Difference images are shown in Figures 46b, c, d, and e for the 30-, 40-, 50-, and 100-reconstructions, respectively. The over-carving is clearly shown in Figure 46b for the 30-reconstruction, while the under-carving is shown in Figures 46d and e for the 50- and 100-reconstructions, respectively. The 40-reconstruction is over-carved in some areas, so a small increase of the threshold value above 40 may fix the reconstruction in these areas, but with the possibility of under-carving other areas. This means that choosing the optimal value of the threshold is a tricky task.
c. SCS Test The cutting-views of the reconstructions under test are shown in Figure 47. The 30-reconstruction is strongly damaged, while the 50- and 100-reconstructions are over-sized.
Figures 48a and 48b show the values of χ2 and Er at each view. The value of the χ2 measure at the cutting view, view No. 37, is higher for the 30-reconstruction, while Er is lower at the same view. This is because the angle signature dominates the value of the χ2 measure, since there are many curvature dissimilarities with the ground truth at this view, whereas the Er measure counts the erroneous pixels regardless of curvature. For the same reason, the μχ2 value in Table 5 indicates better quality for the 50-reconstruction than for the 40-reconstruction: the over-carving in the 40-reconstruction is penalized more by the χ2 measure than by the Er measure because of its curvature dissimilarities.
d. Evaluation Remarks The space carving algorithm is based on the Lambertian assumption. However, this assumption is ideal, i.e., it is rarely valid in practice. To cope with this, a threshold must be set to manage the carving process. However, the choice of an optimal value for this threshold is tricky, and an inaccurate choice may lead to an over-carved or under-carved output reconstruction. Treating the photo-consistency threshold under a probabilistic framework can reduce its effect on the performance of the space carving, as shown in [51].
4. Effect of Noise
Zero-mean Gaussian noise with a standard deviation of 1% of the intensity value at each pixel is added to the input images. Again, the threshold value is varied from 30 to 100.
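The noise model used here, zero-mean Gaussian noise whose standard deviation at each pixel is a fixed fraction (1%) of that pixel's intensity, can be sketched as:

```python
import numpy as np

def add_intensity_noise(image, fraction=0.01, rng=None):
    """Add zero-mean Gaussian noise whose per-pixel standard deviation
    equals `fraction` of that pixel's intensity."""
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=float)
    sigma = fraction * image                       # per-pixel noise level
    return image + rng.normal(0.0, 1.0, image.shape) * sigma
```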
In this experiment we track the combined effect of noise and threshold on the performance of the space carving. The histograms in Figure 49a and the quality estimates plotted in Figure 49b show that the 30-reconstruction is strongly affected. The degradation in quality occurs because the noise turned some voxels from consistent to inconsistent. This effect is also shown in Figure 50a and Figures 52a and b, with visual results in Figure 50b and Figure 51a. On the other hand, the noise slightly enhanced the 40- and 50-reconstructions, as shown by the measures in Table 6 compared to Table 5. This is a logical result of adding noise, since noise changes the color distribution of the images and can therefore change the status of a voxel from photo-consistent to photo-inconsistent and vice versa.
Evaluation Remark Noise affects the performance of the space carving technique: it can change the status of a voxel from inconsistent to consistent and vice versa. The threshold value can be adjusted to cope with noise, but with the risk of accepting inconsistent voxels when higher thresholds are used. On the other
TABLE 5
THE EFFECT OF THE PHOTO-CONSISTENCY THRESHOLD ON THE PERFORMANCE OF SPACE CARVING.
Threshold (Th) 30 40 50 100
P0.9(Q ≥ 0.9) 0.4318 0.6177 0.5963 0.4838
μsnr 4.3757 10.6051 10.0889 7.8691
μχ2 0.0678 0.0138 0.0122 0.0185
μEr 0.2973 0.0663 0.0767 0.1250
TABLE 6
THE EFFECT OF NOISE AT DIFFERENT THRESHOLDS ON THE PERFORMANCE OF SPACE CARVING.
Threshold (Th) 30 40 50 100
P0.9(Q ≥ 0.9) 0.3770 0.6413 0.6097 0.4848
μsnr -0.8645 10.6290 10.2561 7.8251
μχ2 0.1754 0.0135 0.0120 0.0188
μEr 0.6198 0.0615 0.0706 0.1235
hand, noise could even be helpful if it were to "Lambertianize" the input images. Enhancing the histograms of the input images can lead to a better photo-consistency check regardless of the validity of the Lambertian assumption. In other words, an image processing step can be applied to the input images to enforce the Lambertian assumption.
5. Effect of the Initial Volume Resolution
Three different initial volume resolutions are examined in this experiment, with cubic voxel dimensions δ = 1.25 mm (the highest resolution), δ = 2.00 mm, and δ = 2.50 mm. The LQA test is applied to the output reconstruction at each resolution. The quality estimate values for these reconstructions are shown in Figure 53a. As shown in the figure, the output reconstructions at the given resolutions achieve almost the same quality. This means that at these resolutions the geometric features of the 3-D shape are almost the same; hence the close quality estimates. Visually there are differences, as shown in Figures 53c, d, and e compared to Figure 53b, but this also confirms that the geometric features of the object are preserved. This leads to the conclusion that if the geometric features of an object are preserved at somewhat lower resolutions, then the resulting reconstruction can be of acceptable quality. Accepting low-resolution reconstructions saves much of the run time needed for higher-resolution reconstructions. The run times for reconstructions at different resolutions are shown in Table 7.
Evaluation Remark If the geometric features of an object are preserved at somewhat lower resolutions, then the resulting reconstruction can be of acceptable quality, with the gain of lower run time.
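The initial voxel counts in Table 7 follow from keeping the physical volume fixed (241 voxels of 1.25 mm per side, about 301.25 mm) and re-gridding it at each δ. Assuming the coarser grids round the number of voxels per side up, the counts reproduce exactly:

```python
import math

side_mm = 241 * 1.25  # physical side length of the initial volume (301.25 mm)

for delta in (1.25, 2.00, 2.50):
    n = math.ceil(side_mm / delta)  # voxels per side at voxel size delta
    print(delta, n, n ** 3)         # n**3 gives the initial voxel counts of Table 7
```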
TABLE 7
THE RUN TIME OF THE SPACE CARVING ALGORITHM AT DIFFERENT RESOLUTIONS AND DIFFERENT NUMBERS OF INPUT IMAGES.
Parameter δ=1.25 mm δ=2.00 mm δ=2.50 mm
initial No. of voxels 13997521 3442951 1771561
No. of Input Images =36
final No. of voxels 81535 32361 20063
Run time (minutes) 126.8 18.5 7.8
No. of Input Images =18
final No. of voxels 83233 33310 20459
Run time (minutes) 55.7 8.6 3.9
No. of Input Images =12
final No. of voxels 83076 33305 20606
Run time (minutes) 39.7 6.4 2.6
No. of Input Images =9
final No. of voxels 87591 35069 21519
Run time (minutes) 26.8 4.3 1.8
C. Summary
In this chapter, an experimental evaluation of the space carving technique is pre-
sented. The evaluation procedures used in this study are based on the presented perfor-
mance evaluation framework. In this study, we track the response of the space carving to
the changes in the key controlling parameters of the algorithm.
The number of input images to the space carving algorithm is a key parameter. This study has shown that a minimum number of input images should be supplied to the algorithm to achieve acceptable results; this number depends on the geometric features and textures of the object under reconstruction. A larger number of images may lead to better reconstructions only if the added images introduce constraints on the shape of the object. In addition, the distribution of the cameras that capture the scene should not be totally arbitrary, since different distributions can provide reconstructions of different quality.
The photo-consistency check threshold is another key parameter. The selection of this threshold is tricky, and an incorrect selection can lead to over- or under-carved reconstructions. "Lambertianizing" the input images could provide a way to avoid tuning such a tricky parameter.
The resolution of the initial volume, and hence of the output reconstruction, can be coarse as long as the geometric features of the output reconstruction are preserved. This permits fast reconstructions and hence applicability to real-time systems.
Similar studies can be applied to other 3-D reconstruction techniques to characterize their performance, since the presented framework is independent of the 3-D reconstruction technique under test.
FIGURE 45 – LQA test results when the photo-consistency threshold is changed. (a) The histograms of the quality index at different thresholds and (b) the quality estimate.
FIGURE 46 – IR test results when the photo-consistency threshold is changed. (a) The SNR measure values at different views for reconstructions at different thresholds. A rendered view of a reconstruction using: (b) Th=30, (c) Th=40, (d) Th=50, and (e) Th=100.
FIGURE 47 – Cutting-view images for different reconstructions at different thresholds. (a) Th=30, (b) Th=40, (c) Th=50, and (d) Th=100.
FIGURE 48 – SCS test results when the photo-consistency threshold is changed. (a) The χ2 test measure and (b) the error ratio measure.
FIGURE 49 – LQA test results when Gaussian noise is added and the photo-consistency threshold is changed. (a) The histograms of the quality index at different thresholds and (b) the quality estimate.
FIGURE 50 – IR test results when Gaussian noise is added and the photo-consistency threshold is changed. (a) The SNR measure values at different views for reconstructions at different thresholds. A rendered view of a reconstruction using: (b) Th=30, (c) Th=40, (d) Th=50, and (e) Th=100.
FIGURE 51 – Cutting-view images for different reconstructions at different thresholds when Gaussian noise is added to the input images. (a) Th=30, (b) Th=40, (c) Th=50, and (d) Th=100.
FIGURE 52 – SCS test results when Gaussian noise is added and the photo-consistency threshold is changed. (a) The χ2 test measure and (b) the error ratio measure.
FIGURE 53 – Effect of the initial volume resolution on the reconstruction quality. (a) The quality estimate at different resolutions, (b) one of the 36 input images, and re-projected images of the 3-D reconstruction by space carving, with voxels represented by their centers, at: (c) δ=1.25 mm, (d) δ=2.00 mm, and (e) δ=2.50 mm.
CHAPTER VI
APPLICATIONS (POST-EVALUATIONS)
Current 3-D laser scanners can provide good reconstructions. However, laser projection is not guaranteed on all surfaces, especially those that exhibit occlusion problems. In addition, the standard methods for extracting range data from optical triangulation scanners are accurate only for planar objects of uniform reflectance illuminated by an incoherent source. Using these methods, curved surfaces, discontinuous surfaces, and surfaces of varying reflectance cause systematic distortions of the range data. Coherent light sources such as lasers introduce speckle artifacts that further degrade the data [55].
In this chapter, we integrate the output reconstructions of a 3-D laser scanner and the space carving technique such that the fused reconstruction contains the best features of both. To achieve this goal, we employ the performance evaluation methodologies to investigate the local quality of each reconstruction, assuming full alignment of the two reconstructions.
Here, we introduce a simple fusion algorithm that uses the contours of the 3-D object under reconstruction to guide the fusion decision task. The fusion decision in the proposed technique is a challenging problem, since there is no reference 3-D reconstruction to guide the fusion process. The object contours extracted from the given images of the object are used to make the fusion decision; we call them the Ground Truth Contours (GTC). Similar contours of the object are also extracted from the 3-D reconstructions under fusion; we call them the Measured Contours (MC). The 3-D surface patches whose MC are closest to the corresponding GTC are selected for the final 3-D reconstruction [56].
System design is another application of the performance evaluation framework. A draft design for a 3-D scanner is presented, and specifications for the draft scanner can be computed after an evaluation phase of the scanner components. The evaluation remarks from the first evaluation phase are then used to redesign the scanner, and the evaluation-redesign cycle may be repeated to reach the final design.
A. A 3-D Fusion Methodology
Assume that there are two 3-D reconstructions Ω1 and Ω2 that are fully aligned and a
quality index per patch/voxel is assigned to each pair of patches of the two reconstructions.
Assume that one of the given reconstructions is derived from a set I of calibrated images
of cardinality N .
A sequence of pre-processing techniques, such as image segmentation, filtering, and edge detection, is applied to the set I to generate a set C of contour images defined as:
C = {c^l : c^l ⊂ C, l = 1, …, N},  ω^l ⊂ c^l    (72)
where ω^l is the contour at view l. Using the projection matrix at each view of the sets I and C, the two reconstructions Ω1 and Ω2 are projected to the same views, and the generated silhouette images are then processed to generate the contour image sets C1 and C2, where

C1 = {c^l_1 : c^l_1 ⊂ C1, l = 1, …, N},  ω^l_1 ⊂ c^l_1    (73)

and

C2 = {c^l_2 : c^l_2 ⊂ C2, l = 1, …, N},  ω^l_2 ⊂ c^l_2    (74)

respectively.
A synthetic set of images K is generated such that

K = \{ k^l : k^l \subset K,\ (\omega^l \cup \omega_1^l \cup \omega_2^l) \subset k^l,\ l = 1, \ldots, N \} \qquad (75)

and

k^l(x, y) = \begin{cases}
L_1, & \text{if } c^l(x, y) \in \omega^l; \\
L_2, & \text{if } c_1^l(x, y) \in \omega_1^l; \\
L_3, & \text{if } c_2^l(x, y) \in \omega_2^l; \\
L_4, & \text{otherwise},
\end{cases} \qquad (76)

where L_1, L_2, L_3, and L_4 are different gray levels, 1 \le x \le N_h, 1 \le y \le N_w, and N_h \times N_w
is the cardinality of k^l.
Each image k^l is uniformly divided into image windows W_j^l, where j = 0, \ldots, N_m - 1
and N_m is the number of windows W_j^l in image k^l. A number N_c < N_m of windows W_j^l
that contain the contour subsets \xi_\omega^j, \xi_{\omega_1}^j, and \xi_{\omega_2}^j, as shown in Figure 54a, are selected for the
closest point and the closest contour tests as follows.
1. The Closest Point Test

For each point p_w^i \in \xi_\omega^j, i = 1, \ldots, \mathrm{card}(\xi_\omega^j), where \mathrm{card}(\xi_\omega^j) is the cardinality of
\xi_\omega^j, the closest points pc_{w_1}^i and pc_{w_2}^i are calculated as:

pc_{w_1}^i = p_{w_1}^{r_1} \qquad (77)

such that

d(p_w^i, p_{w_1}^{r_1}) = \min_{h \in \{1, \ldots, \mathrm{card}(\xi_{\omega_1}^j)\}} d(p_w^i, p_{w_1}^h) \qquad (78)

and

pc_{w_2}^i = p_{w_2}^{r_2} \qquad (79)

such that

d(p_w^i, p_{w_2}^{r_2}) = \min_{h \in \{1, \ldots, \mathrm{card}(\xi_{\omega_2}^j)\}} d(p_w^i, p_{w_2}^h) \qquad (80)

where d denotes the Euclidean distance.
2. The Closest Contour Test

To determine which of \xi_{\omega_1}^j and \xi_{\omega_2}^j is closer to \xi_\omega^j, the average distances d_{av}(\xi_\omega^j, \xi_{\omega_1}^j)
and d_{av}(\xi_\omega^j, \xi_{\omega_2}^j) are calculated as:

d_{av}(\xi_\omega^j, \xi_{\omega_1}^j) = \frac{1}{\mathrm{card}(\xi_\omega^j)} \sum_{i=1}^{\mathrm{card}(\xi_\omega^j)} d(p_w^i, pc_{w_1}^i) \qquad (81)

and

d_{av}(\xi_\omega^j, \xi_{\omega_2}^j) = \frac{1}{\mathrm{card}(\xi_\omega^j)} \sum_{i=1}^{\mathrm{card}(\xi_\omega^j)} d(p_w^i, pc_{w_2}^i) \qquad (82)

then the closest contour segment is the one that has the minimum of d_{av}(\xi_\omega^j, \xi_{\omega_1}^j) and
d_{av}(\xi_\omega^j, \xi_{\omega_2}^j).
It is important to note that other methods that can extract information about the
shape of the contours, e.g., gradient-based methods, can be used to determine the closest contour
in addition to the closest Euclidean distance described above.
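The closest point test of Eqs. (77)–(80) and the closest contour test of Eqs. (81)–(82) can be sketched as follows. This is a minimal illustration that assumes contour segments are given as lists of 2-D points; all function names are our own.

```python
import math

def closest_point_distance(p, contour):
    """Closest point test, Eqs. (77)-(80): distance from a GTC point p
    to its closest point on a measured contour segment."""
    return min(math.dist(p, q) for q in contour)

def average_distance(gtc_seg, mc_seg):
    """d_av of Eqs. (81)-(82): mean closest-point distance from the
    GTC segment to a measured contour segment."""
    return sum(closest_point_distance(p, mc_seg) for p in gtc_seg) / len(gtc_seg)

def closest_contour(gtc_seg, mc1_seg, mc2_seg):
    """Closest contour test: pick the MC segment with the smaller
    average distance to the GTC segment (1 or 2)."""
    d1 = average_distance(gtc_seg, mc1_seg)
    d2 = average_distance(gtc_seg, mc2_seg)
    return 1 if d1 <= d2 else 2

# Toy segments: mc1 lies closer to the GTC than mc2.
gtc = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
mc1 = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]
mc2 = [(0.0, 2.0), (1.0, 2.0)]
winner = closest_contour(gtc, mc1, mc2)  # → 1
```

A gradient-based shape comparison, as noted above, could replace `average_distance` without changing the surrounding decision logic.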
3. The Fusion Decision

In 3-D space, the surface segments corresponding to the 2-D contours are determined
during the object projection phase. Therefore, the surface segment corresponding
to the closest contour is already known in 3-D space. As shown in Figure 54b, the
3-D segment \Xi_{\Omega_1}^0 corresponds to the closest contour segment \xi_{\omega_1}^0 in Figure 54a. A
cubic voxel V_0 that has the same centroid as \Xi_{\Omega_1}^0 is constructed to include \Xi_{\Omega_1}^0. The surface
patches/voxels inside V_0 are elected from the 3-D reconstruction Ω1 to be in the final reconstruction.
The process is repeated for each contour and surface segment to reconstruct
the final output.
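The selection step of the fusion decision can be sketched as follows. This is a minimal illustration in which surface patches are reduced to 3-D points; the cube half-side and all names are illustrative assumptions, not part of the original algorithm's implementation.

```python
# Sketch of the fusion decision: for the 3-D surface segment whose
# projected contour won the closest contour test, keep the patches of
# the winning reconstruction inside a cube V0 centred on the segment.

def centroid(points):
    """Centroid of a list of 3-D points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def inside_cube(p, center, half_side):
    """True if point p lies inside the axis-aligned cube of the given
    half-side length around center."""
    return all(abs(pc - cc) <= half_side for pc, cc in zip(p, center))

def select_patches(segment, reconstruction, half_side):
    """Elect the patches/voxels of the winning reconstruction that lie
    inside the cube V0 built around the segment centroid."""
    c = centroid(segment)
    return [p for p in reconstruction if inside_cube(p, c, half_side)]

# Toy example: two of the three patches fall inside V0.
segment = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
recon = [(1.0, 0.0, 0.0), (1.5, 0.5, 0.0), (5.0, 5.0, 5.0)]
kept = select_patches(segment, recon, half_side=1.0)
```

Repeating this selection over all contour and surface segments assembles the final fused reconstruction.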
4. Experimental Results

An experiment is performed on the fusion of the 3-D reconstructions produced by the 3-D
scanner and by the space carving technique. Figure 55a shows screen captures of the 3-D
scanner reconstruction of a bear object. Projecting sharp and thin laser lines
on the surface of this bear is not guaranteed, hence the errors in the output reconstruction.
Filling gaps in the surface can fix some errors in the smooth parts (the back of the bear),
while for complex parts with discontinuities (the front of the bear) the filling does not
provide significant enhancement, as shown in Figure 55b.
The space carving technique is applied to 12 images of the bear. A sample
of the input images is shown in Figure 56a. A ground truth silhouette image extracted
from Figure 56a is shown in Figure 56b. The measured silhouette images at the same view
as Figure 56b, extracted by projecting the 3-D reconstructions produced by space carving and
by the 3-D laser scanner, are shown in Figure 56c and Figure 56d, respectively. The measured
silhouettes indicate differences between the space carving and 3-D laser scanner reconstructions.
Figure 57a shows a contour image with the ground truth contour (GTC), in white, extracted
from the silhouette image in Figure 56b, and a measured contour (MC), in black, extracted
from the silhouette in Figure 56c. A similar image is shown in Figure 57b, but with
the MC extracted from the silhouette in Figure 56d. These contour images give a clue
about which reconstruction, at this view and at a specified surface segment, is closest to the
desired reconstruction. Some contour segments at the back of the bear in Figures 57a and
57b show that the reconstruction by the 3-D scanner is closer to the desired reconstruction,
whereas at the top of the bear's head the space carving reconstruction is closer.
The proposed 3-D fusion technique is applied to the given reconstructions of the
bear by the 3-D scanner, shown in Figure 55a, and by space carving, shown in Figure 58a.
The fusion results are shown in Figure 58b.

FIGURE 54 – Basic idea of the 3-D fusion methodology. (a) The ground truth contour (GTC) and the measured contours (MC) from two different reconstructions, and (b) the 3-D reconstructions at a certain view from which the contours in Figure 54a are extracted.

As shown in the figure, the fusion process can enhance the 3-D reconstruction of a given object by selecting well-reconstructed
surface segments from each reconstruction and integrating them into one reconstruction. It is
important to note that for objects that have concavities, the fusion decision cannot be based
on the closest contour method. This represents a limitation of the technique.
B. System Design

One of the applications of performance evaluation is system design. A draft
design can be assumed; then a sequence of performance evaluation methodologies is applied to the system output.

FIGURE 55 – Screen captures of a 3-D reconstruction by the 3-D laser scanner. (a) Without filling the gaps and (b) after filling the gaps.

FIGURE 56 – Silhouette images for the 3-D fusion technique. (a) An example input image to the space carving technique, (b) a silhouette image extracted from the image in Figure 56a, (c) a silhouette image extracted from the reconstruction by space carving at the same view as in Figure 56b, and (d) a silhouette image extracted from the reconstruction by a 3-D laser scanner at the same view as in Figure 56b.

FIGURE 57 – Contour images for the 3-D fusion technique. (a) A contour image showing the ground truth contour (GTC), in white, extracted from the silhouette image in Figure 56b and a measured contour (MC), in black, extracted from the silhouette in Figure 56c, and (b) the same image as in Figure 57a, but with the MC extracted from the silhouette in Figure 56d.

FIGURE 58 – Snapshots of the 3-D reconstruction. (a) The reconstruction by space carving and (b) the fusion of the scanner and space carving reconstructions based on the closest contour method.

The effect of changing some system parameters can be tracked
using the evaluation methods. Adjustments and modifications can be applied to the draft
design to enhance the system performance. The design-evaluate cycle can be repeated until
the optimal design is reached.
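The design-evaluate cycle described above can be sketched as the following loop. `build_system`, `evaluate`, and `adjust` are placeholders for the scanner construction, the evaluation framework of the previous chapter, and the parameter-adjustment step; the stopping rule is an assumption of this sketch, not a prescription.

```python
# Illustrative sketch of the design-evaluate cycle. The three callables
# are hypothetical stand-ins for the actual design, evaluation, and
# adjustment procedures.

def design_evaluate(params, build_system, evaluate, adjust,
                    target_score, max_cycles=10):
    """Repeat the design-evaluate cycle until the evaluation score
    reaches the target or the cycle budget is exhausted."""
    for _ in range(max_cycles):
        system = build_system(params)   # realize the draft design
        score = evaluate(system)        # run the evaluation methodologies
        if score >= target_score:       # satisfactory results reached
            break
        params = adjust(params, score)  # modify the draft design
    return params, score

# Toy run: each cycle improves the "design" by one unit until the
# target score of 3 is reached.
final = design_evaluate(0, lambda p: p, lambda s: s,
                        lambda p, s: p + 1, 3)
```

The toy run converges after three adjustments; in practice each cycle corresponds to a full evaluation pass over the scanner output.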
The experimental test-bed presented in Chapter II can be used as a stand-alone passive
3-D scanner if we exclude the function of the laser, as shown in Figure 59. Space
carving is employed to find the 3-D reconstruction of the object of interest. From the
evaluation results presented in the previous chapter for the space carving technique, we can
set the initial specifications of the scanner as given in Table 8.
The values in Table 8 may not be the optimal parameters. As shown by the cutting
views captured for the space carving reconstructions in the previous chapter, more
images should be added from other views. A top view of the house object is shown in
Figure 60a, and a screen capture of a space carving reconstruction is shown in Figure 60b.
These two views show that the top of the house is not sharply reconstructed, because all the
input images were from side views. Adding a top camera, CCD 2, as shown in Figure 61,
can provide more constraints on the shape of the top part of the house object. The value of
the angle θ should be selected to permit overlap between the images acquired by CCD
1 and CCD 2. Another cycle of evaluation should be performed to test this design. The
design-evaluate cycle should be repeated until satisfactory results are reached.
C. Summary
In this chapter, two examples of post-evaluation processes are presented. The
first example is a technique for the 3-D fusion of different 3-D reconstructions based on a
closest contour criterion. The fusion decision is made based on the evaluation of the quality
of the reconstructions under fusion. The output reconstruction of this fusion procedure is
assumed to have better quality than any single reconstruction.
FIGURE 59 – A draft design of a passive 3-D scanner.
Another post-evaluation process is the system design. A draft design for a passive
3-D scanner is presented based on the evaluation results in the previous chapter. Modifica-
tions are applied to the draft design to enhance the performance of the system.
TABLE 8 – INITIAL SPECIFICATIONS OF A PASSIVE 3-D SCANNER BASED ON THE
RECONSTRUCTION BY THE SPACE CARVING TECHNIQUE.

Parameter                 min     typical   max
δ (mm)                    2.50    1.25      -
Number of input images    12      36        -
Th                        40      45        50
FIGURE 60 – Top views of the house object. (a) Original image and (b) screen capture of the 3-D reconstruction.
FIGURE 61 – Another draft design of a passive 3-D scanner.
CHAPTER VII
Conclusions and Future Directions
3-D reconstruction from a sequence of images finds many applications in modern
computer vision, such as virtual reality, vision-guided surgery, autonomous navigation,
medical studies and simulations, reverse engineering, and architectural design. The very
basic requirement of these applications is to find accurate and realistic reconstructions.
While many 3-D reconstruction approaches have been proposed to meet the above
requirement, there is still a lack of standard, widely accepted methodologies for quantifying
the performance of these approaches.
Motivated by the fact that the performance evaluation process plays an important
role in guiding and measuring progress in the field, which, in turn, will lead to improvements
in both the theory and applications of 3-D reconstruction research, we introduce a
computational framework for the performance characterization of 3-D reconstruction techniques
from a sequence of images.
In this work, we proposed a unified computational framework to address the lack
of global ground truth data sets and of testing methodologies applicable
to different 3-D reconstruction approaches. The contributions of this dissertation can be
stated as follows.
A. Contribution to Data Acquisition and System Design
This dissertation introduces a new design for an experimental setup that integrates
the functionality of laser scanners and CCD cameras. The system is able to collect very
dense ground truth data and the corresponding intensity images. The system contains
efficient data acquisition modules that guarantee the generation of high-quality intensity data. The
intensity data set is calibrated, segmented, and automatically registered to the ground truth
data. These data sets can be used by different 3-D reconstruction techniques, including
stereo and volumetric-based approaches. This unique feature of the setup motivated us
to build an evaluation database that includes ground truth data, input images, camera calibration
parameters, and 3-D registration parameters. This database will be available
for public use to bridge the gap caused by the unavailability of global experimental
data sets.
B. Contribution to the 3-D Data Registration
A novel technique for 3-D data registration is presented. This technique is dedicated
to evaluation procedures that aim at localizing errors in the data under test. Unlike
conventional 3-D data registration techniques, the approach does not rely on the
presence of the 3-D reconstruction under test during the registration phase. This gives the
approach a major advantage, since the 3-D reconstruction could be of low quality,
which might add difficulties to any 3-D registration technique. In addition, if the actual 3-D
reconstructions under test were used in the registration phase, then some errors that the
evaluation process tries to investigate might disappear during the minimization step used
by any 3-D registration technique. The approach employs silhouette images to align the
given data sets. Undistorted silhouette images can be generated easily, hence providing
good data sets for the registration process. The approach is simple and efficient and can be
applied to any 3-D registration problem, assuming the availability of a calibrated sequence
of images describing one of the data sets under registration.
C. Contribution to the Performance Evaluation Methodologies and Measuring Criteria
Three testing methodologies are presented. The first is the Local Quality Assessment
(LQA) test. This test quantifies the performance of a given 3-D reconstruction
with respect to a reference 3-D reconstruction provided by the 3-D laser scanner. It is
designed to investigate local errors in the given 3-D reconstruction by decimating it into
different patches and measuring the quality of each patch. This makes the error analysis
much easier and permits the integration of different 3-D reconstruction techniques based
on the results of this test.
An Image Re-projection (IR) testing methodology is presented to cope with the unavailability
of 3-D ground truth data. The test uses the acquired images as the reference
for comparison with corresponding images re-projected from the given 3-D reconstruction.
This test also measures the applicability of 3-D reconstruction techniques to virtual
reality problems.
To avoid errors due to color variations and the re-projection process in the IR test,
we propose a Silhouette-Contour Signature (SCS) methodology that extracts shape features
from silhouette and contour images and permits the inclusion of distinct cutting views from
the 3-D ground truth data.
A classification criterion for testing methodologies is also presented. Based on this
criterion, we can classify the tests that measure the performance of 3-D reconstruction
techniques into 24 types. This classification will eventually help in establishing a standard
ranking that reflects the validity of such tests.
D. Contribution to the Experimental Evaluation of 3-D Reconstruction Techniques
An experimental evaluation of space carving, a recent and widely used technique for
3-D reconstruction from a sequence of images, is presented. The evaluation procedures
used in this study are based on the presented performance evaluation framework. In this
study, we track the response of space carving to changes in the key controlling parameters
of the algorithm.
The number of input images to the space carving algorithm is a key parameter. This
study has shown that a minimum number of input images should be supplied to the algorithm
to achieve acceptable results. This number depends on the geometric features
and textures of the object under reconstruction. A higher number of images may lead to
better reconstructions only if the added images introduce constraints on the shape of the
object. In addition, the distribution of the cameras that capture the scene should not be
totally arbitrary, since different distributions can provide reconstructions of different quality.
The photo-consistency check threshold is another key parameter. The selection of
this threshold is tricky: an incorrect selection of this parameter could lead to over- or under-carved
reconstructions. "Lambertianizing" the input images could provide a way to avoid
tuning such a tricky parameter.
The resolution of the initial volume, hence the resolution of the output reconstruction,
can be coarse if the geometric features of the output reconstruction are preserved. This
permits faster reconstructions, hence applicability to real-time applications.
Similar studies can be applied to other 3-D reconstruction techniques to characterize
their performance, since the presented framework is independent of the 3-D reconstruction
technique under test.
E. Applications
Two applications of the performance evaluation framework are presented. The first
application is the 3-D data fusion of different 3-D reconstructions. A fusion technique
based on image contour comparison is presented. The technique rectifies a 3-D reconstruction
based on the closeness of its projected contours to the ground truth contours.
The method is used to combine reconstructions generated by a 3-D laser scanner and the
space carving technique.
The second application is system design. A draft design for a passive 3-D scanner
is presented. The design is based on the experimental results of evaluating the performance
of space carving. The proposed scanner should be able to reconstruct surfaces that the
commercial 3-D laser scanner may not be able to reconstruct.
F. Future Extensions
In future work, we will investigate the following extensions to the proposed frame-
work.
1. Data acquisition
• Add more cameras to provide cutting views. Instead of using the ground truth data
(which need to be registered to the measured data) to provide cutting views for the
SCS test, one or more cameras can be added to the evaluation setup to provide such
views. However, this will add extra work for the camera calibration process. Design
of a multi-planar calibration pattern to calibrate this cluster of cameras, in addition
to the main camera, can facilitate the calibration process.
• Extend the evaluation database to include different data sets for different complexity
test-objects.
2. 3-D Data Registration
• Find a relation between the error in the projected silhouettes and the actual error in
the 3-D space to expedite the registration process.
• Use the color cue in the registration process. Maximizing the mutual information
between the two different sources of color provided by the CCD camera and the 3-D
laser scanner can be a solution to aligning images from different modalities.
• Investigate using deterministic optimization approaches, such as graph cuts, instead of
genetic algorithms.
3. Testing Methodologies and Measures
• The SCS methodology can be extended to the 3-D space. Matching signatures from
the 3-D ground truth surfaces and the measured surfaces can provide a way of inves-
tigating the quality of a given reconstruction.
• Investigate using subjective measures for the IR test. As these measures aim to simu-
late the human sensing of visual cues, they are difficult to design and computationally
expensive.
4. The Performance of Space Carving Technique
• Investigate using methods such as the invariant features or the optical flow to deter-
mine the minimum number of images, given a superset of input images, required by
the space carving technique to enhance the output reconstruction.
• Investigate using image processing enhancement techniques to Lambertianize the
input images for the space carving to enhance the photo-consistency check.
In general, the proposed framework can be applied to different 3-D reconstruction techniques
from a sequence of images. Studies similar to that of space carving can be applied
to level set approaches for shape recovery. Evaluation of stereo techniques based on
3-D ground truth data can also be a future study.
REFERENCES
[1] E. Trucco and A. Verri, Introductory Techniques for 3-D Computer Vision, PrenticeHall Inc., 1998.
[2] D. Marr and T. Poggio “Cooperative computation of stereo disparity”, Science, 194,pp. 283-287, Sep. 1976.
[3] http://cat.middlebury.edu/stereo/data.html
[4] H. Baker, and T. Binford,“Depth from edge and intensity based stereo”, Proceedingsof Seventh International Joint Conference on Artificial Intelligence, Vancouver, 1981,pp. 631-636.
[5] C. Loop and Z. Zhang, “Computing rectifying homographies for stereo vision,” Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR’99), vol. I, Ft. Collins, CO, June 23-25, 1999, pp. 125-131.
[6] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy and minimizationvia graph cuts,” Proceedings of IEEE International Conference On Computer Vision(ICCV’99), Kerkyra, Greece, Sep. 20-27, 1999, pp. 377-384.
[7] S. Roy and I. J. Cox, “A maximum-flow formulation of the N-camera stereo corre-spondence problem,” Proceedings of International Conference on Computer Vision(ICCV’98), Bombay, India, Jan. 4-7, 1998, pp. 492-499.
[8] M. Okutomi and T. Kanade, “A multiple baseline stereo,” IEEE Transactions on Pat-tern Analysis and Machine Intelligence, vol. 15, no. 4, pp. 353-453, April 1993.
[9] D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereocorrespondence algorithms”, International Journal for Computer Vision, 47(1):7-42,May 2002.
[10] A. Laurentini, “The visual hull concept for silhouette-based image understanding,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, no. 2, pp. 150-162, Feb. 1994.
[11] G. Cheung, T. Kanade, J-Y. Bouguet, and Holler, “A real time system for robust 3Dvoxel reconstruction of human motion,” Proceedings of IEEE Conference on Com-puter Vision and Pattern Recognition, vol. 2, South Carolina, June 13-15, 2000, pp.714-720.
[12] S. Seitz and C. Dyer, “Photorealistic scene reconstruction by voxel coloring,” Pro-ceedings of Computer Vision and Pattern Recognition Conference (CVPR’97), PuertoRico, June 17-19, 1997, pp. 1067-1073.
[13] K. Kutulakos and S. Seitz, “Theory of shape by space carving,” Proceedings of IEEE International Conference on Computer Vision (ICCV’99), Kerkyra, Greece, Sep. 20-27, 1999, pp. 307-314.
[14] W. Culbertson, T. Malzbender, and G. Slabaugh, “Generalized voxel coloring,” Inter-national Workshop on Vision Algorithms, Corfu, Greece, 1999, pp. 100-115.
[15] O. Faugeras and R. Keriven, “Variational principles, surface evolution, PDE’s, levelset methods and the stereo problem”, IEEE Transactions on Image Processing, vol. 7no. 3 pp. 336-344, 1998.
[16] C. Dyer, “Volumetric scene reconstruction from multiple views”, In L.S. Davis, editor,Foundations of Image Understanding, pp. 469-489. Kluwer, Boston, 2001.
[17] G. Slabaugh, B. Culbertson, T. Malzbender, and R. Schafer, “A survey of methodsfor volumetric scene reconstruction from photographs”, In K. Mueller and A. Kauf-mann, editors, Proceedings of the Joint IEEE TCVG and Eurographics Workshop(VolumeGraphics-01), Wien, Austria, June 21-22, 2001, pp. 81-100.
[18] R. Szeliski “ Prediction error as a quality metric for motion and stereo” Proceedingsof IEEE International Conference on Computer Vision (ICCV’99), Kerkyra, Greece,Sep. 20-27, 1999, pp. 781-788.
[19] R. Bolles, H. Baker, and M. Hannah, “The JISCT stereo evaluation”, Proceedings ofDARPA Image Understanding Workshop, 1993, pp. 263-274.
[20] R. Szeliski and R. Zabih, “An experimental comparison of stereo algorithms,” International Workshop on Vision Algorithms, Corfu, Greece, 1999, pp. 1-19.
[21] J. Mulligan, V. Isler, and K. Daniilidis “Performance evaluation of stereo for tele-presence,” Proceedings of IEEE International Conference on Computer Vision, vol.II, Vancouver, Canada, July 7-14, 2001, pp. 558-565.
[22] P. J. Besl and N. McKay,“A method for registration of 3-D shapes,” IEEE Trans.Pattern Analysis and Machine Intelligence, vol. 14, no. 2 pp. 239-256, March 1992.
[23] Z. Zhang, “Iterative point matching for registration of free form curves and surfaces,”International Journal of Computer Vision, 13:119-152, 1994.
[24] A. Fitzgibbon,“Robust registration of 2D and 3D points,” Proceedings of British Ma-chine Vision Conference (BMVC’01), vol. II, Manchester, UK, Sep. 10-13, 2001, pp.411-420.
[25] D. Chetverikov, D. Svirko, D. Stepanov, and P. Krsek, “The trimmed iterative closest point algorithm,” Proceedings of International Conference on Pattern Recognition (ICPR’02), vol. III, Quebec, Canada, Aug. 11-15, 2002, pp. 545-548.
[26] S. Yamany and A. Farag,“Surface Signatures: An orientation independent free-formsurface representation scheme for the purpose of objects registration and matching,”IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 8, pp. 1105-1120,2002.
[27] C. S. Chua and R. Jarvis,“Point signatures: A new representation for 3d object recog-nition,” International Journal of Computer Vision 25:63-85, 1997.
[28] C. Dorai and A. K. Jain,“Cosmos-a representation scheme for 3d free form objects,”IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 8, pp. 1115-1130,1997.
[29] A. Johnson and M. Hebert, “Surface matching for object recognition in complex three-dimensional scenes,” Image and Vision Computing, vol. 16, pp. 635-651, 1998.
[30] A. Eid, S. Rashad and A. Farag, “Validation of 3D reconstruction from sequence ofimages,” Proceedings of the International Conference on Signal Processing, PatternRecognition, and Applications (SSPRA’02), Crete, Greece, June 25-28, 2002, pp. 375-380.
[31] S. Seitz, J. Kim,“The space of all stereo images,” Proceedings of Eighth IEEE Inter-national Conference on Computer Vision (ICCV’01), vol. II, Vancouver, Canada, July7-14, 2001, pp. 558-565.
[32] A. Elgammal, D. Harwood, and L. Davis, “Non-parametric model for backgroundSubtraction,” 6th European Conference on Computer Vision (ECCV’00), vol. II,Dublin, Ireland, June 26-July 1, 2000, pp. 751-767.
[33] A. Eid, S. Rashad and A. Farag, “A general purpose platform for 3D reconstructionfrom sequence of images,” Proceedings of Fifth International Conference on Infor-mation Fusion (IF’02), vol. I, Annapolis, MD, July 7-11, 2002, pp. 425-413.
[34] R. Hartley and A. Zisserman, Multiple View Geometry in computer vision, CambridgeUniversity Press., 2000.
[35] O. Faugeras and Q. Luong, The Geometry of Multiple Images, The MIT Press., 2001.
[36] R. Tsai, “An efficient and accurate camera calibration technique for 3d machine vi-sion,” Proceedings of IEEE Conference on Computer Vision and Pattern Recognition(CVPR’86), Miami Beach, FL, 1986, pp. 364-374.
[37] L. Robert. “Camera Calibration without feature extraction,” Computer Vision and Im-age Understanding, 63(2):314-325, March 1996.
[38] A. Eid and A. Farag, “Design of an experimental setup for performance evaluation of3-D reconstruction techniques from sequence of images,” Eighth European Confer-ence on Computer Vision (ECCV’04), Workshop on Applications of Computer Vision,Prague, Czech Republic, May 11-14, 2004, pp. 69-77.
[39] A. Eid and A. Farag, “A unified framework for performance evaluation of 3-D recon-struction techniques,” IEEE Conference on Computer Vision and Pattern Recognition(CVPR’04), Workshop on Real-time 3-D Sensors and their Use, Washington DC, June27-July 2, 2004.
[40] H. Lensch, W. Heidrich and H. Seidel, “A silhouette-based algorithm for texture reg-istration and stitching,” Graphical Models vol. 63, no. 4, pp. 245-262, 2001.
[41] A. Agarwal and B. Triggs, “3D human pose from silhouettes by relevance vector re-gression,” Proceedings of IEEE Conference on Computer Vision and Pattern Recog-nition, vol. II, Washington DC., June 27-July 2, 2004, pp. 882-888.
[42] S. Sinha, M. Pollefeys, and L. McMillan, “Camera network calibration from dy-namic silhouettes,” Proceedings of IEEE Conference on Computer Vision and PatternRecognition (CVPR’04), vol. I, Washington DC., June 26-July 2, 2004, pp. 195-202.
[43] D. E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning.Addison-Wesley Publishing Company Inc., 1989.
[44] A. Farag and A. Eid, “Local quality assessment of 3-D reconstructions from sequenceof images: a quantitative approach,” Advanced Concepts for Intelligent Vision Systems(ACIVS’04), Brussels, Belgium, Aug. 31-Sep. 3, 2004, pp. 161-168.
[45] A. Eid and A. Farag, “On the performance characterization of stereo and space carv-ing,” Proceedings of Advanced Concepts for Intelligent Vision Systems (ACIVS’03),Ghent, Belgium, Sep. 2-5, 2003, pp. 291-296.
[46] A. Eid and A. Farag, “On the performance evaluation of 3-D reconstruction tech-niques from a sequence of images,” EURASIP Journal on Applied Signal Processing,to appear 2005.
[47] N. Damera-Venkata, et al. “Image quality assessment based on a degradation model,”IEEE Transactions on Image Processing, vol. 9, no. 4, pp. 636-650, April 2000.
[48] Z. Wang, A. Bovik, and L. Lu, “Why is image quality assessment so difficult,” Pro-ceedings of IEEE International Conference on Acoustics, Speech, and Signal Pro-cessing (ICASSP’02), vol. IV, Orlando, FL, May 13-17, pp. 3313 -3316.
[49] A. Yezzi, G. Slabaugh, A. Broadhurst, R. Cipolla and R. Schafer, “A surface evolutionapproach to probabilistic space carving,” Proceedings of First International Sympo-sium on 3D Data Processing Visualization and Transmission (3DPVT’02), Padova,Italy, June 19-21, 2002, pp. 618-621.
[50] A. Broadhurst and R. Cipolla, “A statistical consistency check for the space carvingalgorithm,” Proceeding of Eleventh British Machine Vision Conference (BMVC’00),Bristol, UK, Sep. 11-14, 2000, pp. 282-291.
[51] A. Broadhurst, T. W. Drummond, and R. Cipolla, “A probabilistic framework for space carving,” Proceedings of Eighth IEEE International Conference on Computer Vision, vol. I, Vancouver, Canada, July 7-14, 2001, pp. 388-393.
[52] J. Li, G. Chin, and Z. Chi, “A fuzzy image metric with application to fractal coding,”IEEE Transactions on Image Processing, vol. 11, no. 6, pp. 636-643, June 2002.
[53] A. Sajjanhar and G. Lu, “A comparison for techniques for shape retrieval,” Proceed-ings of International Conference on Computational Intelligence and Multimedia Ap-plications, Monash University, Gippsland Campus, Feb. 9-11, 1998, pp. 854-859.
[54] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape contexts,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 4, pp. 509-522, 2002.
[55] B. Curless and M. Levoy,“Better optical triangulation through spacetime analysis,”Proceedings of Fifth International Conference on Computer Vision (ICCV’95), Cam-bridge, MA, June 20-23, 1995, pp. 987-994.
[56] A. Eid and A. A. Farag, “On the fusion of 3-D reconstruction techniques,” Proceed-ings of Seventh International Conference on Information Fusion (IF’04), Stockholm,Sweden, June 28-July 1, 2004, pp. 856-861.
APPENDIX I
PROJECTIVE GEOMETRY
Euclidean geometry describes our world well. However, for the purpose of describing
projections, projective geometry is a more adequate framework. Parallel railroad
tracks are parallel lines in 3-D space; in their images, however, they are not parallel and appear
to intersect at a vanishing point at the horizon. Projective geometry is an extension of
Euclidean geometry that describes a larger class of transformations than just rotations
and translations, including in particular the perspective projection performed by a camera.
Simply put, it makes it possible to describe such phenomena at infinity naturally. The most
important aspect of projective geometry is the introduction of homogeneous coordinates, which
represent a projective transformation as a matrix multiplication. This allows simple
matrix algebra to be used for most computations, which would be a difficult task if Euclidean
geometry were used. In the next sections, we describe the projective representations of the basic
geometrical entities in both 2-D and 3-D space. In addition, a brief description of the basic
transformations, ranging from Euclidean to projective geometry, is presented.
A. 2-D Projective Geometry
a. Points and Lines in P2 In homogeneous coordinates, the representation of
lines and points is augmented by a third coordinate in addition to the inhomogeneous coordinates
in R^2. A line l in the plane is represented by the equation:

ax + by + c = 0 \qquad (83)
that can be described by the vector (a, b, c)T .
The vectors (a, b, c)T and k(a, b, c)T represent the same line for any non-zero scal-
ing factor k. An equivalence class of vectors under this scaling relation is known as a
homogeneous vector. Any particular vector (a, b, c)T is a representative of the equivalence
class. The set of equivalence classes of vectors in R3−(0, 0, 0)T forms the projective space
P2 [34]. A point x = (x, y)T lies on the line l = (a, b, c)T if and only if ax + by + c = 0, or in vector notation:

xT l = lT x = 0 (84)
An arbitrary homogeneous vector representation of a point has the form x = (x1, x2, x3)T, representing the point x = (x1/x3, x2/x3)T in R2. Points, as homogeneous vectors, are thus also elements of P2. The point x can also be defined as the intersection of two lines l1,
and l2 as
x = l1 × l2 (85)
Points and lines are thus duals in P2; the line l joining two points x1 and x2 is defined
as:
l = x1 × x2 (86)
The intersection of two lines is fully defined in P2 even when they are parallel. This leads to the definition of points and lines at infinity. Consider the two parallel lines l1 = (a, b, c1)T and l2 = (a, b, c2)T, where c1 ≠ c2. The intersection of l1 and l2 is the homogeneous point x = (b, −a, 0)T, which is a point at infinity (b/0, −a/0)T in R2. The vector (b, −a)T represents the common direction of l1 and l2. If we regard all points of the form x = (x1, x2, 0)T as points at infinity, we find a line l = (0, 0, 1)T that joins these points at infinity; this is verified by computing xT l = 0 for all points x at infinity. The ability to describe points and lines at infinity is of great importance in computer vision, and comes courtesy of projective geometry.
FIGURE 62 – Representation of points and lines in P2.
b. The Projective Plane P2 We can think of P2 as a set of rays in R3. The set of all vectors k(x1, x2, x3)T, as k varies, forms a ray through the origin; such a ray may be thought of as representing a single point in P2. In this model, the lines in P2 are planes passing through the origin. Inhomogeneous points and lines are obtained by intersecting these rays and planes with the projective plane at x3 = 1. As shown in Figure 62, the rays representing points at infinity, and the plane representing the line at infinity, are parallel to the plane x3 = 1.
c. 2-D transformations 2-D projective geometry is defined as the study of the properties of the projective plane P2 that are invariant under a group of transformations known as projectivities, or homographies. A projectivity h is an invertible mapping from P2 to itself such that three points x1, x2, and x3 lie on the same line if and only if h(x1), h(x2), and h(x3) do. One of the most important projectivities in computer vision is the central projection, since it is used to model finite cameras. The central projection maps points from one plane to another and also maps lines to lines, as shown in Figure 63. This planar projective transformation is a linear transformation on homogeneous 3-vectors, represented by a 3×3 non-singular matrix H as:
x2 = Hx1 (87)
FIGURE 63 – The central projection as a planar projectivity.
H is defined up to a scale factor, so it has 8 degrees of freedom. To compute the H that maps one plane to another, at least 4 corresponding points in each plane must be known, no 3 of them collinear. Figure 64a shows an image of ceiling tiles in the CVIP lab. Being a perspective image, it undergoes a perspective distortion that maps parallel lines into intersecting lines. We can remove this distortion by selecting 4 planar points of a distorted shape whose true shape is known, together with the 4 corresponding points of that true shape. Solving for the projectivity that maps the distorted points onto the true ones, we can correct all points in the plane that suffer the same distortion. This is shown in Figure 64b, obtained by computing the projectivity matrix H and applying it to the points in Figure 64a. The same technique is used to rectify the image of the Kent School building at the University of Louisville, Figure 65a, so that the front of the building faces the viewer. As shown in Figure 65b, the front of the building is rectified; however, points in other planes are distorted, since the projectivity applies only to a single plane.
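Solving for H from 4 correspondences can be sketched with the standard direct linear transformation (DLT): each point pair contributes two linear equations in the 9 entries of H, and the solution is the null vector of the stacked system. The numpy sketch below uses made-up example points, not the actual CVIP lab data:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography H with dst ~ H src from >= 4 point pairs (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # two rows per correspondence, from the cross product of dst and H src
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the null vector of A (smallest singular value) holds the 9 entries of H
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# map the unit square onto a perspective-distorted quadrilateral
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(0, 0), (2, 0), (2.5, 1.5), (-0.5, 1.5)]
H = homography_dlt(src, dst)
```

With exact correspondences and no 3 collinear points, the 8×9 system has a one-dimensional null space and the recovered H reproduces the mapping exactly.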
d. Hierarchy of Transformations Projective transformations form a group called the projective linear group, and its subgroups are specializations of the projective group. Here we summarize the definitions of these subgroups and the geometric entities that are invariant under each of them.
FIGURE 64 – Computing homographies. (a) The ceiling tiles image at the CVIP lab, (b) the rectified image.
FIGURE 65 – Computing homographies. (a) Kent School at the University of Louisville, (b) the rectified image.
The hierarchy of these transformations is: Euclidean (isometry), Similarity, Affinity, and
Projectivity.
The Euclidean transformation (isometry) is described by a 3×3 matrix with 3 degrees of freedom: one for the rotation angle θ and two for the translations tx and ty. This matrix I is defined as:

I = [ i cos(θ)  −sin(θ)  tx ]
    [ i sin(θ)   cos(θ)  ty ]
    [    0          0     1 ]   (88)

where i = ±1; if i = −1, the orientation is reversed.
The similarity provides isotropic scaling by s in addition to the rotation and translation provided by the isometry; hence the similarity has 4 degrees of freedom. The matrix S is defined as:

S = [ s cos(θ)  −s sin(θ)  tx ]
    [ s sin(θ)   s cos(θ)  ty ]
    [    0           0      1 ]   (89)
The affinity AF is defined as:

AF = [ a1  a2  tx ]
     [ a3  a4  ty ]
     [  0   0   1 ]   (90)

where the 2×2 matrix

A = [ a1  a2 ]
    [ a3  a4 ]

can always be decomposed as:

A = [ cos(θ)  −sin(θ) ] [ cos(−φ)  −sin(−φ) ] [ λ1   0 ] [ cos(φ)  −sin(φ) ]
    [ sin(θ)   cos(θ) ] [ sin(−φ)   cos(−φ) ] [  0  λ2 ] [ sin(φ)   cos(φ) ]   (91)
which can be interpreted as a rotation by φ, followed by non-isotropic scaling by λ1 and λ2, followed by a rotation back by −φ, and finally a rotation by θ. The affinity has 6 degrees of freedom. The two additional degrees over the similarity come from the shearing direction, given by the rotation angle φ, and from the non-isotropic scaling ratio λ1 : λ2.
The projectivity, as mentioned before, has 8 degrees of freedom. The two additional degrees of freedom come from the parameters v1 and v2, which are responsible for the perspective projection. As a result, points at infinity are mapped to finite points, and parallel lines are mapped to intersecting lines. The projectivity H can be written as:

H = [ a1  a2  tx ]
    [ a3  a4  ty ]
    [ v1  v2   1 ]   (92)
where v1 and v2 are real values. Note that the zeros in the third row of I, S, and AF are no longer zeros in H; this is where the perspective projection effects enter. A summary of the discussed transformations and the geometric entities preserved under each of them is given in Table 9 [35]. Figure 66(b-e) shows the effects of the above transformations on the image in Figure 66a.
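The distinction in the last row of the hierarchy can be checked numerically. A small numpy sketch (the matrix entries are arbitrary illustrative values) shows that an affinity keeps a point at infinity at infinity, while a projectivity with non-zero v1, v2 maps it to a finite point:

```python
import numpy as np

theta = 0.3
# an affinity: rotation, non-isotropic scaling, translation; last row (0, 0, 1)
affinity = np.array([[2.0 * np.cos(theta), -np.sin(theta), 1.0],
                     [np.sin(theta), 1.5 * np.cos(theta), 2.0],
                     [0.0, 0.0, 1.0]])

# a projectivity: same upper part, but with perspective terms v1, v2
projectivity = affinity.copy()
projectivity[2, :2] = [0.1, 0.2]

p_inf = np.array([1.0, -1.0, 0.0])   # a point at infinity (a direction)
a = affinity @ p_inf                  # third coordinate stays 0
p = projectivity @ p_inf              # third coordinate becomes non-zero
```

Since parallel lines are exactly those meeting at a point at infinity, this is the algebraic reason parallelism survives an affinity but not a projectivity, as summarized in Table 9.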
B. 3-D Projective Geometry
e. Representation of Points in P3 The point X = (x1, x2, x3, x4)T with x4 ≠ 0 is the homogeneous representation of the point (X, Y, Z)T in R3, where X = x1/x4, Y = x2/x4, and Z = x3/x4.
f. Representation of Planes in P3 In P3, planes and points are duals, while lines are self-dual. A plane Π in 3-D may be written as:
π1X + π2Y + π3Z + π4 = 0 (93)
where the homogeneous representation of a plane is the 4-vector Π = (π1, π2, π3, π4)T .
The point X lies on a plane Π if and only if
ΠT X = XTΠ = 0 (94)
FIGURE 66 – 2-D transformations. (a) original image (b) after isometry (c) after similarity(d) after affinity (e) after projectivity.
TABLE 9
SUMMARY OF TRANSFORMATIONS.

                                 isometry  similarity  affinity  projectivity
  transformation
  rotation, translation             ×          ×          ×          ×
  isotropic scaling                            ×          ×          ×
  non-isotropic scaling                                   ×          ×
  perspective projection                                             ×
  invariants
  distance                          ×
  angles, ratios of distances       ×          ×
  parallelism, center of mass       ×          ×          ×
  incidence, cross ratio            ×          ×          ×          ×
In general, points and planes are related to each other in 3-D space by the following rela-
tions:
A plane Π is defined uniquely by 3 distinct, non-collinear points X1, X2, and X3:

[ X1T ]
[ X2T ] Π = 0   (95)
[ X3T ]
which represents a system of linear equations that can be solved for the unknown plane.
A point X is defined uniquely by the intersection of 3 distinct planes Π1, Π2, and Π3 (the dual of the above relation):

[ Π1T ]
[ Π2T ] X = 0   (96)
[ Π3T ]
which represents a system of linear equations that can be solved for the unknown point.
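Both homogeneous systems (95) and (96) can be solved as null-space problems; a common numerical route is the singular value decomposition. A numpy sketch with made-up example points and planes:

```python
import numpy as np

def null_vec(A):
    """Unit vector spanning the (one-dimensional) right null space of A, via SVD."""
    return np.linalg.svd(A)[2][-1]

def plane_from_points(X1, X2, X3):
    """Plane through three homogeneous points in P3, solving Eq. (95)."""
    return null_vec(np.vstack([X1, X2, X3]))

def point_from_planes(P1, P2, P3):
    """Point at the intersection of three planes, solving Eq. (96)."""
    return null_vec(np.vstack([P1, P2, P3]))

# three points on the plane z = 1
X1 = np.array([0.0, 0.0, 1.0, 1.0])
X2 = np.array([1.0, 0.0, 1.0, 1.0])
X3 = np.array([0.0, 1.0, 1.0, 1.0])
Pi = plane_from_points(X1, X2, X3)   # proportional to (0, 0, 1, -1)
```

The SVD route also degrades gracefully: with noisy, over-determined stacks it returns the least-squares plane or point rather than failing.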
g. Representation of Lines in P3 A line is defined by the join of two points or by the intersection of two planes; it has 4 degrees of freedom in 3-D space. Suppose X and Y are two non-coincident space points. The line joining them is represented by the span of the row space of the 2×4 matrix L composed of XT and YT as rows:

L = [ XT ]
    [ YT ]   (97)

The span of LT is the pencil of points μX + λY on the line, where μ and λ are real values. The dual representation of the line, as the intersection of two planes Π1 and Π2, is:

L1 = [ Π1T ]
     [ Π2T ]   (98)

The span of (L1)T is the pencil of planes μΠ1 + λΠ2.
The plane Π defined by the point X and the line L (in its point representation) is the solution of the following equation:

[ L  ]
[ XT ] Π = 0   (99)

In addition, the point X defined by the intersection of the line (in its dual plane representation L1) with the plane Π is the solution of the following equation:

[ L1 ]
[ ΠT ] X = 0   (100)
Note the duality principle of points and planes in the previous two equations.
h. Plucker matrices Here the line L is represented by a 4×4 skew-symmetric homogeneous matrix. The line joining two points X and Y is:

L = XYT − YXT   (101)

and the dual representation, in terms of two intersecting planes Π1 and Π2, is:

L* = Π1Π2T − Π2Π1T   (102)

Using the Plucker representation, it is easy to directly determine the points and planes joining or intersecting a given line. For example, the plane Π defined by the point X and the line L is:

Π = L*X   (103)

and the point X defined by the intersection of the line L with the plane Π is:

X = LΠ   (104)

Once again, note the duality of points and planes in 3-D space.
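The meet operation (104) becomes a single matrix-vector product in the Plucker representation. A short numpy sketch, intersecting the x-axis with an example plane:

```python
import numpy as np

def plucker_join(X, Y):
    """4x4 skew-symmetric Plucker matrix of the line through points X and Y, Eq. (101)."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    return np.outer(X, Y) - np.outer(Y, X)

# the x-axis: line through the origin (0,0,0) and the point (1,0,0)
L = plucker_join([0, 0, 0, 1], [1, 0, 0, 1])

# intersect it with the plane x = 2, i.e. Pi = (1, 0, 0, -2)T, via X = L Pi (Eq. 104)
Pi = np.array([1.0, 0.0, 0.0, -2.0])
X = L @ Pi   # homogeneous point proportional to (2, 0, 0, 1)
```

No linear system needs to be solved: the intersection point falls out of one multiplication, which is the practical appeal of the Plucker form.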
APPENDIX II
CAMERA CALIBRATION
The geometric analysis of a single view helps in understanding the relation between a scene in 3-D space and its image captured by a camera. Since images provide only an abstract description of 3-D scenes, understanding the geometric laws that govern image formation can help recover the 3-D information absent from the 2-D images. It is therefore important to understand the camera anatomy and the techniques of camera modeling.
A. Camera Modeling
A camera is a mapping between the 3-D world (object space) and a 2-D image. Our interest in this dissertation is the central projection camera, or finite camera. General projective camera models include, in addition to finite cameras, the infinite cameras; the terms finite and infinite refer to the position of the camera's optical center in 3-D space.
A very common model for finite cameras is the pinhole model. As shown in Figure 67, the model consists of a plane Π, the image plane, and a 3-D point C, the optical center or focus of projection. The distance between Π and C is the camera focal length f. The line through C perpendicular to Π is the optical axis, and the intersection of the optical axis with Π is the image center, or principal point, o. The point x on Π is the image of the 3-D point X: it is the intersection of the image plane with the straight line (the optical ray) joining C and X.
FIGURE 67 – Modeling of a central projection camera.
Consider C as the origin of the camera coordinate system (Xcam, Ycam, Zcam) and o as the origin of the image coordinate system (xim, yim). Then the relation between the image point x = (x, y)T and the 3-D point X = (X, Y, Z)T is:

x = f X / Z   (105)
y = f Y / Z   (106)
In practice, the principal point may not be at the origin of the camera coordinate system. In addition, in CCD cameras, where coordinates are measured in pixels, it is common to have non-square pixels, i.e. different scalings in the x-axis and y-axis directions. It is also possible, though rare, to have non-orthogonal image axes. Let the scalings in the x-axis and y-axis directions be mx and my (pixels/unit length), respectively. Taking these effects into consideration and expressing the 2-D and 3-D coordinates in homogeneous form, the relation between a 3-D point X = (X, Y, Z, 1)T in 3-D space and the image point x = (x, y, 1)T is:

sx = PX   (107)
where s is a scale factor and P = [K|0] is the camera projection matrix. The calibration matrix K is defined as:

K = [ αx  −αx cot(θ)  px ]
    [  0   αy/sin(θ)  py ]
    [  0       0       1 ]   (108)

where θ is the angle between the image axes, αx = fmx, αy = fmy, and (px, py)T is the image center expressed in pixel dimensions. The matrix K has 5 degrees of freedom: px, py, αx, αy, and θ. These are the intrinsic calibration parameters of the camera.
In fact, camera calibration is defined as the process of estimating two sets of parameters: the intrinsic parameters and the extrinsic parameters. The intrinsic parameters link the pixel coordinates of an image point with the corresponding coordinates in the camera reference frame; they are the entries of the intrinsic parameter matrix K. The extrinsic parameters define the location and orientation of the camera reference frame with respect to a known world reference frame. They are thus any set of geometric parameters that uniquely identify the transformation between the unknown camera reference frame and the known world reference frame.
A typical choice for describing the transformation between the camera and world frames is a 3-D translation vector t = [tx, ty, tz]T, describing the relative positions of the origins of the two reference frames, and a 3×3 rotation matrix R, an orthogonal matrix (RTR = RRT = I) that brings the corresponding axes of the two frames onto each other. The 3-D rotation can be expressed as the result of three consecutive rotations around the coordinate axes by angles α, β, and γ, which are then the three free parameters of R. The rotation matrix can be expressed in terms of α, β, and γ as:

R = [  cos β cos γ                        −cos β sin γ                        sin β       ]
    [  sin α sin β cos γ + cos α sin γ    −sin α sin β sin γ + cos α cos γ   −sin α cos β ]
    [ −cos α sin β cos γ + sin α sin γ     cos α sin β sin γ + sin α cos γ    cos α cos β ]   (109)
The extrinsic parameter matrix D can be expressed in terms of t and R as:

D = [ R    t ]
    [ O3T  1 ]   (110)

where D is a 4×4 matrix and O3 = [0, 0, 0]T. Taking the extrinsic parameters (6 additional degrees of freedom) into account, the projection matrix P has the general form:

P = K Pproj D   (111)
where Pproj depends on the type of projection. For finite cameras:

Pproj = Pfinite = [ 1 0 0 0 ]
                  [ 0 1 0 0 ]
                  [ 0 0 1 0 ]   (112)

and for infinite cameras:

Pproj = Pinfinite = [ 1 0 0 0 ]
                    [ 0 1 0 0 ]
                    [ 0 0 0 1 ]   (113)

taking into account that the image center is undefined for this type of projection and is hence replaced by zeros in the matrix K. Since finite cameras are of interest in this dissertation, we next describe how the projection matrix is computed and provide an anatomy of this matrix.
B. Anatomy of the Projection Matrix
A general projective camera may be written as P = [M|p4], where M is the 3×3 matrix composed of the first three columns of P and p4 is the fourth column. M is important because it determines whether P describes a finite or an infinite camera: the camera is finite if M is non-singular, and infinite otherwise [34].
A- Camera Center: The camera center C is the one-dimensional right null space of P:

PC = 0   (114)

For finite cameras, where M is a non-singular matrix:

C = [ −M−1p4 ]
    [    1    ]   (115)

For infinite cameras:

C = [ d ]
    [ 0 ] ,   Md = 0   (116)
B- Column Vectors of P: The column vectors p1, p2, p3, and p4 of the projection matrix P have the following geometric meaning:

• p1, p2, and p3 are the vanishing points of the world coordinate axes Xw, Yw, and Zw, respectively. For example, the X-axis has direction Dx = (1, 0, 0, 0)T, which is imaged at p1 = PDx. See Figure 68.

• p4 is the image of the world origin Ow = (0, 0, 0, 1)T.
C- Row Vectors of P: The row vectors of P can be interpreted as follows (see Figure 69):

• P1T is the y-axis plane, since the image of any point in this plane is (0, y, w)T.

• P2T is the x-axis plane, since the image of any point in this plane is (x, 0, w)T.

• P3T is the principal plane: the plane through the camera center parallel to the image plane. A point X lies on the principal plane if and only if P3T X = 0. In fact, the principal plane consists of the set of points X that are imaged on the line at infinity; explicitly, PX = (x, y, 0)T.

FIGURE 68 – The geometrical interpretation of the projection matrix columns.

FIGURE 69 – The geometrical interpretation of the projection matrix rows.
D- The Principal Point: The principal point is computed as:

x0 = M m3   (117)

where m3T is the third row of M.
E- The Optical Ray: The optical ray is the set of all points X that project to the image point x; it can be parameterized as:

X(λ) = [ M−1(λx − p4) ]
       [       1       ]   (118)
CURRICULUM VITA
NAME: Ahmed Hamad Mohamed Eid
ADDRESS: Electrical and Computer Engineering Department, University of Louisville, Louisville, KY 40292
EDUCATION:
• M.Sc. in Electrical Communications Engineering, 1999, Mansoura University.
• B.Sc. in Electronics Engineering, 1994, Mansoura University.
TEACHING:
• Teaching Assistant for the following courses at the University of Louisville: Digital signal processing, Image processing, Pattern recognition, Random variables and stochastic processes, Computer vision.
• Teaching Assistant for the following courses at Mansoura University, Egypt: Electronic circuits I, II, III and IV, Instrumentations and Measurements, Microprocessor Design, Applied Statistics.
PREVIOUS RESEARCH:• Vision-guided autonomous refueling systems.• Vision-based 3-D modeling of the human jaw.• Design and analysis of electronic circuits.
AWARDS AND SCHOLARSHIPS:
• Who's Who Among Students of American Universities and Colleges, 2004.
• Graduate Research Assistant, CVIP Lab, University of Louisville, 2003-Present.
• SGI Award for Excellence in Computational Sciences and Visualization (sponsored by Silicon Graphics Inc.), Speed School of Engineering, University of Louisville, 2003.
• Graduate Teaching Assistant, Department of Electrical and Computer Engineering, University of Louisville, 2000-2003.

PUBLICATIONS:

JOURNALS
[1] A. Eid and A. Farag, "On the performance evaluation of 3-D reconstruction techniques from a sequence of images," EURASIP Journal on Applied Signal Processing, to appear 2005.
[2] A. Farag and A. Eid, "Video reconstructions in dentistry," Orthod. Craniofacial Res. 6 (Suppl. 1), pp. 108-116, Aug. 2003.
CONFERENCES
[1] A. Farag and A. Eid, "Local quality assessment of 3-D reconstructions from sequence of images: a quantitative approach," Advanced Concepts for Intelligent Vision Systems (ACIVS'04), Brussels, Belgium, Aug. 31-Sep. 3, 2004, pp. 161-168.
[2] A. Eid and A. A. Farag, "On the fusion of 3-D reconstruction techniques," Proceedings of Seventh International Conference on Information Fusion (IF'04), Stockholm, Sweden, June 28-July 1, 2004, pp. 856-861.
[3] A. Eid and A. Farag, "A unified framework for performance evaluation of 3-D reconstruction techniques," IEEE Conference on Computer Vision and Pattern Recognition (CVPR'04), Workshop on Real-time 3-D Sensors and their Use, Washington DC, June 27-July 2, 2004.
[4] A. Eid and A. Farag, "Design of an experimental setup for performance evaluation of 3-D reconstruction techniques from sequence of images," Eighth European Conference on Computer Vision (ECCV'04), Workshop on Applications of Computer Vision, Prague, Czech Republic, May 11-14, 2004, pp. 69-77.
[5] A. Eid and A. Farag, "On the performance characterization of stereo and space carving," Proceedings of Advanced Concepts for Intelligent Vision Systems (ACIVS'03), Ghent, Belgium, Sep. 2-5, 2003, pp. 291-296.
[6] A. Farag, E. Dizdarevic, A. Eid, and A. Lorincz, "Monocular, vision based, autonomous refueling system," Proceedings of Sixth IEEE Workshop on Application of Computer Vision (WACV'02), Orlando, FL, Dec. 3-4, 2002, pp. 309-313.
[7] A. Eid, S. Rashad and A. Farag, "A general purpose platform for 3D reconstruction from sequence of images," Proceedings of Fifth International Conference on Information Fusion (IF'02), vol. I, Annapolis, MD, July 7-11, 2002, pp. 425-413.
[8] A. Eid, S. Rashad and A. Farag, "Validation of 3D reconstruction from sequence of images," Proceedings of the International Conference on Signal Processing, Pattern Recognition, and Applications (SSPRA'02), Crete, Greece, June 25-28, 2002, pp. 375-380.
[9] M. Ahmed, A. Eid, A. Farag, "3D reconstruction of the human jaw: a new approach and improvements," International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI'01), Netherlands, Oct. 2001, pp. 1007-1014.
[10] M. Ahmed, A. Eid, and A. Farag, "3-D reconstruction of the human jaw using space carving," IEEE International Conference on Image Processing (ICIP'2001), vol. II, Greece, Oct. 2001, pp. 323-326.
[11] H. Soliman, A. Hamad (Eid), and N. Hamdy, "A video-speed switched resistor A/D converter architecture," Proc. of the 43rd IEEE Midwest Symposium on Circuits and Systems, Michigan, Aug. 2000.
[12] N. Hamdy, H. Soliman and A. Eid, "A vertical successive approximation A/D converter architecture for high-speed applications," Proc. of the 41st Midwest Symposium on Circuits and Systems, Notre Dame, Aug. 1998.