Incremental 3D Reconstruction Using Stereo Image Sequences

by

Tai Jing Moyung

A thesis
presented to the University of Waterloo
in fulfilment of the
thesis requirement for the degree of
Master of Applied Science
in
Systems Design Engineering

Waterloo, Ontario, Canada, 2000

© Tai Jing Moyung 2000
I hereby declare that I am the sole author of this thesis.

I authorize the University of Waterloo to lend this thesis to other institutions or individuals for the purpose of scholarly research.

I further authorize the University of Waterloo to reproduce this thesis by photocopying or by other means, in total or in part, at the request of other institutions or individuals for the purpose of scholarly research.
The University of Waterloo requires the signatures of all persons using or photocopying this thesis. Please sign below, and give address and date.
Abstract

In the last two decades, a tremendous amount of research has been done in the area of reconstructing three-dimensional objects from two-dimensional camera images. One major challenge of the reconstruction problem is to find feature correspondences, that is, to locate the projections of the same three-dimensional geometrical or textural feature on two or more images. Classical approaches to reconstruction focus on estimating structure either from stereo image pairs or from monocular image sequences. Limitations in both of these approaches have motivated a growing interest in computing structure from stereo image sequences; however, most existing techniques in this area assume that feature correspondences are established in a previous step, or they use domain-specific assumptions that are inappropriate in other applications.

In this thesis, I present a robust, incremental 3D reconstruction algorithm using stereo image sequences. The proposed method addresses the problem of establishing accurate feature correspondences. Furthermore, the algorithm develops an incrementally dense representation of the reconstructed object through a bootstrap feature matching process. We are specifically interested in the application of this approach in the space context, for such purposes as satellite identification, grasping, docking and fault diagnosis. Results demonstrating the potential of this approach are presented, conclusions are drawn, and possibilities for future work are discussed.
Acknowledgements

First and foremost, I would like to thank my supervisor, Dr. Paul Fieguth, for his support and encouragement for the past two years. I am greatly indebted to him for the many occasions on which he has gone out of his way to assist in the timely completion of this thesis. Dr. Fieguth has been an ongoing inspiration for me not only in terms of academic endeavors, but his charismatic personality has also made my graduate studies a much more interesting and rewarding experience. As the first student to complete a Master's degree under his full supervision from start to finish, I hope this thesis has not disappointed him in any way.

I would like to thank the Natural Sciences and Engineering Research Council (NSERC) of Canada for funding the research in this thesis through a Postgraduate Scholarship (PGSA), and MacDonald Dettwiler Space and Advanced Robotics Inc. for motivating my research and providing test data. I would also like to thank my readers, Dr. Carolyn MacGregor and Dr. Medhat Moussa, for their valuable comments on the thesis. Furthermore, I thank Dr. Ed Jernigan and all the other members of the Vision and Image Processing group for the eye-opening discussions on various research interests, and especially for the engaging conversations during my procrastination in the day and the company during my productive hours late at night in the lab.

Many other people have contributed to the completion of this thesis in much more intangible ways. Special thanks go to Daniel, Carey, Emily, Amy and Nat, who had the "privilege" to hear a lot of my whining and sighing, and of course, fellow Exec Committee members of UWCCF, who patiently heard my complaint that "I have been very unproductive" in every weekly prayer meeting. The girls from Eric Hamber and undergrad days, as well as other brothers and sisters from CCF, also deserve my gratitude.

I owe the most to my family for their love and support, especially to my "baby" nephew Kester, who has given me many laughable moments and taught me how to be curious once again.

Finally, I thank God for delivering me through many difficult times and giving me countless blessings, most important of all, the new life that He has given me through the Lord Jesus Christ.
Contents

1 Introduction 1
  1.1 3D Reconstruction 2
  1.2 Thesis Overview 3

2 Background 7
  2.1 The Camera Model 7
  2.2 3D Reconstruction 11
  2.3 Feature Extraction 12
  2.4 Structure From Stereo 13
    2.4.1 Stereo Geometry 14
    2.4.2 Reconstruction by Triangulation 16
    2.4.3 Epipolar Constraint 17
    2.4.4 Stereo Matching Techniques 19
    2.4.5 Advantages and Disadvantages 21
  2.5 Structure From Motion 21
    2.5.1 Motion Model 22
    2.5.2 Motion and Structure From Optical Flow 24
    2.5.3 Motion and Structure From Point Features 27
    2.5.4 Long Image Sequences 28
    2.5.5 Advantages and Disadvantages 29
  2.6 Structure from Stereo Image Sequences 30
    2.6.1 Assumed Feature Correspondences 30
    2.6.2 Direct Estimation or Inference 31
    2.6.3 Constrained Matching 32
    2.6.4 Summary 35

3 Incremental 3D Reconstruction 36
  3.1 Problem Definition 36
  3.2 Overview of the Incremental Reconstruction Algorithm 38
  3.3 Two Dimensional Feature Tracking 40
    3.3.1 Motion and Measurement Models 42
    3.3.2 Prediction and Update 43
    3.3.3 Model Priors 45
    3.3.4 Relation to Stereo Matching and Motion Estimation 45
  3.4 Three Dimensional Feature Tracking 46
    3.4.1 Motion and Measurement Models 47
    3.4.2 Prediction and Update 48
    3.4.3 Model Priors 49
  3.5 Multiple Hypothesis Tracking and Stereo Matching 50
    3.5.1 Hypothesis Generation 51
    3.5.2 Hypothesis Management 53

4 Simulations 57
  4.1 Description of Data 57
  4.2 Two Dimensional Tracking 59
  4.3 Three Dimensional Tracking 60
  4.4 Incremental Reconstruction 63

5 Extensions For Real Image Processing 72
  5.1 Motion Estimation 72
    5.1.1 Least Squares Estimation 73
    5.1.2 Assessing Estimate Accuracy 75
    5.1.3 Modification to 3D Dynamic Model 75
  5.2 Results of Incorporating Motion Estimation 77
  5.3 Adding New Features 80
  5.4 Real Image Sequence 83
  5.5 Conclusion 90

6 Conclusions 91
  6.1 Thesis Achievements 91
  6.2 Limitations and Future Work 92

A Camera Parameters For Simulations and Experiments 97

B Weighted Least Squares Estimation of 3D Motion 100

Bibliography 104
List of Tables

1.1 Different 3D reconstruction methods and their properties [1]. 2
A.1 Parameters of synthetic camera system used in the simulation experiments. 98
A.2 Parameters of Pulnix CCD cameras used in the real image experiment. 99
List of Figures

1.1 Two successive images of a robotic arm retrieving a micro-satellite in space for docking. 4
2.1 The pinhole camera. 8
2.2 Camera and image coordinate systems. 9
2.3 Parameters that relate the world, camera, and image coordinate systems. 9
2.4 The relationship between the world and camera coordinate systems. 9
2.5 3D reconstruction from multiple 2D intensity images. 11
2.6 A typical stereo camera configuration used for capturing stereo images. 14
2.7 3D reconstruction by triangulation. 15
2.8 Epipolar geometry. 17
2.9 Motion field of a moving plane. 23
2.10 The aperture problem. 26
2.11 The four-frame model for stereo image sequence processing. 33
3.1 The iterative process between motion estimation and reconstruction. 39
3.2 Flowchart of the incremental reconstruction algorithm. 41
3.3 Constraints in 2D feature tracking. 41
3.4 Constraints in 3D feature tracking. 46
3.5 Deferral of matching decisions by multiple hypothesis tracking. 51
3.6 Outline of the multiple hypothesis tracking algorithm. 52
3.7 An example situation in which redundant stereo hypotheses are created. 54
4.1 Synthetic satellite model used for simulations. 58
4.2 Sample synthetic data points. 59
4.3 Demonstration of multiple hypothesis two-dimensional tracking (1). 61
4.4 Demonstration of multiple hypothesis two-dimensional tracking (2). 62
4.5 Demonstration of multiple hypothesis three-dimensional tracking (1). 64
4.6 Demonstration of multiple hypothesis three-dimensional tracking (2). 65
4.7 3D points reconstructed using only 2D feature tracking and no measurement noise. 66
4.8 Summary of results for 2D tracking alone and no measurement noise. 67
4.9 3D points reconstructed using only 2D feature tracking and measurement noise $\sigma$. 68
4.10 Summary of results for 2D tracking alone and noise $\sigma$. 69
4.11 3D points reconstructed using only 3D feature tracking and measurement noise $\sigma$. 70
4.12 Summary of results for 3D tracking alone and noise $\sigma$. 71
5.1 Shape of 3D point estimate uncertainty at two different depths. 74
5.2 An illustration of how feature tracking is affected by the uncertainty of predictions. 78
5.3 Results of motion estimation. 79
5.4 3D points reconstructed using combined 2D and 3D feature tracking and measurement noise $\sigma$. 81
5.5 Comparison of reconstruction results between using and not using 3D motion estimation. 82
5.6 The set-up for capturing a real stereo image sequence. 85
5.7 Sample stereo image pairs in the real image sequence. 86
5.8 Results of reconstruction using real image sequence with replenishing features. 88
5.9 Summary of results for the real image sequence. 89
Chapter 1
Introduction
In very broad terms, human vision usually refers to both the sensory and perceptual processes associated with what we normally call "seeing." Similarly, computer vision is a very broad field of research that is intended for helping computers and robots "see." It includes a set of computational techniques aimed at estimating or making explicit the geometric and dynamic properties of the three-dimensional world from digital images [1, 2]. With the advances of digital camera and imaging technology, computer vision is playing an increasingly important role in automating tasks that involve visual sensory input. Some examples include industrial assembly and inspection, robot obstacle detection and path planning, autonomous vehicle navigation of unfamiliar environments, image based object modelling, surveillance and security, medical image analysis, and human-computer interaction through gesture and face recognition.
As we live in an age of information and space exploration, the demand for satellite and other space related technology has led to a rapid growth of the aerospace industry, in which computer vision has also found its place. Some examples include autonomous precision landing, surveying, loading and unloading equipment, and satellite servicing and repair [3]. One application in which MacDonald Dettwiler Space and Advanced Robotics Ltd.¹ has interest is the use of computer vision to guide the retrieval and docking of micro-satellites or other space modules with spacecrafts. Cameras on board the spacecraft provide the necessary visual feedback. The use

¹MD Robotics, or simply MDR, is a wholly owned subsidiary of MacDonald Dettwiler and Associates Ltd. Its facilities are located in Brampton, Ontario, Canada.
Method          Number of Images        Type
Stereo          2 or more               passive
Motion          2 or more in sequence   active/passive
Focus/defocus   2 or more               active
Zoom            2 or more               active
Contours        single                  passive
Texture         single                  passive
Shading         single                  passive

Table 1.1: Different 3D reconstruction methods and their properties [1].
of computer vision may assist human operators in this task and improve precision control. This application serves as a specific motivation for the work in this thesis.
1.1 3D Reconstruction
In many of the aforementioned applications, one of the necessary computer vision tasks is the recovery of three-dimensional structure from two-dimensional digital camera images. During the image formation process of the camera, explicit 3D information about the scene or objects in the scene is lost. Therefore, 3D structure or depth information has to be inferred implicitly from the 2D intensity images. This problem is commonly referred to as 3D reconstruction.
The established methods for recovering 3D structure differ in terms of the cues that they exploit, the number of images required, and whether the methods are active or passive [1]. Active methods are those in which the parameters of the vision system are modified purposively for 3D reconstruction. Table 1.1 lists the commonly known methods and their classification. Among these methods, structure-from-stereo [4, 5, 6] and structure-from-motion [7, 8, 9] take advantage of additional information provided by more than one image and do not require special hardware such as a motorized lens. As a result, they are very popular approaches for recovering 3D structure from digital images.
Structure From Stereo

Structure-from-stereo uses camera images that are taken from different viewpoints. For classic binocular stereo, a single pair of images of the same object or scene is taken simultaneously by two cameras located at two different spatial locations, and sometimes with different orientations as well. 3D structure is recovered in a way analogous to human stereopsis. Computational techniques use the location offset of the content between the two images to perceive depth. However, the search for the corresponding elements in the two images remains a challenging and unsolved problem.
Structure From Motion

Structure-from-motion uses a monocular sequence of images that are sampled in time. Over the course of the sequence, either the camera, the scene, or both the camera and the scene undergo some form of motion. Biological visual systems use visual motion to infer properties of the three-dimensional world [1]. In a similar manner, the analysis of the apparent motion of objects in digital images provides a strong visual cue for recovering structure. Although 3D reconstruction from motion is conceptually similar to that from stereo, the computational techniques are very different because of the different properties possessed by the available images in each method. One drawback of using motion is that the estimated structure is only exact up to a scale factor, and any noise involved in the process has a significant impact on the accuracy of the reconstruction.
Combination of Stereo and Motion

More recently, many researchers have turned their attention to using stereo image sequences to recover 3D information [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]. Multiple stereo pairs of images that are closely sampled in time are captured, which provide both stereo and visual motion cues for understanding structure. Furthermore, stereo and motion complement each other in a particular fashion when they are integrated into a single reconstruction system. The results from past work show that the use of stereo image sequences is a promising direction to pursue, but existing methods approach the problem from quite different directions, each addressing a particular aspect of the reconstruction problem without much consideration of the other aspects.
1.2 Thesis Overview

This thesis is primarily interested in the problem of recovering 3D information about a rigid object in a scene from digital camera images; it builds on the work of many past efforts to solve the problem of 3D reconstruction using stereo image sequences. The work in this thesis is motivated by three major observations:
Figure 1.1: Two successive images of a robotic arm retrieving a micro-satellite in space for docking. Note the sharp shadows and differences in lighting conditions.
1. Despite the large amount of previous work on the topic, the problem of 3D reconstruction is still unsolved.

2. It has been observed that for harsh environments such as outer space, extreme lighting conditions, a large dynamic range of image brightness, specular reflection and hard shadows exist (see Figure 1.1). Under these circumstances, feature extraction may not always generate reliable results; therefore the question arises of how one can still perform 3D reconstruction well in this case.

3. Past work on using a stereo image sequence to perform 3D reconstruction has addressed specific aspects of the problem. However, there is a lack of a unified framework for integrating the different results.
For these reasons, this thesis focuses on developing an algorithmic framework for reconstructing 3D points using stereo image sequences with the following goals:

- The algorithm builds an incrementally accurate and dense representation of the reconstructed object using 3D feature points.

- Only stereo geometry and motion constraints will be used, hence minimizing the method's reliance on any information about the extracted feature points, which may sometimes be inconsistent from frame to frame or unreliable.

- The framework makes provisions for addressing possible problems in feature extraction, such as missed detections and false features.

- Although this thesis is by no means attempting to solve the problem of 3D reconstruction in space, the original motivation for examining 3D reconstruction was MDR's aerospace application. Therefore, some specifics with respect to this problem will be considered in the development of the algorithm.
The remainder of this thesis is organised as follows:

- Chapter 2 provides a literature review of existing research on the 3D reconstruction problem using stereo and/or motion information. The overall strengths and weaknesses of established methods will be identified. It also introduces some of the mathematical notation that will be used throughout this thesis.

- Chapter 3 defines the problem that this thesis is trying to tackle in a qualitative manner. The major characteristics of, and the approach taken by, this research are discussed and justified, followed by a description of the basic theoretical development of the proposed algorithm.

- Simulation results based on a mock-up satellite model are presented in Chapter 4, showing the application of the theory described in Chapter 3. The limitations of the performance are determined and explained.

- Chapter 5 discusses two modifications/extensions to the work in Chapter 3 that are specifically important for experimentation on real image sequences. The results of integrating these extensions are also presented in experiments on both synthetic and real image sequences.

- Finally, the contributions of this thesis are summarised in Chapter 6, along with a list of future research recommendations.
Chapter 2
Background
In this chapter, the imaging model and the problem of 3D reconstruction are defined. A survey of three existing approaches to solve this problem will be provided: structure from stereo (Section 2.4), structure from motion (Section 2.5), and structure from stereo and motion (Section 2.6). The limitations and drawbacks of these techniques will be discussed.
2.1 The Camera Model
A simple geometric model describing the image formation process of a camera is the pinhole camera model [1, 2]. As shown in Figure 2.1, the camera is represented as a small hole through which light travels; an intensity image of an object is formed on the camera's image plane through perspective projection. In order to determine how three-dimensional objects in the world appear in two-dimensional camera images geometrically, we need to define three different coordinate systems in which to represent these objects: the world coordinate system (WCS), the camera coordinate system (CCS) and the image coordinate system (ICS).

The WCS is a fixed, three-dimensional frame of reference for representing three-dimensional objects and scenes in the world. It is defined by the orthogonal $(X_w, Y_w, Z_w)$ axes and an origin $O_w$. The CCS is another three-dimensional coordinate system, but it corresponds to the camera's location and orientation. As shown in Figure 2.2, the CCS is defined by the $(X_c, Y_c, Z_c)$ axes;
Figure 2.1: The pinhole camera model. Typically, the perspective projection is defined with respect to the image plane, which is separated from the pinhole by a distance of $f$, the focal length of the camera lens.
its origin, $O_c$, corresponds to the camera's optical centre (the pinhole). The $Z_c$ axis, orthogonal to the image plane, is referred to as the optical axis. The ICS is a frame of reference for the pixel coordinates of a two-dimensional camera image; it is defined by the $(x, y)$ axes, with its origin located at the top left corner of the image. The intersection of the camera's optical axis with the image plane is called the principal point, $(x_0, y_0)$, which is expressed in terms of the ICS.
The three coordinate systems described above are related to each other by two sets of parameters, intrinsic and extrinsic, as depicted in Figure 2.3. Extrinsic parameters consist of a $3 \times 3$ orthogonal rotation matrix $R_c$ and a translation vector $\mathbf{t}_c$, which describe the location and orientation of the CCS with respect to the WCS. Figure 2.4 illustrates the relationship between these two coordinate systems.
Let $\mathbf{p}_w = [X_w\ Y_w\ Z_w]^T$ be a 3D point expressed in the WCS; then its coordinates in the CCS, $\mathbf{p}_c = [X_c\ Y_c\ Z_c]^T$, are

$$\mathbf{p}_c = R_c \mathbf{p}_w + \mathbf{t}_c.$$
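The WCS-to-CCS mapping above is a simple rigid transform that can be sketched in a few lines. The rotation (a 90° turn about the $Y$ axis) and the translation below are hypothetical illustrative values, not calibrated extrinsics:

```python
import numpy as np

# Extrinsic parameters: a rotation about the Y axis and a translation.
# These values are illustrative only, not from any calibrated camera.
theta = np.pi / 2
R_c = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                [0.0,           1.0, 0.0],
                [-np.sin(theta), 0.0, np.cos(theta)]])
t_c = np.array([0.0, 0.0, 5.0])

def world_to_camera(p_w):
    """Map a 3D point from world coordinates (WCS) to camera coordinates (CCS)."""
    return R_c @ p_w + t_c

p_w = np.array([1.0, 2.0, 0.0])
p_c = world_to_camera(p_w)   # [0., 2., 4.] for this choice of R_c and t_c
```

Because $R_c$ is orthogonal, the transform is invertible: $\mathbf{p}_w = R_c^{-1}(\mathbf{p}_c - \mathbf{t}_c)$, which is used again in Section 2.4.1.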
Intrinsic parameters, determined by the optical and digital sensor properties of a camera, describe the perspective projection of a three-dimensional point onto a two-dimensional image. Let $\mathbf{z} = [x\ y]^T$ be the pixel coordinates in the ICS of a point as it appears on an image; its location in terms
Figure 2.2: Camera and image coordinate reference frames, defined by the $(X_c, Y_c, Z_c)$ and $(x, y)$ axes respectively.

Figure 2.3: Parameters that relate the world, camera, and image coordinate systems.

Figure 2.4: The relationship between the world and camera coordinate systems.
of the point's coordinates in the CCS is

$$x = \frac{f}{s_x}\,\frac{X_c}{Z_c} + x_0, \qquad y = \frac{f}{s_y}\,\frac{Y_c}{Z_c} + y_0 \qquad (2.1)$$

where

- $f$ is the focal length of the camera lens (mm),
- $s_x$ and $s_y$ are the effective pixel width and height of the camera (mm), and
- $(x_0, y_0)$ is the principal point of the camera.
Alternatively, if we define a projective matrix

$$P = \begin{bmatrix} f/s_x & 0 & x_0 \\ 0 & f/s_y & y_0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (2.2)$$

where

$$\begin{bmatrix} \tilde{x} & \tilde{y} & \tilde{z} \end{bmatrix}^T = P \begin{bmatrix} X_c & Y_c & Z_c \end{bmatrix}^T,$$

then

$$x = \frac{\tilde{x}}{\tilde{z}}, \qquad y = \frac{\tilde{y}}{\tilde{z}}.$$
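The two formulations of the projection, the direct form (2.1) and the matrix form (2.2), can be checked against each other numerically. The intrinsic values below are made-up placeholders, not the calibrated parameters listed later in Appendix A:

```python
import numpy as np

# Hypothetical intrinsics: focal length and pixel sizes in mm,
# principal point in pixels. Illustrative values only.
f, s_x, s_y = 8.0, 0.01, 0.01
x0, y0 = 320.0, 240.0

P = np.array([[f / s_x, 0.0,     x0],
              [0.0,     f / s_y, y0],
              [0.0,     0.0,     1.0]])

def project(p_c):
    """Perspective projection of a CCS point to pixel coordinates, eq. (2.1)/(2.2)."""
    xt, yt, zt = P @ p_c          # homogeneous image coordinates
    return np.array([xt / zt, yt / zt])

z = project(np.array([0.1, -0.05, 2.0]))  # -> [360., 220.]
```

Note that the division by $\tilde{z} = Z_c$ is exactly the non-linearity discussed below: two CCS points on the same ray through the optical centre project to the same pixel.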
As one can see from (2.1), camera projection reduces a three-dimensional representation of the world to two dimensions, losing all depth information. Furthermore, perspective projection is a non-linear process, which complicates the development of computer vision algorithms designed to reverse the process to recover the missing dimension. Several approximations and simplifications are commonly used by the research community to address this problem. These include orthographic [21], weak perspective [22], and paraperspective [23] projections. Without loss of generality, this thesis will use the full perspective model for camera projection.
Figure 2.5: 3D reconstruction from multiple 2D images. $O_1$ and $O_2$ are the optical centres of the cameras capturing image 1 and image 2 respectively.
2.2 3D Reconstruction
3D reconstruction is the problem of recovering depth information from intensity images. One common approach to 3D reconstruction uses multiple images. It is based on the principle that a physical point in space is projected onto different locations on images captured from different viewpoints, as shown in Figure 2.5. The difference in the projected locations is used to infer depth information.
Specifically, consider a rigid object represented by a set of $N$ 3D points, $\{\mathbf{p}_i(t)\}$, in some coordinate system at frame $t$. Each point $\mathbf{p}_i(t)$ is projected onto an image, $I_v(t)$, which is captured from a viewpoint $v$. Then $\mathbf{z}_i^v(t)$, the point's coordinates in the ICS, can be expressed in terms of a vector-valued non-linear function $h$:

$$\mathbf{z}_i^v(t) = \begin{bmatrix} x \\ y \end{bmatrix} = h(v, \mathbf{p}_i(t)). \qquad (2.3)$$

Note that $h$ is a simplification of the perspective equations in (2.1) for a specified camera.
Given a set of images that are taken from different viewpoints $v$ (structure-from-stereo) or at different time frames $t$ (structure-from-motion), we may be able to reconstruct the points $\{\mathbf{p}_i(t)\}$ from a complete set of their projections $\{\mathbf{z}_i^v(t)\}$. Details on how this is accomplished in the two cases will be discussed in Sections 2.4 and 2.5 respectively.
There are two computational subproblems associated with 3D reconstruction from two or more images:

1. Feature correspondence,
2. Structure estimation.
The first problem is best explained with an example. Suppose a physical 3D point is projected onto image A as point 1 and onto image B as point 2. Points 1 and 2 are said to be correspondences. Hence, the feature correspondence or feature matching problem is to find where point 2 is on image B given the location of point 1 on image A. Human vision is superb at solving this problem, but the automation of this process by computers is rather difficult. It essentially requires a search over the whole of image B. Applying proper constraints can narrow down the search, but without sufficient constraints, the problem becomes ill-posed and ambiguities arise.
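As one hedged illustration of such a constrained search (not the matching method developed later in this thesis), a small intensity window around point 1 can be compared against candidate windows in image B within a limited search region, scored by normalized cross-correlation:

```python
import numpy as np

def ncc(w1, w2):
    """Normalized cross-correlation between two equal-sized image windows."""
    a = w1 - w1.mean()
    b = w2 - w2.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def match_point(img_a, img_b, pt, half=3, search=10):
    """Best match in img_b for the window around pt=(row, col) in img_a."""
    r, c = pt
    template = img_a[r - half:r + half + 1, c - half:c + half + 1]
    best, best_pt = -np.inf, None
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, cc = r + dr, c + dc
            if (rr - half < 0 or cc - half < 0 or
                    rr + half + 1 > img_b.shape[0] or cc + half + 1 > img_b.shape[1]):
                continue  # candidate window would fall outside image B
            window = img_b[rr - half:rr + half + 1, cc - half:cc + half + 1]
            score = ncc(template, window)
            if score > best:
                best, best_pt = score, (rr, cc)
    return best_pt, best
```

The `search` parameter is exactly the kind of constraint the text describes: without it (or an epipolar constraint, Section 2.4.3), every pixel of image B is a candidate and ambiguous matches become far more likely.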
The second problem, structure estimation, is relatively easy in comparison. It is the computation of the point set $\{\mathbf{p}_i\}$ after the correspondence problem is solved. The difficulty of this subproblem depends on the amount of a priori information available. If the intrinsic and extrinsic parameters of the camera(s) are known for the whole set of images, then an exact reconstruction in absolute coordinates is possible. However, the accuracy of the reconstructed structure is sensitive to the accuracy of these parameters. Moreover, any errors in solving the correspondence problem between two images also affect the accuracy of the reconstruction. As a consequence, even if intrinsic and extrinsic parameters are known, the challenge remains to develop algorithms that reduce the effects of errors in the preprocessing steps on the structure estimate.
2.3 Feature Extraction

In the previous section, we have used the term feature loosely without giving it a precise definition. In some 3D reconstruction applications, it may be necessary to estimate the structure of a scene for every point in the image. However, sometimes we may only be interested in reconstructing the depth of an object or scene at certain parts. Image features usually refer to parts of an image that have special properties, and they may correspond to parts of an object or scene that have structural significance, to regions that have visually identifiable textures or intensity patterns, or to any other derived properties that can be localised on an image. Some common examples are edges, lines, corners, junctions, ellipses, and zero-crossings of image gradients.
Feature extraction is the process of locating these particular elements on an image, and it is an intermediate step for many computer vision applications. The choice of features to extract for reconstruction very often depends on the properties of the objects in the scene. Some important factors to consider are invariance, ease of detectability, and how they are eventually used. For our purposes, we will concentrate on point features, features that can be localised in two dimensions. Point features are easy to represent mathematically and they can directly correspond to three-dimensional points in space. Many features that can be localised to a point are usually easy to detect, and are relatively consistent across different images compared to other features such as edges and lines. There are many mathematical definitions of localised image structures and sometimes they are broadly identified as corners. The literature on corner detection is immense and a few examples are [24, 25].
Feature extraction is not a focus of this thesis, so we will avoid discussing the details in this section. We use a publicly available implementation of the work by Tomasi and Kanade [26], which is based on the earlier work of [27], to extract corner features in our experiments. For all other discussions, we assume that image point features are readily available from a separate, pre-processing step. Sometimes we will also refer to features on an object; these are visible geometrical or textural elements on a three-dimensional object, even if the feature extractor may not be able to detect them.
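For context, the Tomasi-Kanade detector cited above ranks pixels by the minimum eigenvalue of the local structure tensor of image gradients, so that only points with strong gradients in two independent directions (corners, not edges) are selected. The sketch below illustrates that cornerness measure on a synthetic square; the window size and test image are illustrative choices, not the settings used in the thesis's experiments:

```python
import numpy as np

def min_eigen_cornerness(img, half=1):
    """Minimum eigenvalue of the 2x2 structure tensor at each interior pixel."""
    iy, ix = np.gradient(img.astype(float))   # row (y) and column (x) gradients
    h, w = img.shape
    out = np.zeros((h, w))
    for r in range(half, h - half):
        for c in range(half, w - half):
            gx = ix[r - half:r + half + 1, c - half:c + half + 1]
            gy = iy[r - half:r + half + 1, c - half:c + half + 1]
            a, b, d = (gx * gx).sum(), (gx * gy).sum(), (gy * gy).sum()
            # smaller eigenvalue of the symmetric matrix [[a, b], [b, d]]
            out[r, c] = 0.5 * (a + d) - np.sqrt(0.25 * (a - d) ** 2 + b * b)
    return out

# A white square on black: its corners should score higher than its
# straight edges, and the flat interior should score near zero.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
score = min_eigen_cornerness(img)
```

On this test image, `score` is large at the square's corners, roughly zero along its edges (gradient in only one direction), and zero in flat regions, which is exactly the behaviour a corner detector needs.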
2.4 Structure From Stereo
The use of stereopsis for depth perception in human vision is a well known phenomenon. Structure-from-stereo simply refers to the class of computer vision algorithms that applies the same principle for inferring depth information from images taken from different viewpoints. A typical binocular stereo camera system is illustrated in Figure 2.6. In summary, the two cameras are mounted such that their optical axes (the $Z$-axes) are coplanar and aligned in parallel. The separation between the optical centres of the left and right cameras is called the baseline, and it is usually created by a translation between the cameras' optical centres along their common $X$-axis.

Figure 2.6: A typical stereo camera configuration used for capturing stereo images.

The left and right cameras in the stereo system capture a pair of images, $\{I_L(t), I_R(t)\}$, simultaneously or separately when no changes have occurred in the scene between the acquisition of the two images. In stereo vision, the difference in the projected positions of a point on the left and right images is referred to as the disparity, and the collection of disparity values for a whole image is known as the disparity map.
2.4.1 Stereo Geometry

One of the advantages of the structure-from-stereo approach to 3D reconstruction is that the geometrical relationship between I_L(f) and I_R(f) is already known due to the fixed configuration of most stereo systems. If both the intrinsic and extrinsic parameters of the cameras are pre-determined by camera calibration [1], the problem of structure estimation can be solved using a simple procedure known as triangulation [1]. The letters L and R will be used consistently as subscripts or superscripts throughout this thesis to denote notation associated with the left and right cameras respectively. Since only a single stereo image pair is considered in this section, any reference to the frame number f is omitted.

Usually, an object in the scene may be represented with respect to a fixed WCS or with respect to a CCS, in which case the representation would differ from one camera to another if they have different positions and/or orientations. The choice largely depends on the application. For this thesis, the left camera coordinate system will be used consistently, because an object representation relative to the camera is desired for our specific application (see Chapter 1). Therefore we first define the relationship between 3D points expressed in the left CCS and those expressed in the right CCS.
Figure 2.7: 3D reconstruction by triangulation.
Let P_L and P_R be the left and right camera coordinates of the same point P_W in space, and (R_L, T_L) and (R_R, T_R) the extrinsic parameters of the left and right cameras respectively, such that

P_L = R_L P_W + T_L,
P_R = R_R P_W + T_R.

Then P_R can be expressed in terms of P_L:

P_R = R P_L + T,   (2.4)

where

R = R_R R_L^{-1},   T = -R_R R_L^{-1} T_L + T_R.   (2.5)

Using (2.4), any coordinates in the left CCS can be changed to those in the right CCS and vice versa.
2.4.2 Reconstruction by Triangulation

For reconstruction from stereo images, the problem of feature correspondence is equivalent to finding the set of corresponding projections {p_L^i, p_R^i}. We will assume that this problem has already been solved, i.e., we have {p̂_L, p̂_R}, the estimated projections for the point P_L.

Let v_L and v_R be the three-dimensional vectors expressing the directions of p_L and p_R with respect to the optical centres of the two cameras. As shown in Figure 2.7, the objective of triangulation is to find the intersection between the two vectors extrapolated from v_L and v_R.

Let M_L and M_R be the projective matrices as defined by (2.2). Then by applying the reverse of the projection to the homogeneous coordinates of p̂_L and p̂_R, the two vectors result:

v_L = M_L^{-1} (p̂_L, 1)^T,   v_R = M_R^{-1} (p̂_R, 1)^T.

Due to errors in feature extraction and camera calibration, the extrapolated vectors may not intersect exactly. Consequently, a common and simple method is to estimate P as the midpoint of the segment orthogonal to both v_L and v_R [28].
Let a, b and c be scalar variables. Along with (2.4), the relationships depicted in Figure 2.7 can be expressed in the left CCS as follows:

a v_L - R^T (b v_R - T) = c (v_L × R^T v_R).   (2.6)

Simplifying (2.6) gives

[ v_L   -R^T v_R   -(v_L × R^T v_R) ] (a, b, c)^T = -R^T T.   (2.7)

(a, b, c) is determined by solving the linear system in (2.7). P̂_L, an estimate for P_L, is simply the midpoint of a v_L and R^T (b v_R - T):

P̂_L = ( a v_L + R^T (b v_R - T) ) / 2.

Figure 2.8: Epipolar plane formed by a point P and the optical centres of the left (o_L) and right (o_R) cameras.
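The midpoint construction of (2.6)–(2.7) can be made concrete with a short sketch. This is an illustrative NumPy implementation under the conventions of this section (P_R = R P_L + T, with each ray expressed in its own camera frame); the function name and argument order are ours:

```python
import numpy as np

def triangulate_midpoint(vL, vR, R, T):
    """Midpoint triangulation in the left camera frame.

    Solves a*vL - R^T(b*vR - T) = c*(vL x R^T vR) for (a, b, c), then
    returns the midpoint of the closest points on the two rays.
    Convention: P_R = R @ P_L + T.
    """
    vL = np.asarray(vL, float)
    vR = np.asarray(vR, float)
    w = R.T @ vR                       # right ray direction in the left CCS
    n = np.cross(vL, w)                # direction of the common perpendicular
    A = np.column_stack([vL, -w, -n])
    a, b, c = np.linalg.solve(A, -R.T @ T)
    # Closest point on the left ray, closest point on the right ray, averaged.
    return 0.5 * (a * vL + R.T @ (b * vR - T))
```

With noise-free, intersecting rays the connecting segment degenerates (c = 0) and the exact 3D point is returned; with noisy correspondences the midpoint is a reasonable compromise between the two rays.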
Throughout this discussion, it was assumed that corresponding image projections of P_L are readily available. However, as mentioned previously, feature matching is a difficult problem and needs to be solved prior to reconstruction. Therefore we will now discuss this problem in more detail.
2.4.3 Epipolar Constraint

Let us assume that feature extraction has been performed to obtain two sets of features, one from each of I_L and I_R; the problem of feature matching is to find the corresponding feature in the right image for each detected feature in the left image, or vice versa. Theoretically, every feature in the right image is a potential match candidate for every feature in the left image, making feature matching a large, two-dimensional search problem. In order to solve the problem efficiently, additional constraints have to be introduced. The most basic constraint used in structure-from-stereo approaches is the epipolar constraint [2, 1], which reduces the search problem to one dimension.

As shown in Figure 2.8, a point in space and the optical centres of the left and right cameras form a plane called the epipolar plane. The lines where the epipolar plane intersects the two image planes are the epipolar lines, and the projections of the point must lie on these lines.
Consequently, given the location of any image feature point on the left image, we can narrow the search for the point's correspondence to the epipolar line.

Let n(p_L^i) be the normal vector of the epipolar line of a left image feature p_L^i, where

n(p_L^i) = (A, B, C)^T.

The epipolar constraint implies that the correspondence of p_L^i in the right image must lie on the line represented by n(p_L^i). Mathematically, this means

(p̃_R^i)^T n(p_L^i) = 0.   (2.8)
The location of the epipolar lines on each of the left and right images depends on the geometry of the stereo system and can be found using the intrinsic and extrinsic parameters of the cameras. Using M_L, M_R, R and T as previously defined, let S be an antisymmetric matrix such that S v = T × v for all 3D vectors v, i.e.,

S = [   0   -T_z   T_y
      T_z     0   -T_x
     -T_y   T_x     0  ].

A 3 × 3 matrix, known as the fundamental matrix [29], is defined as

F = M_R^{-T} S R M_L^{-1}

and it satisfies the relationship

(p̃_R^i)^T F p̃_L^i = 0.   (2.9)

Then by comparing the epipolar constraint in (2.8) with (2.9), the epipolar line on the right image can be found, where

n(p_L^i) = F p̃_L^i.
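This construction can be sketched directly, assuming (as an illustrative simplification) that M_L and M_R are invertible 3 × 3 matrices mapping camera coordinates to homogeneous pixel coordinates; the helper names are ours:

```python
import numpy as np

def skew(t):
    """Antisymmetric matrix S with S @ v == np.cross(t, v) for all v."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def fundamental(ML, MR, R, T):
    """F = MR^{-T} S R ML^{-1}, so that p_R~^T F p_L~ = 0 (cf. (2.9))."""
    return np.linalg.inv(MR).T @ skew(T) @ R @ np.linalg.inv(ML)

def epipolar_line(F, pL):
    """Coefficients (A, B, C) of the right-image epipolar line of left point pL."""
    return F @ np.array([pL[0], pL[1], 1.0])
```

Any candidate right-image match (x, y) for a left feature pL can then be screened by checking that A x + B y + C is close to zero, reducing the 2D search to a 1D one.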
In cases where the distribution of features on the images is sparse, the epipolar constraint may be sufficient to uniquely solve the feature matching problem, assuming that the feature extractor reliably detects corresponding points in both images. However, as shown in Figure 2.8, if there is more than one extracted feature point in the right image lying in the proximity of the epipolar line, the depth of the point in question cannot be uniquely determined. Therefore the epipolar constraint alone is not always guaranteed to solve the feature correspondence problem.
2.4.4 Stereo Matching Techniques

In order to further constrain the feature matching problem, a multitude of correspondence algorithms has been proposed in the past two decades [4, 5, 6, 30, 31, 32]. The main goal of most of these efforts is to limit the search space or minimize the number of matching candidates for each feature point. Decisions about matching primitives and strategies are affected by many application-dependent factors such as imaging geometry, lighting conditions, and surface properties.

Local stereo matching methods generally belong to one of two broad categories: area-based and feature-based techniques [33]. Both of these techniques are based on a measure of similarity between a region or feature of interest in one image and that of the other image. In addition to these two local matching methods, phase-based methods constitute a third category of stereo matching techniques [34].
Area-based methods assume that the appearance, that is, the intensity values, of a small neighbourhood of pixels remains constant from one viewpoint to another. Hence, for each pixel in an image, a correlation measure such as cross-correlation or the sum of squared differences [1] can be used to find the corresponding pixel in the other image with a similar-looking neighbourhood. The advantage of this matching technique is that a dense disparity map is achievable, which is an asset for reconstructing the complete surface structure of everything in the cameras' view. However, this technique relies on the images having highly textured regions. Moreover, the choice of the window size for computing the correlation measure has a significant impact on the performance of the algorithm. Very often, the choice of window size is arbitrary, depending on the nature of the scene. Some researchers have tried to tackle this problem by introducing deformable windows [35], adaptive window sizes [36], and multiple windowing [37].
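For a rectified geometry such as Figure 2.6, where epipolar lines coincide with image rows, area-based matching reduces to a one-dimensional search per pixel. A minimal SSD sketch (window size and disparity bound are illustrative parameters, not values used in this thesis):

```python
import numpy as np

def ssd_disparity(left, right, y, x, win=2, dmax=10):
    """For pixel (x, y) in the rectified left image, return the disparity
    d in [0, dmax] minimizing the sum of squared differences between the
    (2*win+1)^2 window around (x, y) and the window around (x - d, y) in
    the right image (x_R <= x_L for positive depth)."""
    ref = left[y - win:y + win + 1, x - win:x + win + 1].astype(float)
    best_d, best_ssd = 0, np.inf
    for d in range(0, min(dmax, x - win) + 1):
        cand = right[y - win:y + win + 1,
                     x - d - win:x - d + win + 1].astype(float)
        ssd = np.sum((ref - cand) ** 2)
        if ssd < best_ssd:
            best_d, best_ssd = d, ssd
    return best_d
```

The `win` parameter is exactly the window-size choice discussed above: too small a window is ambiguous on weakly textured surfaces, while too large a window blurs depth discontinuities.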
Feature-based methods [6, 30, 32] establish correspondence between similar features in a pair of images. Discrete features such as edges, lines, points with high intensity variation, zero-crossings of gradients, or high level structures are extracted from intensity information, and matching is only performed on these features. A distance metric is used to assess the similarity of features between the images. The advantage of this approach is that features are generally more invariant than actual image intensities under large viewpoint variation. However, often only a sparse disparity map is obtained because the number of features in an image that are meaningful or easy to match may be limited. The open questions regarding feature-based matching often relate to what types of features should be extracted for specific applications, and what similarity measures and distance metrics are best for each type of feature. There are many possible combinations, and the complexity of some of the existing methods can seem overwhelming to explore.
Although both area- and feature-based local matching methods provide constraints in addition to the epipolar constraint, ambiguities in feature matching often still exist. Some common assumptions are made to further restrict the size of the search space. For example, it may be possible to set lower and upper bounds on the amount of disparity allowed because the scene has known maximum and minimum depths. Moreover, for a typical stereo camera system such as that in Figure 2.6, the epipolar lines are parallel with the horizontal image axis and stereo disparity is restricted to a horizontal component. In this case, p_R will always have a smaller x coordinate than that of p_L; otherwise the reconstructed point would have a negative depth. Furthermore, the consistency of the local matches can be checked on a global level. Common global constraints are uniqueness [4] and disparity smoothness within a neighbourhood [4] or along contours [5]. These constraints are usually enforced by computational techniques such as relaxation labelling [38], hierarchical optimization [31], or dynamic programming [39].
Phase-based methods [40, 41, 42, 34] for stereo matching use a different approach from the other two local matching methods. These methods use Fourier phase information computed from the images directly for matching. The differences between the left and right Fourier-phase images are used to compute dense disparity maps. The advantage of this type of algorithm is that explicit feature matching is not needed. Some problems that phase-based methods have to tackle are phase discontinuities and unstable phase wrapping [34].
2.4.5 Advantages and Disadvantages

As mentioned previously, in structure-from-stereo the geometry of the camera system is usually known a priori, providing the convenience of the epipolar constraint for finding feature correspondences. In addition, since the baseline in a typical stereo system is large, the results of triangulation are reasonably insensitive to errors in feature extraction. However, for the same reason, geometric distortion, occlusion and differences in specular properties due to varying viewpoints become significant, such that the problem of feature correspondence becomes increasingly difficult. All the methods outlined in Section 2.4.4 for solving this problem tend to be computationally expensive and require certain assumptions about the characteristics of the images which do not always hold in the general case. Moreover, single pairs of stereo images by themselves can only provide partial representations of the scene. View registration of multiple stereo pairs would be required to obtain more complete 3D structure information.
2.5 Structure From Motion

While structure-from-stereo uses images taken from different viewpoints for 3D reconstruction, structure-from-motion uses images taken at different time frames. The difference between the two approaches is that structure-from-motion typically involves a monocular sequence of closely sampled images taken over time, where either the scene or the camera or both have undergone some form of motion over the period of the image sequence. The underlying assumption most commonly asserted for structure-from-motion is that the motion is small enough, or that the images are sampled frequently enough, that the images do not change very much from one frame to the next. In other words, the baseline between the cameras in successive frames is small.

In general, two types of motion may be present in an image sequence: camera motion, and movement of the scene. The latter may involve different objects moving with different kinds of motion, in which case the problem of motion segmentation arises. For the specific problem discussed in this thesis, we will assume that there is only one, rigid, relative motion between the camera and the scene.
2.5.1 Motion Model

As demonstrated in Section 2.4, 3D reconstruction from stereo images relies on knowledge about the relative position and orientation between the left and right cameras. Similarly, an understanding of the motion that induces visual changes in a monocular sequence is useful in estimating structure. Therefore a model describing the motion present in an image sequence is necessary. Since only one camera is used in structure-from-motion, references to the camera position, L and R, will be omitted; instead, the frame number f will be used to distinguish between consecutive image frames.

The motion of a camera relative to an object can generally be described by a translational velocity vector t = (t_x, t_y, t_z)^T and an angular velocity vector ω = (ω_x, ω_y, ω_z)^T, defined with respect to the camera coordinate system. The instantaneous velocity of a point P(f) expressed in the CCS is

Ṗ(f) = -t - ω × P(f),

where

Ẋ = -t_x - ω_y Z + ω_z Y,
Ẏ = -t_y - ω_z X + ω_x Z,   (2.10)
Ż = -t_z - ω_x Y + ω_y X.

Alternately, we can express the coordinates of P(f) at frame f + 1 in terms of a rotation matrix and a translation vector. Let T = (T_x, T_y, T_z)^T represent a translation vector and R a 3 × 3 rotation matrix defined with respect to the optical centre or the origin of the CCS, where R satisfies the constraints

R R^T = R^T R = I,   det(R) = 1.

Then

P(f + 1) = R P(f) + T.   (2.11)
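For illustration, a rotation matrix satisfying these constraints can be constructed from an axis and angle via Rodrigues' formula (a standard construction, not introduced in the text above), and one step of the motion model (2.11) applied; all names below are ours:

```python
import numpy as np

def rodrigues(axis, angle):
    """Rotation matrix for a rotation of `angle` radians about the unit
    vector `axis` (Rodrigues' formula)."""
    k = np.asarray(axis, float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def motion_step(P, R, T):
    """One frame of the rigid-motion model (2.11): P(f+1) = R P(f) + T."""
    return R @ np.asarray(P, float) + np.asarray(T, float)
```

Any matrix produced this way satisfies R R^T = I and det(R) = 1, so repeated application of `motion_step` keeps the object rigid.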
Figure 2.9: Motion field of a plane moving towards the camera ((a) Image 1; (b) Image 2; (c) Motion field). Velocity fields of the background are not shown.
One drawback of this motion model is that there is only one axis of rotation. Any precession or tumbling motion is not taken into account. This does not adequately describe the motion of an object such as a satellite, which rotates about its axis of symmetry, which in turn rotates about another spatially fixed axis. To address this issue, [43] proposes the locally constant angular momentum (LCAM) model. It allows the instantaneous rotation axis to change over time, such that the effects of precession from one image frame to the next can be approximated as varying amounts of rotation about different rotation axes. However, we will focus on using a simple model like (2.11) for this thesis.

Notice how (2.11) is very similar to (2.4), since they both express the location and orientation of one camera coordinate system with respect to another. As mentioned previously, the advantage of structure-from-stereo is that the parameters (R, T) are known a priori. On the other hand, the relative motion parameters (R, T) may not be known ahead of time, and can only be estimated from visual changes in the captured images. Hence motion estimation is often a complementary problem to estimating structure from a monocular image sequence.

The two common approaches to motion and structure estimation are optical flow and feature-based methods. The next two sections will give an overview of the basic concepts that are relevant to this thesis.
2.5.2 Motion and Structure From Optical Flow

When a camera is moving with respect to an object, the apparent change in the image position of a projected point, p(f), can be expressed as a two-dimensional velocity vector

v = (dx/dt, dy/dt)^T.

A collection of the velocity vectors for different points in the scene forms the motion field [2]. An example of the motion field for a moving plane is illustrated in Figure 2.9. Analysis of the motion field induced by two consecutive images can be used to estimate the structure of the scene. Without actually knowing the underlying motion, the motion field on the images cannot be known exactly but has to be estimated. The estimate is referred to as optical flow [44, 45, 46].
Many differential techniques for estimating the motion field have been proposed in the past [44, 45]; they examine the temporal changes in the brightness pattern of images. Let E(x, y, t) be the image brightness pattern, or the light intensity at the point (x, y) of the image plane at time t. Then the first-order approximation of its change over time is

dE/dt = (∂E/∂x)(dx/dt) + (∂E/∂y)(dy/dt) + ∂E/∂t.

One assumption critical to the estimation of optical flow is that, under the Lambertian model of lighting, the brightness of a point in the scene remains constant over time. The result is the image brightness constancy equation [1]:

dE/dt = 0,

which implies that

(∂E/∂x)(dx/dt) + (∂E/∂y)(dy/dt) + ∂E/∂t = 0

or

(∇E)^T v + E_t = 0,   (2.12)

where ∇E is the first-order spatial gradient of E, and E_t is the temporal derivative.
Since (2.12) contains only one independent variable, E, and the motion field v has two components, v cannot be estimated using E at a single scene point. Additional constraints are necessary. For example, Horn and Schunck [44] assert a smoothness constraint, which makes the assumption that the motion field varies smoothly almost everywhere in the image. Mayhew and Frisby [5] list several constraints proposed by other researchers, such as the assumptions that the motion field is constant or varies linearly over a region of the image, and that the second-order derivatives of E are also constant.
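As an illustration of one such constraint, assuming the motion field is constant over a small window turns (2.12) into an over-determined linear system, one equation per pixel, solvable by least squares (the idea commonly attributed to Lucas and Kanade; the sketch below uses single-scale NumPy gradients, and the function name and parameters are ours):

```python
import numpy as np

def flow_constant_window(I0, I1, y, x, win=3):
    """Least-squares flow (vx, vy) at (x, y) assuming the motion field is
    constant over a (2*win+1)^2 window: stack one copy of (2.12),
    Ex*vx + Ey*vy + Et = 0, per pixel and solve in the least-squares sense."""
    Ex = np.gradient(I0, axis=1)[y - win:y + win + 1, x - win:x + win + 1].ravel()
    Ey = np.gradient(I0, axis=0)[y - win:y + win + 1, x - win:x + win + 1].ravel()
    Et = (I1 - I0)[y - win:y + win + 1, x - win:x + win + 1].ravel()
    A = np.column_stack([Ex, Ey])
    v, *_ = np.linalg.lstsq(A, -Et, rcond=None)
    return v  # (vx, vy)
```

Note that the normal-equation matrix A^T A here is exactly the structure tensor of Section 2.3's corner measure: the flow is well determined precisely where the window contains corner-like structure, which is the aperture problem restated.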
The estimation of the motion field is analogous to solving the feature correspondence problem between two image frames, because for each point p_i(f) on an image, the estimated motion field provides an estimate of the location of p_i(f + 1). However, since the three-dimensional motion parameters are unknown, motion has to be estimated at the same time, resulting in an estimation problem with more unknowns and fewer constraints than the structure-from-stereo problem.
Using the perspective projection equations in (2.1), the apparent motion of a point on a 2D image can be expressed as a function of the point's actual three-dimensional instantaneous velocity. In components,

v_x = f (Ẋ Z - X Ż) / Z²,
v_y = f (Ẏ Z - Y Ż) / Z².   (2.13)

By substituting (2.10) and consequently (2.1) into (2.13), two equations can be derived:

v_x = (t_z x̄ - t_x f) / Z - ω_y f + ω_z ȳ + (ω_x x̄ ȳ - ω_y x̄²) / f,
v_y = (t_z ȳ - t_y f) / Z + ω_x f - ω_z x̄ + (ω_x ȳ² - ω_y x̄ ȳ) / f,   (2.14)

where

x̄ = x - o_x,   ȳ = y - o_y.
Additional constraints such as object surface smoothness and restricted motion are required to solve for the seven unknown parameters, {t_x, t_y, t_z, ω_x, ω_y, ω_z, Z}, thus completing the motion and structure estimation problem. The formulation of the solution may become quite involved, so it is omitted here. The interested reader is referred to [47] for a survey of existing techniques.

Figure 2.10: Demonstration of the aperture problem. The left figure illustrates that only the component of the motion field in the direction of the spatial intensity gradient can be estimated. The right one shows the true motion field induced by the motion of the line.
Two interesting observations from (2.14) are that the motion field is the sum of two components, one depending on translation only and the other on rotation only, and that only the translational component contains depth information [1]. These two observations can be used to determine the limitations of using optical flow for motion and structure recovery.

One of the advantages of using optical flow for 3D reconstruction is that it does not necessarily require feature extraction. In other words, a velocity vector for every point in an image can be estimated using spatial and temporal derivatives of the image brightness pattern. This is often useful in reconstructing a dense surface model of a scene. However, one limitation of using (2.12) to estimate the motion field is the aperture problem, which is best demonstrated using an example. In Figure 2.10, only the component of the motion field in the direction of the spatial image gradient can be determined. In addition, optical flow is computed under the assumption of the Lambertian reflectance model. Under conditions such as extreme lighting and highly specular surfaces, this assumption may not be reasonable. It has been shown in [46] that even under Lambertian reflectance, optical flow determined from (2.12) is equivalent to the motion field only for pure translational motion, or for any rigid motion in which the angular velocity is parallel to the illumination direction. Moreover, the assumption about smooth surfaces renders optical flow techniques incapable of handling occluding boundaries very well without a preprocessing step of image segmentation.
2.5.3 Motion and Structure From Point Features

The second category of structure-from-motion methods is feature-based. Similar to feature-based stereo matching, discrete features in a motion sequence have to be extracted from the images and the correspondence problem has to be solved explicitly. In structure-from-motion, feature matching establishes temporal corresponding pairs {p_i(f), p_i(f + 1)}. Again, for the time being, we will assume that feature correspondences have already been established, and first discuss the structure estimation aspect of the reconstruction problem.

Since the recovery of structure is coupled with motion estimation, there are two different categories of algorithms depending on whether structure or motion is determined first. An example of a "structure first" algorithm, as cited in [7], uses rigidity constraints. For a rigid body, the distance between two points, P_i(f) and P_j(f), does not change over time from one image frame to another, which implies that

(P_i(f) - P_j(f))^T (P_i(f) - P_j(f)) = (P_i(f + 1) - P_j(f + 1))^T (P_i(f + 1) - P_j(f + 1)).   (2.15)

Using the projective relationships in (2.1) and the known image coordinates of p_i(f), p_j(f), p_i(f + 1) and p_j(f + 1), (2.15) can be expressed in terms of the four unknown Z values, one for each of {P_i(f), P_j(f), P_i(f + 1), P_j(f + 1)}. With five pairs of corresponding points, there are ten unknowns and nine homogeneous equations. Iterative algorithms can be used to obtain a solution within a scale factor.
Linear, "motion first" algorithms use a relationship analogous to the fundamental matrix theory in (2.9) to estimate the unknown motion parameters prior to recovering the structure. Again, letting S be an antisymmetric matrix such that S v = T × v for all 3D vectors v, the 3 × 3 essential matrix [2], E, is defined as

E = S R

and it satisfies the relationship

P_i(f + 1)^T E P_i(f) = 0,

as well as

(p̃_i(f + 1))^T M^{-T} E M^{-1} p̃_i(f) = 0.   (2.16)

The nine unknown parameters in E are then estimated by using eight pairs of image point correspondences. Consequently, the motion parameters R and T, as well as the depth values of the point features, can be recovered within a scale factor. Refer to [8, 7] for the details of the algorithm and a survey of other feature-based techniques.
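The linear estimation step can be sketched directly: each correspondence contributes one equation, linear in the nine entries of E, and with eight or more pairs the entries are recovered up to scale as the null vector of the stacked system. An illustrative NumPy sketch using calibrated, unit-focal-length rays (names are ours; practical implementations add coordinate normalisation and enforce the rank-2 constraint, which are omitted here):

```python
import numpy as np

def eight_point(x0, x1):
    """Linear estimate of the essential matrix E (up to scale) from
    corresponding homogeneous rays x0[i] at frame f and x1[i] at frame
    f+1, using x1^T E x0 = 0.  Requires at least 8 pairs."""
    # Each row of A is outer(x1, x0).ravel(), since
    # x1^T E x0 == sum_jk x1[j] * E[j, k] * x0[k].
    A = np.array([np.outer(b, a).ravel() for a, b in zip(x0, x1)])
    # The entries of E span the (one-dimensional) null space of A.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)
```

With noise-free data the recovered matrix matches S R exactly up to a global scale and sign, reflecting the scale ambiguity discussed below.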
2.5.4 Long Image Sequences

One common aspect of the optical flow and feature-based techniques for recovering motion and structure discussed so far is that only two image frames are considered. As mentioned earlier, the baseline between the cameras in a monocular sequence is relatively small. The resulting motion and structure estimates from two views alone may be very sensitive to noise and inaccurate. Therefore some of the research in this area concentrates on error analysis and on using more than two images from a monocular sequence to improve robustness [9, 48, 49].

Another direction of research is to use a long image sequence; generally there are two approaches: batch and recursive. The batch approach assumes that all the images of a very long sequence are readily available at once so that more data is available, reducing the effects of noise and outliers on the motion and structure estimates. An example is the factorization method in [21] and other related work [23, 50]. The recursive approach focuses on iteratively refining the accuracy of initial motion and structure estimates as more image frames become available. Some examples in this category are [51, 52, 53, 54]. Among those using the latter approach, the application of Kalman filtering [55] for dynamic state estimation is common because it explicitly incorporates an uncertainty model and integrates new measurements (e.g. positions of feature points) to iteratively refine current estimates of structure and motion.
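The flavour of such recursive refinement is captured by a toy scalar example: a Kalman filter fusing repeated noisy depth measurements of a static point. The formulations cited above (and in Chapter 3) estimate full structure and motion states; the sketch below is deliberately one-dimensional and all names are illustrative:

```python
def kalman_refine(z, r, x0, p0):
    """Recursively fuse scalar depth measurements z (each with variance r)
    of a static point, starting from a prior estimate x0 with variance p0.
    Returns the sequence of refined estimates after each measurement."""
    x, p, out = x0, p0, []
    for zk in z:
        k = p / (p + r)           # Kalman gain: trust measurement vs. prior
        x = x + k * (zk - x)      # update the estimate with the innovation
        p = (1 - k) * p           # uncertainty shrinks with each measurement
        out.append(x)
    return out
```

Each update weighs the new measurement against the current estimate by their respective uncertainties, so early, poor estimates are corrected quickly while later updates make only small refinements.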
When processing long image sequences, the problem of feature correspondence is no longer limited to finding matches between a single pair of image frames, but is to locate the same features over many frames. The small camera baseline in a motion sequence allows features on an image to be tracked over time because their appearance as well as their positions on the image do not change very much from one frame to the next. Feature point tracking algorithms may utilise optical flow techniques, such as those in [56, 57], to compute velocity fields using spatial and temporal image gradients, or may apply a smoothness constraint on the motion to establish smooth trajectories, assuming that all the image frames are available at once [58, 59]. Techniques developed for tracking multiple targets in radar imagery [60] have also been extended successfully to establish motion correspondences in image sequences [61, 62, 63, 64].
2.5.5 Advantages and Disadvantages

Compared to stereo correspondence, the problem of motion correspondence is easier to solve. Tracking techniques can establish feature matches between image frames much more reliably than stereo matching. Unfortunately, the structure estimated using two frames alone is very noise sensitive and inaccurate. Batch processing of long sequences can combat this problem, but it requires that all of the images be available at the time of processing. This may not be an ideal solution for the following reasons:

1. Assumptions like constant motion may be violated over such a long sequence;

2. Batch processing imposes heavy data storage requirements for retaining information on many images;

3. Motion and structure can only be estimated after all the images are available.

For these reasons, the idea of recursively refining motion and structure estimates as more images become available seems to be the most viable solution. However, there remains the drawback that, in 3D reconstruction from motion, without knowing the magnitude of the relative translation between the camera and the scene, the depths of object points cannot be determined exactly, but only within a global scale factor. For example, if one object is twice as far away as another, but twice as big, and it is translated at twice the speed, the resulting images of the two objects would be exactly the same. This characteristic may not be problematic for an application like object recognition, but it would definitely be a concern for applications in which the absolute location of objects relative to the camera is desired, such as the computer-vision-guided grasping of satellites in outer space.
2.6 Structure From Stereo Image Sequences

The shortcomings of both structure-from-stereo and structure-from-motion techniques for 3D reconstruction, as described in the previous sections, have motivated a new direction of research: the integration of both stereo and motion information in developing 3D reconstruction algorithms. While feature correspondence is a difficult task for stereo, it is relatively easy for motion, but stereo provides better structure estimates. The advantage of integrating both stereo and motion is that they can complement each other to overcome their individual weaknesses.

A common image acquisition system for this approach includes a set of stereo cameras mounted on a platform or robotic arm. Either the cameras or the objects in the scene or both undergo some form of motion. This set-up provides a sequence of stereo image pairs, {I_L(f), I_R(f)}, that vary over time. Many of the proposed methods in this area incorporate various combinations of existing techniques from both of the traditional approaches of stereo and motion. The differences among them generally depend on the following factors:

1. the degree to which the use of motion assists in stereo matching,

2. the way in which this assistance is provided, if any, and

3. how the use of the stereo sequence improves structure estimates.

We will discuss these aspects among three broad categories of past research in this area.
2.6.1 Assumed Feature Correspondences

Some of the work on using stereo image sequences assumes that accurate stereo correspondences have already been established by some external process. The focus of this group of work is to use the added motion information to improve or refine motion and depth estimates. Like similar work done using monocular sequences, Matthies and Kanade [65] employ the Kalman filter to recursively refine structure estimates using known motion. Accurate stereo correspondences or a 3D representation of the scene are assumed to be available initially. The difference between this and the monocular sequence approach is that 2D measurements are now made on both the left and right image streams. Motion correspondences are established implicitly through Kalman filter tracking. Details on the formulation will be provided in Chapter 3 of this thesis.

Ayache and Faugeras [66] applied the same techniques, paying special attention to how depth uncertainties of different geometric features are propagated. In addition to refining depth estimates, they also discuss refining motion estimates, by using 3D point correspondences established between two pairs of stereo frames. Young and Chellappa [67] take an extra step and assume that both stereo and motion correspondences are established, such that the measurements are readily 3D feature points. The focus of their work is to iteratively refine motion and depth estimates.

Some other researchers, like Navab et al. [10], are mainly interested in combining stereo and motion to assist in motion estimation, since 3D point or line motion correspondences provide more accurate, unique motion estimates.
2.6.2 Dir ectEstimation or Inference
In this classof 3D reconstructionalgorithms,theadditionalconstraintsprovidedby bothmotion
andstereoareuseddirectly to computeor infer thelocationof stereocorrespondences.
Onecommontechniqueis to applyopticalflow to stereoimagesequences.In Section2.5.2,we
have seenhow optical flow techniquescanbe usedto estimateapparentmotion and therefore
the structureof a sceneor object from a pair of imagesin a motion sequence.Shi et. al. [68,
11, 69]have extendedthe imagebrightnessconstancy constraintto stereoimagepairs,andthey
referredto this as unifiedoptical flow field (UOFF). In summary, they assumethat the image
brightnessof a point in thescenenot only remainsconstantover time, it alsoremainsconstant
from onecameraviewpoint to another. Theimagebrightnesspatternin this formulationbecomes
a functionof four parameters:
§ � §!����������L��]$�Z�where� and � in this caserefersto the imagelocationof a point in thescene,which in turn are
functionsof � , time, and ] , the viewpoint. Using c and d asthe viewpoints, brightnesstime
invarianceimpliesthat
§¨��Q��L��c��Z���������c¸�Z������c¸� � §¨��Q�� 8 � ��c¸�Z������ 8 � ��c¸�Z��� 8 � ��c��Z� (2.17)
CHAPTER2. BACKGROUND 32
andbrightnessspaceinvarianceimpliesthat
§¨��Q��L��c��Z���������c¸�Z������c¸� � §¨��Q��L��d¡�Z���������d¡�Z���L��d¡�Z: (2.18)
Combining(2.17)and(2.18)givesthetime andspaceinvarianceconstraint:
§¨��������c¸�Z������L��c��Z���L��c�� � §¨����� 8 � ��d¡�Z������ 8 � ��d¡�Z��� 8 � ��d¡�Usingthis constraint,opticalflow quantitiesarecomputedacrossbothtime andviewpoint using
any establishedoptical flow techniquessuchas thosementionedin Section2.5.2. Along with
the optical flow quantitiescomputedusing (2.17) and (2.18), the motion andstructurecanbe
estimatedfrom asystemof equations.
SteinandShashua[12] usethe samebrightnessconstancy assumptionto first establishcorre-
spondencesfor both point and line features,andthenappliedrigidity andepipolarconstraints
to estimatemotionandstructure.Realisingthat this is a strongassumptionfor imagescaptured
from largely varyingviewpoints,they usedacoarseto fineapproachto processtheimages.
Another interesting example which directly infers stereo correspondences from image data is [13], which builds on the factorization method [21] for processing monocular sequences. Stereo geometry is added to the formulation of the problem. The method still has a batch approach, requiring all images to be available at the time of processing, but the authors show that the number of frames required for reasonably accurate structure estimates using the stereo-motion approach is much less than that of using motion alone.
2.6.3 Constrained Matching

In both traditional structure-from-stereo and structure-from-motion techniques, feature correspondences established between a single pair of stereo or temporal images are not necessarily correct. A common formulation for constrained matching is to use a model consisting of the four image frames, {I_L(f), I_R(f), I_L(f+1), I_R(f+1)}, as in Figure 2.11. The structural information derived from any combination of these four images should be consistent. This consistency can be used to bootstrap the feature matching process, or to provide additional structural information not available in a single pair of stereo images.
Figure 2.11: The four-frame model for stereo image sequence processing. All four sets of stereo and temporal matches are consistent if all the 2D feature points are image projections of the same real 3D point.
For example, Chebaro et al. [14] first use traditional matching methods to find four sets of feature correspondences, two stereo and two temporal, based on line segments and planar regions. Using the four-frame model, the consistency of these four sets of matches is checked. If there is any inconsistency, temporal matches are favored and the conflicting stereo matches are rejected.
[15] formulates stereo and temporal feature matching as a high dimensional graph matching problem. One feature from each of the four images corresponds to a node in a four-node graph, and the edges between each pair of nodes have weights reflecting the similarity between the features associated with the nodes. Optimal matches are found by a greedy-type search algorithm based on maximizing an objective function. Similarly, [16] associates a probability value with each pair of matching candidates. Global consistency is enforced by examining all candidate pairs and applying relaxation labelling. The approaches of these two methods are somewhat similar to traditional stereo matching methods, except that two pairs of stereo images are used instead of one, and temporal consistency becomes an additional constraint.
Some other techniques explicitly use motion information to reject false stereo matches from a number of candidates, or simply to confirm the validity of a stereo match, or vice versa. An early example is what Waxman and Duncan refer to as binocular image flows [70]. An important result of this work is relative flow, or binocular difference flow. Optical flow for both the left and right images is estimated independently to establish two separate flow fields. Assuming a parallel camera configuration as in Figure 2.6, the image velocities of corresponding features in the two cameras are denoted by v_L(x_L, y_L) and v_R(x_L + δ, y_L), where δ is the disparity. Relative flow is then the difference between these two quantities, that is,

Δv_L(x_L, y_L, δ) = v_R(x_L + δ, y_L) − v_L(x_L, y_L),

and can be expressed as a function of the rigid motion parameters of the camera; in particular, for the parallel configuration the vertical component of the relative flow vanishes,

Δv_{L,y}(x_L, y_L, δ) = 0.    (2.19)

The authors demonstrated that, by using these constraints, only two components of the overall motion facilitate stereo matching. For a feature located at (x_L, y_L), the correct match in I_R should satisfy the relationships in (2.19); therefore (2.19) can be used to estimate δ directly or to reject unlikely candidates. One assumption used in the experiments shown is that the motion of the camera is known.
Another method using optical flow is [17]. The algorithm first determines the depths of detected feature points by applying optical flow techniques to the left and right image streams separately, obtaining the estimated quantities Z_L and Z_R. Then feature points in the left and right images are matched against each other. For a stereo match to be correct, the depth Z_d calculated from the disparity value must be consistent with the values estimated from the left and right optical flow fields, that is, asserting that Z_L = Z_R = Z_d.
Matthies uses a three-frame model [18]. A dense depth map is first developed using closely sampled images, such as I_L(f) and I_L(f+1). This depth map is then used to constrain the determination of the disparity map between I_L(f+1) and I_R(f+1). Matching is done by correlation, and limited a priori motion information is available.
Both [19] and [20] are recursive algorithms that use multiple hypothesis testing techniques and motion constraints to validate stereo matching candidates. Jenkin and Tsotsos [19] initialise their algorithm by assuming a set of accurate stereo correspondences in the first image pair, and stereo matching in future frames is constrained by the predicted locations of these initial feature points. By using multiple hypothesis tracking, the resolution of ambiguities in temporal matching is delayed. Yi and Oh [20], on the other hand, do not assume accurate stereo correspondences. Virtual 3D tokens are generated from pairs of stereo matching candidates, and Kalman filtering is used to predict their locations through the stereo sequence. The motion of incorrect stereo matches will not conform to the predicted paths and hence can be rejected. In this work, it is assumed that the scene is composed of objects moving in different directions, and that these objects are far enough from the cameras to be represented as individual point features on the images. Their motion is approximated by pure translation; therefore, it is simple enough to include the motion parameters in the state vector to be estimated. This formulation has the advantage that motion estimates can be automatically updated by the Kalman filter.
2.6.4 Summary
The use of a stereo image sequence for 3D reconstruction provides the advantages of both structure-from-stereo and structure-from-motion techniques. We have seen many innovative ideas on how stereo and motion can be integrated to assist in the task; however, none of them fully satisfies the specific demands of the problem we are trying to solve.
Some methods simply assume that the feature correspondence problem is solved. However, for a complete 3D reconstruction solution, it is insufficient to assert this assumption. The inherent problem in applying optical flow techniques for stereo correspondence is that the image brightness constancy assumption most likely does not hold in the lighting conditions in space. As we have seen in Figure 1.1, the changes in lighting and shadows cause the same physical points to look very different from frame to frame. For a free-floating object in space, the scene is dynamic and motion is involved. The problem with batch processing of the stereo sequence in this context is that an updated 3D representation would only be available at the end of a long period. However, for such tasks as vision-guided pose estimation [71, 72], a recursive approach would be more appropriate. Most constrained matching methods use a feature-based approach, and they explicitly take advantage of both stereo and motion information to constrain the feature matching problem. This is a promising direction, but there is a lack of existing methods that address the specific issues of reconstructing a single rigid body undergoing unknown motion.

All these shortcomings suggest that there is a need for a new, unified framework for an algorithm that takes advantage of both stereo and motion information.
Chapter 3

Incremental 3D Reconstruction
In this chapter, we define in detail the specific 3D reconstruction problem that we are interested in solving, and identify the aspects of the problem that this thesis will examine. An incremental 3D reconstruction algorithm that incorporates both past and new research ideas for solving this problem is proposed. The basic components of the algorithm will be discussed.
3.1 Problem Definition

The goal of the research in this thesis, as mentioned in Chapter 1, is to develop an algorithmic framework for recovering depth information about objects in a scene from 2D digital images. The 3D information may be used with the results of other computer vision tasks for applications such as the aerospace application in which MDR holds interest.

After examining the different aspects of the problem and the past research on 3D reconstruction in Chapter 2, our research will focus on developing an incremental 3D reconstruction algorithm with the following characteristics:

- the reconstruction is based on extracted features;

- a stereo image sequence is used;
- feature matching is addressed explicitly;

- the system acquires an incrementally dense and accurate representation of the reconstructed object by bootstrapping feature matching and motion estimation.
We will now discuss each aspect in more detail below:

Feature-based: As mentioned in Chapter 1, the environment of outer space imposes specific challenges for 3D reconstruction algorithms. For example, the extreme lighting conditions render inappropriate techniques such as optical flow, or any methods that rely heavily on the assumption of image brightness constancy. The use of extracted features may alleviate this problem, and is the more sensible choice if the work of this thesis is to be applicable to future space applications.
Stereo image sequence: We have seen in Section 2.6 that the use of a stereo sequence can overcome the individual weaknesses of using either stereo or motion methods alone. This is a worthwhile direction to pursue in our own investigation.

Feature matching: The feature correspondence problem, defined in Section 2.2, is the most challenging problem in 3D reconstruction, and it still remains unsolved. We would like to explore this problem further in the context of a stereo image sequence.
Incremental: The quality of the images available over any short period of time may not afford a good reconstruction, due to the lack of pertinent visual information or the difficulty of feature extraction in some circumstances. Along with the difficulty of the feature correspondence problem itself, the amount of depth information available from a few frames is sometimes not sufficient for providing the overall structure of objects. The ability to integrate structural information over a long sequence of stereo images would be a desirable characteristic of the algorithm. Using this approach, it would also be unnecessary to solve the problem of stereo correspondence fully at any time frame, as ambiguities can be resolved over time.

One of the areas in which a well defined framework seems to be lacking in past research is how a stereo image sequence can be used efficaciously to bootstrap feature matching and motion estimation simultaneously when both structure and motion are unknown. Therefore, this thesis will primarily focus on this aspect.
Before we discuss the details of the proposed algorithm, the scope and assumptions of the current research are outlined as follows:

Features: In Section 2.3, the advantages of using point features over others such as lines and ellipses are discussed. Because of those advantages, our research will focus on using point features and reconstructing 3D points. However, no restrictions are imposed on the kind of point features that we use.
Object representation: The problem of surface interpolation from a set of scattered data points is a completely different research topic on its own; therefore it is outside the scope of this thesis. We will limit ourselves to reconstructing 3D points that correspond to geometrical or textural features on a rigid object in the scene. We will also not concern ourselves with the relationships among these points from a point-pattern perspective.
Motion: In order to keep our motion model as simple as that described in Section 2.5, it is assumed that there is a single, unknown rigid relative motion between the camera and the object. The motion is either constant or varying slowly, so that the effects of time-varying motion are not realized over a short sequence of frames. The motion must also follow a smooth trajectory.
The reason why the proposed algorithm will not consider what kind of point features are used is that our matching algorithm does not rely on any visual information or specific attributes of these features. We are interested in investigating feature matching based solely on the locations of the features, and on the rigidity and relative motion of the object. The advantage of this postulation is that we do not have to be concerned about what information is provided by the specific feature extractor we use, or what local matching algorithm we should use.
3.2 Overview of the Incremental Reconstruction Algorithm
As mentioned in Chapter 2, the minimal changes between successive images in a monocular sequence permit temporal feature correspondences to be established effectively by means of feature tracking [56, 57, 61, 62, 63, 64]. In the proposed algorithm, we will use well established techniques to perform this task. However, since we are also interested in taking advantage of the stereo information provided by a stereo sequence, the feature tracking algorithm will be modified from the above monocular approaches, as in [20], to establish accurate stereo correspondences as well.

Figure 3.1: The iterative process between motion estimation and reconstruction. Initially, unambiguous stereo matches are used for estimating motion parameters, which then further constrain stereo matching. An increased number of 3D point correspondences in turn improves motion estimation.

3D reconstruction in the proposed algorithm is an iterative process between motion estimation and stereo feature matching, as illustrated in Figure 3.1. Since we do not know the relative motion between the cameras and the object initially, feature tracking can be inaccurate, and many stereo matching ambiguities arise. However, if we can obtain initial estimates of motion, more constraints can be applied to feature matching, which in turn provides more feature point correspondences for more accurate motion estimation.
The basic steps of the algorithm are as follows:

1. At system start-up, stereo matching candidates are identified in the first pair of images, I_L(1) and I_R(1).

2. The extracted 2D feature points from the images that form the first set of stereo matching candidates are tracked independently in the left and right image streams. Since at this time the 3D motion parameters are unknown, the motion of individual points is implicitly estimated by fitting their past locations to a second order polynomial.

3. Tracking and stereo matching in the next frames are done using multiple hypothesis testing.

4. Some of the feature points at each frame may have only one stereo matching candidate with no ambiguities. These points will be reconstructed and form part of the object representation.

5. If a pair of stereo correspondences also has unambiguous temporal matches across the next frame, two sets of 3D points, {P_i^L(f)} and {P_i^L(f+1)}, can be reconstructed. These two sets of points provide an initial estimate of the rigid motion.

6. The rigid motion parameters are used in turn to further constrain the tracking and stereo matching process, and the procedure iterates for each pair of stereo image frames in the sequence.
This algorithm can theoretically process an infinitely long sequence of stereo images without additional data storage requirements, because the images and extracted features do not all have to be stored throughout the length of the sequence. Old information is gradually discarded as new information is incorporated incrementally into a single current representation of the reconstructed object.
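The six steps above can be sketched as a skeleton loop. All helper names here (match_stereo, track, and so on) are illustrative stand-ins for the components developed in the rest of this chapter, not code from the thesis:

```python
def reconstruct(sequence, match_stereo, track, estimate_motion, triangulate):
    """Incremental loop over a stereo sequence; only the current model and
    motion estimate are retained, so storage does not grow with length."""
    model, motion = [], None
    left, right = next(sequence)
    candidates = match_stereo(left, right)                   # step 1
    for left, right in sequence:
        candidates = track(candidates, left, right, motion)  # steps 2-3
        unambiguous = [c for c in candidates if len(c["matches"]) == 1]
        points = [triangulate(c) for c in unambiguous]       # step 4
        if motion is None and len(points) >= 4:
            motion = estimate_motion(model, points)          # step 5
        model = points                    # step 6: old information discarded
    return model, motion
```

The loop keeps only the latest reconstruction, mirroring the bounded-storage property claimed above.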
Figure 3.2 presents a simplified flow chart of the overall algorithm. The details of the individual steps will be discussed in the remainder of this chapter. We have already specified in Chapter 2 that we will be reconstructing 3D points using the left CCS. Therefore, we will drop the superscript L in any reference to P for simplicity.
3.3 Two Dimensional Feature Tracking
One important component of the incremental algorithm is feature tracking. In the proposed algorithm, the first pair of image frames is used to establish stereo matching candidates. Although each pair of matching candidates represents a hypothesized 3D point, we cannot track the 2D features using a single consistent motion because the 3D motion parameters, R and t, are unknown at this point; therefore the 2D features have to be tracked individually on each of the left and right image streams. This situation is depicted in Figure 3.3, which shows that the features in the left and right images are tracked using two separate sets of dynamics that do not have to be consistent with a single rigid motion.
Since we assumed that the relative motion between the cameras and the object follows a smooth trajectory, we can apply the Kalman filter [55] to perform feature tracking. The Kalman filter provides an optimal solution in the least squares sense for dynamic state estimation problems. The state to be estimated is the location of each 2D feature point. The Kalman filter incorporates the new measured location of a point at each frame with its previous estimated location to recursively update and refine the estimate. We now formulate the problem in more detail for the left image stream. Similar analyses apply to the right image stream.

Figure 3.2: Flowchart of the incremental reconstruction algorithm.

Figure 3.3: Constraints in 2D feature tracking.
3.3.1 Motion and Measurement Models
Under 2D feature tracking, the motion of individual points is estimated implicitly. The trajectory of a 2D point is interpolated by fitting a second order polynomial on a curve controlled by the past locations of the point. This is analogous to estimating the apparent velocity and acceleration of the point. The following zero, first, and second order motion models are assumed for the 2D projection of a 3D point, P_i(f), on the left image frame:

zero:    p_i^L(f+1) = p_i^L(f) + W(f)γ(f)                                if f = 1,
first:   p_i^L(f+1) = 2p_i^L(f) − p_i^L(f−1) + W(f)γ(f)                  if f = 2,
second:  p_i^L(f+1) = 3p_i^L(f) − 3p_i^L(f−1) + p_i^L(f−2) + W(f)γ(f)    if f ≥ 3,    (3.1)

where γ(f) is zero-mean Gaussian white noise with covariance Q = I, and W(f) represents the error in the motion models.
Since no feature extractor is perfect, the actual extracted feature location may not exactly correspond to the projected location of the point P_i(f). The actual detected location, or the measurement, is denoted by z_i^L(f), and is defined as

z_i^L(f) = p_i^L(f) + v(f) = π(A_L; P_i(f)) + v(f),

where v(f) is Gaussian white noise with zero mean and covariance C, representing the measurement error. The nature of the measurement error depends on the feature extractor. The most common source of error is the effect of quantization when point features cannot be localised with subpixel accuracy. In this case, a Gaussian random variable is sufficient for modelling the error. However, some extractors may have a certain bias that results in features with a consistent offset from their true locations. A more comprehensive study of the feature extractor would be necessary in order to model this kind of error, which is outside the scope of this thesis.
3.3.2 Prediction and Update
Feature tracking using the Kalman filter involves two stages: state prediction and update. The state to be estimated is the 2D projection of a point, p_i^L(f). The first step is to predict where a feature may be in the next frame given its current estimated position and motion. If a feature is detected within a small search region, then the new feature is assumed to originate from the same point, and the new position is incorporated into the state.
Using the dynamic model in (3.1), given estimated locations of a feature point in previous frames, the predicted projection at frame f + 1 is

p̂_i^L(f+1|f) = p̂_i^L(f|f)                                          if f = 1,
p̂_i^L(f+1|f) = 2p̂_i^L(f|f) − p̂_i^L(f−1|f−1)                        if f = 2,
p̂_i^L(f+1|f) = 3p̂_i^L(f|f) − 3p̂_i^L(f−1|f−1) + p̂_i^L(f−2|f−2)      if f ≥ 3,

with an error covariance of Λ_i^L(f+1|f), defined as

Λ_i^L(f+1|f) = Λ_i^L(f|f) + W(f)W(f)^T                                       if f = 1,
Λ_i^L(f+1|f) = 4Λ_i^L(f|f) − Λ_i^L(f−1|f−1) + W(f)W(f)^T                      if f = 2,
Λ_i^L(f+1|f) = 9Λ_i^L(f|f) − 9Λ_i^L(f−1|f−1) + Λ_i^L(f−2|f−2) + W(f)W(f)^T    if f ≥ 3.

The predicted location of the extracted feature is simply

ẑ_i^L(f+1|f) = p̂_i^L(f+1|f).

The prediction error covariance, S_i^L(f+1), is

S_i^L(f+1) = Λ_i^L(f+1|f) + C.
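As a concrete sketch, the predictors and their covariance propagation might be implemented as follows (an assumed NumPy rendering of the equations above, not code from the thesis):

```python
import numpy as np

def predict_location(history):
    """Zero/first/second order prediction of a 2D feature location from its
    past estimates p(1)..p(f) (rows of `history`); returns p(f+1|f)."""
    h = np.asarray(history, dtype=float)
    if len(h) == 1:                        # f = 1: zero order
        return h[-1]
    if len(h) == 2:                        # f = 2: first order
        return 2 * h[-1] - h[-2]
    return 3 * h[-1] - 3 * h[-2] + h[-3]   # f >= 3: second order

def predict_covariance(cov_history, W):
    """Propagate the estimate covariances with the matching coefficients
    and add the process noise term W W^T."""
    c = cov_history
    if len(c) == 1:
        pred = c[-1]
    elif len(c) == 2:
        pred = 4 * c[-1] - c[-2]
    else:
        pred = 9 * c[-1] - 9 * c[-2] + c[-3]
    return pred + W @ W.T
```

For a point moving with constant acceleration, the second order predictor is exact up to the process noise term.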
The next step in tracking is to find whether there is actually an extracted feature near the prediction in the next frame. This is done by defining a validation region in which z_i^L(f+1) can be found with high probability. The validation region is defined with the Mahalanobis distance [60] between a prediction and an actual measurement.

Let d_M(i, j) represent the Mahalanobis distance between ẑ_i^L(f+1|f) and any extracted feature point z_j^L(f+1) in I_L(f+1); then

d_M(i, j) = [z_j^L(f+1) − ẑ_i^L(f+1|f)]^T (S_i^L(f+1))^{−1} [z_j^L(f+1) − ẑ_i^L(f+1|f)].    (3.2)

The extracted image feature z_j^L(f+1) is considered to have originated from ẑ_i^L(f+1|f) if the condition

d_M(i, j) ≤ τ    (3.3)

is satisfied. τ is determined by the statistical distribution of the prediction and the level of confidence that is required. Since z_j^L(f+1) and ẑ_i^L(f+1|f) have Gaussian distributions, the distance in (3.2) has a χ² distribution with two degrees of freedom. For a level of confidence of 99%, we can select a threshold τ = 9.2103.

Graphically, (3.3) defines an elliptical region centered around ẑ_i^L(f+1|f). Applying the statistical test to all extracted feature points in I_L(f+1) is analogous to searching for all the feature points that are inside this elliptical region.
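A small sketch of this gate (assumed implementation; the 9.2103 threshold is the 99% χ² value with two degrees of freedom quoted above):

```python
import numpy as np

CHI2_99_2DOF = 9.2103   # 99% quantile of the chi-square distribution, 2 dof

def in_validation_region(z, z_pred, S, tau=CHI2_99_2DOF):
    """Return True if measurement z falls inside the elliptical validation
    region around the prediction z_pred (eqs. 3.2 and 3.3)."""
    innovation = np.asarray(z, float) - np.asarray(z_pred, float)
    d_m = innovation @ np.linalg.solve(S, innovation)   # Mahalanobis distance
    return d_m <= tau
```

Solving the linear system instead of forming the explicit inverse of S is the usual numerically preferable choice.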
Assuming that a particular point z_j^L(f+1) is found and is associated with the prediction ẑ_i^L(f+1|f), the new location and error covariance of the feature point we are tracking can be estimated as follows:

p̂_i^L(f+1|f+1) = p̂_i^L(f+1|f) + K_i^L(f+1) [z_j^L(f+1) − p̂_i^L(f+1|f)],
Λ_i^L(f+1|f+1) = Λ_i^L(f+1|f) − K_i^L(f+1) Λ_i^L(f+1|f).

K_i^L(f+1) is the Kalman filter gain, where

K_i^L(f+1) = Λ_i^L(f+1|f) (S_i^L(f+1))^{−1}.
The prediction and update steps iterate between consecutive frames to track a feature point throughout a whole sequence. As more frames are used, the estimated location of the feature point becomes increasingly accurate as its uncertainty decreases.
For any projected feature point on the right image stream, p_i^R(f), the same motion and measurement models apply, although tracking is done separately in the two image streams. The estimated location and error covariance are propagated by the same Kalman filter equations.
3.3.3 Model Priors
At the beginning of the recursive tracking algorithm, prior values for p̂_i^L(1|1) and Λ_i^L(1|1) are required to initialise the Kalman filter. At frame 1, the best prior available is to let

p̂_i^L(1|1) = z_i^L(1),
Λ_i^L(1|1) = C

for all feature points in I_L(1).

The dynamic equations in (3.1) also require knowledge about the modelling error represented by W(f). Without actually knowing the real motion parameters, the error in the dynamic model can only be estimated at some maximum values.
Assuming that we know the maximum distance between any point on the object and the cameras, as well as the maximum angular and translational velocity of the relative motion, we can measure the difference between the real trajectories of 2D feature points and the trajectories suggested by the zero, first, and second order estimators. W(1), W(2), and W(f | f ≥ 3) are then computed experimentally by taking the sample covariance of the trajectory differences over many experiments.
3.3.4 Relation to Stereo Matching and Motion Estimation
As mentioned at the beginning of this section, 2D feature tracking permits each feature point on each of the left and right images to follow a different trajectory that may not be consistent with a single rigid body motion. Hence this is not the ideal method for establishing temporal correspondences in the long term, but it serves as a starting point before any motion or structure estimates are available.

Assume that a pair of stereo matching candidates, z_i^L(f) and z_i^R(f), has already been established successfully. Although 2D tracking does not enforce rigidity constraints, the limited search regions for the feature points in the next frame readily add constraints to the stereo matching problem. If both of these points can also be tracked and each has a single associated measurement at frame f + 1, a pair of corresponding 3D points can be reconstructed from {p̂_i^L(f|f), p̂_i^R(f|f)} and {p̂_i^L(f+1|f+1), p̂_i^R(f+1|f+1)} by triangulation. Using a minimum of four pairs of corresponding 3D points, the 3D relative motion parameters can be estimated uniquely using linear methods [73, 74, 75]. An initial estimate of the 3D motion allows feature tracking to be carried out in a much more constrained manner, as will be described in the next section.

Figure 3.4: Constraints in 3D feature tracking.
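One standard linear solution to this pose problem, in the spirit of the cited methods, computes the rotation from the SVD of the cross-covariance of the two point sets. This is a sketch, not the thesis implementation:

```python
import numpy as np

def estimate_rigid_motion(P, Q):
    """Least-squares R, t with Q_i ~= R P_i + t, from corresponding 3D
    points (one point per row); needs a non-degenerate set of pairs."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                            # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])   # guard reflections
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t
```

With noise-free correspondences the estimate is exact; with noisy measurements it minimizes the sum of squared residuals.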
3.4 Three Dimensional Feature Tracking
If the relative motion between the cameras and the object is known, then a 3D motion model can be used for feature tracking. The advantage of 3D tracking is that a single set of dynamics is applied to all of the feature points in both the left and right image streams, as shown in Figure 3.4. This implies that the resulting feature correspondences will be consistent with rigid body motion; the stricter constraints also increase the likelihood of establishing correct temporal correspondences.
The formulation of the 3D tracking problem is very similar to that for 2D tracking. Again, the Kalman filter will be used. The major modification is that 2D feature projections will now be predicted and updated simultaneously on both the left and right images. The 3D point estimate will also rely on measurements from both images.
3.4.1 Motion and Measurement Models
In the case of 3D feature tracking, the state to be estimated is the position of 3D feature points. Because we are processing a stereo image sequence, stereo correspondences between left and right image frames can be used to reconstruct 3D points.

The location of a 3D point is governed by the motion model

P_i(f+1) = R(f) P_i(f) + t(f).    (3.4)
The measurement vector now consists of the coordinates of the extracted features on both the left and right images, that is,

[ z_i^L(f) ]   [ p_i^L(f) ]   [ v^L(f) ]   [ π(A_L; P_i(f)) ]   [ v^L(f) ]
[ z_i^R(f) ] = [ p_i^R(f) ] + [ v^R(f) ] = [ π(A_R; P_i(f)) ] + [ v^R(f) ],    (3.5)

where π is the vector-valued projection function as in (2.3), and v^L(f) and v^R(f) are Gaussian random vectors representing the measurement noise in the left and right image features respectively. Since the same feature extractor is used, the measurement noise has the same statistical distribution for both images. Using the covariance C as defined previously in Section 3.3, the covariance of the noise vector in the 3D case is defined by the 4 × 4 matrix C_s, where

C_s = [ C  0 ]
      [ 0  C ].

For simplification, the whole image coordinate vector will be denoted as p_i(f), the measurement vector as z_i(f), and the projection function as π[P_i(f)] hereafter.
In order to track a 3D feature point from one stereo pair to the next, the goal is to find the best estimate of its state, i.e., its 3D location, based on the above motion and measurement models.
3.4.2 Prediction and Update
The predicted location of the 3D point and the prediction uncertainty in the next frame are

P̂_i(f+1|f) = R P̂_i(f|f) + t,
Σ_i(f+1|f) = R Σ_i(f|f) R^T.    (3.6)

The predicted measurement is

ẑ_i(f+1|f) = π[P̂_i(f+1|f)].
Since the measurement model in this case is non-linear, an extended Kalman filter [55] is used to maintain the linearity of the estimation technique. The non-linear function π has to be linearised about the estimated trajectory. Let J(f+1) be the Jacobian matrix representing the first order approximation of π about the point P̂_i(f+1|f), that is,

J(f+1) = ∂π(P)/∂P evaluated at P = P̂_i(f+1|f).

Then the measurement prediction error covariance is approximated as

S_i(f+1) = J(f+1) Σ_i(f+1|f) J^T(f+1) + C_s.    (3.7)
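For a parallel camera configuration (an assumption for this sketch; the focal length and baseline values are made up for illustration), the stacked projection and its Jacobian can be written out explicitly:

```python
import numpy as np

FOCAL, BASELINE = 1.0, 0.1     # illustrative camera parameters

def project_stereo(P):
    """Stacked left/right pinhole projection pi[P] for parallel cameras."""
    X, Y, Z = P
    return np.array([FOCAL * X / Z, FOCAL * Y / Z,
                     FOCAL * (X - BASELINE) / Z, FOCAL * Y / Z])

def jacobian(P):
    """J = d(pi)/dP, the 4x3 linearisation used in (3.7)."""
    X, Y, Z = P
    f = FOCAL
    return np.array([[f / Z, 0.0, -f * X / Z**2],
                     [0.0, f / Z, -f * Y / Z**2],
                     [f / Z, 0.0, -f * (X - BASELINE) / Z**2],
                     [0.0, f / Z, -f * Y / Z**2]])

def innovation_covariance(P_pred, Sigma_pred, C_s):
    """S = J Sigma J^T + C_s, as in (3.7)."""
    J = jacobian(P_pred)
    return J @ Sigma_pred @ J.T + C_s
```

A finite-difference check against project_stereo is a convenient way to validate an analytic Jacobian like this one.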
In 3D tracking, an extracted feature point from each of the left and right images has to be associated with the prediction. The Mahalanobis distance for each image is

d_M^L(i, j) = [z_j^L(f+1) − ẑ_i^L(f+1|f)]^T (S_i^{LL}(f+1))^{−1} [z_j^L(f+1) − ẑ_i^L(f+1|f)],
d_M^R(i, k) = [z_k^R(f+1) − ẑ_i^R(f+1|f)]^T (S_i^{RR}(f+1))^{−1} [z_k^R(f+1) − ẑ_i^R(f+1|f)],

where S_i^{LL}(f+1) and S_i^{RR}(f+1) are the 2 × 2 sub-matrices of S_i(f+1), such that

S = [ S^{LL}  S^{LR} ]
    [ S^{RL}  S^{RR} ].
The same χ² test as that in (3.3) is applied for associating new feature points with predictions. Assume that the points z_j^L(f+1) and z_k^R(f+1) have passed the statistical test in (3.3). The new measurement vector consisting of these two points will be denoted z_jk(f+1), where

z_jk(f+1) = [ z_j^L(f+1) ]
            [ z_k^R(f+1) ].

The location and error covariance of the 3D point are updated as follows:

P̂_i(f+1|f+1) = P̂_i(f+1|f) + K_i(f+1) { z_jk(f+1) − π[P̂_i(f+1|f)] },
Σ_i(f+1|f+1) = Σ_i(f+1|f) − K_i(f+1) J(f+1) Σ_i(f+1|f),

where the Kalman filter gain is

K_i(f+1) = Σ_i(f+1|f) J^T(f+1) S_i^{−1}(f+1).
3.4.3 Model Priors
Assuming that 3D tracking commences at frame f₀, the initial estimate of a feature point's 3D location is given by the reconstruction from a pair of 2D feature points on the left and right images. Given a pair of stereo matching candidates, {z_j^L(f₀), z_k^R(f₀)}, P̂_jk(f₀|f₀) refers to the 3D point reconstructed from the estimates of the features' true 2D locations, {p̂_j^L(f₀), p̂_k^R(f₀)}, using the process of triangulation as described in Chapter 2.
The initial uncertainty in the estimate, Σ_jk(f₀|f₀), can be set arbitrarily large. The problem with this approach is that a large uncertainty would result in large search regions for predicted measurements in the next frame, which may lead to temporal matching ambiguities, because many feature points may fall into the search region. The approach presented in [65], based on analysing the error in triangulation, is used here to approximate Σ_jk(f₀|f₀); [76] has a more comprehensive discussion of the subject.
Let g be a vector-valued function representing the reconstruction function, such that

P̂_jk(f₀|f₀) = g(z_jk(f₀)).

The error in P̂_jk(f₀|f₀) is attributed to the error in the measurement vector z_jk(f₀). According to the measurement model in (3.5), the measurement vector has a Gaussian distribution with zero mean and covariance C_s.

To linearise g, we define the Jacobian matrix

G = ∂g/∂z evaluated at z = z_jk(f₀).

Then Σ_jk(f₀|f₀) can be approximated as

Σ_jk(f₀|f₀) = G C_s G^T.
This approximation represents the uncertainty in the 3D point estimate as having a Gaussian distribution with zero mean and an ellipsoidal contour of constant probability.
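As an illustration (again assuming the parallel camera model, with made-up focal length and baseline), the covariance Σ = G C_s G^T can be approximated with a numerically differentiated reconstruction function:

```python
import numpy as np

FOCAL, BASELINE = 1.0, 0.1     # illustrative parameters, parallel cameras

def triangulate(z):
    """g: reconstruct a 3D point from the measurement z = (xL, yL, xR, yR)."""
    xL, yL, xR, _ = z
    Z = FOCAL * BASELINE / (xL - xR)       # depth from disparity
    return np.array([xL * Z / FOCAL, yL * Z / FOCAL, Z])

def triangulation_covariance(z, C_s, eps=1e-6):
    """Sigma = G C_s G^T, with G = dg/dz from central differences."""
    z = np.asarray(z, float)
    G = np.zeros((3, 4))
    for k in range(4):
        d = np.zeros(4)
        d[k] = eps
        G[:, k] = (triangulate(z + d) - triangulate(z - d)) / (2 * eps)
    return G @ C_s @ G.T
```

The resulting matrix describes exactly the ellipsoidal constant-probability contour mentioned above.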
3.5 Multiple Hypothesis Tracking and Stereo Matching
In the previous sections, two alternative tracking schemes for stereo image sequences have been presented. Both involve predicting feature locations in the next image frames, and then looking for feature points that are located within the search regions. However, we have not discussed the case when multiple feature points are found within the search regions on the images. Moreover, the problem of robust stereo matching has yet to be addressed.
In 3D reconstruction, we are often interested in extracting a large number of salient features in an image in order to build a dense representation of the object, and each of these features has to be tracked throughout the image sequence. When each image is cluttered by a large number of features, the problem of establishing accurate temporal correspondences becomes more difficult, because it is quite common to find more than one feature inside the search region.
Therearemany statisticaldataassociationtechniquesthataddressthisproblemin multiple target
tracking [60, 62]. In particular, onemethodis to usemultiple hypothesistracking [64]. The
advantageof this technique,asshown in Figure3.5,is thatmatchingambiguitiesarenot resolved
immediately; instead, a decision is deferred until more information is available.

Figure 3.5: Deferral of matching decisions by multiple hypothesis tracking. Two measurements are associated with a single target at frame f + 1. The track is split in two by two separate matching hypotheses, and a decision is deferred to frame f + 2. The pruned hypothesis is marked in the figure.

The general procedure for multiple hypothesis tracking is depicted in Figure 3.6.
Yi and Oh [20] have extended the multiple hypothesis framework for stereo image sequences, in which decisions on stereo matching are also delayed. In this thesis, we will use a similar approach; but since we are more interested in reconstructing a single rigid object, the motion of the feature points is governed by a different set of dynamics from that proposed in [20]. We will formulate the problem in the context of the whole incremental reconstruction algorithm.
3.5.1 Hypothesis Generation

For multiple hypothesis tracking and stereo matching using a stereo image sequence, we use the four-frame model in Figure 2.11. Let $i, j, k, l$ be integer indices to the extracted feature points on the images $I^L(f)$, $I^R(f)$, $I^L(f+1)$, $I^R(f+1)$ respectively. We define four sets of hypotheses,

$$\mathcal{H}_S(f) = \{h_S(f; i, j)\}, \qquad \mathcal{H}_S(f+1) = \{h_S(f+1; k, l)\},$$
$$\mathcal{H}_L(f) = \{h_L(i, k)\}, \qquad \mathcal{H}_R(f) = \{h_R(j, l)\},$$

where

- $h_S(f; i, j)$ is the hypothesis that $\mathbf{x}^L_i(f)$ and $\mathbf{x}^R_j(f)$ are stereo feature matches,

- $h_S(f+1; k, l)$ is the hypothesis that $\mathbf{x}^L_k(f+1)$ and $\mathbf{x}^R_l(f+1)$ are stereo feature matches,

- $h_L(i, k)$ is the hypothesis that $\mathbf{x}^L_i(f)$ and $\mathbf{x}^L_k(f+1)$ are temporal feature matches,
- $h_R(j, l)$ is the hypothesis that $\mathbf{x}^R_j(f)$ and $\mathbf{x}^R_l(f+1)$ are temporal feature matches.

Figure 3.6: Outline of the multiple hypothesis tracking algorithm. At each frame, features are extracted from the raw intensity images; the observed features are matched against predicted feature locations; new hypotheses are generated from the active hypotheses at frame f; hypotheses are managed (pruning, merging); and, for each active hypothesis at frame f + 1, predictions are generated for the next frame.
Each stereo matching hypothesis is associated with a reconstructed 3D point. For instance, $\hat{\mathbf{p}}_{ij}(f)$ represents a 3D point reconstructed from $\hat{\mathbf{x}}^L_i(f)$ and $\hat{\mathbf{x}}^R_j(f)$, the actual estimated locations of the features corresponding to the extracted locations $\mathbf{x}^L_i(f)$ and $\mathbf{x}^R_j(f)$.

Recall that the epipolar line of any extracted feature point $\mathbf{x}^L_i = (u, v)^T$ is denoted by $E(\mathbf{x}^L_i) = (a, b, c)^T$. We define the distance between $E(\mathbf{x}^L_i)$ and any point $j$ in the right image frame as the perpendicular distance, that is,

$$d_E(i, j) = \frac{|a u_j + b v_j + c|}{\sqrt{a^2 + b^2}}.$$
At $f = 1$, an initial set of hypotheses, $\mathcal{H}_S(1)$, is created using the epipolar constraint, that is,

$$\forall i \; \forall j: \text{ create } h_S(1; i, j) \text{ if } d_E(i, j) < \varepsilon, \tag{3.8}$$

where $\varepsilon$ is set to a small value to account for errors in feature extraction and quantization effects. The value for $\varepsilon$ can be determined as a function of the noise model for the 2D feature point measurements.
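As an illustrative sketch of the perpendicular distance and the creation rule (3.8), the following may help; the $(a, b, c)$ line representation and the container types are assumptions, not the thesis implementation:

```python
import numpy as np

def epipolar_distance(line, point):
    # Perpendicular distance from point (u, v) to the line a*u + b*v + c = 0,
    # i.e. d_E = |a*u + b*v + c| / sqrt(a^2 + b^2).
    a, b, c = line
    u, v = point
    return abs(a * u + b * v + c) / np.hypot(a, b)

def initial_hypotheses(epipolar_lines_left, points_right, eps):
    # Rule (3.8): pair left feature i with right feature j whenever j lies
    # within eps of the epipolar line of i in the right image.
    return [(i, j)
            for i, line in enumerate(epipolar_lines_left)
            for j, pt in enumerate(points_right)
            if epipolar_distance(line, pt) < eps]
```

For rectified cameras the epipolar line of a left feature at row $v$ is simply $(0, 1, -v)$, so the test reduces to a row comparison.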
Now consider a particular stereo matching hypothesis $h_S(f; i', j')$ between the points $i'$ and $j'$. Either two-dimensional or three-dimensional tracking, as described in Section 3.3 and Section 3.4, can be used to make the predictions $\hat{\mathbf{z}}^L_{i'}(f+1|f)$ and $\hat{\mathbf{z}}^R_{j'}(f+1|f)$. Instead of associating a single feature point to each prediction, temporal matching hypotheses are generated.

Using the definition of Mahalanobis distance in (3.2), temporal matching hypotheses are created as follows:

$$\forall k: \text{ create } h_L(i', k) \text{ if } d_M(i', k) < \gamma,$$
$$\forall l: \text{ create } h_R(j', l) \text{ if } d_M(j', l) < \gamma.$$
Then, new stereo matching hypotheses can be created:

$$\forall k \; \forall l: \text{ create } h_S(f+1; k, l) \text{ if } \exists\, h_L(i', k) \text{ and } \exists\, h_R(j', l) \text{ and } d_E(k, l) < \varepsilon. \tag{3.9}$$

By the combination of the conditions in (3.8) and (3.9), each stereo matching hypothesis $h_S(f+1; k, l)$ satisfies four sets of constraints: two frame-to-frame motion constraints and two view-to-view epipolar constraints. For instance, if a point $i'$ has several stereo matching candidates, the application of the three additional constraints in (3.9) helps to reject some of the false stereo hypotheses from the previous frame.
3.5.2 Hypothesis Management

The hypothesis generation step just described does not apply any global constraints when new stereo matching hypotheses are created. Since for each stereo hypothesis $h_S(f; i, j)$ several temporal matching hypotheses may be generated, it is often possible that more than one stereo hypothesis is created with the same matching feature points at frame $f + 1$. An example is illustrated in Figure 3.7.

Figure 3.7: An example situation in which redundant stereo hypotheses are created. Both of the stereo matches (1, 2) and (3, 4) generate predictions that result in a stereo hypothesis between points 5 and 6. One of the hypotheses has to be deleted, and the choice depends on which one is more likely to result from a real 3D point.
In order to prevent the number of hypotheses from growing, which would increase computational costs, we impose the restriction that at any frame $f$ and for each distinct pair of integer indices $(i, j)$, there is only one hypothesis identified by $(i, j)$. This implies that redundant hypotheses have to be pruned (deleted) from the set $\mathcal{H}_S(f)$. The general strategy for applying this constraint is to keep the hypothesis with the highest likelihood of representing the true stereo correspondence, and delete all the others.

The state of a stereo hypothesis, in addition to the indices of the matching feature points, is also identified by its parent hypothesis, its reconstructed 3D point $\hat{\mathbf{p}}(f)$, and its error covariance $\mathbf{P}$. The parent of a hypothesis refers to a stereo hypothesis from the previous frame that resulted in the creation of the current hypothesis. For instance, using the notation as before, for a particular pair of points $\big(\mathbf{x}^L_{k'}(f+1), \mathbf{x}^R_{l'}(f+1)\big)$ that satisfies the conditions in (3.9), the parent of $h_S(f+1; k', l')$ is $h_S(f; i', j')$. The identity of the parent hypothesis simply reflects the track history of the feature points in the current hypothesis, which can be used to determine which redundant hypothesis is more likely to be a result of a real, existing 3D feature point.
The likelihood of the hypothesis $h_S(f; i, j)$ being the parent of $h_S(f+1; k', l')$ is assessed by a combination of three goodness of fit criteria. Let $f_L(f+1; k')$ be the measure of fitness in terms of the left temporal match, $f_R(f+1; l')$ in terms of the right temporal match, and $f_S\big[f+1; h_S(f; k', l')\big]$ in terms of the stereo matches. Then the three goodness of fit criteria are

$$f_L(f+1; k') = \alpha_1 f_L(f; i) + (1 - \alpha_1)\left(1 - \frac{d_M(i, k')}{\gamma}\right), \tag{3.10}$$

$$f_R(f+1; l') = \alpha_1 f_R(f; j) + (1 - \alpha_1)\left(1 - \frac{d_M(j, l')}{\gamma}\right), \tag{3.11}$$

$$f_S\big[f+1; h_S(f; k', l')\big] = \alpha_2 f_S\big[f; h_S(f; i, j)\big] + (1 - \alpha_2)\left(1 - \frac{d_E(k', l')}{\varepsilon}\right), \tag{3.12}$$

where $\alpha_1 \in [0, 1]$ and $\alpha_2 \in [0, 1]$ are fading memory factors.

Since for all contesting hypotheses $h_S(f; i, j)$ the conditions in (3.8) and (3.9) have already been met,

$$f_L(f+1; k') \in [0, 1], \quad f_R(f+1; l') \in [0, 1], \quad f_S\big[f+1; h_S(f; k', l')\big] \in [0, 1],$$

and a value of 1 corresponds to a perfect fit for all of them.

Then, based on (3.12), the parent of $h_S(f+1; k', l')$ is

$$h_S(f; i', j') = \arg\max_{h_S(f; i, j)} \Big\{ w_1 \big[ f_L(f+1; k') + f_R(f+1; l') \big] + (1 - w_1)\, f_S\big[f+1; h_S(f; k', l')\big] \Big\}, \tag{3.13}$$

where the fitness terms depend on the candidate parent through (3.10)–(3.12), and $w_1 \in [0, 1]$ is the weight on the goodness of fit measure based on temporal matching relative to that on stereo matching.
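The fading-memory updates (3.10)–(3.12) and the selection rule (3.13) can be sketched as follows; the dictionary interface and default weight are illustrative assumptions:

```python
def updated_fitness(prev_fit, dist, gate, alpha):
    # Fading-memory goodness of fit, as in (3.10)-(3.12): blend the previous
    # fitness with the normalized closeness of the new match (1 at dist = 0,
    # 0 at the gate threshold).
    return alpha * prev_fit + (1.0 - alpha) * (1.0 - dist / gate)

def select_parent(candidates, w1=0.5):
    # Among competing parents for the same child hypothesis, keep the one
    # maximizing the weighted score in (3.13). `candidates` maps a parent
    # hypothesis id to its (f_L, f_R, f_S) fitness triple, each in [0, 1].
    def score(fits):
        f_left, f_right, f_stereo = fits
        return w1 * (f_left + f_right) + (1.0 - w1) * f_stereo
    return max(candidates, key=lambda pid: score(candidates[pid]))
```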
After the best parent has been determined for redundant hypotheses, the child hypothesis associated with this parent is kept, and all other redundant ones are deleted.

Note that our pruning strategy only addresses the situation in which stereo hypotheses use the same feature points on both the left and right image frames. It does not enforce uniqueness or one-to-one matching between the feature points. Therefore, at the end of processing each pair of stereo frames, some feature points may still have multiple matching candidates. Only those that have a single matching candidate will be used to reconstruct a 3D point for the object representation in the current frame. The rest of the stereo matching ambiguities may be resolved over time.

Preliminary results of the multiple hypothesis tracking and stereo matching algorithm will be presented in the next chapter.
Chapter 4
Simulations
To demonstrate the algorithm proposed in Chapter 3, simulation tests have been conducted. This chapter presents and discusses the results of these simulation tests. In particular, we will demonstrate multiple hypothesis tracking using two-dimensional (Section 4.2) and three-dimensional (Section 4.3) motion constraints, and examine the differences they make on the accuracy of the reconstructed object (Section 4.4).
4.1 Description of Data

There are several advantages of using synthetically generated data over extracting features from real images:

1. Problems related to feature extraction, such as features disappearing or re-appearing over time, or false features, are avoided.

2. The feature extractor's accuracy and precision in locating feature points can be modelled explicitly.

3. Occlusion of feature points due to viewpoint variation can be conveniently ignored.

4. Motion of the object and cameras is precisely controlled.
5. Ground truths for both motion and object shape are available for performance evaluation.

Figure 4.1: The synthetic satellite model used in the simulations. Randomly generated feature points are shown as black dots. Some points on the far surface of the cylinder have been occluded, but they are treated as if they were visible in the experiments.
For the reasons described above, a simple model consisting of an open-ended cylinder and two planar surfaces, as shown in Figure 4.1, is constructed to imitate the shape of a satellite. Data points on the surface of the model are randomly generated to represent actual features on the object. A parallel stereo camera system with a configuration as in Figure 2.6 is assumed. The baseline is a translation of 150 mm along the X-axis. Intrinsic parameters from real calibrated cameras are used; therefore, the focal lengths and principal points of the two cameras differ slightly. The images are also given a fixed size in pixels.

The values used as the system parameters are included in Appendix A. These parameters determine the coordinate system transformations between the 3D synthetic data points and their respective 2D image projections on the left and right cameras. Figure 4.2 shows an example of the resulting 2D data points after applying the transformations.
Figure 4.2: Sample synthetic data points. (a) Feature points on the left camera image; (b) feature points on the right camera image.
In all of the experiments in this chapter, an additional stereo matching constraint is added: in order to ensure that the reconstructed points do not have negative depths with respect to the cameras, the stereo disparity between corresponding points is constrained to be negative. This helps to eliminate some of the matching ambiguities.
4.2 Two Dimensional Tracking

The first experiment examines the tracking of feature points, assuming no knowledge about the motion, and using exclusively the two-dimensional dynamic models as described in Section 3.3. A simple motion is used to illustrate the process of resolving stereo matching ambiguities using multiple hypothesis tracking (Section 3.5). The two cameras are rotated 3 degrees around the optical axis of the left camera between consecutive frames. Gaussian white noise with zero mean is added to the projections of a set of synthetic 3D feature points on the object, to model feature extraction and quantization errors.
Figures 4.3 and 4.4 show the active stereo match hypotheses and predictions for a single feature point from frames 1 to 4. The figures in the left column represent left camera images and the ones on the right represent right camera images.

At f = 1, the highlighted feature point on the left (4.3(a)) has three stereo match candidates on the right image (4.3(b)) that satisfy the basic epipolar constraints, two of which are incorrect matches. The locations of these four points at f = 2 are predicted using a first order estimator (4.3(c), 4.3(d)). Since at f = 2 the feature point associated with Hypothesis 1 no longer satisfies the epipolar constraint, it is rejected and the number of active hypotheses decreases to two (4.3(e), 4.3(f)). The locations of the remaining hypotheses at f = 3 are now predicted using a second order estimator (4.4(a), 4.4(b)). Similarly, the feature points associated with Hypothesis 2 do not satisfy the epipolar constraint, and are therefore rejected. The only remaining active hypothesis is the correct stereo match for the point (4.4(c), 4.4(d)). The locations of this hypothesis predicted using a second order estimator at f = 4 are shown in 4.4(e) and 4.4(f). Notice that the uncertainty regions of these predictions are slightly larger than those in the previous frames. This is because the measurement noise has a larger contribution to the measurement uncertainty in the second order estimator than in the zero and first order estimators. The size of these regions will remain very much constant for future frames.
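A minimal sketch of a first order prediction with Mahalanobis gating is shown below, under simplifying assumptions (plain constant-velocity extrapolation with additive process noise, rather than the full Kalman filter formulation of Section 3.3):

```python
import numpy as np

def predict_first_order(x_prev, x_curr, P_curr, Q):
    # Constant-velocity ("first order") prediction of a 2D feature location:
    # extrapolate the last inter-frame displacement and inflate the
    # uncertainty by the process noise Q.
    x_pred = 2.0 * np.asarray(x_curr, float) - np.asarray(x_prev, float)
    P_pred = np.asarray(P_curr, float) + np.asarray(Q, float)
    return x_pred, P_pred

def in_search_region(z, x_pred, P_pred, gate):
    # Mahalanobis gating: z is a temporal match candidate only if it falls
    # inside the elliptical search region around the prediction.
    r = np.asarray(z, float) - x_pred
    d2 = float(r @ np.linalg.solve(P_pred, r))
    return np.sqrt(d2) < gate
```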
4.3 Three Dimensional Tracking

The second experiment examines the three-dimensional tracking of the feature points as described in Section 3.4, as well as the process of resolving stereo matching ambiguities. All the assumptions about the stereo camera system, motion, and measurement noise are the same as those for the 2D tracking example. For this experiment only, we will assume that the motion parameters are known a priori for demonstration purposes, such that an exact 3D dynamics model is used for the prediction and estimation of feature point locations. Of course, our eventual goal is not to assume known motion, but to estimate the motion parameters. However, we will leave this discussion until Chapter 5.

Figures 4.5 and 4.6 show the active stereo match hypotheses and predictions for a single feature point from frames 1 to 4. In this case, each 3D point hypothesis generates a slightly different prediction in the left frame. As a result, all the different predictions for all the hypotheses are shown in the left frames of Figures 4.5 and 4.6. A different feature point from the one in the 2D tracking example was chosen for demonstration purposes. At f = 1, there is a total of four stereo matching hypotheses for the selected feature point, including the correct match (4.5(a), 4.5(b)). Similar to the 2D tracking example, the incorrect matches are rejected over time, and by f = 4 only two hypotheses remain active (4.6(c), 4.6(d)). It can easily be seen that the remaining false
Figure 4.3: Frames 1 and 2 of a multiple hypothesis 2D tracking example (left column: left camera images; right column: right camera images). Panels: (a, b) f = 1, active hypotheses; (c, d) f = 2, predictions; (e, f) f = 2, active hypotheses. The projections of the test point, the active hypotheses, and the predictions are shown with distinct markers, and the uncertainty regions associated with the predictions are drawn as circles (continues in Figure 4.4).
Figure 4.4: (continued from Figure 4.3) Frames 3 and 4 of a multiple hypothesis 2D tracking example. Panels: (a, b) f = 3, predictions; (c, d) f = 3, active hypotheses; (e, f) f = 4, predictions. Markers are as in Figure 4.3.
match, Hypothesis 3, will be rejected at the next frame because there is no other feature point within the search region of that hypothesis (4.6(e), 4.6(f)).
4.4 Incremental Reconstruction

In the two previous tracking experiments, ambiguities in stereo matching are resolved within a few frames. It is important to point out that, of course, not all feature points will always be matched correctly using this method, while others may have no ambiguities at all from the first pair of frames.

The last two experiments simply illustrate how multiple hypothesis tracking works under 2D and 3D tracking. We now demonstrate the accuracy of the actual structure estimates in the same two separate scenarios: using either 2D tracking or 3D tracking exclusively. In each of the following cases, the motion is the same as in the previous two demonstrations. Furthermore, only feature points on the left image with just one active stereo match hypothesis from the right image are reconstructed at each frame. All points are expressed in terms of the left camera coordinate system at frame 1.

In the first experiment, solely 2D tracking is used throughout the image sequence and no noise was added to the generated 2D measurements. Figure 4.7 illustrates the results of reconstructing 3D points from validated stereo correspondences at frames 1, 10, and 20. The figures in the left column show the reconstructed X and Y values of the points, and the right column shows the reconstructed depth values Z on the vertical axis, both along with ground truth.

As shown in Figure 4.7, the reconstructed 3D points are generally quite accurate when there is no measurement noise. At frame 1, unambiguous stereo matches can readily be found for a number of feature points. The reason is clear by observing that these particular points would have unique correspondences between the left and right images, as no other feature points would satisfy the epipolar constraint. As more image frames become available, the number of points reconstructed increases.

Figure 4.8 summarizes the results of this experiment. The total number of active hypotheses at each frame decreases significantly while the number of reconstructed points increases in the first five frames. Although there was one mismatched point at some of the frames, the structure
Figure 4.5: Frames 1 and 2 of a multiple hypothesis 3D tracking example (left column: left camera images; right column: right camera images). Panels: (a, b) f = 1, active hypotheses; (c, d) f = 2, predictions; (e, f) f = 2, active hypotheses. The projections of the test point, the active hypotheses, and the predictions are shown with distinct markers, and the uncertainty regions associated with the predictions are drawn as circles (continues in Figure 4.6).
Figure 4.6: (continued from Figure 4.5) Frames 3 and 4 of a multiple hypothesis 3D tracking example. Panels: (a, b) f = 3, predictions; (c, d) f = 3, active hypotheses; (e, f) f = 4, predictions. Markers are as in Figure 4.5.
Figure 4.7: 3D points reconstructed using only 2D feature tracking and no measurement noise, at frames f = 1, 10, and 20. Left column: front view (X vs. Y, in mm); right column: top view (X vs. Z, in mm). Ground truth and reconstructed points are plotted together.
Figure 4.8: Summary of results for 2D tracking alone and no measurement noise: the number of active hypotheses, reconstructed points, mismatched points, and existing features at each frame.
estimates shown in Figure 4.7 suggest that the mismatch probably occurred between two feature points that are geometrically close to each other in 3D in the first place.
Next we corrupt the 2D feature point measurements with zero-mean Gaussian white noise, in pixel units. Figure 4.9 reveals that the resulting depth estimates are quite sensitive to the noise. This situation arises when the baseline of the stereo cameras is small relative to the depth of the object in the scene. Any minuscule errors in the 2D feature positions can lead to large errors in the depth estimates. The further the object is from the cameras, the more sensitive to the measurement noise the reconstructed points will be. This problem will be discussed in more detail in Chapter 5.

As illustrated in Figure 4.10, the number of reconstructed points does not increase monotonically as it did in the noiseless case. Observing that the number of active hypotheses also drops, falling below the number of existing features in the images, we can speculate that the feature tracker lost track of some of the feature points. This may indicate that the 2D motion models used do not accurately reflect the true dynamics of the image features.
Figure 4.9: 3D points reconstructed using only 2D feature tracking with measurement noise, at frames f = 1, 10, and 20. Left column: front view (X vs. Y, in mm); right column: top view (X vs. Z, in mm). Ground truth and reconstructed points are plotted together.
Figure 4.10: Summary of results for 2D tracking alone with measurement noise: the number of active hypotheses, reconstructed points, mismatched points, and existing features at each frame.

We now examine the performance of reconstruction using three-dimensional feature tracking
alone. Only the results of the case with measurement noise are shown, for comparison purposes. Exactly the same noise values used in the 2D tracking example are used in this case, and the outcome is presented in Figure 4.11 and Figure 4.12. The accuracy of the structure estimates evidently improves by far over the length of the sequence. By using the three-dimensional motion parameters, the estimated locations of the 3D points themselves can actually be updated and improved recursively using the Kalman filter. In the case of 2D tracking, on the other hand, only the 2D feature locations are updated and improved, and the reconstructed points are still affected by small errors in the 2D feature locations. Furthermore, feature tracking is much more effective in this case than in the previous one using 2D tracking alone. One major difference with the 3D model is that a single motion is assumed for all the feature points, enforcing rigidity constraints among all the points. The advantage of this is that more matching ambiguities can be resolved quickly and the mismatch rate is lower.
Figure 4.11: 3D points reconstructed using only 3D feature tracking with measurement noise, at frames f = 1, 10, and 20. Left column: front view (X vs. Y, in mm); right column: top view (X vs. Z, in mm). Ground truth and reconstructed points are plotted together.

Figure 4.12: Summary of results for 3D tracking alone with measurement noise: the number of active hypotheses, reconstructed points, mismatched points, and existing features at each frame.

This last example demonstrates that the application of an accurate 3D motion model is beneficial to the stereo matching problem and 3D reconstruction. It motivates our goal to estimate the motion when it is initially unknown. In addition, it is perceivably beneficial to incrementally improve
the 3D motion estimates as more feature point correspondences become available. This problem will be discussed in the next chapter, along with the results of other experiments conducted on both synthetic and real image sequences.
Chapter 5
Extensions For Real Image Processing
In the previous two chapters, the basic formulation of the incremental reconstruction algorithm has been provided (Chapter 3) and tested on a set of synthetic data in controlled experiments (Chapter 4). However, in order to apply the algorithm to real image sequences, a number of extensions have to be implemented. This chapter will discuss and present the results of two additions: motion estimation (Section 5.1) and adding new features (Section 5.3).

5.1 Motion Estimation

It has been shown in Chapter 4 that if the 3D motion parameters are known a priori, the tracking and reconstruction of 3D feature points are much more accurate than when tracking the individual 2D projected points. Unfortunately, in most real life applications, such as the one involving free floating objects in space, the motion is unknown. Consequently, motion estimation is a necessary intermediate step for obtaining a 3D motion model.

One advantage of using a stereo image sequence for motion estimation is that the three-dimensional locations of points can first be established by stereo vision techniques. Then motion can be estimated using 3D instead of 2D temporal point correspondences between image frames, and unique motion estimates can be obtained.
5.1.1 Least Squares Estimation

Motion estimation is often formulated as a linear least squares estimation problem. Using the motion model defined in (3.4), the objective function to minimize becomes

$$J(\mathbf{R}, \mathbf{T}) = \sum_{i=1}^{n} \left\| \mathbf{p}_i(f+1) - \big[ \mathbf{R}(f)\,\mathbf{p}_i(f) + \mathbf{T}(f) \big] \right\|^2. \tag{5.1}$$

Given a minimum of three pairs of point correspondences $\{\mathbf{p}_i(f), \mathbf{p}_i(f+1)\}$, the motion estimates $\hat{\mathbf{R}}(f)$ and $\hat{\mathbf{T}}(f)$ can be computed. Huang and Blostein [73] presented a solution using iterative least squares. A non-iterative algorithm based on singular value decomposition (SVD) was suggested by Arun et al. [74] to find a closed-form solution. Umeyama [75] provided modifications to [74] to ensure that a correct rotation matrix, instead of a reflection, is computed when the data is noisy.
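A generic sketch of this closed-form SVD solution follows, including the determinant sign correction attributed above to Umeyama [75]; this is a standard implementation of the technique, not the code used in the thesis:

```python
import numpy as np

def estimate_motion_svd(P, Q):
    # Closed-form least-squares estimate of (R, T) with Q_i ~ R @ P_i + T,
    # in the spirit of Arun et al. [74], with the reflection fix of
    # Umeyama [75]. P, Q: (n, 3) arrays of corresponding 3D points
    # (n >= 3, non-collinear).
    P = np.asarray(P, float)
    Q = np.asarray(Q, float)
    pc, qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - pc).T @ (Q - qc)            # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T                   # forces det(R) = +1 (no reflection)
    T = qc - R @ pc
    return R, T
```

With noiseless correspondences the true rotation and translation are recovered exactly (up to numerical precision).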
One problem with the basic least squares approach is that the 3D points used in the objective function (5.1) contain noise as a result of reconstruction from noisy 2D measurements. In order to obtain accurate motion estimates, these errors have to be accounted for in the estimator. Chaudhuri and Chatterjee [77] analysed the performance of total least squares (TLS) methods; however, they pointed out that the rotation matrix computed using strict TLS is not necessarily orthonormal. Furthermore, Goryn and Hein [78] showed that under orthonormality constraints, the results of TLS are equivalent to those of the SVD method proposed in [74].
Weng et al. [79] presented a weighted least squares (WLS) solution to the problem. As shown in Figure 5.1, the amount of uncertainty of a reconstructed 3D point in each coordinate direction depends on the position of the point relative to the cameras. In most cases, the Z coordinate of the reconstructed 3D point is the least trustworthy. Hence, for WLS, each of the (X, Y, Z) coordinates of the 3D point is weighted differently when computing the estimate of the motion parameters. It was shown that the WLS approach achieves much more accurate results than unweighted least squares.

In this thesis, the WLS method is used for motion estimation, because we have seen in Chapter 4 how sensitive to noise the depth estimates can be. The details of the algorithm are presented in
Figure 5.1: 3D point estimate uncertainty arises from 2D image feature noise. The further the point is from the cameras, the less certain the depth estimate, elongating the shape of the uncertainty region.
Appendix B. In the algorithm, a $9 \times 1$ vector $\mathbf{n}$, containing the entries of the rotation matrix, and a $3 \times 9$ matrix $\mathbf{C}(\mathbf{p}_i)$ were defined such that

$$\mathbf{R}\,\mathbf{p}_i = \mathbf{C}(\mathbf{p}_i)\,\mathbf{n}. \tag{5.2}$$
Let the matrices $\mathbf{P}$ and $\mathbf{Q}$, and the vectors $\mathbf{r}$ and $\mathbf{s}$, be functions of the 3D point correspondences and their associated error covariances. Then an intermediate estimate for $\mathbf{n}$ takes the form

$$\tilde{\mathbf{n}} = (\mathbf{P}^T \mathbf{P})^{-1} \mathbf{r}.$$

The parameters in $\tilde{\mathbf{n}}$ do not necessarily satisfy orthonormality constraints. Therefore, the final solution, $\hat{\mathbf{n}}$, is computed in a second step, in which the error between $\hat{\mathbf{n}}$ and $\tilde{\mathbf{n}}$ is minimized subject to $\hat{\mathbf{n}}$ representing a rotation matrix. Then the estimated translation is computed as

$$\hat{\mathbf{T}} = \mathbf{s} - \mathbf{Q}\,\hat{\mathbf{n}}.$$

The accuracy of $\hat{\mathbf{R}}$ and $\hat{\mathbf{T}}$ improves as the number of point correspondences increases; therefore, in our implementation, we keep a history of all the 3D point correspondences established over previous frames for estimating the motion parameters. That is, all of the following corresponding point pairs are used for motion estimation:

$$\{\hat{\mathbf{p}}_i(1), \hat{\mathbf{p}}_i(2)\}, \; \{\hat{\mathbf{p}}_i(2), \hat{\mathbf{p}}_i(3)\}, \; \ldots, \; \{\hat{\mathbf{p}}_i(f-1), \hat{\mathbf{p}}_i(f)\}.$$

The motion estimates are updated in a batch fashion at every frame as more correspondences become available, improving tracking and stereo matching efficiency. One may also want to set a maximum on the number of frames for which a point's history is maintained, to limit the amount of data storage required.
5.1.2 Assessing Estimate Accuracy

Once the 3D motion parameters are available, feature tracking can switch from using the 2D motion model of individual points, as in Section 3.3, to using a single, rigid 3D motion model for all points, as in Section 3.4.

Using grossly inaccurate motion estimates in the 3D model would significantly reduce the tracking effectiveness of the algorithm. Therefore, we would only want to switch to 3D tracking when the estimates are sufficiently accurate. However, it is difficult to assess when the estimates are "good enough." The strategy that will be used in our experiments is to visually inspect the error in the estimates after some frames are processed, and manually determine when 3D tracking should be used.
5.1.3 Modification to 3D Dynamic Model

The 3D motion model presented in (3.4) assumed that the exact motion parameters are known. However, since we only have the estimates $\hat{\mathbf{R}}(f)$ and $\hat{\mathbf{T}}(f)$, the error in the estimates would pose some problems in the prediction step of the Kalman filter. The predicted location and error covariance of feature points would be very different from the truth, causing the tracking algorithm to lose track of many points. Hence the motion model will be modified as follows:

$$\mathbf{p}_i(f+1) = \hat{\mathbf{R}}(f)\,\mathbf{p}_i(f) + \hat{\mathbf{T}}(f) + \mathbf{w}_i(f), \tag{5.3}$$

where $\mathbf{w}_i(f)$ accounts for the error in the motion model.

Let

$$\Delta\mathbf{R}(f) = \mathbf{R}(f) - \hat{\mathbf{R}}(f), \qquad \Delta\mathbf{T}(f) = \mathbf{T}(f) - \hat{\mathbf{T}}(f);$$

then

$$\mathbf{w}_i(f) = \Delta\mathbf{R}(f)\,\mathbf{p}_i(f) + \Delta\mathbf{T}(f), \quad \text{or} \quad \mathbf{w}_i(f) = \mathbf{C}\big(\mathbf{p}_i(f)\big)\,\Delta\mathbf{n}(f) + \Delta\mathbf{T}(f).$$

If we assume that $\mathbf{w}_i(f)$ is a random vector with zero mean, the prediction is

$$\hat{\mathbf{p}}_i(f+1|f) = \hat{\mathbf{R}}(f)\,\hat{\mathbf{p}}_i(f|f) + \hat{\mathbf{T}}(f). \tag{5.4}$$

Using the notation in (5.2), the prediction error covariance is

$$\mathbf{P}_i(f+1|f) = \hat{\mathbf{R}}(f)\,\mathbf{P}_i(f|f)\,\hat{\mathbf{R}}(f)^T + \mathbf{C}\big(\mathbf{p}_i(f)\big)\,\boldsymbol{\Lambda}_R(f|f)\,\mathbf{C}\big(\mathbf{p}_i(f)\big)^T + \boldsymbol{\Lambda}_T(f|f), \tag{5.5}$$
where $\boldsymbol{\Lambda}_R$ and $\boldsymbol{\Lambda}_T$ are the error covariances of the estimated rotation and translation parameters, respectively.

According to the motion estimation algorithm, we approximate the error covariances as

$$\boldsymbol{\Lambda}_R(f|f) = (\mathbf{P}^T \mathbf{P})^{-1}, \tag{5.6}$$

$$\boldsymbol{\Lambda}_T(f|f) = \mathbf{Q}\,\boldsymbol{\Lambda}_R(f|f)\,\mathbf{Q}^T. \tag{5.7}$$
Themodificationsin (5.4)and(5.5)areprobablyinaccuratebecauseof anumberof reasons:
H1~ V L # S is mostlikely not zero-mean,
H � � L #�� # S and� � L #�� # S areapproximations,and
H sincewe do not know thetruevalueof\ V L # S , we canonly computeo _ j\ V L #�� # Scb insteadof
o _ \ V L # Scb .
As a result, we define $3\times 3$ matrices $F_1$ and $F_2$ as adjustment factors to account for these errors, and modify (5.5) to get
$$P_i(k+1\,|\,k) = \hat{R}(k)\,P_i(k\,|\,k)\,\hat{R}(k)^{T} + F_1\,G[\mathbf{x}_i(k)]\,\Sigma_r(k\,|\,k)\,G[\mathbf{x}_i(k)]^{T}F_1^{T} + F_2\,\Sigma_T(k\,|\,k)\,F_2^{T}. \qquad (5.8)$$
A motion estimation component using WLSE is added to the algorithm proposed in Chapter 3 to determine the 3D motion parameters between two pairs of successive image frames. The prediction error covariance defined in (3.6) is replaced by (5.8) to accommodate error in the motion estimates $\hat{R}(k)$ and $\hat{T}(k)$. The next section presents the results of this extension on the synthetic problem.
5.2 Results of Incorporating Motion Estimation

It was shown in Chapter 4 how an accurate 3D motion model contributes to the accuracy of reconstructed 3D feature points using a stereo image sequence. A simulation test was conducted to show the results of incorporating motion estimation into the incremental 3D reconstruction algorithm as discussed in the previous section. For comparison purposes, we used exactly the same data points, motion, and measurement noise covariance as those used in the experiments presented in Chapter 4.
For the experiments in this section, the adjustment factors $F_1$ and $F_2$ mentioned previously were empirically determined as
$$F_1 = F_2 = \begin{bmatrix} 4 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
The values of these factors affect how well features can be tracked throughout the sequence when the motion estimates are inaccurate. In Figure 5.2, the predicted measurements and their uncertainty regions for many feature points being tracked are shown in one frame of the synthetic sequence. In this example, since the motion estimates are inaccurate, the predicted measurement locations are very far from the true locations of the points. The determination of the adjustment
Figure 5.2: An illustration of how feature tracking is affected by the uncertainty of predictions.
factors is a trade-off between creating uncertainty regions that are too large and losing track of certain feature points. In this case, the track for the leftmost feature point in the image is lost.
In the first experiment, we examine motion estimation using 2D tracking only. The 3D point correspondences established solely by 2D tracking were accumulated and used to estimate the motion parameters. For this first experiment, we will not incorporate the motion estimates into the tracking algorithm yet. Figure 5.3 shows the accuracy of the motion estimates over twenty frames.

The error in rotation, $\Delta R(k)$, is defined as
$$\Delta R(k) = \|R(k) - \hat{R}(k)\|$$
and the error in translation, $\Delta T(k)$, as
$$\Delta T(k) = \|T(k) - \hat{T}(k)\|,$$
where $\|\cdot\|$ is the Euclidean norm. It can be seen that as the number of 3D point correspondences increases, the accuracy of the motion estimates improves.
Figure 5.3: Results of motion estimation. (a) Rotation error $\Delta R$, its estimated standard deviation $\hat{\sigma}_r$, and the number of point correspondences vs. frame number. (b) Translation error $\Delta T$, its estimated standard deviation $\hat{\sigma}_T$, and the number of point correspondences vs. frame number.
Figure 5.3 also illustrates the accuracy of the error covariances $\Sigma_r(k\,|\,k)$ and $\Sigma_T(k\,|\,k)$ estimated using (5.6) and (5.7). $\hat{\sigma}_r(k\,|\,k)$ and $\hat{\sigma}_T(k\,|\,k)$ approximate the error standard deviations of the motion estimates, where
$$\hat{\sigma}_r(k\,|\,k) = \sqrt{\operatorname{Trace}\bigl[\Sigma_r(k\,|\,k)\bigr]}, \qquad \hat{\sigma}_T(k\,|\,k) = \sqrt{\operatorname{Trace}\bigl[\Sigma_T(k\,|\,k)\bigr]}.$$
The approximated error standard deviations shown in the figures do seem to reflect the distribution of the error quite well.

Using Figure 5.3 as a guide, it was determined that the 3D motion estimates are accurate enough to be incorporated into the modified 3D motion model in (5.3) after frame 6.
In the second experiment, point feature tracking is switched from the two-dimensional model described in Section 3.3 to the modified three-dimensional model described in the previous section, after frame 6 has elapsed.

The results of adding combined 2D and 3D tracking to the reconstruction algorithm are illustrated in Figure 5.4. Compared to the results in Figure 4.9, which were generated using 2D tracking alone, the accuracy of the reconstructed point features in Figure 5.4 is improved. Figure 5.5(a) is a summary of the results of this experiment. The summary of results from Figure 4.10 for the 2D tracking case has been duplicated in Figure 5.5(b) for ease of comparison. The number of reconstructed points at each frame when 3D motion estimates are used for feature tracking is slightly higher than when 2D tracking is used alone. These figures effectively demonstrate the advantage of an incremental approach to reconstruction, as more accurate motion estimates become available for more accurate feature tracking.
5.3 Adding New Features

In the simulation experiments presented in Chapter 4, all of the feature points were assumed to be visible in all of the frames throughout the stereo image sequence. This is of course not going to be the case for real image data, for the following reasons:
Figure 5.4: 3D points reconstructed using combined 2D and 3D feature tracking and measurement noise $R_1$, shown at three frames (front views: X vs. Y in mm; top views: X vs. Z in mm). Ground truth and reconstructed points are plotted with different markers.
Figure 5.5: Comparison of reconstruction results between using and not using 3D motion estimation. (a) Summary of results for combined 2D and 3D tracking (active hypotheses, reconstructed points, mismatched points, and existing features vs. frame number). (b) Summary of results for the 2D tracking experiment (same as Figure 4.10).
• Depending on the characteristics of the feature extractor and the current reflectance properties of the object, the selection of feature points extracted by the feature extractor will vary from frame to frame. Therefore not all features will be consistently present throughout the whole sequence.

• As the object undergoes motion, some parts of the object become self-occluded while others re-appear from self-occlusion. New features that were previously not visible during the initialization stage of the algorithm may appear at any time during the execution of the tracking and reconstruction algorithm.

In addition to these two cases, it should also be noted that the feature trackers presented in Section 3.3 and Section 3.4 are not perfect. They may lose track of some feature points even though they existed in previous image frames. In order to maintain a dense 3D representation of the object throughout the whole image sequence, it is important to initialise new tracks for both newly appeared features and previously lost features.
For any 2D image feature point appearing in the left image at frame $k$, if there does not exist any stereo matching hypothesis made up of that point, it is considered a new feature point. If the epipolar constraint is satisfied, new stereo matching hypotheses consisting of that point and candidate points from the right image are created.

Using the notation presented in Chapter 3, new tracks are created using the following criterion: for each feature point in the left image at frame $k$ that does not belong to any existing stereo matching hypothesis, create a new hypothesis with every right-image feature point that satisfies the epipolar constraint.

By using this strategy, lost features will consistently be replaced and new features that appear in the scene will also be considered for reconstruction. This extension will be applied to process a real image sequence.
5.4 Real Image Sequence
In this section, we describe a 3D reconstruction experiment on a real stereo image sequence captured in a laboratory environment. The images were provided by MacDonald Dettwiler Space and Advanced Robotics Ltd.
A model of a typical docking interface for space modules is seated on the end of a robot with six degrees of freedom. A pair of Pulnix CCD cameras with an image resolution of 640 × 480 pixels is mounted on a stationary platform some distance away from the robot. Currently, only the object model can be moved around, but eventually the cameras will also be allowed to move when a second robot is available. Figure 5.6 shows the robot with the object model on the end.

The cameras were calibrated by MDR and the parameters are shown in Appendix A. The effects of optical distortion are negligible; therefore the distortion factor has been ignored in computing the stereo system's epipolar geometry.
We select a sequence of twenty successive stereo image pairs and apply corner detection on all forty images using the algorithm described in [26]¹. In a real-time application, feature extraction would actually be carried out for each stereo image pair as it becomes available. However, for experimentation purposes, thirty of the most prominent corner features are extracted from each of the images in the whole stereo sequence as a pre-processing step. The identified feature locations serve as input to the incremental reconstruction algorithm.
Three stereo image pairs from the sequence, along with the extracted feature points, are shown in Figure 5.7. Because large regions of the images are black, they have been cropped to a size of 480 × 480 pixels for better presentation. As the images show, the amount of object motion during the twenty elapsed frames is quite significant. As a result, many of the feature points either become occluded or disappear due to the feature extractor's inability to detect them under different lighting conditions. This may pose challenges for long-term tracking. Adjusting the parameters of the feature extractor or choosing a different one may correct this problem; however, a comprehensive study of feature extraction is outside the scope of this thesis. Furthermore, many feature points arise from shadows and specular reflection, which often do not conform to stereo epipolar geometry or rigid body motion. We can see that in a real image sequence such as this, an actual feature may not be visible or detected in both the left and right images at the same frame. Hence we should expect to see many of these points rejected from the list of reconstructed 3D points.
The experiment on this sequence will examine the performance of the incremental reconstruction algorithm using two-dimensional tracking only. The motion model error covariances are empir-
¹An implementation of this corner detector and its associated KLT Feature Tracker [56] is publicly available at http://robotics.Stanford.EDU/~birch/klt
Figure 5.6: The set-up for capturing a real stereo image sequence. The object to be reconstructed is mounted on the end of a robot.
Figure 5.7: Three of the twenty stereo image pairs in the real image sequence (left and right images at three different frames). Extracted feature points are shown as white dots.
ically determined by visual inspection as $14I$, $5I$, and $I$ respectively, in pixel units, for the zero-, first-, and second-order estimators.
For the experiment, stereo matching hypotheses are generated from the feature points at frame 1. These points are tracked over the twenty frames while the stereo matching hypotheses are tested. Feature points that do not constitute existing stereo matching hypotheses at each frame are used to replace the ones that have been lost or occluded. Figure 5.8 shows the reconstructed points at three different time frames. Since the number of points is small, it would be difficult to convey any sense of shape by plotting the points alone. In order to present the results meaningfully, the reconstructed set of points is projected back onto the respective images.
In the first pair of images (Figures 5.8(a) and 5.8(b)), it can be seen that some of the extracted feature points shown in Figure 5.7 that seem to be obvious matches are not reconstructed. This is because the only constraint used for matching at this point is the epipolar constraint; therefore some of the feature points in the left image may have several matching candidates in the right image. Moreover, as mentioned before, many of the extracted feature points result from shadows and specular reflection, which may not satisfy the epipolar constraint.

An interesting thing to note is that in Figures 5.8(c) and 5.8(d), two feature points that originate from the robot become reconstructed. These two points do not belong to the object of interest and certainly do not have motion consistent with those belonging to the object. These two reconstructed points would become outliers if they were used for motion estimation purposes.
A summary of the number of active stereo hypotheses and the number of reconstructed points over the twenty frames is presented in Figure 5.9. The number of active hypotheses does not decrease over time. One possible explanation is that many feature points disappear from frame to frame and are not present long enough to be tracked consistently. Hence at each frame, features are treated as if they have newly entered the images, so that additional hypotheses are created. The number of reconstructed points in this experiment is very low compared to the results on the synthetic data. There are also some mismatches. This may suggest that stereo and motion constraints alone are not sufficient to resolve many of the matching ambiguities in a real image sequence.
Figure 5.8: Results of reconstruction using the real image sequence with replenished features (left and right images at three different frames). Reconstructed 3D points are projected back onto the left and right images.
Figure 5.9: Summary of results for the real image sequence (active hypotheses, reconstructed points, and existing features vs. frame number).
5.5 Conclusion
The experiments discussed in Section 5.2 using synthetic data combine most of the important elements of the work in this thesis. The results are quite convincing and suggest that the concepts presented so far are feasible and worth investigating. Of course, a nearly ideal situation was created with the synthetic data, because all of the features are visible at all times and the motion of the object conforms to the motion model used for the computation. When applied to a real image sequence, the results are not as satisfactory, which suggests that there is still much room for improvement before the incremental reconstruction algorithm is usable in a real-time application. A more detailed analysis of the limitations and possible improvements of the current work is provided in the next chapter.
Chapter 6
Conclusions
This chapter summarises the major contributions of this thesis and identifies the limitations of the current work. A set of possibilities for future work is listed.
6.1 Thesis Achievements

In this thesis, I have proposed an incremental algorithm that uses a sequence of stereo images to reconstruct 3D feature points belonging to a rigid object.
What makes the problem of 3D reconstruction difficult is the fact that it encompasses many different individual subproblems, from feature tracking to stereo matching to motion estimation, none of which has a perfect solution. It is always elegant to address one issue at a time; however, in a real-world application, all of these issues exist and have to be considered. In this thesis, I have demonstrated how these subproblems are solved simultaneously in one unified framework. The approach is incremental in the sense that the iterative process of stereo matching, feature tracking, and motion estimation enables each subproblem to provide more constraints for the others and enhances the solution process. Although the work in this thesis falls short of a production system that could be used in a real-time application in terms of accuracy and speed, the theoretical framework has been established, and future work becomes a matter of enhancing the individual components within that framework. The feasibility and potential of the proposed algorithm has
CHAPTER 6. CONCLUSIONS 92
been proven by the results shown in Chapter 5.
The following list highlights some of the other contributions made by this thesis:

• Feature matching is specifically addressed and integrated into the system, as opposed to being ignored as in some previous work [65, 67].

• Much past effort aimed at solving the feature correspondence problem completely in a single pair of images; we do this in an incremental fashion.

• Most past research uses very complex, computationally intensive stereo matching methods such as dynamic programming [39] or graph matching [15]; since only stereo and motion constraints are used in the proposed method, we took advantage of the speed of the Kalman filter for dynamic estimation of a low-dimensional problem. Processing speed is very important in real-time applications. The current implementation in Matlab¹ requires about five seconds to process each pair of stereo images. It is believed that an implementation in a compiled language such as C would greatly improve the speed of the current system.
6.2 Limitations and Future Work

Not too surprisingly, the work in this thesis has by no means addressed all the issues, nor does it provide an ideal solution to the 3D reconstruction problem. Many details have been considered but are intentionally omitted from this thesis because they require more extensive research that was not feasible given the time constraints.

The following list of potential future research is not meant to be exhaustive; however, it represents the more significant tasks required before a production-quality incremental 3D reconstruction system can be achieved.
Local feature matching: The approach that this thesis has taken for feature matching is to use only epipolar and motion constraints to evaluate potential matching candidates. The strengths of this approach are that it can generally be applied to any type of feature, avoiding the pitfalls that arise when lighting conditions and geometric distortion cause features to look very different, and that its computational cost is very low. However, as the results on the real image sequence in Chapter 5 suggest, this approach is also one of the weaknesses of the work in this thesis. Some features that can easily be matched by, for example, template matching [35, 36] are not distinguished by the current approach and are treated as ambiguities. Since only feature points with exactly one matching candidate are reconstructed, many of these matching ambiguities are carried forward to future frames as hypotheses. If we employed some other existing local matching technique, such as area correlation as described in Chapter 2, the correct matching rate would probably increase. To keep computational cost down, one option is to integrate a simple and crude matching algorithm into the existing framework, which would narrow down the number of hypotheses; the remaining ambiguities may be left for the motion constraints to resolve.

¹A high-level, interpreted programming language for technical computing and visualization developed by The MathWorks Inc.
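As an illustration of such a simple local matching step, the sketch below scores stereo candidates by normalized cross-correlation (NCC) of fixed-size image patches; the helper names, window half-size, and acceptance threshold are arbitrary assumptions, not part of the thesis.

```python
import numpy as np

def ncc(patch_a, patch_b):
    """Normalized cross-correlation between two equal-sized image patches."""
    a = patch_a - patch_a.mean()
    b = patch_b - patch_b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def prune_candidates(left_img, right_img, pl, cands, half=5, thresh=0.7):
    """Keep only right-image candidates whose patch correlates well with the
    left-image patch around pl. Points are (row, col); patches are (2*half+1)^2."""
    r, c = pl
    ref = left_img[r - half:r + half + 1, c - half:c + half + 1]
    kept = []
    for (rr, cc) in cands:
        cand = right_img[rr - half:rr + half + 1, cc - half:cc + half + 1]
        if cand.shape == ref.shape and ncc(ref, cand) >= thresh:
            kept.append((rr, cc))
    return kept
```

Because NCC normalizes out patch mean and contrast, it is more tolerant of the lighting differences between the two views than raw template differencing.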
Global constraints: Currently, no global constraints, such as uniqueness and disparity smoothness [4, 5], are imposed on the stereo matching component of the work. One possible simple extension is to enforce a one-to-one relationship between features in the left and right images. In the current set-up, each feature in the left image is ensured to have only one matching feature from the right; however, the reverse is not imposed. Some of the feature mismatches seen in Chapter 5 might have been avoided if we also enforced that each feature point in the right image has only one match from the left image.
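Such a left-right (mutual consistency) check can be sketched in a few lines; the dictionaries `l2r` and `r2l`, mapping each feature index to its single best candidate on the other side, are illustrative assumptions.

```python
def mutual_matches(l2r, r2l):
    """Keep only pairs where the left->right and right->left choices agree."""
    return [(j, m) for j, m in l2r.items() if r2l.get(m) == j]
```

Any pair that fails the check is discarded rather than reconstructed, which enforces uniqueness in both directions.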
Robust motion estimation: Although the goal of the research in this thesis is not to acquire accurate motion estimates for the object or cameras in the scene, the dependency of the tracking algorithms on these estimates makes a more robust motion estimator crucial for further restricting the search regions for point features in new frames.

Since the 3D motion estimation algorithm used in this thesis performs least squares fitting of reconstructed 3D feature points, outliers in the data sets degrade the accuracy of the motion estimates. The strict requirement in the incremental algorithm that only feature points with exactly one stereo matching hypothesis be reconstructed readily eliminates some of the possible outliers. However, under some circumstances, mismatches are still possible, leading to the creation of incorrect 3D feature points. Moreover, as seen in the experiments involving real image data, some of the reconstructed feature points do not belong to the rigid body undergoing motion. In these cases, another outlier rejection scheme, for example a RANSAC-type approach [80], is necessary for robust motion estimation.
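A RANSAC-type scheme around a rigid least squares fit can be sketched as follows (illustrative only; the inner fit is the SVD-based Kabsch solution rather than the weighted estimator of Appendix B, and the iteration count and inlier tolerance are arbitrary assumptions):

```python
import numpy as np

def rigid_fit(P, Q):
    """Least-squares rigid motion (R, T) mapping points P onto Q (Kabsch/SVD)."""
    cp, cq = P.mean(0), Q.mean(0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # guard against reflections
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

def ransac_motion(P, Q, iters=200, tol=5.0, seed=0):
    """RANSAC wrapper: fit minimal 3-point samples, refit on the best inlier set."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(iters):
        idx = rng.choice(len(P), 3, replace=False)
        R, T = rigid_fit(P[idx], Q[idx])
        resid = np.linalg.norm(Q - (P @ R.T + T), axis=1)
        inliers = resid < tol
        if best is None or inliers.sum() > best.sum():
            best = inliers
    return rigid_fit(P[best], Q[best])
```

Points originating from the robot arm or from mismatches would fall outside the inlier tolerance and be excluded from the final fit.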
Same-frame iterative estimation: 3D reconstruction in the proposed framework involves estimating motion and depth iteratively, with the results of each process bootstrapping the other at successive image frames. This method is fast, but the accuracy of estimates at the current frame could be further improved if this iterative process were also performed within the same pair of image frames. In [66], this technique is used within the context of a monocular sequence, and it could easily be extended to a stereo sequence. In other words, motion estimates arising from the reconstructed 3D points in an image pair can be used to constrain the matching of other features in the same pair, which in turn would generate better motion estimates. In real-time processing, this procedure may be repeated within the same image pair a number of times, depending on the video frame rate, the motion of the object, and the computational speed.
Motion limitations: As some researchers, for instance Waxman and Duncan [70], have noticed, there is a set of motions for which a stereo image sequence is ineffective in disambiguating stereo feature correspondences. In the context of the work presented in this thesis, the multiple hypothesis stereo and temporal matching algorithm will not eliminate false stereo matches for these kinds of object motion. One example is pure translation parallel to the epipolar lines of the stereo cameras. Another is pure rotation about the object's vertical axis when the epipolar lines are horizontal. A more detailed analysis will be required to identify this set of motions in order to better understand the limitations of the proposed reconstruction method.
Criteria for using the 3D motion model: One limitation of the current work is that we need to empirically determine when the motion estimates are sufficiently accurate to be used in a three-dimensional motion model. Ideally, this process should be done automatically, without manual calibration. One ad hoc solution is to set an absolute threshold on the motion estimate uncertainty, below which three-dimensional feature tracking commences. However, establishing a reasonable threshold would again require manual calibration. Ideally, a motion estimate is sufficiently accurate when, during tracking, the uncertainty region of any predicted feature does not overlap with features other than the true match. This condition requires a criterion based on the motion estimate, its uncertainty, the locations of the features, and the distances among the different features. Some efforts have been initiated to explore this option, but further investigation is required.
Motion modelling: The current 3D motion model in Section 3.4 assumes a single motion defined with respect to the camera frame of reference. This model is sufficient if only the camera is in constant motion, but is inadequate if the object in the scene has more complicated dynamics such as precession. A better motion model, such as the LCAM model mentioned in Section 2.5, should perhaps be used.
Missed detection and occlusion: One of the major challenges of online feature tracking is dealing with the sporadic appearance and disappearance of feature points in the sequence, due either to failure of the feature extraction process or to temporary occlusion. Although by replenishing lost features, as described in Section 5.3, we can successfully initialise a new track for a feature that has re-appeared after a short period of absence, a continuous track will increase the certainty of that feature point's 3D position. Batch processing approaches such as [58] have the advantage of "looking into the future" and interpolating the locations of features that are likely to have disappeared temporarily. For online processing, there are three questions that have to be answered:

1. How do we determine whether the disappearance of a feature is the result of missed detection or temporary occlusion, or whether it is a false feature?

2. How many more frames do we wait before we stop tracking a disappeared feature?

3. How do we predict a feature's location in the next frame if it is not visible in the current frame, and how do we determine the prediction's uncertainty?

[63] suggests a statistical measure called the support of existence, which addresses the first two issues in the above list but makes no mention of the last one. Since the goodness-of-fit criterion (3.12) developed in Section 3.5 is inspired by the work in [63], some attempts were made to adopt the support of existence measure into the work in this thesis; however, more investigation and experimentation is still required. Another possibility is to assess the actual uncertainty of the missing feature point's 3D location to answer the first two questions. One solution that was considered for the last question is simply to use the 3D motion estimate to extrapolate the feature point's location, which may become increasingly uncertain if the feature continues to be undetected.
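One way to realise the extrapolation considered for the last question is sketched below (illustrative Python, not from the thesis; the covariance inflation `Q` and the `max_missed` limit are assumptions):

```python
import numpy as np

def coast(x, P, R_hat, T_hat, Q, missed, max_missed=5):
    """Propagate an undetected feature point one frame using the motion
    estimate, inflating its covariance; returns None once the track expires."""
    if missed >= max_missed:
        return None                    # give up on the track (second question)
    x_new = R_hat @ x + T_hat          # extrapolate with the 3D motion estimate
    P_new = R_hat @ P @ R_hat.T + Q    # uncertainty grows every coasted frame
    return x_new, P_new, missed + 1
```

The growing covariance naturally widens the search region when the feature re-appears, and the frame limit bounds how long a false or permanently occluded feature is kept alive.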
Complete 360° reconstruction: One of the original motivations for developing an incremental reconstruction framework was its potential for constructing a complete 360° representation of an object using a stereo image sequence alone, without a separate step of view registration. This would require accurate motion estimates and robust methods of dealing with occlusion, so that feature points can be tracked with high levels of certainty even when occluded by the object itself, until they re-appear from behind the object. Although the results presented in this thesis are far from achieving this goal, it remains an interesting direction to pursue.
As an ending note, one lesson that has been learned by the end of this thesis is that, regardless of the amount of research done in the past two decades, the problem of 3D reconstruction from digital images still remains, to some extent, unsolved. There are many issues and possibilities yet to be considered and explored. From reading the existing literature and from personal experience, it has been observed that it is difficult to develop a computer vision system that is completely autonomous, without any human calibration of parameters or other kinds of intervention. It becomes hard not to marvel at the complexity of the human visual and perceptual systems which, for most people, can accomplish many of the tasks mentioned in this thesis without much conscious effort. Nevertheless, the current benefits of computer vision in many applications warrant the continued search for better solutions. Hopefully, these solutions will soon reach a level of robustness suitable for common use in the aerospace industry.
Appendix A

Camera Parameters For Simulations and Experiments

Table A.1 and Table A.2 list the intrinsic and extrinsic parameters of the camera systems used for generating the synthetic images and for capturing the real images respectively. The parameters are defined as follows [1]:

κ : radial distortion (1/mm²)
f : focal length (mm)
(o_x, o_y) : principal point (pixels)
(s_x, s_y) : effective pixel sizes (mm)
R_c : camera rotation matrix
T_c : camera translation (mm)
APPENDIX A. CAMERA PARAMETERS FOR SIMULATIONS AND EXPERIMENTS 98
Parameter      Left camera                   Right camera
κ              0                             0
f              12.453114                     12.492101
(o_x, o_y)     (334.989980, 239.873079)      (328.778333, 216.353769)
(s_x, s_y)     (0.012222, 0.013360)          (0.012222, 0.013360)
R_c            identity                      identity
T_c            (65, 0, 4584.141564)          (-65, 0, 4584.141564)

Table A.1: Parameters of the synthetic camera system used in the simulation experiments.
Left camera:
κ = 4.381229e-04
f = 12.409334
(o_x, o_y) = (351.433180, 215.273875)
(s_x, s_y) = (0.012222, 0.013360)
R_c =
[  0.998964   0.009834   0.044445 ]
[ -0.016060   0.989610   0.144868 ]
[ -0.044380  -0.143444   0.988651 ]
T_c = (-1443.146335, -1165.554138, 4465.934831)

Right camera:
κ = 4.461283e-04
f = 12.432781
(o_x, o_y) = (336.590293, 215.389585)
(s_x, s_y) = (0.012222, 0.013360)
R_c =
[  0.995500   0.001909   0.094646 ]
[ -0.015465   0.989955   0.140555 ]
[ -0.093546  -0.141369   0.985549 ]
T_c = (-1461.598968, -1184.614065, 4548.304654)

Table A.2: Parameters of the Pulnix CCD cameras used in the real image experiment.
Appendix B

Weighted Least Squares Estimation of 3D Motion

This appendix outlines the algorithm used in this thesis for estimating the 3D motion parameters, as mentioned in Section 5.1. The Weighted Least Squares (WLS) algorithm, proposed by Weng et al. [79], is a motion estimation algorithm that takes into account the uncertainty in each of the $(X, Y, Z)$ directions of the 3D feature points reconstructed from 2D image features.

In order to obtain accurate motion parameters, a different weight is placed on each coordinate when computing the least squares solution to the motion estimation problem using inaccurate 3D feature points, as formulated in (5.1).

To simplify notation, we write $\mathbf{x}_i$ for $\mathbf{x}_i(k)$ and $\mathbf{x}'_i$ for $\mathbf{x}_i(k+1)$. Let $\boldsymbol{\varepsilon}_i$ and $\boldsymbol{\varepsilon}'_i$ be the errors in $\hat{\mathbf{x}}_i$ and $\hat{\mathbf{x}}'_i$ respectively; the motion model is then written as
$$\hat{\mathbf{x}}'_i + \boldsymbol{\varepsilon}'_i = R(\hat{\mathbf{x}}_i + \boldsymbol{\varepsilon}_i) + T,$$
that is,
$$\hat{\mathbf{x}}'_i = R\hat{\mathbf{x}}_i + T + R\boldsymbol{\varepsilon}_i - \boldsymbol{\varepsilon}'_i.$$
Assuming that $\boldsymbol{\varepsilon}_i$ and $\boldsymbol{\varepsilon}'_i$ are uncorrelated, the covariance of the residual $R\boldsymbol{\varepsilon}_i - \boldsymbol{\varepsilon}'_i$ is
$$\Gamma_i = R\,\Sigma_i\,R^{T} + \Sigma'_i,$$
APPENDIX B. WEIGHTED LEAST SQUARES ESTIMATION OF 3D MOTION 101
and for small inter-frame motion, $\Gamma_i$ can be approximated as
$$\Gamma_i \approx \Sigma_i + \Sigma'_i.$$
Letting $\Gamma_i^{-1}$ be the weighting matrix, the WLS objective function to minimize in estimating the motion parameters is
$$F(R,T) = \sum_{i=1}^{n} (R\hat{\mathbf{x}}_i + T - \hat{\mathbf{x}}'_i)^{T}\,\Gamma_i^{-1}\,(R\hat{\mathbf{x}}_i + T - \hat{\mathbf{x}}'_i). \qquad (B.1)$$
A closed-form solution to this objective function is reported in [79]; it is based on the Matrix-Weighted Centroid Coincidence (MWCC) Theorem, which states that if $\hat{R}$ and $\hat{T}$ minimize (B.1), the matrix-weighted centroids of $\{\hat{R}\hat{\mathbf{x}}_i + \hat{T}\}$ and $\{\hat{\mathbf{x}}'_i\}$ coincide:
$$\sum_{i=1}^{n}\Gamma_i^{-1}(\hat{R}\hat{\mathbf{x}}_i + \hat{T}) = \sum_{i=1}^{n}\Gamma_i^{-1}\hat{\mathbf{x}}'_i. \qquad (B.2)$$
In order to decouple $R$ and $T$ and find a non-iterative solution, we need to express the objective function in (B.1) as a linear expression of the parameters in $R$.

For any rotation matrix expressed in terms of its row vectors,
$$R = \begin{bmatrix} \mathbf{r}_1^{T}\\ \mathbf{r}_2^{T}\\ \mathbf{r}_3^{T} \end{bmatrix},$$
we define a vector
$$\mathbf{r} \triangleq \begin{bmatrix} \mathbf{r}_1\\ \mathbf{r}_2\\ \mathbf{r}_3 \end{bmatrix}$$
and a $3\times 9$ matrix
$$G(\mathbf{x}_i) \triangleq \begin{bmatrix} \mathbf{x}_i & \mathbf{0} & \mathbf{0}\\ \mathbf{0} & \mathbf{x}_i & \mathbf{0}\\ \mathbf{0} & \mathbf{0} & \mathbf{x}_i \end{bmatrix}^{T}$$
such that
$$R\,\mathbf{x}_i = G(\mathbf{x}_i)\,\mathbf{r}. \qquad (B.3)$$
It follows from (B.2) and (B.3) that
$$\hat{T} = \Bigl(\sum_{i=1}^{n}\Gamma_i^{-1}\Bigr)^{-1}\sum_{i=1}^{n}\Gamma_i^{-1}\hat{\mathbf{x}}'_i - \Bigl(\sum_{i=1}^{n}\Gamma_i^{-1}\Bigr)^{-1}\sum_{i=1}^{n}\Gamma_i^{-1}G(\hat{\mathbf{x}}_i)\,\hat{\mathbf{r}} \;\triangleq\; \mathbf{c} - B\,\hat{\mathbf{r}}. \qquad (B.4)$$
Since $\Gamma_i^{-1}$ is a positive definite matrix, there is a matrix $W_i$ such that
$$\Gamma_i^{-1} = W_i^{T} W_i.$$
Then (B.1) can be rewritten as
$$F(R) = \sum_{i=1}^{n}\bigl\|W_i (R\hat{\mathbf{x}}_i + T - \hat{\mathbf{x}}'_i)\bigr\|^{2}, \qquad (B.5)$$
and substituting (B.4) for $T$,
$$W_i (R\hat{\mathbf{x}}_i + T - \hat{\mathbf{x}}'_i) = W_i\bigl(G(\hat{\mathbf{x}}_i)\mathbf{r} + \mathbf{c} - B\mathbf{r} - \hat{\mathbf{x}}'_i\bigr) = W_i\bigl(G(\hat{\mathbf{x}}_i) - B\bigr)\mathbf{r} - W_i\bigl(\hat{\mathbf{x}}'_i - \mathbf{c}\bigr) \;\triangleq\; A_i\,\mathbf{r} - \mathbf{b}_i.$$
For Ò point correspondences,define p and r as
p�Ëp Yp d...
p Uand rÇË
r Yr d...
r Uf
Minimizing theobjective functionin (B.5) is equivalentto minimizing
Ó L n S � [ L p�n ^ r S [ed Psubjectto theconstraintthat jn representsa rotationmatrix.
In order to avoid iterative solutions, an intermediate solution $\tilde{R}$ is first computed:

$$\tilde{r} = (A^T A)^{-1} A^T b = \Bigl[ \sum_{i=1}^{n} \bigl( \Phi(\hat{x}_i) - D \bigr)^T W_i^{-1} \bigl( \Phi(\hat{x}_i) - D \bigr) \Bigr]^{-1} \sum_{i=1}^{n} \bigl( \Phi(\hat{x}_i) - D \bigr)^T W_i^{-1} \bigl( \hat{x}'_i - c \bigr).$$
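As a concrete illustration of the non-iterative estimate, the sketch below generates hypothetical noisy correspondences, forms the matrix-weighted centroid terms of (B.4), whitens each residual with a Cholesky factor of the inverse weight matrix, and solves the stacked least squares problem for the intermediate rotation. All names, the synthetic data, and the isotropic weights are assumptions for the example, not part of the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    """3x9 matrix with x^T along the block diagonal, so R @ x == phi(x) @ r."""
    P = np.zeros((3, 9))
    for k in range(3):
        P[k, 3 * k:3 * k + 3] = x
    return P

# Hypothetical data: n correspondences x'_i = R x_i + T plus noise,
# each with an (here, isotropic) inverse weight matrix W_i^{-1}.
n = 20
R_true = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])
T_true = np.array([0.5, -0.2, 1.0])
X = rng.normal(size=(n, 3))
Xp = X @ R_true.T + T_true + 0.01 * rng.normal(size=(n, 3))
W_inv = [np.eye(3) for _ in range(n)]

# Matrix-weighted centroid terms of (B.4): T = c - D r.
S = sum(W_inv)
c = np.linalg.solve(S, sum(Wi @ xp for Wi, xp in zip(W_inv, Xp)))
D = np.linalg.solve(S, sum(Wi @ phi(x) for Wi, x in zip(W_inv, X)))

# Whiten with W_i^{-1} = B_i^T B_i and stack A_i, b_i.
A_blocks, b_blocks = [], []
for x, xp, Wi in zip(X, Xp, W_inv):
    B = np.linalg.cholesky(Wi).T   # Wi = L L^T, so B = L^T satisfies B^T B = Wi
    A_blocks.append(B @ (phi(x) - D))
    b_blocks.append(B @ (xp - c))
A = np.vstack(A_blocks)
b = np.concatenate(b_blocks)

# Unconstrained intermediate solution; rows of the rotation stacked in r_tilde.
r_tilde, *_ = np.linalg.lstsq(A, b, rcond=None)
R_tilde = r_tilde.reshape(3, 3)
T_tilde = c - D @ r_tilde
```

With noise-free data the intermediate solution would be exactly a rotation; with noise it is only approximately orthonormal, which is why the subsequent orthonormalization step is needed.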
Orthonormal constraints are then enforced by least squares matrix fitting to compute the final $\hat{R}$:

$$\min_{\hat{R}} \| \hat{R} - \tilde{R} \|^2, \qquad \text{subject to: } \hat{R} \text{ is a rotation matrix.}$$

A method of least squares matrix fitting by means of quaternions is described in [79].

Finally, $\hat{T}$ is determined according to (B.4).
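The quaternion-based fitting of [79] is one way to carry out this projection; an equivalent alternative, shown here only as an illustrative sketch, computes the nearest rotation via the singular value decomposition (both methods minimize the same Frobenius-norm criterion):

```python
import numpy as np

def nearest_rotation(M):
    """Return the rotation R minimizing ||R - M|| in the Frobenius norm.
    The sign correction keeps det(R) = +1 (a proper rotation, not a
    reflection)."""
    U, _, Vt = np.linalg.svd(M)
    d = np.sign(np.linalg.det(U @ Vt))
    return U @ np.diag([1.0, 1.0, d]) @ Vt

# A nearly orthonormal matrix, as produced by the unconstrained solution.
M = np.array([[ 0.02, -1.05,  0.01],
              [ 0.98,  0.03, -0.02],
              [-0.01,  0.04,  1.10]])
R_hat = nearest_rotation(M)
assert np.allclose(R_hat @ R_hat.T, np.eye(3))
assert np.isclose(np.linalg.det(R_hat), 1.0)
```

With the final rotation (and hence its stacked row vector) in hand, the translation follows directly from (B.4).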
Bibliography

[1] E. Trucco and A. Verri, Introductory Techniques for 3-D Computer Vision, Prentice Hall, 1998.

[2] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, The MIT Press, 1993.

[3] S. Z. Li, D. P. Mital, E. K. Teoh, and H. Wang, Eds., Recent Developments in Computer Vision. 2nd Asian Conf. on Computer Vision, ACCV'95. Invited Session Papers, Springer-Verlag, 1996.

[4] D. Marr and T. Poggio, "A computational theory of human stereo vision," Proc. Royal Soc. London B, 204:301–328, 1979.

[5] J. E. W. Mayhew and J. P. Frisby, "Psychophysical and computational studies towards a theory of human stereopsis," Artificial Intelligence, 17:349–385, 1981.

[6] W. E. L. Grimson, "Computational experiments with a feature based stereo algorithm," IEEE Trans. Pattern Analysis and Machine Intelligence, 7(1):17–34, 1985.

[7] T. S. Huang and A. N. Netravali, "Motion and structure from feature correspondences: A review," Proc. of the IEEE, 82(2):252–268, 1994.

[8] C. P. Jerian and R. Jain, "Structure from motion — a critical analysis of methods," IEEE Trans. Systems, Man, and Cybernetics, 21(3):572–588, 1991.

[9] J. Weng, T. S. Huang, and N. Ahuja, "Motion and structure from two perspective views: Algorithms, error analysis, and error estimation," IEEE Trans. Pattern Analysis and Machine Intelligence, 11(5):451–476, 1989.
[10] N. Navab, R. Deriche, and O. D. Faugeras, "Recovering 3D motion and structure from stereo and 2D token tracking cooperation," Proc. Int. Conf. on Computer Vision, 513–516, 1990.

[11] Y. Q. Shi, C. Q. Shu, and J. N. Pan, "Unified optical flow field approach to motion analysis from a sequence of stereo images," Pattern Recognition, 27(12):1577–1590, 1994.

[12] G. Stein and A. Shashua, "Direct estimation of motion and extended scene structure for a moving stereo rig," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 1998.

[13] P.-K. Ho and R. Chung, "Stereo-motion with stereo and motion in complement," IEEE Trans. Pattern Analysis and Machine Intelligence, 22(2):215–220, 2000.

[14] B. Chebaro, A. Crouzil, L. Massip-Pailhes, and S. Castan, "Fusion of the stereoscopic and temporal matching results by an algorithm of coherence control and conflicts management," Int. Conf. on Computer Analysis of Images and Patterns, 486–493, 1993.

[15] W.-H. Liao and J. K. Aggarwal, "Cooperative matching paradigm for the analysis of stereo image sequences," Int. J. of Imaging Systems and Technology, 9(3):192–200, 1998.

[16] A. Y.-K. Ho and T.-C. Pong, "Cooperative fusion of stereo and motion," Pattern Recognition, 29(1):121–130, 1996.

[17] J. H. Duncan, L. Li, and W. Wang, "Recovering three-dimensional velocity and establishing stereo correspondence from binocular image flows," Optical Engineering, 34(7):2157–2167, 1995.

[18] L. Matthies, "Dynamic stereo vision," Tech. Rep. CMU-CS-89-195, Carnegie Mellon University, 1989.

[19] M. Jenkin and J. K. Tsotsos, "Applying temporal constraints to the dynamic stereo problem," Computer Vision, Graphics, and Image Processing, 33(1):16–32, 1986.

[20] J.-W. Yi and J.-H. Oh, "Recursive resolving algorithm for multiple stereo and motion matches," Image and Vision Computing, 15(3):181–196, 1997.

[21] C. Tomasi and T. Kanade, "Shape and motion from image streams under orthography: A factorization method," Int. J. of Computer Vision, 9(2):137–154, 1992.
[22] I. Shimshoni and R. Basri, "A geometric interpretation of weak-perspective motion," IEEE Trans. Pattern Analysis and Machine Intelligence, 21(3):252–257, 1999.

[23] C. J. Poelman and T. Kanade, "A paraperspective factorization method for shape and motion recovery," IEEE Trans. Pattern Analysis and Machine Intelligence, 19(3):206–218, 1997.

[24] L. Kitchen and A. Rosenfeld, "Gray-level corner detection," Pattern Recognition Letters, 1:95–102, 1982.

[25] F. Mokhtarian and R. Suomela, "Robust image corner detection through curvature scale space," IEEE Trans. Pattern Analysis and Machine Intelligence, 20(12):1376–1381, 1998.

[26] C. Tomasi and T. Kanade, "Detection and tracking of point features," Tech. Rep. CMU-CS-91-132, Carnegie Mellon University, 1991.

[27] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," Proc. 7th Int. Joint Conf. on Artificial Intelligence, 674–679, 1981.

[28] R. Hartley, "Triangulation," Computer Vision and Image Understanding, 68(2):146–157, 1997.

[29] Q. T. Luong and O. D. Faugeras, "The fundamental matrix: Theory, algorithms, and stability analysis," Int. J. of Computer Vision, 17:43–75, 1996.

[30] N. Ayache and B. Faverjon, "Efficient registration of stereo images by matching graph description of edge segments," Int. J. of Computer Vision, 1(2):107–131, 1987.

[31] S. T. Barnard, "Stochastic stereo matching over scale," Proc. DARPA Image Understanding Workshop, 769–778, 1988.

[32] J. Weng, N. Ahuja, and T. S. Huang, "Matching two perspective views," IEEE Trans. Pattern Analysis and Machine Intelligence, 14(8):806–825, 1992.

[33] U. R. Dhond and J. K. Aggarwal, "Structure from stereo — a review," IEEE Trans. Systems, Man, and Cybernetics, 19(6):1489–1510, 1989.

[34] T.-Y. Chen, A. C. Bovik, and L. K. Cormack, "Stereoscopic ranging by matching image modulation," IEEE Trans. Image Processing, 8(6):785–797, 1999.

[35] B. J. Super and W. N. Klarquist, "Patch-based stereo in a general binocular viewing geometry," IEEE Trans. Pattern Analysis and Machine Intelligence, 19(3):247–253, 1997.
[36] T. Kanade and M. Okutomi, "A stereo matching algorithm with an adaptive window: Theory and experiment," IEEE Trans. Pattern Analysis and Machine Intelligence, 16(9):920–932, 1994.

[37] A. Fusiello, V. Roberto, and E. Trucco, "Efficient stereo with multiple windowing," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 858–863, 1997.

[38] Z. Zhang, R. Deriche, O. Faugeras, and Q.-T. Luong, "A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry," Tech. Rep. 2273, INRIA, France, 1994.

[39] Y. Ohta and T. Kanade, "Stereo by intra- and inter-scanline search," IEEE Trans. Pattern Analysis and Machine Intelligence, 7(2):139–154, 1985.

[40] J. Weng, "Image matching using windowed Fourier phase," Int. J. of Computer Vision, 11(3):211–236, 1993.

[41] D. J. Fleet and A. D. Jepson, "Stability of phase information," IEEE Trans. Pattern Analysis and Machine Intelligence, 15(12):1253–1268, 1993.

[42] M. R. M. Jenkin and A. D. Jepson, "Recovering local surface structure through local phase difference methods," Computer Vision, Graphics, and Image Processing: Image Understanding, 59(1):72–93, 1994.

[43] J. Weng, T. S. Huang, and N. Ahuja, "3-D motion estimation, understanding, and prediction from noisy image sequences," IEEE Trans. Pattern Analysis and Machine Intelligence, 9(3):370–389, 1987.

[44] B. K. P. Horn and B. G. Schunck, "Determining optical flow," Artificial Intelligence, 17:185–203, 1981.

[45] H.-H. Nagel, "On the estimation of optical flow: Relations between different approaches and some new results," Artificial Intelligence, 33:299–324, 1987.

[46] A. Verri and T. Poggio, "Motion field and optical flow: Qualitative properties," IEEE Trans. Pattern Analysis and Machine Intelligence, 11(5):490–497, 1989.

[47] J. K. Aggarwal and N. Nandhakumar, "On the computation of motion from sequences of images — a review," Proc. of the IEEE, 76(8):917–935, 1988.
[48] H. Shariat and K. E. Price, "Motion estimation with more than two frames," IEEE Trans. Pattern Analysis and Machine Intelligence, 12(5):370–389, 1990.

[49] J. Philip, "Estimation of three-dimensional motion of rigid objects from noisy observations," IEEE Trans. Pattern Analysis and Machine Intelligence, 13(1):61–66, 1991.

[50] P. M. Q. Aguiar and J. M. F. Moura, "A fast algorithm for rigid structure from image sequences," Proc. IEEE Int. Conf. on Image Processing, 125–129, 1999.

[51] T. J. Broida and R. Chellappa, "Kinematics and structure of a rigid object from a sequence of noisy images," Proc. Workshop on Motion: Representation and Analysis, 95–100, 1986.

[52] G. J. Tseng and A. K. Sood, "Analysis of long image sequence for structure and motion estimation," IEEE Trans. Systems, Man, and Cybernetics, 19(6):1511–1526, 1989.

[53] L. Matthies, T. Kanade, and R. Szeliski, "Kalman filter-based algorithms for estimating depth from image sequences," Int. J. of Computer Vision, 3(3):209–236, 1989.

[54] A. Azarbayejani and A. P. Pentland, "Recursive estimation of motion, structure, and focal length," IEEE Trans. Pattern Analysis and Machine Intelligence, 17(6):562–575, 1995.

[55] M. S. Grewal and A. P. Andrews, Kalman Filtering: Theory and Practice, Prentice Hall, Englewood Cliffs, New Jersey, 1993.

[56] J. Shi and C. Tomasi, "Good features to track," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 593–600, 1994.

[57] T. Tommasini, A. Fusiello, E. Trucco, and V. Roberto, "Making good features track better," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 178–183, 1998.

[58] I. K. Sethi and R. Jain, "Finding trajectories of feature points in a monocular image sequence," IEEE Trans. Pattern Analysis and Machine Intelligence, 9(1):56–73, 1987.

[59] D. Chetverikov and J. Verestoy, "Tracking feature points: a new algorithm," Proc. Int. Conf. on Pattern Recognition, 1436–1438, 1998.

[60] Y. Bar-Shalom and T. E. Fortmann, Tracking and Data Association, Academic Press, 1988.

[61] S. D. Blostein and T. S. Huang, "Detecting small, moving objects in image sequences using sequential hypothesis testing," IEEE Trans. Signal Processing, 39(7):1611–1629, 1991.
[62] I. J. Cox, "A review of statistical data association techniques for motion correspondence," Int. J. of Computer Vision, 10(1):53–66, 1993.

[63] Z. Zhang, "Token tracking in a cluttered scene," Image and Vision Computing, 12(2):110–120, 1994.

[64] I. J. Cox and S. L. Hingorani, "An efficient implementation of Reid's multiple hypothesis tracking algorithm and its evaluation for the purpose of visual tracking," IEEE Trans. Pattern Analysis and Machine Intelligence, 18(2):138–150, 1996.

[65] L. Matthies and T. Kanade, "The cycle of uncertainties and constraint in robot perception," 4th Int. Symposium on Robotics Research, 327–335, 1988.

[66] N. Ayache and O. D. Faugeras, "Maintaining representations of the environment of a mobile robot," 4th Int. Symposium on Robotics Research, 337–350, 1988.

[67] G.-S. J. Young and R. Chellappa, "3-D motion estimation using sequence of noisy stereo images: Models, estimation, and uniqueness results," IEEE Trans. Pattern Analysis and Machine Intelligence, 12(8):735–759, 1990.

[68] C. Q. Shu and Y. Q. Shi, "On unified optical flow field," Pattern Recognition, 24(6):579–586, 1990.

[69] J. Pan, Y. Shi, and C. Shu, "A Kalman filter in motion analysis from stereo image sequences," Proc. IEEE Int. Conf. on Image Processing, 3:63–67, 1994.

[70] A. M. Waxman and J. H. Duncan, "Binocular image flows: Steps toward stereo-motion fusion," IEEE Trans. Pattern Analysis and Machine Intelligence, 8(6):715–729, 1986.

[71] J. Wang and W. Wilson, "3D relative position and orientation estimation using Kalman filter for robot control," Proc. IEEE Conf. on Robotics and Automation, 3:2638–2645, 1992.

[72] C. Harris, "Tracking with rigid models," Active Vision, A. Blake and A. Yuille, Eds., chapter 4. The MIT Press, 1992.

[73] T. S. Huang and S. D. Blostein, "Robust algorithms for motion estimation based on two sequential stereo image pairs," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 518–523, 1985.
[74] K. S. Arun, T. S. Huang, and S. D. Blostein, "Least-squares fitting of two 3-D point sets," IEEE Trans. Pattern Analysis and Machine Intelligence, 9(5):698–700, 1987.

[75] S. Umeyama, "Least-squares estimation of transformation parameters between two point patterns," IEEE Trans. Pattern Analysis and Machine Intelligence, 13(4):376–380, 1991.

[76] S. D. Blostein and T. S. Huang, "Error analysis in stereo determination of 3-D point positions," IEEE Trans. Pattern Analysis and Machine Intelligence, 9(6):752–765, 1987.

[77] S. Chaudhuri and S. Chatterjee, "Performance analysis of total least squares methods in three-dimensional motion estimation," IEEE Trans. Robotics and Automation, 7(5):707–714, 1991.

[78] D. Goryn and S. Hein, "On the estimation of rigid body rotation from noisy data," IEEE Trans. Pattern Analysis and Machine Intelligence, 17(12):1219–1220, 1995.

[79] J. Weng, T. S. Huang, and N. Ahuja, Motion and Structure from Image Sequences, Springer-Verlag, 1993.

[80] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, 24(6):381–395, 1981.