Dealing with Textureless Regions and Specular Highlights—A Progressive Space Carving Scheme Using a Novel Photo-consistency Measure

Ruigang Yang∗ †, Marc Pollefeys†, and Greg Welch†

Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA

[email protected], [email protected], [email protected]

Abstract

We present two extensions to the Space Carving framework. The first is a progressive scheme to better reconstruct surfaces that lack sufficient texture. The second is a novel photo-consistency measure that is valid for both specular and diffuse surfaces, under unknown lighting conditions.

1 Introduction

There has been a considerable amount of work on volumetric scene reconstruction from multiple views [22, 8, 9, 4, 1, 15, 3]. Most of this work can be considered variations of the Space Carving framework by Kutulakos and Seitz [14]. Under this framework, an initial bounding volume is divided into a regular 3D voxel grid; inconsistent voxels are then removed until the remaining voxels are photo-consistent with a set of input images. That is, rendered images of the resulting voxels from each input viewpoint should reproduce the actual image as closely as possible [22].

Because of the flexibility of the volumetric representation and the elegant treatment of visibility, space carving approaches have been used to achieve strong results on a variety of both natural and artificial scenes. However, such approaches typically run into difficulty when applied to scenes with textureless or specular surfaces. For one of our driving applications, the reconstruction of real surgical procedures for training, such surfaces are the norm rather than the exception; see Figure 1.

At the heart of the space carving algorithm is the photo-consistency test that determines whether or not a voxel should be removed. Most methods make this decision based solely on the input image color samples corresponding to visible voxels. In textureless regions, false positives resulting from ambiguities in front of the true surface typically produce extraneous voxels that "fatten" the reconstruction. This effect is particularly pronounced when the model is viewed from an oblique angle, far away from any of the input viewpoints. Additional constraints are often applied in an attempt to resolve the ambiguity. For example, a typical approach in stereo vision is to increase the support of the reconstruction kernel. However, the accompanying smoothing effect undermines a unique feature of these voxel-based methods: the ability to reconstruct highly complex shapes. Instead we want to apply additional constraints only when there is ambiguity. To this end, we present a progressive space carving scheme. Starting from a few reliable voxels, we incrementally add voxels using photo-consistency measures, progressively updated visibility information, uniqueness constraints, and smoothness constraints.

∗Currently at the University of Kentucky ([email protected]). †This work was supported in part by NSF grants ISS-0237533 and 0121657, and by a generous 2002-2003 Link Foundation fellowship.

Figure 1. Top left: our one-meter-cubed camera rig with eight cameras looking down at a human patient model. Top right: two camera images. Bottom: two views of the reconstructed voxel model. The white bounding box shows the initial volume. Each voxel is rendered as a simple point with color. No interpolation is performed to fill holes.

In addition, most existing space carving methods assume a scene with Lambertian surfaces. This limitation prevents the application of these powerful methods to scenes with specular highlights. Based on the observation that the reflected colors for most real-world surfaces are co-linear in the RGB color space [11], we have developed a novel photo-consistency measure that is valid for both specular and Lambertian surfaces. This new measure does not require light calibration or surface normal estimation, and thus can be incorporated into any existing space carving method to facilitate the reconstruction of highly specular surfaces.

We have implemented our extensions and tested the framework on a number of data sets. We are encouraged by the results: we are able to reconstruct textureless and highly specular surfaces, such as those shown in Figure 1.

2 Previous Work

The problem of multi-view reconstruction has received significant attention during the last few years. In particular, many voxel-based photo-consistency methods have been proposed. Dyer [10] and Slabaugh et al. [25] each provide comprehensive reviews of recent efforts in this area.

As mentioned in the introduction, there is considerable previous work related to space carving methods [22, 23, 8, 14, 15]. Typically, voxels are traversed in a visibility-compatible order, where only previously committed voxels are allowed to occlude the current voxel. Consider dividing the volume with a plane that separates the cameras from the scene, and then sweeping that plane from near to far (away from the cameras): no voxel within the plane can occlude another. Thus when a voxel v is visited, its visibility in every input image has already been uniquely determined. We can then use a photo-consistency measure to decide whether v should be carved away or retained. A popular choice is to threshold the variance of the color samples collected from v's projections in all the visible camera images. Assuming a Lambertian scene, a large variance implies an "inconsistent" voxel. While these methods are very efficient, they are typically sensitive to the global threshold chosen for the photo-consistency evaluation. In practice, as a result of random noise and quantization effects, a single threshold rarely achieves optimal results for a complex scene.
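For concreteness, here is a minimal sketch of this standard variance test in Python; the function name and thresholding convention are illustrative, not taken from any of the cited systems:

    import numpy as np

    def variance_measure(samples):
        """Photo-consistency of one voxel: the variance of the color samples
        gathered from the voxel's projections into its visible images.
        samples: (m, 3) array of RGB values in [0, 1], one row per view.
        A small value suggests a consistent (Lambertian) surface voxel."""
        samples = np.asarray(samples, dtype=float)
        return float(np.var(samples, axis=0).sum())  # sum of per-channel variances

    # A voxel would be carved when variance_measure(samples) > tau,
    # with tau the global threshold discussed above.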

To overcome this problem, some researchers have formulated the voxel reconstruction problem as an energy minimization problem. For example, Slabaugh et al. introduced an iterative post-processing approach [24]: they add or remove surface voxels until the sum of squared differences between each camera image and the corresponding model image (rendered from the camera's viewpoint) is minimized. More recently, Kolmogorov and Zabih introduced a graph-cut approach to optimize the volume reconstruction directly [13]; in order to make the optimization tractable they approximate the visibility test.

Several probabilistic space carving methods have been introduced [9, 4, 1, 3]. Instead of making "hard" decisions about voxel existence, these approaches compute a per-voxel probability based on appropriate likelihoods. In theory such formulations should consider all possible visibility configurations for a voxel. To avoid the combinatorial search, the visibility tests are typically approximated based on heuristics [9, 1, 4] or solved in a stochastic manner through hundreds of iterations [3]. We believe that an accurate treatment of visibility is crucial for multi-view reconstruction. Our approach progressively reconstructs a voxel model, typically in a few iterations, with visibility computed deterministically and exactly at each iteration.

Although the use of more sophisticated lighting models was envisioned in the original space carving work [14], almost all existing methods use a photo-consistency measure based on a diffuse (Lambertian) surface assumption. Two notable exceptions are the surfel (surface element) sampling algorithm by Carceroni and Kutulakos [6] and the color caching algorithm by Chhabra [7]. The former differs substantially from traditional voxel-based methods: the scene is divided into a very coarse voxel grid, with each voxel represented as a parametric surface referred to as a surfel. Under calibrated lighting, additional properties such as surface normal and reflectance parameters can be estimated, but only results from scenes with point light sources were demonstrated. In practice, light calibration is not always possible, especially for area light sources. In the color caching algorithm, Chhabra tries to characterize the reflected light from specular surfaces in color space. While this analysis is very similar to our thinking, it is restricted to the case where the reflected light passes through the origin of the color cube. This simplification is only valid in some very limited cases, such as monochromatic surfaces under white light. Based on an analysis of the surface color response under the Phong lighting model [19], we arrived at a general photo-consistency measure that, when conditioned on some typical surface-light interactions, can serve as a maximum likelihood indicator.

There is also some work in the stereo vision literature on recovering a disparity map in the presence of specular reflections [2, 12, 16, 17]. These methods typically try to first detect specular reflections and then reject them as outliers or occluders. A notable exception is the work of Magda et al. [18], in which they propose techniques to recover shapes with arbitrary surface reflectance properties under controlled lighting. Our method treats specular reflections as "inliers" and accounts for them inherently, without the need to control lighting.

3 Progressive Space Carving

Kutulakos and Seitz showed that, even without additional constraints, the space carving framework provides the tightest reconstruction achievable using color information alone [14]. They called the recovered shape the photo hull. The idea of using color information alone has its pros and cons. On the one hand, in regions with rich textures, arbitrarily complex shapes can be recovered. On the other hand, the lack of additional constraints (regularization terms) makes space carving more susceptible to image noise and quantization problems. In addition, the fattening effect in regions that lack color variation can be disconcerting. In an attempt to preserve the positive and remove the negative characteristics, we employ an iterative approach to progressively refine the shape estimates. The basic idea is to defer decisions about ambiguous voxels until there is enough supporting evidence. The refinement includes new photo-consistency measures under an updated visibility configuration and local smoothness constraints.

Algorithm 1  Pseudocode for the progressive space carving scheme.

    Clear visibility mask M_I for every image I;
    for every voxel v {
        label[v]  = UNKNOWN;
        weight[v] = 1.0;
    }
    while (true) {
        for every UNKNOWN voxel v {
            s = ∅;                          // s is the visible pixel set
            for every image I {
                {p_i} = v's projection in I;
                if (p_i visible) s = s ∪ p_i;
            }
            if (‖s‖ < 2) label[v] = INVISIBLE;
            else score[v] = photo_consistency(s) * weight[v];
        }
        n = select_consistent_voxels();
        if (n == 0) break;                  // no more voxels can be selected, stop
        update weights;
        update visibility masks;
    }

Referring to Algorithm 1, we outline our progressive space carving approach. Each voxel can have one of four labels: UNKNOWN, EMPTY, SURFACE, or INVISIBLE. In the beginning, every voxel is labeled UNKNOWN, has a unit weight, and is visible in every input image. At each iteration, we compute the likelihood for each UNKNOWN voxel as the product of its weight and its photo-consistency score. If a voxel cannot be seen from at least two views, it is immediately labeled INVISIBLE. (Note that we can traverse the voxels in any order, since the visibility configuration is fixed during an iteration.) When this calculation is complete for all voxels, we find the most unambiguous voxels and change their labels to EMPTY or SURFACE. The selected voxels are then used to update the visibility configuration and, based on a smoothness constraint, the weights of the remaining UNKNOWN voxels. This process is repeated until no more UNKNOWN voxels can be selected.

In the following sub-sections we explain the two central components of our framework: the "best voxel" selection process and the smoothness constraint. Before proceeding we want to point out that the smoothness constraint is optional. Without it, the iterative method can be thought of as a space "peeling" algorithm. After the first iteration, any surface voxels visible in all cameras are selected and removed, exposing more surface voxels that are partially occluded. These voxels will be selected in the next iteration, exposing voxels more deeply occluded, and so on. After a limited number of iterations, the reconstructed shape converges to the photo hull. In configurations where a plane-sweep visibility order exists [22], the maximum number of iterations equals the number of voxel planes in the sweep direction.

3.1 Finding the most consistent voxels

Rather than using a simple threshold to decide if a voxel is consistent or ambiguous, we look at the profile of a set of related voxels. A pixel p in an image defines a line of sight l; l intersects a set of candidate voxels, denoted {v_p^i}, where i indexes the voxels along the line of sight. Assuming an opaque scene, at least one voxel in {v_p^i} will be the surface voxel reflecting the light imaged at p, and thus it will have the best photo-consistency score if the visibility is solved correctly. Considering the photo-consistency curve of {v_p^i}, if there is a single local maximum, i.e., a consistency value better than both its left and right neighbors (assuming a higher photo-consistency score means better consistency), then the corresponding voxel v is considered consistent and labeled SURFACE. In addition, any voxel in {v_p^i} in front of v is labeled EMPTY, i.e., carved away. If no SURFACE voxel can be found, all voxels in {v_p^i} are ambiguous and their occupancies are left to be resolved in later iterations. This scheme does not need a threshold, since we only look for the "best" voxel for each pixel in the input images. It also guarantees the uniqueness constraint: one pixel corresponds to only one surface voxel.

After the likelihood values for all voxels have been updated, we project the resulting voxel grid onto every input image. (This step can be accelerated using graphics hardware.) We then find the best voxel for each pixel in every input image. A voxel is labeled SURFACE if and only if it is the best voxel in every view in which it is visible. Once a SURFACE voxel is declared, it is projected into the visible views to mask the corresponding pixels: these pixels (in v's footprint) participate in no further photo-consistency computation, nor are new best voxels selected for them, i.e., every pixel can have only one corresponding SURFACE voxel. Once a voxel is labeled SURFACE, it is not removed in subsequent carving.
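As an illustration, the following Python sketch implements the per-ray peak test just described; the name and the exact tie-breaking behavior are our own choices, not the paper's implementation. A candidate would additionally have to win this test in every view in which it is visible before being labeled SURFACE.

    def best_voxel_on_ray(scores):
        """Single-local-maximum test along one pixel's line of sight.
        scores: photo-consistency scores of the candidates {v_p^i}, ordered
        front to back (higher means more consistent). Returns the index of
        the unique interior peak, or None when the profile is ambiguous and
        the decision should be deferred to a later iteration."""
        peaks = [i for i in range(1, len(scores) - 1)
                 if scores[i] > scores[i - 1] and scores[i] > scores[i + 1]]
        if len(peaks) != 1:
            return None
        return peaks[0]  # candidates in front of this index become EMPTY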

3.2 Applying a smoothness constraint

One needs to be careful defining smoothness in a multi-view reconstruction framework. First we note that smoothness is a view-dependent property. Think of a thin sheet of paper: viewed from the front, the paper is smooth everywhere; viewed from a 90 degree angle, the sheet barely exists. In any case, we believe it makes sense to use smoothness constraints that favor frontal-parallel surfaces, since cameras are more likely to see such surfaces than ones at oblique angles. In a multi-view setting, a different surface assumption can be derived for each input view, but these assumptions need to be consolidated. In this work, we apply a smoothness constraint with respect to every input view and combine the resulting assumptions in voxel space, assuming each one is equally likely and valid.

While there are many possible smoothness constraints, we choose to use the disparity gradient principle because of its relevance to the human vision system, its simplicity, and its successful use in stereo algorithms. Before getting into the details of our formulation, we first present a brief overview of the disparity gradient principle.

Disparity Gradient Principle  Disparity is defined between a pair of rectified stereo images. Given a pixel (u, t) in the first image and its corresponding pixel (u′, t′) in the second image, the disparity is defined as d = u′ − u. Disparity is inversely proportional to the distance of the 3D point to the cameras; a disparity of zero implies that the 3D point is at infinity. For two 3D points the disparity gradient can be defined as

    DG = ∆d / (∆u − ∆d/2),    (1)

where ∆u = u2 − u1 and ∆d = d2 − d1. Experiments in psychophysics have provided evidence that human perception imposes the constraint that the disparity gradient DG is limited by an upper bound. In [5] the limit DG < 1 was reported. The theoretical limit for opaque surfaces is 2, which ensures that the surfaces are visible to both eyes [20]. Also reported in [20] is that under normal viewing conditions most surfaces are observed with a disparity gradient well below the theoretical value of 2.

Applying the Disparity Gradient Principle  Manipulating Equation 1 we arrive at

    ‖∆d‖ ≤ ∆u · DG / (1 − DG/2).    (2)

This equation tells us that if one pixel's disparity (depth) is known and DG is limited, then the disparity range of a neighboring pixel is also limited. The closer the two pixels are, the smaller the disparity variation can be. This is why the disparity gradient principle has been used as a smoothness constraint in several stereo matching algorithms [5, 20, 28, 27]. However, it has not been applied in a volumetric representation. One problem is that the disparity gradient is defined between a pair of images, so it is not directly applicable in a volumetric setting. Here we introduce a way to relate the disparity gradient to a 3D voxel grid. For each input image, we assume there is an imaginary image, taken from a parallel viewpoint some distance away, as shown in Figure 2. This distance should be related to the average distance between neighboring cameras. Assuming a pixel m1 corresponds to the voxel v1, the allowable disparity range for m2 given a limit on DG can be obtained from Eq. (2). We can then back-project the disparity range onto the voxels {v_2^i} that intersect the line of sight from m2 (the slant-filled voxels in Figure 2). A voxel that is both in the disparity range and in {v_2^i} is more likely to be the voxel reflecting light to m2. In other words, because v1 is known to be a surface voxel, the disparity gradient principle tells us that neighboring voxels are also likely to be surface voxels.

Figure 2. Applying the disparity gradient principle in a volumetric setting. An imaginary image (plane C′0, parallel to the input image plane C0) is introduced to define the disparity range. The back-projection of the disparity range limits the voxel search ranges. The gray curve illustrates the weights for voxels.

In practice, there is no need to compute the imaginary image, since the location of the iso-disparity planes can be computed directly using d = fb/Z, with f the focal length, b the virtual baseline, and Z the depth (more precisely, the Z-coordinate in a camera-centered coordinate frame). We want to favor voxels that have the same depth as v1, while allowing a small possibility that voxels fall beyond the disparity range at occlusion boundaries. So we use a weighting function similar to a normal distribution. Let a voxel v2's projection in image C0 be m2, and let m1 be the closest pixel to m2 with a known depth. The disparity difference can be obtained by projecting v2 and v1 into the imaginary image C′0, or more directly as ∆d = bf(1/Z2 − 1/Z1). The weight for v2 is then given by

    Wv2 = 1/(√(2π)σ) · exp(−(∆d/∆u)² / (2σ²)),    (3)

where σ = DG/(1 − DG/2) and ∆u = ‖m1 − m2‖. If we know or choose to assume some value for σ, it is possible to formulate the weighting process using Bayes' rule, similar to [28]. If the voxel v2 is visible in multiple images (as it should be), the weights from different images are summed to arrive at a final weight.
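A small Python sketch of this weighting, under our reading of Eq. (3); the argument list (depths, projections, camera parameters) is hypothetical, and the default disparity gradient limit of 0.8 matches the value used in the experiments of Section 5. Weights computed against each image in which v2 is visible would then be summed, as described above.

    import numpy as np

    def smoothness_weight(z1, z2, m1, m2, f, b, dg_limit=0.8):
        """Weight for a candidate voxel v2 near a known surface voxel v1.
        z1, z2 : depths (camera-centered Z) of v1 and v2
        m1, m2 : their pixel projections in the input image (2-vectors)
        f, b   : focal length and virtual baseline of the imaginary view"""
        delta_d = b * f * (1.0 / z2 - 1.0 / z1)          # disparity difference
        delta_u = np.linalg.norm(np.asarray(m1, float) - np.asarray(m2, float))
        sigma = dg_limit / (1.0 - dg_limit / 2.0)        # from the DG bound
        ratio = delta_d / max(delta_u, 1e-9)             # guard against m1 == m2
        return np.exp(-ratio**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)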

4 A New Photo-consistency Measure

Studies in photometry have shown that the reflected light (radiance) from many real-world surfaces can be approximated as the sum of diffuse and specular components [11, 19]. This can be modeled as

    I = diffuse(Ip, Od, N, L) + specular(Ip, R, V),    (4)

where Ip is the intensity of a light source, Od is the object color (albedo), N is the normal vector, L is the lighting vector, R is the reflection vector, and V is the viewing vector.

Now let us examine the change in intensities for a given surface point under different viewing directions. Without loss of generality, we assume that the scene and lighting are both static while the images are taken, i.e., N and L are constant. Under this condition, we can see from Equation 4 that the diffuse term remains constant for any view direction. If the specular effect can be ignored, the color samples from the input images will cluster into a point in color space. That is the basis for the original photo-consistency check proposed by Seitz and Dyer [23]: to check whether a voxel exists, we simply compute the variance of the color samples. We call this the variance measure. A large variance indicates the samples are not likely to come from the same surface point, and thus the voxel should be carved away.

On the other hand, if the specular highlights cannot be ignored, then for a broad class of surfaces, such as plastic and glass, the reflected light is only modulated by the incident light. For these surfaces the color values observed from different viewpoints are co-linear in color space: they form a half-line originating at the diffuse term and extending toward the color of the light Ip. Note that the direction of the line is independent of the object color. The basis of our photo-consistency measure is to detect such a "signature" in color space. If the surface is diffuse, its signature will be a point; if the surface is specular, its signature will be a line. Because we do not have a priori knowledge about whether a voxel represents a specular point or a diffuse point, we want to design a measure that is valid for both cases, while simultaneously providing as much disambiguating power as possible.

Maximum Likelihood Estimation  We assume that a color sample can be classified in one of three ways: a diffuse color cd, an "onset" color co, or a saturated color cs. Each case has a different a priori likelihood, denoted Pd, Po, and Ps respectively. We also assume that the color samples from the images are corrupted by zero-mean Gaussian noise with a variance of σs². For simplicity and robustness, we assume the color of the light is known. (It can be measured by imaging a white object.) Given an "object" color C, the likelihood of observing a particular color sample Cj is

    p(Cj | C) = max{ N(Cj | C, σs²) · Pd,
                     N(dis(Cj, line(C)) | 0, σs²) · Po,
                     N(Cj | Cs, σs²) · Ps },    (5)

where N denotes a normal distribution, Cs is the saturation color, usually [1, 1, 1] for normalized RGB images, and dis(Cj, line(C)) computes the distance between Cj and the ray defined by C and the color of the light. (Note that we interchange C and I because we are dealing with color images.)

Thus the maximum likelihood estimate of an "object" color C is

    max over C of  ∏ j=1..m  p(Cj | C).    (6)

The photo-consistency "cost" is the residual after maximum likelihood estimation. The estimated C is assigned as the diffuse color of the voxel. We call this measure the MLE photo-consistency measure.
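The following Python sketch spells out Eqs. (5) and (6) as a cost function; it is a simplified reading, with the Gaussian normalization constants dropped and with hypothetical parameter names (the defaults echo σs = 10/255 from Figure 4 and the priors of Figure 6(a)). The paper minimizes the cost with Powell's method (Section 5); any nonlinear optimizer over C would do for this sketch.

    import numpy as np

    LIGHT = np.array([1.0, 1.0, 1.0])   # known light color (here: white)
    C_SAT = np.array([1.0, 1.0, 1.0])   # saturation color for normalized RGB

    def _gauss(d2, sigma2):
        # Unnormalized Gaussian on a squared distance (constants dropped).
        return np.exp(-d2 / (2.0 * sigma2))

    def sample_likelihood(cj, c, sigma2, priors):
        """Eq. (5): likelihood of one sample cj given object color c;
        priors = (Pd, Po, Ps) for the diffuse/onset/saturated classes."""
        pd, po, ps = priors
        # Distance from cj to the ray from c along the light color.
        t = max(np.dot(cj - c, LIGHT) / np.dot(LIGHT, LIGHT), 0.0)
        d_line2 = float(np.sum((cj - (c + t * LIGHT))**2))
        return max(_gauss(float(np.sum((cj - c)**2)), sigma2) * pd,
                   _gauss(d_line2, sigma2) * po,
                   _gauss(float(np.sum((cj - C_SAT)**2)), sigma2) * ps)

    def mle_cost(c, samples, sigma2=(10.0 / 255.0)**2, priors=(0.5, 0.0, 0.5)):
        """Negative log of Eq. (6); minimize over c for the MLE color."""
        return -sum(np.log(sample_likelihood(cj, c, sigma2, priors) + 1e-12)
                    for cj in samples)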

Note that in Equation 5 we assume that the distribution along the line is uniform. This is an approximation derived from the Phong lighting model [19]. If we ignore ambient light and the atmospheric attenuation factor (i.e., no fog), the Phong lighting model becomes

    I = Ip [kd Od (N·L) + ks (R·V)^n],    (7)

where kd is the diffuse coefficient, ks is the specular coefficient, and n is the object's shininess.

Let the angle between the reflection vector R and the viewing vector V be Θ. It is reasonable to model Θ as a random variable uniformly distributed over its valid range, meaning that a surface point is equally likely to be viewed from any direction in the hemisphere. Thus the probability density function of Θ is

    fΘ(θ) = 1/π  for −π/2 ≤ θ ≤ π/2,  and 0 otherwise.    (8)

Now we need to find the density function of the random variable I; this tells us how the color samples are distributed along the line. The final result is given in Equation 9 (details of the derivation can be found in [26]):

    fI(i) = (2 / (π n Ipks)) · k^((1−n)/n) / √(1 − k^(2/n))  for Ī ≤ i ≤ Ī + Ipks,  and 0 otherwise,    (9)

where Ī = Ipkd Od (N·L) is the diffuse term and k = (i − Ī) / Ipks. In Figure 3 we plot several density curves of I under different shininess (n) settings. In the formulation of our MLE measure, we essentially divide this density function into three blocks: cd corresponds to the left part; co corresponds to the middle transition part, which is relatively flat; and cs corresponds to the highlights toward the right end of the axis. Note that due to the limited dynamic range of a camera, the saturated class is likely to include more of the middle flat part.
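As a numerical sanity check on Eq. (9), the density can be compared against intensities simulated directly from Eq. (7) with Θ uniform, as in Eq. (8). A sketch with illustrative parameter names, using Ī = 0, Ipks = 1, and n = 10 (both printed values should agree, around 0.23):

    import numpy as np

    def phong_intensity_density(i, i_bar, ip_ks, n):
        """Eq. (9): density of the observed intensity for Theta uniform on
        [-pi/2, pi/2]. i_bar is the diffuse term Ipkd*Od*(N.L); ip_ks = Ip*ks."""
        k = (i - i_bar) / ip_ks
        if not 0.0 < k < 1.0:
            return 0.0
        return 2.0 * k**((1.0 - n) / n) / (np.pi * n * ip_ks * np.sqrt(1.0 - k**(2.0 / n)))

    theta = np.random.uniform(-np.pi / 2, np.pi / 2, 200000)
    print(np.mean(np.cos(theta)**10 > 0.5))      # empirical P(i > 0.5)
    xs = np.linspace(0.5, 1.0 - 1e-6, 200000)    # the same mass from the density
    print(np.trapz([phong_intensity_density(x, 0.0, 1.0, 10) for x in xs], xs))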

Figure 3. Probability density functions under different shininess (n) settings. The X axis is the normalized intensity value (k); the Y axis is the probability density, with Ipks = 1.

Simplified linearity test  The maximum likelihood estimation presented above needs to know the a priori likelihoods of the three color sample classes. If the a priori likelihoods are unknown or inaccurate, we can use a simplified approach by assuming all samples belong to the onset class, i.e., Pd = 0, Po = 1, Ps = 0. In addition, we assume that fewer than half of the pixel samples are in specular highlights. Thus we can use the median of the color samples as the object color C, and simply compute the sum of distances to the ray defined by C and the color of the light as the photo-consistency measure. Note that this simplification will fail on smoothly shaded diffuse surfaces with uniform colors: there, everywhere is consistent, similar to the result of applying a simple variance test to textureless regions. In practice, we find it still works well on scenes with moderate texture. We call this measure the LMF (Line-Model-Fitting) photo-consistency measure. Similarly, if we set Pd = 1, Po = 0, Ps = 0, i.e., only diffuse colors are possible, then the MLE measure reduces to the standard variance measure.
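A minimal Python sketch of the LMF measure as just described (names are illustrative; the refinement that follows the saturated ray through its three segments is discussed in Section 5 and omitted here):

    import numpy as np

    def lmf_measure(samples, light=np.array([1.0, 1.0, 1.0])):
        """Line-Model-Fitting cost for one voxel.
        samples: (m, 3) RGB samples from the visible views. The per-channel
        median serves as the object color C (robust if fewer than half the
        samples are in highlights); the cost is the summed distance of the
        samples to the ray C + t*light, t >= 0. Small means consistent."""
        samples = np.asarray(samples, dtype=float)
        c = np.median(samples, axis=0)
        cost = 0.0
        for cj in samples:
            t = max(np.dot(cj - c, light) / np.dot(light, light), 0.0)
            cost += float(np.linalg.norm(cj - (c + t * light)))
        return cost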

Multiple Stationary Lights  The above analysis is valid for multiple fixed lights as long as they have the same color. This is true even if their intensities differ or if there are area light sources. If the light colors are different, there will be multiple lines in color space, one for each unique light source. In this case, it would be interesting to investigate whether a multiple-line fitting and clustering method would be practical.

Moving Lights and Cameras  It is also interesting to consider data captured on a turntable. Typically in this case both the lights and the camera are moving together relative to the object. If the lights have the same color, the image color samples will form a plane instead of a line in color space. At the same time, if the surface is primarily diffuse, the color samples will form a half-line starting from the origin and extending toward Ipkd Od. Thus it is possible to reconstruct diffuse scenes under moving lights and moving cameras; we provide some preliminary results using this finding.
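Though not part of the paper's pipeline, the three signatures discussed in this section (point, line, plane) can be told apart by the singular values of the centered sample matrix; the following sketch and its noise threshold are our own illustration.

    import numpy as np

    def color_signature(samples, noise_sigma=0.01):
        """Rough classification of a voxel's color-space signature:
        0 significant directions -> point (diffuse, static lights)
        1 significant direction  -> line  (specular under fixed lights, or
                                           diffuse under moving lights)
        2 significant directions -> plane (specular under moving lights)"""
        x = np.asarray(samples, dtype=float)
        x = x - x.mean(axis=0)
        s = np.linalg.svd(x, compute_uv=False)   # singular values, descending
        rank = int(np.sum(s > 3.0 * noise_sigma * np.sqrt(len(x))))
        return ("point", "line", "plane", "full")[min(rank, 3)]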

5 Implementation and Results

We have implemented our progressive space carving scheme. We also constructed a capture rig we call the camera cube (Figure 1). It consists of eight digital video cameras with VGA resolution (640 × 480). All cameras are fully calibrated and synchronized, and we are able to capture and store VGA video at 15 frames/second. Lens distortions are removed after capture. At this point our implementation only computes the smoothness constraint for a single view. This simplification is justified by our camera arrangement, in which all eight cameras look down at the scene from above. In the results shown here, we computed the smoothness constraint from the vertical direction, looking down; similarly, surface voxels were extracted in a top-down vertical sweep. We further define a robust measure to reject potential outliers: if the average distance per pixel to the color ray in the color cube exceeds a threshold, we reject that voxel. We used a very generous number, 100 levels (assuming 8 bits/channel), and this number was fixed throughout all experiments. Experiments have shown that our method is not sensitive to this threshold.

Figure 4. Left: the reflected colors from a surface point form a line; due to the limited dynamic range, it becomes three connected line segments in the RGB cube. Right: sample density distribution with object color [0.3, 0.5, 0.1], σs = 10/255, and a priori probabilities [0.4, 0.4, 0.2].

We use Powell's method [21] to optimize the function in Equation 6, assuming the color of the light is white (which it is). Note that due to the limited dynamic range of the cameras, the half-line defined by C in general has three segments (Figure 4, left): one channel saturates first, then the second, and finally the last. In Figure 4 (right) we show a sample probability density distribution. Similarly, we compute the distance to the three line segments in the LMF measure.
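For illustration, the clipped ray and the distance to its segments can be computed as below (our own sketch; the vertex at which each channel saturates follows from clamping C + t·light to the RGB cube):

    import numpy as np

    def clipped_light_ray(c, light=np.array([1.0, 1.0, 1.0])):
        """Vertices of clamp(c + t*light, 0, 1): the ray bends each time a
        channel reaches 1, giving the three connected segments of Figure 4."""
        c = np.asarray(c, dtype=float)
        breaks = sorted((1.0 - ci) / li for ci, li in zip(c, light)
                        if li > 0 and ci < 1.0)
        return [np.clip(c + t * light, 0.0, 1.0) for t in [0.0] + breaks]

    def dist_to_polyline(p, verts):
        """Minimum distance from a color sample p to the clipped ray."""
        if len(verts) == 1:
            return float(np.linalg.norm(p - verts[0]))
        best = np.inf
        for a, b in zip(verts[:-1], verts[1:]):
            ab = b - a
            denom = float(np.dot(ab, ab))
            t = 0.0 if denom == 0.0 else float(np.clip(np.dot(p - a, ab) / denom, 0.0, 1.0))
            best = min(best, float(np.linalg.norm(p - (a + t * ab))))
        return best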

There are two places where we exploit graphics hardware to accelerate computation. The first is the visibility update: we render each surface voxel as a cube into every input image, so we obtain the exact footprint with which to update the visibility mask. The second is the computation of the smoothness weight. For each voxel, we need to project it into each input view and find the closest pixel that has a corresponding surface voxel. We use Delaunay triangulation to create a 2D mesh of the marked pixels in each view and render the color-coded mesh; the rendered image serves as a look-up table that finds the closest pixel in O(1) time. We observed orders-of-magnitude speedups from this optimization.


Experiments  We captured and reconstructed a variety of real-world scenes. Unless otherwise indicated, all were reconstructed at a resolution of 256 × 256 × 128. We first captured a teapot with rich textures (Figure 5) and tested our photo-consistency measure without applying the smoothness constraint; see Figure 6. Since we know nothing about the surface materials or the scene lighting, except that the color of the lights is white, we tried different settings of the a priori likelihood (denoted P) for the MLE measure. With a 0.1 granularity, there are about 50 different combinations. The most visually pleasing result is shown in Figure 6(a), where P = [0.5, 0, 0.5]. We also show the result with P = [1.0, 0, 0] in Figure 6(b), which is equivalent to using the standard variance measure. Figure 6(c) shows the result with P = [0, 1.0, 0], which is very similar to the result from the LMF measure. Compared to the best result, it has more stray voxels when viewed from the side.

Figure 5. Three of the eight images captured simultaneously by our camera cube (see top left in Figure 1). They are cropped to show more detail. The teapot occupies roughly a 300 × 300 pixel area in every image.

Our second data set consists of a teapot and a book with substantial textureless regions (Figure 7). We used the LMF measure and applied the smoothness constraint over a few iterations, shown in Figure 8. We stopped at the fourth iteration, when the newly selected SURFACE voxels were less than 2% of the total SURFACE voxels. We compare the result with that of the variance measure in Figure 9.

In Figure 10 we show a high-resolution reconstruction of a hand, computed using the variance consistency measure through four iterations. Note that the textureless regions are nicely reconstructed. We further captured a dynamic sequence in which a surgeon was explaining a medical procedure. The torso model was constructed using the LMF measure, while the hand and other moving parts were constructed using the variance measure. They are composited together and shown in Figure 11.

Our last data set, courtesy of the Mitsubishi Electric Research Lab, is a teapot captured on a turntable. The light was not static with respect to the teapot; we did not know this when we first tried our method. In Figure 12 we show the results using the LMF and the variance measures. On the top of the teapot, where highlights exist, neither method produces meaningful results. However, on the side of the teapot, where there are virtually no highlights, the LMF measure performs much better than the variance measure, since under moving lights the reflected colors of a diffuse point form a line (not a point) in the RGB color space.

(a) P = (0.5, 0, 0.5), visually the best of 50 combinations of P. (b) P = (1.0, 0, 0) (same as the variance measure). (c) P = (0, 1.0, 0) (similar to the LMF measure).

Figure 6. Reconstruction results from the data in Figure 5. We used the MLE measure under different a priori assumptions; no smoothness weight was applied.

Figure 7. Three of eight images (teapot and book).

6 Conclusions

We present two major extensions to the Space Carving framework. The first is a progressive scheme to better reconstruct surfaces lacking sufficient texture. The second is a novel photo-consistency measure that is valid for both specular and diffuse (Lambertian) surfaces, without the need for light calibration. We applied our method successfully to a variety of real-world scenes.

Figure 8. Reconstruction results using the data in Figure 7. From left to right: the progressively refined results after iterations one to three. We used the LMF consistency measure and set the disparity gradient limit to 0.8.

Figure 9. Comparison of different consistency measures. Left column (top and side views): LMF measure; right column: variance measure. Both results were obtained after four iterations. All other parameters were kept the same.

Looking to the future, we plan to further investigate photo-consistency under more complicated lighting conditions, such as moving lights with moving cameras, or multiple light sources with different colors. We also want to explore the possibility of estimating more parameters per voxel, such as surface normals and surface materials, so as to make "re-lighting" possible.

Figure 10. High-resolution progressive reconstruction on a 512 × 512 × 256 grid using the variance measure.

Figure 11. A dynamic sequence captured by the camera cube. A surgeon was explaining a medical procedure. The moving part was reconstructed and rendered with the static model shown in Figure 1.

Figure 12. Experiment with moving lights. Left: LMF measure; right: variance measure.

References

[1] M. Agrawal and L. Davis. A Probabilistic Framework for Surface Reconstruction from Multiple Images. In Proceedings of Conference on Computer Vision and Pattern Recognition (CVPR), 2001.

[2] D. Bhat and S. Nayar. Stereo in the presence of specular reflection. In Proceedings of International Conference on Computer Vision (ICCV), pages 1086–1092, 1995.

[3] R. Bhotika, D. J. Fleet, and K. N. Kutulakos. A Probabilistic Theory of Occupancy and Emptiness. In Proceedings of European Conference on Computer Vision (ECCV), pages 112–132, 2002.

[4] A. Broadhurst, T. Drummond, and R. Cipolla. A Probabilistic Framework for Space Carving. In Proceedings of International Conference on Computer Vision (ICCV), pages 388–393, 2001.

[5] P. Burt and B. Julesz. A Gradient Limit for Binocular Fusion. Science, 208:615–617, 1980.

[6] R. L. Carceroni and K. N. Kutulakos. Multi-View Scene Capture by Surfel Sampling: From Video Streams to Non-Rigid 3D Motion, Shape and Reflectance. In Proceedings of International Conference on Computer Vision (ICCV), 2001.

[7] V. Chhabra. Reconstructing specular objects with image based rendering using color caching. Master's thesis, Worcester Polytechnic Institute, 2001.

[8] B. Culbertson, T. Malzbender, and G. Slabaugh. Generalized Voxel Coloring. Volume 1883 of Lecture Notes in Computer Science, pages 100–115. Springer-Verlag, 1999.

[9] J. de Bonet and P. Viola. Poxels: Probabilistic Voxelized Volume Reconstruction. In Proceedings of International Conference on Computer Vision (ICCV), pages 418–425, 1999.

[10] C. R. Dyer. Volumetric scene reconstruction from multiple views. In L. S. Davis, editor, Foundations of Image Understanding, pages 469–489. Kluwer, 2001.

[11] D. Forsyth and J. Ponce. Computer Vision: A Modern Approach, chapter 6, page 119. Prentice Hall, 2003.

[12] H. Jin, A. Yezzi, and S. Soatto. Variational multiframe stereo in the presence of specular reflections. Technical Report TR01-0017, UCLA, 2001.

[13] V. Kolmogorov and R. Zabih. Multi-camera Scene Reconstruction via Graph Cuts. In Proceedings of European Conference on Computer Vision (ECCV), pages 82–96, 2002.

[14] K. Kutulakos and S. M. Seitz. A Theory of Shape by Space Carving. International Journal of Computer Vision (IJCV), 38(3):199–218, 2000.

[15] K. N. Kutulakos. Approximate N-view Stereo. In Proceedings of European Conference on Computer Vision (ECCV), pages 67–83, 2000.

[16] Y. Li, S. Lin, H. Lu, S. Kang, and H.-Y. Shum. Multibaseline Stereo in the Presence of Specular Reflections. In International Conference on Pattern Recognition, pages 573–576, 2002.

[17] S. Lin, Y. Li, S. B. Kang, X. Tong, and H.-Y. Shum. Diffuse-Specular Separation and Depth Recovery from Image Sequences. In Proceedings of European Conference on Computer Vision (ECCV), pages 210–224, 2002.

[18] S. Magda, T. Zickler, D. Kriegman, and P. Belhumeur. Beyond Lambert: Reconstructing Surfaces with Arbitrary BRDFs. In Proceedings of International Conference on Computer Vision (ICCV), pages 297–302, 2001.

[19] B.-T. Phong. Illumination for computer generated pictures. CACM, 18(6):311–317, June 1975.

[20] S. Pollard, J. Porrill, J. Mayhew, and J. Frisby. Disparity Gradient, Lipschitz Continuity, and Computing Binocular Correspondence. In O. Faugeras and G. Giralt, editors, Robotics Research: The Third International Symposium, volume 30, pages 19–26. MIT Press, 1986.

[21] M. Powell. A Fast Algorithm for Nonlinearly Constrained Optimization Calculations. In Numerical Analysis, volume 630 of Lecture Notes in Mathematics. Springer-Verlag, 1978.

[22] S. M. Seitz and C. R. Dyer. Photorealistic scene reconstruction by voxel coloring. In Proceedings of Conference on Computer Vision and Pattern Recognition (CVPR), pages 1067–1073, 1997.

[23] S. M. Seitz and C. R. Dyer. Photorealistic Scene Reconstruction by Voxel Coloring. International Journal of Computer Vision (IJCV), 35(2):151–173, 1999.

[24] G. Slabaugh, B. Culbertson, T. Malzbender, and R. Schafer. Improved Voxel Coloring via Volumetric Optimization. Technical Report 3, Center for Signal and Image Processing, Georgia Institute of Technology, 2000.

[25] G. Slabaugh, B. Culbertson, T. Malzbender, and R. Schafer. A Survey of Methods for Volumetric Scene Reconstruction from Photographs. Technical Report 1, Center for Signal and Image Processing, Georgia Institute of Technology, 2001.

[26] R. Yang. View-Dependent Pixel Coloring – A Physically-Based Approach for 2D View Synthesis. PhD thesis, Department of Computer Science, University of North Carolina at Chapel Hill, 2003.

[27] R. Yang and Z. Zhang. Eye Gaze Correction with Stereovision for Video-Teleconferencing. In Proceedings of European Conference on Computer Vision (ECCV), pages 479–494, 2002.

[28] Z. Zhang and Y. Shan. A Progressive Scheme for Stereo Matching. In M. Pollefeys et al., editors, Springer LNCS 2018: 3D Structure from Images – SMILE 2000, pages 65–85. Springer-Verlag, 2001.